17 Mar
Posted by Ganesh H S , Bangalore, India as zend framework
In the previous article, Zend Lucene Search - part4 - Search Results Highlighting, I talked about highlighting the keywords in search results.
In this article I will write about highlighting the keywords in search results and formatting the output so that it looks much like the results page of most search engines, using the zend lucene search.
<?php
require_once 'Zend/Search/Lucene.php';
$queryStr = "php";
$snapshotTextLength = 155;
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr);
$index = Zend_Search_Lucene::open("/var/www/lucene-data/blog-index");
$results = $index->find($query);
echo "Index contains ".$index->count()." documents.\n\n";
if($index->count())
{
    echo displaySearchResults($results, $query, $snapshotTextLength);
}
// Format the search results as HTML and return them as a string
function displaySearchResults($results, $query, $snapshotTextLength)
{
    $searchResultsContent = "";
    $data = array();
    $count = 0;
    if(is_array($results) && count($results))
    {
        foreach ($results as $result)
        {
            $data[$count]["article_url"] = $result->url;
            $data[$count]["article_title"] = $query->highlightMatches($result->title);
            $data[$count]["article_description"] = $query->highlightMatches($result->contents);
            $data[$count]["article_created_date_time"] = $result->postedDateTime;
            $data[$count]["article_id"] = $result->articleId;
            // title of each article with URL as link
            $searchResultsContent .= sprintf('<a href="%s">%s</a><br>', $data[$count]["article_url"], $data[$count]["article_title"]);
            // snapshot of the description (note: substr() can cut a highlight tag in half)
            $searchResultsContent .= sprintf("%s<br>", substr($data[$count]["article_description"], 0, $snapshotTextLength));
            // url
            $searchResultsContent .= $data[$count]["article_url"];
            // leave 2 lines after each search result
            $searchResultsContent .= "<br> <br> <br>";
            // move to the next slot only after this result has been rendered
            $count++;
        }
    }
    else
    {
        $searchResultsContent = "No results found, try using different keywords";
    }
    return $searchResultsContent;
}
?>
This program is similar to the one in Zend Lucene Search - part3 - retrieving the indexed data. The only difference is that I am formatting the display, so the output looks much like what you get on the search engine results page of google.com or search.yahoo.com.
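Cutting the highlighted description with substr() can split an HTML tag or a word in the middle. A minimal sketch of one way around this is shown here, assuming it is acceptable to build the snapshot from plain text. The function name makeSnapshot is hypothetical and is not part of Zend Framework.

```php
<?php
// Hypothetical helper (not part of Zend Framework): builds a plain-text
// snapshot of roughly $maxLength bytes that never ends in the middle of
// a tag or a word. Lengths are byte-based; use the mb_* functions for
// multibyte text.
function makeSnapshot(string $html, int $maxLength): string
{
    // Drop tags and decode entities so no markup can be cut in half.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    $text = trim(preg_replace('/\s+/', ' ', $text));
    if (strlen($text) <= $maxLength) {
        return $text;
    }
    $cut = substr($text, 0, $maxLength);
    // Back up to the last space so the snapshot ends on a whole word.
    $lastSpace = strrpos($cut, ' ');
    if ($lastSpace !== false) {
        $cut = substr($cut, 0, $lastSpace);
    }
    return $cut . '...';
}
```

Because strip_tags() also removes the highlight markup, one option is to build the snapshot from the raw contents first and only then pass the snapshot to highlightMatches().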
Related articles:
Zend Lucene Search - part1 - creating index
Zend Lucene Search - part2 - Real time indexing
Zend Lucene Search - part3 - retrieving the indexed data
Zend Lucene Search - part4 - Search Results Highlighting
It has been 4 years since I started my career. Perl was one of the theory subjects in the 7th semester of my B.E. All these years I have enjoyed coding in PHP a lot, and it is still very exciting to work with.
But at Yahoo! I see a lot of very interesting tools being developed in Perl, and I always wondered why they couldn't be coded in PHP. Maybe I kept asking myself that question because I come from a PHP background. Many of my colleagues do a lot of coding in Perl and are pretty excited about it, but I did not want to give up my comfort zone. Finally, after 11 months, I gave it a try and completed my first package coded entirely in Perl.
I have just entered the world of Perl. If you are a PHP programmer and feel that Perl is not worth learning because you already know PHP, I would recommend you give it a try; you will love both PHP and Perl.
The best book for beginners in Perl is -
Learning Perl
Related links -
PERL
CPAN - The Comprehensive Perl Archive Network
01 Feb
Posted by Ganesh H S , Bangalore, India as Search Engine Optimization
We are often tempted to let search engines index our website's own search results pages, with the intention of getting more back links from the search engine results page (SERP).
However, we should use robots.txt to disallow crawling of website search results pages that don't add much value for users coming from search engines. It is one of the quality guidelines Google mentions in its webmaster guidelines, and one of the important SEO checklist items we should track.
Possible reasons:
Example -
Disallow: /search?p=* in health.yahoo.com/robots.txt
Disallow: /search in http://search.yahoo.com/robots.txt
Disallow: /results in http://youtube.com/robots.txt
18 Jan
Posted by Ganesh H S , Bangalore, India as Search Engine Optimization
In my earlier post I wrote about robots.txt and the robots meta tag.
Following are the search engine optimization (SEO) checklist items related to robots.txt -
1. robots.txt http status code
Before crawling a website, a search bot (e.g. googlebot) always requests robots.txt, reads its rules, and crawls only the locations of the website that are allowed. So it is always important for the webmaster to check the http status code of the site's robots.txt and make sure it returns http status code 200 or http status code 404.
Why is it so important to check the http status code?
Before crawling, the search bot requests robots.txt -
If an http status code of 200 is returned, it reads the file and crawls the locations of the website that are allowed.
If an http status code of 404 is returned, the search bot goes ahead with its job with no restriction on website crawling.
If the page takes a long time and no response code is returned, the search bot waits and after some time skips crawling, because it always respects robots.txt. This can adversely affect the crawling of our website.
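The three cases above can be sketched as a small lookup. This is only an illustration of the behaviour described in this post: the function name and the returned labels are hypothetical, and status codes other than 200 and 404 are outside what the post covers, so the sketch only flags them for manual checking.

```php
<?php
// Hypothetical helper: maps the HTTP status code returned for robots.txt
// (or null when no response arrived in time) to the crawler behaviour
// described in the post. The labels are made up for this sketch.
function robotsTxtCrawlPolicy(?int $statusCode): string
{
    if ($statusCode === null) {
        return 'skip-crawl';      // no response: the bot waits, then skips crawling
    }
    if ($statusCode === 200) {
        return 'follow-rules';    // read robots.txt and crawl only allowed locations
    }
    if ($statusCode === 404) {
        return 'crawl-all';       // no robots.txt, so no restriction on crawling
    }
    return 'check-manually';      // assumption: other codes are not covered by the post
}
```

The status code itself could come from any HTTP client; how it is fetched is left out of the sketch on purpose.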
2. URLs restricted by robots.txt
Consider the impact of the following robots.txt definition -
User-agent: *
Disallow: *
It blocks all search bots from crawling the entire website (the conventional way to write this rule is Disallow: /). We should make sure we block only those areas that really need to be blocked.
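A quick self-check for this mistake can be scripted. The following is a minimal sketch, assuming a simplified robots.txt with one User-agent line per group; the function name blocksEntireSite is hypothetical, and the parsing is far less complete than what a real search bot does.

```php
<?php
// Hypothetical helper: returns true when the rules for "User-agent: *"
// contain a Disallow line that covers the entire site ("/" or "*").
// Simplified: one User-agent line per group is assumed.
function blocksEntireSite(string $robotsTxt): bool
{
    $appliesToAll = false;
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        // Strip comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            // Remember whether the rules that follow apply to every bot.
            $appliesToAll = ($value === '*');
        } elseif ($field === 'disallow' && $appliesToAll) {
            if ($value === '/' || $value === '*') {
                return true;
            }
        }
    }
    return false;
}
```

Running this against the definition above returns true, while a rule such as Disallow: /search returns false.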
06 Jul
Posted by Ganesh H S , Bangalore, India as Google, Search Engine Optimization, yahoo
You had a plan for a business, you needed a website, and now the website is done. What next?
How do you inform search engines that your website exists and ask them to index it?
When I started working on Search Engine Optimization (SEO) for 3 ecommerce sites in 2006, this was the first question I had in mind.
Following are the ways of getting your website indexed by search engines -
04 Jul
Posted by Ganesh H S , Bangalore, India as Google, Search Engine Optimization
I always thought the following links were the same -
http://ganeshhs.com/search-engine-optimization-seo/noindex-nofollow
http://www.ganeshhs.com/search-engine-optimization-seo/noindex-nofollow
The above links lead to the same page; they differ only by the www. prefix.
But search engines treat the two links as different. In a few cases I have seen, when we link to a page we leave out www., and in other cases we include www. in the link.
So what are the impacts?
How do we instruct search engines to treat both URLs as the same? Google webmaster tools has an option to set the preferred domain.
So what is the advantage of setting a preferred domain? If I set my preferred domain to ganeshhs.com, then the next time Google crawls my website and finds a link starting with www.ganeshhs.com, it will follow it as ganeshhs.com, and when Google displays my website links in search results it will show them as ganeshhs.com.
It also helps us fix external site referrals. A few people have started linking to my website. Suppose their referral link is http://www.ganeshhs.com/category/search-engine-optimization-seo whereas my actual article URL is http://ganeshhs.com/category/search-engine-optimization-seo; when Google crawls our website through that referral link, it will keep the version of the domain that we preferred.
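The preferred domain itself is a setting inside Google webmaster tools, but the same idea can be applied to the links we generate on our own pages so that they always use one form of the domain. A minimal sketch follows, assuming absolute http/https URLs; the function name toPreferredDomain is hypothetical, and any user name or password in the URL is not preserved.

```php
<?php
// Hypothetical helper: rewrites the host of $url to $preferredHost when the
// two differ only by a leading "www.". URLs of other sites and relative
// URLs are returned unchanged.
function toPreferredDomain(string $url, string $preferredHost): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // not an absolute URL, leave it alone
    }
    $stripWww = function (string $host): string {
        return preg_replace('/^www\./i', '', $host);
    };
    if (strcasecmp($stripWww($parts['host']), $stripWww($preferredHost)) !== 0) {
        return $url; // a different site
    }
    // Rebuild the URL around the preferred host.
    $result = $parts['scheme'] . '://' . $preferredHost;
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= isset($parts['path']) ? $parts['path'] : '';
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}
```

For example, passing the www referral link from the paragraph above with ganeshhs.com as the preferred host gives back the non-www article URL.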
The robots meta tag is an HTML tag that tells robots not to index the content of a page and/or not to scan it for links to follow. Adding this meta tag is helpful on pages that we want neither indexed nor have their links followed.
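The tag in question is conventionally written as a meta element named robots inside the page head. A minimal sketch of generating it from PHP is given here; the function name robotsMetaTag is hypothetical.

```php
<?php
// Hypothetical helper: builds a robots meta tag for the page <head>.
// Pass false for $index to keep the page out of the index, and false
// for $follow to stop robots from following the links on the page.
function robotsMetaTag(bool $index, bool $follow): string
{
    $content = ($index ? 'index' : 'noindex') . ', ' . ($follow ? 'follow' : 'nofollow');
    return '<meta name="robots" content="' . $content . '">';
}
```

Calling robotsMetaTag(false, false) produces the noindex, nofollow variant discussed above.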
In some cases, we come across situations where we keep links to external sites. But what are the impacts of this?
We have to keep external links, but how do we prevent the above scenario -
21 Jun
Posted by Ganesh H S , Bangalore, India as Google, Search Engine Optimization
My blog site ganeshhs.com now has a Google Page Rank of 2/10.

When I started my first project with zend framework in May 2007, there were very few articles/tutorials, and my first source of information was the search engine. I then realised it would be a great idea if my own articles were listed in search engines, so I turned my attention to search engine optimization.
Looking at my website analytics, I noticed that my recent posts on zend lucene search had a higher number of unique visits, which also increased my website's daily visits to an average of 100 (with more unique visits). I also started getting backlinks from other websites (namely http://www.phpimpact.com/ etc.), which contributed to this page rank as well.
Most essentially, keywords (relevant to the context of the website/article) help articles get indexed by search engines. The following table lists some of the blog article keywords I targeted and their positions in the search engines Yahoo! and Google -
| Keyword | Google Position | Yahoo! Position |
|---|---|---|
| Zend Lucene Search | Page 1 | Page 1 |
| Zend Auth | Page 1 | Page 2 |
| Zend Registry | Page 1 | - |
| Zend Debug | Page 1 | - |
| Zend Exception | Page 1 | - |
| Zend Config | Page 1 | - |
| Zend Loader | Page 1 | - |
27 Mar
Posted by Ganesh H S , Bangalore, India as zend framework
The Zend_Search_Lucene_Search_Query::highlightMatches() method allows the developer to highlight HTML document terms in the context of a search query.
In the previous article, Zend Lucene Search - part3 - retrieving the indexed data, I talked about retrieving the search results. Highlighting the searched keyword in the search results is an important feature that most search engines provide. In this article I will write about highlighting the search results retrieved using the zend lucene search.
<?php
require_once 'Zend/Search/Lucene.php';
$queryStr = "php";
// parse the query string into a query object so highlightMatches() is available
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr);
$index = Zend_Search_Lucene::open("/var/www/lucene-data/blog-index");
$results = $index->find($query);
echo "Index contains ".$index->count()." documents.\n\n";
$data = array();
if($index->count())
{
    $count = 0;
    foreach ($results as $result)
    {
        $data[$count]["article_url"] = $result->url;
        // wrap the matched keywords in the title and contents with highlight markup
        $data[$count]["article_title"] = $query->highlightMatches($result->title);
        $data[$count]["article_description"] = $query->highlightMatches($result->contents);
        $data[$count]["article_created_date_time"] = $result->postedDateTime;
        $data[$count]["article_id"] = $result->articleId;
        $count++;
    }
}
print_r($data);
?>
This program is the same as the one in Zend Lucene Search - part3 - retrieving the indexed data. The only difference is that I am now calling highlightMatches on the search results returned.
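To see what highlighting means without an index at hand, here is a minimal stand-alone sketch in plain PHP. It is only an illustration of the idea and is not how Zend_Search_Lucene implements highlightMatches(); the function name highlightKeyword and the choice of the b tag are assumptions made for this example.

```php
<?php
// Hypothetical stand-in for illustration only; this is NOT the
// Zend_Search_Lucene implementation of highlightMatches().
// Wraps every whole-word, case-insensitive match of $keyword in <b> tags
// and HTML-escapes everything else.
function highlightKeyword(string $text, string $keyword): string
{
    if ($keyword === '') {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }
    // Split the text around the keyword and keep the matches in the result:
    // even positions hold ordinary text, odd positions hold matches.
    $pattern = '/\b(' . preg_quote($keyword, '/') . ')\b/i';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        $part = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $out .= ($i % 2 === 1) ? '<b>' . $part . '</b>' : $part;
    }
    return $out;
}
```

Unlike this sketch, highlightMatches() works from the parsed query object, so it knows about every term in the query rather than a single keyword.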
Related articles:
Zend Lucene Search - part1 - creating index
Zend Lucene Search - part2 - Real time indexing
Zend Lucene Search - part3 - retrieving the indexed data
17 Mar
Posted by Ganesh H S , Bangalore, India as zend framework
Once the index is created, we are ready to use zend lucene search to search the website. In the following example, php is the search keyword used to fetch the relevant search results from the already indexed data.
<?php
require_once 'Zend/Search/Lucene.php';
$query = "php";
$index = Zend_Search_Lucene::open("/var/www/lucene-data/blog-index");
$results = $index->find($query);
echo "Index contains ".$index->count()." documents.\n\n";
$data = array();
if($index->count())
{
    $count = 0;
    foreach ($results as $result)
    {
        // $query is a plain string here, so the stored fields are used as-is
        $data[$count]["article_url"] = $result->url;
        $data[$count]["article_title"] = $result->title;
        $data[$count]["article_description"] = $result->contents;
        $data[$count]["article_created_date_time"] = $result->postedDateTime;
        $data[$count]["article_id"] = $result->articleId;
        $count++;
    }
}
print_r($data);
?>
To retrieve the indexed data, the first thing we need to do is open the index path.
$index = Zend_Search_Lucene::open("/var/www/lucene-data/blog-index");
Suppose the user's search input is -
$query = "php";
We have to use the find method of zend search lucene -
$results = $index->find($query);
To retrieve the total number of documents in the index, we use the count method of zend lucene search (the number of hits for the query itself is count($results)) -
echo "Index contains ".$index->count()." documents.\n\n";
To limit the number of search results, we use setResultSetLimit of zend lucene search, and it should be called before find -
$index->setResultSetLimit(10);
Related articles:
Zend Lucene Search - part1 - creating index
Zend Lucene Search - part2 - Real time indexing
Zend Lucene Search - part4 - Search Results Highlighting