Ganesh H S

Thoughts on open source technologies, search engine optimization, website security

Spot Award - Yahoo! Finance team award

I received my first Yahoo! team award, the “Spot Award”, on November 17th, 2009. I realized that this is the first award I have received since my school days, and the first at a workplace. I worked at a start-up for almost 3 years, where good performance was generally rewarded with a salary hike, but sometimes awards leave more memorable moments than money does.

It's been fun working at Yahoo!. I have always loved working on SEO, and in 2008 I got that opportunity for Yahoo! Health, Yahoo! Beta Cricket and Yahoo! Weather. Later I worked as local security engineer for media group projects along with being a backend engineer, and in July 2009 I moved to Yahoo! Finance.

I would like to acknowledge and thank a few colleagues who have always been a source of inspiration: Director of Product Management Don Chennavasin, Engineering Manager Raghu M, and Dedicated Security Engineer Vivek Saraf.

Reset MySQL root password

To make my passwords more secure I choose passwords that mix letters (lower and upper case), digits and symbols (!#..),
and sometimes I end up forgetting them.

Can we recover a lost MySQL root password, or is there any way to change it?
When I was a fresher and could not access MySQL, I used to uninstall and reinstall the MySQL server.
Is there a better way to recover, change or reset a lost MySQL root password? The answer is yes, if you are a sudo user.

Step 1: Stop the MySQL process

sudo /etc/init.d/mysqld stop

Step 2: Start MySQL in safe mode and skip the check against user privileges.

sudo /usr/bin/mysqld_safe --skip-grant-tables &

Step 3: Since MySQL is running in safe mode and we have skipped the privilege checks, log in to MySQL without a password, switch to the mysql database, update the password and flush the privileges.

mysql -u root
use mysql;
update user set password=PASSWORD("newpassword") where User='root';
flush privileges;

Step 4: Let us test by logging in to MySQL with the new password. First stop the MySQL process that was started in safe mode with the privilege checks skipped, then start MySQL in normal mode and try logging in using the new password.

sudo /etc/init.d/mysqld stop
sudo /etc/init.d/mysqld start
mysql -u root -pnewpassword

Zend Lucene Search - part5 - search engine results page formatting

In the previous article, Zend Lucene Search - part4 - Search Results Highlighting, I talked about highlighting the keywords in search results.

In this article I will write about highlighting the keywords and formatting the output so that it looks much like the results page of most search engines, using Zend Lucene search.

<?php
require_once 'Zend/Search/Lucene.php';

$queryStr = "php";
$snapshotTextLength = 155;

$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr);
$index = Zend_Search_Lucene::open("/var/www/lucene-data/blog-index");
$results = $index->find($query);

echo "Index contains " . $index->count() . " documents.\n\n";

if ($index->count())
{
    // the query object is passed in because it does the highlighting
    echo displaySearchResults($results, $query, $snapshotTextLength);
}

// Format the search results as HTML and return them as a string
function displaySearchResults($results, $query, $snapshotTextLength)
{
    $searchResultsContent = "";
    $data = array();
    $count = 0;

    if (is_array($results) && count($results))
    {
        foreach ($results as $result)
        {
            $data[$count]["article_url"]               = $result->url;
            $data[$count]["article_title"]             = $query->highlightMatches($result->title);
            $data[$count]["article_description"]       = $query->highlightMatches($result->contents);
            $data[$count]["article_created_date_time"] = $result->postedDateTime;
            $data[$count]["article_id"]                = $result->articleId;

            // title of each article with URL as link
            $searchResultsContent .= sprintf('<a href="%s">%s</a><br>', $data[$count]["article_url"], $data[$count]["article_title"]);

            // snapshot of the description
            $searchResultsContent .= sprintf('%s<br>', substr($data[$count]["article_description"], 0, $snapshotTextLength));

            // url
            $searchResultsContent .= $data[$count]["article_url"];

            // leave 2 lines after each search result
            $searchResultsContent .= "<br> <br> <br>";

            // move to the next slot only after this result has been rendered
            $count++;
        }
    }
    else
    {
        $searchResultsContent = "No results found, try using different keywords";
    }

    return $searchResultsContent;
}
?>

This program is similar to the one in Zend Lucene Search - part3 - retrieving the indexed data. The only difference is that I format the display, so the output looks much like the search engine results page of google.com or search.yahoo.com.
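The snapshot step cuts the description at a fixed number of characters, which can end in the middle of a word. Here is a minimal sketch in Python (not PHP, and not part of the original program) of one way to cut at a word boundary instead. The names make_snippet and format_result are hypothetical, and the markup simply mirrors the title link, snapshot and URL layout described above.

```python
from html import escape


def make_snippet(text, max_length=155):
    """Cut text to about max_length characters without splitting a word.

    An ellipsis is appended when the text was shortened.
    """
    text = " ".join(text.split())          # collapse runs of whitespace
    if len(text) <= max_length:
        return text
    cut = text[:max_length]
    if text[max_length] != " ":            # the limit falls inside a word
        last_space = cut.rfind(" ")
        if last_space > 0:
            cut = cut[:last_space]         # back off to the previous word
    return cut.rstrip() + "..."


def format_result(url, title, description, max_length=155):
    """Render one result: linked title, snapshot, plain URL, then spacing."""
    return '<a href="{0}">{1}</a><br>{2}<br>{0}<br> <br> <br>'.format(
        escape(url, quote=True),
        escape(title),
        escape(make_snippet(description, max_length)),
    )


print(make_snippet("hello world foo", 13))
```

This sketch escapes the title and description, so it assumes plain text input; highlighted HTML from the search library would need different handling.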

Related articles:
Zend Lucene Search - part1 - creating index
Zend Lucene Search - part2 - Real time indexing
Zend Lucene Search - part3 - retrieving the indexed data
Zend Lucene Search - part4 - Search Results Highlighting

Enter the world of PERL

It's been 4 years since I started my career. PERL was one of the theory subjects in the 7th semester of my B.E. All these years I have enjoyed coding in PHP a lot, and it is very exciting to work with.

But at Yahoo! I see a lot of very interesting tools developed in PERL, and I always wondered why they couldn't be coded in PHP. Maybe I kept asking myself that question because I was a PHP programmer. I saw many of my colleagues doing a lot of coding in PERL and was pretty excited about it, but I did not want to give up my comfort zone. Finally, after 11 months, I gave it a try and completed my first package coded entirely in PERL.

I have just entered the world of PERL. If you are a PHP programmer and feel you don't need to learn PERL because you already know PHP, I would recommend you give it a try; you will love both PHP and PERL.

The best book for beginners in PERL is -
Learning PERL

Related links -
PERL
CPAN - The Comprehensive Perl Archive Network

disallow website search results

We generally let search engines index our website's search results pages with the intention of getting more links from the search engine results page (SERP).

However, we should use robots.txt to disallow crawling of website search results pages, because they don't add much value for users coming from search engines. It's one of the quality guidelines Google mentions in its webmaster guidelines, and an important item on the SEO checklist we should track.

Possible reasons:

  1. Duplicate content -
    A search results page holds the title, a short description and a link for each article, while the article page itself also has the article title and description. So if a search bot crawls both the search results and the individual articles, it can potentially lead to duplicate content.
  2. Doorway pages -
    Search results pages act like doorway pages to the individual articles. Doorway pages, like cloaking, are a technique that should be avoided.
  3. Search engine subsystem -
    A search engine as a whole strives to provide unique results. Clicking an indexed website search page in the search engine results page only takes the user to the website's own search results, that is, to another search run inside the website.

Example -
Disallow: /search?p=* in http://health.yahoo.com/robots.txt
Disallow: /search in http://search.yahoo.com/robots.txt
Disallow: /results in http://youtube.com/robots.txt
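To see what such a rule does, here is a small sketch using Python's standard urllib.robotparser. The robots.txt content and the example.com URLs are made up for illustration. This parser matches plain path prefixes and does not understand the * wildcard, so the sketch uses the simpler /search form.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps bots out of the site search pages
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Search results pages are blocked, article pages are not
print(is_crawlable(ROBOTS_TXT, "http://example.com/search?p=php"))         # False
print(is_crawlable(ROBOTS_TXT, "http://example.com/articles/zend-lucene"))  # True
```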

SEO checklist robots.txt

In my earlier post I wrote about robots.txt and the robots meta tag.

Following is the search engine optimization (SEO) checklist related to robots.txt -

1. robots.txt http status code

Before crawling a website, a search bot (e.g. Googlebot) always requests robots.txt, reads the rules defined in it and crawls only the locations that are allowed. So it is important for a webmaster to check the HTTP status code of the site's robots.txt and make sure it returns either 200 or 404.

Why is it so important to check the HTTP status code?

Before crawling, the search bot requests robots.txt -

If HTTP status code 200 is returned, it reads the file and crawls the locations of the website that are allowed.

If HTTP status code 404 is returned, the search bot goes ahead with its job with no restriction on crawling the website.

If the request takes a long time and no response code is returned, the search bot waits and after some time skips crawling, because it always respects robots.txt. This can adversely affect the crawling of our website.
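The three cases above can be sketched as a small check a webmaster might run. This is only an illustration in Python using the standard library. The function names and the returned labels are hypothetical, and the mapping follows the behaviour described in this post rather than any particular search engine's documentation.

```python
import urllib.error
import urllib.request


def robots_status(site, timeout=10):
    """Return the HTTP status of <site>/robots.txt, or None if no response."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code          # e.g. 404 or 500
    except OSError:
        return None                # timeout, DNS failure, refused connection


def crawl_policy(status_code):
    """Map the robots.txt status to the bot behaviour described above."""
    if status_code == 200:
        return "obey rules"        # read the file, crawl only allowed paths
    if status_code == 404:
        return "crawl everything"  # no robots.txt, so no restrictions
    return "defer crawl"           # no usable answer, the bot may skip the site
```

For example, crawl_policy(robots_status("http://ganeshhs.com")) would give one of the three labels for this site.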

2. URLs restricted by robots.txt

Consider the impact of the following robots.txt definition:

User-agent: *
Disallow: *

It blocks all search bots from crawling the entire website. We should make sure we block only those areas that we really want to block.
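A quick way to catch this mistake is to scan robots.txt for a blanket rule before deploying it. The following is a rough Python sketch and not a full robots.txt parser. The function name is hypothetical, and it assumes that `/`, `/*` and the `*` form shown above all mean "block every URL" to a crawler with wildcard support.

```python
def blocks_entire_site(robots_txt):
    """Return True if a 'User-agent: *' group has a rule blocking every URL."""
    blanket_rules = ("/", "/*", "*")   # assumption: all three block everything
    applies_to_all = False
    previous_was_agent = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # consecutive User-agent lines belong to the same group
            if previous_was_agent:
                applies_to_all = applies_to_all or value == "*"
            else:
                applies_to_all = value == "*"
            previous_was_agent = True
            continue
        previous_was_agent = False
        if field == "disallow" and applies_to_all and value in blanket_rules:
            return True
    return False


print(blocks_entire_site("User-agent: *\nDisallow: *"))        # True
print(blocks_entire_site("User-agent: *\nDisallow: /search"))  # False
```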

submit site to google yahoo dmoz msn

You had a plan for a business, you needed a website, and now the website is done. What next?

How do you inform search engines that your website exists and ask them to index it?

When I started working on Search Engine Optimization (SEO) for 3 ecommerce sites in 2006, this was the first question I had in mind.

Following are the ways of getting your website indexed by search engines -

SEO - set preferred domain

I always thought the following links were the same -
http://ganeshhs.com/search-engine-optimization-seo/noindex-nofollow
http://www.ganeshhs.com/search-engine-optimization-seo/noindex-nofollow

The above links lead to the same page; they differ only in the www. prefix.
But a search engine treats the two links as different. I have seen a few cases where we leave out www. in many of our links and include it in others.

So what are the impacts?

  1. Search engines keep both versions of the URLs. When people click on search results that lead to our site through different versions of these URLs, page rank and traffic are split between them, which can drastically affect both.
  2. These URLs look like different documents to crawlers and cause excessive crawling of our website.

How do we instruct the search engine to treat both URLs as the same? Google's webmaster tools have an option to set the preferred domain.

So what is the advantage of setting a preferred domain? If I set my preferred domain to ganeshhs.com, the next time Google crawls my website and finds a link starting with www.ganeshhs.com, it will follow it as ganeshhs.com, and when Google displays links to my website in search results it will show them as ganeshhs.com.

It also helps us fix referrals from external sites. A few people have started linking to my website. Suppose their link is http://www.ganeshhs.com/category/search-engine-optimization-seo whereas my actual article URL is http://ganeshhs.com/category/search-engine-optimization-seo; when Google crawls our website through that referral link, it will keep the version of the domain that we preferred.
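To make the idea concrete, here is a minimal Python sketch of what "treating both URLs as the same" amounts to: rewriting the www. and non-www. forms of a link to one preferred host. The function name is hypothetical, and this only illustrates the mapping. It is not how Google implements the preferred domain setting.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_www(host):
    """Drop a leading 'www.' from a host name."""
    return host[4:] if host.startswith("www.") else host


def to_preferred_domain(url, preferred_host="ganeshhs.com"):
    """Rewrite url to use preferred_host if it points at the same site."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if strip_www(host) != strip_www(preferred_host):
        return url                      # a different site, leave untouched
    netloc = preferred_host
    if parts.port:
        netloc += ":%d" % parts.port    # keep an explicit port if present
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_preferred_domain("http://www.ganeshhs.com/category/search-engine-optimization-seo"))
```

On the website's own side, the same mapping is usually done with a permanent redirect from one host to the other.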

noindex nofollow

The HTML tag <meta name="robots" content="noindex, nofollow"> tells robots not to index the content of a page and/or not to scan it for links to follow. Adding this meta tag is helpful for pages that we don't want indexed and whose links we don't want followed.

In some cases we need to keep links to external sites. But what are the impacts of this?

  1. Part of page rank is shared with the external website -
    When we link to other websites, part of our website's page rank is shared with those external sites, and we may end up sending the search engine crawlers to the other site.
  2. Leading search engine crawlers to crawl the external website -
    A crawler enters our website to crawl more pages, which helps us get more pages indexed in search engines. But by keeping external links we create a way for the crawler to leave our website and crawl the external websites.

We have to keep external links, but how do we prevent the above scenario?

  • If google.com is an external link, we could use <a href="http://www.google.com" rel="nofollow">. When the crawler comes across this external link, the attribute tells it not to follow that link. Note that noindex applies only in the robots meta tag; on a link, rel takes just nofollow.
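To find external links that still lack the attribute, a page can be scanned with Python's standard html.parser. This is a small sketch under the assumption that a link counts as internal when its host is ganeshhs.com or a subdomain of it. The class name and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ExternalLinkAudit(HTMLParser):
    """Collect external links that are missing rel="nofollow"."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.followed_external_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        host = urlsplit(href).hostname
        if host is None or host == self.own_host or host.endswith("." + self.own_host):
            return  # relative or internal link, nothing to check
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" not in rel_values:
            self.followed_external_links.append(href)


audit = ExternalLinkAudit("ganeshhs.com")
audit.feed('<a href="http://www.google.com" rel="nofollow">Google</a>'
           '<a href="http://www.phpimpact.com/">PHP::Impact</a>'
           '<a href="/about">About</a>')
print(audit.followed_external_links)   # ['http://www.phpimpact.com/']
```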
ganeshhs.com google page rank

My blog site ganeshhs.com now has a Google Page Rank of 2/10.

When I started my first project with Zend Framework in May 2007, there were very few articles and tutorials, and my first source of information was the search engine. Then I realised it would be a great idea if my own articles were listed in search engines, so my first focus was search engine optimization.

Looking at my website analytics, I noticed that my recent posts on Zend Lucene search had a higher number of unique visits, which also raised my website's daily traffic to an average of 100 visits (with more unique visits). I also started getting backlinks from other websites (namely http://www.phpimpact.com/ etc.), which contributed to this page rank as well.

Most importantly, keywords relevant to the context of the website or article help the articles get indexed by search engines. The following table lists some of the blog topics and keywords I targeted and their positions in Yahoo! and Google -

Keyword              Google Position   Yahoo! Position
Zend Lucene Search   Page 1            Page 1
Zend Auth            Page 1            Page 2
Zend Registry        Page 1            -
Zend Debug           Page 1            -
Zend Exception       Page 1            -
Zend Config          Page 1            -
Zend Loader          Page 1            -