Sunday, May 25, 2014

SEO for Crawlers

Inform Crawlers through robots.txt and Nofollow tag

A crawler (or spider) is a program that collects information about the web pages available on the web. Search engine providers give their crawlers names such as Googlebot or Bingbot. The important part is that you, as the owner of your site, need to tell these crawlers which URLs should be indexed and which outbound links from your site the search engine should take into account.

What is robots.txt?

[Image: robots.txt display]

A "robots.txt" file is a plain-text file in the root directory of a website that tells search engine crawlers which pages or directories they are allowed to crawl and which they are not.
You can simply enter "www.yoursitename.com/robots.txt" in the browser's address bar to view your site's robots.txt file.

Some of your site's pages may contain confidential information. If you do not use the robots.txt file to stop search engines from crawling those pages, those confidential details may be shown to the public in search results. Google Webmaster Tools offers a simple robots.txt generator to help you create this file.

If you want to prevent crawling on a subdomain, you need to create a separate robots.txt file in the root of that subdomain, because a robots.txt file only applies to the host it is served from.
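To see how a crawler reads these rules, here is a minimal sketch using Python's standard-library urllib.robotparser module. The robots.txt content, the example.com URLs and the /private/ and /admin/ directories are hypothetical. Real crawlers such as Googlebot use their own parsers, which may resolve conflicting rules differently from this module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.example.com: every crawler ("*")
# is asked to stay out of /private/ and /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /admin/
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler asks before fetching each URL.
for agent, url in [
    ("Googlebot", "https://www.example.com/private/report.html"),
    ("Bingbot", "https://www.example.com/blog/post.html"),
]:
    allowed = parser.can_fetch(agent, url)
    print(f"{agent} may fetch {url}: {allowed}")
```

The first URL is blocked because its path starts with /private/, while the blog post matches no Disallow rule and is allowed. Note that can_fetch only reports what the file asks for; nothing stops a crawler from ignoring the answer.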

Is robots.txt enough to hide sensitive information?

No. Using robots.txt alone is definitely not a secure way to hide your sensitive content from search engines, for the following reasons:
  • Anyone can view your robots.txt file in a browser, so a curious user can study the listed directories and guess which URLs you are trying to hide.
  • Some crawlers do not respect robots.txt exclusions and may still crawl and index your confidential pages.
  • A search engine may still show the bare URL you blocked (without a title or description) if a page somewhere on the web links to it.
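To make the first point concrete: listing the directories a site "hides" takes only a few lines of code once the public robots.txt has been downloaded. This is a minimal sketch; the disallowed_paths helper and the sample file are hypothetical, and it ignores which user-agent group each rule belongs to.

```python
from typing import List

# Hypothetical robots.txt, as anyone could download it from /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /admin/   # back office
Allow: /
"""

def disallowed_paths(robots_txt: str) -> List[str]:
    """Return every path listed in a Disallow rule, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Anything after '#' is a comment in robots.txt.
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

print(disallowed_paths(ROBOTS_TXT))
```

In other words, every Disallow line is also a signpost. Content that must stay private needs real access control, not just a robots.txt entry.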

Understanding rel="nofollow" for links

If you do not want search engines to follow certain links on your site, set the value of the link's "rel" attribute to "nofollow". This tells search engines that the link shouldn't be followed. Add "nofollow" to the "rel" attribute of your link as shown below:
[Image: Adding No Follow Link]
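From the crawler's side, a link marked this way can be set aside while the page's other links are queued for crawling. The sketch below uses Python's standard html.parser module to sort a page's links into followed and nofollow groups. The LinkCollector class and the sample HTML are hypothetical, and this is a simplification, not how any particular search engine actually processes links.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect <a href> values, split by whether rel contains 'nofollow'."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated values, e.g. rel="external nofollow".
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)

page = """<p>Read <a href="https://www.example.com/guide">our guide</a> or visit
<a href="https://spam.example.net" rel="nofollow">this comment link</a>.</p>"""

collector = LinkCollector()
collector.feed(page)
collector.close()
print("followed:", collector.followed)
print("nofollow:", collector.nofollowed)
```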

Where can I use Nofollow?

Nofollow is useful in many situations. Here are some of the most important ones:
  • Blog comments: the comment section of a blog is highly vulnerable to comment spam, like the example shown below. Adding nofollow to the rel attribute of links in comments ensures that you are not passing your page's reputation on to spammy sites. Moreover, links to spammy sites can also hurt your own site's reputation in search engine results.
[Image: Comment Spamming Example]
  • Nofollow is also useful in forums, guest books and shout-boards. Most blogging and forum platforms add nofollow to links in user comments by default; otherwise, you need to edit the comments manually. You can also fight comment spam with moderation measures such as a CAPTCHA or requiring commenters to log in with a social networking account.
  • Nofollow is also useful when you link to a site but do not want to pass your reputation on to it through that outbound link.
  • To nofollow all the links on a page of your site, add "nofollow" to that page's robots meta tag, which is placed inside the <head> section of the page's HTML, as shown below:
[Image: Nofollow All the links in a Page]
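A crawler that honors the page-level robots meta tag has to read it from the page's <head> before deciding whether to follow any of the page's links. Below is a minimal sketch of that check using Python's html.parser module. The RobotsMetaReader class and the page_is_nofollow function are hypothetical, and the sketch ignores crawler-specific meta tags and any directive other than nofollow.

```python
from html.parser import HTMLParser

class RobotsMetaReader(HTMLParser):
    """Gather directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # content is a comma-separated list, e.g. "noindex, nofollow".
        for token in (attributes.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)

def page_is_nofollow(html_text: str) -> bool:
    """True if the page asks crawlers not to follow any of its links."""
    reader = RobotsMetaReader()
    reader.feed(html_text)
    reader.close()
    return "nofollow" in reader.directives

page = '<html><head><meta name="robots" content="nofollow"></head><body>...</body></html>'
print(page_is_nofollow(page))
```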
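Blogging and forum platforms that add nofollow to user comments by default do something along these lines when they display a comment. This is a minimal sketch, assuming comments are stored as plain text; render_comment and its URL pattern are hypothetical, and the pattern is deliberately simple (for example, it would include a trailing period after a URL in the link).

```python
import html
import re

# Simplified pattern for http(s) URLs in plain-text comments.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")

def render_comment(text: str) -> str:
    """Escape a plain-text comment and turn each URL into a nofollow link."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        # Escape the ordinary text before the URL so users cannot inject HTML.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)

print(render_comment("Great post! Visit http://spam.example.net/deals now"))
```

Because every link in the comment gets rel="nofollow", a spammer gains no reputation from your page even if the comment slips past moderation.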
Content By webnots.com
