Thursday, 9 June 2016

How to set crawlers and indexing on Blogger

Before we open Robots.txt and start working on it, here is a brief overview of its significance:

Warning! Use with caution. Incorrect use of these features can result in your blog being ignored by search engines.

What is Robots.txt?

For every blog you create on Blogger, a related Robots.txt file is auto-generated. Its purpose is to tell incoming robots (spiders, crawlers, etc. sent by search engines like Google and Yahoo) about your blog and its structure, and whether or not to crawl particular pages. As a blogger, you will want certain pages of your site to be crawled and indexed by search engines, while others you might prefer to keep out of the index, such as a label page, a demo page, or any other irrelevant page.

How do they see Robots.txt?

Robots.txt is the first thing these spiders look at when they reach your site. It works like a flight attendant who directs you to your seat and keeps checking that you don't enter private areas. Incoming spiders therefore crawl only the files that Robots.txt permits, which keeps the rest out of the index.

Where is Robots.txt located?

You can view your Robots.txt file in two ways: in your browser, by adding /robots.txt to your blog address (for example http://hemant9807.blogspot.in/robots.txt), or by signing in to your blog, choosing Settings > Search preferences > Crawlers and indexing, and selecting Edit next to Custom robots.txt.
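As a small illustration of the first option, the address of the file can be worked out from the address of any page on the blog. This is a minimal Python sketch, not anything Blogger provides: the helper name robots_txt_url is made up for this example, and the post address in the last line is a hypothetical one.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    # A reference that starts with "/" replaces the whole path of the
    # base URL, so the result always points at the root of the same host.
    return urljoin(page_url, "/robots.txt")


# Hypothetical post address, used only to show the call.
print(robots_txt_url("http://hemant9807.blogspot.in/2016/06/some-post.html"))
```

Opening the printed address in a browser should show the same rules that appear under Custom robots.txt in the Blogger settings.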


What does Robots.txt look like?

If you haven't touched your robots.txt file yet, it should look something like this:
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://hemant9807.blogspot.in/feeds/posts/default?orderby=UPDATED
Don't worry if yours isn't colored or doesn't have any line breaks. I colored the code and added line breaks so that it is easier to see what each part means.

User-agent: Mediapartners-Google
Mediapartners-Google is Google's AdSense robot, which crawls your site to find relevant ads to serve on your blog. If you disallow this robot, it cannot read the specified posts or pages, so AdSense will not be able to match relevant ads to them. If you are not using Google AdSense on your site, you can simply remove both of these lines.

User-agent: *
Those of you with a little programming experience will have guessed the meaning of the '*' character (a wildcard). For everyone else, it specifies that this section (and the lines beneath it) applies to all incoming spiders, robots, and crawlers.

Disallow: /search
The Disallow keyword specifies what robots should not do on your blog. Adding /search after it tells robots not to crawl the search pages and search results of your site. Therefore, a result page like http://hemant9807.blogspot.in/search/label/mylabel will not be crawled, and so will normally stay out of the index.

Allow: /
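The Allow keyword specifies what robots may do on your blog. Adding '/' means that robots may crawl everything from the root of your blog, starting with the homepage, apart from the paths excluded by a Disallow line.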
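To see how these rules combine, the default file can be fed to Python's standard urllib.robotparser module. This is only a sketch under a few assumptions: Googlebot is used here simply as an example crawler name, the helper allowed is made up for this example, and Python's parser applies the first matching rule in a group, which may differ in edge cases from how a particular search engine resolves Allow and Disallow.

```python
from urllib.robotparser import RobotFileParser

# The default rules shown earlier in this post.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: http://hemant9807.blogspot.in/feeds/posts/default?orderby=UPDATED
"""

BLOG = "http://hemant9807.blogspot.in"

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Report whether the named robot may crawl the given path on the blog."""
    return parser.can_fetch(agent, BLOG + path)


checks = [
    ("Googlebot", "/"),                              # homepage
    ("Googlebot", "/search/label/mylabel"),          # label (search) page
    ("Mediapartners-Google", "/search/label/mylabel"),
]
for agent, path in checks:
    verdict = "may crawl" if allowed(agent, path) else "may not crawl"
    print(f"{agent} {verdict} {path}")
```

With the default rules, an ordinary crawler is let into the homepage but kept out of the label page, while the AdSense robot is let into both because its Disallow line is empty.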
