You are probably wondering what the robots file is all about, why it matters for your website, and how you can fix it.
The robots file is like a huge no-trespassing sign that says: “Do Not Cross!” Who is it for?
It is for all of the search engines and crawlers.
What is crawling? Avoid it by using robots.txt.
When users surf the web, search engines make content available to them through two main processes: crawling and indexing.
Imagine crawlers like very tiny robots who, let’s say, are living on the web, and they just go from link to link, trying to gather as much information as possible from each link they stumble upon.
So, crawling takes place when search engines access publicly available webpages. It basically involves looking at those pages and following the links on them.
Indexing, on the other hand, means gathering information about a page, so that it is made available through search engine results.
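The link-following part of crawling described above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine works: the LinkCollector class, the extract_links helper and the sample page are hypothetical names made up for this example, and a real crawler would also download each page and queue up the links it finds.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href="..."> found on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the links a crawler would follow from this page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Hypothetical page content, used only for illustration.
SAMPLE_PAGE = (
    '<p><a href="/about">About</a> '
    '<a href="http://www.yoursite.com/blog/">Blog</a></p>'
)

links = extract_links("http://www.yoursite.com/", SAMPLE_PAGE)
print(links)
```

Each link collected this way is another page the crawler may visit next, which is why one public page can lead crawlers to large parts of a site.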
The problem with crawling is that sometimes you might not want crawlers to access certain areas of your website, or you might not want some pages to be found via search engines.
Such is the case with pages that use limited server resources. That’s why you might want to use the robots.txt file.
What is the robots.txt file and why is it so important?
It is a text file that allows you to specify how you’d like your site to be crawled. Crawlers generally read your website’s robots.txt file before they crawl the site.
The robots.txt file is so great because you can specify which parts can and cannot be crawled.
It’s so important because it lets you tell crawlers which files and directories on your server they may visit. It’s like an electronic NO TRESPASSING sign. It tells Googlebot and other crawlers which files and directories on your server should not be crawled. Keep in mind that well-behaved crawlers follow these rules voluntarily, and that a blocked page can still show up in search engine results if other sites link to it.
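To make this more concrete, here is a small sketch of how a crawler might read such rules, using Python’s standard urllib.robotparser module. The rules in SAMPLE_ROBOTS_TXT, the paths in them and the is_allowed helper are assumptions invented for this example. They are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every crawler is asked to stay out of /private/.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the lines of the file, so no network request is needed here.
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())


def is_allowed(url, user_agent="Googlebot"):
    """Return True if the sample rules let this crawler fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("http://www.yoursite.com/private/report.html"))  # False
print(is_allowed("http://www.yoursite.com/blog/"))                # True
```

The "User-agent: *" line applies the rules to all crawlers, and each "Disallow" line names a path prefix they are asked not to crawl.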
What is the file’s location?
In order for it to be valid, it must be located at the root of the website host.
For example, in order to control crawling on all URLs below
http://www.yoursite.com/, the robots.txt file must be located at:
http://www.yoursite.com/robots.txt
A robots.txt file can be placed on subdomains:
http://website.yoursite.com/robots.txt or on non-standard ports:
http://yoursite.com:8181/robots.txt, but it cannot be placed in a subdirectory.
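The location rule can be expressed as a short Python sketch: whatever page you start from, the robots.txt address keeps the same scheme, host and port and always uses the path /robots.txt. The robots_txt_url function is a hypothetical helper written for this example, and the page URLs passed to it are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port), replace the path,
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.yoursite.com/blog/post.html"))
# http://www.yoursite.com/robots.txt
print(robots_txt_url("http://website.yoursite.com/shop/item?id=3"))
# http://website.yoursite.com/robots.txt
print(robots_txt_url("http://yoursite.com:8181/admin/"))
# http://yoursite.com:8181/robots.txt
```

Note that a page in a subdirectory still maps to the robots.txt file at the root, which is why a copy placed inside the subdirectory itself has no effect.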
Add a robots.txt file to your website
If you’re using WordPress, you can simply search the web for a WordPress SEO plugin that comes with a robots.txt file built in. For example, you can start using the Squirrly SEO WordPress Plugin, which already includes the robots file, so you don’t have to do anything. It will automatically add robots.txt to your website.
Otherwise, if you don’t have a WordPress site or you don’t want to use plugins, you can use the button on ContentLook that says: Send this issue to your team. Afterwards, you can get the technical person on your team to sort the matter out. We are going to provide you with the exact details for solving the issue.
So, all you need to do is to press the button.
If you want to do it on your own, I have a great source for you. Click here to learn how to create the file: http://www.robotstxt.org/robotstxt.html . You’ll find all the needed information.
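If you go the do-it-yourself route, the file itself is just a few lines of plain text. Here is a minimal sketch in Python that builds that text from a list of paths you want to keep crawlers out of. The build_robots_txt function and the example paths are hypothetical and only meant as a starting point. Saving the result as robots.txt at the root of your site is still up to you.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return the text of a simple robots.txt file.

    disallowed_paths: path prefixes crawlers are asked not to crawl.
    user_agent: which crawler the rules address ("*" means all of them).
    """
    lines = [f"User-agent: {user_agent}"]
    if disallowed_paths:
        lines += [f"Disallow: {path}" for path in disallowed_paths]
    else:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


# Hypothetical paths, used only for illustration.
robots_text = build_robots_txt(["/private/", "/tmp/"])
print(robots_text)
```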
If you want to re-check it, click here: http://www.frobee.com/robots-txt-check
And we’re done! Hopefully your robots.txt file will keep your site from attracting unneeded, annoying crawling!