There are times when you want search engines to crawl your site and index its content, and there are times when you don't. Typically, while your site is under development you want to prevent it from being crawled and indexed.
There are three ways of controlling (maybe "influencing" is a better word) how the main search engines' web crawlers crawl and index your site:
robots meta tag
x-robots-tag HTTP header
robots.txt file
Maybe you use separate development and live servers, with a robots.txt file or the x-robots-tag header telling search engines not to crawl or index your development server. Or maybe you develop new content on your live server and use a robots.txt file or a robots meta tag in the page to tell search engines not to crawl or index it. In either case, when you go live you need to remember to change whatever you've used to let the web crawlers back in.
It's easy to forget to do this and there's no way to tell just by looking at your site that there's a problem.
SiteSentry solves this by checking each of these options for every site.
It downloads and scans robots.txt for a setting which blocks access to all web crawlers and, if it finds it, sends you an error notification. It also looks for specific web crawlers being blocked and will warn you about these.
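To make the robots.txt check more concrete, here is a minimal sketch in Python of what such a scan might look like. It is not SiteSentry's actual code: the function name `scan_robots_txt` is hypothetical, and the sketch assumes that "blocked" means a group containing `Disallow: /`, without considering any `Allow` rules that might override it.

```python
def scan_robots_txt(text):
    """Return (blocks_all, blocked_agents) for the body of a robots.txt file.

    Assumption: a crawler counts as blocked when its group has "Disallow: /".
    Allow overrides are deliberately ignored in this sketch.
    """
    blocks_all = False
    blocked_agents = []
    current_agents = []   # user-agents named by the group being read
    in_rules = False      # True once the current group has seen a rule line

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A user-agent line after rules starts a new group.
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                for agent in current_agents:
                    if agent == "*":
                        blocks_all = True
                    elif agent not in blocked_agents:
                        blocked_agents.append(agent)
    return blocks_all, blocked_agents
```

In this sketch a `True` in the first position would correspond to the error case (all crawlers blocked), and any names in the list to the warning case (specific crawlers blocked).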
It checks for an x-robots-tag in the header returned by your site for a value of none or noindex and will send you a notification if it finds it. It also checks for nofollow, nosnippet and noarchive values and will warn you if these are present.
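As an illustration of the header check, the sketch below sorts the directives in an x-robots-tag value into the two groups described above. The function and constant names are hypothetical, and the handling of a crawler-specific prefix (such as `googlebot: noindex`) is an assumption about how you might want to treat it, not a description of what SiteSentry does.

```python
ERROR_VALUES = {"none", "noindex"}
WARNING_VALUES = {"nofollow", "nosnippet", "noarchive"}


def classify_robots_directives(value):
    """Split a comma-separated directive string into (errors, warnings)."""
    errors, warnings = [], []
    for token in value.split(","):
        directive = token.strip().lower()
        # Assumption: a "crawler: directive" prefix is treated the same as
        # an unscoped directive, so only the part after the colon is kept.
        if ":" in directive:
            directive = directive.split(":", 1)[1].strip()
        if directive in ERROR_VALUES:
            errors.append(directive)
        elif directive in WARNING_VALUES:
            warnings.append(directive)
    return errors, warnings
```

You would pass it the raw header value from whatever HTTP client you use, for example `classify_robots_directives("noindex, nofollow")`.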
It checks the site's home page for a robots meta tag containing none or noindex and will send you a notification if it finds it and, as with the x-robots-tag check, it will also warn you if nofollow, nosnippet or noarchive values are present.
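The meta tag check can be sketched in a similar way using only Python's standard library. Again, this is an assumption about how such a check could work rather than SiteSentry's implementation; `RobotsMetaParser` and `check_robots_meta` are hypothetical names, and the sketch only looks at tags with `name="robots"`, not crawler-specific ones.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def check_robots_meta(html):
    """Return (errors, warnings) found in the page's robots meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    errors = [d for d in parser.directives if d in ("none", "noindex")]
    warnings = [
        d for d in parser.directives
        if d in ("nofollow", "nosnippet", "noarchive")
    ]
    return errors, warnings
```

Feeding it the HTML of a home page gives the same two groups as the header check: values that would stop the page being indexed, and values that only restrict how it is shown or followed.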
Check out our docs for more information about SiteSentry's robots meta tag, x-robots-tag HTTP header and robots.txt checks.