If you’re doing the SEO for your website on your own, you should know that your website’s crawl rate and index status are two important factors that affect its ranking. Google and other search engines first need to crawl every page of your website and then index those pages in their search results. Crawling happens automatically, but you can speed it up by manually requesting that Google crawl your site.
Here’s the free tool you need: Google Search Console (previously known as Google Webmaster Tools). Sign in with your Google/Gmail account or sign up for a new one, then follow the methods below to improve your website’s crawl rate and indexability with ease.
First things first: make sure you have FTP or backend access to your website, so that you can add it to Google Search Console and verify that it is yours. We can then move on to the methods for improving your lovely website’s crawl rate and indexability.
Submit Your XML Sitemap
An XML sitemap is a list of the pages on your website that tells search engines which pages exist and how they link to each other. It is important to have one, as it lets search engines know how many pages you have and allows them to index those pages.
To create an XML sitemap, you can use a site like Web-site-map.com or a plugin made for your CMS. Make sure to upload the XML file to your website’s root directory, where it is accessible to everyone.
Make sure your XML sitemap works well: try to avoid orphan pages and keep your link structure flat (fewer than 4 levels deep).
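If you already have a list of which page links to which, a rough check for orphan pages and overly deep pages could look like the Python sketch below. It is only an illustration: the site_links data, the page paths and the helper names are made up, "orphan" is taken here to mean a page that cannot be reached by following links from the home page, and "fewer than 4 levels deep" is read as at most 3 clicks from the home page.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search from the home page: returns {page: clicks from home}."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def audit(links, home, max_depth=3):
    """Return (orphans, too_deep) for a {page: [linked pages]} mapping."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    orphans = sorted(all_pages - set(depths))  # never reached from the home page
    too_deep = sorted(p for p, d in depths.items() if d > max_depth)
    return orphans, too_deep


# Hypothetical site structure, for illustration only.
site_links = {
    "/": ["/about", "/blog"],
    "/about": [],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-1/a"],
    "/blog/post-1/a": ["/blog/post-1/a/b"],
    "/old-landing-page": ["/about"],
}

orphans, too_deep = audit(site_links, "/")
print("Orphan pages:", orphans)
print("Pages deeper than 3 clicks:", too_deep)
```

In this made-up example, /old-landing-page is reported as an orphan because nothing links to it, and /blog/post-1/a/b is reported as too deep because it sits 4 clicks away from the home page.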
If your website is large and has more than 50,000 URLs in the sitemap, Google suggests splitting the sitemap. You can then create a sitemap index that links to all of your sitemaps. Sounds complicated, doesn’t it? You can read more about creating a simpler version of your Sitemap here.
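To give a feel for what splitting a sitemap involves, here is a minimal Python sketch that cuts a long URL list into files of at most 50,000 URLs and builds a sitemap index pointing to them. The example.com addresses, the sitemap-N.xml file names and the function names are assumptions for illustration; a CMS plugin will normally do this for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-sitemap limit mentioned above


def chunk(urls, size=MAX_URLS):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Return the XML text of one sitemap listing the given page URLs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_index(sitemap_urls):
    """Return the XML text of a sitemap index listing the given sitemap files."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(root, "sitemap"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical site with 120,000 pages.
pages = [f"https://www.example.com/page-{i}" for i in range(120000)]
parts = [build_sitemap(group) for group in chunk(pages)]
index = build_index(
    f"https://www.example.com/sitemap-{n}.xml" for n in range(1, len(parts) + 1)
)
print(len(parts), "sitemap files needed")
```

You would then save each part under its own file name, upload the files together with the index, and submit only the index file in Search Console.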
Once done, go to Crawl > Sitemaps > Add/Test Sitemap in your Search Console. It is better to test the sitemap first to check that it is processed properly, and only then add it. It will then take days or weeks for Google to crawl and fully index your pages.
The robots.txt file in your website’s root folder is very, very important, as a wrongly written robots.txt file can block crawlers’ access to your website completely. This tiny text file gives Google instructions on how and where its robots should crawl. Without meaning to, webmasters often leave sensitive areas such as member login and account detail pages crawlable by Google.
This is the format that you can use:
User-agent: *
Disallow: /directory/
Here, /directory/ is the directory that you do not want the robot to access. Most of the time, it should be your backend admin login URL.
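Before uploading a robots.txt file, you can check what it really blocks. The Python sketch below uses the standard library's urllib.robotparser for that. The rules shown, the /admin/ directory and the example.com URLs are assumptions for illustration, not the contents of any real site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is asked to stay out of /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether Googlebot would be allowed to fetch two example URLs.
admin_allowed = parser.can_fetch("Googlebot", "https://www.example.com/admin/login")
blog_allowed = parser.can_fetch("Googlebot", "https://www.example.com/blog/")

print("Admin login crawlable:", admin_allowed)
print("Blog crawlable:", blog_allowed)
```

If the second check ever comes back False for a page you want in search results, the file is blocking too much. Keep in mind that robots.txt is only a request to well-behaved crawlers; it does not password-protect anything.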
Fetch As Google
You can never know when Google will decide to send its bots to crawl your website, but you can do something about it. You can request that Google’s bots crawl your website as soon as you have an update, although this comes with a quota.
Go to Crawl > Fetch As Google in your Google Search Console and enter the URL you want Google’s bots to crawl. There are 2 options: FETCH and FETCH AND RENDER.
FETCH AND RENDER also lets you preview how Google renders your website.
You can use the ‘Crawl only this URL’ command 100 times a month and the ‘Crawl this URL and its direct links’ command 10 times a month. Whenever you have added a lot of new links, it is recommended that you use the second option, so that all pages the submitted URL links to directly will be crawled and indexed.
Also remember to repeat this step whenever you update your website, to speed up indexing.
Fixing Crawl Errors
Navigating to Crawl > Crawl Errors in Google Search Console brings up a list of crawl errors on your website. There are Server Error, Access Denied and Not Found errors.
If you are notified of a Server Error, check your website’s uptime with your hosting provider. The server may have been down or unreachable when Google tried to crawl your website.
An Access Denied error means the crawler was blocked while trying to access that particular page. If you do not want the crawler to crawl the page, remember to use robots.txt to disallow it.
Not Found errors are 404s: pages that couldn’t be found. They may be caused by deleted pages or by past changes to permalinks/URLs. You can fix them by redirecting each missing URL to another page on your website. Having too many Not Found pages can have a negative effect on your website’s SEO.
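How you set up the redirects depends on your server or CMS. As one possible sketch, assuming an Apache server with mod_alias enabled, the short Python script below turns a list of moved pages into permanent (301) Redirect lines that you could paste into your .htaccess file. The paths, the example.com URLs and the function name are made up for illustration.

```python
def redirect_rules(moved):
    """Build Apache 'Redirect 301' lines from an {old path: new URL} mapping."""
    rules = []
    for old_path, new_url in sorted(moved.items()):
        if not old_path.startswith("/"):
            raise ValueError(f"Old path must start with '/': {old_path}")
        rules.append(f"Redirect 301 {old_path} {new_url}")
    return rules


# Hypothetical pages that were deleted or renamed.
moved_pages = {
    "/old-services.html": "https://www.example.com/services/",
    "/2017/summer-promo": "https://www.example.com/promotions/",
}

rules = redirect_rules(moved_pages)
print("\n".join(rules))
```

Many CMS platforms offer redirect plugins that achieve the same result without editing server files, so treat this as one option rather than the only way.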
So, learned a thing or two? Share this with your friends who are also managing websites on their own!