Optimising Your Crawl Budget

In Organic Search by Dewan Chapman

Technical SEO For Your Crawl Budget

When optimising big websites (over 100,000 URLs), technical SEO becomes the most
important component of your search strategy. Ignoring the ‘under-the-hood’ workings of a
website, and how those workings affect search engines’ ability to discover, index and rank
pages, results in high ‘technical debt’ that takes a lot of time and effort to pay off.
Getting a forensic SEO audit done on your website will bring to light and prioritise any issues
that should be fixed. It’s a worthwhile investment.

One action item that will doubtless come up in an audit is that of crawl budget.

Understanding Crawl Budget

Simply put, crawl budget is the number of URLs a search engine spider crawls on your website in
a given time frame. For example, Googlebot typically crawls one of our client’s sites 117,240
times a month, so that client’s monthly crawl budget is 117,240 pages per month –
that’s a decent budget, but it needs to be managed correctly.
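
To put a monthly figure like that into day-to-day terms, a quick back-of-the-envelope calculation helps. Here is a minimal sketch in Python using the example numbers above (the 30-day month is an assumption for illustration only):

monthly_crawls = 117_240  # example figure from the paragraph above
days_in_month = 30        # assumption: a rough 30-day average

daily_budget = monthly_crawls / days_in_month
print(f"Approximate daily crawl budget: {daily_budget:,.0f} pages/day")
# Approximate daily crawl budget: 3,908 pages/day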

Each website’s crawl budget is different based on two factors:

1. Crawl Rate Limit

“Googlebot is designed to be a good citizen of the web. Crawling is its main
priority, while making sure it doesn’t degrade the experience of users visiting a
site. Google calls this the ‘crawl rate limit’ which limits the maximum fetching
rate for a given site.” – Google

The crawl rate can increase (good) or decrease (bad) based on a couple of factors:

• Site speed: if Googlebot can download (or crawl) a site quickly, the crawl rate limit
increases, which is good. If a site is slow or responds with server errors, the crawl rate
limit decreases, which is bad.
• Limit set in Search Console: website owners can reduce Googlebot’s crawling of their
site. Note that setting higher limits doesn’t automatically increase crawling.

2. Crawl Demand

“Even if the crawl rate limit isn’t reached, if there’s no demand from indexing,
there will be low activity from Googlebot.” – Google

The three factors that play a significant role in determining crawl demand are:

• Popularity: URLs that are more popular on the Internet tend to be crawled more often
to keep them fresher in Google’s index.
• Staleness: Google attempts to prevent URLs from becoming stale in the index.
• Site-wide events: site migrations, for example, may trigger an increase in crawl demand
to re-index the content under the new URLs.

3 Tips for Optimising Crawl Budget

I use Screaming Frog SEO Spider to simulate how Googlebot crawls a website and Google’s
Search Console to help identify issues affecting crawl budget.

1. Investigate Page Load Speed

The first thing to do is check the arch-nemesis of the crawl rate limit: page load speed!

Page Load Speed Graph From Google Search Console

The above screenshot is from Google Search Console (formerly Webmaster Tools), under
Crawl > Crawl Stats > Time spent downloading a page. The sad face points to an increase in the
time it took Googlebot to crawl the site, which is waaaaay over the site average and is eating
up crawl budget.

We need to find out which pages could be at fault here. We can check this within Google
Analytics under Behavior > Site Speed > Page Timings:

The page with an 83.50% slower page load time could be one of the culprits. So, we need to look
at ways to reduce the page load time for that page and pages like it; this will result in a more
efficient crawl.
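
If you want to sanity-check those Analytics figures against live responses, a short script can time a sample of pages directly. This is a minimal sketch using Python’s requests library; the URLs and the 1.5-second threshold are placeholders and assumptions, not official benchmarks:

import requests

# Hypothetical sample of URLs to time; swap in pages from your own site,
# ideally ones flagged as slow in the Page Timings report above.
urls = [
    "https://www.example.com/",
    "https://www.example.com/category/widgets",
    "https://www.example.com/blog/some-post",
]

SLOW_THRESHOLD = 1.5  # seconds; an arbitrary cut-off for flagging slow pages

for url in urls:
    response = requests.get(url, timeout=30)
    # .elapsed measures the time from sending the request until the
    # response headers arrive (roughly time-to-first-byte).
    seconds = response.elapsed.total_seconds()
    flag = "SLOW" if seconds > SLOW_THRESHOLD else "ok"
    print(f"{flag:4} {seconds:.2f}s {url}")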

2. Manage URL Parameters

The problem of URL parameters is common on ecommerce and listing sites, which
generate lots of dynamic URLs that load the same content (filters, search results etc.). By
default, Googlebot will treat these URLs as separate pages, which increases the risk of on-site
duplication and can waste crawl budget. Disallowing URL parameters and search results
from being crawled through the robots.txt file is recommended:
Disallow: /search/
Disallow: /*?s=

Some websites have URL parameters that do not influence the content of the pages. In this
case, make sure you let Googlebot know about it by adding these parameters in your Google
Search Console account, under Crawl > URL Parameters (this is an advanced feature within
Search Console; an SEO audit will tell you if this step is necessary for your site).
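
Before relying on those robots.txt rules, it’s worth confirming they block the URLs you intend (and nothing more). Here is a minimal sketch using Python’s standard-library robotparser, with a hypothetical domain and test URLs; note the caveat in the comments about Google-style wildcards:

from urllib import robotparser

# Hypothetical domain; point this at your own robots.txt.
rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# A search-results URL we expect the "Disallow: /search/" rule to block,
# and a normal category page we expect to remain crawlable.
for url in [
    "https://www.example.com/search/blue-widgets",
    "https://www.example.com/category/widgets",
]:
    allowed = rp.can_fetch("Googlebot", url)
    print(f"{'allowed' if allowed else 'blocked':7} {url}")

# Caveat: the standard-library parser follows the original robots.txt spec and
# does not understand the "*" wildcard, so a rule like "Disallow: /*?s=" is
# best verified with Search Console's robots.txt Tester instead.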

3. Fix HTTP and Redirect Errors

Every time Google fetches a URL, including CSS and JavaScript, it affects crawl budget. You
don’t want to waste your site’s budget on 404/503 pages or unnecessary redirect chains (a
redirect chain is the number of 301/302 redirects that occur before landing on the correct
page. Excessive redirects to a page is wasteful). Take a moment to test your site for any
broken links, server errors and redirect chains. This can be done through the Screaming Frog
SEO Spider.
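
Between full crawls, you can also spot-check status codes and redirect chains for a handful of URLs with a short script. This is a minimal sketch using Python’s requests library; the URLs below are hypothetical placeholders:

import requests

# Hypothetical URLs to spot-check; replace with pages from your own site.
urls = [
    "https://www.example.com/old-page",
    "https://www.example.com/products/discontinued-widget",
]

for url in urls:
    response = requests.get(url, allow_redirects=True, timeout=30)
    hops = response.history  # every redirect followed before the final response
    if hops:
        chain = " -> ".join(str(hop.status_code) for hop in hops)
        print(f"{url}: redirect chain {chain} -> {response.status_code} ({response.url})")
    elif response.status_code >= 400:
        print(f"{url}: error {response.status_code}")
    else:
        print(f"{url}: OK ({response.status_code})")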

Conclusion

If your website has fewer than 100,000 URLs, you really don’t need to worry about crawl budget
at all, and doing so would be a waste of your time and resources. However, if you have a big
site, or a site that auto-generates pages based on URL parameters (like listing sites or
ecommerce), then auditing and prioritising your site’s crawl budget is a must.