The Fragility of SEO
The following case study provides a lesson about the fragility of organic-search visibility, the importance of technical SEO, and the importance of matching relevant content and user experience to search intent. Scenarios like this are why you need to build SEO expertise within your organization or at least have access to experienced SEO practitioners.
Canonicalization. It's a word you don't hear in everyday life, unless you work with someone who does SEO. Google has a succinct help page about why one should canonicalize URLs and ways to do so.
The short explanation is that canonicalization consolidates information about URLs to establish one version of your content with which you want search engines and people to interact.
This case study shows how seemingly small errors can cause big problems, and how quickly the issue can impact traffic (and how quickly you can recover). Identifying details have been removed because of the non-disclosure agreement with the client.
The client released code on September 1 that was meant to adjust its canonicalization rules so that URLs carrying Google Analytics or other marketing tracking parameters would be canonical to the base URL. Because a website may run many marketing campaigns, the goal of such a change is to avoid diluting the search engine's index with many URLs that are intended for other channels and that generally have the same content as the page meant to be found in organic search.
The client had already been working for a couple of months on making its website more efficient for search engines to crawl so that the most valuable content would be discovered and revisited more frequently. This should have been a minor change that would prevent hard-to-notice problems from surfacing in the future.
Unfortunately, a bug caused all of the site's URLs with query parameters (not just tracking parameters) to be canonical to the version of the URL without query parameters. In other words, the site has pages like domain.com/widget and domain.com/widget?type=awesome, where the "awesome" version is a subset of all the widgets (since not all widgets are awesome). Because of the bug, the awesome widgets page became canonical to the all-widgets page.
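To make the difference between the intended rule and the bug concrete, here is a minimal Python sketch. It is not the client's code: the function names and the list of tracking parameters (`utm_*`, `gclid`, `fbclid`) are assumptions chosen for illustration, and the domain is the placeholder used above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed, illustrative list of tracking parameters; the client's real list is not known.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"gclid", "fbclid"}


def is_tracking(name):
    """Return True if a query parameter name looks like a marketing tracking parameter."""
    name = name.lower()
    return name in TRACKING_NAMES or name.startswith(TRACKING_PREFIXES)


def intended_canonical(url):
    """Intended rule: drop only tracking parameters, keep content-defining ones."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not is_tracking(key)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def buggy_canonical(url):
    """The bug as described: every query parameter is dropped."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


url = "https://domain.com/widget?type=awesome&utm_source=newsletter"
print(intended_canonical(url))  # https://domain.com/widget?type=awesome
print(buggy_canonical(url))     # https://domain.com/widget
```

The key design point is an explicit list of parameters to strip, so that any parameter not on the list, such as `type`, is treated as part of the page's identity by default.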
Worse, the awesome widgets were among the top traffic-driving pages on the website. In fact, the set of affected URLs included 6 of the top 10 traffic-generating pages on the website. This type of URL generally contributes 18% to 20% of the site's Google organic traffic on a daily basis. In an unfortunate irony, Googlebot had been increasing the volume of pages it crawled each day, and on 9/1 it crawled nearly 1 million pages, including many of the URLs with query parameters (such as ?type=awesome).
The result was nearly immediate. On Friday, 9/2, the parameterized URLs that were crawled on 9/1 lost all visibility in Google and were replaced by their unintended canonical versions. Visits to this parameterized page type were down 67% from the previous two-week daily average. By Tuesday, 9/6, they were down 83%. While your focus may be on the traffic decline here, the really important takeaway is the speed with which Google processes changes in important technical signals such as the canonical link.
Because it was the Friday of Labor Day weekend, no one noticed the problem right away: people were off for the holiday, and traffic was expected to be lower over the holiday weekend anyway.
On Monday (Labor Day) I checked my various dashboards to find the following changes in traffic by page type.
Fortunately, I had already set up a segmentation by URL patterns to be able to highlight changes (good or bad) by page type. When working with large websites, it's common to work on improving particular templates in particular sprints rather than working on the whole site at once. So the nosedive of the parameterized page type really stood out.
Custom alerts were in place in Google Analytics, but the threshold of change required to trigger them was set just high enough that they did not fire. (Google Analytics alerts are pretty easy to set up. While recommended, they are not a replacement for a human being paying attention to the data.)
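The segmentation and alerting ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the dashboards or Google Analytics alerts actually used: the URL patterns, the page type names, the visit counts, and the -30% threshold are all hypothetical.

```python
import re
from statistics import mean

# Hypothetical URL patterns for page types; the first matching pattern wins.
PAGE_TYPES = [
    ("parameterized", re.compile(r"^/widget\?")),
    ("widget", re.compile(r"^/widget$")),
]


def page_type(path):
    """Assign a path (including its query string) to a page type segment."""
    for name, pattern in PAGE_TYPES:
        if pattern.search(path):
            return name
    return "other"


def percent_change(today, baseline_days):
    """Percent change of today's visits versus the mean of the baseline days."""
    baseline = mean(baseline_days)
    return (today - baseline) * 100 / baseline


def should_alert(today, baseline_days, threshold_pct=-30.0):
    """Flag a segment whose traffic fell at least as far as the (assumed) threshold."""
    return percent_change(today, baseline_days) <= threshold_pct


# Made-up numbers: a segment averaging 1,000 daily visits over two weeks drops to 330.
two_weeks = [1000] * 14
print(page_type("/widget?type=awesome"))  # parameterized
print(should_alert(330, two_weeks))       # True
```

Evaluating the threshold per page type rather than on total traffic is the point: a steep drop in one segment can be hidden when it is averaged into the site-wide number.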
The client's product and engineering teams were responsive and got a fix released to production late Tuesday, 9/6. Just for good measure, I used Google Search Console's Fetch as Google to ensure the top URLs affected by this change were quickly crawled for re-indexing. As another testament to Google's speed, the URLs started returning to normal the next day, Wednesday, 9/7.
Let's look in detail at the effect on a particular page and its unintended canonical. This one happened to be the top traffic-driving page on the site almost every day for the past few months.
The parameterized URL stopped ranking for its relevant phrases. It was replaced by the unintended canonical, a less-specific page, though one that had 100x the internal links of the parameterized URL. One might have supposed that the strong signal of internal links would overcome the less-specific content, but since it didn't in this case, the takeaway is that relevance was the higher priority.
Because of the lower relevance, the unintended canonical did not rank as highly as the original page for some of the most searched keyword phrases.
Ranking lower is never desirable because of the general loss in click-through rate that comes with a lower position. Just as damaging, if not more so, was the loss of visibility for the long tail of keywords for which the site ranked.
The lower relevance also caused a worse user experience for people who did click through, as we infer from the change in bounce rate during the period. More people who landed on the less-relevant unintended canonical page reacted negatively because they had a harder time finding exactly what they were looking for.
Had this bug continued for too long, it could have seriously damaged Google's view of the quality of the site: a lower-than-expected click-through rate combined with a higher bounce rate indicates a poor result that deserves less visibility. This is an over-simplification, but when Google considers a result (or site) low quality, it often leads to lower rankings. Without significant positive signals, it can be hard to regain the former state, and it's hard to generate enough signal to change course when you're getting less traffic.
This episode of canonical chaos ultimately turned out well since the problem was quickly fixed and the traffic quickly returned. Here are a few takeaways:
- Google reacts quickly when it finds changes in signals it thinks are important (in this case, the canonical link) so when you're releasing something designed to impact organic-search performance, be vigilant in the hours and days following the release
- If possible, put automated QA programs in place so you can catch violations of your SEO rules before you release a bug
- If you don't have someone watching your traffic like a hawk, ensure you have some kind of alerting mechanism in place. It's best to go beyond just the total traffic or channel level to put alerting on key channel + page type combinations.
- If possible, create segments in your web analytics or SEO analytics tools to more easily detect anomalies in subsections of your website (URL patterns or page types are handy for segmentation)
- Improving the volume and frequency of what Google is crawling on your site can create and accelerate opportunities for change (sometimes bad, but mostly good)
- SEO is not a one-time project; it requires ongoing attention
- Relevance and a user experience that helps people find what they're looking for are critical to organic-search success
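As an illustration of the automated QA takeaway above, here is a minimal Python sketch of a pre-release check that compares each page's canonical link against the value you expect. The helper names, the sample HTML, and the expected mapping are all hypothetical; a real check would render or fetch pages from a staging environment, which this sketch leaves out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def extract_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def check_canonicals(pages, expected):
    """Compare declared canonicals to expected ones.

    pages maps URL -> HTML; expected maps URL -> expected canonical URL.
    Returns a list of (url, expected, found) tuples for every violation.
    """
    failures = []
    for url, want in expected.items():
        found = extract_canonical(pages[url])
        if found != want:
            failures.append((url, want, found))
    return failures


# Hypothetical staging output that reproduces the bug described in this case study.
pages = {
    "https://domain.com/widget?type=awesome":
        '<html><head><link rel="canonical" href="https://domain.com/widget"></head></html>',
}
expected = {
    "https://domain.com/widget?type=awesome": "https://domain.com/widget?type=awesome",
}
for url, want, found in check_canonicals(pages, expected):
    print(f"Canonical mismatch on {url}: expected {want}, found {found}")
```

Run against a small set of representative URLs for each page type before every release, a check like this would have flagged the parameterized pages before Googlebot ever saw them.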