When Canonicalization is an Issue
Although it is extremely hard to pronounce, canonicalization is a hot topic in SEO right now. Pushed heavily by Google, canonicalization is the process of consolidating all duplicate URLs into one original, canonical version. If a lot of URLs lead to pretty much the same page, you make the search engines work extra hard and spend far more time crawling all the different variations. Because the time a search engine spends crawling your website is limited, this often means it will miss some of your important pages.
Here are some situations in which canonicalization becomes an issue:
1. You have not redirected the www and non-www versions of the website and they both resolve
These versions serve the same content, but ideally you need a 301 (permanent) redirect from one to the other. Without this simple server redirect in place, the search engines effectively see two websites with identical content and may index both, which splits your links and rankings between them and spells bad news for your results.
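As a rough illustration of the redirect logic, here is a minimal Python sketch. It assumes the www version is the preferred one; the host name `www.example.com` and the function name `canonical_host_redirect` are hypothetical, and in practice this rule usually lives in the web server configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www host is the preferred (canonical) version of the site.
CANONICAL_HOST = "www.example.com"


def canonical_host_redirect(url, canonical_host=CANONICAL_HOST):
    """Return (301, target_url) if the URL uses the non-preferred host, else None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()

    # Work out the bare (non-www) form of the canonical host.
    bare = canonical_host[4:] if canonical_host.startswith("www.") else canonical_host

    if host == canonical_host:
        return None  # already on the preferred host, nothing to do

    if host in (bare, "www." + bare):
        # Same site, other variant: keep scheme, path and query, swap the host.
        target = urlunsplit(
            (parts.scheme, canonical_host, parts.path, parts.query, parts.fragment)
        )
        return 301, target

    return None  # a different site altogether; leave it alone
```

A request for `http://example.com/shoes?color=red` would be answered with a 301 pointing to the same path on `www.example.com`, while requests already on the www host pass through untouched.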
2. You’ve changed your URL structure, and your content still exists at both the new and old URLs
Obviously, you don’t want to lose the traffic that is still linking to or visiting the old URLs. In this situation, a 301 redirect from each old URL to its new equivalent is usually the best way to carry that traffic over, because it tells both web browsers and search engines that the move is permanent. (A 302 is a temporary redirect. Spiders can follow it too, but because it signals that the move is not permanent, the search engines may keep the old URL in their index.)
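One simple way to picture this is a lookup table from old paths to new ones. The sketch below is only an illustration: the paths in `REDIRECTS` and the function name `resolve` are made up, and a real site would typically express the same mapping as server redirect rules.

```python
# Hypothetical mapping from retired paths to their replacements.
REDIRECTS = {
    "/products/shoes.php": "/shoes/",
    "/products/boots.php": "/boots/",
}


def resolve(path):
    """Return (status, location) for a requested path.

    Old paths answer with a 301 to their new home; everything else
    is served normally with a 200.
    """
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, target
    return 200, path
```

Calling `resolve("/products/shoes.php")` yields a 301 to `/shoes/`, so the old content is no longer reachable at two addresses.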
3. Your URL structure generates infinite URLs
If you have a dynamically generated URL structure that could generate an infinite number of URLs, you’ll be in trouble! This generally happens on large e-commerce websites with tons of product listings that can be sorted by price, size, closest to you, color, etc. If the website generates a different URL for each of these result sets, the same content ends up at many addresses. Most often, it is set up this way so that your marketing department can add a tracking code to the URL to keep track of its campaigns.
Here’s an example. Let’s say you have a new shoe campaign, and your marketing department is sending out direct mail pieces, running an email marketing campaign, and working a blogger relationship database, on top of ordinary search engine traffic. If the email URL is “www.example.com/email”, the direct mail URL is “www.example.com/dmail”, and so on, then you end up with a bunch of URLs for the same content.
If a spider suspects that a page can load under infinite URL variations, it can fall into a “spider trap” and stop indexing your website. Since the spiders have limited resources to spend crawling your website, important content may be left uncrawled. When this happens, it’s a great idea to use a canonical link tag (rel="canonical") in the head of each page or, where it is still available, the parameter handling tool in Google Webmaster Tools.
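To make the canonical tag idea concrete, here is a small Python sketch that collapses URL variations onto one canonical address and builds the matching tag. It assumes the variations come from query parameters rather than separate paths; the parameter names in `IGNORED_PARAMS` and both function names are hypothetical, so adjust them to whatever your own tracking and sorting parameters are called.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only track campaigns or reorder results,
# so they never change which content the page shows.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sort", "order"}


def canonical_url(url):
    """Drop tracking/sorting parameters and normalize what remains."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    kept.sort()  # stable parameter order, so ?a=1&b=2 and ?b=2&a=1 match
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def canonical_link_tag(url):
    """Build the tag to place in the <head> of every variation of the page."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)
```

With this in place, `http://www.example.com/shoes?sort=price&utm_source=email&color=red` and `http://www.example.com/shoes?color=red` both point the spiders at the same canonical URL, so only one version needs to be indexed.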
4. Your pages are blocked by robots.txt exclusion rules
As you probably know, the robots.txt file lets you tell search engine spiders which parts of your website you don’t want them to crawl. While it’s good practice to use these exclusion rules on occasion, it’s very easy to accidentally block the spiders from pages that are relevant and helpful. If your website isn’t being properly indexed, this is the first place I’d look.
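A quick way to check for accidental blocks is to run your important URLs through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser` module; the rules in `ROBOTS_TXT`, the example URLs, and the helper name `blocked_urls` are all made up for illustration, and different spiders may interpret edge cases slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the second rule blocks product pages
# that were probably meant to stay crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /shoes/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs from `urls` that the given spider may not crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Passing in a list of the pages you most want indexed, for example `blocked_urls(ROBOTS_TXT, ["http://www.example.com/shoes/red-sneakers", "http://www.example.com/about"])`, returns only the ones your rules are currently hiding from the spiders.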