Canonical Element Crawling
Canonical URLs inform search engines how to handle duplicate content.
Including a canonical link element within the <head></head> tags of your webpage declares that URL the "source of truth":
<link rel="canonical" href="https://example.com/tea/peppermint" />
These tags tell search engine crawlers that: "The original source of this content is: https://example.com/tea/peppermint."
Thus, if the content appears anywhere else, search engines will grant authority to the correct page.
The Site Search Crawler will obey these tags, too.
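To see what a crawler has to work with, it can help to read the canonical link element out of a page yourself. The following is a minimal sketch using Python's standard html.parser module; the CanonicalFinder class and find_canonical function are hypothetical names chosen for illustration and are not part of Site Search.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> found inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attributes = dict(attrs)
            # rel may hold several space-separated values, e.g. rel="canonical nofollow"
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.canonical = attributes.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the page's <head>, or None if absent."""
    parser = CanonicalFinder()
    parser.feed(html)
    return parser.canonical


page = (
    "<html><head>"
    '<link rel="canonical" href="https://example.com/tea/peppermint" />'
    "</head><body>Peppermint tea</body></html>"
)
print(find_canonical(page))
```

A page with no canonical link element in its head yields None, which makes a missing tag easy to tell apart from an incorrect one.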
Be warned: an incorrect implementation will prevent your pages from being indexed.
Two common cases are:
Incorrect canonical URLs
Canonical link elements should include the precise URI: https://example.com/tea/peppermint.
For example, suppose your homepage URI is https://example.com and the homepage URL is mistakenly included as the canonical link element on every page.
The crawler will follow the link and assume there is only one page, https://example.com, and the other pages will not be indexed.
This is incorrect!
Each page should have its own unique URI.
The URL seen when you browse the page must match the one within the canonical link element.
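One way to catch this mistake before the crawler does is to compare each page's URL with the canonical URL it declares. The sketch below assumes you have already collected that mapping yourself; the mismatched_canonicals function and the chamomile page are hypothetical, used only for illustration.

```python
from typing import Dict, List


def mismatched_canonicals(pages: Dict[str, str]) -> List[str]:
    """Return the page URLs whose declared canonical URL differs from the page URL."""
    return [url for url, canonical in pages.items() if canonical != url]


# Keys are the URLs you browse to; values are the canonical URLs those pages declare.
pages = {
    "https://example.com": "https://example.com",
    "https://example.com/tea/peppermint": "https://example.com",  # wrong: points at the homepage
    "https://example.com/tea/chamomile": "https://example.com",   # wrong: points at the homepage
}

for url in mismatched_canonicals(pages):
    print("Canonical URL does not match page URL:", url)
```

The comparison is an exact string match, in keeping with the rule that the canonical link element should carry the precise URI of the page.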
Redirect loops
If you have set up redirects, be sure that your canonical URLs do not contradict your redirects.
For example, your pages have https://example.com/tea/peppermint/ as the canonical link element.
But the page is set to redirect to https://example.com/tea/peppermint.
Note the absence of the trailing slash: /peppermint/ vs. /peppermint.
The crawler will become stuck in a loop.
It tries to go to ../peppermint but is directed to ../peppermint/ by the canonical link element, then back again, and again, until it gives up.
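The loop described above can be reproduced with a small simulation. This is a simplified model under stated assumptions, not the Site Search Crawler's actual logic: the find_loop function, the two dictionaries, and the max_hops limit are all hypothetical. It follows a redirect when one exists, otherwise follows the canonical URL, and reports the path as soon as a URL repeats.

```python
from typing import Dict, List, Optional


def find_loop(
    start: str,
    redirects: Dict[str, str],
    canonicals: Dict[str, str],
    max_hops: int = 10,
) -> Optional[List[str]]:
    """Return the visited path if following redirects and canonicals revisits a URL.

    Returns None if the crawl settles on a page, or if max_hops is reached
    without seeing a repeat.
    """
    path = [start]
    current = start
    for _ in range(max_hops):
        if current in redirects:
            next_url = redirects[current]
        elif canonicals.get(current, current) != current:
            next_url = canonicals[current]
        else:
            return None  # no redirect, and the canonical URL matches: settled
        if next_url in path:
            return path + [next_url]
        path.append(next_url)
        current = next_url
    return None


# The trailing-slash URL redirects to the slashless one...
redirects = {"https://example.com/tea/peppermint/": "https://example.com/tea/peppermint"}
# ...but the slashless page declares the trailing-slash URL as canonical.
canonicals = {"https://example.com/tea/peppermint": "https://example.com/tea/peppermint/"}

print(find_loop("https://example.com/tea/peppermint", redirects, canonicals))
```

Changing the canonical URL to https://example.com/tea/peppermint, so that it agrees with the redirect target, makes the same call return None.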