
Sitemap.xml Support

The Site Search Crawler supports the Sitemap XML format. Refer to the format specification for the required and optional elements, character escaping, and other technical considerations and examples.

Using a sitemap can significantly speed up the crawl.

Instead of examining each page for new links to follow, the crawler uses your sitemap file(s) to download the listed URLs directly.

The Sitemap Format

The sitemap XML format specifies a list of URLs to index.

Example - Sitemap
<?xml version="1.0" encoding="UTF-8"?>

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.yourdomain.com/</loc>
  </url>
  <url>
    <loc>http://www.yourdomain.com/faq/</loc>
  </url>
  <url>
    <loc>http://www.yourdomain.com/about/</loc>
  </url>
</urlset>
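To illustrate what "downloading the URLs directly" means, here is a minimal Python sketch that pulls the <loc> values out of a urlset like the one above. It is an illustration only, not Site Search's crawler code; the function name urls_from_sitemap is hypothetical, and it relies only on the standard library's xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Namespace declared by the sitemap protocol (the xmlns on <urlset>).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return the <loc> value of every <url> entry in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    # "sm:url/sm:loc" selects each <loc> that is a child of a <url> child of the root.
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.yourdomain.com/</loc></url>
  <url><loc>http://www.yourdomain.com/faq/</loc></url>
  <url><loc>http://www.yourdomain.com/about/</loc></url>
</urlset>"""

print(urls_from_sitemap(SAMPLE))
```

The namespace mapping matters: because the sitemap declares a default namespace, a plain findall("url/loc") would match nothing.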

A sitemap index file can, in turn, list other sitemap files:

Example - Sitemap index
<?xml version="1.0" encoding="UTF-8"?>

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.yourdomain.com/sitemap1.xml</loc>
    <lastmod>2012-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.yourdomain.com/sitemap2.xml</loc>
    <lastmod>2012-01-01</lastmod>
  </sitemap>
</sitemapindex>
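A crawler that accepts both file types has to tell them apart by their root element and expand index files into the sitemaps they reference. The sketch below is a hedged illustration, not Site Search's implementation: classify_sitemap and collect_page_urls are hypothetical names, and fetch is an assumed callable you supply (for example, a wrapper around urllib.request.urlopen) that returns a sitemap's XML as a string.

```python
import xml.etree.ElementTree as ET
from typing import Callable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}


def classify_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags in "{namespace}localname" form.
    if root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        kind, locs = "index", root.findall("sm:sitemap/sm:loc", NS)
    elif root.tag == f"{{{SITEMAP_NS}}}urlset":
        kind, locs = "urlset", root.findall("sm:url/sm:loc", NS)
    else:
        raise ValueError(f"not a sitemap: root element is {root.tag!r}")
    return kind, [loc.text.strip() for loc in locs if loc.text]


def collect_page_urls(
    sitemap_url: str, fetch: Callable[[str], str], max_depth: int = 2
) -> list[str]:
    """Follow sitemap index entries (up to max_depth levels) and gather page URLs."""
    kind, locs = classify_sitemap(fetch(sitemap_url))
    if kind == "urlset":
        return locs
    if max_depth == 0:
        return []  # stop instead of following index files indefinitely
    urls: list[str] = []
    for child in locs:
        urls.extend(collect_page_urls(child, fetch, max_depth - 1))
    return urls
```

For a quick local check, pass a dictionary lookup such as docs.__getitem__ as fetch, where docs maps sitemap URLs to XML strings.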

For full details, review the Sitemaps protocol documentation.

Installing Your Sitemap

You can tell the Site Search Crawler where your sitemap files are by listing them in your robots.txt file with Sitemap directives.

Example - /robots.txt file with multiple Sitemap URLs
User-agent: *
Sitemap: http://www.yourdomain.com/sitemap1.xml
Sitemap: http://www.yourdomain.com/sitemap2.xml

If your robots.txt file lists no Sitemap files, the crawler looks for one at /sitemap.xml.
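The lookup order described here (Sitemap directives in robots.txt first, then /sitemap.xml) can be sketched with the standard library's urllib.robotparser, whose RobotFileParser.site_maps() method (Python 3.8+) returns the Sitemap URLs declared in a robots.txt file, or None if there are none. This mirrors the documented behavior but is not the crawler's code; discover_sitemaps is a hypothetical name, and the caller is assumed to have already downloaded robots.txt as text.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def discover_sitemaps(site_root: str, robots_txt: str) -> list[str]:
    """Return Sitemap URLs declared in robots.txt, else the /sitemap.xml fallback."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; Sitemap directives are collected
    # regardless of which User-agent group they appear in.
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps()  # None when robots.txt has no Sitemap lines
    if declared:
        return list(declared)
    # Fallback: a sitemap at the root of the domain.
    return [urljoin(site_root, "/sitemap.xml")]


robots = """User-agent: *
Sitemap: http://www.yourdomain.com/sitemap1.xml
Sitemap: http://www.yourdomain.com/sitemap2.xml
"""
print(discover_sitemaps("http://www.yourdomain.com/", robots))
```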

Unsupported Features

Site Search does not currently support:

  • Pinging the crawler to notify it that a sitemap exists.
  • Page priority (<priority>).
  • Last modification date (<lastmod>).
  • Change frequency (<changefreq>).
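Sitemap generators often emit these optional elements anyway. They do no harm, but according to the list above they have no effect on the crawl. The sketch below counts which of them a sitemap contains, so you know what is being ignored; count_ignored_elements is a hypothetical helper, not part of Site Search, and it assumes the standard sitemap namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Optional sitemap elements that, per the list above, Site Search does not use.
IGNORED_TAGS = ("lastmod", "changefreq", "priority")


def count_ignored_elements(xml_text: str) -> Counter:
    """Count occurrences of each unused optional element anywhere in a sitemap."""
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()
    for tag in IGNORED_TAGS:
        # ".//{ns}tag" searches all descendants, in <url> and <sitemap> entries alike.
        found = len(root.findall(f".//{{{SITEMAP_NS}}}{tag}"))
        if found:
            counts[tag] = found
    return counts
```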
