Site Search Meta Tags

We have an Engine schema design guide to help you think through meta tags.

The Site Search Crawler supports a flexible set of meta tags to control how you ingest your website content.

When the crawler visits your webpage, by default, it extracts a standard set of fields (e.g. title, body).

It then indexes that content so it can be searched.

With these meta tags, you can alter the set of fields the crawler extracts to create ideal documents.

Meta Tags
Body-Embedded Data Attribute Tags
Thumbnail Image Tags
Robots Meta Tags

Note...

Your pages must be re-crawled before Site Search picks up any code-level changes!

See Crawler Troubleshooting if your documents seem out-of-sync with your live content.

Meta Tags

The template for a Site Search-friendly meta tag is:

<head>
  <meta class="swiftype" name="[field name]" data-type="[field type]" content="[field content]" />
</head>

Each field must define specific name, type, and content values.

The field type - which is specified in the data-type attribute - must be a Site Search supported field type.

Once a new meta tag has been indexed, custom schema fields are created.

Once created, the data-type cannot be changed.

Choose your field's data-type carefully. The field cannot be deleted!

The next example shows the creation of multiple fields.

Note that the tags field is repeated; as a result, the crawler extracts an array of tags for this URL.

All field types can be extracted as arrays.

<head>
  <title>page title | website name</title>
  <meta class="swiftype" name="title" data-type="string" content="page title" />
  <meta class="swiftype" name="body" data-type="text" content="this is the body content" />
  <meta class="swiftype" name="price" data-type="float" content="3.99" />
  <meta class="swiftype" name="quantity" data-type="integer" content="12" />
  <meta class="swiftype" name="published_at" data-type="date" content="2013-10-31" />
  <meta class="swiftype" name="store_location" data-type="location" content="20,-10" />
  <meta class="swiftype" name="tags" data-type="string" content="tag1" />
  <meta class="swiftype" name="tags" data-type="string" content="tag2" />
</head>
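
To make the array behavior concrete, here is a minimal Python sketch of how tags like these could be read from a page. It is an illustration only, not the crawler's actual implementation: SwiftypeMetaParser and extract_fields are hypothetical names, values are kept as plain strings rather than converted by data-type, and the sketch assumes a field that appears once stays a single value while a repeated field becomes a list.

```python
from html.parser import HTMLParser


class SwiftypeMetaParser(HTMLParser):
    """Hypothetical helper: collects <meta class="swiftype"> tags from a page."""

    def __init__(self):
        super().__init__()
        self.values = {}  # field name -> list of content strings, in page order

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Only meta tags carrying class="swiftype" are considered.
        if tag != "meta" or attr_map.get("class") != "swiftype":
            return
        name = attr_map.get("name")
        if name is None:
            return
        self.values.setdefault(name, []).append(attr_map.get("content") or "")


def extract_fields(html):
    """Return {field name: value}, using a list when a field is repeated."""
    parser = SwiftypeMetaParser()
    parser.feed(html)
    parser.close()
    return {
        name: vals[0] if len(vals) == 1 else vals
        for name, vals in parser.values.items()
    }


PAGE = """
<head>
  <title>page title | website name</title>
  <meta class="swiftype" name="title" data-type="string" content="page title" />
  <meta class="swiftype" name="price" data-type="float" content="3.99" />
  <meta class="swiftype" name="tags" data-type="string" content="tag1" />
  <meta class="swiftype" name="tags" data-type="string" content="tag2" />
  <meta name="description" content="A descriptive descriptor.">
</head>
"""

fields = extract_fields(PAGE)
print(fields)
# {'title': 'page title', 'price': '3.99', 'tags': ['tag1', 'tag2']}
```

The plain description tag is skipped in this sketch because it lacks class="swiftype".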

Note that the crawler does not capture default SEO meta tags such as these:

<head>
  <meta name="description" content="A descriptive descriptor.">
  <meta name="keywords" content="helpful, documentation">
</head>

To be indexed by the crawler, they would need to become Site Search friendly:

<head>
  <meta class="swiftype" name="description" data-type="string" content="A descriptive descriptor.">
  <meta class="swiftype" name="keywords" data-type="string" content="helpful, documentation">
</head>

And remember: once a field has been created, it cannot be deleted.

Body-Embedded Data Attribute Tags

Add data attributes to existing elements so you do not have to repeat large amounts of text in the <head> of your page:

<body>
  <h1 data-swiftype-name="title" data-swiftype-type="string">title here</h1>
  <div data-swiftype-name="body" data-swiftype-type="text">
    Lots of body content goes here...
    Other content goes here too, and can be of any type, like a price:
    $<span data-swiftype-name="price" data-swiftype-type="float">3.99</span>
  </div>
</body>
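
The following Python sketch shows one way such data attributes could be read back out of a page. It is not the crawler's implementation: DataAttributeExtractor and extract_body_fields are hypothetical names, and the sketch assumes that text inside a nested field (the price) also counts toward the enclosing field (the body), which the text above does not specify.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DataAttributeExtractor(HTMLParser):
    """Hypothetical helper: gathers text of elements with data-swiftype-name."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of (tag, field dict or None)
        self.fields = []         # every field found, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr_map = dict(attrs)
        field = None
        name = attr_map.get("data-swiftype-name")
        if name:
            field = {"name": name,
                     "type": attr_map.get("data-swiftype-type"),
                     "chunks": []}
            self.fields.append(field)
        self.open_elements.append((tag, field))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[i][0] == tag:
                del self.open_elements[i:]
                break

    def handle_data(self, data):
        # Text belongs to every field element that is currently open.
        for _, field in self.open_elements:
            if field is not None:
                field["chunks"].append(data)


def extract_body_fields(html):
    """Return {field name: (field type, whitespace-collapsed text)}."""
    parser = DataAttributeExtractor()
    parser.feed(html)
    parser.close()
    return {f["name"]: (f["type"], " ".join("".join(f["chunks"]).split()))
            for f in parser.fields}


PAGE = """
<body>
  <h1 data-swiftype-name="title" data-swiftype-type="string">title here</h1>
  <div data-swiftype-name="body" data-swiftype-type="text">
    Lots of body content goes here...
    $<span data-swiftype-name="price" data-swiftype-type="float">3.99</span>
  </div>
</body>
"""

body_fields = extract_body_fields(PAGE)
print(body_fields["title"])  # ('string', 'title here')
print(body_fields["price"])  # ('float', '3.99')
```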

Thumbnail Image Tags

Index images from your website and serve them as thumbnails to users in your search results.

Add an image <meta> tag to the <head> that indicates where images are located on your various page types:

<meta class="swiftype" name="image" data-type="enum" content="http://fullurl.com/example.jpg" />

Robots Meta Tag Support

Control which content is crawled on your webpages using robots meta tags.

Using the Robots Meta Tag
Robots Meta Tag Content Values
Directing Instructions at Site Search Crawler Only
Repeating Content Values
Casing, Spacing and Ordering

Using the "robots" meta tag

Place the robots meta tag in the <head> section of your page:

Example - Place the robots meta tag in the head section

<!doctype html>
<html>
  <head>
    <meta name="robots" content="noindex, nofollow">
  </head>
  <body>
    Page content here
  </body>
</html>

Robots meta tag content values

Site Search supports the NOFOLLOW, NOINDEX, and NONE values for the robots tag.

FOLLOW and INDEX are the defaults and are not necessary unless you are overriding a robots meta tag for Site Search.

Other values - such as NOARCHIVE - are ignored.

Use NOINDEX to tell the crawler not to index a page:

<meta name="robots" content="noindex">

Links from an unindexed page will still be followed.

Use NOFOLLOW to tell the crawler not to follow links from a page.

<meta name="robots" content="nofollow">

Content from a page that has NOFOLLOW will still be indexed.

To not follow links and not index content from a page, use NOINDEX, NOFOLLOW or NONE.

<meta name="robots" content="noindex, nofollow">

NONE is a synonym for the above:

<meta name="robots" content="none">

We recommend specifying the robots directives in a single tag, but multiple tags will be combined if present.

Directing instructions at the Site Search Crawler only

A meta tag with name="robots" applies your instructions to all web crawlers, including Swiftbot, the Site Search Crawler.

Use st:robots as the name instead of robots to direct special instructions at the crawler.

Example - st:robots overrides robots for the crawler

<meta name="robots" content="noindex, nofollow">
<meta name="st:robots" content="follow, index">

This example tells other crawlers not to index or follow links from the page, but allows the Site Search Crawler to index the page and follow its links.

When any meta tag named st:robots is present on the page, all other robots meta rules are ignored in favor of the st:robots rule.
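
The precedence rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated here; select_robots_tags is a hypothetical helper that takes (name, content) pairs already read from the page, not a Site Search API.

```python
def select_robots_tags(meta_tags):
    """Return the content strings that apply to the Site Search Crawler.

    meta_tags is a list of (name, content) pairs taken from the page's
    robots-related meta tags.
    """
    site_search_only = [content for name, content in meta_tags
                        if name.strip().lower() == "st:robots"]
    if site_search_only:
        # Any st:robots tag wins; plain robots tags are ignored entirely.
        return site_search_only
    return [content for name, content in meta_tags
            if name.strip().lower() == "robots"]


page_tags = [
    ("robots", "noindex, nofollow"),
    ("st:robots", "follow, index"),
]
print(select_robots_tags(page_tags))      # ['follow, index']
print(select_robots_tags(page_tags[:1]))  # ['noindex, nofollow']
```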

Repeated content values

The crawler will use the most restrictive robots directives if they are repeated.

<meta name="robots" content="noindex">
<meta name="robots" content="index">

The above is equivalent to NOINDEX.
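
As a rough sketch of the "most restrictive wins" rule, the Python below combines the content values of several robots tags into a single decision. combine_robots is a hypothetical helper, not the crawler's code; it treats NONE as NOINDEX plus NOFOLLOW and ignores unsupported values such as NOARCHIVE, as described earlier.

```python
def combine_robots(contents):
    """Merge the content values of several robots tags into one decision.

    A restriction that appears anywhere wins over a permissive value,
    so "noindex" plus "index" still means the page is not indexed.
    """
    tokens = set()
    for content in contents:
        tokens.update(value.strip().lower() for value in content.split(","))
    return {
        "index": not (tokens & {"noindex", "none"}),
        "follow": not (tokens & {"nofollow", "none"}),
    }


print(combine_robots(["noindex", "index"]))  # {'index': False, 'follow': True}
print(combine_robots(["none"]))              # {'index': False, 'follow': False}
```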

Casing, spacing, and ordering

Tags, attribute names, and attribute values are all case insensitive.

Multiple content values must be separated by commas; whitespace around them is ignored.

Order is not important: NOINDEX, NOFOLLOW is the same as NOFOLLOW, NOINDEX.

The following are considered the same:

<meta name="robots" content="noindex, nofollow">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META name="rOBOTs" content="     noIndex    ,     NOfollow   ">
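
To see why these forms are equivalent, the Python sketch below normalizes each tag the way the rules above describe: lowercase everything, split on commas, trim whitespace, and disregard order. RobotsTagParser and normalize are hypothetical names used for illustration, and the fourth variant is added here to show that ordering does not matter.

```python
from html.parser import HTMLParser


class RobotsTagParser(HTMLParser):
    """Hypothetical helper: reads robots meta tags into a normalized form."""

    def __init__(self):
        super().__init__()
        self.tags = []  # list of (lowercased name, frozenset of values)

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag names and attribute names.
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").strip().lower()
        if tag == "meta" and name in ("robots", "st:robots"):
            content = attr_map.get("content") or ""
            # Split on commas, trim whitespace, lowercase, and drop ordering.
            values = frozenset(value.strip().lower()
                               for value in content.split(",")
                               if value.strip())
            self.tags.append((name, values))


def normalize(html):
    parser = RobotsTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags


variants = [
    '<meta name="robots" content="noindex, nofollow">',
    '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">',
    '<META name="rOBOTs" content="     noIndex    ,     NOfollow   ">',
    '<meta name="robots" content="nofollow,noindex">',
]
normalized = [normalize(variant) for variant in variants]
print(all(result == normalized[0] for result in normalized))  # True
```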

Stuck? Looking for help? Contact support or check out the Site Search community forum!
