Site Search Meta Tags
The Site Search Crawler supports a flexible set of meta tags to control how you ingest your website content.
When the crawler visits your webpage, by default, it extracts a standard set of fields (e.g. title, body).
It then indexes that content so it can be searched.
With these meta tags, you can alter the set of fields the crawler extracts to create ideal documents.
Your pages must be re-crawled before Site Search picks up any code-level changes!
See Crawler Troubleshooting if your documents seem out-of-sync with your live content.
The template for a Site Search-friendly meta tag is:
<head>
  <meta class="swiftype" name="[field name]" data-type="[field type]" content="[field content]" />
</head>
Each field must define specific name, type, and content values.
The field type, which is specified in the data-type attribute, must be a Site Search supported field type.
Once a new meta tag has been indexed, custom schema fields are created.
Once created, the data-type cannot be changed.
Choose your field's data-type carefully. The field cannot be deleted!
The next example shows the creation of multiple fields.
As you can see, the tags field is repeated, and as a result the crawler extracts an array of tags for this URL.
All field types can be extracted as arrays.
<head>
  <title>page title | website name</title>
  <meta class="swiftype" name="title" data-type="string" content="page title" />
  <meta class="swiftype" name="body" data-type="text" content="this is the body content" />
  <meta class="swiftype" name="price" data-type="float" content="3.99" />
  <meta class="swiftype" name="quantity" data-type="integer" content="12" />
  <meta class="swiftype" name="published_at" data-type="date" content="2013-10-31" />
  <meta class="swiftype" name="store_location" data-type="location" content="20,-10" />
  <meta class="swiftype" name="tags" data-type="string" content="tag1" />
  <meta class="swiftype" name="tags" data-type="string" content="tag2" />
</head>
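To preview which fields your markup declares before a re-crawl, a short local check can help. The following is a minimal sketch using Python's standard `html.parser`; it is not the crawler's implementation, and the names `SwiftypeMetaParser` and `extract_fields` are hypothetical. It assumes, as described above, that only `<meta>` tags with `class="swiftype"` count and that a repeated field name yields an array of values.

```python
from html.parser import HTMLParser


class SwiftypeMetaParser(HTMLParser):
    """Collects <meta class="swiftype"> tags into a dict of fields (illustrative only)."""

    def __init__(self):
        super().__init__()
        # field name -> {"type": declared data-type, "values": [content, ...]}
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "swiftype" not in classes:
            return  # plain SEO tags such as name="description" are skipped
        name = attr.get("name")
        if not name:
            return
        field = self.fields.setdefault(
            name, {"type": attr.get("data-type"), "values": []}
        )
        # Repeating a field name appends to its list of values.
        field["values"].append(attr.get("content") or "")


def extract_fields(html):
    """Return the Site Search-style fields declared in an HTML string."""
    parser = SwiftypeMetaParser()
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    page = """
    <head>
      <meta class="swiftype" name="price" data-type="float" content="3.99" />
      <meta class="swiftype" name="tags" data-type="string" content="tag1" />
      <meta class="swiftype" name="tags" data-type="string" content="tag2" />
      <meta name="description" content="A descriptive descriptor.">
    </head>
    """
    for name, field in extract_fields(page).items():
        print(name, field["type"], field["values"])
```

Running it against a saved copy of a page shows each declared field with its type and values, which makes a mistyped data-type easier to catch before the field is created.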
An important note is that the crawler will not capture default SEO meta tags, like these:
<head>
  <meta name="description" content="A descriptive descriptor.">
  <meta name="keywords" content="helpful, documentation">
</head>
To be indexed by the crawler, they would need to become Site Search friendly:
<head>
  <meta class="swiftype" name="description" data-type="string" content="A descriptive descriptor.">
  <meta class="swiftype" name="keywords" data-type="string" content="helpful, documentation">
</head>
And remember: once a field has been created, it cannot be deleted.
Body-embedded Data Attribute Tags
Add data attributes to existing elements so you do not repeat tons of text in the <head> of your page:
<body>
  <h1 data-swiftype-name="title" data-swiftype-type="string">title here</h1>
  <div data-swiftype-name="body" data-swiftype-type="text">
    Lots of body content goes here...
    Other content goes here too, and can be of any type, like a price:
    $<span data-swiftype-name="price" data-swiftype-type="float">3.99</span>
  </div>
</body>
Thumbnail Image Tags
Index images from your website and serve them as thumbnails to users in your search results.
Add an image <meta> tag to the <head> that indicates where images are located on your various page types:
<meta class="swiftype" name="image" data-type="enum" content="http://fullurl.com/example.jpg" />
Robots Meta Tag Support
Control which content is crawled on your webpages using robots meta tags.
- Using the Robots Meta Tag
- Robots Meta Tag Content Values
- Directing Instructions at Site Search Crawler Only
- Repeating Content Values
- Casing, Spacing and Ordering
Using the "robots" meta tag
Place the robots meta tag in the <head> section of your page:
<!doctype html>
<html>
  <head>
    <meta name="robots" content="noindex, nofollow">
  </head>
  <body>
    Page content here
  </body>
</html>
Robots meta tag content values
Site Search supports the NOFOLLOW, NOINDEX, and NONE values for the robots tag.
FOLLOW and INDEX are the defaults and are not necessary unless you are overriding a robots meta tag for Site Search.
Other values, such as NOARCHIVE, are ignored.
Use NOINDEX to tell the crawler not to index a page:
<meta name="robots" content="noindex">
Links from an unindexed page will still be followed.
Use NOFOLLOW to tell the crawler not to follow links from a page:
<meta name="robots" content="nofollow">
Content from a page that has NOFOLLOW will still be indexed.
To neither follow links nor index content from a page, use NOINDEX, NOFOLLOW or NONE:
<meta name="robots" content="noindex, nofollow">
NONE is a synonym for the above:
<meta name="robots" content="none">
We recommend specifying the robots directives in a single tag, but multiple tags will be combined if present.
Directing instructions at the Site Search Crawler only
Using meta name="robots" applies your instructions to all web crawlers, including Swiftbot, the Site Search Crawler.
Use st:robots as the name instead of robots to direct special instructions at the crawler.
Example of robots meta tags for the crawler:
<meta name="robots" content="noindex, nofollow">
<meta name="st:robots" content="follow, index">
This example tells other crawlers not to index the page or follow its links, but allows the Site Search Crawler to index the page and follow its links.
When any meta tag named st:robots is present on the page, all other robots meta rules are ignored in favor of the st:robots rules.
Repeated content values
The crawler will use the most restrictive robots directives if they are repeated.
<meta name="robots" content="noindex">
<meta name="robots" content="index">
The above is equivalent to NOINDEX.
Casing, spacing, and ordering
Tags, attribute names, and attribute values are all case-insensitive.
Multiple attribute values must be separated by a comma, but whitespace is ignored.
Order is not important: NOINDEX, NOFOLLOW is the same as NOFOLLOW, NOINDEX.
The following are considered the same:
<meta name="robots" content="noindex, nofollow">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META name="rOBOTs" content=" noIndex , NOfollow ">
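The robots rules in this section can be summarized in a short sketch. This is only an illustration of the documented behavior, not the crawler's actual code: the function names `parse_directives` and `resolve_robots` are hypothetical, and the input is assumed to be a list of (name, content) pairs already pulled from a page's `<meta>` tags. It encodes the rules stated above: values are case-insensitive and whitespace is ignored, multiple tags are combined, the most restrictive directive wins, NONE means NOINDEX plus NOFOLLOW, unsupported values are ignored, and any st:robots tag replaces the generic robots rules.

```python
def parse_directives(content):
    """Split a content value into a set of lowercase directives."""
    return {part.strip().lower() for part in content.split(",") if part.strip()}


def resolve_robots(meta_tags):
    """Return (index, follow) booleans for the Site Search crawler.

    meta_tags: list of (name, content) pairs taken from <meta> tags.
    """
    general, specific = set(), set()
    has_specific = False
    for name, content in meta_tags:
        key = name.strip().lower()
        if key == "st:robots":
            has_specific = True
            specific |= parse_directives(content)
        elif key == "robots":
            general |= parse_directives(content)

    # Any st:robots tag replaces the generic robots rules entirely.
    directives = specific if has_specific else general

    # Most restrictive wins; NONE is shorthand for NOINDEX, NOFOLLOW.
    # Unsupported values such as NOARCHIVE simply never match.
    index = not ({"noindex", "none"} & directives)
    follow = not ({"nofollow", "none"} & directives)
    return index, follow


if __name__ == "__main__":
    # Repeated values: the restrictive NOINDEX wins over INDEX.
    print(resolve_robots([("robots", "noindex"), ("robots", "index")]))
    # st:robots overrides the generic tag for the Site Search crawler.
    print(resolve_robots([("robots", "noindex, nofollow"),
                          ("st:robots", "follow, index")]))
```

Each result is an (index, follow) pair, so a page that should be searchable but should not pass the crawler on to its links would resolve to (True, False).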