Site Search Meta Tags
The Site Search Crawler supports a flexible set of meta tags to control how you ingest your website content.
When the crawler visits your webpage, by default, it extracts a standard set of fields (e.g. title, body).
It then indexes that content so it can be searched.
With these meta tags, you can alter the set of fields the crawler extracts to create ideal documents.
Note: Your pages must be re-crawled before Site Search picks up any code-level changes!
See Crawler Troubleshooting if your documents seem out-of-sync with your live content.
Meta Tags
The template for a Site Search-friendly meta tag is:
<head>
<meta class="swiftype" name="[field name]" data-type="[field type]" content="[field content]" />
</head>
Each field must define specific name, type, and content values.
The field type, specified in the data-type attribute, must be a Site Search supported field type.
Once a new meta tag has been indexed, the corresponding custom schema fields are created.
Once a field is created, its data-type cannot be changed and the field cannot be deleted, so choose the data-type carefully.
The next example creates multiple fields.
Because the tags field is repeated, the crawler extracts an array of tags for this URL.
All field types can be extracted as arrays.
<head>
<title>page title | website name</title>
<meta class="swiftype" name="title" data-type="string" content="page title" />
<meta class="swiftype" name="body" data-type="text" content="this is the body content" />
<meta class="swiftype" name="price" data-type="float" content="3.99" />
<meta class="swiftype" name="quantity" data-type="integer" content="12" />
<meta class="swiftype" name="published_at" data-type="date" content="2013-10-31" />
<meta class="swiftype" name="store_location" data-type="location" content="20,-10" />
<meta class="swiftype" name="tags" data-type="string" content="tag1" />
<meta class="swiftype" name="tags" data-type="string" content="tag2" />
</head>
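To preview which fields a page would expose, the tags can be collected with a short script. The following is a minimal sketch using only Python's standard library, not the crawler's actual implementation. The names SwiftypeMetaParser and extract_fields are hypothetical, and values are kept as raw strings with no conversion based on data-type.

```python
from html.parser import HTMLParser


class SwiftypeMetaParser(HTMLParser):
    """Collects <meta class="swiftype"> tags into a dict of field values."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag != "meta" or "swiftype" not in classes:
            return  # plain SEO meta tags and other elements are skipped
        name, content = attr.get("name"), attr.get("content")
        if name is None or content is None:
            return
        # Repeated names accumulate, mirroring how repeated tags yield an array.
        self.fields.setdefault(name, []).append(content)


def extract_fields(html):
    """Return {field name: value or list of values} for one HTML page."""
    parser = SwiftypeMetaParser()
    parser.feed(html)
    # Single-valued fields are unwrapped; repeated ones stay as lists.
    return {k: v[0] if len(v) == 1 else v for k, v in parser.fields.items()}
```

Calling extract_fields on the markup above would give a list for tags and a single string for each of the other fields.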
Note that the crawler does not capture default SEO meta tags, like these:
<head>
<meta name="description" content="A descriptive descriptor.">
<meta name="keywords" content="helpful, documentation">
</head>
To be indexed by the crawler, they would need to become Site Search friendly:
<head>
<meta class="swiftype" name="description" data-type="string" content="A descriptive descriptor.">
<meta class="swiftype" name="keywords" data-type="string" content="helpful, documentation">
</head>
And remember: once a field has been created, it cannot be deleted.
Body-embedded Data Attribute Tags
Add data attributes to existing elements to avoid repeating large amounts of text in the <head> of your page:
<body>
<h1 data-swiftype-name="title" data-swiftype-type="string">title here</h1>
<div data-swiftype-name="body" data-swiftype-type="text">
Lots of body content goes here...
Other content goes here too, and can be of any type, like a price:
$<span data-swiftype-name="price" data-swiftype-type="float">3.99</span>
</div>
</body>
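As a rough way to check what text sits under each data attribute, the sketch below walks the markup with Python's standard html.parser. It is an illustration under stated assumptions, not the crawler's own logic: it assumes text inside a nested field (the price span) also counts toward the enclosing field (body), it collapses whitespace, and the names DataAttributeParser and extract_body_fields are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class DataAttributeParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []    # one entry per open element: a field name or None
        self.fields = {}   # field name -> list of text chunks

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        name = dict(attrs).get("data-swiftype-name")
        self.stack.append(name)
        if name is not None:
            self.fields.setdefault(name, [])

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing elements contain no text

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Assumption: text belongs to every enclosing element that names a field.
        for name in self.stack:
            if name is not None:
                self.fields[name].append(data)


def extract_body_fields(html):
    parser = DataAttributeParser()
    parser.feed(html)
    # Join chunks and collapse runs of whitespace into single spaces.
    return {k: " ".join("".join(v).split()) for k, v in parser.fields.items()}
```

The sketch expects well-formed markup with explicit closing tags; pages that rely on implied end tags would need a more forgiving parser.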
Thumbnail Image Tags
Index images from your website and serve them as thumbnails to users in your search results.
Add an image <meta> tag to the <head> that indicates where images are located on your various page types:
<meta class="swiftype" name="image" data-type="enum" content="http://fullurl.com/example.jpg" />
Robots Meta Tag Support
Control which content is crawled on your webpages using robots meta tags.
- Using the Robots Meta Tag
- Robots Meta Tag Content Values
- Directing Instructions at Site Search Crawler Only
- Repeating Content Values
- Casing, Spacing and Ordering
Using the "robots" meta tag
Place the robots meta tag in the <head> section of your page:
<!doctype html>
<html>
<head>
<meta name="robots" content="noindex, nofollow">
</head>
<body>
Page content here
</body>
</html>
Robots meta tag content values
Site Search supports the NOFOLLOW, NOINDEX, and NONE values for the robots tag.
FOLLOW and INDEX are the defaults and are not necessary unless you are overriding a robots meta tag for Site Search.
Other values, such as NOARCHIVE, are ignored.
Use NOINDEX to tell the crawler not to index a page:
<meta name="robots" content="noindex">
Links from an unindexed page will still be followed.
Use NOFOLLOW to tell the crawler not to follow links from a page:
<meta name="robots" content="nofollow">
Content from a page that has NOFOLLOW will still be indexed.
To neither index a page nor follow its links, use NOINDEX, NOFOLLOW or NONE:
<meta name="robots" content="noindex, nofollow">
NONE is a synonym for the above:
<meta name="robots" content="none">
We recommend specifying the robots directives in a single tag, but multiple tags will be combined if present.
Directing instructions at the Site Search Crawler only
A meta tag with name="robots" applies your instructions to all web crawlers, including Swiftbot, the Site Search crawler.
Use st:robots as the name instead of robots to direct instructions at the Site Search crawler only.
st:robots overrides robots for the crawler:
<meta name="robots" content="noindex, nofollow">
<meta name="st:robots" content="follow, index">
This example tells other crawlers not to index or follow links from the page, but allows the Site Search Crawler to index the page and follow its links.
When any meta tag named st:robots is present on the page, all other robots meta rules are ignored in favor of the st:robots rule.
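The override rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above; the function name directives_for_swiftbot and the (name, content) pair representation are assumptions made for the example.

```python
def directives_for_swiftbot(meta_tags):
    """Pick the robots content strings that apply to the Site Search crawler.

    meta_tags is a list of (name, content) pairs taken from a page's <meta> tags.
    """
    robots, st_robots = [], []
    for name, content in meta_tags:
        key = name.lower()
        if key == "st:robots":
            st_robots.append(content)
        elif key == "robots":
            robots.append(content)
    # Any st:robots tag wins outright; generic robots tags are then ignored.
    return st_robots if st_robots else robots
```

For the two-tag example above, only the content of the st:robots tag would be returned.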
Repeated content values
The crawler will use the most restrictive robots directives if they are repeated.
<meta name="robots" content="noindex">
<meta name="robots" content="index">
The above is equivalent to NOINDEX.
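One way to reason about the "most restrictive wins" rule is to start from the defaults and let any restrictive value switch a permission off for good. The sketch below assumes that reading; resolve_robots is a hypothetical helper, not part of Site Search.

```python
def resolve_robots(contents):
    """Return (index, follow) for a list of robots content strings."""
    index, follow = True, True  # INDEX and FOLLOW are the defaults
    for content in contents:
        for value in content.split(","):
            value = value.strip().lower()
            if value in ("noindex", "none"):
                index = False
            if value in ("nofollow", "none"):
                follow = False
            # index, follow, and unsupported values such as noarchive change nothing
    return index, follow
```

Because a permission is never switched back on, a later index or follow value cannot undo an earlier noindex or nofollow.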
Casing, spacing, and ordering
Tags, attribute names, and attribute values are all case-insensitive.
Multiple attribute values must be separated by commas; surrounding whitespace is ignored.
Order is not important: NOINDEX, NOFOLLOW is the same as NOFOLLOW, NOINDEX.
The following are considered the same:
<meta name="robots" content="noindex, nofollow">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META name="rOBOTs" content=" noIndex , NOfollow ">
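A small normalization routine shows why these variants are equivalent. This is a sketch using Python's standard html.parser, which already lowercases tag and attribute names. The names RobotsTagParser and normalized_directives are hypothetical.

```python
from html.parser import HTMLParser


class RobotsTagParser(HTMLParser):
    """Records the directives of a robots meta tag as a set."""

    def __init__(self):
        super().__init__()
        self.result = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            values = (attr.get("content") or "").split(",")
            # Lowercase and trim each value; a set discards the ordering.
            self.result = frozenset(v.strip().lower() for v in values if v.strip())


def normalized_directives(tag_html):
    parser = RobotsTagParser()
    parser.feed(tag_html)
    return parser.result
```

Feeding each of the three tags above to normalized_directives produces the same set containing noindex and nofollow.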
Stuck? Looking for help? Contact support or check out the Site Search community forum!