Robots.txt Support
The Site Search Crawler supports the features of the robots.txt file standard and will respect all of its rules.
A robots.txt file is not required for Site Search to function, but it can help direct the crawler where you do or do not want it to go.
Disallow the Crawler
The robots.txt file can exclude portions of your site from Site Search by disallowing access to the Swiftbot user agent.
Careful! If your robots.txt disallows content that has already been crawled, that content will remain in your Engine but will no longer be updated!
See Troubleshooting: Removing Documents if you run into that scenario.
The example below keeps Swiftbot out of the /mobile/ path and blocks all other crawlers from the entire site:
User-agent: Swiftbot
Disallow: /mobile/
User-agent: *
Disallow: /
Allow the Crawler
Use the Disallow rule to permit Swiftbot into places which you do not want other crawlers to go. This is useful when you want to block other User-agents, like those belonging to major search engines, while still allowing Swiftbot: specifying a User-agent overrides the wildcard (*). In the example below, the empty Disallow rule gives Swiftbot access to the whole site, while all other crawlers are blocked:
User-agent: Swiftbot
Disallow:
User-agent: *
Disallow: /
The next example allows Swiftbot to crawl everything except the /documentation/ path, while disallowing any other User-agent access to all pages:
User-agent: Swiftbot
Disallow: /documentation/
User-agent: *
Disallow: /
Control the Crawler
You can control the rate at which the Crawler accesses your website by using the Crawl-delay directive with a number indicating seconds.
A crawl is web traffic, so limiting it can reduce bandwidth. Limiting it too much, however, can slow the uptake of new documents!
A Crawl-delay of 5 seconds allows at most 17,280 crawls per day (86,400 seconds in a day divided by 5).
User-agent: Swiftbot
Crawl-delay: 5
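Crawl-delay can also sit alongside Disallow rules in the same Swiftbot group. A minimal sketch, reusing the /mobile/ path from the earlier example:
User-agent: Swiftbot
Crawl-delay: 5
Disallow: /mobile/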
For fine-grained control over how your pages are indexed, you can configure Meta Tags. We even support robots Meta Tags.
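For example, a standard robots Meta Tag placed in a page's <head> looks like this; the noindex value asks crawlers not to add the page to their index:
<meta name="robots" content="noindex">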