PDF Crawling

Pro and Premium plans can index PDFs up to 10 MB in size.
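As a rough pre-flight check, you could confirm that a PDF is under the size limit before publishing it. The sketch below is only an illustration: the function name `within_size_limit` is hypothetical, and it assumes "10 MB" means 10,000,000 bytes, which is the more conservative reading. The Crawler may measure the limit differently.

```python
import os

# Assumption: 10 MB is read as 10,000,000 bytes (the conservative interpretation).
MAX_PDF_BYTES = 10 * 1000 * 1000


def within_size_limit(path, limit=MAX_PDF_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

For example, `within_size_limit("whitepaper.pdf")` would return `False` for a local file larger than the assumed limit.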

The PDF URLs need to be discoverable within your site’s HTML pages or included in a sitemap.
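If your PDFs are not linked from any HTML page, listing them in a sitemap is one way to make them discoverable. Here is a minimal sketch that builds a sitemap in the standard sitemaps.org format using Python's standard library. The helper name `build_sitemap` and the example.com URLs are placeholders for illustration, not part of the product.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical PDF URLs; replace with the real locations on your site.
sitemap_xml = build_sitemap([
    "https://example.com/docs/guide.pdf",
    "https://example.com/docs/pricing.pdf",
])
```

The resulting string can be saved as `sitemap.xml` and served from your site alongside your existing sitemap entries.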

The Crawler can extract text from:

- The body of the PDF document.
- Any values within the PDF file's standard metadata fields:
  - title
  - author
  - subject
  - keywords

By default, the Crawler attempts to flatten all of the PDF's content into a body text field.

Images and OCR are not supported. Custom and non-standard fonts can be embedded in the PDF file.

If you'd like more flexibility, please contact support and ask about PDF Extraction Rules in our Premium plan.
