Pro and Premium plans can index PDFs up to 10 MB in size.
The PDF URLs need to be discoverable within your site’s HTML pages or included in a sitemap.
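As a rough pre-flight check before a crawl, the two requirements above can be sketched in Python. This is not the Crawler's own logic, only an illustration: the helper names `find_pdf_links` and `within_size_limit` are hypothetical, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption, since the exact byte count is not stated here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: "10MB" is read as 10 * 1024 * 1024 bytes. The documentation
# does not say whether the limit is decimal or binary.
MAX_PDF_BYTES = 10 * 1024 * 1024


class PdfLinkCollector(HTMLParser):
    """Collects absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Compare only the path so query strings do not hide the extension.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_urls.append(absolute)


def find_pdf_links(html, base_url):
    """Return the PDF URLs that are linked from the given HTML page."""
    collector = PdfLinkCollector(base_url)
    collector.feed(html)
    return collector.pdf_urls


def within_size_limit(size_in_bytes, limit=MAX_PDF_BYTES):
    """Return True if a file of this size fits under the assumed limit."""
    return size_in_bytes <= limit
```

Running `find_pdf_links` over a page's HTML lists the PDFs that are linked from that page. A PDF that appears neither in such a list nor in your sitemap is a likely candidate for being missed.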
The Crawler can extract text from:
- The body of the PDF document.
- Any values within the PDF file's standard metadata fields:
By default, the Crawler will try to flatten all the content of the PDF into a body text field.
Images are not indexed and OCR is not supported, so text that appears only inside images or scanned pages is not extracted. Custom and non-standard fonts can be embedded in the PDF file.
If you'd like more flexibility, please contact support and ask about PDF Extraction Rules in our Premium plan.