What is a search index?
A search index is a body of structured data that a search engine refers to when looking for results that are relevant to a specific query. Indexes are a critical piece of any search system, and each one must be tailored to the specific information retrieval method of the search engine’s algorithm. In this manner, the algorithm and the index are inextricably linked to one another. Index can also be used as a verb (indexing), referring to the process of collecting unstructured website data and storing it in a structured format that is tailored for the search engine algorithm.
One way to think about indexes is to consider the following analogy between a search infrastructure and an office filing system. Imagine you hand an intern a stack of thousands of pieces of paper (documents) and tell them to organize these pieces of paper in a filing cabinet (index) to help the company find information more efficiently. The intern will first have to sort through the papers and get a sense of all the information contained within them. Next, they will have to decide on a system for arranging the papers in the filing cabinet. Finally, they’ll need to decide on the most effective way to search through and select from the files once they are in the cabinet. In this example, the process of organizing and filing the papers corresponds to the process of indexing website content, and the method for searching across these organized files and finding those that are most relevant corresponds to the search algorithm.
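To make the analogy concrete, here is a minimal sketch of an inverted index, one common index structure, written in Python. It is an illustration only and not a description of how Swiftype builds its index; the function names (tokenize, build_index, search) and the sample pages are assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing step: map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Search step: return sorted IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # index.get() avoids inserting empty entries for unknown terms.
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


# Hypothetical sample pages, standing in for the intern's stack of paper.
documents = {
    "page-1": "Pricing plans for site search",
    "page-2": "How the web crawler indexes pages",
    "page-3": "Search analytics and pricing reports",
}

index = build_index(documents)
print(search(index, "pricing search"))  # ['page-1', 'page-3']
```

Here build_index plays the role of the intern filing the papers, and search plays the role of the search algorithm. The two are coupled: search only works because it expects exactly the term-to-documents layout that build_index produces.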
Swiftype’s high-performance web crawler automatically indexes your website’s content in a structured format that is optimized for our search algorithm. Alternatively, site owners can pass specific information to their search engine index through an API. To customize the fields that make up their website schema, site owners can use Swiftype’s custom meta tags or the API, as described in the API documentation.
Furthermore, site owners can control the scope of their search engine index in the Swiftype dashboard by adding domains with blacklist or whitelist rules, or by adding and removing individual pages.
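As a rough illustration of how blacklist and whitelist rules can narrow the scope of an index, the Python sketch below filters URLs by domain and path prefix. It does not reproduce Swiftype’s actual rule syntax or dashboard behavior; the in_scope function, its parameters, and the prefix-matching semantics are all assumptions made for this example.

```python
from urllib.parse import urlparse


def in_scope(url, allowed_domains, whitelist=(), blacklist=()):
    """Decide whether a URL belongs in the index (hypothetical prefix rules)."""
    parsed = urlparse(url)
    # Pages on domains that were never added are always out of scope.
    if parsed.hostname not in allowed_domains:
        return False
    path = parsed.path or "/"
    # A blacklist match always excludes the page.
    if any(path.startswith(prefix) for prefix in blacklist):
        return False
    # An empty whitelist means everything not blacklisted is allowed.
    if whitelist and not any(path.startswith(prefix) for prefix in whitelist):
        return False
    return True


domains = {"example.com"}
print(in_scope("https://example.com/blog/post", domains, blacklist=("/admin",)))
print(in_scope("https://example.com/admin/login", domains, blacklist=("/admin",)))
```

In this sketch the blacklist takes precedence over the whitelist, which is one reasonable design choice among several; a real system may resolve conflicts between rules differently.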