Deep dive: The Meilisearch tokenizer
When you add documents to a Meilisearch index, the tokenization process is handled by an abstract interface called the tokenizer. The tokenizer is responsible for splitting each field by writing system (for example, Latin alphabet, Chinese hanzi). It then applies the corresponding pipeline to each part of each document field. We can break down the tokenization process like so:

- Crawl the document(s), splitting each field by script
- Go back over the documents part by part, running the corresponding tokenization pipeline, if one exists (a sketch of this two-phase process follows the list)
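
To make the two phases concrete, here is a minimal, self-contained sketch in Rust. This is not Meilisearch's actual implementation: the `Script` enum, `detect_script`, `split_by_script`, and `tokenize_field` names are hypothetical, and the per-script pipelines are deliberately naive stand-ins.

```rust
// Hypothetical sketch of script-based segmentation followed by
// per-script pipelines. Names and pipelines are simplified assumptions,
// not Meilisearch's real API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Latin,
    Han,
    Other,
}

// Very rough script detection: a real tokenizer consults full Unicode
// script tables rather than a couple of hard-coded ranges.
fn detect_script(c: char) -> Script {
    match c {
        'a'..='z' | 'A'..='Z' => Script::Latin,
        '\u{4E00}'..='\u{9FFF}' => Script::Han, // CJK Unified Ideographs
        _ => Script::Other,
    }
}

// Phase 1: split a field into contiguous runs that share a script.
fn split_by_script(field: &str) -> Vec<(Script, String)> {
    let mut runs: Vec<(Script, String)> = Vec::new();
    for c in field.chars() {
        let script = detect_script(c);
        match runs.last_mut() {
            // Extend the current run if the script hasn't changed...
            Some((s, run)) if *s == script => run.push(c),
            // ...otherwise start a new run.
            _ => runs.push((script, c.to_string())),
        }
    }
    runs
}

// Phase 2: go back over the runs, applying the pipeline that matches
// each run's script, if one exists.
fn tokenize_field(field: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    for (script, run) in split_by_script(field) {
        match script {
            // Toy Latin pipeline: whitespace split plus lowercasing.
            Script::Latin => {
                tokens.extend(run.split_whitespace().map(|w| w.to_lowercase()))
            }
            // Toy Han pipeline: one token per ideograph. Real CJK
            // pipelines use dictionary-based word segmentation instead.
            Script::Han => tokens.extend(run.chars().map(|c| c.to_string())),
            // No pipeline registered: drop the run (spaces, punctuation...).
            Script::Other => {}
        }
    }
    tokens
}

fn main() {
    println!("{:?}", tokenize_field("Hello world 你好"));
    // ["hello", "world", "你", "好"]
}
```

The key design point the sketch illustrates: script detection happens once, up front, so each downstream pipeline only ever sees text it knows how to segment, and scripts with no registered pipeline are simply skipped.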