4 releases (2 breaking)

0.3.0 Apr 12, 2024
0.2.0 Sep 1, 2023
0.1.1 Jun 23, 2023
0.1.0 Jun 9, 2023

#266 in Database implementations

180,519 downloads per month
Used in 27 crates (8 directly)

MIT license

7KB
117 lines

# Tokenizer-API

An API to interface a tokenizer with tantivy.

The API will be kept stable in order to not break support for existing tokenizers.


lib.rs:

Tokenizers are in charge of chopping text into a stream of tokens ready for indexing. This is a separate crate from tantivy, so implementors don't need to update their tokenizers for each new tantivy version.

To add support for a tokenizer, implement the Tokenizer trait. Check out the tantivy repo for some examples.
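As a rough illustration of what implementing the trait involves, here is a minimal whitespace tokenizer. To keep the sketch self-contained, the `Token`, `TokenStream`, and `Tokenizer` definitions below are simplified local stand-ins that only approximate the shape of this crate's API; they are assumptions, not copies of the real definitions. `WhitespaceTokenizer`, `WhitespaceTokenStream`, and `collect_tokens` are hypothetical names introduced for this example. In a real project you would import the types from the crate and check its documentation for the exact signatures.

```rust
/// Stand-in for the crate's token type (assumed shape): text plus byte offsets.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Token {
    pub offset_from: usize, // byte offset of the first character
    pub offset_to: usize,   // byte offset just past the last character
    pub position: usize,    // index of the token in the stream
    pub text: String,
}

/// Stand-in for the token stream trait (assumed shape).
pub trait TokenStream {
    /// Moves to the next token; returns false once the text is exhausted.
    fn advance(&mut self) -> bool;
    /// Returns the current token. Only meaningful after `advance` returned true.
    fn token(&self) -> &Token;
}

/// Stand-in for the tokenizer trait (assumed shape).
pub trait Tokenizer: 'static + Clone + Send + Sync {
    type TokenStream<'a>: TokenStream;
    fn token_stream<'a>(&'a mut self, text: &'a str) -> Self::TokenStream<'a>;
}

/// Hypothetical tokenizer that splits on Unicode whitespace.
#[derive(Clone, Default)]
pub struct WhitespaceTokenizer;

pub struct WhitespaceTokenStream<'a> {
    text: &'a str,
    cursor: usize, // byte offset where the next scan starts
    next_position: usize,
    token: Token,
}

impl<'a> TokenStream for WhitespaceTokenStream<'a> {
    fn advance(&mut self) -> bool {
        // Skip leading whitespace to find the start of the next token.
        let rest = &self.text[self.cursor..];
        let start = match rest.find(|c: char| !c.is_whitespace()) {
            Some(i) => self.cursor + i,
            None => {
                self.cursor = self.text.len();
                return false;
            }
        };
        // The token ends at the next whitespace character or at end of text.
        let end = self.text[start..]
            .find(char::is_whitespace)
            .map(|i| start + i)
            .unwrap_or(self.text.len());

        // Reuse the token's String buffer instead of allocating a new one.
        self.token.offset_from = start;
        self.token.offset_to = end;
        self.token.position = self.next_position;
        self.token.text.clear();
        self.token.text.push_str(&self.text[start..end]);

        self.next_position += 1;
        self.cursor = end;
        true
    }

    fn token(&self) -> &Token {
        &self.token
    }
}

impl Tokenizer for WhitespaceTokenizer {
    type TokenStream<'a> = WhitespaceTokenStream<'a>;

    fn token_stream<'a>(&'a mut self, text: &'a str) -> WhitespaceTokenStream<'a> {
        WhitespaceTokenStream {
            text,
            cursor: 0,
            next_position: 0,
            token: Token::default(),
        }
    }
}

/// Hypothetical helper: drains a stream into a Vec for inspection.
pub fn collect_tokens(text: &str) -> Vec<Token> {
    let mut tokenizer = WhitespaceTokenizer;
    let mut stream = tokenizer.token_stream(text);
    let mut tokens = Vec::new();
    while stream.advance() {
        tokens.push(stream.token().clone());
    }
    tokens
}
```

Calling `collect_tokens("hello  big world")` yields three tokens whose offsets index back into the original string. The stream borrows the input text and reuses a single `Token` buffer across calls to `advance`, which is a common design for tokenizers that run over large volumes of text.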

Dependencies

~0.4–1MB
~24K SLoC