10 releases (2 stable)
| Version | Released |
|---|---|
| 1.0.1 (new) | May 17, 2024 |
| 1.0.0 | Apr 28, 2024 |
| 0.7.1 | Apr 4, 2024 |
| 0.7.0 | Mar 27, 2024 |
| 0.1.0 | Feb 18, 2024 |
#277 in Machine learning
244 downloads per month
155KB
4K SLoC
TokenGeeX - Efficient Tokenizer for CodeGeeX
This repository holds the code for the TokenGeeX Rust crate and Python package. TokenGeeX is a tokenizer for CodeGeeX designed for source code and Chinese text. It is based on the unigram language model (UnigramLM) tokenization algorithm (Taku Kudo 2018).
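To illustrate the general idea behind UnigramLM tokenization, here is a minimal sketch of Viterbi segmentation under a unigram model. It is not TokenGeeX's API or implementation: the function name `viterbi_segment`, the `unk_log_prob` fallback, and the toy vocabulary with its probabilities are all assumptions made up for this example.

```python
import math


def viterbi_segment(text, log_probs, unk_log_prob=-20.0):
    """Return the highest-scoring segmentation of `text` under a unigram model.

    `log_probs` maps each vocabulary token to its log-probability.
    Single characters missing from the vocabulary fall back to
    `unk_log_prob` (a hypothetical penalty chosen for this sketch).
    """
    n = len(text)
    max_len = max(len(tok) for tok in log_probs)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last token
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in log_probs:
                score = log_probs[piece]
            elif end - start == 1:
                score = unk_log_prob  # unknown single character
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start

    # Walk the back-pointers from the end of the text to recover the tokens.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


# Toy vocabulary with made-up probabilities (illustration only).
toy_vocab = {
    "def": math.log(0.2),
    " ": math.log(0.3),
    "foo": math.log(0.1),
    "d": math.log(0.05),
    "e": math.log(0.05),
    "f": math.log(0.05),
    "o": math.log(0.05),
}

print(viterbi_segment("def foo", toy_vocab))
```

Because the whole-word tokens have higher probability than the product of their characters, the segmentation prefers `def`, a space, and `foo` over seven single-character tokens. A real tokenizer would learn the vocabulary and its probabilities from a training corpus.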
Dependencies
~9–18MB
~269K SLoC