tract-linalg
linalg stands for "linear algebra". This is a misnomer. This crate contains low-level, architecture-dependent optimisations used by tract-core.
Functions
- MatMatMul: Extended matrix*matrix product:
  - inspired by the GotoBLAS and BLIS micro-kernel approach
  - extended for convolution-friendly addressing (fused img2col)
  - fused output pipeline (min, max, and a few more simple, fast ops)
  - f32*f32 -> f32 (à la sgemm)
  - i8*i8 -> i32 accumulator -> i32 storage
  - i8*i8 -> i32 accumulator -> i8 (with channel zero point and scale, and re-quantization pipeline)
- f32 sigmoid and f32 tanh: at f32 precision, by a rational function (no exponentiation)
- byte-to-byte lookup table
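
To illustrate what "by a rational function (no exponentiation)" means, here is a minimal sketch of tanh and sigmoid computed as a ratio of two small polynomials. The names `tanh_rational` and `sigmoid_rational` and the coefficients are assumptions chosen for readability; they are not the ones used by this crate, and this low-order version is only accurate to about 0.03, far from f32 precision.

```rust
/// Low-order rational approximation of tanh.
/// Illustrative coefficients only (hypothetical, not taken from tract-linalg).
pub fn tanh_rational(x: f32) -> f32 {
    // Clamp the input: the formula below reaches exactly +/-1 at x = +/-3,
    // so clamping keeps the output continuous and saturated beyond that range.
    let x = x.clamp(-3.0, 3.0);
    let x2 = x * x;
    // Ratio of an odd cubic polynomial and an even quadratic polynomial:
    // only multiplications, additions and one division, no call to exp().
    x * (27.0 + x2) / (27.0 + 9.0 * x2)
}

/// Sigmoid derived from tanh through the exact identity
/// sigmoid(x) = 0.5 + 0.5 * tanh(x / 2).
pub fn sigmoid_rational(x: f32) -> f32 {
    0.5 + 0.5 * tanh_rational(0.5 * x)
}
```

A production kernel would typically use higher-degree polynomials to reach f32 precision and evaluate several lanes at once with SIMD instructions; the structure (clamp, then polynomial ratio) stays the same.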
Implementations
| | generic fallback | armv6, vfp | armv7 neon | armv8 simd | x64 FMA |
|---|---|---|---|---|---|
| MatMatMul f32 | | 4x4 | 8x4 | 8x8 | 16x6 |
| MatMatMul i8->i8 | | | 8x4 | | 8x8 |
| MatMatMul i8->i32 | | | | | 8x8 |
| sigmoid f32 | | | 4n | 4n | |
| tanh f32 | | | 4n | 4n | |
| byte lookup | | | | | |
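
The tile sizes in the table (4x4, 8x8, 16x6, ...) can be read as the shape of the output block that one micro-kernel call produces. The following is a plain scalar sketch of that idea, assuming row-major matrices and a 4x4 tile; `matmul_tiled`, `MR`, `NR` and the `clamp` parameter are hypothetical names for illustration, not this crate's API. Real kernels keep the accumulator tile in SIMD registers and work on packed panels.

```rust
// Tile shape of the illustrative micro-kernel (hypothetical constants).
const MR: usize = 4;
const NR: usize = 4;

/// Computes C = A * B for row-major A (m x k) and B (k x n),
/// one MR x NR output tile at a time.
/// `clamp` stands in for a fused output pipeline: an optional (min, max)
/// bound applied while the tile is stored, without a second pass over C.
pub fn matmul_tiled(
    m: usize,
    k: usize,
    n: usize,
    a: &[f32],
    b: &[f32],
    clamp: Option<(f32, f32)>,
) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![0.0f32; m * n];
    for i0 in (0..m).step_by(MR) {
        for j0 in (0..n).step_by(NR) {
            // Edge tiles may be smaller than MR x NR.
            let rows = MR.min(m - i0);
            let cols = NR.min(n - j0);
            // Micro-kernel: the whole tile is accumulated locally over k.
            let mut acc = [[0.0f32; NR]; MR];
            for p in 0..k {
                for i in 0..rows {
                    let a_ip = a[(i0 + i) * k + p];
                    for j in 0..cols {
                        acc[i][j] += a_ip * b[p * n + j0 + j];
                    }
                }
            }
            // Store the tile, applying the fused post-operation if requested.
            for i in 0..rows {
                for j in 0..cols {
                    let v = match clamp {
                        Some((lo, hi)) => acc[i][j].max(lo).min(hi),
                        None => acc[i][j],
                    };
                    c[(i0 + i) * n + j0 + j] = v;
                }
            }
        }
    }
    c
}
```

Larger tiles reuse each loaded value of A and B across more multiply-adds, which is presumably why the tile shape differs per architecture: it is bounded by the number of vector registers available for the accumulators.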