02:36
Google TurboQuant: 3-bit quantized KV cache with no accuracy loss, up to 8x faster inference
Google Research has released TurboQuant, a quantization compression algorithm that can compress the KV cache of large language models to 3 bits, reducing memory usage by 6x and speeding up inference by up to 8x. The algorithm performs strongly across multiple benchmarks, aims to address the KV-cache bottleneck in model serving, and will be presented at ICLR 2026.
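The summary does not describe TurboQuant's actual algorithm, so the following is only a generic sketch of what 3-bit KV-cache quantization means in principle: each row of a key/value tensor is mapped to 8 integer levels (0..7) with a per-row scale and zero point, then reconstructed at inference time. All function names here are illustrative, not part of TurboQuant.

```python
import numpy as np

def quant3(x):
    """Per-row asymmetric 3-bit quantization: 8 levels (0..7)."""
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = (hi - lo) / 7.0                      # step size between the 8 levels
    scale = np.where(scale == 0, 1.0, scale)     # guard against constant rows
    q = np.clip(np.round((x - lo) / scale), 0, 7).astype(np.uint8)
    return q, scale, lo

def dequant3(q, scale, lo):
    """Reconstruct an approximate float tensor from 3-bit codes."""
    return q.astype(np.float32) * scale + lo

# Toy KV tensor: 4 heads x 128 channels, fp32 for illustration
kv = np.random.randn(4, 128).astype(np.float32)
q, s, z = quant3(kv)
recon = dequant3(q, s, z)
max_err = np.abs(kv - recon).max()
```

With rounding to the nearest level, the per-element error is bounded by half a quantization step; the memory saving comes from storing 3-bit codes (packed) plus a small per-row scale/zero-point overhead instead of 16-bit floats. Real systems like TurboQuant reportedly achieve this compression without accuracy loss via more sophisticated techniques than this naive min-max scheme.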