Google released a paper called TurboQuant, and within 24 hours, the community had ported it to llama.cpp.
What does TurboQuant do? It compresses the KV cache of large models to 3 bits, cutting memory usage by a factor of 6 and speeding up inference 8x on an H100.
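To see where a factor like that comes from, here is a back-of-the-envelope sketch of KV-cache memory. The model shape below (32 layers, 32 KV heads, head dimension 128, 32K context) is a hypothetical Llama-7B-style configuration chosen for illustration, not taken from the paper; note the pure bit-width ratio 16/3 ≈ 5.3x is close to, but not exactly, the 6x headline figure, since real savings also depend on quantizer metadata and other overheads.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bits_per_elem):
    """Size of the KV cache for one sequence, in bytes."""
    # Two tensors (K and V) per layer, one head_dim vector per head per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits_per_elem / 8

# Hypothetical 7B-class model at a 32K-token context.
fp16 = kv_cache_bytes(32, 32, 128, 32_768, 16)  # 16-bit baseline
q3   = kv_cache_bytes(32, 32, 128, 32_768, 3)   # 3-bit quantized

print(f"fp16 cache:  {fp16 / 2**30:.1f} GiB")   # 16.0 GiB
print(f"3-bit cache: {q3 / 2**30:.1f} GiB")     # 3.0 GiB
print(f"ratio:       {fp16 / q3:.1f}x")         # 5.3x
```

At fp16, the cache alone eats 16 GiB for a single 32K-token sequence, which is why KV-cache bit width, not just weight size, dominates long-context serving costs.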
The key point: no retraining, no fine-tuning, and no loss of accuracy. That is part of why chip stocks plummeted.
Samsung and SK Hynix dropped over 6% in Seoul, and Micron fell 6.9% in the US stock market.
The market's worry: if every model needs one-sixth the memory, doesn't that shrink demand for HBM?
But I think the market overreacted. The reason is simple. The saved memory won't go to waste. Smaller KV caches mean the same GPU can handle larger contexts and more concurrent requests. Demand won't decrease; it will just be redistributed.
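The redistribution argument can be made concrete with some illustrative capacity math. The numbers below are assumptions for the sketch (a 40 GiB KV-cache budget on one GPU, a hypothetical 512 KiB-per-token footprint at fp16); only the 6x compression factor comes from the post.

```python
# Assumed numbers for illustration only.
budget_gib = 40.0           # HBM set aside for KV cache on one GPU
per_token_kib_fp16 = 512.0  # hypothetical per-token KV footprint at fp16
compression = 6             # reduction factor cited in the post

tokens_before = budget_gib * 2**20 / per_token_kib_fp16
tokens_after = tokens_before * compression

print(f"token capacity before: {tokens_before:,.0f}")  # 81,920
print(f"token capacity after:  {tokens_after:,.0f}")   # 491,520
```

The same card goes from roughly 80K to nearly 500K tokens of cache capacity; that headroom gets absorbed as longer contexts, more concurrent requests, or a bigger model on the same hardware, not as idle memory.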
This pattern has repeatedly appeared in tech history—when CPUs get faster, software consumes all the performance headroom. When bandwidth increases, video streaming consumes all the bandwidth. When memory becomes more efficient, models will grow larger and more demanding.
llama.cpp discussion #20969 already has a working CPU implementation (pure C, no dependencies) and CUDA kernels.
Someone has run it on Apple Silicon using Metal. This means the barrier to running models locally has dropped another level.
TurboQuant is a short-term negative for chip stocks but an efficiency dividend for the whole AI industry in the medium term. People running local models benefit: the same Mac can now fit larger models. Chip companies needn't panic: demand won't disappear; it will just be spent more efficiently.