According to Ryan Lee, MiniMax's head of developer relations, the company has open-sourced MiniMax Sparse Attention (MSA), a high-performance attention library for NVIDIA Blackwell (SM100) GPUs, under the MIT license. Lee also announced that the M3 model weights will be released on Friday, June 13.
When applied to MiniMax-M3's million-token context inference, MSA reduces attention computation by a factor of 28.4 compared with dense grouped-query attention (GQA) at an equivalent configuration. On H800 GPUs, the library achieved a 14.2x prefill speedup and a 7.6x decoding speedup.
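To put those multipliers in more familiar terms, the short Python sketch below converts each reported factor into the share of the baseline cost it removes. The function name `fraction_saved` is illustrative and not part of MSA, and the sketch assumes each figure is a plain ratio of baseline cost to MSA cost. Note that the 28.4x figure describes computation while the other two describe speed, so the resulting percentages are not directly comparable with one another.

```python
def fraction_saved(factor: float) -> float:
    """Share of the baseline cost removed by a given reduction/speedup factor.

    Assumes factor = baseline_cost / new_cost, so the new cost is 1/factor
    of the baseline and the remainder is what was saved.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 - 1.0 / factor


# Factors as quoted in the article; the labels are ours.
reported = {
    "attention compute vs. dense GQA": 28.4,
    "prefill": 14.2,
    "decoding": 7.6,
}

for name, factor in reported.items():
    print(f"{name}: {factor}x -> {fraction_saved(factor):.1%} of baseline removed")
```

Under that assumption, a 28.4x reduction leaves about 3.5% of the original attention computation, while the 7.6x decoding speedup leaves about 13% of the original decoding time.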