According to monitoring by 1M AI News, Fireworks AI, a company specializing in AI inference infrastructure, has released a preview of Fireworks Training, expanding from a pure inference platform into an integrated platform for training and deployment. Fireworks AI was founded by Lin Qiao, a former Meta engineer who helped create PyTorch; the company is currently valued at $4 billion and processes 15 trillion tokens daily.

The platform offers three tiers:

1. Training Agent: designed for product teams without machine learning infrastructure; they describe a task and upload data, and the agent handles the entire process from training to deployment. Currently supports only LoRA.
2. Managed Training: aimed at machine learning engineers; supports SFT, DPO, and reinforcement fine-tuning, including full-parameter training.
3. Training API: targeted at research teams; allows customization of loss functions and training loops, supports algorithms such as GRPO and DAPO, and scales full-parameter training from a single-node Qwen3 8B up to the trillion-parameter Kimi K2.5 on 64 NVIDIA B200s.

Several of Fireworks AI's production inference customers, including the AI programming tool Cursor, Vercel, and Genspark, have completed advanced reinforcement learning on the platform. Vercel trained an error-correction model for its code-generation product v0, achieving 93% error-free code generation versus only 62% for Claude 3.5 Sonnet, and improved end-to-end latency 40x over the closed model it previously used. Genspark fine-tuned the open trillion-parameter Kimi K2 model with reinforcement learning to build a deep-research agent, increasing tool usage by 33% and cutting costs by 50%. Cursor ran distributed reinforcement learning for Composer 2 (currently ranked first on CursorBench) across 3-4 clusters worldwide, with training and inference sharing the same GPU pool. Fireworks AI emphasizes numerical consistency between training and inference as its key technological advantage.
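The entry-level Training Agent supports only LoRA, which fine-tunes a model by learning a small low-rank update to each frozen weight matrix instead of updating the full matrix. A minimal sketch of the idea (a generic illustration; the dimensions, rank, and scaling factor here are made-up assumptions, not Fireworks' implementation):

```python
import numpy as np

rng = np.random.default_rng(42)

# Frozen pretrained weight (d_out x d_in) plus a trainable low-rank update.
d_out, d_in, rank = 64, 64, 8
W = rng.normal(size=(d_out, d_in))          # frozen base weight
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))                 # trainable up-projection, zero-init

def forward(x, alpha=16.0):
    # LoRA: y = W x + (alpha / rank) * B (A x); only A and B are trained.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialized to zero, the adapted model matches the base model exactly,
# so training starts from the pretrained behavior.
assert np.allclose(forward(x), W @ x)

# Parameter count: full fine-tune vs. LoRA adapter for this one matrix.
full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(f"trainable params: full={full_params}, lora={lora_params}")
```

The low-rank factorization is why a LoRA tier is a natural entry point for a hosted service: the trainable state per matrix shrinks from d_out*d_in to rank*(d_in+d_out), and adapters can be served on top of a shared base model.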
MoE (Mixture of Experts) models are numerically more fragile than dense models: small changes in hidden states can alter expert routing, and those changes cascade and amplify through the network. Fireworks has published training-versus-inference KL divergence values for all supported models, all below 0.01.
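The fragility claim is easy to see in a toy example: with top-k routing, a perturbation on the order of typical kernel rounding differences can change which experts a token is sent to, and the training-versus-inference drift can be quantified as a KL divergence between the two routing distributions. Everything below (router shape, perturbation scale) is a hypothetical illustration, not Fireworks' measurement methodology:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical MoE router: 8 experts, top-2 routing over a hidden state.
n_experts, hidden_dim = 8, 16
router_weights = rng.normal(size=(hidden_dim, n_experts))
hidden = rng.normal(size=hidden_dim)

# The same hidden state with a tiny numerical perturbation, standing in for
# a different kernel's rounding/accumulation order at inference time.
perturbed = hidden + rng.normal(scale=1e-3, size=hidden_dim)

def route(h, k=2):
    logits = h @ router_weights
    return set(np.argsort(logits)[-k:]), softmax(logits)

experts_a, probs_a = route(hidden)
experts_b, probs_b = route(perturbed)
print("top-2 expert set changed:", experts_a != experts_b)

# KL divergence between the two router distributions: near zero when the
# engines agree, growing as numerics diverge.
kl = float(np.sum(probs_a * np.log(probs_a / probs_b)))
print(f"KL(train || infer) = {kl:.6g}")
```

When the expert sets differ, the token is processed by different FFNs entirely, so a sub-ulp discrepancy in one layer can become a macroscopically different hidden state in the next; a per-distribution KL like the one above is one way to bound that drift.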
