Gate News message, April 24 — DeepSeek’s V4 technical report reveals that V4-Flash and V4-Pro were pre-trained on 32T and 33T tokens respectively, roughly double the approximately 15T tokens used for V3. The report acknowledges “significant instability challenges” during training: loss spikes occurred repeatedly and were traced to anomalies in the Mixture-of-Experts (MoE) layers. The routing mechanism itself amplifies these anomalies, and a simple rollback could not resolve the issue.
DeepSeek developed two mitigations that are now used in its actual training runs. The first, Anticipatory Routing, decouples the computation of routing indices from backbone network updates and is triggered automatically only when a loss spike is detected (adding approximately 20% overhead). The second, SwiGLU Clamping, suppresses anomalies directly by clamping activation values to a fixed range. The report states that both approaches are effective but admits that “the underlying principles remain insufficiently understood.”
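The report, as summarized here, does not give the clamp range, say exactly which tensors are clamped, or describe how spikes are detected. The Python sketch below is therefore illustrative only. It shows a SwiGLU activation whose gate and linear branches are clamped to a hypothetical `[-limit, limit]` range, plus a hypothetical moving-average loss-spike detector of the kind that could decide when an on-demand mitigation switches on. The names `swiglu_clamped` and `SpikeTrigger`, and all default values, are assumptions rather than DeepSeek's code. The detector also does not implement Anticipatory Routing itself; it only models the "trigger on a spike" step.

```python
import math


def silu(z: float) -> float:
    """Numerically stable SiLU (Swish-1): z * sigmoid(z)."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def clamp(v: float, limit: float) -> float:
    """Restrict v to the closed range [-limit, limit]."""
    return max(-limit, min(limit, v))


def swiglu_clamped(gate: list[float], up: list[float], limit: float = 10.0) -> list[float]:
    """Elementwise SwiGLU, SiLU(gate) * up, with both branches clamped first.

    Assumption: `gate` and `up` are the already-computed outputs of the two
    input projections (x @ W and x @ V). `limit` is a hypothetical value;
    the real range used in training is not stated in this summary.
    """
    if len(gate) != len(up):
        raise ValueError("gate and up must have the same length")
    return [silu(clamp(g, limit)) * clamp(u, limit) for g, u in zip(gate, up)]


class SpikeTrigger:
    """Hypothetical loss-spike detector based on an exponential moving average."""

    def __init__(self, beta: float = 0.99, factor: float = 1.5, warmup: int = 10):
        self.beta = beta      # EMA smoothing factor
        self.factor = factor  # a loss above factor * EMA counts as a spike
        self.warmup = warmup  # never flag a spike during the first `warmup` steps
        self.ema = None
        self.steps = 0

    def update(self, loss: float) -> bool:
        """Record one step's loss and return True if it looks like a spike."""
        self.steps += 1
        if self.ema is None:
            self.ema = loss
            return False
        spike = self.steps > self.warmup and loss > self.factor * self.ema
        if not spike:
            # Only non-spike losses update the baseline, so a spike cannot
            # drag the EMA upward and mask the next one.
            self.ema = self.beta * self.ema + (1.0 - self.beta) * loss
        return spike


# Usage: a steady loss of 2.0 followed by a jump to 5.0.
trigger = SpikeTrigger(warmup=3)
flags = [trigger.update(x) for x in [2.0, 2.0, 2.0, 2.0, 2.0, 5.0]]
print(flags)  # → [False, False, False, False, False, True]
```

Gating an expensive mitigation on a detector like this is one way such a mechanism could pay its extra cost only during unstable stretches of training, while the clamp bounds every SwiGLU output to at most `silu(limit) * limit` in magnitude regardless of how large the raw projections grow.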
Susan Zhang, a Google DeepMind researcher who previously worked at Meta AI and OpenAI, commented that the instability triggered by doubling training data “explains the delay.” She described the two solutions as “band-aids” while acknowledging DeepSeek’s technical transparency.