DeepSeek Releases V4 Open-Source Model Series with 1.6T Parameters and MIT License

Gate News, April 24 — DeepSeek has released the V4 series of open-source models under the MIT License, with weights now available on Hugging Face and ModelScope. The series includes two mixture-of-experts (MoE) models: V4-Pro, with 1.6 trillion total parameters and 49 billion activated per token, and V4-Flash, with 284 billion total parameters and 13 billion activated per token. Both support a 1 million token context window.
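The gap between "total" and "activated" parameters comes from MoE routing: each token is sent to only a few experts, so most weights sit idle on any given forward pass. A minimal sketch of that accounting, using made-up expert counts and sizes (not DeepSeek's published configuration):

```python
# Toy illustration of why an MoE model "activates" only a fraction of its
# parameters per token. All numbers below are hypothetical placeholders,
# not DeepSeek's actual expert counts or sizes.

def active_fraction(n_experts: int, top_k: int,
                    expert_params: int, shared_params: int) -> float:
    """Fraction of total parameters used for one token under top-k routing."""
    total = shared_params + n_experts * expert_params
    active = shared_params + top_k * expert_params
    return active / total

# Hypothetical configuration: 64 experts, 4 routed to each token.
frac = active_fraction(n_experts=64, top_k=4,
                       expert_params=20_000_000, shared_params=80_000_000)
print(f"{frac:.1%} of parameters active per token")
```

With these toy numbers, roughly one-ninth of the weights are touched per token, which is the same order of ratio as V4-Pro's 49B activated out of 1.6T total.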

The architecture introduces three key upgrades. First, a hybrid attention mechanism combining compressed sparse attention (CSA) and heavily compressed attention (HCA) sharply reduces long-context overhead: at a 1M-token context, V4-Pro's inference FLOPs are just 27% of V3.2's, and its KV cache (the memory that stores attention keys and values for previously seen tokens during inference) is only 10% of V3.2's. Second, manifold-constrained hyperconnections (mHC) replace traditional residual connections to stabilize cross-layer signal propagation. Third, the Muon optimizer speeds training convergence. Pre-training used over 32 trillion tokens of data.
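A standard KV cache grows linearly with context length, which is why a 10x reduction matters most at 1M tokens. A back-of-envelope sizing sketch, using hypothetical layer and head counts rather than DeepSeek's published shapes:

```python
# Back-of-envelope KV-cache sizing. The model shapes here are invented for
# illustration; the point is the linear growth with sequence length.

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_elem: int) -> float:
    """Size of a conventional key/value cache, in GiB."""
    # Keys and values each store n_layers * n_kv_heads * head_dim
    # numbers per token of context, hence the factor of 2.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return total_bytes / 2**30

full = kv_cache_gib(n_layers=60, n_kv_heads=8, head_dim=128,
                    seq_len=1_000_000, bytes_per_elem=2)  # FP16 baseline
print(f"uncompressed: {full:.1f} GiB, at 10% of that: {full * 0.10:.1f} GiB")
```

Even for this modest hypothetical configuration, an uncompressed 1M-token cache runs to hundreds of GiB, so a 10x compression is the difference between multi-node and single-node serving.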

Post-training uses a two-stage approach: domain-specific experts are first trained via supervised fine-tuning (SFT) and GRPO reinforcement learning, then merged into a single model through online distillation. DeepSeek claims V4-Pro-Max (the highest-effort inference mode) is the strongest open-source model, with top-tier coding benchmark scores and significantly narrowed gaps to closed-source frontier models on reasoning and agent tasks. V4-Flash-Max matches Pro-level reasoning performance given sufficient compute budget, but is limited by its parameter scale on pure knowledge and complex agent tasks. Weights are stored in mixed FP4+FP8 precision.
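GRPO's core idea (as published by DeepSeek for earlier models) is to sample a group of completions per prompt, score them, and normalize each reward against the group's own mean and standard deviation instead of training a separate value network. A minimal sketch of that advantage computation, with invented reward values:

```python
from statistics import mean, pstdev

# Minimal sketch of the group-relative advantage at the heart of GRPO.
# The reward values are invented for illustration.

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Normalize each sampled completion's reward against its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:  # all completions scored the same: no learning signal
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

# One group of 4 sampled completions for the same prompt.
advs = grpo_advantages([0.2, 0.9, 0.5, 0.4])
print(advs)
```

Completions scoring above the group mean get positive advantages and are reinforced; those below are suppressed, all relative to siblings from the same prompt.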

Disclaimer: The information on this page may come from third parties and does not represent the views or opinions of Gate. The content displayed on this page is for reference only and does not constitute any financial, investment, or legal advice. Gate does not guarantee the accuracy or completeness of the information and shall not be liable for any losses arising from the use of this information. Virtual asset investments carry high risks and are subject to significant price volatility. You may lose all of your invested principal. Please fully understand the relevant risks and make prudent decisions based on your own financial situation and risk tolerance. For details, please refer to Disclaimer.

Related Articles

OpenAI Engineer Clive Chan Challenges V4 Hardware Recommendations, Citing Errors and Vagueness vs. V3

Gate News, April 24 — OpenAI engineer Clive Chan has raised detailed objections to the hardware recommendations section of the V4 technical report, calling it "surprisingly mediocre and error-prone" compared to the acclaimed V3 version. V3's hardware guidance, which included Q&A sessions

GateNews · 29m ago

Naver Launches AI Tab Beta as Google Gemini Enters South Korea Search Market

Gate News, April 24 — Naver announced the start of a closed beta for AI Tab, its new conversational search feature, following Google's launch of Gemini in Chrome in South Korea. AI Tab will appear alongside Naver's existing search tabs, offering users a dedicated space for conversational

GateNews · 41m ago

India AI Engineering Hiring Surges 59.5%, Expands Beyond Tech Hubs

LinkedIn's AI Labor Market Report 2026, released on April 24, found that AI engineering hiring in India rose 59.5% year on year, marking the fastest pace among the markets studied by the platform. The growth was driven by demand spreading beyond established tech centers. Cities including

CryptoFrontier · 1h ago

Commonwealth Bank Cuts 120 Jobs Amid AI Expansion

Commonwealth Bank of Australia announced it will cut approximately 120 jobs as the nation's largest bank reviews roles and expands its use of artificial intelligence, according to Bloomberg. The cuts include 43 roles at Bankwest in Western Australia, with six positions affected by automation.

CryptoFrontier · 1h ago

Cursor Reveals Reason for xAI Training Deal: Compute Bottleneck; SpaceX Separately Secures $60 Billion Acquisition Option

Anysphere announced that Cursor will work with xAI to train a new model on Colossus infrastructure to overcome compute bottlenecks. Separately, SpaceX has secured a $60 billion acquisition option (fully exercisable within 2026); if it does not exercise the option, it would pay about $10 billion as collaboration compensation. The two deals, unfolding at the same time, reshape both who can train Cursor's models and who can buy Cursor. Cursor still supports many model backends, but its long-term direction hinges on whether SpaceX exercises its acquisition option.

ChainNewsAbmedia · 1h ago

Anthropic’s Secondary-Market Valuation Tops $1 Trillion on Forge Global, Overtaking OpenAI’s $880 Billion

According to a report by Decrypt, Anthropic's secondary-market valuation on Forge Global is about $1 trillion, versus about $880 billion for OpenAI, the first time the lead has flipped in the secondary market. Anthropic's ARR grew from about $9 billion at the end of 2025 to about $30 billion in March 2026, an increase of 233% over three months, boosting private-market valuations. Secondary-market valuations differ from primary fundraising valuations in that they reflect confidence in exits; going forward, expectations center on four tracks: technology, policy, business, and narrative.

ChainNewsAbmedia · 1h ago