DeepSeek V4-Flash goes live on Ollama Cloud, US-hosted: Claude Code, OpenClaw one-click integration

Ollama, the local AI model runner, announced on the X platform on April 24 that it has added DeepSeek's V4-Flash, the model released a day earlier by the Chinese AI startup, to its Ollama Cloud service. The inference host is located in the United States, and Ollama provides three one-click commands so developers can plug V4-Flash directly into mainstream AI coding workflows such as Claude Code, OpenClaw, and Hermes.

deepseek-v4-flash is now available on Ollama’s cloud! Hosted in the US. Try it with Claude Code: ollama launch claude --model deepseek-v4-flash:cloud Try it with OpenClaw: ollama launch openclaw --model deepseek-v4-flash:cloud Try it with Hermes: ollama launch hermes…

— ollama (@ollama) April 24, 2026

DeepSeek V4 Preview: two sizes, 1M context

According to the release announcement in DeepSeek's official API documentation on April 24, the DeepSeek-V4 Preview is being open-sourced in two sizes simultaneously:

Model               Total parameters   Active parameters   Positioning
DeepSeek-V4-Pro     1.6 trillion       49 billion          Targets closed-source flagships
DeepSeek-V4-Flash   1M                 130 billion         Fast, efficient, and low-cost

Both use a Mixture-of-Experts (MoE) architecture and natively support a 1-million-token context. In its announcement, DeepSeek stated: “1M context is now the default value for all DeepSeek official services.”
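To illustrate why an MoE model's "active" parameter count is only a small fraction of its total, here is a minimal top-k expert-routing sketch in Python. The expert count, routing rule, and dimensions are made up for the example and are not DeepSeek-V4's actual configuration.

```python
# Toy top-k MoE routing: only k of E experts run for each token, so the
# per-token ("active") parameter count is a small slice of the total.
# All sizes are illustrative; they are not DeepSeek-V4's real configuration.
import numpy as np

d_model, n_experts, top_k = 64, 16, 2                    # hypothetical sizes
router = np.random.randn(d_model, n_experts)             # gating network weights
experts = np.random.randn(n_experts, d_model, d_model)   # one FFN weight matrix per expert

def moe_layer(x):
    logits = x @ router                                  # (tokens, n_experts) routing scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]        # top-k expert indices per token
    sel = np.take_along_axis(logits, top, axis=-1)
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))    # softmax over the selected experts only
    w /= w.sum(axis=-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                          # each token runs just top_k expert FFNs
        for j in range(top_k):
            out[t] += w[t, j] * (x[t] @ experts[top[t, j]])
    return out

tokens = np.random.randn(8, d_model)                     # a batch of 8 token vectors
print(moe_layer(tokens).shape)                           # (8, 64); 2 of 16 experts ran per token
```

The same logic is why the table above can list a total parameter count in the trillions against an active count in the tens of billions.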

Architecture innovation: DSA sparse attention + token-wise compression

The core architectural improvements in the V4 series include:

Token-wise compression combined with DSA (DeepSeek Sparse Attention), which sharply reduces inference compute and KV-cache memory at ultra-long context lengths (a toy sketch of the sparse-attention idea follows this list)

Compared with V3.2 at a 1-million-token context, V4-Pro needs only 27% of the per-token inference FLOPs and only 10% of the KV cache

Switchable Thinking and Non-Thinking modes, matching reasoning depth to the needs of the task
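The announcement does not describe DSA's internals, but the general idea behind sparse attention, reading only a small selected subset of the KV cache per query instead of all of it, can be shown with a short sketch. The top-k selection rule and the sizes below are assumptions for illustration, not the actual DSA algorithm.

```python
# Toy sparse attention: each query attends to only the k highest-scoring cached
# tokens rather than the full context, cutting attention FLOPs and KV-cache reads.
# Illustrative only; this is not DeepSeek's actual DSA.
import numpy as np

d, ctx_len, k = 64, 4096, 128             # hypothetical head dim, context length, kept tokens
q = np.random.randn(d)                    # current query vector
K = np.random.randn(ctx_len, d)           # cached keys for all past tokens
V = np.random.randn(ctx_len, d)           # cached values

def sparse_attention(q, K, V, k):
    scores = (K @ q) / np.sqrt(d)              # score every cached token (selection pass)
    keep = np.argpartition(scores, -k)[-k:]    # indices of the k best-scoring tokens
    s = scores[keep]
    w = np.exp(s - s.max())
    w /= w.sum()                               # softmax over the kept tokens only
    return w @ V[keep]                         # weighted sum over k values, not ctx_len

print(sparse_attention(q, K, V, k).shape)      # (64,): the output mixes 128 of 4096 cached values
```

Note that this toy version still scores every cached token when picking the top k; the reported 27% FLOPs and 10% KV-cache figures imply DeepSeek's selector, reportedly paired with token-wise compression, is far cheaper than that, but the announcement gives no further detail.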

At the API level, V4 is compatible with both the OpenAI Chat Completions and the Anthropic API specifications, reducing migration costs for existing Claude/GPT clients.
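As a concrete illustration of what that compatibility means, the sketch below reaches the model through the standard openai Python client simply by swapping the base URL and model name. The endpoint URL and the model identifier deepseek-v4-flash are assumptions for this example; take the actual values from DeepSeek's API documentation.

```python
# Sketch: using an OpenAI-compatible endpoint with a DeepSeek model.
# The base URL and model name below are assumptions, not confirmed values;
# substitute whatever DeepSeek's official API documentation specifies.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",   # assumed OpenAI-compatible endpoint
    api_key="YOUR_DEEPSEEK_API_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",             # assumed model identifier
    messages=[{"role": "user", "content": "Summarize this diff in three bullet points."}],
)
print(resp.choices[0].message.content)
```

An existing Claude client would instead keep using the Anthropic SDK and point it at whatever Anthropic-compatible endpoint DeepSeek exposes; the announcement claims compatibility with both specifications but does not spell out the URLs.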

Ollama Cloud's three one-click startup commands

Ollama's official model page exposes the cloud inference service under the model identifier deepseek-v4-flash:cloud. Developers can use the following three commands to connect V4-Flash to their existing AI coding workflows:

Workflow      Command
Claude Code   ollama launch claude --model deepseek-v4-flash:cloud
OpenClaw      ollama launch openclaw --model deepseek-v4-flash:cloud
Hermes        ollama launch hermes
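For tools that are not on that list, the same model should also be reachable through Ollama's own chat API. The sketch below assumes the documented local endpoint (http://localhost:11434/api/chat) of a signed-in Ollama client transparently proxies :cloud models to Ollama's US-hosted inference; verify this against the current Ollama Cloud documentation.

```python
# Sketch: calling deepseek-v4-flash:cloud through Ollama's chat API.
# Assumes a signed-in local Ollama client that proxies ":cloud" models to
# Ollama Cloud; confirm the behavior in the current Ollama documentation.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",         # standard Ollama chat endpoint
    json={
        "model": "deepseek-v4-flash:cloud",    # identifier from Ollama's model page
        "messages": [{"role": "user", "content": "Write a unit test for a FizzBuzz function."}],
        "stream": False,                       # return one JSON object instead of a stream
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```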

The “US-hosted” detail is a deliberate signal. For enterprises and Western developers, the biggest concern about using Chinese open-source models is data being sent back to China. By placing V4-Flash's inference layer in the United States, Ollama keeps prompts and code content within US legal jurisdiction, reducing friction around compliance and data sovereignty.

Why this matters to the AI industry

Connecting three previously independent ecosystems (DeepSeek V4-Flash, Ollama Cloud, and Claude Code) matters on three levels:

Cost pathway: V4-Flash's 13 billion active parameters are far fewer than those of GPT-5.5 (priced at $5 input / $30 output per million tokens) and flagship models like Claude Opus 4.7, so for use cases such as small and medium agent tasks, batch summarization, and test automation, unit costs are expected to drop significantly (see the cost sketch after this list)

A geopolitical-risk buffer: with Ollama as a US-registered intermediary inference layer, enterprise users of a Chinese-origin model avoid the concern of “sending data directly to DeepSeek's Beijing servers,” a practical template for spreading open-source models internationally

Instant developer switching: Claude Code and OpenClaw users can switch models with a single line on the command line, without changing prompt structure or IDE settings. For scenarios such as multi-model regression testing and cost-sensitive batch jobs, this is a genuine productivity gain
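To make the cost argument concrete, the short calculation below prices a batch summarization job at the GPT-5.5 rates quoted above ($5 input / $30 output per million tokens) and treats the V4-Flash rates as placeholders, since neither DeepSeek nor Ollama Cloud pricing for this model appears in the announcement.

```python
# Back-of-the-envelope cost of a batch summarization job. GPT-5.5 prices come
# from the figures quoted in the article; the V4-Flash prices are placeholders,
# not published rates.
def job_cost(n_docs, in_tokens, out_tokens, price_in, price_out):
    """Total USD cost for n_docs documents at per-million-token prices."""
    return (n_docs * in_tokens * price_in + n_docs * out_tokens * price_out) / 1_000_000

# 10,000 documents, roughly 3,000 input tokens and 300 output tokens each
gpt55 = job_cost(10_000, 3_000, 300, price_in=5.0, price_out=30.0)
flash = job_cost(10_000, 3_000, 300, price_in=0.5, price_out=2.0)   # hypothetical rates

print(f"GPT-5.5:  ${gpt55:,.2f}")   # $240.00
print(f"V4-Flash: ${flash:,.2f}")   # $21.00 under the assumed placeholder rates
```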

How this ties in with earlier DeepSeek news

This V4 release and its rapid integration into Ollama Cloud come as DeepSeek is reportedly negotiating its first round of external financing at a valuation of $20 billion. V4 is a key product proof point in that capitalization process, and pairing an open-source strategy with fast distribution through international hosting partners is DeepSeek's way of moving quickly before it has built an overwhelming developer ecosystem of its own. For OpenAI and Anthropic, an open-source substitute that can be swapped in with a single line inside Claude Code is a new variable in the contest for control of agent workflows.

This article DeepSeek V4-Flash lands on Ollama Cloud, US-hosted: Claude Code, OpenClaw one-click integration first appeared on 链新闻 ABMedia.

