On 24/04/2026, the AI company DeepSeek (headquartered in Hangzhou, China) announced a preview of DeepSeek V4, a new generation of open-source models that, according to DeepSeek, has "closed the gap" with the leading frontier models. V4 launches in two variants, both built on a Mixture-of-Experts (MoE) architecture and both supporting a context of up to 1 million tokens.
Quick summary
- When: 24/04/2026 — a preview.
- Who: DeepSeek, a Chinese AI company (Hangzhou).
- Models: open-source, two MoE variants — V4-Pro (large, strong at coding/agentic tasks) and V4-Flash (small, fast, cheap).
- Context: both variants support a context window of up to 1 million tokens.
- Claim: according to DeepSeek, V4-Pro competes with leading models on reasoning; according to MIT Technology Review it is still ~3–6 months behind on knowledge benchmarks.
What is the DeepSeek V4 preview?
According to TechCrunch, DeepSeek announced the V4 preview on 24/04/2026. This is an open-source model line, continuing DeepSeek's earlier generations that drew attention for their low training/deployment costs. The highlight this time is that DeepSeek has split it into two variants to serve two different sets of needs, rather than a single "all-in-one" model.
Both variants use the Mixture-of-Experts (MoE) architecture — activating only a portion of the parameters ("experts") for each query, which reduces compute cost compared with a model that activates all of its parameters.
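To make the idea of activating only some experts concrete, here is a minimal toy sketch of top-k routing in plain Python. It is not DeepSeek's implementation: the experts are simple scalar functions, and the names `top_k_route`, `moe_forward` and the router scores are hypothetical values chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs by weight."""
    routes = top_k_route(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in routes)
    return output, [i for i, _ in routes]

# Four toy "experts" (hypothetical): each just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_scores = [0.1, 2.0, -1.0, 1.5]  # assumed scores for one query

output, used = moe_forward(10.0, experts, router_scores, k=2)
print(f"experts used: {used} of {len(experts)}, output: {output:.2f}")
```

Only two of the four experts run for this query, which is where the compute saving comes from. Real MoE layers route per token inside a neural network and learn the router scores during training.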
Two variants: V4-Pro and V4-Flash
V4-Pro is the larger variant, which DeepSeek aims at coding and agentic tasks (where the model plans on its own, calls tools, and executes multiple steps). According to DeepSeek, V4-Pro competes with leading models such as Claude, GPT-5.x and Gemini 3.x on reasoning.
V4-Flash is the smaller variant, optimized for speed and low cost — suited to high query volumes that need fast, cheap responses. Both keep a 1 million token context window, enough to handle long documents, large codebases or many conversations in a single call.
Selective-attention: cutting long-context costs
A notable technical point, according to the sources, is the selective-attention mechanism, which significantly reduces compute on long inputs. According to DeepSeek, the compute cost for long context is only about 27% of that of V3.2, and for V4-Flash the figure can drop to about 10%.
This is important because long context is typically very expensive — costs usually rise quickly with input length. If this efficiency holds up in practice, it makes using a 1 million token context far more viable in terms of cost.
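As a rough illustration of these numbers, the sketch below does two things. It shows how quickly work grows with input length if one assumes standard dense self-attention, where every token attends to every other token. That is a textbook assumption, not a description of DeepSeek's models. It then applies the quoted ratios (about 27% and about 10% of V3.2, per DeepSeek) to a hypothetical baseline of 100 compute units. All function names and the baseline are made up for illustration.

```python
# Ratios quoted in the article (DeepSeek's claims, relative to V3.2).
QUOTED_RATIO_V4 = 0.27
QUOTED_RATIO_V4_FLASH = 0.10

def dense_attention_pairs(num_tokens: int) -> int:
    """Token-pair interactions under standard dense self-attention (assumption)."""
    return num_tokens * num_tokens

def scaled_cost(baseline_cost: float, ratio: float) -> float:
    """Cost after applying a quoted ratio to a baseline cost."""
    return baseline_cost * ratio

def savings_percent(ratio: float) -> float:
    """Percentage saved relative to the baseline."""
    return (1.0 - ratio) * 100.0

growth = dense_attention_pairs(1_000_000) / dense_attention_pairs(100_000)
print(f"10x longer input -> {growth:.0f}x more token pairs under dense attention")

baseline = 100.0  # hypothetical V3.2 long-context cost, in arbitrary units
for name, ratio in [("V4 (quoted)", QUOTED_RATIO_V4),
                    ("V4-Flash (quoted)", QUOTED_RATIO_V4_FLASH)]:
    cost = scaled_cost(baseline, ratio)
    print(f"{name}: about {cost:.0f} units, roughly {savings_percent(ratio):.0f}% less")
```

The ratios are vendor claims and the baseline is arbitrary, so this shows only the size of the claimed saving, not a measured result.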
How far has it closed the gap with the top tier?
DeepSeek claims V4-Pro has closed the gap with frontier models on reasoning. However, according to MIT Technology Review, V4 is still about 3–6 months behind the top tier on knowledge benchmarks. In other words, the gap has narrowed considerably but has not been completely closed.
On pricing, V4-Flash is cited at a very competitive level: about $0.14 per 1 million input tokens and $0.28 per 1 million output tokens. For V4-Pro, the sources disagree on input pricing (some cite a very low figure, others one several times higher), so we do not state a specific number here; it should be confirmed against DeepSeek's official announcement.
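For a sense of scale, here is a small estimate based on the V4-Flash prices cited above. The function name `flash_cost_usd` and the workload figures are hypothetical. The prices are the approximate figures reported by the sources and may differ from DeepSeek's official price list.

```python
# Approximate V4-Flash prices as cited in the article (USD per 1M tokens).
FLASH_INPUT_USD_PER_M = 0.14
FLASH_OUTPUT_USD_PER_M = 0.28

def flash_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload at the cited V4-Flash prices."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * FLASH_INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * FLASH_OUTPUT_USD_PER_M
    return input_cost + output_cost

# Hypothetical workload: 10,000 requests, each with 8,000 input
# tokens and 500 output tokens.
requests = 10_000
total = flash_cost_usd(requests * 8_000, requests * 500)
print(f"Estimated cost: ${total:.2f}")
```

No V4-Pro equivalent is included because the sources do not agree on its pricing.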
A perspective for Vietnamese businesses
The emergence of increasingly powerful open-source models such as DeepSeek, Qwen or Gemma opens up an important option: businesses can run AI in-house (on-premise) on their own infrastructure instead of relying entirely on a foreign provider's API.
Practical benefits:
- Data stays in place. Data never leaves the organization — reducing compliance and leakage risks.
- Operational autonomy. No one can "remotely shut it off" or change the terms abruptly.
- Cost optimization. At high volumes, an open-source model run in-house can be cheaper in the long run.
Of course, on-premise requires suitable infrastructure and operational capability — this is a question that needs to be assessed case by case.
FAQ
Has DeepSeek V4 been officially released?
When the sources were published (24/04/2026), this was a preview. Details may change when the official version launches, so treat this information as a reference only.
Is V4-Pro as strong as Claude / GPT-5.x / Gemini 3.x?
According to DeepSeek, V4-Pro competes with these models on reasoning. However, according to MIT Technology Review, V4 is still about 3–6 months behind on knowledge benchmarks. It should be seen as "closing the gap" rather than having caught up.
How much does V4-Pro cost?
The sources currently do not agree on V4-Pro's input pricing, so we do not state a specific number. As for V4-Flash, it is cited at about $0.14 / 1M input tokens and $0.28 / 1M output tokens. Please reconfirm from DeepSeek's official announcement.
Deploy in-house AI with open-source models
Namtech helps businesses run powerful open-source models such as DeepSeek, Qwen and Gemma right on their own infrastructure — data stays in place, operations stay autonomous, with no dependence on foreign providers.
Book a free consultation
Note: This article was compiled from public sources as of 22/06/2026; the figures are according to DeepSeek's announcement or the cited sources, and the situation may change. For reference only, not technical advice.