Mistral

Mistral launches Mistral Small 4: merging reasoning, multimodality and coding into one open-source model

On 16/03/2026, French AI company Mistral AI officially launched Mistral Small 4 — an open-source model under the Apache 2.0 license, free to self-host and customize. The most notable point: this is Mistral's first "Small"-tier model to merge three capabilities that were previously separate — reasoning, multimodality and a coding agent — into a single, unified model.

Quick summary

  • When: 16/03/2026, by Mistral AI (France).
  • License: Apache 2.0 — free to self-host and customize.
  • Architecture: MoE with 119B total parameters, only 6B active per token (128 experts, 4 active); 256k token context.
  • What's new: merges Magistral (reasoning) + Pixtral (multimodality) + Devstral (coding agent) into one model, with a reasoning_effort parameter.
  • Efficiency: up to 40% faster, 3× the throughput compared to Mistral Small 3 (per Mistral); runs on ~4× NVIDIA H100.

What is Mistral Small 4?

According to Mistral, Mistral Small 4 was released on 16/03/2026 under the Apache 2.0 license — meaning businesses can download it, self-host it on their own infrastructure and customize it without commercial restrictions. This is the core difference compared to closed models accessible only via API.

On architecture, this is a Mixture-of-Experts (MoE) model with 119 billion total parameters but only 6 billion active parameters per token (128 experts, 4 active at a time). This design gives the model the "knowledge" of a large model while keeping the compute cost per token close to that of a small model. The model supports a context of up to 256k tokens.

AI illustration
The MoE architecture activates only a small fraction of parameters per token, optimizing inference cost. Photo: Google DeepMind / Pexels

Merging three capabilities into one model

The standout point, per Mistral, is that Small 4 unifies three capability lines that were previously separate:

  • Magistral — reasoning capability.
  • Pixtral — multimodal capability (processing both images and text).
  • Devstral — coding agent capability (reading, writing and editing code).

Instead of having to choose and operate multiple separate models for each need, users get a single model that handles all three. Mistral also added a reasoning_effort parameter, allowing you to tune the balance between reasoning depth and speed for each task — quick consideration or careful deliberation.

Performance and efficiency

According to Mistral, compared to the previous generation Mistral Small 3, Small 4 is up to 40% faster and achieves 3× the throughput. Mistral also says the model's output is about 20% shorter on the LiveCodeBench test — that is, more concise answers that still solve the problem, saving tokens.

These are all figures as published by Mistral; businesses should verify them on their own data and real tasks before moving to production.

Pricing and hardware requirements

On API cost, Mistral lists a price of around $0.10 per 1 million input tokens and $0.30 per 1 million output tokens — very competitive for a model that merges multiple capabilities.

Important for the self-hosting direction: according to Mistral, the model can run on around 4 NVIDIA H100 cards. Compared to frontier models that require large GPU clusters, this is a moderate hardware requirement, within reach of many businesses that want to run AI right inside their own infrastructure.

A developer at work
Apache 2.0 plus moderate hardware requirements pave the way for AI running on-premise. Photo: Pexels

Why it matters

Mistral Small 4 reflects the trend of open-source models becoming increasingly powerful, compact and affordable: merging multiple capabilities, a compute-efficient MoE architecture, low API pricing and especially an Apache 2.0 license that gives businesses full freedom to self-host. When a model is good enough yet still runs on moderate hardware, the barriers to deploying AI right inside an organization drop significantly.

FAQ

Is Mistral Small 4 free to use?

The model is released under the Apache 2.0 license, which allows you to download, self-host and customize it freely. If you use it via Mistral's API, it is charged per token (around $0.10 input / $0.30 output per 1 million tokens, per Mistral).

What hardware does a business need to run it?

According to Mistral, the model can run on around 4 NVIDIA H100 cards. This is reference information from the provider; the actual configuration depends on load, desired latency and how the deployment is optimized.

What does "merging three capabilities" mean?

Mistral previously had separate lines: Magistral for reasoning, Pixtral for multimodality, Devstral for the coding agent. Mistral Small 4 unifies all three into a single model, along with a reasoning_effort parameter to tune between reasoning depth and speed.

Deploy on-premise AI with open-source models

With an Apache 2.0 license and moderate hardware requirements, models like Mistral Small 4 are a great fit for on-premise AI deployment. Namtech helps you build a private AI platform running 100% on your own infrastructure — data stays in place, with no dependency on foreign providers.

Book a free consultation

Note: This article is compiled from publicly available sources as of 22/06/2026; the performance and hardware figures are as published by Mistral, and businesses should verify them independently. For reference only, not technical advice.