Mixtral 8x22B: Open and Powerful Mixture of Experts

Cluster of interconnected, illuminated processors representing a mixture-of-experts architecture

Mistral AI released Mixtral 8x22B on April 10, 2024, in their characteristic style: a magnet link posted on Twitter, with no prior blog post or announcement. The community downloaded the weights within hours, and benchmarks appeared the next day. It is the next generation of their MoE (Mixture of Experts) architecture, with 141B total parameters but only ~39B active per forward pass. This changes the economics of serving open models.

What Mixtral 8x22B Is

Sparse Mixture of Experts architecture:

  • 8 “experts” of 22B parameters each.
  • Router selecting 2 experts per token.
  • Total: 141B parameters on disk.
  • Active per forward pass: ~39B (2 experts + shared components).

Result: ~141B capacity with ~39B inferential cost. Better quality/cost ratio than a dense model of equivalent size.
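The arithmetic behind that ratio can be sketched in a few lines. The split used here is an illustrative assumption: only the expert FFN weights (~17B per expert) are routed, while attention and embedding weights (~5B) are shared, which is why 2 × 22B does not equal 39B.

```python
# Rough MoE cost sketch: total capacity vs. active compute per token.
# The 17B/5B split is an illustrative assumption, not Mixtral's exact layout.

def moe_params(n_experts: int, expert_b: float, shared_b: float) -> dict:
    """Total vs. active parameters (in billions) for a top-2 MoE."""
    total = n_experts * expert_b + shared_b
    active = 2 * expert_b + shared_b  # the router picks 2 experts per token
    return {"total_b": total, "active_b": active, "ratio": total / active}

stats = moe_params(n_experts=8, expert_b=17.0, shared_b=5.0)
print(stats)  # total 141B, active 39B, ~3.6x capacity per unit of compute
```

The takeaway: you pay storage for 141B parameters but FLOPs for only 39B, which is the quality/cost advantage over a dense model of the same total size.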

License and Distribution

Apache 2.0, with no commercial-use restrictions. The weights are freely downloadable.

Compared to Llama 3 70B (more restrictive licence) or Claude 3 (closed), Mixtral 8x22B is the most permissive large-scale option.

Key Benchmarks

Numbers from Mistral and community:

| Benchmark | Mixtral 8x22B | Llama 3 70B | GPT-4 | GPT-3.5 |
|---|---|---|---|---|
| MMLU | 77.8 | 79.5 | 86.4 | 70.0 |
| HellaSwag | 88.9 | 88.0 | 95.3 | 85.5 |
| GSM8K | 78.6 | 93.0 | 92.0 | 57.1 |
| HumanEval | 45.1 | 81.7 | 88.4 | 48.1 |
| Multilingual (FR, ES, IT, DE) | Excellent | Good | Excellent | Medium |

Key points:

  • General quality near Llama 3 70B, with more inferentially efficient architecture.
  • Superior multilingual vs Llama 3 70B — especially Spanish, French, Italian, German.
  • Behind on maths vs Llama 3 70B.
  • Competitive coding but not top.

For EU multilingual cases, Mixtral 8x22B is likely the best open option.

Required Hardware

This is the limiting factor:

| Precision | VRAM |
|---|---|
| FP16 | ~280 GB |
| INT8 | ~140 GB |
| INT4 (GGUF Q4_K_M) | ~80 GB |
| INT3 | ~60 GB |

Practical implications:

  • Doesn’t fit on a single consumer GPU: a 4090 (24GB) can’t hold it even heavily quantised.
  • One A100 80GB or H100 80GB can serve quantised Q4.
  • 2x A100 40GB distributed with tensor parallelism works.
  • Apple Silicon M3 Max 128GB: fits Q4 and works at ~5-10 tokens/s.

For serious production, you almost always need datacenter GPUs.
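The VRAM figures in the table above follow directly from parameter count times bits per weight. A rough estimator (the ~4.5 bits/weight for Q4_K_M and ~3.3 for INT3 are approximations; actual GGUF file sizes vary by tensor, and KV cache plus activations come on top):

```python
def vram_gb(params_b: float, bits_per_weight: float) -> float:
    """Weight memory in GB for a model of `params_b` billion parameters.

    KV cache and activation memory are NOT included and come on top.
    """
    return params_b * bits_per_weight / 8  # billions of params -> GB

# Approximate effective bits per weight per quantisation level (assumed values).
for name, bits in [("FP16", 16.0), ("INT8", 8.0), ("Q4_K_M", 4.5), ("INT3", 3.3)]:
    print(f"{name}: ~{vram_gb(141, bits):.0f} GB")
```

Running this reproduces the table within rounding: ~282, ~141, ~79, and ~58 GB, which is why a single 80GB card only works at Q4 and below.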

Comparison with Mixtral 8x7B

The younger sibling (46.7B total, 12.9B active):

| Aspect | 8x7B | 8x22B |
|---|---|---|
| Total parameters | 46.7B | 141B |
| Active/token | 12.9B | 39B |
| Q4 VRAM | ~25GB | ~80GB |
| General quality | ~GPT-3.5 | Slightly below GPT-4 |
| Multilingual | Very good | Excellent |
| Tokens/s (A100 Q4) | ~60 | ~25 |

For many cases, 8x7B is more pragmatic: faster, cheaper, sufficient quality. 8x22B makes sense when quality matters more than throughput.

Production Serving

Typical stack:

# vLLM with tensor parallel
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mixtral-8x22B-Instruct-v0.1 \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.9 \
  --max-model-len 32768

For Q4 with llama.cpp:

./server -m mixtral-8x22b-instruct-Q4_K_M.gguf \
  -c 16384 -ngl 99 --host 0.0.0.0 --port 8080

vLLM gives the best GPU throughput; llama.cpp is more portable and handles mixed CPU-GPU offload.
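Both servers expose an OpenAI-compatible chat endpoint, so a client only needs to build a standard payload. A minimal standard-library sketch (the base URL assumes vLLM's default port 8000; the model name matches the command above):

```python
import json
import urllib.request

def chat_request(prompt: str,
                 base_url: str = "http://localhost:8000/v1",
                 model: str = "mistralai/Mixtral-8x22B-Instruct-v0.1",
                 ) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("Summarise this contract clause in French.")
# resp = urllib.request.urlopen(req)  # requires the server to be running
```

Because the API surface is OpenAI-compatible, existing client libraries also work unchanged by pointing their base URL at the local server.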

Fine-Tuning

LoRA on Mixtral 8x22B is feasible:

  • QLoRA: possible on 4x A100 80GB.
  • Expert-specific adaptation (MoE-aware fine-tuning) is active research.
  • DPO for alignment after domain fine-tune.

For most enterprise cases, prompt engineering plus RAG with the Mixtral instruct model covers the need without any fine-tuning. Fine-tune only when prompting clearly falls short.
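To see why LoRA is feasible on a 141B model at all, compare trainable adapter parameters against the full model. A back-of-the-envelope sketch (layer count, hidden size, and the set of targeted projections are illustrative assumptions, not Mixtral's exact config; square projections also overestimate k/v under grouped-query attention):

```python
def lora_params(n_layers: int, d_model: int, rank: int, n_proj: int) -> int:
    """Trainable LoRA parameters: two rank-r matrices per targeted projection.

    Assumes square d_model x d_model projections, so A is (d_model x r)
    and B is (r x d_model).
    """
    per_proj = 2 * d_model * rank
    return n_layers * n_proj * per_proj

# Assumed config: 56 layers, hidden size 6144, rank 16, q/k/v/o targeted.
trainable = lora_params(n_layers=56, d_model=6144, rank=16, n_proj=4)
print(f"{trainable / 1e6:.0f}M trainable vs 141,000M total")  # ~44M, ~0.03%
```

Only the adapters need gradients and optimizer state, which is why QLoRA fits on 4x A100 80GB while full fine-tuning would not.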

Context Length

  • Base: 64k tokens.
  • Practical: ~32k without severe degradation.
  • Decent “needle in haystack” up to ~32k, degrades beyond.

For moderate RAG or long context, sufficient. For full book analysis, Gemini 1.5 still leads.
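For RAG pipelines, the practical consequence is capping the assembled prompt well under the nominal window. A minimal context-budget helper (the 4-characters-per-token heuristic is a crude assumption; use the model's real tokenizer in production):

```python
def fit_context(chunks: list[str], budget_tokens: int,
                chars_per_token: int = 4) -> list[str]:
    """Greedily keep retrieved chunks until the token budget is spent."""
    kept, used = [], 0
    for chunk in chunks:
        est = len(chunk) // chars_per_token + 1  # crude token estimate
        if used + est > budget_tokens:
            break
        kept.append(chunk)
        used += est
    return kept

docs = ["a" * 4000, "b" * 4000, "c" * 4000]  # ~1000 tokens each
print(len(fit_context(docs, budget_tokens=2500)))  # keeps 2 of 3 chunks
```

Budgeting for ~32k rather than the nominal 64k leaves headroom for the system prompt and the generated answer while staying in the range where recall holds up.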

Real Use Cases

Where Mixtral 8x22B shines:

  • Enterprise multilingual: documents in ES/FR/IT/DE/EN.
  • Mid-size code agents: not top-tier but capable.
  • Long-context RAG.
  • Complex summarisation and analysis.
  • Self-hosting with strict compliance.

Where others win:

  • Maths: Llama 3 70B or Claude 3 Opus.
  • Top-tier coding: Claude 3 Opus, DeepSeek Coder.
  • Ultra-long context: Gemini 1.5.

Serving Cost

Rough estimates:

  • 1 × A100 80GB on-prem: ~$15k/year amortised.
  • AWS p4d.24xlarge (8× A100 40GB): $32/hour = ~$23k/month.
  • Together.ai hosted: ~$2/1M input + output tokens.

Self-hosting pays off only at sustained volumes of hundreds of millions of tokens per month; below that, hosted pay-per-token is more cost-efficient.

Alternatives in Open Space

As of April 2024:

  • Llama 3 70B: better in math reasoning, more restrictive licence.
  • Qwen 1.5 72B: strong multilingual, commercial licence under thresholds.
  • DeepSeek 67B: excellent at code.
  • Command R+ (Cohere): 104B dense, strong for RAG.
  • Yi 34B: smaller size, competitive on many benchmarks.

Choice depends on concrete case. There’s no universal “best”.

Conclusion

Mixtral 8x22B confirms that Mistral AI leads the open European frontier. Its MoE architecture strikes an attractive balance between quality and inference efficiency. For teams that can afford the hardware, it is currently the best open option for serious multilingual use cases. For those who can't, Mixtral 8x7B remains a valid lighter option. And for serious production without your own GPUs, hosted services like Together.ai, Anyscale, or Mistral's La Plateforme offer pay-per-token access. The open ecosystem continues to close the gap with closed frontier models.

Follow us on jacar.es for more on open LLMs, MoE architectures, and model deployment.
