Choosing an Open LLM for Enterprise in 2024
Updated: 2026-05-03
A year ago, “open LLM for enterprise” basically meant Llama 2. Today the landscape is richer: Mistral 7B and Mixtral 8x7B, Qwen 1.5, Yi 34B, DeepSeek, Phi-2, and more. The wider range of options makes choosing harder, not easier. This article guides the decision with enterprise criteria, not just trendy benchmarks.
Key Takeaways
- Licence comes first, not benchmarks: choosing a model on benchmark scores and only then discovering its licence forbids commercial use is the most common mistake.
- Mistral 7B and Mixtral 8x7B are sweet spots for most mid-size enterprises given their performance, size, and Apache 2.0 licence.
- For 80% of enterprise cases, prompt engineering + RAG is sufficient without fine-tuning.
- Self-hosting is only justified with high constant volume, data that cannot leave, or strict compliance.
- Academic benchmarks are useful but insufficient; always complement them with your own evaluation on real use cases.
The Candidates
Open models worth serious evaluation at the time of writing:
- Llama 2[1] (Meta): 7B/13B/70B. Solid base, but the licence is restrictive for large-scale commercial use.
- Mistral 7B[2] and Mixtral 8x7B[3] (Mistral AI): Apache 2.0, excellent performance/size ratio.
- Qwen 1.5[4] (Alibaba): 0.5B up to 72B, strong multilingual support.
- Yi[5] (01.AI): 6B/34B, good in Chinese and English, permissive commercial licence.
- DeepSeek[6]: various sizes, very strong in code and maths.
- Phi-2[7] (Microsoft): 2.7B — small but competitive in reasoning.
- CodeLlama[8] (Meta): Llama fine-tuned for code, various sizes.
Licence: First, Not Last
Mistake number one is choosing by benchmark and then discovering the licence does not allow commercial use. The key licences:
- Apache 2.0 (Mistral, Mixtral, Yi): free for commercial use without size restrictions.
- MIT (Phi-2): fully permissive, though Microsoft positions the model itself as research-oriented.
- Llama 2 Community License: allows commercial use with clauses, including restrictions above 700M monthly active users and a required “Built with Llama 2” attribution.
- Tongyi Qianwen License (Qwen): allows commercial under thresholds, similar to Llama 2.
- “Community” licences: sometimes yes, sometimes no. Read the full text before committing.
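The licence gate above can be made mechanical. A minimal sketch, assuming simplified licence fields (the `CANDIDATES` table and the 700M MAU cap are illustrative encodings of the terms summarised above, not legal advice):

```python
# Hypothetical licence gate: encode each candidate's licence terms and
# filter by commercial constraints BEFORE any benchmarking.
# Fields are simplified for illustration; always read the full licence text.
CANDIDATES = {
    "Mistral 7B":   {"licence": "Apache-2.0",      "commercial": True, "mau_cap": None},
    "Mixtral 8x7B": {"licence": "Apache-2.0",      "commercial": True, "mau_cap": None},
    "Yi 34B":       {"licence": "Yi (permissive)", "commercial": True, "mau_cap": None},
    "Phi-2":        {"licence": "MIT",             "commercial": True, "mau_cap": None},
    "Llama 2 70B":  {"licence": "Llama 2",         "commercial": True, "mau_cap": 700_000_000},
}

def licence_ok(model: str, monthly_active_users: int) -> bool:
    """True if the model's licence permits commercial use at the given scale."""
    terms = CANDIDATES[model]
    if not terms["commercial"]:
        return False
    cap = terms["mau_cap"]
    return cap is None or monthly_active_users < cap
```

At 1B MAU, `licence_ok` passes the Apache 2.0 models and rejects Llama 2, which is exactly the filter the list above describes.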
Benchmarks: Useful But Limited
MMLU, HellaSwag, and GSM8K are the best-known benchmarks. Their known limitations: training-data contamination, models tuned to game specific benchmarks, and distance from real-world use. Always complement them with your own evaluations on real use cases.
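An own evaluation does not need to be elaborate. A minimal harness sketch, where `generate` is a stand-in for whatever inference client you use (vLLM, llama.cpp, an HTTP API) and scoring is a simple substring match:

```python
# Minimal own-benchmark harness: run your real prompts through a model
# callable and score by substring match against the expected answer.
from typing import Callable

def evaluate(generate: Callable[[str], str],
             cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases whose expected answer appears in the output."""
    hits = 0
    for prompt, expected in cases:
        output = generate(prompt)
        if expected.lower() in output.lower():
            hits += 1
    return hits / len(cases) if cases else 0.0
```

Twenty to fifty cases drawn from real tickets or queries already say more about fit for your use case than a leaderboard position.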
Required Hardware
The hardware table filters options fast. Approximate memory requirements for inference (not training):
| Model | FP16 | INT8 | INT4 (GGUF) |
|---|---|---|---|
| Llama 2 7B | 14 GB | 8 GB | 4 GB |
| Mistral 7B | 14 GB | 8 GB | 4 GB |
| Mixtral 8x7B | 94 GB | 48 GB | 25 GB |
| Llama 2 70B | 140 GB | 75 GB | 40 GB |
| Yi 34B | 68 GB | 38 GB | 20 GB |
A quantised 7B fits on a consumer RTX 4090 (24 GB) or a 16 GB M2/M3 laptop. Even quantised, Mixtral 8x7B needs an A100 40 GB. For realistic self-hosting, the 7B and 34B classes are the most practical.
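The table values follow from a back-of-envelope rule: parameter count times bytes per weight. A sketch of that estimate (activation and KV-cache overhead, which varies with context length and runtime, is deliberately left out, so real usage runs somewhat higher):

```python
# Back-of-envelope VRAM estimate for inference: parameters x bytes per
# weight at the given precision. Ignores activation/KV-cache overhead,
# so treat the result as a lower bound and a first filter only.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def vram_gb(params_billion: float, precision: str) -> float:
    """Approximate weight memory in GB for a model of the given size."""
    return round(params_billion * BYTES_PER_PARAM[precision], 1)
```

For example, `vram_gb(7, "fp16")` gives 14.0, matching the 7B row, and `vram_gb(46.7, "fp16")` gives roughly 93 GB for Mixtral 8x7B's ~46.7B total parameters.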
Fine-Tuning: When Yes, When No
In most enterprise cases you do not need fine-tuning: prompt engineering plus RAG covers roughly 80% of use cases. Fine-tuning makes sense for very domain-specific data, latency-critical paths where long prompts do not fit, or a specific tone impossible to achieve via prompting. Start with prompting; fine-tune only when prompting demonstrably falls short.
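The prompting-plus-RAG pattern is structurally simple. A sketch using naive word-overlap retrieval (a real system would use embeddings and a vector store, but the shape — retrieve, then assemble into the prompt — is the same):

```python
# Sketch of prompting + RAG: retrieve the most relevant snippets by
# naive word overlap, then assemble them into the prompt. Swap in an
# embedding model and vector store for production.
def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    q_words = set(question.lower().split())
    return sorted(documents,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    """Assemble retrieved context and the question into a grounded prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

This is why fine-tuning is rarely the first step: knowledge lives in the retrieved documents and can be updated without touching model weights.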
Decision Checklist
Seven questions that filter the options:
- Licence: compatible with your use and scale?
- Size / hardware: fits your infrastructure budget?
- Languages: covers your users’ languages?
- Own benchmark: performs on your real use cases?
- Provider: self-host, open API, or hybrid?
- Security: alignment sufficient or additional layers needed?
- Roadmap: does the project have active continuity?
With that filter, the options usually narrow to two or three, and your own testing decides among them.
Conclusion
The open-LLM ecosystem is now mature enough that companies of almost any size can find a viable model. Mistral 7B and Mixtral 8x7B are sweet spots for most. Llama 2 remains relevant, especially at 70B. For specialised domains (code, multilingual), DeepSeek, Qwen, and Yi provide valid alternatives. The decision should not be “which has the best MMLU” but “which fits my licence, hardware, languages, and real use cases”.