LLaMA 2 and the New Wave of Open Language Models
Actualizado: 2026-05-03
On July 18, 2023, Meta released LLaMA 2[1], the second generation of its language model. Unlike LLaMA 1 (research-only), this time the licence allows commercial use — with some caveats for companies over 700 million monthly users. For 99.9% of organisations, that means: you can download it, modify it, and use it in production royalty-free.
This significantly reshapes the open-model landscape.
Key Takeaways
- LLaMA 2 ships in three sizes (7B, 13B, 70B) and two variants (base and chat), with a commercial licence.
- The 70B model matches or beats GPT-3.5 on standard benchmarks and is closer to GPT-4 than any prior open model.
- Hugging Face, Ollama, LM Studio, and llama.cpp make it accessible on consumer hardware.
- Absolute advantages over proprietary APIs: privacy, cost at scale, unconstrained fine-tuning, and low latency.
- Meta’s strategy is competitive, not altruistic — which guarantees continued investment.
What LLaMA 2 Offers
Meta published three sizes: 7B, 13B, and 70B parameters. Each in two variants: base (trained on general text prediction) and chat (RLHF-tuned for assistant-style conversation).
Key characteristics:
- Trained on 2 trillion tokens — twice as many as LLaMA 1 and with more rigorously filtered data.
- 4k-token context window. Limitation vs GPT-4 (8k–32k) or Claude 2 (100k), but extensible via techniques like RoPE scaling.
- Competitive on benchmarks. LLaMA 2 70B matches or beats GPT-3.5 on MMLU, TriviaQA, HumanEval, and others. It doesn’t reach GPT-4, but it’s much closer than any prior open model.
- Commercial licence. The most transformative part. The Llama 2 Community License[2] allows product use for free, with few restrictions.
Ecosystem Impact
In the weeks post-release, the ecosystem explodes in four directions:
- Hugging Face hosts quantised versions[3] in every combination: GGML, GPTQ, AWQ, 4-bit, 8-bit. Running LLaMA 2 7B on an 8 GB GPU is now trivial.
- Community fine-tunes: Vicuna, Wizard, Airoboros — dozens of variants tuned for specific tasks (code, reasoning, roleplay) appear within days.
- Tool integration: Ollama[4], LM Studio[5], llama.cpp[6] run LLaMA 2 locally on Mac, Linux, and Windows in minutes.
- Hosted services (Replicate, Anyscale, Together AI, AWS Bedrock) offer LLaMA 2 with low latency at ~$0.0008 per 1k tokens.
Scenarios Where LLaMA 2 Beats GPT on Total Value
Situations where LLaMA 2 — even if inferior in raw capability — is the better choice over GPT-3.5/4:
- Data privacy. If you can’t send customer data to external APIs (health, finance, defence), on-prem LLaMA 2 is the only way. This constraint is explicit in NIS2 for critical infrastructure data.
- Cost control at scale. Above some volume (~100M tokens/month), hosting your own LLaMA 2 is cheaper than the external API.
- Deep fine-tuning. You can tune weights directly, not just add LoRA layers on a managed base. With QLoRA it’s feasible to tune even the 70B on consumer hardware.
- Low latency. Running local removes API round-trips — relevant for interactive applications.
When GPT-4 Still Wins
And where LLaMA 2 doesn’t yet compete:
- Complex multi-step reasoning. GPT-4 keeps its edge on tasks where quality above 90% is critical.
- Broad multilingual capability. LLaMA 2 works well in English, acceptable in Spanish/French/German, patchy in others.
- Fast product integration. If operational cost isn’t the bottleneck, OpenAI’s API = less friction.
Hardware Requirements
To run LLaMA 2 locally:
| Model | Minimum VRAM (4-bit) | Full VRAM (fp16) |
|---|---|---|
| 7B | 4–6 GB | 14 GB |
| 13B | 8–10 GB | 26 GB |
| 70B | 40–50 GB | 140 GB |
With aggressive quantisation (GGML 4-bit), 7B runs on CPU on modern laptops (~10 tokens/s). 70B needs a big GPU or RAM+SSD offloading techniques, with performance degradation.
The Politics Behind
Meta’s LLaMA 2 strategy isn’t altruistic — it’s competitive. By releasing open models, Meta:
- Reduces OpenAI/Anthropic’s edge (whose monetisation depends on proprietary APIs).
- Accelerates AI tech adoption that Meta can then apply in its own products.
- Wins mindshare among developers, a long-term strategic asset.
This doesn’t make the model less useful — quite the opposite, that motivation ensures continued investment. But it helps to understand it as a strategic move, not philanthropy.
Conclusion
LLaMA 2 marks the moment open LLMs become legitimate production options, not just research tools. For teams with privacy constraints, cost-at-scale pressure, or deep customisation needs, there’s now a viable alternative to proprietary models. The gap with GPT-4 still exists, but it shrinks every few months.