On July 18, 2023, Meta released LLaMA 2, the second generation of its language model. Unlike LLaMA 1, which was licensed for research only, this time the licence allows commercial use, with caveats for companies exceeding 700 million monthly active users. For the vast majority of organisations, that means you can download it, modify it, and use it in production royalty-free.
This significantly reshapes the open-model landscape.
What LLaMA 2 Offers
Meta published three sizes: 7B, 13B, and 70B parameters. Each comes in two variants: a base model (trained for next-token prediction on general text) and a chat model (tuned with RLHF for assistant-style conversation).
Key characteristics:
- Trained on 2 trillion tokens — roughly 40% more than LLaMA 1, and with more rigorously filtered data.
- 4,096-token context window. Double LLaMA 1's 2,048, but small next to GPT-4 (8k-32k) or Claude 2 (100k); extensible via techniques such as RoPE scaling.
- Competitive on benchmarks. LLaMA 2 70B matches or beats GPT-3.5 on MMLU, TriviaQA, HumanEval, and others. It doesn’t reach GPT-4, but it closes the gap further than any prior open model.
- Commercial licence. The most transformative part. The Llama 2 Community License allows product use for free, with few restrictions.
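The RoPE scaling mentioned above boils down to rescaling position indices before computing rotary angles, so positions beyond the trained window map back into the range the model saw during training. A minimal sketch of linear position interpolation (the function name and simplification are ours, not Meta's code):

```python
def rope_angles(position: int, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> list[float]:
    """Rotary-embedding angles for one position.

    scale > 1 compresses positions (linear interpolation): position p is
    treated as p / scale, so a model trained on a 4k window can address
    longer sequences without seeing out-of-range rotations.
    Illustrative sketch only.
    """
    pos = position / scale
    # One angle per frequency pair; lower i = higher rotation frequency.
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

# With scale=2, position 8000 yields the same angles position 4000
# produced during training, keeping rotations inside the trained range.
assert rope_angles(8000, 128, scale=2.0) == rope_angles(4000, 128)
```

In practice, context extension also requires a short fine-tune on long sequences so the model adapts to the compressed positions.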
Ecosystem Impact
In the two weeks post-release, the ecosystem has exploded:
- Hugging Face hosts quantised versions in every combination: GGML, GPTQ, AWQ, 4-bit, 8-bit. Running LLaMA 2 7B on an 8 GB GPU is now trivial.
- Community fine-tunes: Vicuna, Wizard, Airoboros — dozens of variants tuned for specific tasks (code, reasoning, roleplay) appeared within days.
- Tool integration: Ollama, LM Studio, llama.cpp run LLaMA 2 locally on Mac, Linux, and Windows in minutes.
- Hosted services (Replicate, Anyscale, Together AI, AWS Bedrock) offer LLaMA 2 with low latency at ~$0.0008 per 1k tokens.
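The 4-bit quantisation underlying those GGML/GPTQ releases can be illustrated in a few lines: map each group of weights to integers in [-8, 7] with a shared scale factor. A simplified sketch of group quantisation (real implementations add bit-packing, per-group blocking, and error-aware rounding):

```python
def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric 4-bit quantisation of one weight group.

    Each float maps to an integer in [-8, 7] sharing one scale factor,
    shrinking storage from 16 bits to ~4 bits per weight.
    Simplified sketch, not the GGML/GPTQ implementation.
    """
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_4bit(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.88, -0.07, 0.31]
q, scale = quantize_4bit(weights)
restored = dequantize_4bit(q, scale)
# Round-to-nearest keeps each weight within half a quantisation step.
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(weights, restored))
```

The small per-weight error this introduces is why quantised models lose a little quality: a 4x memory saving in exchange for rounding noise in every weight.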
Scenarios Where LLaMA 2 Beats GPT on Total Value
Situations where LLaMA 2 — even if inferior in raw capability — is the better choice over GPT-3.5/4:
- Data privacy. If you can’t send customer data to external APIs (health, finance, defence), running LLaMA 2 on-premises may be the only viable option.
- Cost control at scale. Above a certain volume (~100M tokens/month), hosting your own LLaMA 2 becomes cheaper than paying a proprietary API per token.
- Deep fine-tuning. You can tune weights directly, not just add LoRA layers on a managed base. With QLoRA it’s feasible to tune even the 70B on consumer hardware.
- Low latency. Running local removes API round-trips — relevant for interactive applications.
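The cost-control point above is simple break-even arithmetic: a fixed hosting bill beats per-token pricing past a certain volume. An illustrative calculation with assumed prices (the $0.03/1k API rate and $3,000/month server cost below are hypothetical figures chosen to show how a ~100M-token break-even can arise, not quotes):

```python
def break_even_tokens(api_price_per_1k: float,
                      fixed_hosting_per_month: float) -> float:
    """Monthly token volume above which fixed-cost self-hosting is
    cheaper than a per-token API. Illustrative arithmetic only."""
    return fixed_hosting_per_month / api_price_per_1k * 1000

# Hypothetical: a GPT-4-class API at $0.03 per 1k tokens vs a
# dedicated GPU server at $3,000/month.
tokens = break_even_tokens(0.03, 3000)
print(f"Break-even at {tokens / 1e6:.0f}M tokens/month")
```

Below the break-even, per-token pricing wins; above it, every additional token is effectively free on your own hardware (until you need a second server).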
When GPT-4 Still Wins
And where LLaMA 2 doesn’t yet compete:
- Complex multi-step reasoning. GPT-4 keeps its edge on tasks where near-perfect output quality is critical.
- Broad multilingual capability. LLaMA 2 works well in English, acceptable in Spanish/French/German, patchy in others.
- Fast product integration. If operational cost isn’t the bottleneck, OpenAI’s API means less integration friction.
Hardware Requirements
To run LLaMA 2 locally:
| Model | Minimum VRAM (4-bit) | Full VRAM (fp16) |
|---|---|---|
| 7B | 4-6 GB | 14 GB |
| 13B | 8-10 GB | 26 GB |
| 70B | 40-50 GB | 140 GB |
With aggressive quantisation (GGML 4-bit), the 7B runs on CPU on modern laptops (~10 tokens/s). The 70B needs a large GPU, or RAM+SSD offloading techniques at a significant performance cost.
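The table's figures follow from back-of-envelope arithmetic: weight memory is parameter count times bits per weight, plus runtime overhead for the KV cache and activations. A rough estimator (the 20% overhead factor is our assumption; fp16 with overhead=1.0 reproduces the weights-only column):

```python
def vram_estimate_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 1.0) -> float:
    """Back-of-envelope VRAM to run a model.

    Weight memory = params * bits / 8 (1B params at 8 bits ~ 1 GB);
    overhead covers KV cache and activations at modest context lengths.
    Rough sketch, not a guarantee for any specific runtime.
    """
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb * overhead

for size, label in [(7, "7B"), (13, "13B"), (70, "70B")]:
    print(f"{label}: ~{vram_estimate_gb(size, 4, overhead=1.2):.0f} GB "
          f"at 4-bit, ~{vram_estimate_gb(size, 16):.0f} GB at fp16")
```

Running it yields ~4/8/42 GB at 4-bit and 14/26/140 GB at fp16, consistent with the ranges in the table.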
The Politics Behind the Release
Meta’s LLaMA 2 strategy isn’t altruistic — it’s competitive. By releasing open models, Meta:
- Reduces OpenAI/Anthropic’s edge (whose monetisation depends on proprietary APIs).
- Accelerates AI tech adoption that Meta can then apply in its own products.
- Wins mindshare among developers, a long-term strategic asset.
This doesn’t make the model less useful — quite the opposite, that motivation ensures continued investment. But it helps to understand it as a strategic move, not philanthropy.
See our LLM fine-tuning analysis and Bard/PaLM 2 comparison to contextualise the current ecosystem.
Conclusion
LLaMA 2 marks the moment open LLMs become legitimate production options, not just research tools. For teams with privacy constraints, cost-at-scale pressure, or deep customisation needs, there’s now a viable alternative to proprietary models. The gap with GPT-4 still exists, but shrinks every few months.
Follow us on jacar.es for more on open LLMs, AI architecture, and deployment strategies.