LLaMA 2 and the New Wave of Open Language Models


On July 18, 2023, Meta released LLaMA 2, the second generation of its language model. Unlike LLaMA 1, which was licensed for research only, this time the licence allows commercial use, with one caveat: companies with more than 700 million monthly active users must request a separate licence from Meta. For the vast majority of organisations, that means you can download it, modify it, and use it in production royalty-free.

This significantly reshapes the open-model landscape.

What LLaMA 2 Offers

Meta published three sizes: 7B, 13B, and 70B parameters. Each comes in two variants: base (trained on next-token prediction over general text) and chat (tuned with supervised fine-tuning and RLHF for assistant-style conversation).
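The chat variants expect a specific prompt template built around `[INST]` and `<<SYS>>` markers. As an illustrative sketch (the authoritative template lives in Meta's reference code; this helper covers only the single-turn case):

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Build a single-turn Llama 2 chat prompt.

    Follows the [INST] / <<SYS>> template the chat variants were
    tuned on; multi-turn conversations repeat the [INST] block.
    """
    return (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"{user} [/INST]"
    )

prompt = build_llama2_prompt(
    system="You are a helpful assistant.",
    user="Summarise the LLaMA 2 licence in one sentence.",
)
print(prompt)
```

Sending the chat model plain text without this template tends to produce noticeably worse completions, which catches out many first-time users.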

Key characteristics:

  • Trained on 2 trillion tokens — twice as many as LLaMA 1 and with more rigorously filtered data.
  • 4k-token context window. A limitation compared with GPT-4 (8k-32k) or Claude 2 (100k), though extensible via techniques such as RoPE scaling.
  • Competitive on benchmarks. LLaMA 2 70B matches or beats GPT-3.5 on MMLU, TriviaQA, HumanEval, and others. It doesn't reach GPT-4, but it comes much closer than any prior open model did.
  • Commercial licence. The most transformative part. The Llama 2 Community License allows product use for free, with few restrictions.
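One of the RoPE-scaling tricks mentioned above, linear position interpolation, is simple enough to sketch. The idea: divide the position index before computing rotary angles, so positions beyond the trained window map back into the angle range the model saw during training. This is an illustrative sketch, not Meta's implementation:

```python
def rope_angles(position: int, dim: int, scale: float = 1.0,
                base: float = 10000.0) -> list[float]:
    """Rotary-embedding angles for one token position.

    Linear RoPE scaling ("position interpolation") divides the
    position by `scale`: a model trained on 4k tokens can address
    8k positions with scale=2.0 while every angle stays inside the
    range seen during training.
    """
    pos = position / scale
    # Standard RoPE frequency schedule: base^(-2i/dim) per pair of dims.
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

# Position 8000 with scale=2.0 yields the same angles as position
# 4000 without scaling, i.e. it stays "in distribution":
assert rope_angles(8000, 128, scale=2.0) == rope_angles(4000, 128)
```

In practice the scaled model is usually fine-tuned briefly on long sequences to recover quality, since naive interpolation alone degrades perplexity somewhat.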

Ecosystem Impact

In the two weeks since release, the ecosystem has exploded:

  • Hugging Face hosts quantised versions in every combination: GGML, GPTQ, AWQ, 4-bit, 8-bit. Running LLaMA 2 7B on an 8 GB GPU is now trivial.
  • Community fine-tunes: Vicuna, Wizard, Airoboros — dozens of variants tuned for specific tasks (code, reasoning, roleplay) appeared within days.
  • Tool integration: Ollama, LM Studio, llama.cpp run LLaMA 2 locally on Mac, Linux, and Windows in minutes.
  • Hosted services (Replicate, Anyscale, Together AI, AWS Bedrock) offer LLaMA 2 with low latency at ~$0.0008 per 1k tokens.
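The 4-bit formats behind those quantised releases all follow the same basic recipe: split the weights into small blocks, store one scale per block, and round each weight to a 4-bit integer. A toy sketch in the spirit of GGML's Q4 formats (illustrative only; the real formats pack two values per byte and differ in detail):

```python
def quantize_4bit(weights: list[float], block_size: int = 32):
    """Blockwise 4-bit quantisation (toy version of a GGML-style Q4).

    Each block stores one float scale plus integers in [-8, 7].
    """
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) / 7 or 1.0
        quants = [max(-8, min(7, round(w / scale))) for w in block]
        out.append((scale, quants))
    return out

def dequantize_4bit(blocks) -> list[float]:
    """Recover approximate weights: integer × per-block scale."""
    return [q * scale for scale, quants in blocks for q in quants]

weights = [0.12, -0.53, 0.31, 0.07, -0.99, 0.44, 0.00, -0.21]
restored = dequantize_4bit(quantize_4bit(weights, block_size=8))
```

Each restored value lands within half a quantisation step of the original, which is why 4-bit models lose surprisingly little quality while cutting memory roughly fourfold versus fp16.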

Scenarios Where LLaMA 2 Beats GPT on Total Value

There are situations where LLaMA 2, even if inferior in raw capability, is the better total-value choice over GPT-3.5/4:

  • Data privacy. If you can’t send customer data to external APIs (health, finance, defence), on-prem LLaMA 2 is the only way.
  • Cost control at scale. Above some volume (~100M tokens/month), hosting your own LLaMA 2 is cheaper than the external API.
  • Deep fine-tuning. You can tune weights directly, not just add LoRA layers on a managed base. With QLoRA it’s feasible to tune even the 70B on consumer hardware.
  • Low latency. Running local removes API round-trips — relevant for interactive applications.
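The cost-at-scale point lends itself to a quick back-of-the-envelope calculation. The figures below are assumptions for illustration: roughly $0.03 per 1k input tokens for GPT-4-class API pricing (the mid-2023 rate) and a hypothetical $2/hour dedicated GPU instance; real break-even depends on throughput, utilisation, and output-token pricing.

```python
API_PRICE_PER_1K = 0.03   # USD per 1k tokens, assumed GPT-4-class input rate
GPU_HOURLY = 2.0          # USD/hour, hypothetical dedicated GPU instance
HOURS_PER_MONTH = 730

def monthly_api_cost(tokens: float) -> float:
    """API spend scales linearly with volume."""
    return tokens / 1000 * API_PRICE_PER_1K

# Self-hosting is a flat bill: one always-on GPU, whatever the traffic.
fixed_self_host = GPU_HOURLY * HOURS_PER_MONTH

# Break-even volume: where API spend overtakes the fixed GPU bill.
break_even_tokens = fixed_self_host / API_PRICE_PER_1K * 1000
print(f"Self-hosting:  ${fixed_self_host:,.0f}/month flat")
print(f"Break-even at: {break_even_tokens / 1e6:.0f}M tokens/month")
```

Under these assumptions break-even arrives around 50M tokens/month against GPT-4-class pricing, comfortably below the ~100M threshold cited above; against cheaper GPT-3.5 pricing the threshold moves much higher.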

When GPT-4 Still Wins

And where LLaMA 2 doesn’t yet compete:

  • Complex multi-step reasoning. GPT-4 keeps its edge on tasks where very high accuracy is critical and per-step errors compound.
  • Broad multilingual capability. LLaMA 2 works well in English, acceptable in Spanish/French/German, patchy in others.
  • Fast product integration. If operational cost isn't the bottleneck, using OpenAI's API means less integration friction.

Hardware Requirements

To run LLaMA 2 locally:

Model   Minimum VRAM (4-bit)   Full-precision VRAM (fp16)
7B      4-6 GB                 14 GB
13B     8-10 GB                26 GB
70B     40-50 GB               140 GB

With aggressive quantisation (GGML 4-bit), 7B runs on CPU on modern laptops (~10 tokens/s). 70B needs a big GPU or RAM+SSD offloading techniques, with performance degradation.
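The fp16 column above is simply parameters × 2 bytes; the 4-bit figures sit above the raw weight size because the KV cache and runtime add overhead on top. A quick estimator (a rule of thumb, not a spec):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights alone: params × bits / 8, in GB."""
    return params_billion * bits_per_weight / 8

# fp16 column of the table above: params × 2 bytes.
assert weight_memory_gb(7, 16) == 14.0
assert weight_memory_gb(13, 16) == 26.0
assert weight_memory_gb(70, 16) == 140.0

# Raw 4-bit weights; the table's higher figures add KV-cache and
# runtime overhead on top of these numbers.
print(weight_memory_gb(7, 4), weight_memory_gb(70, 4))  # 3.5 35.0
```

The same formula explains why 4-bit quantisation is the practical dividing line: it is what brings the 7B under common 8 GB consumer GPUs and the 70B into dual-GPU territory.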

The Politics Behind the Release

Meta’s LLaMA 2 strategy isn’t altruistic — it’s competitive. By releasing open models, Meta:

  • Reduces OpenAI/Anthropic’s edge (whose monetisation depends on proprietary APIs).
  • Accelerates AI tech adoption that Meta can then apply in its own products.
  • Wins mindshare among developers, a long-term strategic asset.

This doesn’t make the model less useful — quite the opposite, that motivation ensures continued investment. But it helps to understand it as a strategic move, not philanthropy.

See our LLM fine-tuning analysis and Bard/PaLM 2 comparison to contextualise the current ecosystem.

Conclusion

LLaMA 2 marks the moment open LLMs become legitimate production options, not just research tools. For teams with privacy constraints, cost-at-scale pressure, or deep customisation needs, there’s now a viable alternative to proprietary models. The gap with GPT-4 still exists, but shrinks every few months.

Follow us on jacar.es for more on open LLMs, AI architecture, and deployment strategies.
