Jacar mascot — reading along A laptop whose eyes follow your cursor while you read.
Inteligencia Artificial Tecnología

LLaMA 2 and the New Wave of Open Language Models

LLaMA 2 and the New Wave of Open Language Models

Actualizado: 2026-05-03

On July 18, 2023, Meta released LLaMA 2[1], the second generation of its language model. Unlike LLaMA 1 (research-only), this time the licence allows commercial use — with some caveats for companies over 700 million monthly users. For 99.9% of organisations, that means: you can download it, modify it, and use it in production royalty-free.

This significantly reshapes the open-model landscape.

Key Takeaways

  • LLaMA 2 ships in three sizes (7B, 13B, 70B) and two variants (base and chat), with a commercial licence.
  • The 70B model matches or beats GPT-3.5 on standard benchmarks and is closer to GPT-4 than any prior open model.
  • Hugging Face, Ollama, LM Studio, and llama.cpp make it accessible on consumer hardware.
  • Absolute advantages over proprietary APIs: privacy, cost at scale, unconstrained fine-tuning, and low latency.
  • Meta’s strategy is competitive, not altruistic — which guarantees continued investment.

What LLaMA 2 Offers

Meta published three sizes: 7B, 13B, and 70B parameters. Each in two variants: base (trained on general text prediction) and chat (RLHF-tuned for assistant-style conversation).

Key characteristics:

  • Trained on 2 trillion tokens — twice as many as LLaMA 1 and with more rigorously filtered data.
  • 4k-token context window. Limitation vs GPT-4 (8k–32k) or Claude 2 (100k), but extensible via techniques like RoPE scaling.
  • Competitive on benchmarks. LLaMA 2 70B matches or beats GPT-3.5 on MMLU, TriviaQA, HumanEval, and others. It doesn’t reach GPT-4, but it’s much closer than any prior open model.
  • Commercial licence. The most transformative part. The Llama 2 Community License[2] allows product use for free, with few restrictions.

Ecosystem Impact

In the weeks post-release, the ecosystem explodes in four directions:

  • Hugging Face hosts quantised versions[3] in every combination: GGML, GPTQ, AWQ, 4-bit, 8-bit. Running LLaMA 2 7B on an 8 GB GPU is now trivial.
  • Community fine-tunes: Vicuna, Wizard, Airoboros — dozens of variants tuned for specific tasks (code, reasoning, roleplay) appear within days.
  • Tool integration: Ollama[4], LM Studio[5], llama.cpp[6] run LLaMA 2 locally on Mac, Linux, and Windows in minutes.
  • Hosted services (Replicate, Anyscale, Together AI, AWS Bedrock) offer LLaMA 2 with low latency at ~$0.0008 per 1k tokens.

Scenarios Where LLaMA 2 Beats GPT on Total Value

Situations where LLaMA 2 — even if inferior in raw capability — is the better choice over GPT-3.5/4:

  • Data privacy. If you can’t send customer data to external APIs (health, finance, defence), on-prem LLaMA 2 is the only way. This constraint is explicit in NIS2 for critical infrastructure data.
  • Cost control at scale. Above some volume (~100M tokens/month), hosting your own LLaMA 2 is cheaper than the external API.
  • Deep fine-tuning. You can tune weights directly, not just add LoRA layers on a managed base. With QLoRA it’s feasible to tune even the 70B on consumer hardware.
  • Low latency. Running local removes API round-trips — relevant for interactive applications.

When GPT-4 Still Wins

And where LLaMA 2 doesn’t yet compete:

  • Complex multi-step reasoning. GPT-4 keeps its edge on tasks where quality above 90% is critical.
  • Broad multilingual capability. LLaMA 2 works well in English, acceptable in Spanish/French/German, patchy in others.
  • Fast product integration. If operational cost isn’t the bottleneck, OpenAI’s API = less friction.

Hardware Requirements

To run LLaMA 2 locally:

Model Minimum VRAM (4-bit) Full VRAM (fp16)
7B 4–6 GB 14 GB
13B 8–10 GB 26 GB
70B 40–50 GB 140 GB

With aggressive quantisation (GGML 4-bit), 7B runs on CPU on modern laptops (~10 tokens/s). 70B needs a big GPU or RAM+SSD offloading techniques, with performance degradation.

The Politics Behind

Meta’s LLaMA 2 strategy isn’t altruistic — it’s competitive. By releasing open models, Meta:

  • Reduces OpenAI/Anthropic’s edge (whose monetisation depends on proprietary APIs).
  • Accelerates AI tech adoption that Meta can then apply in its own products.
  • Wins mindshare among developers, a long-term strategic asset.

This doesn’t make the model less useful — quite the opposite, that motivation ensures continued investment. But it helps to understand it as a strategic move, not philanthropy.

Conclusion

LLaMA 2 marks the moment open LLMs become legitimate production options, not just research tools. For teams with privacy constraints, cost-at-scale pressure, or deep customisation needs, there’s now a viable alternative to proprietary models. The gap with GPT-4 still exists, but it shrinks every few months.

Was this useful?
[Total: 12 · Average: 4.6]
  1. LLaMA 2
  2. Llama 2 Community License
  3. quantised versions
  4. Ollama
  5. LM Studio
  6. llama.cpp

Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.