Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.

Inteligencia Artificial Tecnología

ia generativa licencias llama 2 llm abierto meta open-source

LLaMA 2 and the New Wave of Open Language Models

August 3, 2023 8 min read 125 reads

Table of contents

Key Takeaways
What LLaMA 2 Offers
Ecosystem Impact
Scenarios Where LLaMA 2 Beats GPT on Total Value
When GPT-4 Still Wins
Hardware Requirements
The Politics Behind
Conclusion

Actualizado: 2026-05-03

On July 18, 2023, Meta released LLaMA 2^[1], the second generation of its language model. Unlike LLaMA 1 (research-only), this time the licence allows commercial use — with some caveats for companies over 700 million monthly users. For 99.9% of organisations, that means: you can download it, modify it, and use it in production royalty-free.

This significantly reshapes the open-model landscape.

Key Takeaways

LLaMA 2 ships in three sizes (7B, 13B, 70B) and two variants (base and chat), with a commercial licence.
The 70B model matches or beats GPT-3.5 on standard benchmarks and is closer to GPT-4 than any prior open model.
Hugging Face, Ollama, LM Studio, and llama.cpp make it accessible on consumer hardware.
Absolute advantages over proprietary APIs: privacy, cost at scale, unconstrained fine-tuning, and low latency.
Meta’s strategy is competitive, not altruistic — which guarantees continued investment.

What LLaMA 2 Offers

Meta published three sizes: 7B, 13B, and 70B parameters. Each in two variants: base (trained on general text prediction) and chat (RLHF-tuned for assistant-style conversation).

Key characteristics:

Trained on 2 trillion tokens — twice as many as LLaMA 1 and with more rigorously filtered data.
4k-token context window. Limitation vs GPT-4 (8k–32k) or Claude 2 (100k), but extensible via techniques like RoPE scaling.
Competitive on benchmarks. LLaMA 2 70B matches or beats GPT-3.5 on MMLU, TriviaQA, HumanEval, and others. It doesn’t reach GPT-4, but it’s much closer than any prior open model.
Commercial licence. The most transformative part. The Llama 2 Community License^[2] allows product use for free, with few restrictions.

Ecosystem Impact

In the weeks post-release, the ecosystem explodes in four directions:

Hugging Face hosts quantised versions^[3] in every combination: GGML, GPTQ, AWQ, 4-bit, 8-bit. Running LLaMA 2 7B on an 8 GB GPU is now trivial.
Community fine-tunes: Vicuna, Wizard, Airoboros — dozens of variants tuned for specific tasks (code, reasoning, roleplay) appear within days.
Tool integration: Ollama^[4], LM Studio^[5], llama.cpp^[6] run LLaMA 2 locally on Mac, Linux, and Windows in minutes.
Hosted services (Replicate, Anyscale, Together AI, AWS Bedrock) offer LLaMA 2 with low latency at ~$0.0008 per 1k tokens.

Scenarios Where LLaMA 2 Beats GPT on Total Value

Situations where LLaMA 2 — even if inferior in raw capability — is the better choice over GPT-3.5/4:

Data privacy. If you can’t send customer data to external APIs (health, finance, defence), on-prem LLaMA 2 is the only way. This constraint is explicit in NIS2 for critical infrastructure data.
Cost control at scale. Above some volume (~100M tokens/month), hosting your own LLaMA 2 is cheaper than the external API.
Deep fine-tuning. You can tune weights directly, not just add LoRA layers on a managed base. With QLoRA it’s feasible to tune even the 70B on consumer hardware.
Low latency. Running local removes API round-trips — relevant for interactive applications.

When GPT-4 Still Wins

And where LLaMA 2 doesn’t yet compete:

Complex multi-step reasoning. GPT-4 keeps its edge on tasks where quality above 90% is critical.
Broad multilingual capability. LLaMA 2 works well in English, acceptable in Spanish/French/German, patchy in others.
Fast product integration. If operational cost isn’t the bottleneck, OpenAI’s API = less friction.

Hardware Requirements

To run LLaMA 2 locally:

Model	Minimum VRAM (4-bit)	Full VRAM (fp16)
7B	4–6 GB	14 GB
13B	8–10 GB	26 GB
70B	40–50 GB	140 GB

With aggressive quantisation (GGML 4-bit), 7B runs on CPU on modern laptops (~10 tokens/s). 70B needs a big GPU or RAM+SSD offloading techniques, with performance degradation.

The Politics Behind

Meta’s LLaMA 2 strategy isn’t altruistic — it’s competitive. By releasing open models, Meta:

Reduces OpenAI/Anthropic’s edge (whose monetisation depends on proprietary APIs).
Accelerates AI tech adoption that Meta can then apply in its own products.
Wins mindshare among developers, a long-term strategic asset.

This doesn’t make the model less useful — quite the opposite, that motivation ensures continued investment. But it helps to understand it as a strategic move, not philanthropy.

Conclusion

LLaMA 2 marks the moment open LLMs become legitimate production options, not just research tools. For teams with privacy constraints, cost-at-scale pressure, or deep customisation needs, there’s now a viable alternative to proprietary models. The gap with GPT-4 still exists, but it shrinks every few months.

Was this useful?

[Total: 12 · Average: 4.6]

Post Views: 125

Written by

Javier Cañete

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.

LLaMA 2 and the New Wave of Open Language Models

Key Takeaways

What LLaMA 2 Offers

Ecosystem Impact

Scenarios Where LLaMA 2 Beats GPT on Total Value

When GPT-4 Still Wins

Hardware Requirements

The Politics Behind

Conclusion

Related posts

“EU AI Act 2026: a technical checklist for Spanish CTOs”

Agent observability with OpenTelemetry GenAI semconv in 2026

How to install and tune oMLX on M5 Max 128 GB

Multi-agent systems: LangGraph vs CrewAI vs Autogen in 2026