
RAG 2.0: knowledge graphs, vectors, and hybrid

Updated: 2026-05-03

The conversation about RAG in 2025 is different from 2023’s. Two years ago the topic was picking the best vector database and deciding how many tokens fit in context. Today the RAG label hides hybrid systems combining several retrieval sources, graphs capturing entity relationships, and reranking layers bringing the answer closer to what the user needs. This post gathers what I’ve learned building RAG 2.0 in several projects over recent months, which pieces add real value, and which are fashion without substance.

For context on the knowledge graph ecosystem behind these architectures, the analysis of Microsoft GraphRAG in the enterprise and the post on knowledge graphs revived by LLMs cover the relevant technical history. Continuous evaluation of RAG systems is covered in RAG: continuous evaluation. Architecture patterns that connect with agents are described in LLM wrappers in product.

Key takeaways

  • Hybrid search (vector + BM25 lexical) raises retrieved fragment quality by 15–30% over pure vector.
  • Knowledge graphs add structural relationship information that embeddings don’t capture well; they complement, not replace, vectors.
  • A cross-encoder reranker over hybrid search candidates adds 10–20% additional precision in the top-5 fragments.
  • Corpus quality remains the weightiest variable; sophisticated RAG on dirty data performs worse than simple RAG on curated data.
  • Jumping to full RAG 2.0 from scratch is rarely profitable; better to start simple and add layers when failures justify it.

Why the original RAG no longer suffices

First-generation RAG relied on a simple idea: split documents into fragments, compute embeddings, store them in a vector database, search the fragments most similar to the question, and pass them to the model. This idea works well when the question corresponds directly to text: “what’s the return policy”, “what does the manual say about sensor calibration”.
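The pipeline just described can be sketched in a few lines. This is a minimal illustration, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the sample documents are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(document: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size windows of `size` words."""
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def retrieve(question: str, index: list[tuple[str, Counter]], k: int = 3) -> list[str]:
    """Return the k fragments most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up sample corpus, for illustration only.
docs = [
    "Return policy: items can be returned within 30 days of purchase.",
    "Sensor calibration: run the calibration routine after every firmware update.",
]
index = [(c, embed(c)) for d in docs for c in chunk(d)]
context = retrieve("what is the return policy", index, k=1)
prompt = "Answer using only this context:\n" + "\n".join(context)
```

The retrieved fragments are pasted into the prompt; everything else in this post is about improving what ends up in `context`.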

It fails when the question requires combining information from several fragments that aren’t individually similar to it. “Which employees collaborated with vendor X on European projects” is a question whose answer isn’t in any single fragment; it’s in the relation between several. The vector database returns fragments that are relevant by semantic similarity, not by structural relationship.

The other classic failure is with specific keywords. If someone asks about a specific error code or reference number, vector search can fail because embeddings don’t capture rare identifiers well. Pure lexical search — the traditional kind — works better in those cases. 2023 RAG ignored this reality because the novelty was the vector side.

What hybrid search adds

The first improvement of RAG 2.0 is hybrid search: combining vector search with lexical search (classic BM25) and fusing results with a ranking function. In tests, this simple combination raises retrieved fragment quality by 15–30% over pure vector, depending on the domain.
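For reference, the lexical side can be sketched as a small BM25 scorer. This is a minimal version under assumptions: the tokenizer is a simple regex that keeps hyphens and underscores so identifiers survive, and `k1=1.5`, `b=0.75` are common defaults rather than values tuned for any corpus.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split, keeping '-' and '_' so identifiers stay whole."""
    return re.findall(r"[a-z0-9_-]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with the BM25 formula."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores
```

A rare identifier appears in few documents, so its inverse document frequency is high and an exact match dominates the score, which is the behaviour embeddings struggle to reproduce.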

The important tuning is the ratio. Giving equal weight to both sources usually works worse than weighting one source more heavily depending on the question type:

  • For queries with rare technical terms: more lexical weight.
  • For conceptual queries: more vector weight.
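One way to act on that rule is a small router that picks the weights per query. The sketch below is a hypothetical heuristic: it treats any token containing a digit as an identifier, and the 0.7/0.3 split is an illustrative assumption, not a recommended value.

```python
import re

# Assumption: a token containing a digit (error code, part number) is an identifier.
IDENTIFIER = re.compile(r"\b[\w-]*\d[\w-]*\b")


def source_weights(query: str) -> dict[str, float]:
    """Favor lexical search for identifier-like queries, vector search otherwise."""
    if IDENTIFIER.search(query):
        return {"lexical": 0.7, "vector": 0.3}
    return {"lexical": 0.3, "vector": 0.7}


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two sources become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(query: str, lexical: dict[str, float], vector: dict[str, float]) -> list[str]:
    """Rank document ids by a query-dependent weighted sum of normalized scores."""
    w = source_weights(query)
    lex, vec = min_max(lexical), min_max(vector)
    fused = {
        doc: w["lexical"] * lex.get(doc, 0.0) + w["vector"] * vec.get(doc, 0.0)
        for doc in lex.keys() | vec.keys()
    }
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

Normalizing before summing is needed because BM25 scores and cosine similarities live on different scales.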

The most robust fusion I’ve seen combines ranks instead of summing weighted scores: it takes each document’s best rank across the two searches and applies a penalty when the document appears in only one of them. Working on ranks avoids the problem of different score scales between methods.
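A rank-based fusion along those lines can be sketched as follows. The penalty of 2 positions and the alphabetical tie-break are assumptions for illustration; the text does not specify either.

```python
def first_ranks(results: list[str]) -> dict[str, int]:
    """Map each document id to its 1-based rank, keeping the first occurrence."""
    ranks: dict[str, int] = {}
    for position, doc in enumerate(results, start=1):
        ranks.setdefault(doc, position)
    return ranks


def fuse_by_best_rank(
    lexical: list[str],
    vector: list[str],
    single_source_penalty: int = 2,
) -> list[str]:
    """Order documents by their best rank in either list, penalizing
    documents that only one of the two searches returned."""
    lex_rank = first_ranks(lexical)
    vec_rank = first_ranks(vector)
    fused: dict[str, int] = {}
    for doc in lex_rank.keys() | vec_rank.keys():
        ranks = [r[doc] for r in (lex_rank, vec_rank) if doc in r]
        penalty = single_source_penalty if len(ranks) == 1 else 0
        fused[doc] = min(ranks) + penalty
    # Lower fused rank is better; ties are broken by id to stay deterministic.
    return sorted(fused, key=lambda doc: (fused[doc], doc))
```

Only positions are compared, so BM25 scores and cosine similarities never need to be put on a common scale. Reciprocal rank fusion is a well-known alternative built on the same idea.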

Knowledge graphs as a structural layer

The second piece of RAG 2.0 is the knowledge graph. Instead of treating documents as plain text, the system extracts entities — people, projects, concepts — and relationships between them. The result is a graph where nodes are entities and edges are explicit relationships.

When a question arrives asking to combine information, the system can navigate the graph to find connected entities and use that structural information alongside textual fragments. “Which employees collaborated with vendor X on European projects” becomes a graph query returning relevant entities with their relations, and the model generates the answer with that context already ordered.
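As a sketch of what that graph query looks like, the example below stores the graph as subject-relation-object triples in memory. The entity names and the relation types (`worked_on`, `has_vendor`, `located_in`) are hypothetical; a real system would use a graph database and the schema extracted from its own corpus.

```python
from collections import defaultdict

# Hypothetical triples, as an extraction pass might produce them.
TRIPLES = [
    ("alice", "worked_on", "project_atlas"),
    ("bob", "worked_on", "project_atlas"),
    ("carol", "worked_on", "project_boreal"),
    ("project_atlas", "has_vendor", "vendor_x"),
    ("project_boreal", "has_vendor", "vendor_x"),
    ("project_atlas", "located_in", "europe"),
    ("project_boreal", "located_in", "asia"),
]


class Graph:
    """Minimal triple store indexed for 'who points at this node?' lookups."""

    def __init__(self, triples):
        self._incoming = defaultdict(set)  # (relation, object) -> subjects
        for subject, relation, obj in triples:
            self._incoming[(relation, obj)].add(subject)

    def subjects(self, relation: str, obj: str) -> set[str]:
        return set(self._incoming.get((relation, obj), set()))


def collaborators(graph: Graph, vendor: str, region: str) -> set[tuple[str, str]]:
    """Find (person, project) pairs for projects in `region` that involve `vendor`."""
    projects = graph.subjects("has_vendor", vendor) & graph.subjects("located_in", region)
    return {
        (person, project)
        for project in projects
        for person in graph.subjects("worked_on", project)
    }


graph = Graph(TRIPLES)
pairs = collaborators(graph, "vendor_x", "europe")
```

The resulting pairs are handed to the model together with the text fragments attached to those entities, so the join across fragments is done by the graph, not by the model.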

The graph doesn’t replace embeddings, it complements them. Embeddings remain the way to find semantically similar text; the graph adds relationship information that embeddings don’t capture well.

Building the graph without killing the project

The practical problem with graphs is that building them from unstructured text is expensive. Automatic extraction with language models works but leaves errors: duplicated entities with slightly different names, spurious relationships, omissions. If the graph has too much noise, the gain from using it is quickly lost.

What works is a staged approach:

  1. First pass: automatic extraction with a capable model and a bounded schema of entity and relation types.
  2. Second pass: entity normalization by similarity and simple canonicalization rules.
  3. Third pass: human review of high-cardinality relationships before approving the graph for use. Skipping this stage is paid for later in incorrect answers with a veneer of certainty.
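The second pass can be sketched with the standard library alone. The canonicalization rules (lowercasing, dropping punctuation and a short list of legal suffixes) and the 0.9 similarity threshold are illustrative assumptions; each domain needs its own.

```python
import re
from difflib import SequenceMatcher

# Assumption: these legal suffixes carry no identity information in this domain.
LEGAL_SUFFIXES = re.compile(r"\b(inc|ltd|llc|gmbh|sa|sl)\b")


def canonicalize(name: str) -> str:
    """Apply simple rules: lowercase, drop punctuation and legal suffixes."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    name = LEGAL_SUFFIXES.sub(" ", name)
    return " ".join(name.split())


def normalize_entities(names: list[str], threshold: float = 0.9) -> dict[str, str]:
    """Map each raw name to a canonical form, merging near-duplicates."""
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for name in names:
        key = canonicalize(name)
        match = next(
            (c for c in canonical if SequenceMatcher(None, key, c).ratio() >= threshold),
            None,
        )
        if match is None:
            canonical.append(key)
            match = key
        mapping[name] = match
    return mapping
```

Similarity merging will occasionally join two distinct entities with near-identical names, which is one reason the human review in the third pass is hard to skip.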

The alternative when the domain already has a formal taxonomy is using that taxonomy as the graph’s backbone and enriching it with relationships extracted from text. Starting from existing structure is always cheaper than building from scratch.

Reranking, the unseen layer

The third piece of RAG 2.0 is reranking: a model that takes the fragments retrieved by hybrid search and reorders them by relevance to the specific query. The reranker indexes nothing; it operates only on candidates that have already come out of the retrieval stage.

Current reranking models are specialized cross-encoders. They look at the question and each candidate jointly, which makes them more sensitive than encoders that process the question and the fragment separately. The cost per candidate is higher, but since they only run over the top 50 or 100, the impact is manageable.
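The reranking step itself is small. In the sketch below the scorer is injected as a function that receives (question, candidate) pairs and returns one score per pair; `overlap_scorer` is a toy stand-in so the example runs without a model, and the function names are hypothetical.

```python
from typing import Callable, Sequence

PairScorer = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: list[str], score_pairs: PairScorer, top_k: int = 5) -> list[str]:
    """Reorder retrieval candidates by the scorer's relevance to the query."""
    if not candidates:
        return []
    # The scorer sees question and candidate together, as a cross-encoder does.
    scores = score_pairs([(query, candidate) for candidate in candidates])
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:top_k]]


def overlap_scorer(pairs: Sequence[tuple[str, str]]) -> list[float]:
    """Toy scorer: fraction of query words that appear in the candidate."""
    scores = []
    for query, candidate in pairs:
        q_words = set(query.lower().split())
        c_words = set(candidate.lower().split())
        scores.append(len(q_words & c_words) / len(q_words) if q_words else 0.0)
    return scores

# With the sentence-transformers library installed, a real cross-encoder could
# be passed in instead of the toy scorer (the model choice is an assumption):
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   top = rerank(query, candidates, model.predict)
```

Keeping the scorer behind a plain function makes it easy to swap models and to test the pipeline without loading one.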

Adding a reranker raises precision of the top-5 fragments by an additional 10–20% over hybrid search. In RAG 2.0 it’s almost mandatory if the use case is serious.

The evaluation traps

Designing RAG 2.0 without rigorous evaluation is like switching off the instrument panel and flying by intuition. The usual traps:

  • Using the generating model itself as judge produces systematic optimism. Judges must be independent.
  • Measuring only on a handful of sample queries is insufficient. Systems giving perfect answers to the product team’s 20 queries can give mediocre answers to the 2,000 queries users actually make.
  • Ignoring cases where the system should say it doesn’t know. A RAG that confidently invents when information isn’t in the corpus is worse than one answering little but well.
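A small harness covering the third trap might look like the sketch below. It assumes a hypothetical `system` callable that returns the retrieved fragment ids plus a flag saying whether it chose to answer, and a labeled set where unanswerable queries have an empty `relevant` set.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Case:
    query: str
    relevant: set[str]  # ids of fragments that answer the query; empty if not in corpus


def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved fragments that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def evaluate(
    cases: list[Case],
    system: Callable[[str], tuple[list[str], bool]],
    k: int = 5,
) -> dict[str, Optional[float]]:
    """Report retrieval precision on answerable queries and how often the
    system correctly declines on queries the corpus cannot answer."""
    precisions: list[float] = []
    abstained: list[bool] = []
    for case in cases:
        retrieved, answered = system(case.query)
        if case.relevant:
            precisions.append(precision_at_k(retrieved, case.relevant, k))
        else:
            abstained.append(not answered)
    return {
        "precision_at_k": sum(precisions) / len(precisions) if precisions else None,
        "correct_abstention_rate": sum(abstained) / len(abstained) if abstained else None,
    }
```

Answer quality still needs a separate judge; this only measures retrieval and abstention, and it is only as representative as the queries in `cases`.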

What hasn’t changed

Some things haven’t changed between 2023 and 2025:

  • Corpus quality remains the weightiest variable: clean data, correct metadata, frequent updates. RAG with a sophisticated architecture on dirty data performs worse than simple RAG on curated data.
  • Chunking remains more art than science: fixed-size fragments work decently but aren’t optimal on documents with strong internal structure.
  • Maintenance cost hasn’t dropped: reindexing large corpora when documents change remains expensive. RAG 2.0 isn’t plug and play; it’s a living system needing care.
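Two of these points can be softened with simple mechanics: chunking along paragraph boundaries instead of fixed windows, and re-embedding only the chunks whose content changed. The sketch below assumes paragraphs are separated by blank lines and that the index stores a content hash per chunk; both are assumptions, and the function names are hypothetical.

```python
import hashlib


def paragraph_chunks(text: str, max_words: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_words words.
    A single paragraph longer than max_words is kept whole."""
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        n_words = len(paragraph.split())
        if current and count + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += n_words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def fingerprint(chunk: str) -> str:
    """Content hash used to detect whether a chunk changed."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def plan_reindex(indexed: set[str], new_chunks: list[str]) -> tuple[list[str], set[str]]:
    """Return the chunks that need embedding and the stale hashes to delete."""
    new_hashes = {fingerprint(chunk): chunk for chunk in new_chunks}
    to_embed = [chunk for digest, chunk in new_hashes.items() if digest not in indexed]
    to_delete = indexed - set(new_hashes)
    return to_embed, to_delete
```

This lowers the cost of routine edits, but a change in the embedding model or the chunking rules still forces a full reindex.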

How to think about the decision

My recommendation for anyone starting a RAG project in 2025 is to scale the architecture to the problem:

  • If the questions in your use case map directly to text: pure vector with a good corpus can suffice.
  • If there’s a mix of conceptual and identifier questions: adding lexical search is easy and profitable.
  • If questions require combining structural information: it’s worth planning a graph.

Jumping to full RAG 2.0 from scratch is tempting but rarely profitable. Complexity grows fast, bugs multiply, and the ability to diagnose why an answer is bad diminishes. Better to start simple, measure which question types fail, and add the corresponding layer when failure justifies it.


Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.