
pgvector: Semantic Search Without Leaving Postgres

Updated: 2026-05-03

Semantic search — retrieving documents by meaning rather than keyword match — has quietly become the foundation of much of the LLM application boom. Behind every RAG system, every assistant that consults internal documentation, every “smart” search bar, sits the same mechanic: a model turns text into a vector of hundreds or thousands of dimensions, and the application looks for vectors close to the query. What’s interesting is that, in most cases, setting that up doesn’t require introducing a brand new database into the stack. pgvector[1] turns an existing PostgreSQL into a perfectly competent vector database.

Key takeaways

  • pgvector extends PostgreSQL with a vector data type and cosine, Euclidean, and inner-product distance operators.
  • The IVFFlat index divides vectors into clusters (k-means) and searches only the clusters nearest the query — approximate nearest-neighbour search (ANN) in exchange for speed.
  • The golden rule for lists: ≈ sqrt(N) at initial load. At query time, ivfflat.probes between 10 and 20 gives 95-98% recall.
  • Decisive advantage: relational filters (WHERE category = 'tech') and vector ranking coexist in the same SQL query — no coordinating two stores.
  • Above 10-20 million vectors with sustained high QPS, dedicated vector databases (Qdrant, Pinecone) perform better.

Why Semantic Search Needs a Different Kind of Index

A modern embedding — think of the 1,536 values produced by OpenAI’s text-embedding-ada-002 — lives in a high-dimensional space. Finding the vector most similar to another is, mathematically, computing a distance (cosine, Euclidean, or inner product) against every stored vector. With ten thousand documents, a sequential scan takes milliseconds. With ten million, it takes seconds, and the user has already left.
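The brute-force baseline this paragraph describes can be sketched in a few lines of Python (illustrative only, with tiny 2-dimensional vectors standing in for 1,536-dimensional embeddings):

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 means identical direction, 2 means opposite
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norms

def brute_force_nn(query, vectors):
    # exact search: compare the query against every stored vector, O(N)
    return min(range(len(vectors)), key=lambda i: cosine_distance(query, vectors[i]))

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(brute_force_nn([0.9, 0.8], docs))  # 2: the vector pointing the same way
```

This is exactly what a sequential scan does: the cost grows linearly with the number of stored vectors, which is why an index becomes necessary at scale.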

Postgres’s classic B-tree indexes, designed to order scalars, don’t work for vectors. A B-tree needs a total order, and in a 1,536-dimensional space that order simply doesn’t exist. The industry’s practical answer has been to give up exactness and accept approximate nearest-neighbour (ANN) search, trading a small percentage of recall for orders of magnitude of speed.

pgvector implements this idea with IVFFlat indexes: the table is partitioned into k clusters at build time (essentially k-means), and each query compares only against vectors in the clusters nearest the query point. The probes parameter, tunable per query, controls how many clusters are inspected: more probes means better recall at higher latency; fewer probes, the opposite.
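The mechanic can be illustrated with a toy sketch in Python (not pgvector’s implementation: a crude k-means over 2-dimensional points, with Euclidean distance for simplicity):

```python
import math
import random

def l2(a, b):
    # Euclidean distance between two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(vectors, centroids):
    # put each vector in the bucket of its nearest centroid
    buckets = [[] for _ in centroids]
    for v in vectors:
        buckets[min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))].append(v)
    return buckets

def build_ivf(vectors, lists, iters=10):
    # crude k-means: the same idea IVFFlat applies at index build time
    centroids = random.sample(vectors, lists)
    for _ in range(iters):
        buckets = assign(vectors, centroids)
        centroids = [
            [sum(col) / len(b) for col in zip(*b)] if b else centroids[i]
            for i, b in enumerate(buckets)
        ]
    return centroids, assign(vectors, centroids)

def ivf_search(query, centroids, buckets, probes):
    # inspect only the `probes` clusters whose centroids are nearest the query
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [v for c in order[:probes] for v in buckets[c]]
    return min(candidates, key=lambda v: l2(query, v))

random.seed(0)
vectors = [[float(i), float(i)] for i in range(10)]
centroids, buckets = build_ivf(vectors, lists=2)
print(ivf_search([9.1, 9.0], centroids, buckets, probes=2))  # [9.0, 9.0]
```

With probes equal to lists the search is exhaustive and exact; shrinking probes skips clusters and trades recall for speed, which is precisely the knob SET ivfflat.probes exposes.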

When pgvector Is the Right Call

The honest question isn’t “is pgvector the best?” but “what does my project actually gain by using Qdrant, Pinecone, or Milvus instead of extending the Postgres it already has?” The answer is usually: less than it seems.

You already have backups, monitoring, replication, and an access policy working on Postgres. Adding a separate vector service doubles that surface: another process to patch, another backup to test, another maintenance window to coordinate.

There’s also an architectural advantage that gets systematically underrated. In a dedicated vector database, metadata (author, category, date, permissions) lives in a different part of the system. Any complex filter requires coordinating two stores. With pgvector, WHERE category = 'tech' AND user_id = 42 and the vector ranking live in the same query and the Postgres planner decides how to execute them together. That single fact eliminates an entire layer of glue code.

That said, pgvector isn’t the universal answer. Above about 10-20 million vectors with sustained high QPS, dedicated databases perform better without as much tuning. Above 2,000 dimensions, memory pressure starts to bite. But for 80% of LLM projects — internal chatbots, documentation assistants, support search — those limits are a long way away.

From CREATE EXTENSION to an Index That Performs

Getting started is almost trivial:

```sql
CREATE EXTENSION vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    category TEXT,
    embedding vector(1536)
);

CREATE INDEX documents_embedding_idx ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);

SET ivfflat.probes = 10;

SELECT id, content, 1 - (embedding <=> $1::vector) AS score
FROM documents
WHERE category = 'tech'
ORDER BY embedding <=> $1::vector
LIMIT 5;
```

The three distance operators:

  • <=>: cosine distance (semantic similarity, the most used).
  • <->: Euclidean distance.
  • <#>: negative inner product.

For a readable score, similarity is computed as 1 - (a <=> b), turning the distance into a value where higher means more similar.

Animation of k-means convergence, the algorithm IVFFlat uses to partition the vectors into clusters during index construction

Three Operational Details That Make the Difference

  1. Build the index with CREATE INDEX CONCURRENTLY to avoid locking the table during the operation — relevant for any table with writes in production.
  2. Reindex periodically if the insertion pattern is heavy: IVFFlat partitions at build time, and if the distribution of new vectors drifts from the original, recall degrades over time. A monthly or quarterly REINDEX is enough.
  3. Combine relational filters with vector ordering in the same query: Postgres applies the filter first (using its normal indexes) and only ranks vectorially on the resulting subset, which is typically dramatically faster than ranking everything and filtering afterwards.
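For the first point, the non-blocking build is the earlier index statement with one extra keyword (note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block):

```sql
CREATE INDEX CONCURRENTLY documents_embedding_idx ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);
```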

The golden rule for lists: ≈ sqrt(N) at initial load (for one million vectors, ~1,000 clusters). At query time, ivfflat.probes between 10 and 20 typically gives 95-98% recall with latencies of tens of milliseconds.
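The rule of thumb is trivial to encode; a minimal helper, assuming the recommendation applies as stated:

```python
import math

def recommended_lists(n_vectors):
    # golden rule: lists ≈ sqrt(N) at initial load
    return round(math.sqrt(n_vectors))

print(recommended_lists(1_000_000))  # 1000
```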

Signs You’ve Outgrown pgvector

There are clear symptoms that a project has grown past what the extension comfortably covers:

  • Sustained p95 latency above 100 ms with parameters properly tuned.
  • Indexes that no longer fit in RAM and force disk reads on every query.
  • A need for vector compression techniques to reduce the memory footprint.

At that point, migrating to Qdrant is usually the pragmatic choice: it remains self-hostable and scales much better. Reaching that point is itself a sign of success, not an architectural failure: it means the project works well enough to stress the infrastructure.

If you use LangChain or Chroma for the RAG pipeline, migrating the retriever from pgvector to Qdrant is a few-line change — as long as you respected the retriever abstraction from the start.
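The point generalises beyond any one framework. A minimal sketch of the idea in Python, with hypothetical in-memory retrievers standing in for the real backends (this is not LangChain’s actual API):

```python
from typing import Protocol

class Retriever(Protocol):
    # the abstraction the pipeline codes against; backends are interchangeable
    def retrieve(self, query: str, k: int) -> list[str]: ...

class FakePgVectorRetriever:
    # stand-in for a pgvector-backed retriever (illustrative only)
    def retrieve(self, query: str, k: int) -> list[str]:
        return [f"pgvector hit {i} for {query!r}" for i in range(k)]

class FakeQdrantRetriever:
    # swapping backends only changes which class is instantiated
    def retrieve(self, query: str, k: int) -> list[str]:
        return [f"qdrant hit {i} for {query!r}" for i in range(k)]

def answer(question: str, retriever: Retriever) -> list[str]:
    # pipeline code depends only on the Retriever interface
    return retriever.retrieve(question, k=3)
```

As long as the rest of the pipeline only calls retrieve(), migrating from one backend to the other touches a single line.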

Conclusion

pgvector isn’t the fastest option on the market and doesn’t pretend to be. It’s the piece that lets you add semantic search to an existing product without multiplying the operational surface, reusing the Postgres your team already knows how to operate, back up, and audit. With RAG becoming the default pattern, that combination of pragmatism and conservative engineering is what separates the prototypes that reach production from the ones that stay as demos.

Frequently asked questions

What is semantic search with pgvector?

It converts texts into numerical vectors using an embedding model and stores them in PostgreSQL. Queries are also vectorized and documents with the highest cosine similarity are retrieved, finding conceptually related results even without exact keyword matches.

Which index type to use in pgvector: IVFFlat or HNSW?

HNSW offers better query performance without a training phase, at the cost of more memory. IVFFlat needs an index build step on existing data but uses less RAM. For most new use cases, HNSW is recommended.
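For reference, an HNSW index in pgvector (version 0.5.0 or later) is created like this; m = 16 and ef_construction = 64 are the extension’s defaults:

```sql
CREATE INDEX documents_embedding_hnsw_idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- per-session search breadth: higher means better recall at more latency
SET hnsw.ef_search = 40;
```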

Is pgvector good for RAG?

Yes, especially if you already have PostgreSQL. For medium-scale RAG, pgvector with HNSW indexes and hybrid search (semantic + full-text with tsvector) is a robust solution without additional infrastructure.
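A sketch of such a hybrid query, assuming the documents table from earlier and an English text-search configuration: the full-text condition prunes candidates and the vector operator ranks what remains.

```sql
SELECT id, content, 1 - (embedding <=> $1::vector) AS score
FROM documents
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'index tuning')
ORDER BY embedding <=> $1::vector
LIMIT 5;
```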

  1. pgvector

Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.