
pgvector in 2024: HNSW Indexes and Real Scaling

Updated: 2026-05-03

pgvector[1] moved from curiosity to serious infrastructure during 2023. Version 0.5, released in August of that year, added HNSW (Hierarchical Navigable Small World) as an index type, approaching dedicated vector-DB performance for most cases, and the 0.6 series added parallel builds and halved memory usage during index construction. For teams already running PostgreSQL, the question “do I need Pinecone, Qdrant, or Weaviate?” increasingly gets answered no.

Why pgvector Gains Traction

The traction is not fashion: it’s operations. Embeddings almost always live next to relational metadata — tenant, user, date, category, permissions — and native JOINs between vector and table avoid two-phase pipelines where you first query a vector store, then cross-reference PostgreSQL to filter. With pgvector that join is a single execution plan the planner optimises.

The second reason is cognitive cost. Backups, streaming replication, pg_stat_statements, row-level security, extensions like pg_cron or PostGIS, tooling like pgBadger or pgbackrest: all that operational capital is already paid for. Introducing Pinecone or Qdrant means a second source of truth, another authentication system, another incident runbook. For small and mid-sized teams, the marginal cost outweighs any performance gain.

The third reason is licensing. PostgreSQL and pgvector are pure BSD, with no commercial clauses or SSPL doubts. In a landscape where Elastic, MongoDB, and Redis have all changed licenses in recent years, that stability is real value.

Where pgvector doesn’t compete: extreme scale (billions of vectors with sub-10ms latency as a hard requirement) or workloads where hybrid search (vector + BM25 + geospatial + complex filters) needs an engine designed from scratch.

IVFFlat versus HNSW: The Transition That Changed Everything

Until 0.5, pgvector only offered IVFFlat: a k-means-based index. It works, but has two important problems: default recall hovers around 70-80%, and raising it drives latency up linearly. On top of that, IVFFlat trains on existing data, so later inserts don’t distribute well and periodic reindexing is needed.
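
For contrast, a minimal IVFFlat sketch (assuming a docs table with an embedding vector column, as used later in this post). lists sets the number of k-means clusters at build time, and ivfflat.probes controls how many clusters each query scans:

```sql
-- IVFFlat: the clusters are trained on the data present at build time,
-- which is why later inserts distribute poorly.
CREATE INDEX ON docs USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 1000);

-- More probes = higher recall, higher latency.
SET ivfflat.probes = 10;
```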

HNSW solves both. It builds a multi-level graph, default-parameter recall exceeds 95% on most datasets, inserts are incremental without quality loss, and latency scales logarithmically with size. The price: more memory per query and a slower build — a problem the 0.6 series attacks with parallel construction.
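
The parallel build path goes through PostgreSQL's standard maintenance settings; a sketch, assuming pgvector 0.6 or newer and a docs table like the one defined below:

```sql
-- Parallel workers speed up HNSW construction considerably.
SET max_parallel_maintenance_workers = 7;  -- plus the leader process
SET maintenance_work_mem = '8GB';          -- HNSW builds are memory-hungry

CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 128);
```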

For the vast majority of current use cases, HNSW is the correct choice.

Creating and Querying an HNSW Index

The minimal flow fits in a handful of SQL lines:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
    id         bigserial PRIMARY KEY,
    tenant_id  int NOT NULL,
    content    text,
    embedding  vector(1536)
);

CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

SET hnsw.ef_search = 100;

SELECT content FROM docs
 ORDER BY embedding <=> $1
 LIMIT 10;
```

The three parameters that move the needle: m (how many neighbours each node stores; 16 is reasonable up to a few million vectors), ef_construction (build exhaustiveness; 64 minimum, 128-200 for better quality), and ef_search (main production lever, set per session; start at 40 and raise until measured recall satisfies your case).
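
Because ef_search is a session-level setting, it can also be scoped to a single transaction with SET LOCAL, which is handy when one endpoint needs higher recall than the rest of the application:

```sql
BEGIN;
SET LOCAL hnsw.ef_search = 200;  -- higher recall for this query only
SELECT content FROM docs
 ORDER BY embedding <=> $1
 LIMIT 10;
COMMIT;  -- the setting reverts automatically at transaction end
```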

On operators: <=> is cosine, <-> is Euclidean, <#> is negative inner product. Each requires its own opclass — not interchangeable.
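
A sketch of the pairing: an index built with vector_l2_ops serves <-> queries but not <=> ones, and inner product needs vector_ip_ops:

```sql
-- Euclidean distance: index opclass and query operator must both be L2.
CREATE INDEX ON docs USING hnsw (embedding vector_l2_ops);
SELECT id FROM docs ORDER BY embedding <-> $1 LIMIT 10;

-- Inner product: <#> returns the NEGATIVE inner product, so ascending
-- ORDER BY yields the highest inner product first.
CREATE INDEX ON docs USING hnsw (embedding vector_ip_ops);
SELECT id FROM docs ORDER BY embedding <#> $1 LIMIT 10;
```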

Combined Filters and the Ordering Trap

The place where pgvector outshines dedicated engines is the “embedding plus WHERE” query. If the WHERE clause reduces a million rows to a hundred thousand, the vector index is still worth using. If it reduces them to a hundred, HNSW is counterproductive: traversing the graph to discard 99.99% of nodes costs more than computing a hundred distances directly.
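
A typical filtered query, wrapped in EXPLAIN ANALYZE to see which strategy the planner actually chose; whether the HNSW index gets used depends on the filter's selectivity:

```sql
EXPLAIN ANALYZE
SELECT content FROM docs
 WHERE tenant_id = 42          -- relational filter and vector ordering
 ORDER BY embedding <=> $1     -- resolved in a single execution plan
 LIMIT 10;
```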

This is where partial indexes earn their keep: creating an HNSW restricted to active tenants bounds the graph when tenant traffic is skewed. Newer releases (the 0.8 series) also add iterative index scans, which keep scanning beyond the initial top-K when the WHERE clause discards too many candidates.
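
A sketch of both techniques. The boolean column docs.active is hypothetical (not part of the schema above), and the iterative-scan setting is only available in newer pgvector releases:

```sql
-- Partial HNSW index: the graph only contains rows where active is true,
-- so queries against live data traverse a much smaller structure.
CREATE INDEX docs_active_embedding_hnsw ON docs
  USING hnsw (embedding vector_cosine_ops)
  WHERE active;

-- Keep scanning past the initial candidate set when a strict filter
-- would otherwise leave fewer than LIMIT results.
SET hnsw.iterative_scan = relaxed_order;
```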

Performance Orientation and the Model Lever

On mid-range hardware — 16 cores, 64 GB RAM, NVMe — an HNSW index over 10 million 1536-dim vectors occupies about 30 GB and answers in 5 ms p50 with ef_search=40. At 100 million the index approaches 300 GB and latency climbs to around 20 ms p50, the point where you need serious scale-up or sharding. Beyond a billion, pgvector falls short and Qdrant or Milvus scale better by design.

Embedding dimension is an underrated lever. Moving from 1536 to 384 dimensions shrinks the index almost four-fold and can drop a dataset that didn’t fit in RAM into the buffer pool.

Observability, HA, and Warning Signs

pgvector inherits the entire PostgreSQL ecosystem: streaming replication, logical replication, Patroni for failover, pgbackrest for backups. A pattern that works well: dedicate a read-only replica to vector queries and leave the primary to absorb inserts.

Monitoring HNSW requires watching pg_stat_user_indexes.idx_scan to confirm the index is actually used, the buffer-cache hit ratio, and critically, effective recall — which you have to measure periodically by comparing approximate vs exact top-10 on a query subset. If HNSW fragments after many updates or deletes, recall degrades silently and nothing in the log warns you.

The symptoms of crossing the line are recognisable: over 100 million vectors with demanding p99, over a thousand vector QPS saturating CPU, indexes that don’t fit in RAM, or the need for hybrid vector-BM25 ranking. Before migrating to Weaviate or Qdrant, exhaust cheap options: scale to 32 cores and 256 GB, move vector load to a dedicated replica, reduce dimensions, or shard by tenant.

Conclusion

pgvector has matured into the reasonable default for teams already running PostgreSQL. The bar to justify a dedicated vector database has moved noticeably upward. The decision has inverted: the default is no longer to add infrastructure, it’s to show with numbers why PostgreSQL isn’t enough. Most of the time, it is.

  1. pgvector

Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.