Setting up PostgreSQL with pgvector is, in early 2024, the most sensible way to start a retrieval-augmented generation project without introducing a new database into the team’s stack. This guide describes a reproducible install on Debian 12 or Ubuntu 22.04 through 24.04 with PostgreSQL 16 and pgvector 0.6, justifying each decision so that the database can be operated with the tools we already know: pg_dump, pg_basebackup, pg_stat_statements, and the familiar replica machinery.
Why pgvector instead of a dedicated vector database
The temptation to reach for Pinecone, Weaviate, Milvus, or Qdrant is natural once you start reading about embeddings. All are competent, but each introduces a new system that must be deployed, backed up, monitored, and learned. For volumes below fifty million vectors, especially on workloads that mix semantic search with relational filters, pgvector is usually the economically correct decision: it reuses transactions, joins, GIN indexes on JSONB, and read replicas the team has already tamed.
The honest trade-off is that pgvector does not compete on raw performance with engines written specifically for vectors, nor on advanced features such as product quantization or native hybrid filtering. If the application lives and dies by latency over hundreds of millions of documents, it deserves a fresh evaluation. Below that threshold, running a single system the team already knows pays dividends every time a backup has to be restored at three in the morning.
The official PGDG repository
PostgreSQL versions shipped by distributions usually lag behind the stable branch. The project’s own PGDG repository is the recommended route because it offers PostgreSQL 16 alongside an up-to-date postgresql-16-pgvector package and lets several major versions coexist on the same host. The standard procedure installs curl, ca-certificates, and gnupg, downloads the ACCC4CF8 key, adds the appropriate PGDG line under sources.list.d, and refreshes with apt update; apt install postgresql-16 postgresql-client-16 leaves the server ready under the postgresql@16-main.service systemd unit.
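The procedure above can be sketched as follows. This is one reasonable rendering, assuming sudo access and the modern signed-by keyring convention rather than the deprecated apt-key path; the keyring filename is a choice, not a requirement:

```shell
# Prerequisites for fetching and verifying the PGDG signing key
sudo apt update
sudo apt install -y curl ca-certificates gnupg

# Import the PGDG key (the ACCC4CF8 key mentioned above) into a dedicated keyring
curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc \
  | sudo gpg --dearmor -o /usr/share/keyrings/pgdg.gpg

# Add the PGDG line for the running release codename (bookworm, jammy, noble, ...)
CODENAME=$(. /etc/os-release && echo "$VERSION_CODENAME")
echo "deb [signed-by=/usr/share/keyrings/pgdg.gpg] https://apt.postgresql.org/pub/repos/apt ${CODENAME}-pgdg main" \
  | sudo tee /etc/apt/sources.list.d/pgdg.list

# Install server and client; systemd brings up postgresql@16-main.service
sudo apt update
sudo apt install -y postgresql-16 postgresql-client-16
```

After this, `systemctl status postgresql@16-main.service` should report the cluster as active.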
Installing pgvector
Two reasonable paths exist. The postgresql-16-pgvector package from PGDG ships the latest released version (0.6 as of February 2024) and is the default choice. If a specific build is needed, we compile from source by cloning the repository and running make && make install. Unlike many extensions, pgvector does not require shared_preload_libraries; CREATE EXTENSION vector inside each database where we plan to use it is enough, because the extension is per-database, not cluster-wide.
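For the source-build path, a minimal sketch looks like this; it assumes the matching server development headers are installed and pins the 0.6.0 tag (the database name in the last line is a placeholder for whichever database will use the extension):

```shell
# Toolchain plus headers matching the installed server major version
sudo apt install -y build-essential git postgresql-server-dev-16

# Clone a specific release tag and build against the local pg_config
git clone --branch v0.6.0 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install

# No shared_preload_libraries entry and no restart needed:
# the extension is enabled per database, not cluster-wide
sudo -u postgres psql -d your_database -c 'CREATE EXTENSION IF NOT EXISTS vector;'
```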
User, database, and extension
The postgres role must never be used by the application. We create a dedicated role, say ragapp, with a strong password produced by a manager, and a ragdb database owned by it. Inside that database we run CREATE EXTENSION IF NOT EXISTS vector and grant the necessary privileges on the public schema. A query against pg_extension should return 0.6.0, which doubles as the first sanity check that binary and catalog agree.
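In SQL, the sequence above amounts to the following; the role and database names match the running example, and the password literal is a stand-in for a value generated by a secrets manager:

```sql
-- Dedicated application role; never hand the application the postgres superuser
CREATE ROLE ragapp LOGIN PASSWORD 'generated-by-a-password-manager';
CREATE DATABASE ragdb OWNER ragapp;

-- Connect to ragdb first (\c ragdb in psql): the extension is per-database
CREATE EXTENSION IF NOT EXISTS vector;
GRANT USAGE, CREATE ON SCHEMA public TO ragapp;

-- First sanity check: binary and catalog agree on the version
SELECT extversion FROM pg_extension WHERE extname = 'vector';  -- expect 0.6.0
```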
Minimal production configuration
PostgreSQL defaults are conservative so installs do not choke on modest hardware. On a machine with sixteen gigabytes earmarked for the database, a handful of settings in postgresql.conf are worth revisiting:
```
shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 64MB
maintenance_work_mem = 512MB
wal_level = replica
max_wal_size = 4GB
max_connections = 100
```
The rule of thumb places shared_buffers around twenty-five percent of memory and effective_cache_size near seventy-five percent, keeping work_mem modest because it multiplies by connections and by sort or hash operations within a query. Sixty-four megabytes with a hundred connections and two operations per query can theoretically approach thirteen gigabytes (64 MB × 100 × 2 ≈ 12.8 GB), so measure before pushing it higher. maintenance_work_mem matters especially because CREATE INDEX on HNSW consumes it without restraint.
IVFFlat versus HNSW
Pgvector offers two approximate index structures. IVFFlat is older and more mature: it partitions the space into lists and requires ANALYZE after loading a representative dataset so the planner chooses well; it performs reasonably on a few million vectors and its memory footprint is modest. HNSW, introduced in 0.5 and stable in 0.6, builds a hierarchical graph that delivers a better recall-latency trade-off at the cost of noticeably higher build time and memory use.
The pragmatic 2024 recommendation is to start with HNSW using defaults (m = 16, ef_construction = 64), because in most cases result quality beats IVFFlat without tuning. If the dataset exceeds tens of millions of rows or the maintenance window cannot absorb long rebuilds, IVFFlat with lists set to the square root of the row count is often more appropriate. Creating either index on an empty table costs microseconds; building it on millions of rows takes hours and saturates CPU and IO.
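Both options can be written down concretely; this sketch assumes the documents table with an embedding column described in the next section, and raises maintenance_work_mem for the session so the HNSW build stays in memory (the 2GB figure and the ten-million-row example are illustrative):

```sql
-- HNSW with the defaults discussed (m = 16, ef_construction = 64)
SET maintenance_work_mem = '2GB';
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- IVFFlat alternative: lists near sqrt(row count), e.g. ~3163 for 10M rows,
-- followed by ANALYZE so the planner works from fresh statistics
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 3163);
ANALYZE documents;
```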
Typical schema and distance operators
A canonical RAG schema combines an identifier, the original text, metadata in JSONB for filtering, and an embedding vector(1536) column when using OpenAI models. A GIN index on metadata accelerates tag filters, while an HNSW index on the vector column with vector_cosine_ops enables the <=> operator for cosine distance. Pgvector also exposes <-> for Euclidean distance and <#> for negative inner product; the choice follows the embedding model, and with OpenAI the correct one is cosine because its vectors arrive pre-normalized. Mixing operators with non-normalized vectors is the most common source of strange results that people later blame on the index.
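The canonical schema and a typical filtered nearest-neighbour query can be sketched as follows; table and column names are illustrative, and the query embedding is passed as a bind parameter rather than a literal:

```sql
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    metadata  jsonb NOT NULL DEFAULT '{}',
    embedding vector(1536)          -- dimension matches the OpenAI embedding model
);

-- GIN for metadata filters, HNSW for approximate cosine search
CREATE INDEX ON documents USING gin (metadata);
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Five nearest neighbours by cosine distance, restricted by a tag filter
SELECT id, content
FROM documents
WHERE metadata @> '{"lang": "en"}'
ORDER BY embedding <=> $1::vector   -- $1 is the query embedding
LIMIT 5;
```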
Access, backup, and observability
By default the server listens on localhost and it should stay that way unless the application lives on a different host. In that case we open listen_addresses to the strictly necessary private interface and add a line to pg_hba.conf that restricts the range and enforces scram-sha-256. Exposing PostgreSQL directly to the public internet is a mistake: an SSH tunnel or a VPN is always preferable.

For backup, pg_dump -Fc -Z9 is enough on day one, but as soon as the database grows it pays to adopt pgbackrest or restic with incremental rotations and to rehearse restores regularly; a backup without a tested restore is a hope, not a guarantee.

Minimum observability means enabling pg_stat_statements, watching the buffer-cache hit ratio stay above ninety-nine percent, and exporting metrics with postgres_exporter into Prometheus. HNSW indexes degrade if autovacuum is not keeping up, so pg_stat_user_indexes deserves an occasional glance.
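A compact sketch of these three fronts, with the subnet, paths, and database names as illustrative placeholders; note that pg_stat_statements, unlike pgvector, does require preloading and a restart:

```shell
# pg_hba.conf: allow only the application subnet, forcing SCRAM authentication
#   host  ragdb  ragapp  10.0.12.0/24  scram-sha-256
# postgresql.conf: pg_stat_statements needs preloading, unlike pgvector
#   shared_preload_libraries = 'pg_stat_statements'

# Day-one backup: custom format, maximum compression
sudo -u postgres pg_dump -Fc -Z9 -f /var/backups/ragdb.dump ragdb

# Rehearse the restore into a scratch database, then throw it away
sudo -u postgres createdb ragdb_restore_test
sudo -u postgres pg_restore -d ragdb_restore_test /var/backups/ragdb.dump
sudo -u postgres dropdb ragdb_restore_test
```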
Conclusion
Installing PostgreSQL with pgvector is not complicated, but each decision matters more than it looks. Picking the PGDG repository, creating a dedicated role, running the extension in the right database, building the HNSW index at the right moment, and sizing maintenance_work_mem so the build does not drag on for days mark the difference between a deployment that ages gracefully and one that demands manual intervention every few weeks.
The argument for pgvector over dedicated alternatives is operational more than technical. Few organizations genuinely benefit from introducing a vector-specific database when they already run PostgreSQL comfortably; most discover that the apparent performance savings are spent, and then some, on duplicated backup, replication, and monitoring. When the time to migrate to a specialized engine eventually arrives, the starting point will be a production database with its own corpus of embeddings rather than a blank slate, which usually makes postponing the migration the right call.