For years, running SQLite in production meant striking a small deal with the devil: performance was excellent, operations trivial, cost near zero, but backup was awkward. Cron, sqlite3 .backup, hourly rsync, some script shipping the file to a bucket, and crossing your fingers that the loss window wouldn’t coincide with an incident. Litestream, released by Ben Johnson in 2021 and stable in the 0.3 line, closes that gap elegantly: it reads the write-ahead log (WAL) that SQLite produces anyway and streams it, page by page, to an S3-compatible bucket. The result is a local database with durability comparable to managed services, without an extra server, pooler, or hand-configured replication.
What It Actually Solves
The pitch isn’t to replace Postgres or compete with a distributed cluster. It’s to plug the only serious hole in the “SQLite plus local disk” pattern: losing the host. With Litestream running as a companion process, every transaction landing in the WAL gets packaged into segments and sent to remote storage with typical lag under a second. If the machine evaporates, the operator spins up another, restores from the bucket, and is back where they were with only seconds of data at risk. Most of the value of a standby replica, bought with 1-3% of a CPU core and a few MB of RAM.
The important conceptual difference against managed Postgres or MySQL is that Litestream does not introduce a second live node. No automatic failover, no distributed reads, no multi-machine consistency. What you get is an ordered sequence of snapshots and WAL segments in S3 that lets you reconstruct the database state at any recent instant. That fits perfectly with the 80% of applications that, honestly, could run on a single machine with SQLite and never notice the limitation.
How It Works Under the Hood
Litestream leverages the fact that SQLite in WAL mode already writes changes to a separate file before merging them into the main .db. The agent opens that WAL in read mode, follows it like a binary tail -f and, on an interval, packages new bytes into a segment with a deterministic name and uploads it to the bucket. Periodically it takes a full snapshot so that restoration doesn’t have to replay an infinite WAL from the beginning of time.
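The file layout this depends on is easy to see from any SQLite client. A minimal sketch in Python (throwaway paths, standard library only): once journal mode is WAL, committed transactions append to a sibling “-wal” file instead of rewriting the main .db in place, and that growing file is exactly what Litestream tails.

```python
import os
import sqlite3
import tempfile

# Create a throwaway database in WAL mode.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path, isolation_level=None)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")

# The write-ahead log lives next to the database file.
wal_path = path + "-wal"
before = os.path.getsize(wal_path)

# Each committed transaction appends frames to the WAL.
for i in range(100):
    conn.execute("INSERT INTO events (body) VALUES (?)", (f"event {i}",))

after = os.path.getsize(wal_path)
print(after > before)  # True: writes land in the WAL, not the main .db
```

The WAL only shrinks when a checkpoint merges it back into app.db, which is what gives an external reader like Litestream a stable, append-only stream to follow.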
The fine detail is that Litestream holds a long-running read lock on the database. That prevents SQLite from checkpointing the WAL until the corresponding segments are confirmed in S3. It’s a subtle but critical guarantee: you never lose a slice of history to a premature checkpoint. The cost is that the WAL file grows while the network to the bucket is down, something worth monitoring.
Installation is a single Go binary, no dependencies, managed as a systemd service or container sidecar. Configuration lives in a small YAML where you declare the databases to replicate and their destinations:
dbs:
  - path: /data/app.db
    replicas:
      - type: s3
        bucket: my-app-backups
        path: app.db
        region: eu-west-1
        retention: 72h
With that and credentials in environment variables, the agent starts replicating. The destination can be AWS S3, but also self-hosted MinIO, Cloudflare R2 (very compelling thanks to its zero egress cost), Backblaze B2, Hetzner Object Storage or Wasabi. Any S3-API endpoint works, and nothing stops you from declaring several simultaneously for cross-provider redundancy: one bucket on AWS, another on R2, and a local disk copy for instant recoveries. If one provider suffers a prolonged outage, the other remains a valid restoration source.
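A multi-destination setup is just more entries under replicas. A sketch with hypothetical names (bucket names and the R2 account placeholder are illustrative, not real endpoints): one AWS bucket, one Cloudflare R2 bucket reached through its S3 endpoint, and a local file copy for instant recoveries.

```yaml
dbs:
  - path: /data/app.db
    replicas:
      - type: s3
        bucket: my-app-backups
        path: app.db
        region: eu-west-1
      - type: s3
        bucket: my-app-backups-r2
        path: app.db
        endpoint: https://<ACCOUNT_ID>.r2.cloudflarestorage.com
      - type: file
        path: /backups/app.db
```

Each replica is synced independently, so a prolonged outage at one provider doesn’t stop the others from staying current.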
Recovery and Point-in-Time
Restoring is a single call. The binary downloads the latest snapshot, applies WAL segments up to the desired point and writes the resulting .db to the indicated path. On small databases we’re talking seconds; on databases of several GB, a handful of minutes bounded more by download bandwidth than by processing. Point-in-time recovery accepts an arbitrary timestamp within the configured retention window, tremendously useful when someone runs an accidental DELETE and you need to rewind thirty minutes without losing everything written afterwards to other tables.
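In CLI terms (bucket name and timestamp are illustrative), a full restore and a point-in-time restore look like this:

```shell
# Restore the latest replicated state to a fresh path.
litestream restore -o /data/app.db s3://my-app-backups/app.db

# Point-in-time: reconstruct the database as it was at a given instant,
# e.g. just before an accidental DELETE.
litestream restore -o /data/app-recovered.db \
  -timestamp 2024-05-12T14:30:00Z \
  s3://my-app-backups/app.db
```

Restoring to a separate output path first, inspecting the result, and only then swapping it into place is the cautious variant of the same operation.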
Snapshot cadence is an important lever. More frequent snapshots mean faster restorations but more storage; spaced snapshots make the bucket cheaper but lengthen recovery time. For most applications, a daily snapshot and 72-hour retention works well and keeps costs in cents per month.
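Both levers live in the replica configuration. A sketch matching the cadence described above, assuming the snapshot-interval and retention replica settings of the 0.3 line:

```yaml
dbs:
  - path: /data/app.db
    replicas:
      - type: s3
        bucket: my-app-backups   # illustrative bucket name
        path: app.db
        snapshot-interval: 24h   # one full snapshot per day
        retention: 72h           # keep three days of history
```

Retention bounds how far back point-in-time recovery can reach; snapshot interval bounds how much WAL a restore has to replay on top of the nearest snapshot.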
When It Beats Managed Postgres
The honest decision axis isn’t technical, it’s operational. Managed Postgres on a mid-tier provider starts at twenty or thirty euros a month for the smallest instance, climbs quickly when you add replicas, and forces the team to think about connection poolers, major versions, maintenance windows and migrations. SQLite with Litestream lives inside the application process itself, shares its lifecycle, and adds only the cost of the bucket to the budget, which for most cases sits below a euro per month.
The advantage disappears the moment you genuinely need multiple concurrent writers from different processes, Postgres-specific extensions like pg_trgm or vector search, geographically distributed reads with guaranteed consistency, or regulatory compliance that mandates a specific engine. In those cases there’s no discussion: Postgres. But the volume of applications that truly need those capabilities is much smaller than the industry has assumed for years.
Litestream vs LiteFS
Don’t confuse the projects. LiteFS, from the same author under the Fly.io umbrella, is a FUSE layer that replicates SQLite across several nodes in real time with strong consistency; it solves a different problem, active multi-node replication, at the price of significant operational complexity. Litestream, by contrast, deliberately stays in the single-writer scenario and pushes all the value to the bucket as a durability system. If you need reads from multiple regions with low latency, look at LiteFS or go straight to a distributed database; if you just want to sleep well when your VPS reboots, Litestream is enough.
Operation and Observability
The agent exposes Prometheus metrics on a configurable port. The ones that matter measure the lag between local write and S3 confirmation, bytes replicated and time since last successful sync. An alert on “last sync over five minutes ago” is enough to catch network issues, bucket permission problems or exhausted quotas before they turn into actual data loss. The quarterly restore drill in a staging environment is the second indispensable piece: a backup that never gets tested eventually fails to work when needed.
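Exposing the metrics endpoint is a single top-level setting in the same YAML (the port here is an arbitrary choice):

```yaml
addr: ":9090"   # serves Prometheus metrics on /metrics

dbs:
  - path: /data/app.db
    replicas:
      - type: s3
        bucket: my-app-backups   # illustrative bucket name
        path: app.db
```

Pointing an existing Prometheus scrape job at that port is enough to build the “time since last successful sync” alert described above.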
Conclusion
Litestream is a mature tool that represents something broader: the rehabilitation of SQLite as a suitable engine for real production when the workload permits. For a decade the industry assumed that “serious” meant a separate database server, with its own operations, its own backups and its own complications. The result was architectures oversized for problems that weren’t. Projects like Basecamp, PocketBase and Fly.io itself show that many applications live better with a well-replicated local file than with an expensive remote instance. For the solo developer’s stack, the small SaaS or the internal service that doesn’t need to dress up as more than it is, Litestream isn’t a shortcut: it’s the technically correct choice.