For years, running SQLite in production meant striking a small deal with the devil: performance was excellent, operations trivial, cost near zero, but backup was awkward. Cron, sqlite3 .backup, hourly rsync, some script shipping the file to a bucket, and crossing your fingers that the loss window wouldn’t coincide with an incident. Litestream, released by Ben Johnson in 2021 and stable in the 0.3 line, closes that gap elegantly: it reads the write-ahead log (WAL) that SQLite produces anyway and streams it, page by page, to an S3-compatible bucket. The result is a local database with durability comparable to managed services, without an extra server, pooler, or hand-configured replication.
What It Actually Solves
The pitch isn’t to replace Postgres or compete with a distributed cluster. It’s to plug the only serious hole in the “SQLite plus local disk” pattern: losing the host. With Litestream running as a companion process, every transaction landing in the WAL gets packaged into segments and sent to remote storage with typical lag under a second. If the machine evaporates, the operator spins up another, restores from the bucket, and is back where they were with only seconds of data at risk. Most of the value of a standby replica, bought with 1-3% of a CPU core and a few MB of RAM.
The important conceptual difference against managed Postgres or MySQL is that Litestream does not introduce a second live node. No automatic failover, no distributed reads, no multi-machine consistency. What you get is an ordered sequence of snapshots and WAL segments in S3 that lets you reconstruct the database state at any recent instant. That fits perfectly with the 80% of applications that, honestly, could run on a single machine with SQLite and never notice the limitation.
How It Works Under the Hood
Litestream leverages the fact that SQLite in WAL mode already writes changes to a separate file before merging them into the main .db. The agent opens that WAL in read mode, follows it like a binary tail -f and, on an interval, packages new bytes into a segment with a deterministic name and uploads it to the bucket. Periodically it takes a full snapshot so that restoration doesn’t have to replay an infinite WAL from the beginning of time.
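The file layout this depends on is easy to see from any SQLite client. A minimal sketch in Python (throwaway paths, standard library only): once journal mode is WAL, committed transactions append to a sibling “-wal” file instead of rewriting the main .db in place, and that growing file is exactly what Litestream tails.

```python
import os
import sqlite3
import tempfile

# Create a throwaway database in WAL mode.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path, isolation_level=None)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")

# The write-ahead log lives next to the database file.
wal_path = path + "-wal"
before = os.path.getsize(wal_path)

# Each committed transaction appends frames to the WAL.
for i in range(100):
    conn.execute("INSERT INTO events (body) VALUES (?)", (f"event {i}",))

after = os.path.getsize(wal_path)
print(after > before)  # True: writes land in the WAL, not the main .db
```

The WAL only shrinks when a checkpoint merges it back into app.db, which is what gives an external reader like Litestream a stable, append-only stream to follow.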
The fine detail is that Litestream holds a long-running read lock on the database. That prevents SQLite from checkpointing the WAL until the corresponding segments are confirmed in S3. It’s a subtle but critical guarantee: you never lose a slice of history to a premature checkpoint. The cost is that the WAL file grows while the network to the bucket is down, something worth monitoring.
Installation is a single Go binary, no dependencies, managed as a systemd service or container sidecar. Configuration lives in a small YAML where you declare the databases to replicate and their destinations:
dbs:
  - path: /data/app.db
    replicas:
      - type: s3
        bucket: my-app-backups
        path: app.db
        region: eu-west-1
        retention: 72h
With that and credentials in environment variables, the agent starts replicating. The destination can be AWS S3, but also self-hosted MinIO, Cloudflare R2 (very compelling thanks to its zero egress cost), Backblaze B2, Hetzner Object Storage or Wasabi. Any S3-API endpoint works, and nothing stops you from declaring several simultaneously for cross-provider redundancy: one bucket on AWS, another on R2, and a local disk copy for instant recoveries. If one provider suffers a prolonged outage, the other remains a valid restoration source.
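A multi-destination setup is just more entries under replicas. A sketch with hypothetical names (bucket names and the R2 account placeholder are illustrative, not real endpoints): one AWS bucket, one Cloudflare R2 bucket reached through its S3 endpoint, and a local file copy for instant recoveries.

```yaml
dbs:
  - path: /data/app.db
    replicas:
      - type: s3
        bucket: my-app-backups
        path: app.db
        region: eu-west-1
      - type: s3
        bucket: my-app-backups-r2
        path: app.db
        endpoint: https://<ACCOUNT_ID>.r2.cloudflarestorage.com
      - type: file
        path: /backups/app.db
```

Each replica is synced independently, so a prolonged outage at one provider doesn’t stop the others from staying current.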
Recovery and Point-in-Time
Restoring is a single call. The binary downloads the latest snapshot, applies WAL segments up to the desired point and writes the resulting .db to the indicated path. On small databases we’re talking seconds; on databases of several GB, a handful of minutes bounded more by download bandwidth than by processing. Point-in-time recovery accepts an arbitrary timestamp within the configured retention window, tremendously useful when someone runs an accidental DELETE and you need to rewind thirty minutes without losing everything written afterwards to other tables.
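In CLI terms (bucket name and timestamp are illustrative), a full restore and a point-in-time restore look like this:

```shell
# Restore the latest replicated state to a fresh path.
litestream restore -o /data/app.db s3://my-app-backups/app.db

# Point-in-time: reconstruct the database as it was at a given instant,
# e.g. just before an accidental DELETE.
litestream restore -o /data/app-recovered.db \
  -timestamp 2024-05-12T14:30:00Z \
  s3://my-app-backups/app.db
```

Restoring to a separate output path first, inspecting the result, and only then swapping it into place is the cautious variant of the same operation.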
Snapshot cadence is an important lever. More frequent snapshots mean faster restorations but more storage; spaced snapshots make the bucket cheaper but lengthen recovery time. For most applications, a daily snapshot and 72-hour retention works well and keeps costs in cents per month.
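Both levers live in the replica configuration. A sketch matching the cadence described above, assuming the snapshot-interval and retention replica settings of the 0.3 line:

```yaml
dbs:
  - path: /data/app.db
    replicas:
      - type: s3
        bucket: my-app-backups   # illustrative bucket name
        path: app.db
        snapshot-interval: 24h   # one full snapshot per day
        retention: 72h           # keep three days of history
```

Retention bounds how far back point-in-time recovery can reach; snapshot interval bounds how much WAL a restore has to replay on top of the nearest snapshot.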
When It Beats Managed Postgres
The honest decision axis isn’t technical, it’s operational. Managed Postgres on a mid-tier provider starts at twenty or thirty euros a month for the smallest instance, climbs quickly when you add replicas, and forces the team to think about connection poolers, major versions, maintenance windows and migrations. SQLite with Litestream lives inside the application process itself, shares its lifecycle, and adds only the cost of the bucket to the budget, which for most cases sits below a euro per month.
The advantage disappears the moment you genuinely need multiple concurrent writers from different processes, Postgres-specific extensions like pg_trgm or vector search, geographically distributed reads with guaranteed consistency, or regulatory compliance that mandates a specific engine. In those cases there’s no discussion: Postgres. But the volume of applications that truly need those capabilities is much smaller than the industry has assumed for years.
Litestream vs LiteFS
Don’t confuse the projects. LiteFS, from the same author under the Fly.io umbrella, is a FUSE layer that replicates SQLite across several nodes in real time with strong consistency; it solves a different problem, active multi-node replication, at the price of significant operational complexity. Litestream, by contrast, deliberately stays in the single-writer scenario and pushes all the value to the bucket as a durability system. If you need reads from multiple regions with low latency, look at LiteFS or go straight to a distributed database; if you just want to sleep well when your VPS reboots, Litestream is enough.
Operation and Observability
The agent exposes Prometheus metrics on a configurable port. The ones that matter measure the lag between local write and S3 confirmation, bytes replicated and time since last successful sync. An alert on “last sync over five minutes ago” is enough to catch network issues, bucket permission problems or exhausted quotas before they turn into actual data loss. The quarterly restore drill in a staging environment is the second indispensable piece: a backup that never gets tested eventually fails to work when needed.
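Exposing the metrics endpoint is a single top-level setting in the same YAML (the port here is an arbitrary choice):

```yaml
addr: ":9090"   # serves Prometheus metrics on /metrics

dbs:
  - path: /data/app.db
    replicas:
      - type: s3
        bucket: my-app-backups   # illustrative bucket name
        path: app.db
```

Pointing an existing Prometheus scrape job at that port is enough to build the “time since last successful sync” alert described above.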
Conclusion
Litestream is a mature tool that represents something broader: the rehabilitation of SQLite as a suitable engine for real production when the workload permits. For a decade the industry assumed that “serious” meant a separate database server, with its own operations, its own backups and its own complications. The result was architectures oversized for problems that weren’t. Projects like Basecamp, PocketBase and Fly.io itself show that many applications live better with a well-replicated local file than with an expensive remote instance. For the solo developer’s stack, the small SaaS or the internal service that doesn’t need to dress up as more than it is, Litestream isn’t a shortcut: it’s the technically correct choice.