Barman Was Not a Drop-in Replacement for pgBackRest
A post on DEV Community documented migrating production Postgres backups from pgBackRest to Barman.
The setup: PostgreSQL 16 on Railway, a ~4.2 GB database, and a measured restore time of 18 minutes under pgBackRest. This post follows what happened after switching to Barman.
The background is pgBackRest’s archival.
Percona wrote on April 28, 2026 that pgBackRest was archived on April 27, but that Percona Distribution for PostgreSQL would continue recommending pgBackRest while discussing future support plans.
So “pgBackRest is dangerous starting tomorrow” is not the story.
But the pressure to choose a maintained backup foundation for new deployments and long-term operations got noticeably stronger.
On this blog, the Supabase April 2026 update mentioned that the Multigres Operator includes PITR backups via pgBackRest.
pgBackRest is deeply embedded in the Postgres ecosystem.
That’s exactly why “just swap the command name” doesn’t work when evaluating alternatives.
Where Barman Fits
Barman (Backup and Recovery Manager) is an open-source backup tool dedicated to PostgreSQL, developed and maintained by EDB.
Written in Python, it supports full backups, differential backups, and PITR (Point-in-Time Recovery) via WAL archiving.
PITR lets you roll back a database to any arbitrary point in time, like “10 seconds before the failure,” which is a different recovery granularity from pg_dump’s logical backups.
A key design characteristic is that Barman assumes a separate dedicated backup server apart from the DB server.
The Barman server collects data and WAL from PostgreSQL over SSH or streaming replication, storing them on local disk, S3, Azure Blob Storage, or Google Cloud Storage.
This operational model differs from both pg_dump (which dumps from inside the DB) and pg_basebackup (which ships with PostgreSQL).
pg_basebackup can only take full backups with no differential/incremental or parallel restore capabilities, so production environments typically chose Barman or pgBackRest.
Before pgBackRest’s archival, Barman and pgBackRest were essentially the two choices for physical backups, split by configuration preferences and environment.
pgBackRest is written in C and performance-oriented; Barman has richer operational management features.
There are two backup methods: rsync/SSH and streaming.
The rsync/SSH method transfers the PostgreSQL data directory via rsync over SSH from the Barman server.
You configure barman-wal-archive in PostgreSQL’s archive_command so WAL files are sent to the Barman server on each segment switch.
SSH public key authentication between the PostgreSQL server and Barman server is required.
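As a sketch of the PostgreSQL side of the rsync/SSH method (the Barman host `backup.example.com` and the server name `pg-main` are placeholder values):

```ini
# postgresql.conf on the database server; "backup.example.com" and the
# Barman server name "pg-main" are placeholders for your environment
archive_mode = on
archive_command = 'barman-wal-archive backup.example.com pg-main %p'
```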
The streaming method acquires base backups via PostgreSQL’s replication protocol, while WAL is continuously received by Barman’s built-in receive-wal process, functionally equivalent to pg_receivewal.
No SSH connection needed, so it works in container environments or setups where SSH ports can’t be opened.
However, a replication slot is mandatory, and if Barman stops, PostgreSQL keeps accumulating WAL.
The original post’s Railway environment used this method.
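A minimal bootstrap sketch for the streaming method, using the `railway-postgres` server name from the original post’s config (shown later):

```bash
# Create the replication slot declared as slot_name in the server config
barman receive-wal --create-slot railway-postgres

# barman cron starts (and supervises) the receive-wal process
barman cron

# Force a WAL switch and wait for the segment to arrive,
# verifying the WAL path end to end
barman switch-wal --force --archive railway-postgres
```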
```mermaid
flowchart LR
    subgraph PostgreSQL
        A[Database]
        B[WAL]
    end
    subgraph "Barman Server"
        C[Backup Catalog]
        D[WAL Archive]
        E[Base Backup]
    end
    A -->|rsync/SSH or<br/>streaming| E
    B -->|archive_command or<br/>receive-wal| D
```
Day-to-day operations run through periodic barman cron execution.
WAL archive processing, old backup auto-deletion, and retention policy enforcement are all handled by this command.
barman backup takes full backups, barman recover restores, and barman check verifies configuration and consistency.
The barman check issue discussed later concerns exactly this verification command: it includes SSH-related checks that return FAILED under streaming-only configurations.
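Sketched as commands, with the `railway-postgres` server name from the config shown later and illustrative paths:

```bash
# Usually run every minute from cron: WAL maintenance, retention
# enforcement, and receive-wal supervision all hang off this
barman cron

# Take a full base backup
barman backup railway-postgres

# Verify configuration and connectivity (note: SSH-related sub-checks
# can return FAILED under streaming-only setups, as discussed later)
barman check railway-postgres

# Restore the latest backup to a local directory; --target-time
# enables PITR to a specific moment (path and timestamp illustrative)
barman recover railway-postgres latest /srv/pg-restore \
  --target-time "2026-04-27 10:00:00"
```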
The backup catalog management layer is another Barman characteristic, tracking which backup corresponds to which WAL range.
barman list-backup shows all backups; barman show-backup shows individual sizes, durations, and WAL ranges.
With pg_basebackup, all this management is manual, so the catalog matters in environments rotating multiple backup generations.
The RECOVERY WINDOW OF 14 DAYS in the original config is a retention policy that keeps enough backups and WAL for PITR to any point in the past 14 days.
Backups older than this policy are deleted when barman cron runs.
For generation-based management, you can specify something like REDUNDANCY 3 (keep the latest 3 generations).
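As a config sketch, the two retention styles side by side:

```ini
# time-based: keep enough backups and WAL for PITR within the window
retention_policy = RECOVERY WINDOW OF 14 DAYS

# generation-based alternative: keep the newest three full backups
# retention_policy = REDUNDANCY 3
```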
What Changed Was the Operational Model, Not the Tool Name
The original post’s conclusion was very practitioner-oriented.
Barman itself is solid. Documentation is thorough, and EDB maintains it. Barman 3.17.0 was released in February 2026 with S3 Object Lock support and restore workflow improvements.
But Barman’s assumed picture is one where the “PostgreSQL server” and “backup server” are separate and connected straightforwardly via SSH.
That’s easy on a VPS or bare metal.
On container-oriented environments like Railway, Render, or Fly.io, the SSH-centric model gets heavy fast.
The original post chose backup_method = streaming to avoid SSH.
```ini
[railway-postgres]
conninfo = host=<RAILWAY_HOST> user=barman dbname=postgres
streaming_conninfo = host=<RAILWAY_HOST> user=streaming_barman
backup_method = streaming
streaming_archiver = on
slot_name = barman_streaming_slot
streaming_archiver_name = barman_receive_wal
retention_policy = RECOVERY WINDOW OF 14 DAYS
```
This gets backups running.
But the operational profile changes.
```mermaid
flowchart TD
    A[pgBackRest setup] --> B[File-based WAL archiving]
    B --> C[Restore: 18 min measured]
    D[Barman setup] --> E[Streaming backup]
    E --> F[WAL retention via<br/>replication slot]
    F --> G[Restore: 23 min 17 sec measured]
```
The original measurements: full backup was 12 min 34 sec for Barman, about 14 min for pgBackRest.
Barman is slightly faster on backup alone.
But restore was 23 min 17 sec for Barman vs. 18 min for pgBackRest, about 5 minutes slower.
At a 4 GB database size, a 5-minute difference might be acceptable for a personal service or small SaaS.
But whether it stays acceptable as the database grows is a question only pre-migration measurement can answer.
PostgreSQL is used not just for business data but also as metadata stores for other services like Airflow or GitLab.
Even a non-business DB can block entire pipelines when it goes down.
The Scariest Part of Barman’s Streaming Setup Is the WAL Slot
Barman’s official manual explains configuring streaming_conninfo, streaming_archiver = on, and optionally slot_name for WAL streaming.
A replication slot is the mechanism that prevents PostgreSQL from deleting WAL until Barman has received it.
This is a safety device.
It is also a disk consumption device when the backup side goes down.
The original post showed an example where, after a Barman container restart, the slot went inactive and WAL lag accumulated to 847 MB.
PostgreSQL determined “Barman hasn’t received this yet” and kept holding WAL.
If the container doesn’t come back, WAL fills the disk.
The verification query looks like this.
```sql
SELECT
    slot_name,
    active,
    restart_lsn,
    confirmed_flush_lsn,
    pg_size_pretty(
        pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
    ) AS lag
FROM pg_replication_slots
WHERE slot_name = 'barman_streaming_slot';
```
“Did the backup succeed?” alone is not enough.
You need to watch whether the slot is active, whether lag is growing, and whether Barman’s receive-wal is alive.
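A minimal monitoring sketch along those lines, assuming the slot name from the config above, connection settings from the environment (PGHOST, PGUSER, etc.), and an arbitrary 1 GB threshold:

```bash
#!/usr/bin/env bash
# Alert when the Barman slot is inactive or its WAL lag passes a
# threshold. The 1 GB value is an arbitrary example.
set -euo pipefail

THRESHOLD_BYTES=$((1024 * 1024 * 1024))

read -r ACTIVE LAG <<<"$(psql -At -F' ' -c "
  SELECT active,
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
  FROM pg_replication_slots
  WHERE slot_name = 'barman_streaming_slot';")"

if [[ "${ACTIVE}" != "t" ]]; then
  echo "CRITICAL: barman_streaming_slot inactive (is receive-wal down?)"
  exit 2
elif (( LAG > THRESHOLD_BYTES )); then
  echo "WARNING: slot lag is ${LAG} bytes"
  exit 1
fi
echo "OK: slot active, lag ${LAG} bytes"
```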
```mermaid
flowchart TD
    A[PostgreSQL] --> B[WAL generation]
    B --> C{Barman receive-wal running?}
    C -->|yes| D[WAL received]
    D --> E[Saved via archive-wal]
    C -->|no| F[Replication slot retains WAL]
    F --> G[pg_wal bloat]
    G --> H[Disk pressure]
```
This is the decision point between managed-DB automatic backups and self-managed backups.
Managed Postgres backups often lack fine-grained control, but you can offload some operational responsibility to the provider.
Running Barman yourself means accepting WAL slot monitoring, backup storage management, restore procedures, and monitoring noise in exchange for flexibility.
Don’t Wire barman check Directly Into Alerts
One subtly painful part of the original post was dealing with barman check.
Even when running streaming backup, SSH-related checks can return FAILED, making the overall check appear to fail.
This isn’t Barman being broken; the check granularity doesn’t match the configuration assumptions.
But if you wire it directly into monitoring, it becomes daily noise.
If you’re going to monitor, at minimum separate these concerns.
| Target | What happens when abnormal |
|---|---|
| Latest full backup timestamp | Restore starting point gets stale |
| receive-wal status | WAL stops flowing to Barman |
| Replication slot lag | Eats disk on the PostgreSQL side |
| Restore drill duration | RTO drifts from reality |
| Storage capacity | Can’t retain backups after they succeed |
These five are closer to recovery decision-making than “did the check command return 0 or 1.”
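For the first row, a sketch of a latest-backup-age check. It assumes default timestamp-style backup IDs (e.g. 20260427T093000) listed newest first, GNU date, and an arbitrary 26-hour threshold (a daily schedule plus slack):

```bash
#!/usr/bin/env bash
# Alert when the newest backup is older than 26 hours.
set -euo pipefail

LATEST_ID=$(barman list-backup railway-postgres --minimal | head -n1)
# Parse the ID's date/time fields (YYYYMMDDTHHMMSS) into an epoch
LATEST_EPOCH=$(date -d "${LATEST_ID:0:8} ${LATEST_ID:9:2}:${LATEST_ID:11:2}:${LATEST_ID:13:2}" +%s)
AGE_SECONDS=$(( $(date +%s) - LATEST_EPOCH ))

if (( AGE_SECONDS > 26 * 3600 )); then
  echo "WARNING: latest backup ${LATEST_ID} is $(( AGE_SECONDS / 3600 ))h old"
  exit 1
fi
echo "OK: latest backup ${LATEST_ID}"
```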
RTO in particular can only be measured by actually running a restore, not by writing a hopeful number in documentation.
The AWS DR article discussed Backup & Restore strategy as essentially “data survives but recovery takes time.”
The same applies to PostgreSQL alone: having backups and being able to restore within an acceptable time window are different things.
Stay on pgBackRest or Move to Barman
Jumping to Barman immediately upon seeing pgBackRest’s archival is slightly premature.
Percona has signaled they’ll continue treating pgBackRest as a mature choice, and if your existing environment has verified restore procedures, there’s little reason to rush a disruptive change.
Conversely, for greenfield PostgreSQL backup infrastructure, Barman is a strong option.
Especially in environments with SSH access like VPS, bare metal, or on-premise VMs, Barman’s design and operational model align well.
The judgment is less clear-cut in container environments like Railway.
| Environment | How Barman migration looks |
|---|---|
| VPS / Bare metal | SSH-based configuration sets up naturally |
| Kubernetes | Dedicated Pod, PVC, NetworkPolicy, slot monitoring all in scope |
| Railway / Render / Fly.io | Tends toward streaming config; slot lag monitoring is critical |
| Managed DB (Supabase / RDS etc.) | Check provider’s standard PITR and restore conditions first |
The decision criteria for backup migration aren’t TPS or latency; they’re backup completion time, WAL lag, and restore time.
Actually Restore Once Before Migrating
The best part of the original post was measuring restore time, not just backup success.
Backup tool comparisons tend to focus on compression ratios, differential backups, and cloud storage destinations, but what matters in the moment of an incident is “when does it come back.”
If migrating to Barman, include at least this flow in the migration work (a command-level sketch follows the list).
- Take a full backup at production-equivalent size
- Restore to a separate environment
- Connect the application and verify reads
- Verify timestamp and data consistency after WAL application
- Record the duration and reflect it in RTO
- Test replication slot lag alerts when Barman is stopped
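A command-level sketch of that drill; the server name, paths, port, and PITR target are all illustrative, and the drill host needs PostgreSQL binaries of the matching major version:

```bash
#!/usr/bin/env bash
set -euo pipefail

barman backup railway-postgres

# Time the restore itself: this number feeds RTO, not backup duration.
# Add --remote-ssh-command to restore onto a separate host instead.
time barman recover railway-postgres latest /srv/restore-drill \
  --target-time "2026-04-27 10:00:00"

# Start a throwaway instance on the restored directory and spot-check
pg_ctl -w -D /srv/restore-drill -o "-p 5544" start
psql -p 5544 -d postgres -c "SELECT now(), pg_is_in_recovery();"
pg_ctl -D /srv/restore-drill stop
```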
After going through all this, you’ll see whether Barman is “a pgBackRest replacement” or “an operational model rebuild.”
In environments like Railway or Render, it’s probably closer to the latter.