Migrating PostgreSQL across the planet, with almost no downtime.
TL;DR: Using PostgreSQL logical replication and Tailscale VPN, I migrated 200GB of data from California to Sydney with seconds of downtime and no time pressure. The replica synchronized over several days while production ran normally.
In 2023, I successfully migrated Booko's 8 Linux servers, including the primary and replica PostgreSQL databases, from California to Sydney, with downtime measured in seconds. Here's how to move a database around the world.
The Security Challenge
One of the first challenges with running PostgreSQL across data centres is maintaining security. At no point should your PostgreSQL server be exposed to the internet at large. In the simplest case, PostgreSQL and your Rails site are on the same host, so you're just using 127.0.0.1. If you're using several hosts, maybe you're using a private network range within your host's network and PostgreSQL is tucked safely away from the internet. But whoever you host with is unlikely to offer a VPC that can span Fremont to Sydney.
I've been using Tailscale with Booko since early 2022. It's a transformative product / service: a free ( at my scale ) mesh VPN built on Wireguard ( itself an excellent improvement over every other bit of Linux VPN software ).
When a host is added to a "Tailnet" - as Tailscale calls an instance of a VPN - it gets an address in the 100.64.0.0/10 network. Any host in your Tailnet can talk directly to any other host in your Tailnet via its 100.64.0.0/10 address, and that communication travels over Wireguard. Tailscale essentially orchestrates point-to-point Wireguard connections.
Once you have your hosts inside your Tailnet, you can safely set PostgreSQL to listen on its 100.64.0.0/10 address - only other hosts within your Tailnet can communicate with it. It's a brilliant setup. We'll make use of this network to allow our PostgreSQL primary and replica servers to communicate with each other, without messy TLS setups - just direct communication via the default PostgreSQL ports.
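As a concrete sketch - the 100.x address here is a placeholder for whatever Tailscale assigned to your primary, and the database / user names are the ones used later in this post - the primary ends up with something like:
# postgresql.conf - listen on loopback plus this host's Tailscale address
listen_addresses = 'localhost, 100.101.102.103'
# pg_hba.conf - let the replication user reach prod_db from anywhere in the Tailnet
host    prod_db    replicator    100.64.0.0/10    scram-sha-256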
Setting up Replication
PostgreSQL's Logical Replication is awesome. It streams changes made to rows in the primary database to the replica servers. Notably, it doesn't stream DDL - you must apply any schema changes to the replica servers manually. Logical Replication even works between different versions of PostgreSQL, which makes it a great way to upgrade: it's available from PostgreSQL 10, and you can replicate to any newer version. When I migrated from Fremont to Sydney, I took the opportunity to upgrade PostgreSQL as part of the journey.
You'll need to configure PostgreSQL for logical replication by setting wal_level = logical, max_replication_slots and max_wal_senders - consult the PostgreSQL manual for full details.
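For reference, the settings on the primary look something like this ( the numbers are illustrative - size them for your own number of subscribers ):
# postgresql.conf - changing wal_level requires a restart
wal_level = logical             # emit enough detail in the WAL for logical decoding
max_replication_slots = 10      # at least one per subscription, plus headroom
max_wal_senders = 10            # one WAL sender process per connected subscriber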
Logical Replication is quite safe to set up and tear down. In PostgreSQL lingo, you create a Publication on the primary and a Subscription on the replica. A publication can include all tables in a database, or a subset of them. You do need to be aware that once a publication has a subscriber, PostgreSQL will hold onto WAL files for that subscriber until they've been replicated. On the publisher, "Replication Slots" are how replica servers are tracked - including which WAL files they've consumed. The consequence is an increase in disk usage while new replica servers are bringing themselves up to date with the primary. If you have large tables and a slow network between the hosts, it may be worth creating the publication with a single table, then adding tables to it once all subscribers have caught up.
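You can keep an eye on how much WAL the primary is holding back for its subscribers by querying the replication slots:
-- On the primary: WAL retained for each replication slot
SELECT slot_name, active,
       pg_size_pretty(pg_current_wal_lsn() - restart_lsn) AS retained_wal
FROM pg_replication_slots;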
Tables on a logical replication replica are ordinary, writable tables. This is super handy because once the replica has synchronised, you can configure Rails to point to it and new data will start appearing in the new database. When you're ready, you drop the subscription to the primary and your secondary database is now your primary database.
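On the Rails side, that's just a database.yml change - a minimal sketch, with placeholder host and credentials:
# config/database.yml on the new app servers
production:
  adapter: postgresql
  host: 100.101.102.104          # Tailscale address of the Sydney database host
  database: prod_db
  username: prod_user
  password: <%= ENV["PROD_DB_PASSWORD"] %>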
Replication Users
Logical replication requires a user with REPLICATION privilege to read the WAL, plus SELECT permissions for initial table synchronisation:
sudo -u postgres createuser -U postgres replicator -P -c 5 --replication
On the Primary host, grant SELECT for initial data copy:
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replicator;
As a side note, you'll need to grant SELECT on any new tables you create if you want to add them to replication later. This is for the initial data copy—ongoing changes are replicated via WAL and don't need SELECT permissions.
Create schema on the replicas
For replication to work, the replica databases must have a matching schema, so we'll dump the schema from the primary and restore it to each replica:
pg_dump -U prod_user --schema-only prod_db > schema_timestamp.sql
Copy the schema to the replica:
scp schema_timestamp.sql new_replica_host:
Load the Schema into the replica PostgreSQL
psql -f schema_timestamp.sql -d secondary_db -U prod_user
Create Publication and Subscriptions
Create the publication on the primary database as below. Add all the tables to be replicated to the publication. For your first replication, it's wise to start small - add some small tables and understand how long they take to replicate. Here, we're creating a publication prod_db_pub:
CREATE PUBLICATION prod_db_pub FOR TABLE table1, table2, table3, table_etc, schema_migrations;
On the primary host, check your publication was created:
SELECT * FROM pg_publication;
On the replica, we'll create the subscription:
CREATE SUBSCRIPTION prod_db_sub CONNECTION 'dbname=prod_db host=<primary db ip> user=replicator password=<the password for replication user>' PUBLICATION prod_db_pub;
Check your subscription was created:
SELECT * FROM pg_subscription;
Check the log files for both primary and replica to monitor the state of replication. If you have replicated only a few tables, you can now update the publication:
ALTER PUBLICATION prod_db_pub ADD TABLE table4;
Once you've updated the publication, you must tell the subscribers to refresh:
ALTER SUBSCRIPTION prod_db_sub REFRESH PUBLICATION;
Again, watch the logs to identify any issues. You'll see messages logging the time the replication starts and finishes.
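You can also ask the replica directly which tables have finished their initial copy - srsubstate moves from i ( initialising ) through d ( copying data ) to r ( ready ):
-- On the replica: per-table state for the subscription
SELECT srrelid::regclass AS table_name, srsubstate FROM pg_subscription_rel;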
Sequences: A Fly in the Ointment
If you're using auto-incrementing integers as primary keys, their underlying sequences are not replicated to replica hosts—only the data is. This becomes a problem when you promote a replica to primary and start writing to it.
What happens: The replica server's sequences still have their initial values (usually 1), while the actual table data may have IDs in the millions. When you INSERT new records, PostgreSQL tries to use these outdated sequence values, causing:
- Rails/application errors about duplicate primary keys
- Inserts failing with "duplicate key value violates unique constraint" errors
- Potential data conflicts if writes accidentally succeed
When to fix this: Update sequences immediately before or after promoting a replica to primary. The cutover should be fast to minimize the window for conflicts.
The fix: Reset each sequence to continue from the table's maximum ID value:
SELECT SETVAL('table1_id_seq', (SELECT MAX(id) FROM table1));
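Doing that table by table gets tedious with a big schema. Here's a sketch that resets every column-owned sequence ( serial and identity columns ) to its table's current maximum in one pass - worth testing on a copy of your data before relying on it:
-- Reset every column-owned sequence to the owning table's MAX value
DO $$
DECLARE
  rec record;
BEGIN
  FOR rec IN
    SELECT seq.oid::regclass AS seq_name,
           tbl.oid::regclass AS table_name,
           att.attname       AS column_name
    FROM pg_class seq
    JOIN pg_depend dep ON dep.classid = 'pg_class'::regclass
                      AND dep.objid = seq.oid
                      AND dep.refclassid = 'pg_class'::regclass
                      AND dep.deptype IN ('a', 'i')
    JOIN pg_class tbl ON tbl.oid = dep.refobjid
    JOIN pg_attribute att ON att.attrelid = tbl.oid AND att.attnum = dep.refobjsubid
    WHERE seq.relkind = 'S'
  LOOP
    EXECUTE format('SELECT setval(%L, COALESCE((SELECT MAX(%I) FROM %s), 1))',
                   rec.seq_name, rec.column_name, rec.table_name);
  END LOOP;
END;
$$;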
A Possible Fix
After performing migrations like this several times, I've found two ways to avoid the sequence problem:
- Set the ID sequences on the replica to a value far above the current maximum, so that new records on the primary won't conflict with new records on the replica ( see the sketch after this list )
- Pros: Simple, immediate fix
- Cons: Wastes ID space, requires careful coordination, temporary solution
- Switch from integer to UUID or ULID primary keys which generate unique values without coordination
- Pros: No sequence conflicts ever, works perfectly with distributed systems
- Cons: Requires substantial migration effort, larger indexes (16 bytes vs 4-8 bytes), slightly slower joins
The UUID approach is the "correct" long-term solution for distributed databases, but the sequence adjustment works fine for planned failovers.
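A minimal sketch of the first option - the one-million offset is arbitrary, just pick a gap comfortably larger than the number of rows the primary will insert during the migration window:
-- On the replica, before it starts taking writes
SELECT SETVAL('table1_id_seq', (SELECT MAX(id) FROM table1) + 1000000);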
Monitoring
I created a script which would monitor the replication lag between the primary and the replica. This script would post the lag to Booko's monitoring system, UptimeKuma.
#!/bin/bash
# Get the lag from PostgreSQL
lag=$(psql -U booko booko_production --tuples-only -c "SELECT (pg_current_wal_lsn() - confirmed_flush_lsn) AS lsn_distance FROM pg_replication_slots WHERE slot_name='sub_from_db02';" | tr -d ' ')
# Convert to KB and remove decimal places
lag_kb=$(echo "scale=0; $lag / 1024" | bc)
# echo "Posting lag: $lag_kb"
# Construct the URL
url="https://status.booko.info/api/push/123ABC?status=up&msg=OK&ping=${lag_kb}"
# Send the request
# echo "Post to $url"
curl -s "$url"
The Migration
Getting a PostgreSQL replica running in Sydney was the first step to migrating a Rails app like Booko. The primary database was ~200GB at the time. I started by adding the small tables to the publication on the primary and monitored the lag. I added the last and largest of the tables late in the evening, when there'd be fewer updates, and let it run overnight. The risk, as mentioned, is that the source database will run out of space storing the WAL files. For Booko this was no problem, and we didn't come close to needing more space.
If you do run out of space, the ideal approach is to drop the subscription from the replica. If the replica is unavailable, you can drop the replication slots on the primary to have old WAL files deleted immediately. Naturally, you'll need to start replication again after cleaning up the replica.
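The cleanup commands are short - the names here match the earlier examples, since the slot created by CREATE SUBSCRIPTION defaults to the subscription name:
-- On the replica: stop replicating; this also drops the slot on the primary
DROP SUBSCRIPTION prod_db_sub;
-- On the primary, only if the replica is gone and its slot was left behind
SELECT pg_drop_replication_slot('prod_db_sub');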
The Redis and Rails side of the migration is beyond the scope of this post, but the truncated version is that I built a new load balancer in the Sydney location and pointed it to the backend servers still in Fremont.
I use Caddy as the load balancer - switching DNS over to the new load balancer, then restarting Caddy and waiting for it to acquire new Let's Encrypt SSL certificates, was the riskiest step. It took just a few seconds and I did it during a quiet period. You must prepare for this step by lowering the TTL ( Time To Live ) of your DNS records - you need clients to switch over to the new load balancer quickly, and you want Let's Encrypt to issue the certificates quickly, without waiting around for old DNS records to expire.
As I built servers in Sydney, I added them to the new load balancer, but commented out. The final cutover consisted of updating the load balancer to use the new Sydney Rails servers ( which were connected to the Sydney replica ), commenting out the Fremont servers, and quickly updating the primary key sequences. Dropping the subscription happened last, which effectively promoted the replica to a primary. And that was that. Much planning, several checklists and dry runs, and a migration performed across continents with just a few seconds of downtime.
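For the curious, the Caddy side of that cutover amounts to swapping the upstream list in the reverse proxy - a sketch with placeholder hostnames, not Booko's real config:
# Caddyfile on the Sydney load balancer
booko.example {
    reverse_proxy syd-app1:3000 syd-app2:3000
    # reverse_proxy fremont-app1:3000 fremont-app2:3000   # old Fremont backends, removed at cutover
}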
The beauty of this approach is there's no rush - the migration stretched over several days because it could be. The replica quietly synchronized in the background while production continued normally in California. This removed the pressure of a high-stakes, time-boxed migration window.
Final thoughts
PostgreSQL's logical replication, paired with Tailscale's mesh VPN networking, is the perfect combination for migrating between regions or between hosting providers. I continue to host all internal Booko services inside Tailscale - the PostgreSQL servers for SolidCache / SolidQueue / SolidCable, Redis, Dragonfly, beanstalkd, Meilisearch - every service Booko needs is hosted within Tailscale, invisible to the internet and easily accessible to each other.
Booko can easily stretch between hosting providers. Want to hire a whole physical server from Hetzner for a few days or permanently? Add it to your tailnet and it's there, for every other host to access. It makes migrations like this so much easier.