Zero-Downtime Database Migrations in Financial Systems
Techniques for migrating production databases without service interruption in mission-critical financial applications
In financial systems, downtime means lost revenue and regulatory risk. A trading platform processing billions in transactions can't afford a maintenance window. This post shares our battle-tested strategies for zero-downtime database migrations.
Traditional migration approach:

```text
# ❌ Don't do this in production
1. Stop application
2. Run migrations
3. Start application
```

Downtime: 5-30 minutes
Revenue impact: ~$500k per hour for large platforms
Regulatory risk: trading halts trigger regulatory scrutiny
Core principle: Deploy in phases, never break backward compatibility
```text
Phase 1: Expand (add new schema)
Phase 2: Migrate (copy data)
Phase 3: Contract (remove old schema)
```

Each phase is a separate deployment with full backward compatibility.
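To make the ordering concrete, here is a minimal sketch (a hypothetical helper, not a real library) of a deployment pipeline that encodes the rule: the contract phase may only run once no old application versions are still live.

```javascript
// Hypothetical sketch: gate each migration phase on deployment state.
const phases = [
  { name: 'expand',   sql: 'ALTER TABLE orders ADD COLUMN execution_venue VARCHAR(50) NULL;' },
  { name: 'migrate',  sql: "UPDATE orders SET execution_venue = 'NASDAQ' WHERE execution_venue IS NULL;" },
  { name: 'contract', sql: 'ALTER TABLE orders ALTER COLUMN execution_venue SET NOT NULL;' }
];

// Only allow the contract phase once no old app versions remain live.
function nextAllowedPhase(completed, oldVersionsLive) {
  const next = phases[completed.length];
  if (!next) return null;
  if (next.name === 'contract' && oldVersionsLive > 0) {
    return null; // wait: old code would break under the new constraint
  }
  return next.name;
}

console.log(nextAllowedPhase([], 3));                    // 'expand'
console.log(nextAllowedPhase(['expand', 'migrate'], 3)); // null (old code still live)
console.log(nextAllowedPhase(['expand', 'migrate'], 0)); // 'contract'
```

The gate mirrors the backward-compatibility rule: expanding is always safe, contracting only after every consumer has moved on.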
```sql
-- This breaks running code immediately
ALTER TABLE orders ADD COLUMN execution_venue VARCHAR(50) NOT NULL;
```

Phase 1: Add nullable column
```sql
-- Migration v1
ALTER TABLE orders ADD COLUMN execution_venue VARCHAR(50) NULL;

-- Old code continues working (ignores new column)
-- New code can start using it
```

Phase 2: Backfill data
```sql
-- Migration v2 (run during low traffic)
UPDATE orders
SET execution_venue = 'NYSE'
WHERE execution_venue IS NULL
  AND symbol IN (SELECT symbol FROM nyse_listings);

UPDATE orders
SET execution_venue = 'NASDAQ'
WHERE execution_venue IS NULL;
```

Phase 3: Add constraint
```sql
-- Migration v3 (after all old code is retired)
ALTER TABLE orders
ALTER COLUMN execution_venue SET NOT NULL;

CREATE INDEX idx_orders_venue ON orders(execution_venue);
```

Renaming a column is especially tricky: old and new code must coexist.
Phase 1: Add the new column and keep it in sync with a trigger
```sql
-- Migration v1
ALTER TABLE trades ADD COLUMN execution_timestamp TIMESTAMPTZ;

-- Update the new column whenever the old column is written
CREATE OR REPLACE FUNCTION sync_execution_time()
RETURNS TRIGGER AS $$
BEGIN
  IF NEW.execution_time IS NOT NULL THEN
    NEW.execution_timestamp = NEW.execution_time;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_execution_time_trigger
BEFORE INSERT OR UPDATE ON trades
FOR EACH ROW
EXECUTE FUNCTION sync_execution_time();

-- Backfill existing data
UPDATE trades
SET execution_timestamp = execution_time
WHERE execution_timestamp IS NULL;
```

Phase 2: Update application code
```javascript
// Old code (still running)
const trade = await db.query(
  'INSERT INTO trades (symbol, quantity, execution_time) VALUES ($1, $2, $3)',
  ['AAPL', 100, new Date()]
);

// New code (gradual rollout)
const trade = await db.query(
  'INSERT INTO trades (symbol, quantity, execution_timestamp) VALUES ($1, $2, $3)',
  ['AAPL', 100, new Date()]
);
```
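During the rollout window, which column the application writes can be gated behind a feature flag, so the switch becomes a configuration change rather than a deploy. A minimal sketch (the helper and flag are hypothetical; the sync trigger keeps the other column consistent either way):

```javascript
// Hypothetical feature flag controlling which timestamp column we write.
function buildTradeInsert(useNewColumn) {
  const column = useNewColumn ? 'execution_timestamp' : 'execution_time';
  return `INSERT INTO trades (symbol, quantity, ${column}) VALUES ($1, $2, $3)`;
}

console.log(buildTradeInsert(false));
// INSERT INTO trades (symbol, quantity, execution_time) VALUES ($1, $2, $3)
console.log(buildTradeInsert(true));
// INSERT INTO trades (symbol, quantity, execution_timestamp) VALUES ($1, $2, $3)
```

Flipping the flag per instance also makes rollback instant if the new column misbehaves.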
Phase 3: Remove old column
```sql
-- Migration v2 (after all old code replaced)
DROP TRIGGER sync_execution_time_trigger ON trades;
DROP FUNCTION sync_execution_time();
ALTER TABLE trades DROP COLUMN execution_time;
```

Changing a column's type is even more complex: it requires data transformation.
Current: `price DECIMAL(10,2)`
Needed: `price DECIMAL(18,8)` (for crypto trading)
Phase 1: Shadow column
```sql
-- Migration v1
ALTER TABLE orders ADD COLUMN price_precise DECIMAL(18,8);

-- Dual-write trigger
CREATE OR REPLACE FUNCTION sync_price()
RETURNS TRIGGER AS $$
BEGIN
  IF NEW.price IS NOT NULL THEN
    NEW.price_precise = NEW.price::DECIMAL(18,8);
  END IF;
  IF NEW.price_precise IS NOT NULL THEN
    NEW.price = NEW.price_precise::DECIMAL(10,2);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_price_trigger
BEFORE INSERT OR UPDATE ON orders
FOR EACH ROW
EXECUTE FUNCTION sync_price();

-- Backfill
UPDATE orders SET price_precise = price::DECIMAL(18,8);
```

Phase 2: Switch reads
```javascript
// Old code
const order = await db.query('SELECT id, symbol, price FROM orders');

// New code
const order = await db.query(
  'SELECT id, symbol, price_precise AS price FROM orders'
);
```
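Before cutting reads over, it is worth running a "dark read" that fetches both columns and flags any divergence beyond the old precision. A sketch, assuming rows shaped like the query results above (column names per the migration; the helper itself is hypothetical):

```javascript
// Compare the legacy 2-decimal price with the precise shadow column.
// Any difference larger than the tolerance (in cents) indicates the
// dual-write trigger or backfill is out of sync.
function findPriceMismatches(rows, toleranceCents = 0.5) {
  return rows.filter(({ price, price_precise }) =>
    Math.abs(Number(price) - Number(price_precise)) * 100 > toleranceCents
  );
}

const rows = [
  { id: 1, price: '101.25', price_precise: '101.25000000' },
  { id: 2, price: '99.10',  price_precise: '99.20000000' }  // out of sync
];
console.log(findPriceMismatches(rows).map(r => r.id)); // [ 2 ]
```

Running this on a sample of production rows before Phase 3 gives confidence that dropping the old column loses nothing.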
Phase 3: Swap columns
```sql
-- Migration v2
DROP TRIGGER sync_price_trigger ON orders;
DROP FUNCTION sync_price();

ALTER TABLE orders DROP COLUMN price;
ALTER TABLE orders RENAME COLUMN price_precise TO price;

-- Recreate indexes
CREATE INDEX idx_orders_price ON orders(price);
```

Index creation can lock tables for hours on large datasets.
```sql
-- ❌ Locks the table for ~2 hours on a 1B-row table
CREATE INDEX idx_trades_timestamp ON trades(execution_timestamp);
```
```sql
-- ✅ No table lock (builds in the background)
CREATE INDEX CONCURRENTLY idx_trades_timestamp
ON trades(execution_timestamp);

-- Monitor progress
SELECT
  schemaname,
  relname,
  indexrelname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_trades_timestamp';
```

Caveat: CONCURRENTLY requires more disk space and time, but it doesn't block writes. It also can't run inside a transaction, and a failed build leaves behind an INVALID index that must be dropped and retried.
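Because a failed `CREATE INDEX CONCURRENTLY` leaves an invalid index behind (`pg_index.indisvalid = false`), a migration runner should verify validity before moving on. A sketch, assuming a node-postgres-style client whose `query` resolves to `{ rows }` (the helper itself is hypothetical):

```javascript
// Returns true only if the index exists and PostgreSQL marks it valid.
// A failed CONCURRENTLY build leaves indisvalid = false; such an index
// must be dropped and rebuilt before the migration proceeds.
async function indexIsValid(client, indexName) {
  const { rows } = await client.query(
    `SELECT i.indisvalid
       FROM pg_index i
       JOIN pg_class c ON c.oid = i.indexrelid
      WHERE c.relname = $1`,
    [indexName]
  );
  return rows.length === 1 && rows[0].indisvalid === true;
}

// Illustration with a stub client (no database required):
const stub = { query: async () => ({ rows: [{ indisvalid: false }] }) };
indexIsValid(stub, 'idx_trades_timestamp').then(ok => console.log(ok)); // false
```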
Backfilling 1 billion rows requires careful batching.
```sql
-- ❌ Locks rows for hours and fills the transaction log
UPDATE trades
SET venue_id = (SELECT id FROM venues WHERE name = venue_name);
```
```sql
-- ✅ Migration script: batched backfill
DO $$
DECLARE
  batch_size INTEGER := 10000;
  total_rows BIGINT;
  processed BIGINT := 0;
BEGIN
  SELECT COUNT(*) INTO total_rows FROM trades WHERE venue_id IS NULL;

  WHILE processed < total_rows LOOP
    UPDATE trades
    SET venue_id = v.id
    FROM venues v
    WHERE trades.venue_name = v.name
      AND trades.venue_id IS NULL
      AND trades.id IN (
        SELECT id FROM trades
        WHERE venue_id IS NULL
        LIMIT batch_size
      );

    processed := processed + batch_size;

    -- Log progress
    RAISE NOTICE 'Processed % / % rows (% %%)',
      processed, total_rows, (processed * 100 / total_rows);

    -- Prevent transaction log bloat (COMMIT inside DO requires
    -- PostgreSQL 11+ and must not run inside an outer transaction)
    COMMIT;

    -- Throttle to avoid overwhelming the database
    PERFORM pg_sleep(0.1);
  END LOOP;
END $$;
```
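The same batching can be driven from the application instead, which works on any PostgreSQL version and makes throttling and retry logic easier to express. A sketch, assuming a node-postgres-style `pool.query` that resolves to `{ rowCount }` (the driver function is hypothetical):

```javascript
// Drive the backfill in bounded batches; each UPDATE runs in its own
// implicit transaction, keeping the lock and WAL footprint small.
async function backfillVenueIds(pool, { batchSize = 10000, pauseMs = 100 } = {}) {
  let total = 0;
  for (;;) {
    const { rowCount } = await pool.query(
      `UPDATE trades SET venue_id = v.id
         FROM venues v
        WHERE trades.venue_name = v.name
          AND trades.venue_id IS NULL
          AND trades.id IN (
            SELECT id FROM trades WHERE venue_id IS NULL LIMIT $1
          )`,
      [batchSize]
    );
    total += rowCount;
    if (rowCount === 0) break; // nothing left that can be matched
    await new Promise(resolve => setTimeout(resolve, pauseMs)); // throttle
  }
  return total;
}
```

Stopping when a batch updates zero rows also handles trades whose `venue_name` matches no venue: they remain NULL for manual review rather than looping forever.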
Adding foreign keys can also lock tables.
```sql
-- Phase 1: Add the constraint NOT VALID (doesn't scan the table)
ALTER TABLE orders
ADD CONSTRAINT fk_orders_accounts
FOREIGN KEY (account_id) REFERENCES accounts(id)
NOT VALID;

-- Phase 2: Validate in the background (doesn't block writes)
ALTER TABLE orders
VALIDATE CONSTRAINT fk_orders_accounts;
```

Track migrations carefully:
```sql
CREATE TABLE schema_migrations (
  version VARCHAR(50) PRIMARY KEY,
  description TEXT,
  applied_at TIMESTAMPTZ DEFAULT NOW(),
  applied_by VARCHAR(100),
  execution_time INTERVAL,
  rollback_script TEXT
);

-- Record each migration
INSERT INTO schema_migrations (version, description, rollback_script)
VALUES (
  '2024_11_01_001',
  'Add execution_venue column to orders',
  'ALTER TABLE orders DROP COLUMN execution_venue;'
);
```
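On the application side, the runner can consult this table so each migration is applied exactly once and its rollback script is stored up front. A minimal sketch (hypothetical runner, node-postgres-style client):

```javascript
// Apply a migration only if its version is not yet recorded.
async function applyMigration(client, { version, description, upSql, downSql }) {
  const { rows } = await client.query(
    'SELECT 1 FROM schema_migrations WHERE version = $1', [version]
  );
  if (rows.length > 0) return 'skipped'; // already applied

  await client.query('BEGIN');
  try {
    await client.query(upSql);
    // Store the rollback script alongside the version record.
    await client.query(
      `INSERT INTO schema_migrations (version, description, rollback_script)
       VALUES ($1, $2, $3)`,
      [version, description, downSql]
    );
    await client.query('COMMIT');
    return 'applied';
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}
```

Recording the migration in the same transaction as the DDL means a crash mid-migration never leaves the ledger and the schema out of step (for DDL that is transactional, which excludes CONCURRENTLY builds).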
Rehearse migrations against a production-scale copy:

```sql
-- Create a test table of comparable size
CREATE TABLE trades_test AS
SELECT * FROM trades LIMIT 10000000;

-- Time the migration
\timing on
ALTER TABLE trades_test ADD COLUMN new_column VARCHAR(50);
-- Output: Time: 45231.234 ms (45 seconds)
```
```javascript
// Simulation test
describe('Migration compatibility', () => {
  it('old code works with new schema', async () => {
    // Apply the new schema
    await runMigration('2024_11_01_001');

    // Run old code
    const result = await oldVersionCode.createOrder({
      symbol: 'AAPL',
      quantity: 100
    });

    expect(result).toBeDefined();
  });

  it('new code works with old schema', async () => {
    // Don't apply the migration

    // Run new code (should handle the missing column)
    const result = await newVersionCode.createOrder({
      symbol: 'AAPL',
      quantity: 100,
      executionVenue: 'NYSE'
    });

    expect(result).toBeDefined();
  });
});
```
Always have a rollback plan:
```sql
-- Migration up
-- v2024_11_01_001_up.sql
ALTER TABLE orders ADD COLUMN execution_venue VARCHAR(50);

-- Migration down
-- v2024_11_01_001_down.sql
ALTER TABLE orders DROP COLUMN execution_venue;
```

For data migrations, store the original values:
```sql
-- Before updating
CREATE TABLE orders_backup_20241101 AS
SELECT id, original_column FROM orders;

-- If rollback is needed
UPDATE orders o
SET original_column = b.original_column
FROM orders_backup_20241101 b
WHERE o.id = b.id;
```
Monitor the database during and after each migration:

```sql
-- Check migration progress
SELECT
  schemaname,
  relname,
  n_tup_upd,
  n_tup_hot_upd,
  n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Monitor locks
SELECT
  pid,
  usename,
  pg_blocking_pids(pid) AS blocked_by,
  query
FROM pg_stat_activity
WHERE datname = 'trading_db'
  AND state = 'active';

-- Check replication lag (if using replicas)
SELECT
  client_addr,
  state,
  sync_state,
  replay_lag
FROM pg_stat_replication;
```
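During long backfills, the lag check can be automated so batches pause whenever replicas fall behind. A sketch, assuming a node-postgres-style client and PostgreSQL 10+'s `replay_lag` (the helper functions are hypothetical):

```javascript
// Worst replica replay lag in seconds (0 if there are no replicas).
async function maxReplayLagSeconds(client) {
  const { rows } = await client.query(
    `SELECT COALESCE(MAX(EXTRACT(EPOCH FROM replay_lag)), 0) AS lag
       FROM pg_stat_replication`
  );
  return Number(rows[0].lag);
}

// Block until every replica is within maxLag seconds of the primary.
async function waitForReplicas(client, maxLag = 5, pollMs = 1000) {
  while (await maxReplayLagSeconds(client) > maxLag) {
    await new Promise(resolve => setTimeout(resolve, pollMs));
  }
}
```

Calling `waitForReplicas` between batches keeps read replicas serving fresh data while the primary churns through the backfill.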
Challenge: Migrate 800M trades to a new schema with additional columns
Approach:
```text
Day 1:    Add nullable columns (5 minutes)
Day 2-14: Backfill in batches (100k rows/minute, off-peak hours)
Day 15:   Deploy new application code
Day 16:   Add NOT NULL constraints (10 minutes)
Day 30:   Remove old columns (5 minutes)
```
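A quick sanity check on that backfill window (assuming the batches run only during off-peak hours on days 2-14):

```javascript
// 800M rows at 100k rows/minute, spread across 13 off-peak nights.
const totalRows = 800_000_000;
const rowsPerMinute = 100_000;
const totalHours = totalRows / rowsPerMinute / 60; // ≈ 133 hours of backfill
const offPeakNights = 13;
const hoursPerNight = totalHours / offPeakNights;  // ≈ 10.3 hours per night
console.log(totalHours.toFixed(0), hoursPerNight.toFixed(1)); // 133 10.3
```

Roughly ten hours of batching per night is feasible for most markets' overnight windows, which is why the schedule stretches across two weeks.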
Result: zero downtime.

Key practices:
✅ Never deploy breaking changes
✅ Always deploy schema changes before code changes
✅ Use batched updates for large datasets
✅ Create indexes concurrently
✅ Add constraints as NOT VALID first
✅ Test with production data volumes
✅ Monitor locks and replication lag
✅ Have rollback scripts ready
✅ Document each migration
✅ Validate backward compatibility
Zero-downtime migrations are possible with careful planning. The key is maintaining backward compatibility at every step: it requires more migrations and a longer timeline, but it eliminates the business risk of a maintenance window.