Pitr on Tarragon

MySQL PITR + Backup Strategy: backup is not "copying data", it is the ability to restore to any point in time

Tue, 19 May 2026 00:00:00 +0000

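This is the implementation-layer deep article under the MySQL overview. The overview already covers where MySQL sits in the OLTP spectrum; this article focuses on backup and PITR: not "copying data", but the ability to restore to any point in time.

"We run mysqldump once a day and put it on S3, so we're fine, right?" is a common mistake. Ask "can we restore to 5 minutes ago?" and the answer is no. A dump-based backup can only restore to the instant the dump was taken; every write between that dump and the incident is lost, and the next dump does not bring it back.

A real backup strategy is PITR (point-in-time recovery):

Can restore to any past point in time (RPO depends on how often the binlog is flushed and shipped, and can approach 0)
Consists of a full backup baseline plus a continuous binlog stream (the incremental delta from the backup point to the target time)
Restore process: restore the full backup first, then apply binlogs up to the target timestamp or GTID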
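
The composition above (one full backup plus the binlogs that follow it) can be sketched as a small planning helper. This is a minimal illustration, not part of any MySQL tool: `plan_restore`, the tuple layout, and the sample names are assumptions made for the example.

```python
from datetime import datetime

def plan_restore(full_backups, binlogs, target):
    """Pick the newest full backup at or before `target`, plus the binlog
    files needed to roll forward from that backup to `target`.

    full_backups: list of (name, finished_at)
    binlogs:      list of (name, first_event_at, last_event_at), in order
    """
    candidates = [b for b in full_backups if b[1] <= target]
    if not candidates:
        raise ValueError("no full backup exists before the target time")
    base_name, base_time = max(candidates, key=lambda b: b[1])
    # A binlog is needed if its time range overlaps [base_time, target].
    needed = [name for name, first, last in binlogs
              if last >= base_time and first <= target]
    return base_name, needed
```

In a real runbook the timestamps would come from the backup catalog and from the binlog headers; the helper only shows the selection rule.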

This deep article breaks backup down into that capability, then walks through the toolchain and engineering discipline needed to achieve it.

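Three layers of backup responsibility

PITR is delivered by three layers of engineering responsibility; if any layer fails, PITR does not hold:

Layer 1: Full Backup (baseline)
   ↓     (mysqldump / XtraBackup / MyDumper / LVM snapshot / EBS snapshot)
   ↓
Layer 2: Binlog Stream (incremental)
   ↓     (sync_binlog=1 + binlogs continuously streamed to backup storage)
   ↓
Layer 3: Restore + Replay procedure
         (can restore the full backup and apply binlogs up to the target time)

Having the backups at each layer is not enough: only a tested restore procedure counts as having a backup. "Dump on S3" plus "no verified restore" = no backup.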

Tool 1: mysqldump (logical backup, most portable, slowest)

# --master-data is named --source-data from MySQL 8.0.26 onward
mysqldump --single-transaction --master-data=2 --set-gtid-purged=ON \
  --triggers --routines --events \
  --all-databases > full-backup.sql

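Output: SQL statements in plain text; can be grepped and edited.

Trade-offs:

Pros: works across MySQL versions (a 5.7 dump also loads into 8.0), across clouds and OSes; can dump a subset of tables
Cons: very slow (restore rebuilds the whole DB by executing SQL); unsuitable for large DBs (> 100 GB), where restore takes hours or more
--single-transaction: InnoDB only; takes a consistent snapshot under REPEATABLE READ without locking tables

Good fit:

DBs < 100 GB
Schema dumps (migrations, cloning a DB for dev)
Cross-version migration
PITR baseline when paired with binlogs

Poor fit:

DBs around 500 GB and up (restore runs for days)
High-throughput production (the dump holds an MVCC read view while it runs, which bloats undo history)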

Tool 2: Percona XtraBackup (physical backup, fast, the production standard)

# --slave-info / --safe-slave-backup apply when the backup is taken from a replica
xtrabackup --backup --target-dir=/backup/full-2026-05-19 \
  --user=backup --password=... \
  --slave-info --safe-slave-backup
# Prepare (apply the internal redo log so the backup becomes restorable)
xtrabackup --prepare --target-dir=/backup/full-2026-05-19

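Output: a binary copy of the InnoDB data files.

Trade-offs:

Pros: very fast (copies files directly, no SQL execution); suits TB-scale DBs; restore time is roughly the file copy time
Cons: tied to the MySQL version (XtraBackup 8.0 cannot restore a 5.7 backup); storage-engine limits (non-blocking hot backup is an InnoDB feature)
Incremental backup support: based on the LSN (log sequence number), copies only changed pages

Incremental flow: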

# Day 1: Full backup
xtrabackup --backup --target-dir=/backup/full-day1

# Day 2: Incremental (only changes since day 1)
xtrabackup --backup --target-dir=/backup/inc-day2 \
  --incremental-basedir=/backup/full-day1

# Restore: prepare the full backup, then apply the incremental on top of it
xtrabackup --prepare --apply-log-only --target-dir=/backup/full-day1
xtrabackup --prepare --apply-log-only --target-dir=/backup/full-day1 \
  --incremental-dir=/backup/inc-day2
xtrabackup --prepare --target-dir=/backup/full-day1
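
With more than one incremental, the prepare order matters: the full backup first, each incremental in sequence, and a final pass without --apply-log-only. The sketch below only builds the command lists for the flow shown above; `prepare_commands` is a hypothetical helper, and the flags are the ones already used in this article.

```python
def prepare_commands(full_dir, incremental_dirs):
    """Return the xtrabackup --prepare commands for a full backup plus an
    ordered list of incrementals, mirroring the incremental flow above."""
    cmds = [["xtrabackup", "--prepare", "--apply-log-only",
             f"--target-dir={full_dir}"]]
    for inc in incremental_dirs:
        # Each incremental is merged into the full backup directory, oldest first.
        cmds.append(["xtrabackup", "--prepare", "--apply-log-only",
                     f"--target-dir={full_dir}",
                     f"--incremental-dir={inc}"])
    # Final pass without --apply-log-only rolls back uncommitted transactions.
    cmds.append(["xtrabackup", "--prepare", f"--target-dir={full_dir}"])
    return cmds
```

Each inner list can be passed to `subprocess.run(..., check=True)` so that a failed step stops the chain instead of producing a half-prepared backup.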

Good fit:

Production DBs > 100 GB
Daily incremental + weekly full (a typical enterprise schedule)
Moving self-managed MySQL to the cloud (XtraBackup + rsync, then restore on the cloud side)

Poor fit:

Schema-only dumps (mysqldump is simpler)
Restores across major versions

Tool 3: MyDumper (parallel logical backup)

mydumper --user=backup --password=... \
  --threads=8 --rows=100000 \
  --outputdir=/backup/mydumper-2026-05-19 \
  --less-locking

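Output: one schema .sql file per table plus multiple chunked data files (.sql by default, or .dat with --load-data).

Trade-offs:

Pros: parallel dump (per-table threads), 5-10x faster than mysqldump, resumable after an interruption
Cons: tooling is less widespread than mysqldump; needs a separate install
The matching myloader restore is also parallel, 5-10x faster than a mysqldump restore

Good fit:

The 100 GB - 1 TB range
Mid-sized production that wants the readability of a logical backup plus parallel speed-up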

Tool 4: LVM / EBS Snapshot (physical, file-system layer)

# 1. Freeze MySQL (pause writes); keep this session open, the lock is released when it closes
mysql> FLUSH TABLES WITH READ LOCK;
# 2. Trigger the snapshot (EBS / LVM) from another shell
aws ec2 create-snapshot --volume-id vol-xxx --description "mysql-2026-05-19"
# 3. Unfreeze
mysql> UNLOCK TABLES;

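Trade-offs:

Pros: extremely fast (file-system layer); suits VM-based MySQL (EC2 / on-prem)
Cons: writes must pause (a short lock); not portable across OSes / clouds
AWS RDS automated backups take this storage-snapshot route; Aurora backs up continuously at its own storage layer

Good fit:

AWS RDS / Aurora (automatic)
Self-managed MySQL on EC2 with EBS (EBS snapshot combined with a MySQL freeze)
Large DBs that want fast backup + fast restore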

Binlog-based PITR

A full backup only becomes PITR once binlogs are added. The binlog is the shared source for MySQL replication, CDC, and PITR.

Configuration:

[mysqld]
log_bin = mysql-bin
binlog_format = ROW                  # ROW is required
binlog_row_image = FULL              # full row image
sync_binlog = 1                      # fsync the binlog on every commit (zero loss)
binlog_expire_logs_seconds = 1209600 # 14-day retention (tune as needed)
gtid_mode = ON                       # needed for GTID-based PITR, which identifies transactions by GTID
enforce_gtid_consistency = ON

Binlog backup:

# Continuously stream binlogs to backup storage
mysqlbinlog --read-from-remote-server --raw --stop-never \
  --user=replication --password=... \
  --host=primary.example.com \
  --result-file=/backup/binlog/ mysql-bin.000001 &

--read-from-remote-server + --stop-never keeps tailing the binlog from the primary and streams it into the backup directory without interruption. When a binlog file fills up, it is closed and a new file is opened.

Restore + PITR flow

Full PITR flow (restore to 2026-05-19 14:30:00):

# Step 1: Restore the full backup (already prepared; MySQL stopped, datadir empty)
xtrabackup --copy-back --target-dir=/backup/full-2026-05-18  # previous day's full
# Fix datadir ownership for the mysql user before starting the server

# Step 2: Start MySQL (it comes up with the GTID set captured at backup time)
systemctl start mysqld

# Step 3: Check the GTID set at the end of the full backup
# (xtrabackup_binlog_info in the backup directory records the binlog file, position and GTID set)
mysql> SHOW MASTER STATUS;
+------------------+----------+------------------------------------------+
| File             | Position | Executed_Gtid_Set                        |
+------------------+----------+------------------------------------------+
| mysql-bin.000150 |     1234 | server-uuid:1-12345                      |
+------------------+----------+------------------------------------------+

# Step 4: Apply binlogs from after the backup up to the target time
# (list every binlog needed, in order; "..." stands for the remaining files)
mysqlbinlog --start-datetime="2026-05-18 03:00:00" \
            --stop-datetime="2026-05-19 14:30:00" \
            /backup/binlog/mysql-bin.000150 \
            /backup/binlog/mysql-bin.000151 \
            ... \
            | mysql -u root -p

# Step 5: Verify that the GTID set matches the position for the target time
mysql> SHOW MASTER STATUS;
# Executed_Gtid_Set should include the transactions up to the target time

For precise GTID-based PITR (stop at a specific transaction rather than a timestamp):

mysqlbinlog --include-gtids='server-uuid:1-50000' \
            /backup/binlog/mysql-bin.000150 ... | mysql -u root -p

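5 production pitfalls

1. Inconsistent GTID handling: replication broken after restore

XtraBackup records the backup's binlog position and GTID set in xtrabackup_binlog_info (--slave-info additionally records the replication source coordinates when the backup is taken from a replica); mysqldump uses --set-gtid-purged=ON. If gtid_purged is not set correctly after restore, the replica hits a GTID gap error when it re-attaches.

Fix:

XtraBackup restore: run SET GLOBAL gtid_purged='...'; with the GTID set from xtrabackup_binlog_info
mysqldump: the dump file already contains SET @@GLOBAL.GTID_PURGED='...'; loading the dump sets it automatically
After restore, first verify that Executed_Gtid_Set lines up with what the source expects, then run START SLAVE (START REPLICA on MySQL 8.0.22+)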

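2. Binlog gap: a missing file in the middle breaks the restore

The binlog stream disconnects (network blip / disk full) while binlogs keep rotating, so mysql-bin.000156 never reaches backup storage. A PITR that replays across that file skips already-committed transactions, and the result is inconsistent data (no error is raised; it is silently incorrect).

Fix:

The binlog stream must be continuous; a disconnect → alert
Monitor binlog continuity in backup storage (consecutive file numbers, no gap)
Before restoring, verify binlog integrity: mysqlbinlog --verify-binlog-checksum mysql-bin.[0-9]* > /dev/null
Abort PITR on a missing binlog; do not continue with a partial restore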
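
The continuity check in the fix list can be sketched as follows. It assumes the files follow the `mysql-bin.NNNNNN` naming used throughout this article; `find_binlog_gaps` is a hypothetical helper name, and files that do not match the pattern (such as an index file) are ignored.

```python
import re

def find_binlog_gaps(file_names, prefix="mysql-bin"):
    """Return the binlog file names missing from an otherwise consecutive run."""
    pattern = re.compile(rf"^{re.escape(prefix)}\.(\d+)$")
    seqs = []
    for name in file_names:
        match = pattern.match(name)
        if match:
            seqs.append(int(match.group(1)))
    seqs.sort()
    missing = []
    for prev, cur in zip(seqs, seqs[1:]):
        # Any jump larger than 1 means files in between never arrived.
        missing.extend(range(prev + 1, cur))
    return [f"{prefix}.{n:06d}" for n in missing]
```

A non-empty result is a reason to alert and to refuse PITR across the gap, rather than something to patch over at restore time.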

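3. Backup never verified: restore turns out broken during a real incident

Backups succeed every day and storage holds 5 TB, but nobody has ever actually restored one. Only when an incident forces a restore does the team discover a corrupt backup file, a wrong GTID, or a binlog gap, and the whole setup is useless.

Fix:

Automated restore tests: run a full restore + PITR on a staging server weekly or monthly, then compare SELECT results against production
Verify that row counts after restore are close to production, and compare the main tables with CHECKSUM TABLE
That way the RTO holds no surprises in a real incident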

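4. The price of a sub-1-minute RPO

"I want RPO < 1 minute" sounds reasonable, but delivering it requires:

sync_binlog=1 (fsync on every commit; write throughput drops by roughly 10-30%, depending on workload and storage)
Binlog streaming to independent storage (not just the primary's local disk), plus cross-region replication (extra network cost)
Semi-sync replication to a replica as well (no binlog loss when the primary fails)
Monitoring + alerting on RPO violations (stream lag must stay < 1 minute)

TCO: up to ~30% write-throughput penalty + extra storage / network cost + 24x7 on-call. Consider the real RPO requirement: for most applications a 5-minute RPO is enough, and chasing a 1-minute RPO is not worth it.

Fix:

Confirm the real RPO requirement with product / business
RPO budget = write-throughput trade-off + ops cost; it is not free
Outsource the RPO problem to Aurora or another managed offering (continuous backup plus automatic cross-AZ storage; check the cluster's latest restorable time rather than assuming a fixed RPO)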

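5. Encryption key not backed up: data cannot be decrypted after restore

Once encryption at rest is enabled (MySQL 8.0+ default_table_encryption=ON + keyring plugin / component; MariaDB uses innodb_encrypt_tables), every InnoDB tablespace is encrypted. The master key lives in a keyring file or a KMS-backed component. If the backup covers only the MySQL data files and not the keyring, the restored data is encrypted with no key and cannot be read.

Fix:

Store the keyring file separately from the data files, but back up both
Replace the file-based keyring with a KMS-based one (AWS KMS / HashiCorp Vault), so the key does not sit on the MySQL server
Document the key recovery procedure in the disaster recovery runbook; do not assume that reinstalling MySQL is enough to decrypt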

Capacity planning essentials

Item	Recommendation
Full backup frequency	Weekly (XtraBackup) or daily (small DBs)
Incremental frequency	Daily (XtraBackup incremental)
Binlog retention	14 days (the PITR window)
Backup retention	Full × 4 weeks + monthly archive × 12 months
Storage cost	About 2-3x DB size (full + incremental + binlog)
Cross-region copy	Required (disaster recovery still works when the local backup fails)
Restore test frequency	Weekly on staging, monthly on a production-like environment

Integration with other modules

With Replication topology

A replica cannot replace a backup: a DROP TABLE on the primary is replicated too, so the data disappears on the replica as well. Backup is independent insurance. See Replication Topology.

With InnoDB Tuning

innodb_flush_log_at_trx_commit=1 + sync_binlog=1 is the backup-friendly setting (zero loss), at the cost of write throughput. If durability is relaxed for write throughput, the achievable RPO widens accordingly. See InnoDB Tuning.

With Aurora MySQL

Aurora outsources backup entirely: automatic continuous backup + PITR within the retention window, with no mysqldump / XtraBackup / binlog stream to manage. When migrating off Aurora, the self-managed backup chain has to be rebuilt. See migrate-to-aurora.

With PostgreSQL PITR

Dimension	MySQL PITR	PostgreSQL PITR
Logical backup	mysqldump / MyDumper	pg_dump / pg_dumpall
Physical backup	XtraBackup	pg_basebackup / pgBackRest
Incremental log	Binary log (binlog)	WAL (Write-Ahead Log)
Stream tool	mysqlbinlog --read-from-remote-server	pg_receivewal
PITR command	mysqlbinlog --stop-datetime	recovery.signal + recovery_target_* (recovery.conf before PG 12)
Identifier	GTID or file:position	LSN (Log Sequence Number)
Cross-version	mysqldump (portable)	pg_dump (portable)

The two systems share the same PITR concept (full backup + log replay); the tool names differ, the ideas map one to one. See PostgreSQL PITR + WAL Archiving.

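When to outsource backup

Scenario	Recommendation
AWS ecosystem + no appetite for backup ops	Aurora MySQL (built-in PITR)
GCP ecosystem	Cloud SQL (built-in PITR)
Azure ecosystem	Azure DB for MySQL
Multi-cloud + self-managed	XtraBackup + binlog stream + S3
Small scale, mysqldump acceptable	mysqldump cron + S3
Large scale, no cloud	Percona XtraBackup (or MySQL Enterprise Backup) + tape archive
Strict compliance (HIPAA / PCI-DSS)	Self-managed + air-gapped backup + audit trail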

PostgreSQL PITR + WAL archiving: the full chain from base backup to point-in-time recovery

Mon, 18 May 2026 00:00:00 +0000

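This is the implementation-layer deep article under the PostgreSQL overview. The overview already establishes backup / recovery as a required OLTP capability; this article focuses on the two-track data design behind PITR (Point-In-Time Recovery) and 5 production failure modes.

Problem scenario

A logical bug ships to production and is only noticed after running for 6 hours: a batch job has set user.email to NULL on 500,000 rows. At that point:

Restoring the latest daily backup (last night) → throws away all of today's legitimate writes (orders, sign-ups)
Promoting a standby → the standby has already replicated the bug and is in the same state as the primary
Rebuilding from application logs → some operations are irreversible (emails already sent)

PITR is the standard answer to this kind of logical disaster: instead of restoring to the backup time, restore to the moment just before the bug (for example, 1 minute earlier). It needs two tracks of data, a base backup and a WAL archive: the base backup is a snapshot, the WAL archive is every write after that snapshot, and recovery replays WAL up to a given timestamp / LSN / transaction ID.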

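Core concept: the two-track design of base backup + WAL archive

[Base backup t0]  +  [WAL archive t0 → now]
     ↓                       ↓
  full snapshot          incremental log
     ↓                       ↓
     └────── recover to t_target ──→ [restored cluster at t_target]

The two tracks are independent but must line up:

Base backup: a snapshot of the whole data dir at one moment. pg_basebackup / pgBackRest / WAL-G all produce one; it usually runs daily or weekly
WAL archive: every WAL segment after the base backup is pushed to external storage (S3 / GCS / NFS). archive_command triggers the push, and PostgreSQL only recycles a segment once archiving has succeeded

Together they determine RPO (recovery point objective):

RPO ≈ WAL archive frequency (streaming is near real time; with archive_command alone, archive_timeout bounds the wait. It defaults to 0, i.e. disabled, so set it explicitly, for example to 60 seconds)
RPO is not the base backup frequency: daily base backup + WAL archived every minute → RPO of 1 minute

RTO (recovery time objective) depends on base backup size + the amount of WAL to replay:

Restoring the base backup ~ 1-4 hours (TB scale)
WAL replay time ~ accumulated archive volume / replay throughput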
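
The RTO relationship above is simple arithmetic, sketched here so the inputs are explicit. `estimate_rto_hours` and its parameter names are illustrative assumptions; real replay throughput has to be measured in a restore drill.

```python
def estimate_rto_hours(base_restore_hours, wal_gb_to_replay, replay_gb_per_hour):
    """RTO ~ base backup restore time + (WAL volume / replay throughput)."""
    if replay_gb_per_hour <= 0:
        raise ValueError("replay throughput must be positive")
    return base_restore_hours + wal_gb_to_replay / replay_gb_per_hour
```

For example, a 2-hour base restore followed by 50 GB of WAL replayed at 25 GB/hour gives an estimate of 4 hours, which is why frequent base backups shorten RTO even though they do not change RPO.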

Step-by-step configuration

Primary: set up archive_command

# postgresql.conf
wal_level = replica                          # replica is the default; PITR needs it
archive_mode = on                            # enable archiving
archive_command = 'wal-g wal-push %p'        # or pgBackRest / a custom script
archive_timeout = 60                         # force a segment switch after 60s
max_wal_size = 4GB
checkpoint_timeout = 15min

archive_command must return exit code 0 to count as success; on a non-zero code PostgreSQL retries, and while retries keep failing WAL piles up in pg_wal until the disk is full. Critical: archive_command must never be written so that it fails silently.
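
The exit-code rule can be illustrated with a small wrapper that runs an upload command and hands its exit code back unchanged. This is a sketch of the principle only: `run_archive` is a hypothetical helper, the upload command is whatever the caller passes in, and production setups are better served by the tools named in the next section.

```python
import subprocess
import sys

def run_archive(cmd):
    """Run the upload command and return its exit code unchanged."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Fail loud: surface stderr so the failure shows up in the server log.
        print(f"archive failed ({result.returncode}): {result.stderr.strip()}",
              file=sys.stderr)
    return result.returncode
```

A script built on this would end with `sys.exit(run_archive([...]))`, so that PostgreSQL sees the real result and keeps the WAL segment until the upload has actually succeeded.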

Use pgBackRest instead of a hand-written script

Writing your own archive script for production is strongly discouraged; pgBackRest / WAL-G / Barman have already handled the edge cases:

# pgbackrest.conf
[global]
repo1-type=s3
repo1-s3-bucket=mybucket
repo1-s3-region=us-east-1
repo1-retention-full=4                       # keep 4 full backups
repo1-retention-diff=8                       # keep 8 differentials
repo1-cipher-type=aes-256-cbc                # encrypt at rest
process-max=8                                # parallel restore

[main]
pg1-path=/var/lib/postgresql/16/main

# Run a full backup
pgbackrest --stanza=main backup --type=full

# archive_command (in postgresql.conf) uses pgBackRest's built-in push
archive_command = 'pgbackrest --stanza=main archive-push %p'

pgBackRest handles: parallel push, compression, encryption, checksums, archive replay timing, the backup catalog, and automatic retention cleanup.

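Restore: recovery_target_time

# 1. Pull the base backup from S3 / the repo
pgbackrest --stanza=main --type=time \
  --target="2026-05-18 14:30:00+00" restore

# 2. PostgreSQL enters recovery mode and replays WAL up to the target time
# (pgBackRest writes recovery.signal + postgresql.auto.conf)

# 3. After confirming the target timestamp was reached, promote
pg_ctl promote -D /var/lib/postgresql/16/main

Three kinds of recovery target:

recovery_target_time: up to a timestamp
recovery_target_xid: up to a transaction ID (easy to locate only if the logs record the xid)
recovery_target_lsn: up to a WAL LSN (the most precise, but the LSN must have been recorded beforehand)

Production mostly uses timestamps, since application logs carry timestamps that make the target easy to locate.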

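Failure drills / edge cases

Case 1: archive_command fails silently

Symptom: during a PITR test the DBA finds that the last 3 days of WAL are missing from S3, yet PostgreSQL raised no alert and pg_wal did not pile up (already recycled?).

Root cause: archive_command was wrapped so that failures never surfaced: stderr was discarded with aws s3 cp %p s3://bucket/... 2>/dev/null, and the surrounding shell wrapper did not propagate the failing exit code, so the command still returned 0. PostgreSQL assumed success, kept advancing the WAL pointer, and recycled the old WAL, which never actually reached the archive.

Fix:

Never swallow the exit code: archive_command must fail loud with a non-zero exit code
Use pgBackRest / WAL-G instead of a hand-written shell script
Monitoring: alert on archive lag

SELECT last_archived_time, now() - last_archived_time AS lag FROM pg_stat_archiver;

alert if lag > 5 minutes

Test restores regularly: run a PITR drill every month, actually restoring from the archive and verifying the timestamp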
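
The 5-minute alert rule from the fix list can be sketched as a check over the last archived timestamp, for example the value read from pg_stat_archiver. `archive_lag_alert` is a hypothetical helper; treating a missing timestamp as an alert is an assumption of this sketch, not PostgreSQL behavior.

```python
from datetime import datetime, timedelta, timezone

def archive_lag_alert(last_archived_time, now, threshold=timedelta(minutes=5)):
    """Return (lag, should_alert). A missing timestamp always alerts."""
    if last_archived_time is None:
        return None, True
    lag = now - last_archived_time
    return lag, lag > threshold
```

On an idle system the lag grows even when archiving is healthy, so the threshold only makes sense together with an archive_timeout that forces regular segment switches.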

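Case 2: WAL archive lag and disk pressure on the primary

Symptom: the pg_wal directory keeps growing and df -h shows 90%+; pg_stat_archiver shows failed_count climbing and last_failed_time 30 minutes ago; archive_command cannot get data out (S3 throttling / slow network).

Root cause: archive_command writes to S3, S3 rate limits or connections time out, and PostgreSQL retries; the WAL stays in pg_wal and cannot be recycled, so disk usage keeps growing.

Fix:

Prevention: retries inside archive_command + parallel push (pgBackRest ships with process-max)
Alerting: pg_stat_archiver.failed_count increasing + primary disk usage > 80%
Emergency: temporarily point archive_command at local NFS or other storage and sync once S3 recovers; do not simply disable archiving (data would be lost)
Architecture: keep at least two copies of archive storage across regions, so that a single storage failure does not stop archiving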

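Case 3: recovery runs to the wrong target time

Symptom: after PITR the data looks like a chunk is missing; the DBA regrets it: the target time was set 30 minutes too early, recovery has already promoted, later WAL now belongs to a different timeline, and there is no way back.

Root cause: recovery is irreversible. Once promote opens a new timeline, WAL from the old timeline is no longer replayed on it; restoring to a later timestamp means restoring the base backup + WAL again from scratch.

Fix:

recovery_target_action = pause (the default; it needs hot_standby = on to actually pause): stop at the target time without promoting automatically; the DBA queries the data by hand and promotes only when it is right

recovery_target_time = '2026-05-18 14:30:00+00'
recovery_target_action = pause

Iterate on PITR: restore into a separate staging cluster, verify the target time, then run against production
Record where the target time came from: cross-check application logs and event timestamps, and avoid timezone mix-ups (+00 UTC versus local time)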
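
The timezone mix-up in the last item is easy to make when an incident time is reported in local time. A minimal sketch of the conversion, assuming the reporter's UTC offset is known; `to_recovery_target` is a hypothetical helper, and the output format matches the recovery_target_time values used in this article.

```python
from datetime import datetime, timedelta, timezone

def to_recovery_target(local_naive, utc_offset_hours):
    """Convert a local wall-clock time to a UTC recovery_target_time string."""
    local = local_naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S+00")
```

For instance, 22:30 on 2026-05-18 reported at UTC+8 corresponds to '2026-05-18 14:30:00+00'; using the local value directly would overshoot the target by 8 hours.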

Case 4: expired base backups never cleaned up, storage blows up

Symptom: the S3 backup bucket grows from 200 GB to 5 TB within six months; only then does the DBA notice that retention was never set and daily base backups are kept for 180 days.

Root cause: the hand-written archive_command script has no retention logic, or pgBackRest was set to repo1-retention-full=180 and nobody noticed; the DB itself keeps growing, and daily full backups accumulate on top.

Fix:

# pgBackRest retention: 4 full + auto-expire archive
repo1-retention-full=4                         # keep 4 full backups
repo1-retention-diff=8                         # keep 8 differentials
repo1-retention-archive=4                      # WAL archive aligned with fulls
repo1-retention-archive-type=full

Storage budgeting:

daily full + diff + WAL archive ≈ 1-2x DB size / day
4-week retention → ~30-60x DB size of storage
Cross-region replication → 2-3x on top
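
The budgeting rules above multiply out as follows. This is back-of-the-envelope arithmetic only; `storage_budget_tb` and its parameters are illustrative names, and the factors are the ranges quoted in the list.

```python
def storage_budget_tb(db_size_tb, daily_factor, retention_days, region_copies=1):
    """Backup storage ~ DB size x daily factor x retention days x region copies."""
    return db_size_tb * daily_factor * retention_days * region_copies
```

For a 1 TB database with 28 days of retention, a daily factor of 1x gives 28 TB and 2x gives 56 TB, which is where the ~30-60x figure comes from; two regional copies double it again.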

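Case 5: ambiguous recovery after timelines diverge

Symptom: production went through one failover (Patroni promote) and later one PITR; now another PITR is needed to the moment just before the failover, the archive holds two timelines, and it is unclear which one the recovery target refers to.

Root cause: every promote opens a new timeline ID (with a .history file); the same LSN in archive storage can belong to different timelines; a recovery target time near the divergence point is ambiguous.

Fix:

Use recovery_target_timeline to state explicitly which timeline to follow

recovery_target_time = '2026-05-15 10:00:00+00'
recovery_target_timeline = '3'                 # follow timeline 3

Get familiar with .history files: /wal_archive/000000XX.history records the timeline switch points; read it before a PITR
Prevention: run a new base backup right after every promote, which simplifies future PITR (no need to cross timelines)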
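
Reading a .history file before a PITR can be sketched as below. The sketch assumes the usual layout of one switch per line with three whitespace-separated fields (parent timeline ID, switch LSN, free-text reason); `parse_history` and `parse_lsn` are hypothetical helper names.

```python
def parse_lsn(text):
    """Convert an LSN such as '0/5000000' into a single integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def parse_history(content):
    """Return [(parent_timeline, switch_lsn, reason), ...] from .history text."""
    entries = []
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        reason = parts[2] if len(parts) > 2 else ""
        entries.append((int(parts[0]), parse_lsn(parts[1]), reason))
    return entries
```

Comparing the switch LSNs against the intended recovery target shows which timeline the target falls on, and therefore which value recovery_target_timeline should take.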

Capacity / cost planning

Dimension	Estimate	Watch out for
Base backup size	Proportional to the DB data dir size (after backup-tool compression)	Each backup ~ 0.5-1x DB size
WAL archive size	~5-50 GB / day depending on write volume	1 TB DB / write-heavy can reach 100 GB+ / day
Storage retention	4-12 weeks is typical	Budget 30-60x DB size
Base backup time	1-4 hours at TB scale	Run inside a maintenance window
Restore time	Base backup restore + WAL replay	TB-scale PITR usually takes 2-6 hours
Network bandwidth	100-500 Mbps during a full backup	Mind egress cost across regions

Practical defaults:

Daily full backup + 4 weeks retention
WAL archive every 60s (archive_timeout = 60)
Cross-region replication (S3 → S3 cross-region)
Monthly restore drill to prove it works

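Integration / next steps

Integration with Patroni HA

Patroni does not manage backups, but the timeline switch after a promotion affects the archive:

The WAL file name (%f in archive_command) already starts with the timeline ID; keep the full name in the archive path and never overwrite an existing file, so WAL from different timelines cannot clobber each other
Patroni's recovery_conf includes restore_command, so a standby clone pulls from the archive
Run a full backup after every Patroni failover to simplify future PITR

Positioning against logical replication

PITR and logical replication serve different use cases:

PITR is disaster recovery (logical bug / corruption): a full restore to a point in time
Logical replication is continuous sync: Kafka / real-time cross-DB replication

Both depend on WAL but aim at different goals; the same PostgreSQL instance can run both without conflict.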

With monitoring + alerting

Key metrics:

-- archive health
SELECT * FROM pg_stat_archiver;
-- archived_count, failed_count, last_archived_wal, last_archived_time

-- WAL segments currently in pg_wal (a growing count suggests archiving is falling behind)
SELECT count(*) FROM pg_ls_waldir() WHERE name ~ '^[0-9A-F]{24}$';

-- time of the last base backup
-- (pgBackRest API or backup catalog)

Three Prometheus alerts: archive failed_count increasing, archive lag > 5 min, no base backup for > 25 h.

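Next topics

Incremental backup (PG 17+): the base backup no longer has to be full every time, only base + incrementals
Block-level differential: already supported by pgBackRest
Cloud-native alternatives: RDS / Aurora rely on storage-layer snapshots and run the PITR chain for you
pg_dump vs PITR: pg_dump is a logical backup (restoring into a different version or schema is fine); PITR is physical (requires the same major version and architecture)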

PostgreSQL PITR Restore Drill

Fri, 22 May 2026 00:00:00 +0000

The core job of a PostgreSQL PITR restore drill is to prove that the backup can be restored to a chosen point in time. This article follows on from PITR + WAL Archiving and moves backup from "it exists" to "there is evidence it can be recovered".

The acceptance criterion for this article: you can record the base backup time, target time, restore duration, validation query, and an RPO / RTO note. The actual commands vary with pgBackRest, Barman, cloud snapshots, or a managed service; this article provides a vendor-neutral drill frame.

Prepare Recovery Point

The core job of preparing a recovery point is to create an identifiable transaction. Write a marker row first and record the time.

psql "$DATABASE_URL" <<'SQL'
CREATE TABLE IF NOT EXISTS restore_markers (
  id bigserial PRIMARY KEY,
  marker text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT clock_timestamp()
);

INSERT INTO restore_markers(marker) VALUES ('before-bad-change');
SELECT id, marker, created_at FROM restore_markers ORDER BY id DESC LIMIT 1;
SQL

Record created_at as the target time. A formal drill should use UTC and record the timezone, operator, backup set, and WAL archive status.

Create Bad Change

The core job of creating a bad change is to simulate the kind of mistake that calls for PITR.

psql "$DATABASE_URL" <<'SQL'
INSERT INTO restore_markers(marker) VALUES ('bad-change-after-target');
UPDATE accounts SET status = 'closed';
SELECT status, count(*) FROM accounts GROUP BY status;
SQL

In the lab this step stands in for an operator error. In a production incident the bad change could be an accidental delete, a faulty batch, a bad migration, or an application bug.

Restore Workflow

The core job of the restore workflow is to turn the backup tool's operations into a fixed set of evidence. Commands differ between tools, but the flow is the same:

Choose the base backup.
Set the recovery target time.
Apply WAL up to the target time.
Promote the restored instance.
Run the validation query.
Start the application smoke test.

Example pseudo-runbook:

restore_target_time = 2026-05-21T10:15:30Z
base_backup = latest backup before target
wal_archive = available through target
restore_path = isolated environment

The restore must be completed in an isolated environment first. Overwriting production directly destroys both the evidence and any room to roll back.

Validation Query

The core job of the validation query is to confirm that the restore landed on the right point in time.

psql "$RESTORED_DATABASE_URL" <<'SQL'
SELECT marker, created_at
FROM restore_markers
ORDER BY id;

SELECT status, count(*)
FROM accounts
GROUP BY status;
SQL

The expected result is that before-bad-change exists and bad-change-after-target has not appeared yet. The accounts status distribution should match what it was before the target time.
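
The expected result can be turned into an automated check once the marker values have been fetched from the restored database. A minimal sketch using the two marker strings from this drill; `restore_point_ok` is a hypothetical helper name.

```python
def restore_point_ok(markers):
    """True when the pre-change marker is present and the post-change one is not."""
    names = set(markers)
    return "before-bad-change" in names and "bad-change-after-target" not in names
```

Feeding it the marker column from the first validation query gives a pass/fail value that can be stored with the rest of the drill evidence.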

RPO / RTO Evidence

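The core job of RPO / RTO evidence is to translate the drill result into service language.

Evidence	What to record
Backup timestamp	Which base backup was used
Target time	The exact second to recover to
WAL availability	Whether WAL is complete around the target time
Restore duration	From the start of the restore to a successful validation
Data gap	Transactions after the target time that need compensation
Smoke test	Whether the application's core workflow works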
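
The evidence table maps naturally onto a small record type, so each drill produces the same fields and the RTO comparison is explicit. The field names and the `passed` rule below are assumptions chosen for this sketch, not a standard format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DrillEvidence:
    backup_timestamp: datetime      # which base backup was used
    target_time: datetime           # the second recovered to
    wal_complete: bool              # WAL available through the target time
    restore_started: datetime
    validation_passed_at: datetime
    smoke_test_ok: bool

    @property
    def restore_duration(self):
        return self.validation_passed_at - self.restore_started

    def passed(self, rto):
        """The drill passes only if data, application and timing all hold."""
        return (self.wal_complete
                and self.smoke_test_ok
                and self.backup_timestamp <= self.target_time
                and self.restore_duration <= rto)
```

Keeping one such record per drill makes it possible to see restore duration drift over time instead of discovering it during an incident.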

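PITR succeeds when both the data and the application are usable. Getting PostgreSQL to start is not enough to deliver the service.

Drill Retrospective

The core job of the drill retrospective is to turn the gaps found in the drill into next steps.

Common gaps:

The right base backup cannot be found.
WAL archive has missing segments.
Target time timezone confusion.
Restore is too slow and exceeds the RTO.
Application secrets / config do not point at the restored DB.
The validation query lacks a business invariant.

After this article, read Cross-region DR for cross-region recovery and PITR + WAL Archiving for backup strategy.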
