<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Pitr on Tarragon</title><link>https://tarrragon.github.io/blog/tags/pitr/</link><description>Recent content in Pitr on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Fri, 22 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/pitr/index.xml" rel="self" type="application/rss+xml"/><item><title>MySQL PITR + Backup Strategy：備份不是「拷貝資料」、是 N 點任意 restore 的能力</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/pitr-backup/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/pitr-backup/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL&lt;/a> overview 的 implementation-layer deep article。Overview 已說明 MySQL 在 OLTP 譜系的定位、本文聚焦 &lt;em>backup + PITR&lt;/em> — 不是「拷貝資料」、是「N 點任意 restore 的能力」。&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;p>「我們每天 mysqldump 一次、放 S3、沒問題吧」是個常見錯誤。問「能不能 restore 到 5 分鐘前」、答案會是 &lt;em>不能&lt;/em>。Dump-based backup 只能 restore 到 &lt;em>dump 那個瞬間&lt;/em>、5 分鐘前的事故無法 recover、必須等下次 dump。&lt;/p>
&lt;p>&lt;strong>真正的 backup strategy 是 &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/point-in-time-recovery/" data-link-title="Point-in-Time Recovery" data-link-desc="說明如何用完整備份加上後續變更日誌，把資料庫還原到任意時間點">PITR（point-in-time recovery）&lt;/a>&lt;/strong>：&lt;/p>
&lt;ul>
&lt;li>&lt;em>能 restore 到任意過去時間點&lt;/em>（RPO 取決於 binlog flush 頻率、可接近 0）&lt;/li>
&lt;li>由 &lt;em>full backup 基線&lt;/em> + &lt;em>binlog 連續流&lt;/em>（從 backup 點到目標時間點的 incremental delta）組成&lt;/li>
&lt;li>Restore 過程：先 restore full backup → 再 apply binlog 到目標 timestamp 或 GTID&lt;/li>
&lt;/ul>
&lt;p>這篇 deep article 把 backup &lt;em>拆解成能力&lt;/em>、然後展開達到此能力需要的工具鏈跟工程紀律。&lt;/p>
&lt;h2 id="backup-三層責任">Backup 三層責任&lt;/h2>
&lt;p>PITR 的 &lt;em>能力&lt;/em> 由三層工程責任達成、任一層失效則 PITR 不成立：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">Layer 1: Full Backup（基線）
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ↓ (mysqldump / XtraBackup / MyDumper / LVM snapshot / EBS snapshot)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">Layer 2: Binlog Stream（incremental）
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> ↓ (sync_binlog=1 + binlog 持續流到 backup storage)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">Layer 3: Restore + Replay 流程
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl"> (能 restore full + 能 apply binlog 到目標時間點)&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>每層的 &lt;em>backup&lt;/em> 不夠 — 必須有 &lt;em>測試 restore 流程&lt;/em> 才算真的有 backup。「dump 在 S3」加「沒有 verified restore」= no backup。&lt;/p>
&lt;h2 id="tool-1mysqldump--邏輯備份最廣容最慢">Tool 1：mysqldump — 邏輯備份、最廣容、最慢&lt;/h2>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mysqldump --single-transaction --master-data&lt;span class="o">=&lt;/span>&lt;span class="m">2&lt;/span> --gtid-purged&lt;span class="o">=&lt;/span>ON &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> --triggers --routines --events &lt;span class="se">\
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="se">&lt;/span> --all-databases &amp;gt; full-backup.sql&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>輸出&lt;/strong>：SQL statement、純文字、可 grep / 編輯。&lt;/p>
&lt;p>&lt;strong>Trade-off&lt;/strong>：&lt;/p>
&lt;ul>
&lt;li>優點：跨 MySQL 版本（5.7 → 8.0 也讀）、跨 cloud / 跨 OS、可選 dump 部分 table&lt;/li>
&lt;li>缺點：&lt;em>極慢&lt;/em>（rebuild 整 DB 從 SQL execute）、大 DB（&amp;gt; 100 GB）不適用、restore 時長 hours+&lt;/li>
&lt;li>&lt;code>--single-transaction&lt;/code>：InnoDB only、用 REPEATABLE READ 拿 consistent snapshot、不 lock 表&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>適合&lt;/strong>：&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is an implementation-layer deep dive under the <a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL</a> overview. The overview covers where MySQL sits in the OLTP landscape; this article focuses on <em>backup + PITR</em>, which is not "copying data" but the ability to restore to any chosen point in time.</p></blockquote>
<hr>
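<p>"We run mysqldump once a day and put it in S3, so we're fine, right?" is a common mistake. Ask "can we restore to five minutes ago?" and the answer is <em>no</em>. A dump-based backup can only restore to <em>the instant the dump was taken</em>; every write made after that dump is lost, so an incident five minutes ago cannot be rolled back to the moment just before it happened.</p>
<p><strong>A real backup strategy is <a href="/blog/backend/knowledge-cards/point-in-time-recovery/" data-link-title="Point-in-Time Recovery" data-link-desc="說明如何用完整備份加上後續變更日誌，把資料庫還原到任意時間點">PITR (point-in-time recovery)</a></strong>:</p>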
<ul>
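<li><em>Restore to any past point in time</em> (RPO depends on how often the binlog is flushed and shipped off the host, and can approach 0)</li>
<li>Built from a <em>full backup baseline</em> plus a <em>continuous binlog stream</em> (the incremental delta from the backup point to the target time)</li>
<li>Restore procedure: restore the full backup first, then apply the binlog up to the target timestamp or GTID</li>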
</ul>
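<p>This deep dive <em>breaks backup down into a capability</em>, then lays out the toolchain and engineering discipline needed to achieve it.</p>
<h2 id="backup-三層責任">The three layers of backup responsibility</h2>
<p>The PITR <em>capability</em> rests on three layers of engineering responsibility; if any one layer fails, PITR does not hold:</p>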





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Layer 1: Full Backup (baseline)
</span></span><span class="line"><span class="ln">2</span><span class="cl">   ↓     (mysqldump / XtraBackup / MyDumper / LVM snapshot / EBS snapshot)
</span></span><span class="line"><span class="ln">3</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln">4</span><span class="cl">Layer 2: Binlog Stream (incremental)
</span></span><span class="line"><span class="ln">5</span><span class="cl">   ↓     (sync_binlog=1 + binlog continuously streamed to backup storage)
</span></span><span class="line"><span class="ln">6</span><span class="cl">   ↓
</span></span><span class="line"><span class="ln">7</span><span class="cl">Layer 3: Restore + Replay procedure
</span></span><span class="line"><span class="ln">8</span><span class="cl">         (can restore the full backup + can apply binlog up to the target time)</span></span></code></pre></div><p>Having a <em>backup</em> at each layer is not enough: only a <em>tested restore procedure</em> counts as really having a backup. "Dump in S3" plus "no verified restore" = no backup.</p>
<h2 id="tool-1mysqldump--邏輯備份最廣容最慢">Tool 1: mysqldump (logical backup, widest compatibility, slowest)</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysqldump --single-transaction --master-data<span class="o">=</span><span class="m">2</span> --gtid-purged<span class="o">=</span>ON <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  --triggers --routines --events <span class="se">\
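</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --all-databases &gt; full-backup.sql</span></span></code></pre></div><p><strong>Output</strong>: plain-text SQL statements that can be grepped and edited.</p>
<p><strong>Trade-offs</strong>:</p>
<ul>
<li>Pros: works across MySQL versions (a 5.7 dump loads into 8.0), across clouds and operating systems, and can dump a subset of tables</li>
<li>Cons: <em>very slow</em> (a restore rebuilds the whole database by executing SQL), impractical for large databases (&gt; 100 GB), restore takes hours or more</li>
<li><code>--single-transaction</code>: InnoDB only; takes a consistent snapshot under REPEATABLE READ without locking tables</li>
<li><code>--master-data</code> is deprecated since MySQL 8.0.26 in favor of the equivalent <code>--source-data</code>; use whichever name the installed <code>mysqldump</code> accepts</li>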
</ul>
<p><strong>Good fit</strong>:</p>
<ul>
<li>Databases &lt; 100 GB</li>
<li>Schema dumps (migrations, cloning a database for developers)</li>
<li>Cross-version migration</li>
<li>Serving as the PITR baseline together with binlog</li>
</ul>
<p><strong>Poor fit</strong>:</p>
<ul>
<li>&gt; 500 GB databases (restore runs for days)</li>
<li>High-throughput production (a long-running dump holds an MVCC read view open, so undo history cannot be purged and bloats)</li>
</ul>
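<p>When a mysqldump file serves as the PITR baseline, binlog replay has to start from the coordinates that <code>--master-data=2</code> wrote into the dump as a commented <code>CHANGE MASTER TO</code> line. The following is a minimal sketch of reading them back; the function name <code>dump_binlog_coords</code> is hypothetical, and it assumes the dump was taken with that option (or <code>--source-data=2</code>).</p>

```shell
# Hypothetical helper: print "<binlog file> <position>" as recorded in a dump
# taken with --master-data=2 (or --source-data=2 on newer versions).
dump_binlog_coords() {
  local dump_file="$1"
  # The coordinates sit in a commented CHANGE MASTER TO / CHANGE REPLICATION
  # SOURCE TO line near the top of the dump; keep only the first match.
  sed -n "s/.*_LOG_FILE='\([^']*\)'.*_LOG_POS=\([0-9]*\).*/\1 \2/p" "$dump_file" | head -n 1
}
```

<p>The printed file name and position are what a later <code>mysqlbinlog --start-position=...</code> replay would start from.</p>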
<h2 id="tool-2percona-xtrabackup--物理備份快production-標準">Tool 2: Percona XtraBackup (physical backup, fast, the production standard)</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">xtrabackup --backup --target-dir<span class="o">=</span>/backup/full-2026-05-19 <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  --user<span class="o">=</span>backup --password<span class="o">=</span>... <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --slave-info --safe-slave-backup
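</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Prepare (apply the backup's redo log so it becomes restorable)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">xtrabackup --prepare --target-dir<span class="o">=</span>/backup/full-2026-05-19</span></span></code></pre></div><p><strong>Output</strong>: a binary copy of the InnoDB data files.</p>
<p><strong>Trade-offs</strong>:</p>
<ul>
<li>Pros: <em>very fast</em> (copies files directly, no SQL execution), suited to TB-scale databases, restore takes about as long as copying the files</li>
<li>Cons: tied to the MySQL version (XtraBackup 8.0 cannot restore a 5.7 backup) and to the storage engine (only InnoDB is copied without blocking; non-transactional tables such as MyISAM are copied under a lock)</li>
<li><em>Incremental backup</em> support: uses the LSN (log sequence number) to copy only changed pages</li>
<li><code>--slave-info</code> and <code>--safe-slave-backup</code> in the command above only apply when the backup is taken from a replica; omit them when backing up the primary</li>
</ul>
<p><strong>Incremental flow</strong>:</p>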





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Day 1: Full backup</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">xtrabackup --backup --target-dir<span class="o">=</span>/backup/full-day1
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># Day 2: Incremental（only changes since day 1）</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">xtrabackup --backup --target-dir<span class="o">=</span>/backup/inc-day2 <span class="se">\
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="se"></span>  --incremental-basedir<span class="o">=</span>/backup/full-day1
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"># Restore: Apply incremental on top of full</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">xtrabackup --prepare --apply-log-only --target-dir<span class="o">=</span>/backup/full-day1
</span></span><span class="line"><span class="ln">10</span><span class="cl">xtrabackup --prepare --apply-log-only --target-dir<span class="o">=</span>/backup/full-day1 <span class="se">\
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="se"></span>  --incremental-dir<span class="o">=</span>/backup/inc-day2
</span></span><span class="line"><span class="ln">12</span><span class="cl">xtrabackup --prepare --target-dir<span class="o">=</span>/backup/full-day1</span></span></code></pre></div><p><strong>Good fit</strong>:</p>
<ul>
<li>&gt; 100 GB production databases</li>
<li>Daily incrementals plus a weekly full (a typical enterprise schedule)</li>
<li>Migrating self-managed MySQL to the cloud (XtraBackup, rsync to the cloud, restore there)</li>
</ul>
<p><strong>Poor fit</strong>:</p>
<ul>
<li>Schema-only dumps (mysqldump is simpler)</li>
<li>Restoring across major versions</li>
</ul>
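<p>An incremental backup is only usable if it starts at the LSN where its base ended. XtraBackup writes a <code>xtrabackup_checkpoints</code> file with <code>from_lsn</code> and <code>to_lsn</code> fields into every backup directory, so the chain can be checked before any <code>--prepare</code> is run. The sketch below assumes that <code>key = value</code> file layout; the function names are hypothetical.</p>

```shell
# Read one "key = value" field from a backup directory's xtrabackup_checkpoints.
checkpoint_field() {
  local dir="$1" key="$2"
  awk -F ' = ' -v k="$key" '$1 == k { print $2 }' "$dir/xtrabackup_checkpoints"
}

# Succeed only if the incremental starts at the LSN where the base backup ended.
# Run this on the raw backups, before any --prepare step rewrites the base.
chain_is_contiguous() {
  local base_dir="$1" inc_dir="$2"
  local base_to inc_from
  base_to=$(checkpoint_field "$base_dir" to_lsn)
  inc_from=$(checkpoint_field "$inc_dir" from_lsn)
  [ -n "$base_to" ] && [ "$base_to" = "$inc_from" ]
}
```

<p>For example, <code>chain_is_contiguous /backup/full-day1 /backup/inc-day2 || echo "broken chain"</code> would flag an incremental taken against a different base.</p>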
<h2 id="tool-3mydumper--並行邏輯備份">Tool 3: MyDumper (parallel logical backup)</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mydumper --user<span class="o">=</span>backup --password<span class="o">=</span>... <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  --threads<span class="o">=</span><span class="m">8</span> --rows<span class="o">=</span><span class="m">100000</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --outputdir<span class="o">=</span>/backup/mydumper-2026-05-19 <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  --less-locking</span></span></code></pre></div><p><strong>Output</strong>: one schema <code>.sql</code> file per table plus multiple chunked data files (<code>.sql</code> files of INSERT statements by default; <code>.dat</code> files when the <code>--load-data</code> option is used).</p>
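<p><strong>Trade-offs</strong>:</p>
<ul>
<li>Pros: <em>parallel dump</em> (per-table threads, with large tables split into chunks by <code>--rows</code>), 5-10x faster than mysqldump, resumable after interruption</li>
<li>Cons: tooling is less widespread than mysqldump and must be installed separately</li>
<li>The matching <code>myloader</code> restore is also parallel and 5-10x faster than a mysqldump restore</li>
</ul>
<p><strong>Good fit</strong>:</p>
<ul>
<li>The 100 GB to 1 TB range</li>
<li>Mid-sized production that wants the readability of a logical backup plus parallel speed-up</li>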
</ul>
<h2 id="tool-4lvm--ebs-snapshot--物理-file-system-層">Tool 4: LVM / EBS Snapshot (physical, file-system layer)</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 1. Freeze MySQL (pause writes); keep this mysql session open, the lock is released when it closes</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">mysql&gt; FLUSH TABLES WITH READ LOCK<span class="p">;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># 2. Trigger snapshot (EBS / LVM) from a second terminal</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">aws ec2 create-snapshot --volume-id vol-xxx --description <span class="s2">&#34;mysql-2026-05-19&#34;</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># 3. Unfreeze (back in the same mysql session)</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">mysql&gt; UNLOCK TABLES<span class="p">;</span></span></span></code></pre></div><p><strong>Trade-offs</strong>:</p>
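<ul>
<li>Pros: extremely fast (file-system layer), suited to <em>VM-based MySQL</em> (EC2 / on-prem)</li>
<li>Cons: must <em>pause writes</em> (a short lock), and the session that took the lock has to stay open until the snapshot has been triggered; not portable across operating systems or clouds</li>
<li>AWS RDS / Aurora take the storage-layer route for all backups (automated snapshots)</li>
</ul>
<p><strong>Good fit</strong>:</p>
<ul>
<li>AWS RDS / Aurora (automatic)</li>
<li>Self-managed MySQL on EC2 with EBS (EBS snapshot combined with a MySQL freeze)</li>
<li>Large databases that want fast backup and fast restore</li>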
</ul>
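<p>Because <code>FLUSH TABLES WITH READ LOCK</code> is released as soon as the session that issued it ends, a scripted snapshot has to trigger the snapshot from inside that same session. One way to sketch this is to generate a script for a single <code>mysql</code> client session and use the client's <code>system</code> command to run the snapshot while the lock is held. This assumes a Unix build of the client where <code>system</code> is available; the function name <code>snapshot_session_sql</code> is hypothetical.</p>

```shell
# Hypothetical helper: emit a script for ONE mysql client session that takes the
# global read lock, runs the snapshot command from inside that session, and only
# then unlocks.
snapshot_session_sql() {
  # The command must not contain ';', or the client parses the line as SQL.
  local snapshot_cmd="$1"
  printf '%s\n' \
    'FLUSH TABLES WITH READ LOCK;' \
    "system ${snapshot_cmd}" \
    'UNLOCK TABLES;'
}
```

<p>Usage would look like <code>snapshot_session_sql 'aws ec2 create-snapshot --volume-id vol-xxx --description mysql-2026-05-19' | mysql -u root -p</code>, so lock, snapshot, and unlock all happen on one connection.</p>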
<h2 id="binlog-based-pitr">Binlog-based PITR</h2>
<p>PITR is only reachable with a full backup plus binlog. The binlog is the shared source for MySQL replication, CDC, and PITR.</p>
<p><strong>Configuration</strong>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">[mysqld]</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="na">log_bin</span> <span class="o">=</span> <span class="s">mysql-bin</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="na">binlog_format</span> <span class="o">=</span> <span class="s">ROW                  # use ROW for deterministic replay</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="na">binlog_row_image</span> <span class="o">=</span> <span class="s">FULL              # full row image</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="na">sync_binlog</span> <span class="o">=</span> <span class="s">1                      # fsync binlog on every commit (zero loss)</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="na">binlog_expire_logs_seconds</span> <span class="o">=</span> <span class="s">1209600 # 14-day retention (adjust as needed)</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="na">gtid_mode</span> <span class="o">=</span> <span class="s">ON                       # lets PITR identify transactions by GTID instead of file:position</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="na">enforce_gtid_consistency</span> <span class="o">=</span> <span class="s">ON</span></span></span></code></pre></div><p><strong>Binlog backup</strong>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Continuously stream binlog to backup storage</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">mysqlbinlog --read-from-remote-server --raw --stop-never <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --user<span class="o">=</span>replication --password<span class="o">=</span>... <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  --host<span class="o">=</span>primary.example.com <span class="se">\
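</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="se"></span>  --result-file<span class="o">=</span>/backup/binlog/ mysql-bin.000001 <span class="p">&amp;</span></span></span></code></pre></div><p><code>--read-from-remote-server</code> + <code>--stop-never</code> keep tailing the binlog from the primary and stream it without interruption into the backup directory; <code>--raw</code> writes the files in their original binary format. Each binlog file is closed once it is full and rotated, and a new file is opened.</p>
<h2 id="restore--pitr-流程">Restore + PITR procedure</h2>
<p>The complete PITR procedure (restoring to 2026-05-19 14:30:00). It assumes the backup has already been through <code>--prepare</code>, MySQL is stopped, and the data directory is empty; after <code>--copy-back</code>, make sure the files are owned by the <code>mysql</code> user before starting the server:</p>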





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Step 1: Restore full backup</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">xtrabackup --copy-back --target-dir<span class="o">=</span>/backup/full-2026-05-18  <span class="c1"># the previous day's full backup</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># Step 2: Start MySQL (it comes up with the GTID set as of the backup)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">systemctl start mysqld
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># Step 3: Check the GTID set as of the end of the full backup</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">mysql&gt; SHOW MASTER STATUS<span class="p">;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">+------------------+----------+------------------------------------------+
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="p">|</span> File             <span class="p">|</span> Position <span class="p">|</span> Executed_Gtid_Set                        <span class="p">|</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">+------------------+----------+------------------------------------------+
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">|</span> mysql-bin.000150 <span class="p">|</span>     <span class="m">1234</span> <span class="p">|</span> server-uuid:1-12345                      <span class="p">|</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">+------------------+----------+------------------------------------------+
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="c1"># Step 4: Apply binlog from after the backup up to the target time (with GTID on, already-executed transactions are skipped)</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">mysqlbinlog --start-datetime<span class="o">=</span><span class="s2">&#34;2026-05-18 03:00:00&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="se"></span>            --stop-datetime<span class="o">=</span><span class="s2">&#34;2026-05-19 14:30:00&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="se"></span>            /backup/binlog/mysql-bin.000150 <span class="se">\
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="se"></span>            /backup/binlog/mysql-bin.000151 <span class="se">\
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="se"></span>            ...                                <span class="c1"># list every binlog file that is needed</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">            <span class="p">|</span> mysql -u root -p
</span></span><span class="line"><span class="ln">22</span><span class="cl">
</span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="c1"># Step 5: Verify that the GTID set has reached the position matching the target time</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">mysql&gt; SHOW MASTER STATUS<span class="p">;</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="c1"># Executed_Gtid_Set should include the transactions up to the target time</span></span></span></code></pre></div><p>For <em>precise GTID-based PITR</em> (stopping at a specific transaction rather than a timestamp):</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysqlbinlog --include-gtids<span class="o">=</span><span class="s1">&#39;server-uuid:1-50000&#39;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>            /backup/binlog/mysql-bin.000150 ... <span class="p">|</span> mysql -u root -p</span></span></code></pre></div><h2 id="5-個-production-踩雷">Five production pitfalls</h2>
<h3 id="1-gtid-處理不一致--restore-後-replication-broken">1. Inconsistent GTID handling: replication broken after restore</h3>
<p>With XtraBackup, <code>--slave-info</code> records the GTID purged set at backup time; mysqldump uses <code>--gtid-purged=ON</code>. If <code>gtid_purged</code> is not set correctly after the restore, the replica hits a GTID gap error when it re-attaches.</p>
<p>Fixes:</p>
<ul>
<li>XtraBackup restore: take the GTID set from <code>xtrabackup_binlog_info</code> and run <code>SET GLOBAL gtid_purged='...';</code></li>
<li>mysqldump: the dump file already contains <code>SET @@GLOBAL.GTID_PURGED='...';</code>, so loading the dump sets it automatically</li>
<li>After the restore, <em>first verify that <code>Executed_Gtid_Set</code></em> matches what the source expects, then run START SLAVE (<code>START REPLICA</code> from MySQL 8.0.22)</li>
</ul>
<h3 id="2-binlog-gap--中間遺漏-file-直接-restore-fail">2. Binlog gap: a missing file in the middle breaks the restore</h3>
<p>The binlog stream drops (network blip / disk full), the binlog rotates in the meantime, and <code>mysql-bin.000156</code> never reaches backup storage. A PITR that replays across the missing file skips committed transactions, and the result is <em>inconsistent data</em> (not an error, but <em>silently incorrect</em>).</p>
<p>Fixes:</p>
<ul>
<li><em>The binlog stream must be continuous</em>; alert when it disconnects</li>
<li>Monitor binlog continuity in backup storage (file names are consecutive, no gaps)</li>
<li><em>Verify binlog integrity</em> before restoring: <code>mysqlbinlog --verify-binlog-checksum mysql-bin.[0-9]* &gt; /dev/null</code></li>
<li><em>Abort the PITR</em> on a missing binlog rather than continuing with a partial restore</li>
</ul>
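<p>The continuity check on file names can be sketched as a small function that walks the streamed files in order and prints every sequence number that is missing. It assumes the <code>mysql-bin.NNNNNN</code> naming used in this article; the function name <code>find_binlog_gaps</code> is hypothetical.</p>

```shell
# Print the binlog file names missing between the lowest and highest
# mysql-bin.NNNNNN file in a directory; prints nothing when the set is contiguous.
find_binlog_gaps() {
  local dir="$1" prev="" num f
  for f in "$dir"/mysql-bin.[0-9]*; do
    [ -e "$f" ] || continue                      # glob matched nothing
    num=${f##*mysql-bin.}
    case "$num" in ''|*[!0-9]*) continue ;; esac # skip e.g. mysql-bin.000150.gz
    num=$(printf '%s' "$num" | sed 's/^0*//')    # strip zero padding
    num=${num:-0}
    if [ -n "$prev" ]; then
      while [ $((prev + 1)) -lt "$num" ]; do
        prev=$((prev + 1))
        printf 'mysql-bin.%06d\n' "$prev"
      done
    fi
    prev=$num
  done
}
```

<p>A restore script could refuse to continue when the output is non-empty, for example <code>[ -z "$(find_binlog_gaps /backup/binlog)" ] || exit 1</code>.</p>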
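<h3 id="3-backup-沒-verify--真事故時才發現-restore-broken">3. Unverified backups: finding out the restore is broken during a real incident</h3>
<p>Backups succeed every day and storage has grown to 5 TB, yet <em>nobody has ever restored from them</em>. Only when an incident forces a restore does it turn out that the backup file is corrupt, the GTID is wrong, or the binlog has a gap, and the whole setup is useless.</p>
<p>Fixes:</p>
<ul>
<li><em>Automated restore tests</em>: run a full restore + PITR on a staging server weekly or monthly, then compare SELECT results against production</li>
<li>Verify that row counts after the restore are close to production, and compare the main tables with <code>CHECKSUM TABLE</code></li>
<li>Then the RTO holds no surprises when a real incident happens</li>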
</ul>
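<p>The <code>CHECKSUM TABLE</code> comparison can be automated by saving the output from both sides (for instance with <code>mysql -N -B -e 'CHECKSUM TABLE ...'</code>, which prints one tab-separated <code>table checksum</code> pair per line) and diffing the two files. The sketch below assumes that file format; the function name is hypothetical. Checksums are only comparable when both sides are at the same point in time, for example a replica paused at the restore target or tables that do not change.</p>

```shell
# Compare two files of "<table><TAB><checksum>" lines and print every table whose
# checksum differs or that exists on only one side. Non-zero exit on any mismatch.
compare_checksums() {
  local expected="$1" actual="$2"
  awk -F '\t' '
    NR == FNR { want[$1] = $2; next }            # first file: expected checksums
    { seen[$1] = 1
      if (!($1 in want))       { print "unexpected " $1; bad = 1 }
      else if (want[$1] != $2) { print "mismatch " $1;   bad = 1 } }
    END { for (t in want) if (!(t in seen)) { print "missing " t; bad = 1 }
          exit bad }
  ' "$expected" "$actual"
}
```

<p>A restore test can then fail loudly with <code>compare_checksums source.tsv restored.tsv || exit 1</code>.</p>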
<h3 id="4-rpo-不到-1-分鐘的代價">4. The price of an RPO under 1 minute</h3>
<p>"I want RPO &lt; 1 minute" sounds reasonable, but achieving it requires:</p>
<ul>
<li><code>sync_binlog=1</code> (fsync on every commit; write throughput drops 10-30%)</li>
<li>Streaming the binlog to <em>independent storage</em> (not just the primary's local disk), plus cross-region replication (extra network cost)</li>
<li>Semi-sync replication to the replicas as well (zero binlog loss)</li>
<li>Monitoring and alerting on RPO violations (stream lag must stay &lt; 1 minute)</li>
</ul>
<p><strong>TCO</strong>: up to ~30% write-throughput penalty + extra storage / network cost + 24x7 on-call. Consider the <em>real RPO requirement</em>: for most applications a 5-minute RPO is enough, and chasing a 1-minute RPO is not worth it.</p>
<p>Fixes:</p>
<ul>
<li>Confirm the <em>real RPO requirement</em> with product / business</li>
<li><em>RPO budget = write-throughput trade-off + ops cost</em>; it is not free</li>
<li>Use <a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora</a> or another managed offering to outsource the RPO problem (Aurora: RPO &lt; 1 second + automatic cross-AZ)</li>
</ul>
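<p>Stream-lag monitoring can start from something as coarse as the age of the newest streamed binlog file. The sketch below assumes GNU coreutils (<code>stat -c %Y</code>) and the <code>mysql-bin.NNNNNN</code> naming used in this article; the function name is hypothetical, and the current time is injectable so the check can be tested. On an idle primary the newest file also stops changing, so treat the result as a first signal rather than a precise RPO measurement.</p>

```shell
# Seconds since the newest mysql-bin.* file in the backup directory was modified.
# With no files at all the result is the full epoch value, which also trips an alert.
binlog_stream_lag_seconds() {
  local dir="$1" now="${2:-$(date +%s)}" newest=0 mtime f
  for f in "$dir"/mysql-bin.[0-9]*; do
    [ -e "$f" ] || continue
    mtime=$(stat -c %Y "$f")                 # GNU stat: mtime as epoch seconds
    if [ "$mtime" -gt "$newest" ]; then newest=$mtime; fi
  done
  echo $((now - newest))
}
```

<p>An alert rule for a 1-minute RPO would fire when <code>binlog_stream_lag_seconds /backup/binlog</code> returns more than 60.</p>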
<h3 id="5-encryption-key-沒備份--restore-後解不開資料">5. Encryption key not backed up: data cannot be decrypted after restore</h3>
<p>Once <em>encryption at rest</em> is enabled (MySQL 8.0+ <code>default_table_encryption=ON</code> + keyring plugin / component; MariaDB uses <code>innodb_encrypt_tables</code>), newly created InnoDB tablespaces are encrypted by default. The master key lives in a <em>keyring file</em> or a KMS-backed component. If the backup covers only the MySQL data files and not the keyring, the restored data is <em>encrypted with no key and unreadable</em>.</p>
<p>Fixes:</p>
<ul>
<li><em>Store the keyring file separately from the data files</em>, but <em>back up both</em></li>
<li>Use a <em>KMS-based keyring</em> (AWS KMS / HashiCorp Vault) instead of a file-based one, so the key does not live on the MySQL server</li>
<li>Record the <em>key recovery procedure</em> in the disaster recovery runbook; do not assume that reinstalling MySQL will make the data readable</li>
</ul>
<h2 id="容量規劃要點">Capacity planning essentials</h2>
<table>
  <thead>
      <tr>
          <th>Item</th>
          <th>Recommendation</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Full backup frequency</td>
          <td>Weekly (XtraBackup) or daily (small databases)</td>
      </tr>
      <tr>
          <td>Incremental frequency</td>
          <td>Daily (XtraBackup incremental)</td>
      </tr>
      <tr>
          <td>Binlog retention</td>
          <td>14 days (the PITR window)</td>
      </tr>
      <tr>
          <td>Backup retention</td>
          <td>Full × 4 weeks + monthly archive × 12 months</td>
      </tr>
      <tr>
          <td>Storage cost</td>
          <td>Roughly 2-3x DB size (full + incremental + binlog)</td>
      </tr>
      <tr>
          <td>Cross-region copy</td>
          <td>Required (disaster recovery when the local backup is lost)</td>
      </tr>
      <tr>
          <td>Restore test frequency</td>
          <td>Weekly on staging, monthly on a production-like environment</td>
      </tr>
  </tbody>
</table>
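<p>The storage figure is easier to reason about as arithmetic. The sketch below estimates the rolling-window footprint for the table's retention settings (4 weekly fulls, 6 daily incrementals per week, 14 days of binlog). Every input is an assumption to replace with measured values: database size, daily change rate, binlog volume per day, and the compression ratio on backup storage. The monthly archive tier is not included and adds on top.</p>

```shell
# Rough rolling-window backup storage estimate in GB (integer arithmetic).
# Args: db size in GB, daily change rate in percent, binlog GB per day,
#       compression ratio (2 means backups shrink to half their raw size).
estimate_backup_gb() {
  local db_gb="$1" change_pct="$2" binlog_gb_per_day="$3" compress_ratio="$4"
  local fulls=$((4 * db_gb))                         # 4 weekly fulls retained
  local incs=$((4 * 6 * db_gb * change_pct / 100))   # 6 daily incrementals per week
  local binlogs=$((14 * binlog_gb_per_day))          # 14-day binlog retention
  echo $(( (fulls + incs + binlogs) / compress_ratio ))
}
```

<p>With the assumed inputs <code>estimate_backup_gb 1000 3 20 2</code> (1 TB database, 3% daily change, 20 GB of binlog per day, 2:1 compression) the result is 2500 GB, or 2.5x the database size.</p>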
<h2 id="跟其他模組整合">Integration with other modules</h2>
<h3 id="跟-replication-topology">With replication topology</h3>
<p>A replica cannot replace a backup: a DROP TABLE on the primary is replicated too, and the data disappears from the replica as well. Backup is <em>independent insurance</em>. See <a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">Replication Topology</a>.</p>
<h3 id="跟-innodb-tuning">With InnoDB tuning</h3>
<p><code>innodb_flush_log_at_trx_commit=1</code> + <code>sync_binlog=1</code> is the backup-friendly setting (zero loss), at the cost of write throughput. If durability is relaxed for the sake of write throughput, the data-loss window for PITR (the effective RPO) widens as well. See <a href="/blog/backend/01-database/vendors/mysql/innodb-tuning/" data-link-title="MySQL InnoDB Tuning：為什麼一個 100 GB DB 在 64 GB RAM server 上 query 慢 5 倍" data-link-desc="InnoDB 是 MySQL 預設 storage engine、預設值給 256 MB buffer pool（早期 default）。本文從一個常見痛點開場（DB &gt; RAM 但 server 仍 swap）、走 4 個 critical knob（buffer pool / redo log / flush method / IO capacity）、各自如何影響讀寫吞吐、配置 step-by-step、5 production 踩雷（buffer pool warm-up / log file 大小 / 設 sync_binlog=0 換速度 / IO scheduler / undo log 膨脹）、跟 SSD / NVMe / EBS 的 IO 假設">InnoDB Tuning</a>.</p>
<h3 id="跟-aurora-mysql">With Aurora MySQL</h3>
<p>Aurora outsources backup entirely: automatic continuous backup plus PITR with an RPO under 1 second, with no mysqldump, XtraBackup, or binlog stream to manage. When migrating off Aurora, a self-managed backup chain has to be rebuilt. See <a href="/blog/backend/01-database/vendors/mysql/migrate-to-aurora/" data-link-title="MySQL → Aurora MySQL：storage layer 轉手到 AWS、replication / HA / backup 全部 outsource" data-link-desc="自管 MySQL → Aurora MySQL 是 Type C operational hybrid migration — wire protocol 一致、ops 責任轉到 AWS。本文走 6 維 audit（Operational High）、Aurora storage architecture 衝擊、4-phase migration、5 production 踩雷、何時維持原路線。">migrate-to-aurora</a>.</p>
<h3 id="跟-postgresql-pitr">Compared with PostgreSQL PITR</h3>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>MySQL PITR</th>
          <th>PostgreSQL PITR</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Logical backup</td>
          <td>mysqldump / MyDumper</td>
          <td>pg_dump / pg_dumpall</td>
      </tr>
      <tr>
          <td>Physical backup</td>
          <td>XtraBackup</td>
          <td>pg_basebackup / pgBackRest</td>
      </tr>
      <tr>
          <td>Incremental log</td>
          <td>Binary log (binlog)</td>
          <td>WAL (Write-Ahead Log)</td>
      </tr>
      <tr>
          <td>Stream tool</td>
          <td>mysqlbinlog --read-from-remote-server</td>
          <td>pg_receivewal</td>
      </tr>
      <tr>
          <td>PITR command</td>
          <td>mysqlbinlog --stop-datetime</td>
          <td>recovery_target_time + recovery.signal (recovery.conf before PostgreSQL 12)</td>
      </tr>
      <tr>
          <td>Identifier</td>
          <td>GTID or file:position</td>
          <td>LSN (Log Sequence Number)</td>
      </tr>
      <tr>
          <td>Cross-version</td>
          <td>mysqldump (broad compatibility)</td>
          <td>pg_dump (broad compatibility)</td>
      </tr>
  </tbody>
</table>
<p>The two systems share the same PITR concept (full backup + log replay); the tool names differ but map one-to-one. See <a href="/blog/backend/01-database/vendors/postgresql/pitr-wal-archiving/" data-link-title="PostgreSQL PITR &#43; WAL archiving：從 base backup 到 point-in-time recovery 的完整鏈" data-link-desc="Base backup &#43; WAL archive 構成 PITR 的雙軌資料、archive_command &#43; restore_command 配置、用 pgBackRest / WAL-G 替代手寫腳本、5 個 production 踩雷（archive 靜默失敗 / archive lag / 錯誤 target time / base backup 過期未清 / timeline 分歧 recovery 模糊）、跟 Patroni &#43; monitoring 整合">PostgreSQL PITR + WAL Archiving</a>.</p>
<h2 id="何時-outsource-backup">When to outsource backup</h2>
<table>
  <thead>
      <tr>
          <th>Scenario</th>
          <th>Recommendation</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>AWS ecosystem, no appetite for backup ops</td>
          <td>Aurora MySQL (built-in PITR)</td>
      </tr>
      <tr>
          <td>GCP ecosystem</td>
          <td>Cloud SQL (built-in PITR)</td>
      </tr>
      <tr>
          <td>Azure ecosystem</td>
          <td>Azure Database for MySQL</td>
      </tr>
      <tr>
          <td>Multi-cloud, self-managed</td>
          <td>XtraBackup + binlog stream + S3</td>
      </tr>
      <tr>
          <td>Small scale, mysqldump is acceptable</td>
          <td>mysqldump cron + S3</td>
      </tr>
      <tr>
          <td>Large scale, no cloud</td>
          <td>Percona XtraBackup + tape archive</td>
      </tr>
      <tr>
          <td>Strict compliance (HIPAA / PCI-DSS)</td>
          <td>Self-managed + air-gapped backup + audit trail</td>
      </tr>
  </tbody>
</table>
<h2 id="相關連結">Related links</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">MySQL Replication Topology</a>（binlog 跟 PITR 共用 source）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/innodb-tuning/" data-link-title="MySQL InnoDB Tuning：為什麼一個 100 GB DB 在 64 GB RAM server 上 query 慢 5 倍" data-link-desc="InnoDB 是 MySQL 預設 storage engine、預設值給 256 MB buffer pool（早期 default）。本文從一個常見痛點開場（DB &gt; RAM 但 server 仍 swap）、走 4 個 critical knob（buffer pool / redo log / flush method / IO capacity）、各自如何影響讀寫吞吐、配置 step-by-step、5 production 踩雷（buffer pool warm-up / log file 大小 / 設 sync_binlog=0 換速度 / IO scheduler / undo log 膨脹）、跟 SSD / NVMe / EBS 的 IO 假設">MySQL InnoDB Tuning</a>（durability + backup 互動）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/migrate-to-aurora/" data-link-title="MySQL → Aurora MySQL：storage layer 轉手到 AWS、replication / HA / backup 全部 outsource" data-link-desc="自管 MySQL → Aurora MySQL 是 Type C operational hybrid migration — wire protocol 一致、ops 責任轉到 AWS。本文走 6 維 audit（Operational High）、Aurora storage architecture 衝擊、4-phase migration、5 production 踩雷、何時維持原路線。">migrate-to-aurora</a>（backup outsource）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/pitr-wal-archiving/" data-link-title="PostgreSQL PITR &#43; WAL archiving：從 base backup 到 point-in-time recovery 的完整鏈" data-link-desc="Base backup &#43; WAL archive 構成 PITR 的雙軌資料、archive_command &#43; restore_command 配置、用 pgBackRest / WAL-G 替代手寫腳本、5 個 production 踩雷（archive 靜默失敗 / archive lag / 錯誤 target time / base backup 過期未清 / timeline 分歧 recovery 模糊）、跟 Patroni &#43; monitoring 整合">PostgreSQL PITR + WAL Archiving</a>（PG sibling）</li>
<li>Official docs: <a href="https://docs.percona.com/percona-xtrabackup/8.0/">Percona XtraBackup</a> / <a href="https://github.com/mydumper/mydumper">MyDumper</a> / <a href="https://dev.mysql.com/doc/refman/8.0/en/mysqldump.html">mysqldump</a></li>
</ul>
]]></content:encoded></item><item><title>PostgreSQL PITR + WAL archiving：從 base backup 到 point-in-time recovery 的完整鏈</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/pitr-wal-archiving/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/pitr-wal-archiving/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL&lt;/a> overview 的 implementation-layer deep article。Overview 已說明 backup / recovery 是 OLTP 必備能力、本文聚焦 &lt;em>PITR（Point-In-Time Recovery）的雙軌資料設計 + production 5 個 failure mode&lt;/em>。&lt;/p>&lt;/blockquote>
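&lt;h2 id="問題情境">Problem scenario&lt;/h2>
&lt;p>A logical bug ships to production and is noticed only six hours later: a batch job has set user.email to NULL on 500,000 rows. The options at that point:&lt;/p>
&lt;ul>
&lt;li>Restore the latest daily backup (last night) → loses all of today's legitimate writes (orders, sign-ups)&lt;/li>
&lt;li>Promote a standby → the standby has already replicated the bug and is in the same state as the primary&lt;/li>
&lt;li>Rebuild from application logs → some operations are irreversible (emails already sent)&lt;/li>
&lt;/ul>
&lt;p>PITR is the standard answer to this kind of &lt;em>logical disaster&lt;/em>: instead of restoring to the backup's timestamp, you &lt;em>restore to the moment just before the bug ran&lt;/em> (for example, one minute before). It needs two tracks of data, &lt;em>base backup + WAL archive&lt;/em>: the base backup is a snapshot, and the WAL archive holds every write made after that snapshot. Recovery replays WAL up to a specified timestamp, LSN, or transaction ID.&lt;/p>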
&lt;h2 id="核心概念base-backup--wal-archive-的雙軌設計">核心概念：base backup + WAL archive 的雙軌設計&lt;/h2>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">[Base backup t0] + [WAL archive t0 → now]
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ↓ ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> 全量 snapshot incremental log
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> ↓ ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> └────── recover to t_target ──→ [restored cluster at t_target]&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>兩個軌道各自獨立但必須對齊：&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Base backup&lt;/strong>: a snapshot of the entire data dir at one point in time. &lt;code>pg_basebackup&lt;/code> / &lt;code>pgBackRest&lt;/code> / &lt;code>WAL-G&lt;/code> all produce one; it typically runs &lt;em>daily or weekly&lt;/em>&lt;/li>
&lt;li>&lt;strong>WAL archive&lt;/strong>: every WAL segment written after the base backup is pushed to external storage (S3 / GCS / NFS). &lt;code>archive_command&lt;/code> does the push, and PostgreSQL &lt;em>recycles&lt;/em> a segment only after its archive succeeds&lt;/li>
&lt;/ol>
&lt;p>Together they determine RPO (recovery point objective):&lt;/p>
&lt;ul>
&lt;li>RPO ≈ WAL archive frequency (near real time with streaming; with file-based archiving it is bounded by &lt;code>archive_timeout&lt;/code>, which defaults to 0, i.e. disabled, and is commonly set to 60 seconds)&lt;/li>
&lt;li>RPO is not the base backup frequency: a daily base backup with WAL archived every minute gives an RPO of about 1 minute&lt;/li>
&lt;/ul>
&lt;p>RTO (recovery time objective) depends on &lt;em>base backup size plus the volume of WAL to replay&lt;/em>:&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This is an implementation-layer deep article under the <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> overview. The overview established backup / recovery as a baseline OLTP capability; this article focuses on <em>the dual-track data design behind PITR (Point-In-Time Recovery) and five production failure modes</em>.</p></blockquote>
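<h2 id="問題情境">Problem scenario</h2>
<p>A logical bug ships to production and is noticed only six hours later: a batch job has set user.email to NULL on 500,000 rows. The options at that point:</p>
<ul>
<li>Restore the latest daily backup (last night) → loses all of today's legitimate writes (orders, sign-ups)</li>
<li>Promote a standby → the standby has already replicated the bug and is in the same state as the primary</li>
<li>Rebuild from application logs → some operations are irreversible (emails already sent)</li>
</ul>
<p>PITR is the standard answer to this kind of <em>logical disaster</em>: instead of restoring to the backup's timestamp, you <em>restore to the moment just before the bug ran</em> (for example, one minute before). It needs two tracks of data, <em>base backup + WAL archive</em>: the base backup is a snapshot, and the WAL archive holds every write made after that snapshot. Recovery replays WAL up to a specified timestamp, LSN, or transaction ID.</p>
<h2 id="核心概念base-backup--wal-archive-的雙軌設計">Core concept: the dual-track design of base backup + WAL archive</h2>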





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[Base backup t0]  +  [WAL archive t0 → now]
</span></span><span class="line"><span class="ln">2</span><span class="cl">     ↓                       ↓
</span></span><span class="line"><span class="ln">3</span><span class="cl">  full snapshot          incremental log
</span></span><span class="line"><span class="ln">4</span><span class="cl">     ↓                       ↓
</span></span><span class="line"><span class="ln">5</span><span class="cl">     └────── recover to t_target ──→ [restored cluster at t_target]</span></span></code></pre></div><p>The two tracks are independent but must line up:</p>
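<ol>
<li><strong>Base backup</strong>: a snapshot of the entire data dir at one point in time. <code>pg_basebackup</code> / <code>pgBackRest</code> / <code>WAL-G</code> all produce one; it typically runs <em>daily or weekly</em></li>
<li><strong>WAL archive</strong>: every WAL segment written after the base backup is pushed to external storage (S3 / GCS / NFS). <code>archive_command</code> does the push, and PostgreSQL <em>recycles</em> a segment only after its archive succeeds</li>
</ol>
<p>Together they determine RPO (recovery point objective):</p>
<ul>
<li>RPO ≈ WAL archive frequency (near real time with streaming; with file-based archiving it is bounded by <code>archive_timeout</code>, which defaults to 0, i.e. disabled, and is commonly set to 60 seconds)</li>
<li>RPO is not the base backup frequency: a daily base backup with WAL archived every minute gives an RPO of about 1 minute</li>
</ul>
<p>RTO (recovery time objective) depends on <em>base backup size plus the volume of WAL to replay</em>:</p>
<ul>
<li>Restoring the base backup: roughly 1-4 hours at TB scale</li>
<li>WAL replay time ≈ accumulated archive volume / replay throughput</li>
</ul>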
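<p>The RTO relationship above can be turned into a quick back-of-the-envelope calculation. The following is a minimal sketch, not a measured model: the function name <code>estimate_rto_minutes</code> and all throughput figures are hypothetical inputs you would replace with numbers from your own restore drills.</p>

```shell
# Rough RTO estimate: base backup restore time + WAL replay time.
# All sizes and throughputs are hypothetical inputs, not measured values.
estimate_rto_minutes() {
  local backup_gb=$1           # size of the base backup to restore
  local restore_gb_per_min=$2  # assumed restore throughput
  local wal_gb=$3              # WAL accumulated between base backup and target
  local replay_gb_per_min=$4   # assumed WAL replay throughput
  # Integer ceiling division so the estimate errs on the slow side.
  local restore_min=$(( (backup_gb + restore_gb_per_min - 1) / restore_gb_per_min ))
  local replay_min=$(( (wal_gb + replay_gb_per_min - 1) / replay_gb_per_min ))
  echo $(( restore_min + replay_min ))
}

# Example: 2 TB backup at 10 GB/min plus 60 GB of WAL at 1 GB/min
estimate_rto_minutes 2048 10 60 1
```

<p>Because WAL replay is added on top of the restore, taking base backups more often shortens RTO even though it does not change RPO.</p>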
<h2 id="step-by-step-配置">Step-by-step configuration</h2>
<h3 id="primaryarchive_command-設好">Primary: configure archive_command</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># postgresql.conf</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="na">wal_level</span> <span class="o">=</span> <span class="s">replica                          # default is replica; required for PITR</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="na">archive_mode</span> <span class="o">=</span> <span class="s">on                            # enable archiving</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="na">archive_command</span> <span class="o">=</span> <span class="s">&#39;wal-g wal-push %p&#39;        # or pgBackRest / a custom script</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="na">archive_timeout</span> <span class="o">=</span> <span class="s">60                         # force a segment switch at least every 60s</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="na">max_wal_size</span> <span class="o">=</span> <span class="s">4GB</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="na">checkpoint_timeout</span> <span class="o">=</span> <span class="s">15min</span></span></span></code></pre></div><p><code>archive_command</code> must <em>return exit code 0 only when the segment is safely archived</em>. On a non-zero exit PostgreSQL keeps retrying the same segment, and while it keeps failing, WAL accumulates in <code>pg_wal</code> until the disk fills. <strong>Critical: archive_command must never fail silently.</strong></p>
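<p>To make the exit-code contract concrete, here is a minimal sketch of a fail-loud archive helper. It is an illustration only: the function name <code>archive_wal</code> and the <code>ARCHIVE_DIR</code> path are hypothetical, it copies to a local directory rather than S3, and it is not a substitute for pgBackRest or WAL-G.</p>

```shell
# Hypothetical archive helper: copies one WAL segment into a local archive directory.
# archive_command would call it with the segment path and file name (%p and %f).
ARCHIVE_DIR="${ARCHIVE_DIR:-/tmp/wal_archive_demo}"   # assumed path, illustration only

archive_wal() {
  local src=$1 name=$2
  mkdir -p "$ARCHIVE_DIR" || return 1
  # Never overwrite an existing segment: report failure instead.
  if [ -e "$ARCHIVE_DIR/$name" ]; then
    echo "archive_wal: $name already archived" >&2
    return 1
  fi
  # Copy to a temp name, then rename, so a partial copy is never visible.
  cp "$src" "$ARCHIVE_DIR/$name.tmp" || return 1
  mv "$ARCHIVE_DIR/$name.tmp" "$ARCHIVE_DIR/$name" || return 1
}
```

<p>Every step that can fail returns non-zero, and nothing redirects or discards the status, so PostgreSQL sees the failure and keeps the segment in <code>pg_wal</code>.</p>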
<h3 id="用-pgbackrest-取代手寫-script">Use pgBackRest instead of a hand-written script</h3>
<p>Hand-written archive scripts are strongly discouraged in production; pgBackRest / WAL-G / Barman already handle the well-known edge cases:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># pgbackrest.conf</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="k">[global]</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="na">repo1-type</span><span class="o">=</span><span class="s">s3</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># S3 also needs repo1-s3-endpoint and credentials, omitted here</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="na">repo1-s3-bucket</span><span class="o">=</span><span class="s">mybucket</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="na">repo1-s3-region</span><span class="o">=</span><span class="s">us-east-1</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># keep 4 full backups</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="na">repo1-retention-full</span><span class="o">=</span><span class="s">4</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># keep 8 differential backups</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="na">repo1-retention-diff</span><span class="o">=</span><span class="s">8</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"># encrypt at rest; repo1-cipher-pass must also be set</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="na">repo1-cipher-type</span><span class="o">=</span><span class="s">aes-256-cbc</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"># parallel processes (backup, restore, archive)</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="na">process-max</span><span class="o">=</span><span class="s">8</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="k">[main]</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="na">pg1-path</span><span class="o">=</span><span class="s">/var/lib/postgresql/16/main</span></span></span></code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># run a full backup</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">pgbackrest --stanza<span class="o">=</span>main backup --type<span class="o">=</span>full
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># in postgresql.conf (not a shell command): use pgBackRest&#39;s built-in archive-push</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="nv">archive_command</span> <span class="o">=</span> <span class="s1">&#39;pgbackrest --stanza=main archive-push %p&#39;</span></span></span></code></pre></div><p>pgBackRest handles parallel push, compression, encryption, checksums, archive replay timing, the backup catalog, and automatic retention cleanup.</p>
<h3 id="restorerecovery_target_time">Restore: recovery_target_time</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 1. Pull the base backup from the S3 repo and write the recovery settings</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">pgbackrest --stanza<span class="o">=</span>main --type<span class="o">=</span><span class="nb">time</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --target<span class="o">=</span><span class="s2">&#34;2026-05-18 14:30:00+00&#34;</span> restore
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># 2. Start PostgreSQL: it enters recovery mode and replays WAL up to the target time</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"># (pgBackRest has already written recovery.signal + postgresql.auto.conf)</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="c1"># 3. After verifying the data at the target timestamp, promote</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl">pg_ctl promote</span></span></code></pre></div><p>The three commonly used recovery targets:</p>
<ul>
<li><strong><code>recovery_target_time</code></strong>: recover up to a timestamp</li>
<li><strong><code>recovery_target_xid</code></strong>: recover up to a transaction ID (practical only when your logs record the xid)</li>
<li><strong><code>recovery_target_lsn</code></strong>: recover up to a WAL LSN (the most precise, but the LSN must have been recorded beforehand)</li>
</ul>
<p>Production mostly uses timestamps, because application logs carry timestamps that make the target easy to locate.</p>
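<h2 id="故障演練--邊界-case">Failure drills / edge cases</h2>
<h3 id="case-1archive_command-靜默失敗">Case 1: archive_command fails silently</h3>
<p><strong>Symptoms</strong>: during a PITR test the DBA finds that the last three days of WAL are missing from S3, yet PostgreSQL raised no alert and <code>pg_wal</code> shows no backlog (the segments were presumably recycled long ago).</p>
<p><strong>Root cause</strong>: archive_command ran <code>aws s3 cp %p s3://bucket/... 2&gt;/dev/null</code> through a shell wrapper that did not propagate the failure code. The redirect only hides the error message and does not change the exit status on its own; the wrapper returning 0 after a failed cp is what made the failure silent. PostgreSQL treated each segment as archived, advanced the WAL pointer, and recycled the old WAL, so the archive never actually held it.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Never mask the exit code</strong>: archive_command must <em>fail loud</em> and return non-zero on any failure</li>
<li><strong>Use pgBackRest / WAL-G</strong> rather than a hand-written shell script</li>
<li><strong>Monitoring</strong>: alert on archive lag</li>
</ol>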





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">last_archived_time</span><span class="p">,</span><span class="w"> </span><span class="n">now</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">last_archived_time</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">lag</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_stat_archiver</span><span class="p">;</span></span></span></code></pre></div><p>Alert if lag &gt; 5 minutes.</p>
<ol start="4">
<li><strong>Test restores regularly</strong>: run a PITR drill every month, actually restoring from the archive and verifying the timestamp</li>
</ol>
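<p>The lag alert can be wired into a cron job or an exporter script. The sketch below only shows the threshold logic; the function name <code>archive_lag_status</code> is hypothetical, and the psql invocation in the comment is one assumed way to obtain the lag in seconds.</p>

```shell
# Classify archive lag (in seconds) against an alert threshold (default 300s = 5 minutes).
# In practice the lag could come from pg_stat_archiver, for example:
#   psql -Atc "SELECT extract(epoch FROM now() - last_archived_time)::int FROM pg_stat_archiver"
archive_lag_status() {
  local lag_seconds=$1 threshold=${2:-300}
  # An empty or non-numeric value (e.g. nothing archived yet) is reported explicitly.
  case "$lag_seconds" in
    ''|*[!0-9]*) echo "UNKNOWN"; return 0 ;;
  esac
  if [ "$lag_seconds" -gt "$threshold" ]; then
    echo "ALERT"
  else
    echo "OK"
  fi
}
```

<p>Treating a missing value as <code>UNKNOWN</code> rather than <code>OK</code> matters here: an archiver that has never succeeded should not look healthy.</p>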
<h3 id="case-2wal-archive-lagprimary-disk-壓力">Case 2: WAL archive lag and primary disk pressure</h3>
<p><strong>Symptoms</strong>: the <code>pg_wal</code> directory keeps growing and <code>df -h</code> shows 90%+; <code>pg_stat_archiver</code> shows <code>failed_count</code> climbing and a <code>last_failed_time</code> 30 minutes ago; archive_command cannot get segments out (S3 throttling / slow network).</p>
<p><strong>Root cause</strong>: archive_command writes to S3, S3 rate-limits or connections time out, and PostgreSQL retries; the WAL stays in <code>pg_wal</code>, cannot be recycled, and disk usage keeps growing.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Prevention</strong>: retries inside <code>archive_command</code> plus parallel push (pgBackRest provides <code>process-max</code>, used together with <code>archive-async</code>)</li>
<li><strong>Alert</strong>: <code>pg_stat_archiver.failed_count</code> increasing, plus primary disk usage &gt; 80%</li>
<li><strong>Emergency</strong>: temporarily point archive_command at local NFS or other storage and sync once S3 recovers; do not simply disable archiving (that loses data)</li>
<li><strong>Architecture</strong>: keep at least two copies of the archive across regions, so a single storage failure does not stop archiving</li>
</ol>
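<p>The "retry inside archive_command" idea can be sketched as a small bounded-retry wrapper. This is an illustrative helper, not part of any tool: the name <code>retry</code> and its arguments are assumptions, and pgBackRest / WAL-G already implement their own retry logic.</p>

```shell
# Retry a command up to max_attempts times with a fixed delay between attempts.
# Returns 0 on the first success and non-zero if every attempt fails.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}
```

<p>The retry count should stay small: the archiver handles one segment at a time, so a long retry loop inside the command delays every segment behind it, and PostgreSQL will call the command again anyway after a non-zero exit.</p>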
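<h3 id="case-3recovery-跑到-wrong-target-time">Case 3: recovery runs to the wrong target time</h3>
<p><strong>Symptoms</strong>: after PITR the data looks like <em>a chunk is missing</em>. The target time was set 30 minutes too early, recovery has already promoted, subsequent WAL is now on a new timeline, and this instance cannot move forward to the right point.</p>
<p><strong>Root cause</strong>: promotion is irreversible. Once promote opens a new timeline, the remaining WAL of the old timeline will not be replayed on that instance; restoring to a later timestamp means <em>restoring the base backup and replaying WAL again</em>.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong><code>recovery_target_action = pause</code></strong> (available since PostgreSQL 9.5 and the default; it only pauses when <code>hot_standby</code> is on): recovery <em>pauses</em> at the target time instead of promoting automatically, and the DBA promotes only after querying the data and confirming it is right. If the target turns out to be too early, shut the instance down, set a later target, and restart to continue replay</li>
</ol>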





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="na">recovery_target_time</span> <span class="o">=</span> <span class="s">&#39;2026-05-18 14:30:00+00&#39;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="na">recovery_target_action</span> <span class="o">=</span> <span class="s">pause</span></span></span></code></pre></div><ol start="2">
<li><strong>Iterate on a staging cluster</strong>: restore into an <em>independent staging cluster</em>, verify the target time there, then run against production</li>
<li><strong>Record where the target time came from</strong>: cross-check application logs and event timestamps to avoid timezone mix-ups (<code>+00</code> UTC versus local time)</li>
</ol>
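<p>Timezone mix-ups are easier to avoid if every candidate target time is normalized to UTC before it goes into <code>recovery_target_time</code>. The helper below is a sketch that assumes GNU <code>date</code> (its <code>-d</code> option); the function name <code>to_utc_target</code> is hypothetical, and BSD / macOS <code>date</code> uses different flags.</p>

```shell
# Convert a timestamp with an explicit UTC offset into the "+00" form
# used for recovery_target_time. Assumes GNU date (-d); BSD/macOS date differs.
to_utc_target() {
  date -u -d "$1" '+%Y-%m-%d %H:%M:%S+00'
}

# A local-time log entry at UTC+8, normalized to UTC
to_utc_target "2026-05-18 22:30:00 +0800"
```

<p>Always pass the offset explicitly; a bare local timestamp would be interpreted in whatever timezone the shell happens to run in.</p>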
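<h3 id="case-4base-backup-過期未清storage-爆">Case 4: expired base backups never purged, storage blows up</h3>
<p><strong>Symptoms</strong>: the S3 backup bucket grew from 200GB to 5TB in six months; only then did the DBA notice that retention was never configured and daily base backups were being kept for 180 days.</p>
<p><strong>Root cause</strong>: a hand-written archive script with no retention logic, or an overlooked <code>repo1-retention-full=180</code> in pgBackRest; the DB itself keeps growing, and daily full backups pile up on top of that.</p>
<p><strong>Fix</strong>:</p>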





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># pgBackRest retention: 4 full + auto-expire archive</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># keep 4 full backups</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="na">repo1-retention-full</span><span class="o">=</span><span class="s">4</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># keep 8 differential backups</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="na">repo1-retention-diff</span><span class="o">=</span><span class="s">8</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"># keep WAL archive aligned with the retained full backups</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="na">repo1-retention-archive</span><span class="o">=</span><span class="s">4</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="na">repo1-retention-archive-type</span><span class="o">=</span><span class="s">full</span></span></span></code></pre></div><p>Storage budgeting:</p>
<ul>
<li>daily full + diff + WAL archive ≈ 1-2x DB size / day</li>
<li>4-week retention → ~30-60x DB size of storage</li>
<li>cross-region replication → a further 2-3x</li>
</ul>
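<p>The budgeting rule of thumb above is plain multiplication, which the following sketch makes explicit. The function name <code>storage_budget_gb</code> and the 500 GB database are hypothetical; the daily factor of 1-2x is the article's estimate, not a measured value.</p>

```shell
# Storage budget in GB = DB size x daily factor x retention days x number of copies.
# The daily factor (1-2x DB size per day) is a rule of thumb, not a measurement.
storage_budget_gb() {
  local db_gb=$1 daily_factor=$2 retention_days=$3 copies=$4
  echo $(( db_gb * daily_factor * retention_days * copies ))
}

storage_budget_gb 500 1 28 1   # lower bound: 28x DB size, single region
storage_budget_gb 500 2 28 2   # upper bound with one cross-region copy
```

<p>Retention days and copy count multiply each other, which is why an overlooked retention setting grows the bill so quickly once cross-region replication is on.</p>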
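<h3 id="case-5timeline-分歧後-recovery-模糊">Case 5: ambiguous recovery after timelines diverge</h3>
<p><strong>Symptoms</strong>: production went through one failover (Patroni promote) and later a PITR. Now another PITR to the moment just before the failover is needed, the archive holds several timelines, and it is unclear which one the recovery target should follow.</p>
<p><strong>Root cause</strong>: every promote opens a new timeline ID (recorded in a <code>.history</code> file); in archive storage the same LSN can belong to different timelines; a recovery target time near the divergence point is ambiguous.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>Set <strong><code>recovery_target_timeline</code></strong> explicitly to state which timeline to follow</li>
</ol>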





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="na">recovery_target_time</span> <span class="o">=</span> <span class="s">&#39;2026-05-15 10:00:00+00&#39;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="na">recovery_target_timeline</span> <span class="o">=</span> <span class="s">&#39;3&#39;                 # follow timeline 3</span></span></span></code></pre></div><ol start="2">
<li><strong>Get familiar with <code>.history</code> files</strong>: <code>/wal_archive/000000XX.history</code> records the timeline switch points; read it before a PITR</li>
<li><strong>Prevention</strong>: run a fresh base backup <em>immediately</em> after every promote, which simplifies future PITR (no need to cross timelines)</li>
</ol>
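<p>Reading a <code>.history</code> file before a PITR is quick to script. The sketch below assumes the standard layout of a timeline history file, where each non-comment line holds the parent timeline ID, the switch-point LSN, and a reason, separated by tabs; the function name <code>show_timeline_history</code> is hypothetical.</p>

```shell
# Print the timeline switch points recorded in a .history file.
# Each non-comment line is: parent_timeline TAB switch_lsn TAB reason
show_timeline_history() {
  awk -F '\t' '
    /^[ \t]*(#|$)/ { next }   # skip comments and blank lines
    { printf "timeline %s ended at LSN %s (%s)\n", $1, $2, $3 }
  ' "$1"
}
```

<p>Comparing the switch-point LSNs against the target time tells you which timeline contains the moment you want, and therefore what to put in <code>recovery_target_timeline</code>.</p>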
<h2 id="容量--cost-規劃">Capacity / cost planning</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Estimate</th>
          <th>Watch out for</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Base backup size</td>
          <td>Proportional to the data dir size (after backup-tool compression)</td>
          <td>~0.5-1x DB size per backup</td>
      </tr>
      <tr>
          <td>WAL archive size</td>
          <td>~5-50GB / day depending on write volume</td>
          <td>A write-heavy 1TB DB can exceed 100GB / day</td>
      </tr>
      <tr>
          <td>Storage retention</td>
          <td>4-12 weeks is typical</td>
          <td>Budget 30-60x DB size</td>
      </tr>
      <tr>
          <td>Base backup time</td>
          <td>1-4 hours at TB scale</td>
          <td>Run it in a maintenance window</td>
      </tr>
      <tr>
          <td>Restore time</td>
          <td>base backup restore + WAL replay</td>
          <td>TB-scale PITR usually takes 2-6 hours</td>
      </tr>
      <tr>
          <td>Network bandwidth</td>
          <td>100-500 Mbps during a full backup</td>
          <td>Watch egress cost across regions</td>
      </tr>
  </tbody>
</table>
<p>Practical defaults:</p>
<ul>
<li>Daily full backup + 4 weeks retention</li>
<li>WAL archive every 60s (<code>archive_timeout = 60</code>)</li>
<li>Cross-region replication (S3 → S3 cross-region)</li>
<li>A monthly restore drill to verify the backups are usable</li>
</ul>
<h2 id="整合--下一步">Integration / next steps</h2>
<h3 id="跟-patroni-ha-整合">Integrating with <a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni HA</a></h3>
<p>Patroni does not manage backups, but the timeline switch after a promotion affects the archive:</p>
<ol>
<li>archive_command only offers <code>%p</code> (path) and <code>%f</code> (file name); there is no timeline placeholder. The first 8 hex digits of every WAL file name already encode the timeline ID, so archiving under <code>%f</code> keeps segments from different timelines from overwriting each other</li>
<li>Patroni's <code>recovery_conf</code> includes <code>restore_command</code>, so standby clones can fetch WAL from the archive</li>
<li>Run a <em>full backup</em> after every Patroni failover to simplify future PITR</li>
</ol>
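<p>Because the timeline ID is embedded in the WAL file name itself, it can be read straight off an archived segment. A minimal sketch, assuming bash (for the substring expansion and the <code>16#</code> base prefix); the function name <code>wal_timeline</code> is hypothetical.</p>

```shell
# Extract the timeline ID from a 24-hex-digit WAL segment file name.
# Layout: 8 hex digits timeline + 8 hex digits log number + 8 hex digits segment.
# Assumes bash (substring expansion and the 16# base prefix).
wal_timeline() {
  local name=$1
  case "$name" in
    *[!0-9A-F]*) echo "not a WAL segment name: $name" >&2; return 1 ;;
  esac
  [ "${#name}" -eq 24 ] || { echo "not a WAL segment name: $name" >&2; return 1; }
  echo $(( 16#${name:0:8} ))
}

wal_timeline 000000030000000A000000FF
```

<p>Listing the archive and grouping segments by this value is a quick way to see how many timelines exist and where each one starts.</p>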
<h3 id="跟-logical-replication-對位">Compared with <a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">logical replication</a></h3>
<p>PITR and logical replication serve different use cases:</p>
<ul>
<li>PITR is <em>disaster recovery</em> (logical bug / corruption): the whole cluster is restored to a point in time</li>
<li>Logical replication is <em>continuous sync</em>: real-time replication to Kafka or to another DB</li>
</ul>
<p>Both <em>depend on WAL</em> but have different goals; they can run on the same PostgreSQL instance at the same time without conflict (logical replication additionally needs <code>wal_level = logical</code>).</p>
<h3 id="跟-monitoring--alert">Monitoring + alerting</h3>
<p>Key metrics:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- archive health
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_stat_archiver</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="c1">-- archived_count, failed_count, last_archived_wal, last_archived_time
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"></span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="c1">-- WAL segments still waiting to be archived (.ready files, PG 12+)
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_ls_archive_statusdir</span><span class="p">()</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s1">&#39;\.ready$&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w"></span><span class="c1">-- time of the last base backup
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="c1">-- (from pgBackRest info output or the backup catalog)</span></span></span></code></pre></div><p>Three Prometheus alerts: archive failed_count increasing, archive lag &gt; 5min, and no base backup in the last 25h.</p>
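<h3 id="下一步議題">Next topics</h3>
<ul>
<li><strong>Incremental backup (PG 17+)</strong>: base backups no longer have to be full; take one full base plus incrementals</li>
<li><strong>Block-level differential</strong>: already supported by pgBackRest</li>
<li><strong>Cloud-native alternatives</strong>: RDS / Aurora rely on storage-layer snapshots and managed continuous backup instead of a self-managed base backup + archive_command chain</li>
<li><strong><code>pg_dump</code> vs PITR</strong>: pg_dump is a logical backup (it can be restored into a different major version or architecture); PITR is physical (it requires the same major version and a compatible architecture)</li>
</ul>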
<h2 id="相關連結">Related links</h2>
<ul>
<li>Upstream vendor page: <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a></li>
<li>Upstream chapter: <a href="/blog/backend/01-database/database-migration-playbook/" data-link-title="1.6 資料庫轉換實作：雙寫、回填、切流與回滾" data-link-desc="同 DB 內 schema 演進與資料變更的可分段驗證流程、跟 1.12 cross-DB migration 分工">Database Migration Playbook</a>: PITR is the fallback when a migration fails</li>
<li>平行 deep article：<a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni HA</a> / <a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">Logical Replication + Debezium</a> / <a href="/blog/backend/01-database/vendors/postgresql/autovacuum-tuning/" data-link-title="PostgreSQL autovacuum tuning：為什麼你的 autovacuum 永遠追不上 bloat" data-link-desc="MVCC 怎麼產生 dead tuple、autovacuum cost-based throttle 為什麼預設保守、per-table tuning 怎麼設、5 個 production 踩雷（cost_limit 太低 / 長 transaction blocks vacuum / anti-wraparound 在 peak / partition vacuum 滿 worker / index bloat 沒處理）、跟 partitioning &#43; monitoring 整合">autovacuum tuning</a></li>
<li>Methodology：<a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">Vendor 深度技術文章的寫作方法論</a></li>
</ul>
]]></content:encoded></item><item><title>PostgreSQL PITR Restore Drill</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/pitr-restore-drill/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/hands-on/pitr-restore-drill/</guid><description>&lt;p>The core job of a PostgreSQL PITR restore drill is to prove that a backup can be restored to a specified point in time. This article builds on &lt;a href="../../pitr-wal-archiving/">PITR + WAL Archiving&lt;/a> and moves the backup from merely existing to being backed by evidence of recoverability.&lt;/p>
&lt;p>The acceptance criterion for this article: you can record the base backup time, target time, restore duration, validation query, and RPO / RTO notes. The actual commands vary with pgBackRest, Barman, cloud snapshots, or a managed service; this article provides a vendor-neutral drill frame.&lt;/p>
&lt;h2 id="prepare-recovery-point">Prepare Recovery Point&lt;/h2>
&lt;p>The core job of preparing a recovery point is to create an identifiable transaction. Insert a marker row first and record its time.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s">CREATE TABLE IF NOT EXISTS restore_markers (
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s"> id bigserial PRIMARY KEY,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s"> marker text NOT NULL,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s"> created_at timestamptz NOT NULL DEFAULT clock_timestamp()
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s">);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s">INSERT INTO restore_markers(marker) VALUES (&amp;#39;before-bad-change&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s">SELECT id, marker, created_at FROM restore_markers ORDER BY id DESC LIMIT 1;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Use a moment just after &lt;code>created_at&lt;/code>, and before the bad change, as the target time: the marker row commits slightly after &lt;code>clock_timestamp()&lt;/code> is evaluated, so a target exactly equal to &lt;code>created_at&lt;/code> can stop recovery before the marker exists. A formal drill should use UTC and record the timezone, operator, backup set, and WAL archive status.&lt;/p>
&lt;h2 id="create-bad-change">Create Bad Change&lt;/h2>
&lt;p>The job of the create-bad-change step is to simulate the kind of error that calls for PITR.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">psql &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="nv">$DATABASE_URL&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span> &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s">INSERT INTO restore_markers(marker) VALUES (&amp;#39;bad-change-after-target&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s">UPDATE accounts SET status = &amp;#39;closed&amp;#39;;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s">SELECT status, count(*) FROM accounts GROUP BY status;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>In the lab, this step stands in for an operator mistake. In a production incident, the bad change could be an accidental delete, a faulty batch job, a bad migration, or an application bug.&lt;/p>
&lt;h2 id="restore-workflow">Restore Workflow&lt;/h2>
&lt;p>The job of the restore workflow is to turn the backup tool's operations into a fixed set of evidence. Commands differ between tools, but the flow is the same:&lt;/p>
&lt;ol>
&lt;li>Select the base backup.&lt;/li>
&lt;li>Set the recovery target time.&lt;/li>
&lt;li>Replay WAL up to the target time.&lt;/li>
&lt;li>Promote the restored instance.&lt;/li>
&lt;li>Run the validation query.&lt;/li>
&lt;li>Run the application smoke test.&lt;/li>
&lt;/ol>
&lt;p>Example pseudo-runbook:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="cl">restore_target_time = 2026-05-21T10:15:30Z
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">base_backup = latest backup before target
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">wal_archive = available through target
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">restore_path = isolated environment&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Complete the restore in an isolated environment first. Overwriting production directly destroys the evidence and removes any room for rollback.&lt;/p></description><content:encoded><![CDATA[<p>The job of a PostgreSQL PITR restore drill is to prove that a backup can be restored to a specified point in time. This article follows <a href="../../pitr-wal-archiving/">PITR + WAL Archiving</a> and moves the backup from merely existing to being backed by recovery evidence.</p>
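<p>The acceptance criterion for this article: you can record the base backup time, target time, restore duration, validation query, and an RPO / RTO note. The actual commands vary across pgBackRest, Barman, cloud snapshots, and managed services; this article provides a vendor-neutral drill frame.</p>
<h2 id="prepare-recovery-point">Prepare Recovery Point</h2>
<p>The job of the prepare-recovery-point step is to create an identifiable transaction. Write a marker row first and record its time.</p>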





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="cl"><span class="s">CREATE TABLE IF NOT EXISTS restore_markers (
</span></span></span><span class="line"><span class="cl"><span class="s">  id bigserial PRIMARY KEY,
</span></span></span><span class="line"><span class="cl"><span class="s">  marker text NOT NULL,
</span></span></span><span class="line"><span class="cl"><span class="s">  created_at timestamptz NOT NULL DEFAULT clock_timestamp()
</span></span></span><span class="line"><span class="cl"><span class="s">);
</span></span></span><span class="line"><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="cl"><span class="s">INSERT INTO restore_markers(marker) VALUES (&#39;before-bad-change&#39;);
</span></span></span><span class="line"><span class="cl"><span class="s">SELECT id, marker, created_at FROM restore_markers ORDER BY id DESC LIMIT 1;
</span></span></span><span class="line"><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Use a moment just after <code>created_at</code>, and before the bad change, as the target time: the marker row commits slightly after <code>clock_timestamp()</code> is evaluated, so a target exactly equal to <code>created_at</code> can stop recovery before the marker exists. A formal drill should use UTC and record the timezone, operator, backup set, and WAL archive status.</p>
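<p>Since the target time has to be unambiguous UTC, it can help to reject anything else before it reaches the runbook. The following is a minimal sketch; <code>is_utc_target_time</code> is a hypothetical helper name, not part of PostgreSQL or any backup tool, and it only checks the shape of the string, not whether the date is valid.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept a target time only when it is an ISO 8601
# timestamp with an explicit UTC designator (Z or +00:00).
is_utc_target_time() {
  local ts="$1"
  # Date, 'T', time, optional fractional seconds, then a UTC designator.
  local pattern='^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(Z|\+00:00)$'
  [[ "$ts" =~ $pattern ]]
}

# Example: guard the value before writing it into the drill record.
if is_utc_target_time "2026-05-21T10:15:30Z"; then
  echo "target time accepted"
fi
```

<p>A local-time value such as <code>2026-05-21 18:15:30+08</code> fails the check, which forces the operator to convert it explicitly instead of guessing later.</p>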
<h2 id="create-bad-change">Create Bad Change</h2>
<p>The job of the create-bad-change step is to simulate the kind of error that calls for PITR.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="cl"><span class="s">INSERT INTO restore_markers(marker) VALUES (&#39;bad-change-after-target&#39;);
</span></span></span><span class="line"><span class="cl"><span class="s">UPDATE accounts SET status = &#39;closed&#39;;
</span></span></span><span class="line"><span class="cl"><span class="s">SELECT status, count(*) FROM accounts GROUP BY status;
</span></span></span><span class="line"><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>In the lab, this step stands in for an operator mistake. In a production incident, the bad change could be an accidental delete, a faulty batch job, a bad migration, or an application bug.</p>
<h2 id="restore-workflow">Restore Workflow</h2>
<p>The job of the restore workflow is to turn the backup tool's operations into a fixed set of evidence. Commands differ between tools, but the flow is the same:</p>
<ol>
<li>Select the base backup.</li>
<li>Set the recovery target time.</li>
<li>Replay WAL up to the target time.</li>
<li>Promote the restored instance.</li>
<li>Run the validation query.</li>
<li>Run the application smoke test.</li>
</ol>
<p>Example pseudo-runbook:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">restore_target_time = 2026-05-21T10:15:30Z
</span></span><span class="line"><span class="cl">base_backup = latest backup before target
</span></span><span class="line"><span class="cl">wal_archive = available through target
</span></span><span class="line"><span class="cl">restore_path = isolated environment</span></span></code></pre></div><p>Complete the restore in an isolated environment first. Overwriting production directly destroys the evidence and removes any room for rollback.</p>
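<p>For a plain file-based WAL archive on PostgreSQL 12 or later, the "set target, replay WAL, promote" steps above correspond to a few recovery settings. The sketch below only prints those settings; <code>render_recovery_settings</code> and the <code>/archive</code> path are assumptions for illustration, and tools such as pgBackRest or Barman normally generate equivalent settings themselves.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: print recovery settings for a PostgreSQL 12+ instance
# restored into an isolated data directory.
#   $1 = recovery target time (UTC), $2 = WAL archive directory (assumed layout)
render_recovery_settings() {
  local target_time="$1" archive_dir="$2"
  # %f and %p are placeholders PostgreSQL fills in with the WAL file name and
  # the destination path; they are doubled here to escape them for printf.
  printf "restore_command = 'cp %s/%%f \"%%p\"'\n" "$archive_dir"
  printf "recovery_target_time = '%s'\n" "$target_time"
  # Promote once the target is reached, matching the promote step in the flow.
  printf "recovery_target_action = 'promote'\n"
}

render_recovery_settings "2026-05-21T10:15:30Z" "/archive"
```

<p>In a manual restore, this output would be appended to the restored instance's <code>postgresql.conf</code>, and an empty <code>recovery.signal</code> file would be created in its data directory before startup.</p>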
<h2 id="validation-query">Validation Query</h2>
<p>The job of the validation query is to confirm that the restore landed on the correct point in time.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">psql <span class="s2">&#34;</span><span class="nv">$RESTORED_DATABASE_URL</span><span class="s2">&#34;</span> <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="cl"><span class="s">SELECT marker, created_at
</span></span></span><span class="line"><span class="cl"><span class="s">FROM restore_markers
</span></span></span><span class="line"><span class="cl"><span class="s">ORDER BY id;
</span></span></span><span class="line"><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="cl"><span class="s">SELECT status, count(*)
</span></span></span><span class="line"><span class="cl"><span class="s">FROM accounts
</span></span></span><span class="line"><span class="cl"><span class="s">GROUP BY status;
</span></span></span><span class="line"><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>The expected result is that <code>before-bad-change</code> exists and <code>bad-change-after-target</code> is absent. The <code>accounts</code> status distribution should match what it was before the target time.</p>
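<p>The marker expectation can be turned into a pass/fail check so the drill record does not depend on someone reading query output by eye. This is a sketch under assumptions: <code>check_restore_markers</code> is a hypothetical helper that reads one marker name per line on stdin.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the pre-incident marker is present
# and the post-target marker is absent in the restored database.
check_restore_markers() {
  local markers
  markers="$(cat)"   # one marker name per line
  if ! grep -qxF 'before-bad-change' <<<"$markers"; then
    echo "FAIL: before-bad-change missing (restored too early?)"
    return 1
  fi
  if grep -qxF 'bad-change-after-target' <<<"$markers"; then
    echo "FAIL: bad-change-after-target present (restored too late?)"
    return 1
  fi
  echo "PASS: restored to a point before the bad change"
}

# Example with literal input; in a drill the input would come from psql.
printf 'before-bad-change\n' | check_restore_markers
```

<p>Against the restored instance, the input could come from <code>psql "$RESTORED_DATABASE_URL" -At -c "SELECT marker FROM restore_markers ORDER BY id"</code>. The check covers only the markers; the <code>accounts</code> distribution still needs its own comparison.</p>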
<h2 id="rpo--rto-evidence">RPO / RTO Evidence</h2>
<p>The job of RPO / RTO evidence is to translate the drill result into service-level terms.</p>
<table>
  <thead>
      <tr>
          <th>Evidence</th>
          <th>What to record</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Backup timestamp</td>
          <td>Which base backup was used</td>
      </tr>
      <tr>
          <td>Target time</td>
          <td>The exact second to recover to</td>
      </tr>
      <tr>
          <td>WAL availability</td>
          <td>Whether WAL is complete around the target time</td>
      </tr>
      <tr>
          <td>Restore duration</td>
          <td>From restore start to successful validation</td>
      </tr>
      <tr>
          <td>Data gap</td>
          <td>Transactions after the target time that need compensation</td>
      </tr>
      <tr>
          <td>Smoke test</td>
          <td>Whether the application's core workflows work</td>
      </tr>
  </tbody>
</table>
<p>A PITR drill succeeds when both the data and the application are usable. PostgreSQL starting up successfully is not enough to hand the service back.</p>
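<p>The restore-duration evidence can be compared against the RTO mechanically. A minimal sketch follows, assuming start and end are captured as epoch seconds; <code>report_rto</code> and the 3600-second budget are illustrative values, not figures from this article.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the measured restore duration with an RTO budget.
#   $1 = restore start (epoch seconds), $2 = validation success (epoch seconds),
#   $3 = RTO budget in seconds
report_rto() {
  local start_epoch="$1" end_epoch="$2" rto_seconds="$3"
  local duration=$(( end_epoch - start_epoch ))
  if (( duration <= rto_seconds )); then
    echo "restore_duration=${duration}s rto=${rto_seconds}s result=within-rto"
  else
    echo "restore_duration=${duration}s rto=${rto_seconds}s result=exceeds-rto"
    return 1
  fi
}

# Example: a 900-second restore against an assumed one-hour RTO.
report_rto 1000 1900 3600
```

<p>In a drill, <code>start=$(date -u +%s)</code> would be captured when the restore begins and the second value when the validation query passes, so the duration covers the same interval the table describes.</p>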
<h2 id="drill-retrospective">Drill Retrospective</h2>
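<p>The job of the drill retrospective is to turn the gaps found during the drill into next steps.</p>
<p>Common gaps:</p>
<ol>
<li>The correct base backup cannot be found.</li>
<li>The WAL archive is missing segments.</li>
<li>The target time's timezone is ambiguous.</li>
<li>The restore is too slow and exceeds the RTO.</li>
<li>Application secrets / config do not point at the restored DB.</li>
<li>The validation query lacks a business invariant.</li>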
</ol>
<p>After this article, read <a href="../../cross-region-dr/">Cross-region DR</a> for cross-region recovery and <a href="../../pitr-wal-archiving/">PITR + WAL Archiving</a> for backup strategy.</p>
]]></content:encoded></item></channel></rss>