<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Streaming-Replication on Tarragon</title><link>https://tarrragon.github.io/blog/tags/streaming-replication/</link><description>Recent content in Streaming-Replication on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/streaming-replication/index.xml" rel="self" type="application/rss+xml"/><item><title>PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN + replication slot 的三軸組合</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/replication-topology/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/replication-topology/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL&lt;/a> overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 &lt;em>streaming replication topology&lt;/em> — 從 single primary 到 multi-standby 部署的 3 個 trade-off 軸 + LSN + replication slot 機制。&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="replication-的-3-個-trade-off-軸--mode-選擇">Replication 的 3 個 trade-off 軸 + mode 選擇&lt;/h2>
&lt;p>PG streaming replication mode 選擇看起來是「async 還是 sync」、實際是 3 個獨立 trade-off 軸的組合、async / sync / quorum-based sync 是這些軸的常見組合 &lt;em>名稱&lt;/em>：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>軸&lt;/th>
 &lt;th>端 A&lt;/th>
 &lt;th>端 B&lt;/th>
 &lt;th>PG 旋鈕&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;strong>Durability&lt;/strong>&lt;/td>
 &lt;td>primary 寫完就 commit&lt;/td>
 &lt;td>至少一個 standby 收到才 commit&lt;/td>
 &lt;td>&lt;code>synchronous_commit&lt;/code> / &lt;code>synchronous_standby_names&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>Latency&lt;/strong>&lt;/td>
 &lt;td>client 等 primary 寫完 OK&lt;/td>
 &lt;td>client 等 standby ack（額外 RTT）&lt;/td>
 &lt;td>同上&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>Consistency&lt;/strong>&lt;/td>
 &lt;td>standby 隨時可能 stale&lt;/td>
 &lt;td>standby 跟 primary 保證讀到一致&lt;/td>
 &lt;td>application read routing rule（不是 replication 旋鈕）&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>跟這三軸獨立的、是 &lt;em>replication 機制本身的可維護性&lt;/em>：&lt;/p>
&lt;ul>
&lt;li>&lt;strong>LSN（Log Sequence Number）&lt;/strong>：PG 用全域 byte offset 標 WAL 進度、所有 standby 同步用 LSN 對齊、不像 MySQL 早期 binlog position + file 雙欄&lt;/li>
&lt;li>&lt;strong>Replication slot&lt;/strong>：primary 紀錄每個 standby 已接收的 LSN、防 standby 失聯期間 WAL 被清掉、是 streaming replication 的 &lt;em>持久化進度追蹤&lt;/em>&lt;/li>
&lt;/ul>
&lt;p>跟 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">MySQL Replication Topology&lt;/a> 對比、PG 的 LSN + replication slot 直接內建 &lt;em>standby 進度追蹤&lt;/em>、不像 MySQL 5.7- 要靠 binlog position + GTID 雙機制；但 slot 是 &lt;em>primary 紀錄&lt;/em>、orphan slot 是 PG-specific 議題（slot 留 WAL 直到 standby 重連、standby 永久失聯 → primary disk 爆）。&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 <em>streaming replication topology</em> — 從 single primary 到 multi-standby 部署的 3 個 trade-off 軸 + LSN + replication slot 機制。</p></blockquote>
<hr>
<h2 id="replication-的-3-個-trade-off-軸--mode-選擇">Replication 的 3 個 trade-off 軸 + mode 選擇</h2>
<p>PG streaming replication mode 選擇看起來是「async 還是 sync」、實際是 3 個獨立 trade-off 軸的組合、async / sync / quorum-based sync 是這些軸的常見組合 <em>名稱</em>：</p>
<table>
  <thead>
      <tr>
          <th>軸</th>
          <th>端 A</th>
          <th>端 B</th>
          <th>PG 旋鈕</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Durability</strong></td>
          <td>primary 寫完就 commit</td>
          <td>至少一個 standby 收到才 commit</td>
          <td><code>synchronous_commit</code> / <code>synchronous_standby_names</code></td>
      </tr>
      <tr>
          <td><strong>Latency</strong></td>
          <td>client 等 primary 寫完 OK</td>
          <td>client 等 standby ack（額外 RTT）</td>
          <td>同上</td>
      </tr>
      <tr>
          <td><strong>Consistency</strong></td>
          <td>standby 隨時可能 stale</td>
          <td>standby 跟 primary 保證讀到一致</td>
          <td>application read routing rule（不是 replication 旋鈕）</td>
      </tr>
  </tbody>
</table>
<p>跟這三軸獨立的、是 <em>replication 機制本身的可維護性</em>：</p>
<ul>
<li><strong>LSN（Log Sequence Number）</strong>：PG 用全域 byte offset 標 WAL 進度、所有 standby 同步用 LSN 對齊、不像 MySQL 早期 binlog position + file 雙欄</li>
<li><strong>Replication slot</strong>：primary 紀錄每個 standby 已接收的 LSN、防 standby 失聯期間 WAL 被清掉、是 streaming replication 的 <em>持久化進度追蹤</em></li>
</ul>
<p>跟 <a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">MySQL Replication Topology</a> 對比、PG 的 LSN + replication slot 直接內建 <em>standby 進度追蹤</em>、不像 MySQL 5.7- 要靠 binlog position + GTID 雙機制；但 slot 是 <em>primary 紀錄</em>、orphan slot 是 PG-specific 議題（slot 留 WAL 直到 standby 重連、standby 永久失聯 → primary disk 爆）。</p>
<h2 id="async-streamingdefault--高-throughput-的代價">Async streaming：default + 高 throughput 的代價</h2>
<p>Async 是 PG 預設、行為：</p>
<ol>
<li>Primary 寫 WAL 進 <code>pg_wal/</code> 目錄、commit、回應 client OK</li>
<li>WAL sender process 把 WAL stream 給 standby</li>
<li>Standby WAL receiver 寫 standby 的 <code>pg_wal/</code>、startup 進程 redo 套用</li>
</ol>
<p><strong>Trade-off</strong>：</p>
<ul>
<li>Durability：primary commit 後 standby 還沒收 → primary 永久故障 → <em>data loss</em>（已 commit 的 transaction 在 standby 不存在）</li>
<li>Latency：client 寫入延遲 = primary 自身 fsync WAL 的時間（<code>fsync=on</code> + <code>synchronous_commit=on</code> 預設、通常 &lt; 1ms 在 SSD / NVMe）</li>
<li>Consistency：standby 可能 lag、application 讀 standby 會 stale；用 <code>pg_stat_replication.write_lag / flush_lag / replay_lag</code> 看</li>
</ul>
<p><strong>配置</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># postgresql.conf on primary</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="na">wal_level</span> <span class="o">=</span> <span class="s">replica          # 至少 replica（logical 是 superset）</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="na">max_wal_senders</span> <span class="o">=</span> <span class="s">10         # 並行 WAL sender process 數（依 standby 數量）</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="na">wal_keep_size</span> <span class="o">=</span> <span class="s">1024MB       # WAL 保留量（slot 為主、但 backup buffer）</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="na">synchronous_commit</span> <span class="o">=</span> <span class="s">on      # 預設、primary 自己 fsync WAL</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"># synchronous_standby_names 留空 = async</span></span></span></code></pre></div><p><strong>適用</strong>：</p>
<ul>
<li>主流選擇（90% 場景）</li>
<li>Failover loss 在容忍範圍（多數 web 應用容忍 1-2 秒 data loss）</li>
<li>Read scaling 為主要 driver、絕對 durability 非首要</li>
</ul>
<h2 id="sync-streaming至少一個-standby-flush-wal-才-commit">Sync streaming：至少一個 standby flush WAL 才 commit</h2>
<p>Sync mode 在 async 基礎上加 <em>primary 等指定 standby flush WAL 才回 client</em>：</p>
<ol>
<li>Primary 寫 WAL、send to standby</li>
<li>Standby 收到 WAL、寫進 <code>pg_wal/</code>、fsync、回 ack</li>
<li><em>Primary 等 ack</em> → commit → 回 client</li>
</ol>
<p><code>synchronous_commit</code> 有 5 個 level、不是 binary：</p>
<table>
  <thead>
      <tr>
          <th>Level</th>
          <th>行為</th>
          <th>Latency 影響</th>
          <th>Crash data loss</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>off</code></td>
          <td>primary 不等自己 fsync、background flush</td>
          <td>+0</td>
          <td>primary crash 丟 0-1 秒</td>
      </tr>
      <tr>
          <td><code>local</code></td>
          <td>primary fsync own WAL（不等 standby）</td>
          <td>baseline</td>
          <td>primary crash 0、standby 丟</td>
      </tr>
      <tr>
          <td><code>remote_write</code></td>
          <td>primary fsync + standby 收到（不必 standby fsync）</td>
          <td>+1 RTT 大致</td>
          <td>OS crash on standby 丟</td>
      </tr>
      <tr>
          <td><code>on</code> (預設)</td>
          <td>primary fsync + standby fsync（standby 收進 disk）</td>
          <td>+1 RTT + fsync</td>
          <td>全 crash 都不丟</td>
      </tr>
      <tr>
          <td><code>remote_apply</code></td>
          <td>primary fsync + standby fsync + standby 已 <em>replay</em>（visible to read）</td>
          <td>+1 RTT + fsync + replay</td>
          <td>全 crash 都不丟 + replica 立刻可讀</td>
      </tr>
  </tbody>
</table>
<p><strong>配置（synchronous）</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="na">synchronous_commit</span> <span class="o">=</span> <span class="s">on</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="na">synchronous_standby_names</span> <span class="o">=</span> <span class="s">&#39;FIRST 1 (standby1, standby2)&#39;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># &#39;FIRST 1&#39; = 第一個 active standby ack 即可</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># &#39;ANY 2 (s1, s2, s3)&#39; = 任 2 個 ack 即可（quorum-based）</span></span></span></code></pre></div><p><strong>Quorum-based sync</strong>：用 <code>ANY N</code> 語法、達到 N 個 ack 就 commit、提高 latency stability（不依賴特定 standby）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="na">synchronous_standby_names</span> <span class="o">=</span> <span class="s">&#39;ANY 2 (standby1, standby2, standby3)&#39;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 3 個 standby 中任 2 個 ack 即 commit</span></span></span></code></pre></div><p><strong>適用</strong>：</p>
<ul>
<li>金融交易 / 訂單 / payment ledger（不允許 data loss）</li>
<li>已有 multi-AZ deploy、replica 物理上可靠</li>
<li>可接受寫入延遲 +1-3ms (跨 AZ)</li>
</ul>
<p><strong>不適用</strong>：</p>
<ul>
<li>跨 region sync（RTT 50-200ms）— 寫吞吐砍半、改用 <em>region-local sync + cross-region async</em></li>
<li>寫吞吐 &gt; 50K WPS + 容忍 sub-second loss — async 即可</li>
</ul>
<h2 id="lsn--replication-slotpg-的進度追蹤機制">LSN + Replication Slot：PG 的進度追蹤機制</h2>
<p>PG 每個 WAL 寫入都標 <em>LSN</em>（64-bit byte offset）。Standby 紀錄 <em>已收到 / 已 flush / 已 replay</em> 的 LSN、primary 透過 streaming protocol 知道每個 standby 進度。</p>
<p><strong>Replication slot</strong> 是 <em>primary 端的 standby 進度紀錄</em>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 建 physical replication slot（給 streaming replication 用）
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_create_physical_replication_slot</span><span class="p">(</span><span class="s1">&#39;standby1_slot&#39;</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="c1">-- 查 slot 狀態
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">slot_name</span><span class="p">,</span><span class="w"> </span><span class="n">active</span><span class="p">,</span><span class="w"> </span><span class="n">restart_lsn</span><span class="p">,</span><span class="w"> </span><span class="n">confirmed_flush_lsn</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">       </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">pg_wal_lsn_diff</span><span class="p">(</span><span class="n">pg_current_wal_lsn</span><span class="p">(),</span><span class="w"> </span><span class="n">restart_lsn</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">lag</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_replication_slots</span><span class="p">;</span></span></span></code></pre></div><p><strong>Slot 的核心責任</strong>：</p>
<ul>
<li><em>防 WAL premature deletion</em>：standby 失聯（restart / network blip）、primary 仍保留 slot 對應 LSN 之後的 WAL、standby 重連可繼續 stream</li>
<li><em>無需 base backup re-build</em>：跟沒 slot 的 standby 對比、有 slot 的 standby 失聯後重連、不用重建</li>
</ul>
<p><strong>Slot 跟 <code>wal_keep_size</code></strong>：</p>
<ul>
<li><code>wal_keep_size</code>（PG 13+）/ <code>wal_keep_segments</code>（&lt; 13）：minimum WAL 保留量、不依賴 slot</li>
<li>Slot 是 <em>動態保留</em>：直到 slot 的 standby 推進 LSN 才釋放對應 WAL</li>
<li>兩者組合：<code>wal_keep_size</code> 是底線、slot 是 standby-specific 動態保留</li>
</ul>
<p><strong>Standby 配置（用 slot）</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># standby1 postgresql.conf</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="na">primary_conninfo</span> <span class="o">=</span> <span class="s">&#39;host=primary.example.com port=5432 user=replication password=...&#39;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="na">primary_slot_name</span> <span class="o">=</span> <span class="s">&#39;standby1_slot&#39;   # 用 primary 上預先建的 slot</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="na">hot_standby</span> <span class="o">=</span> <span class="s">on                       # 讓 standby 接受 read query</span></span></span></code></pre></div><p><code>standby.signal</code> 空檔案在 PG_DATA 內、告訴 PG 這是 standby、進入 recovery mode。</p>
<h2 id="配置-step-by-stepsync-streaming--slot">配置 step-by-step（sync streaming + slot）</h2>
<p>實務最常見組合：sync streaming + replication slot + cross-AZ replica。</p>
<h3 id="step-1primary-配置">Step 1：Primary 配置</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># postgresql.conf</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="na">wal_level</span> <span class="o">=</span> <span class="s">replica</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="na">max_wal_senders</span> <span class="o">=</span> <span class="s">10</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="na">max_replication_slots</span> <span class="o">=</span> <span class="s">10</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="na">synchronous_commit</span> <span class="o">=</span> <span class="s">on</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="na">synchronous_standby_names</span> <span class="o">=</span> <span class="s">&#39;FIRST 1 (standby1, standby2)&#39;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="na">wal_keep_size</span> <span class="o">=</span> <span class="s">1024MB</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># pg_hba.conf — 允許 replication 連線</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="na">host replication replication 10.0.0.0/16 scram-sha-256</span></span></span></code></pre></div><p>Restart primary 套用。</p>
<h3 id="step-2建-replication-user--slot">Step 2：建 replication user + slot</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">USER</span><span class="w"> </span><span class="n">replication</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="n">REPLICATION</span><span class="w"> </span><span class="n">PASSWORD</span><span class="w"> </span><span class="s1">&#39;...&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_create_physical_replication_slot</span><span class="p">(</span><span class="s1">&#39;standby1_slot&#39;</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_create_physical_replication_slot</span><span class="p">(</span><span class="s1">&#39;standby2_slot&#39;</span><span class="p">);</span></span></span></code></pre></div><h3 id="step-3standby-base-backup">Step 3：Standby base backup</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 在 standby 上跑</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">pg_basebackup -h primary.example.com -D /var/lib/postgresql/data <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  -U replication -P -X stream <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  -S standby1_slot -R
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># -R: 自動生成 standby.signal + primary_conninfo</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"># -X stream: 邊 backup 邊 stream 增量 WAL（避免 backup 期間 WAL gap）</span></span></span></code></pre></div><h3 id="step-4standby-啟動">Step 4：Standby 啟動</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># standby /var/lib/postgresql/data/postgresql.auto.conf 已有：</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># primary_conninfo = &#39;host=primary.example.com user=replication password=... application_name=standby1&#39;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># primary_slot_name = &#39;standby1_slot&#39;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl">pg_ctl -D /var/lib/postgresql/data start</span></span></code></pre></div><h3 id="step-5驗證">Step 5：驗證</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Primary: 確認 standby 連上
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">application_name</span><span class="p">,</span><span class="w"> </span><span class="k">state</span><span class="p">,</span><span class="w"> </span><span class="n">sync_state</span><span class="p">,</span><span class="w"> </span><span class="n">write_lag</span><span class="p">,</span><span class="w"> </span><span class="n">flush_lag</span><span class="p">,</span><span class="w"> </span><span class="n">replay_lag</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_stat_replication</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="c1">-- 應顯示 standby1 / streaming / sync / 各 lag
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"></span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="c1">-- Standby: 確認在 recovery + 收到 WAL
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_is_in_recovery</span><span class="p">(),</span><span class="w"> </span><span class="n">pg_last_wal_receive_lsn</span><span class="p">(),</span><span class="w"> </span><span class="n">pg_last_wal_replay_lsn</span><span class="p">();</span></span></span></code></pre></div><h2 id="5-個-production-踩雷">5 個 Production 踩雷</h2>
<h3 id="1-standby-lag-暴衝--single-replay-process-bottleneck">1. Standby lag 暴衝 — Single replay process bottleneck</h3>
<p>PG standby 是 <em>single startup process</em> 套用 WAL（不像 MySQL multi-thread replication）、primary 高並發寫入時 standby 跟不上、lag 從 &lt; 100ms 飆到分鐘級。常見觸發：批次 UPDATE / DELETE、大 transaction、index 建立、autovacuum 大量 dead tuple cleanup。</p>
<p>修法：</p>
<ul>
<li><em>Parallel WAL apply</em>（PG 14+）：<code>max_parallel_workers_per_gather</code> 增加 background worker、但仍受 startup process 主導</li>
<li>對 <em>read scaling</em> 場景接受 standby lag、application 用 <em>primary read 對 latency-critical query</em></li>
<li><em>Cascading replication</em> 對 high-fan-out 解決 sender CPU bottleneck、但 standby replay 仍 single-thread</li>
</ul>
<p>監控：<code>pg_stat_replication.replay_lag</code> 是 <em>最後一個 commit 到 standby replay 的時間差</em>、超過 threshold 即告警。</p>
<h3 id="2-sync-standby-失聯時-primary-commit-卡住">2. Sync standby 失聯時 primary commit 卡住</h3>
<p><code>synchronous_standby_names = 'FIRST 1 (standby1)'</code> + standby1 down → primary commit <em>等永遠</em>。Application 全部 timeout。</p>
<p>修法：</p>
<ul>
<li>用 <code>ANY N</code> quorum：<code>synchronous_standby_names = 'ANY 1 (standby1, standby2)'</code> — 任一 standby ack 即可</li>
<li>設多 standby、防單一失聯</li>
<li>監控 sync standby 健康、自動 failover 切 sync mode 到其他 standby（Patroni 自動做）</li>
<li>緊急情況：在 primary 跑 <code>ALTER SYSTEM SET synchronous_standby_names = ''; SELECT pg_reload_conf();</code> 暫時退 async（接受 data loss risk）</li>
</ul>
<h3 id="3-orphan-replication-slot--primary-disk-爆">3. Orphan replication slot — Primary disk 爆</h3>
<p>Standby 失聯（永久故障 / 重 decommission 但忘了 drop slot）、primary slot 持續保留 WAL、<code>pg_wal/</code> 累積到 disk 滿、primary 也掛。</p>
<p>修法：</p>
<ul>
<li>
<p>監控 <code>pg_replication_slots.active</code> — <code>false</code> 持續 &gt; N 小時是警訊</p>
</li>
<li>
<p>監控 slot lag：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">slot_name</span><span class="p">,</span><span class="w"> </span><span class="n">active</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">       </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">pg_wal_lsn_diff</span><span class="p">(</span><span class="n">pg_current_wal_lsn</span><span class="p">(),</span><span class="w"> </span><span class="n">restart_lsn</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">retained_wal</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_replication_slots</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">retained_wal</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">10</span><span class="n">GB</span><span class="p">;</span></span></span></code></pre></div></li>
<li>
<p>設 <code>max_slot_wal_keep_size</code>（PG 13+）— slot 對應 WAL 超過 limit 自動 invalidate slot（standby 之後要 base backup 重來）</p>
</li>
<li>
<p>DR runbook 紀錄 <em>standby 退役流程</em> 必須包含 <code>pg_drop_replication_slot('xxx')</code></p>
</li>
</ul>
<h3 id="4-cascading-replication-雪崩">4. Cascading replication 雪崩</h3>
<p>Topology <code>primary → standby1 → standby2 → ...</code>（每層遞迴 stream）。Standby1 startup process 卡住、後續 standby 都被 block、整條 chain 雪崩。</p>
<p>修法：</p>
<ul>
<li>避免超過 2 層 cascade（primary → tier1 → tier2 是上限）</li>
<li>跨 region 用 <em>region-local tier1 + cross-region tier2</em>、不是長 chain</li>
<li>真的大規模、改用 <em>binlog server</em> style：<a href="https://github.com/postgresml/PgCat">Citus / PgCat</a> 等中介、或 logical replication 解耦</li>
</ul>
<h3 id="5-failover-後-timeline-分歧">5. Failover 後 timeline 分歧</h3>
<p>Primary 失敗、standby1 promote 為新 primary、其他 standby（standby2 / 3）原本連舊 primary、必須重新連 standby1。但 PG 用 <em>timeline</em>（每次 promotion 增 1）標 WAL 分支、原 standby 的 timeline 跟新 primary 不同。重連時看到 timeline mismatch、報錯。</p>
<p>修法：</p>
<ul>
<li><em>pg_rewind</em> 工具：對比新 primary 跟舊 standby 的 timeline 分歧點、把舊 standby 上 <em>新 primary 沒有的 WAL</em> 倒退、然後從分歧點重新跟新 primary 同步</li>
<li><em>Base backup re-build</em>：對舊 standby 重建 — 慢但保證乾淨</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni</a> 自動處理 pg_rewind / base backup 選擇</li>
</ul>
<h2 id="容量--cost-對照">容量 / cost 對照</h2>
<table>
  <thead>
      <tr>
          <th>配置</th>
          <th>寫吞吐影響</th>
          <th>Standby overhead</th>
          <th>適合 workload</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Async streaming + slot</td>
          <td>baseline</td>
          <td>低（WAL receive + startup）</td>
          <td>高吞吐、容忍 sub-second loss</td>
      </tr>
      <tr>
          <td>Sync <code>remote_write</code> + 1 standby</td>
          <td>-5% ~ -10%</td>
          <td>同上 + RTT</td>
          <td>一般 production、可接受 OS crash 丟</td>
      </tr>
      <tr>
          <td>Sync <code>on</code> + 1 standby</td>
          <td>-10% ~ -20%</td>
          <td>同上 + fsync</td>
          <td>金融、訂單、不容忍 data loss</td>
      </tr>
      <tr>
          <td>Sync <code>on</code> + ANY 2 quorum</td>
          <td>-15% ~ -30%</td>
          <td>同上、跨 AZ</td>
          <td>強 durability + multi-AZ HA</td>
      </tr>
      <tr>
          <td>Sync <code>remote_apply</code> + 1 standby</td>
          <td>-20% ~ -40%</td>
          <td>同上 + replay</td>
          <td>強一致 read on standby（少用、成本高）</td>
      </tr>
  </tbody>
</table>
<p>跨 AZ sync 通常加 1-3ms、跨 region 加 50-200ms — 寫密集 workload 跨 region sync 通常不划算、改用 <em>region-local sync + cross-region async chain</em>。</p>
<h2 id="整合--下一步">整合 / 下一步</h2>
<h3 id="patroni-ha">Patroni HA</h3>
<p><a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni</a> 是 PG HA 自動 failover 標準、依賴 DCS（etcd / Consul）+ 本文 replication topology。Patroni 自動：</p>
<ul>
<li>偵測 primary 失聯、promote 適合 standby</li>
<li>處理 timeline 分歧（pg_rewind）</li>
<li>重配 sync standby（避免 sync standby 失聯卡 primary）</li>
</ul>
<h3 id="logical-replication--debezium">Logical Replication + Debezium</h3>
<p><a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">Logical replication + Debezium</a> 是 <em>跟 streaming replication 共用 WAL</em> 但不同 abstraction — logical decoding output event、streaming replication output physical bytes。Logical replication slot 跟 physical slot 共存、各自獨立 retention。</p>
<h3 id="pitr--wal-archiving">PITR + WAL Archiving</h3>
<p><a href="/blog/backend/01-database/vendors/postgresql/pitr-wal-archiving/" data-link-title="PostgreSQL PITR &#43; WAL archiving：從 base backup 到 point-in-time recovery 的完整鏈" data-link-desc="Base backup &#43; WAL archive 構成 PITR 的雙軌資料、archive_command &#43; restore_command 配置、用 pgBackRest / WAL-G 替代手寫腳本、5 個 production 踩雷（archive 靜默失敗 / archive lag / 錯誤 target time / base backup 過期未清 / timeline 分歧 recovery 模糊）、跟 Patroni &#43; monitoring 整合">PITR + WAL Archiving</a> 用 <em>archive_command</em> 把 WAL ship 到 S3、跟 streaming replication 並行：</p>
<ul>
<li>Streaming：給 <em>活的 standby</em>（real-time read scaling / HA）</li>
<li>Archive：給 <em>PITR + 新 standby base backup source</em></li>
</ul>
<p>兩者使用同一 WAL stream、不衝突。</p>
<h3 id="connection-路由pgbouncer--readwrite-split">Connection 路由（PgBouncer + read/write split）</h3>
<p><a href="/blog/backend/01-database/vendors/postgresql/pgbouncer-config/" data-link-title="PostgreSQL pgBouncer 配置 &#43; 連線池治理" data-link-desc="pgBouncer transaction pooling 配置、跟 application connection pool 的分層、production 故障演練（pool exhaustion / stale connection / DNS failover）跟容量規劃">PgBouncer</a> 不做 read/write split（transaction pool 不看 SQL）。Read replica routing 通常用 <em>application-level</em> 或 <em>HAProxy 監控 standby health</em>。</p>
<h3 id="跟-mysql-replication-topology-對比">跟 MySQL Replication Topology 對比</h3>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>PG streaming replication</th>
          <th>MySQL replication</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>進度追蹤</td>
          <td>LSN（單一 byte offset）</td>
          <td>GTID 或 binlog (file, position)</td>
      </tr>
      <tr>
          <td>標準工具</td>
          <td>streaming replication（physical）+ logical</td>
          <td>binlog ROW format</td>
      </tr>
      <tr>
          <td>Sync 機制</td>
          <td><code>synchronous_commit</code> + standby names</td>
          <td>semi-sync plugin</td>
      </tr>
      <tr>
          <td>Quorum</td>
          <td><code>ANY N</code> syntax</td>
          <td><code>rpl_semi_sync_master_wait_for_slave_count</code></td>
      </tr>
      <tr>
          <td>Replay parallelism</td>
          <td>Single startup process</td>
          <td>Multi-thread (logical clock / writeset)</td>
      </tr>
      <tr>
          <td>Replica routing</td>
          <td>PgBouncer 不看 SQL、需外接</td>
          <td>ProxySQL 內建 query routing</td>
      </tr>
  </tbody>
</table>
<p>兩者 high-level 對等、低層機制有顯著差異。詳見 <a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">MySQL Replication Topology</a>。</p>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">PG Patroni HA</a>（HA failover、依賴本文 replication topology）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">PG Logical Replication + Debezium</a>（不同 abstraction、共用 WAL）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/pitr-wal-archiving/" data-link-title="PostgreSQL PITR &#43; WAL archiving：從 base backup 到 point-in-time recovery 的完整鏈" data-link-desc="Base backup &#43; WAL archive 構成 PITR 的雙軌資料、archive_command &#43; restore_command 配置、用 pgBackRest / WAL-G 替代手寫腳本、5 個 production 踩雷（archive 靜默失敗 / archive lag / 錯誤 target time / base backup 過期未清 / timeline 分歧 recovery 模糊）、跟 Patroni &#43; monitoring 整合">PG PITR + WAL Archiving</a>（streaming + archive 並行）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/pgbouncer-config/" data-link-title="PostgreSQL pgBouncer 配置 &#43; 連線池治理" data-link-desc="pgBouncer transaction pooling 配置、跟 application connection pool 的分層、production 故障演練（pool exhaustion / stale connection / DNS failover）跟容量規劃">PG PgBouncer</a>（connection pool、不做 read/write split）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">MySQL Replication Topology</a>（sibling、不同機制）</li>
<li><a href="/blog/backend/knowledge-cards/quorum/" data-link-title="Quorum" data-link-desc="分散式系統以多數節點同意作為提交或讀取有效性的門檻">quorum 卡片</a> / <a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">stale-read 卡片</a> / <a href="/blog/backend/knowledge-cards/eventual-consistency/" data-link-title="Eventual Consistency" data-link-desc="允許短暫不一致、最終收斂到同一資料狀態的一致性語意">eventual-consistency 卡片</a></li>
<li>官方：<a href="https://www.postgresql.org/docs/current/warm-standby.html">PG Streaming Replication</a> / <a href="https://www.postgresql.org/docs/current/app-pgbasebackup.html">pg_basebackup</a></li>
</ul>
]]></content:encoded></item></channel></rss>