<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Replication-Slot on Tarragon</title><link>https://tarrragon.github.io/blog/tags/replication-slot/</link><description>Recent content in Replication-Slot on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/replication-slot/index.xml" rel="self" type="application/rss+xml"/><item><title>PostgreSQL Replication Slot Management：Physical / Logical / Failover Slot 治理</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/replication-slot-management/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/replication-slot-management/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL&lt;/a> overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 &lt;em>replication slot management&lt;/em> — physical / logical / failover slot 三類治理。&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="replication-slot-兩大類">Replication Slot 兩大類&lt;/h2>
&lt;p>PG 兩種 replication slot：&lt;/p>
&lt;h3 id="physical-replication-slot">Physical Replication Slot&lt;/h3>
&lt;p>對應 &lt;em>streaming replication&lt;/em>（physical WAL byte-level）：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_create_physical_replication_slot&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;standby1_slot&amp;#39;&lt;/span>&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>用於：&lt;/p>
&lt;ul>
&lt;li>Streaming replication standby（&lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &amp;#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &amp;#43; LSN-based 進度追蹤 &amp;#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &amp;#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &amp;#43; logical replication 整合">Replication Topology&lt;/a>）&lt;/li>
&lt;li>pg_basebackup 用 slot 防 WAL 清理&lt;/li>
&lt;li>高 lag standby 防 WAL premature deletion&lt;/li>
&lt;/ul>
&lt;h3 id="logical-replication-slot">Logical Replication Slot&lt;/h3>
&lt;p>對應 &lt;em>logical replication / logical decoding&lt;/em>：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_create_logical_replication_slot&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;my_slot&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;pgoutput&amp;#39;&lt;/span>&lt;span class="p">);&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="c1">-- 或用 wal2json plugin
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="c1">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">pg_create_logical_replication_slot&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="s1">&amp;#39;debezium_slot&amp;#39;&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;wal2json&amp;#39;&lt;/span>&lt;span class="p">);&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>用於：&lt;/p>
&lt;ul>
&lt;li>PG-to-PG logical replication（publication / subscription）&lt;/li>
&lt;li>CDC（Debezium / Maxwell / pg_logical_emitter）&lt;/li>
&lt;li>Multi-master replication（BDR / pgEdge / Spock）&lt;/li>
&lt;/ul>
&lt;p>logical slot 跟 physical slot 共存、各自獨立 retention。&lt;/p>
&lt;h2 id="slot-lifecycle">Slot Lifecycle&lt;/h2>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">建立 → active（有 consumer）→ inactive（consumer 失聯）→ drop
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> WAL 持續累積（直到推進 LSN 或 drop）&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>狀態查詢&lt;/strong>：&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 <em>replication slot management</em> — physical / logical / failover slot 三類治理。</p></blockquote>
<hr>
<h2 id="replication-slot-兩大類">Replication Slot 兩大類</h2>
<p>PG 兩種 replication slot：</p>
<h3 id="physical-replication-slot">Physical Replication Slot</h3>
<p>對應 <em>streaming replication</em>（physical WAL byte-level）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_create_physical_replication_slot</span><span class="p">(</span><span class="s1">&#39;standby1_slot&#39;</span><span class="p">);</span></span></span></code></pre></div><p>用於：</p>
<ul>
<li>Streaming replication standby（<a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">Replication Topology</a>）</li>
<li>pg_basebackup 用 slot 防 WAL 清理</li>
<li>高 lag standby 防 WAL premature deletion</li>
</ul>
<h3 id="logical-replication-slot">Logical Replication Slot</h3>
<p>對應 <em>logical replication / logical decoding</em>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_create_logical_replication_slot</span><span class="p">(</span><span class="s1">&#39;my_slot&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;pgoutput&#39;</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="c1">-- 或用 wal2json plugin
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_create_logical_replication_slot</span><span class="p">(</span><span class="s1">&#39;debezium_slot&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;wal2json&#39;</span><span class="p">);</span></span></span></code></pre></div><p>用於：</p>
<ul>
<li>PG-to-PG logical replication（publication / subscription）</li>
<li>CDC（Debezium / Maxwell / pg_logical_emitter）</li>
<li>Multi-master replication（BDR / pgEdge / Spock）</li>
</ul>
<p>logical slot 跟 physical slot 共存、各自獨立 retention。</p>
<h2 id="slot-lifecycle">Slot Lifecycle</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">建立 → active（有 consumer）→ inactive（consumer 失聯）→ drop
</span></span><span class="line"><span class="ln">2</span><span class="cl">                                    ↓
</span></span><span class="line"><span class="ln">3</span><span class="cl">                              WAL 持續累積（直到推進 LSN 或 drop）</span></span></code></pre></div><p><strong>狀態查詢</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">slot_name</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">       </span><span class="n">slot_type</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">       </span><span class="n">active</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">       </span><span class="n">restart_lsn</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">       </span><span class="n">confirmed_flush_lsn</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">       </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">pg_wal_lsn_diff</span><span class="p">(</span><span class="n">pg_current_wal_lsn</span><span class="p">(),</span><span class="w"> </span><span class="n">restart_lsn</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">retained_wal</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_replication_slots</span><span class="p">;</span></span></span></code></pre></div><p>關鍵欄位：</p>
<ul>
<li><code>slot_type</code>：<code>physical</code> / <code>logical</code></li>
<li><code>active</code>：true / false（consumer 是否連著）</li>
<li><code>restart_lsn</code>：slot 起點 LSN、primary 必須保留這以後的 WAL</li>
<li><code>confirmed_flush_lsn</code>：logical slot 已 confirm flush 的 LSN</li>
<li><code>retained_wal</code>：當前因 slot 累積的 WAL</li>
</ul>
<h2 id="failover-slot-synchronization-pg-17">Failover Slot Synchronization (PG 17+)</h2>
<p>PG 17 之前的 <em>痛點</em>：logical replication slot 是 <em>primary 上的 state</em>、failover 後 <em>新 primary 沒這個 slot</em>、CDC consumer 失聯、需要重建（大工程）。</p>
<p>PG 17 加 <em>failover slot synchronization</em>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">-- PG 17+：標 slot 為 failover-tracked
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1">-- signature: pg_create_logical_replication_slot(slot_name, plugin, temporary, two_phase, failover)
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_create_logical_replication_slot</span><span class="p">(</span><span class="s1">&#39;my_slot&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;pgoutput&#39;</span><span class="p">,</span><span class="w"> </span><span class="k">false</span><span class="p">,</span><span class="w"> </span><span class="k">false</span><span class="p">,</span><span class="w"> </span><span class="k">true</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w"></span><span class="c1">--                                                                          ↑
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1">--                                                                     failover=true（第 5 個參數）
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1">-- 注意：第 4 個參數是 two_phase（這裡 false）、第 5 個才是 failover
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"></span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w"></span><span class="c1">-- Standby 上 enable sync_replication_slots
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">sync_replication_slots</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">on</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_reload_conf</span><span class="p">();</span></span></span></code></pre></div><p><code>sync_replication_slots = on</code> 後、physical replication 同步 slot state 到 standby。Failover promote standby 後、logical slot 仍可用、CDC consumer 重連即可。</p>
<p>PG 17 之前用 <a href="https://www.pgedge.com/">pgEdge</a> / <em>pglogical</em> 等 extension 提供類似功能、現在 PG core 內建。</p>
<h2 id="orphan-slot-治理">Orphan Slot 治理</h2>
<p><code>active = false</code> 的 slot 持續累積 WAL、disk 爆是 PG production 經典事故。</p>
<h3 id="監控-orphan-slot">監控 orphan slot</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 找 inactive 太久的 slot
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">slot_name</span><span class="p">,</span><span class="w"> </span><span class="n">active</span><span class="p">,</span><span class="w"> </span><span class="n">restart_lsn</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">       </span><span class="n">pg_size_pretty</span><span class="p">(</span><span class="n">pg_wal_lsn_diff</span><span class="p">(</span><span class="n">pg_current_wal_lsn</span><span class="p">(),</span><span class="w"> </span><span class="n">restart_lsn</span><span class="p">))</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="n">retained_wal</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_replication_slots</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="n">active</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">  </span><span class="k">AND</span><span class="w"> </span><span class="n">pg_wal_lsn_diff</span><span class="p">(</span><span class="n">pg_current_wal_lsn</span><span class="p">(),</span><span class="w"> </span><span class="n">restart_lsn</span><span class="p">)</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">1024</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">1024</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">1024</span><span class="p">;</span><span class="w">  </span><span class="c1">-- &gt; 1 GB</span></span></span></code></pre></div><h3 id="自動-invalidate-slotpg-13">自動 invalidate slot（PG 13+）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- postgresql.conf
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">SYSTEM</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">max_slot_wal_keep_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;50GB&#39;</span><span class="p">;</span><span class="w">  </span><span class="c1">-- slot 累積 &gt; 50GB 自動 invalidate</span></span></span></code></pre></div><p>當 slot 累積 WAL 超過 <code>max_slot_wal_keep_size</code>、PG 自動 invalidate slot（<code>active=false</code> 且不再保留 WAL）。Consumer 重連會 fail、必須重建（base backup + new slot）。</p>
<p>這是 <em>trade-off</em>：</p>
<ul>
<li>設 limit → 保護 disk、但 consumer 失聯 → 大重建工作</li>
<li>不設 limit → consumer 失聯 OK、但 disk 爆</li>
</ul>
<p>實務多數設 <code>max_slot_wal_keep_size</code> 給 <em>disk capacity 50%</em>、避免徹底 disk full。</p>
<h3 id="手動-drop-orphan-slot">手動 drop orphan slot</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 確認 slot 真的不需要
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_replication_slots</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">slot_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;old_standby_slot&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="c1">-- Drop
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">pg_drop_replication_slot</span><span class="p">(</span><span class="s1">&#39;old_standby_slot&#39;</span><span class="p">);</span></span></span></code></pre></div><p>DR runbook 必須包含 <em>standby 退役流程</em>：先 standby fence、再 primary drop slot。</p>
<h2 id="5-個-production-踩雷">5 個 Production 踩雷</h2>
<h3 id="1-orphan-slot-disk-爆">1. Orphan slot disk 爆</h3>
<p>最經典 PG 事故：standby decomission 沒 drop slot、primary 持續保留 WAL、<code>pg_wal/</code> 累積到 disk full、primary 也掛。</p>
<p>修法：</p>
<ul>
<li>監控 <code>pg_replication_slots</code> + <code>pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))</code> retained_wal</li>
<li>設 <code>max_slot_wal_keep_size</code>（PG 13+）— hard limit</li>
<li>Standby 退役 runbook 強制 <em>先 fence、再 drop slot</em></li>
<li>Cron job 自動 alert orphan slot</li>
</ul>
<h3 id="2-logical-slot-lag--cdc-consumer-跟不上">2. Logical slot lag — CDC consumer 跟不上</h3>
<p>Logical decoding 比 physical replication 慢（per-transaction logical event 重組）。CDC consumer（Debezium）跟不上 → slot lag 累積。</p>
<p>修法：</p>
<ul>
<li>監控 <code>pg_replication_slots.confirmed_flush_lsn</code> 跟 primary <code>pg_current_wal_lsn()</code> 對比</li>
<li>CDC consumer 性能調整（throughput / batch size）</li>
<li>Throttle source writes（如果不能升 consumer）</li>
<li>對 hot table 拆 publication / subscription、避免單 slot 處理所有變更</li>
</ul>
<p>詳見 <a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">Logical Replication + Debezium</a>。</p>
<h3 id="3-failover-後-logical-slot-丟pg-16-之前">3. Failover 後 logical slot 丟（PG 16 之前）</h3>
<p>PG 16 之前、failover promote standby、新 primary 沒有原 logical slot。CDC consumer 試連、ERROR: <code>replication slot &quot;xxx&quot; does not exist</code>。</p>
<p>修法（PG 17+）：</p>
<ul>
<li>用 <em>failover slot synchronization</em>（如上）</li>
<li><code>pg_create_logical_replication_slot(...,  failover := true)</code></li>
<li>Standby <code>sync_replication_slots = on</code></li>
</ul>
<p>修法（PG 16-）：</p>
<ul>
<li>用 <a href="https://www.2ndquadrant.com/en/resources/pglogical/">pglogical</a> 或 <a href="https://www.pgedge.com/">pgEdge</a> extension</li>
<li>Failover runbook 包含 <em>新 primary 重建 logical slot</em>（CDC consumer 重 snapshot）</li>
<li>Pre-create slot on standby + manual sync（早期 workaround）</li>
</ul>
<h3 id="4-wal_keep_size-跟-slot-衝突">4. <code>wal_keep_size</code> 跟 slot 衝突</h3>
<p><code>wal_keep_size</code>（PG 13+）/ <code>wal_keep_segments</code>（&lt; 13）跟 slot 都會保留 WAL：</p>
<ul>
<li><code>wal_keep_size</code>：固定 minimum WAL 保留量</li>
<li>Slot：動態保留直到 consumer 推進</li>
</ul>
<p>兩者一起 set 時：實際保留 WAL = <code>max(wal_keep_size, slot 需要的量)</code>。</p>
<p>修法：</p>
<ul>
<li><code>wal_keep_size</code> 設小（如 1-2 GB）作 <em>minimum backup</em></li>
<li>主要靠 slot 動態保留 — 給 active consumer</li>
<li>監控 <code>pg_wal/</code> 大小 + 拆解 retention source（<code>wal_keep_size</code> vs slot 各佔多少）</li>
</ul>
<h3 id="5-slot-數量上限">5. Slot 數量上限</h3>
<p><code>max_replication_slots</code> 預設 10、不夠時新 slot 建不出來、報錯。</p>
<p>修法：</p>
<ul>
<li>Production 大 cluster 設 <code>max_replication_slots = 50</code> 或更多</li>
<li>對 <em>standby + logical replication + CDC consumer</em> 同時跑、計算需要的 slot 數</li>
<li>監控 <code>SELECT count(*) FROM pg_replication_slots</code> 接近 limit 時告警</li>
</ul>
<h2 id="slot-naming-convention">Slot Naming Convention</h2>
<p>Production 大 cluster 多 slot、命名 convention 重要：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">&lt;consumer-type&gt;_&lt;consumer-name&gt;_&lt;purpose&gt;
</span></span><span class="line"><span class="ln">2</span><span class="cl">例：
</span></span><span class="line"><span class="ln">3</span><span class="cl">- physical_standby1_replication
</span></span><span class="line"><span class="ln">4</span><span class="cl">- physical_standby2_replication
</span></span><span class="line"><span class="ln">5</span><span class="cl">- logical_debezium_orders_cdc
</span></span><span class="line"><span class="ln">6</span><span class="cl">- logical_pgedge_node2_subscription
</span></span><span class="line"><span class="ln">7</span><span class="cl">- physical_pgbasebackup_temp（base backup 用、completed 後 drop）</span></span></code></pre></div><p>清楚命名讓 <em>看 slot 名</em> 就知道用途、誰負責、能不能 drop。</p>
<h2 id="跟其他模組整合">跟其他模組整合</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">Replication Topology</a>：physical slot 給 streaming replication 用</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">Logical Replication + Debezium</a>：logical slot 給 CDC</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/bdr-multi-master/" data-link-title="PostgreSQL BDR / Multi-Master：active-active 寫入的 3 種路徑跟 conflict 治理" data-link-desc="PG 預設是 single-primary、active-active 多寫入入口需要 *BDR (EDB)* / *pgEdge* / *Bucardo* 等 extension。本文走 3 種 multi-master 方案對比、conflict detection &#43; resolution model、async vs sync 取捨、配置 step-by-step（pgEdge 為主）、5 production 踩雷（last-write-wins data loss / sequence collision / DDL replication / conflict log 治理 / failover 後 timeline 分歧）、跟 MySQL Group Replication sibling 對比">BDR / Multi-Master</a>：multi-master 大量用 logical slot</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/pitr-wal-archiving/" data-link-title="PostgreSQL PITR &#43; WAL archiving：從 base backup 到 point-in-time recovery 的完整鏈" data-link-desc="Base backup &#43; WAL archive 構成 PITR 的雙軌資料、archive_command &#43; restore_command 配置、用 pgBackRest / WAL-G 替代手寫腳本、5 個 production 踩雷（archive 靜默失敗 / archive lag / 錯誤 target time / base backup 過期未清 / timeline 分歧 recovery 模糊）、跟 Patroni &#43; monitoring 整合">PITR + WAL Archiving</a>：WAL archive 跟 slot 是兩種 WAL retention 機制、可並行</li>
</ul>
<h2 id="監控-metric">監控 metric</h2>
<p>Production 持續監控：</p>
<ul>
<li><code>pg_replication_slots.active</code> — 失聯 slot</li>
<li><code>pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)</code> — slot 累積 WAL</li>
<li><code>pg_replication_slots.confirmed_flush_lsn</code> vs <code>pg_current_wal_lsn()</code> — logical slot lag</li>
<li><code>pg_ls_waldir()</code> 看 <code>pg_wal/</code> 目錄大小</li>
<li><code>count(*) FROM pg_replication_slots</code> 對 <code>max_replication_slots</code> 比例</li>
</ul>
<p>把這些丟進 Datadog / Prometheus + alert。</p>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">PG Replication Topology</a>（physical slot 用途）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">PG Logical Replication + Debezium</a>（logical slot 用途）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/bdr-multi-master/" data-link-title="PostgreSQL BDR / Multi-Master：active-active 寫入的 3 種路徑跟 conflict 治理" data-link-desc="PG 預設是 single-primary、active-active 多寫入入口需要 *BDR (EDB)* / *pgEdge* / *Bucardo* 等 extension。本文走 3 種 multi-master 方案對比、conflict detection &#43; resolution model、async vs sync 取捨、配置 step-by-step（pgEdge 為主）、5 production 踩雷（last-write-wins data loss / sequence collision / DDL replication / conflict log 治理 / failover 後 timeline 分歧）、跟 MySQL Group Replication sibling 對比">PG BDR / Multi-Master</a>（multi-master 大量 slot）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/pitr-wal-archiving/" data-link-title="PostgreSQL PITR &#43; WAL archiving：從 base backup 到 point-in-time recovery 的完整鏈" data-link-desc="Base backup &#43; WAL archive 構成 PITR 的雙軌資料、archive_command &#43; restore_command 配置、用 pgBackRest / WAL-G 替代手寫腳本、5 個 production 踩雷（archive 靜默失敗 / archive lag / 錯誤 target time / base backup 過期未清 / timeline 分歧 recovery 模糊）、跟 Patroni &#43; monitoring 整合">PG PITR + WAL Archiving</a>（WAL retention 兩種機制）</li>
<li>官方：<a href="https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION-SLOTS">PG Replication Slots</a> / <a href="https://www.postgresql.org/docs/current/logicaldecoding.html">Logical Replication Slot</a></li>
</ul>
]]></content:encoded></item></channel></rss>