<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Version-Upgrade on Tarragon</title><link>https://tarrragon.github.io/blog/tags/version-upgrade/</link><description>Recent content in Version-Upgrade on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/version-upgrade/index.xml" rel="self" type="application/rss+xml"/><item><title>PostgreSQL major version upgrade (14 → 17): why this article does not use the 5 migration types</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/major-version-upgrade/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/major-version-upgrade/</guid><description>&lt;blockquote>
&lt;p>This article is an implementation-layer deep article under the &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL&lt;/a> overview. A pre-writing assessment found that the 5 types of the &lt;a href="https://tarrragon.github.io/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology&lt;/a> do &lt;em>not&lt;/em> apply. This article is the second piece of evidence for that methodology's "when not to apply it" section (a same-vendor major version upgrade).&lt;/p>&lt;/blockquote>
</description><content:encoded><![CDATA[<blockquote>
<p>This article is an implementation-layer deep article under the <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> overview. A pre-writing assessment found that the 5 types of the <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a> do <em>not</em> apply. This article is the second piece of evidence for that methodology's "when not to apply it" section (a same-vendor major version upgrade).</p></blockquote>
<h2 id="為什麼這篇不套-5-type-migration">Why this article does not use the 5 migration types</h2>
<p>Running the <a href="/blog/report/content-structure-by-max-diff-dimension/" data-link-title="Process content 結構由最大差異維度決定、不是 universal phased" data-link-desc="跨 X process content（migration / upgrade / rollout / playbook）的結構由 source / target 之間 *差異維度組合* 決定、不存在 universal phased 模板；6 種 migration / process type 實證（schema 差 / drop-in / operational / multi-tool / paradigm / topology re-layout）跑出 6 種不同結構；寫作前必須做 *6 維 diff dimension audit* 才能決定結構、跳過會套錯模板">diff dimension audit</a> on PostgreSQL 14 → 17:</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Assessment</th>
          <th>Level</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Schema / API</td>
          <td>Same PostgreSQL wire protocol; SQL syntax is 99%+ compatible</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Operational model</td>
          <td>Same PostgreSQL operational stack; tooling unchanged</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Abstraction / paradigm</td>
          <td>Same paradigm (OLTP RDBMS)</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Number of components</td>
          <td>Still a single component</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Application change</td>
          <td>Most applications need no changes</td>
          <td>Low</td>
      </tr>
  </tbody>
</table>
<p>All 5 dimensions are Low, which maps to Type B (drop-in). The <em>actual workload</em>, however, is nothing like a drop-in:</p>
<ul>
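<li><strong>Extension compatibility</strong>: an extension that runs on pg14 does not necessarily run on pg17 as-is (API changes / ABI breaks)</li>
<li><strong>Breaking changes</strong>: every major version ships release-specific behavior changes. For example, pg15 no longer grants CREATE on the <code>public</code> schema to PUBLIC in new databases, pg16 restricts what <code>CREATEROLE</code> can do, and pg17 runs expression-index and materialized-view functions with a safe <code>search_path</code> during maintenance commands.</li>
<li><strong>Storage format</strong>: the <em>data directory is not compatible across major versions</em>, so you must use <code>pg_upgrade</code> or dump/restore</li>
<li><strong>Statistics rebuild</strong>: <code>pg_upgrade</code> does not carry <code>pg_statistic</code> over, so you must run <code>ANALYZE</code> afterwards or query plans degrade</li>
<li><strong>Replication slots</strong>: logical replication slots do not survive a major version upgrade (<code>pg_upgrade</code> only migrates them when the old cluster is pg17 or later)</li>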
</ul>
<p>The 5 types map <em>cross-vendor processes</em> and miss the upgrade-specific dimensions of a <em>same-vendor upgrade</em>. This article therefore uses the <em>deep article methodology's 6 sections plus an extra upgrade audit section</em>, not any of the 5 types.</p>
<h2 id="結構-differentiatordeep-article--upgrade-audit">Structural differentiator: deep article + upgrade audit</h2>
<p>Compared with single-feature deep articles (such as <a href="/blog/backend/01-database/vendors/postgresql/pgbouncer-config/" data-link-title="PostgreSQL pgBouncer 配置 &#43; 連線池治理" data-link-desc="pgBouncer transaction pooling 配置、跟 application connection pool 的分層、production 故障演練（pool exhaustion / stale connection / DNS failover）跟容量規劃">pgBouncer config</a> / <a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni HA</a>), this article adds an <em>upgrade audit</em> section. Compared with a migration playbook, it has <em>no phased translation / parallel run / cutover routing</em>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text">Problem context (why upgrade)
→ Upgrade audit (extensions / breaking changes / dependencies)
→ Choosing an upgrade method (pg_upgrade / logical / blue-green)
→ Step-by-step execution
→ Failure drills
→ Capacity / downtime trade-off
→ Integration / next steps</code></pre></div><p>7 sections, 220-280 lines. That is one audit section more than a single-feature deep article, and none of the phased-translation sections a migration playbook has.</p>
<h2 id="問題情境major-version-不只是-minor-bump">Problem context: a major version is not a minor bump</h2>
<p>PostgreSQL ships one major version (14 / 15 / 16 / 17) per year. Each one contains <em>breaking changes</em>, so it is not a minor bump. Common upgrade drivers:</p>
<ul>
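<li><strong>EOL pressure</strong>: each PostgreSQL major version is supported for 5 years. pg14 reaches EOL in 2026-11, and pg13 already reached EOL in 2025-11, so production still running pg13 is a risk.</li>
<li><strong>New feature needs</strong>: pg15 MERGE / pg16 parallel FULL and RIGHT OUTER hash joins / pg17 incremental backup</li>
<li><strong>Cloud provider enforcement</strong>: Aurora / RDS end standard support for EOL major versions. Staying on one typically means paid extended support or an upgrade scheduled by the provider, so a planned upgrade cannot be postponed indefinitely.</li>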
</ul>
<p>The cost of not upgrading: security patches stop, new features stay out of reach, and incompatibility with new clients and extensions keeps growing.</p>
<h2 id="upgrade-audit">Upgrade audit</h2>
<p>These audits are hard gates before the upgrade. Skip any one of them and production will hit it:</p>
<h3 id="audit-1extension-相容性">Audit 1: Extension compatibility</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">extname</span><span class="p">,</span><span class="w"> </span><span class="n">extversion</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_extension</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">extname</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s1">&#39;plpgsql&#39;</span><span class="p">;</span></span></span></code></pre></div><p>For each extension, check:</p>
<ol>
<li>Is there a release for the target version (pg17)?</li>
<li>Is there an ABI break? (e.g. PostGIS binaries are built per PostgreSQL major version)</li>
<li>Is the extension still actively maintained, and does its current release support pg17?</li>
</ol>
<p>Extensions that commonly have to be upgraded <em>first</em> for pg14 → pg17: PostGIS / TimescaleDB / pgaudit / pg_partman / pg_repack.</p>
<h3 id="audit-2breaking-change-pull">Audit 2: Pull the breaking changes</h3>
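<p>The first question in the checklist can be partly automated. The bash function below is a hypothetical helper, not part of any PostgreSQL tooling. It reads the <code>extname|extversion</code> rows from the query above and checks whether a matching <code>.control</code> file exists in the new version's extension directory. Under Debian/Ubuntu packaging that directory is typically <code>/usr/share/postgresql/17/extension</code>; elsewhere, use the new installation's <code>pg_config --sharedir</code> plus <code>/extension</code>. A control file being present only shows that a pg17 build is installed. It does not answer the ABI or version-path questions.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper: compare extensions installed in the old cluster with
# the extension control files installed for the new major version.
# stdin: "extname|extversion" rows, e.g. from
#   psql -At -F '|' -c "SELECT extname, extversion FROM pg_extension WHERE extname != 'plpgsql'"
# $1: the new version's extension directory (assumed Debian/Ubuntu default:
#     /usr/share/postgresql/17/extension)
# A present .control file only means a pg17 package is installed; it does not
# prove that the old extversion is supported on pg17.
check_extensions() {
  local extdir=$1 name ver ctl newver status=0
  while IFS='|' read -r name ver; do
    [ -n "$name" ] || continue
    ctl="$extdir/$name.control"
    if [ -f "$ctl" ]; then
      # pull the quoted value out of a line like: default_version = '3.5.0'
      newver=$(sed -n "s/^default_version *= *'\([^']*\)'.*/\1/p" "$ctl")
      echo "ok      $name $ver (target default_version: ${newver:-?})"
    else
      echo "MISSING $name $ver"
      status=1
    fi
  done
  return "$status"
}</code></pre></div>
<p>Usage sketch: pipe the query output into <code>check_extensions /usr/share/postgresql/17/extension</code>. A non-zero exit status means at least one extension has no pg17 package installed yet.</p>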





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Collect the breaking changes from each release note's "Migration to Version N" section (pg14 → pg17 spans 3 majors)
# pg15: new databases no longer grant CREATE on the public schema to PUBLIC; pg_start_backup()/pg_stop_backup() renamed to pg_backup_start()/pg_backup_stop()
# pg16: CREATEROLE privileges restricted; vacuum_defer_cleanup_age and promote_trigger_file removed
# pg17: checkpoint counters moved from pg_stat_bgwriter to pg_stat_checkpointer; old_snapshot_threshold removed</code></pre></div><p>For each breaking change:</p>
<ol>
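<li>Use SQL grep / static analysis to find the affected parts of the application code</li>
<li>Estimate the fix effort (typically 50-95% of the hits are false alarms; the rest are real impact)</li>
<li>List what cannot be fixed right away, and plan to <em>upgrade one major at a time</em> instead of <em>jumping 3 majors at once</em></li>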
</ol>
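<p>Step 1 can start as a plain text search. This is a minimal sketch that assumes GNU grep. <code>REMOVED_IDENTIFIERS</code> is a hypothetical list drawn from the items above. It is illustrative, not exhaustive, so build the real list from the "Migration to Version N" sections of the 15, 16 and 17 release notes. Whole-word, fixed-string matching keeps noise down, but the hits still need the manual triage from step 2.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical list of identifiers removed or renamed in PG 15-17 (not exhaustive).
REMOVED_IDENTIFIERS='pg_start_backup
pg_stop_backup
plpython2u
plpythonu
checkpoints_timed
checkpoints_req
buffers_backend
old_snapshot_threshold
vacuum_defer_cleanup_age
promote_trigger_file
stats_temp_directory
db_user_namespace
adminpack'

# Usage: scan_breaking_changes SRC_DIR
# Prints file:line:text for every hit; exit 0 = no hits, 1 = hits, 2 = bad dir.
scan_breaking_changes() {
  local hits
  [ -d "$1" ] || { echo "not a directory: $1"; return 2; }
  # -r recursive, -n line numbers, -w whole words, -F fixed strings,
  # -f - reads the pattern list from stdin (GNU grep)
  hits=$(printf '%s\n' "$REMOVED_IDENTIFIERS" | grep -rnwF -f - "$1")
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
  echo "no hits"
}</code></pre></div>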
<h3 id="audit-3replication--logical-slot">Audit 3: Replication / logical slots</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">slot_name</span><span class="p">,</span><span class="w"> </span><span class="n">plugin</span><span class="p">,</span><span class="w"> </span><span class="n">slot_type</span><span class="p">,</span><span class="w"> </span><span class="n">active</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">pg_replication_slots</span><span class="p">;</span></span></span></code></pre></div><p>After a major version upgrade:</p>
<ul>
<li><strong>Physical replication slot</strong>: a standby must run the <em>same major version</em> as the new primary before it can follow it</li>
<li><strong>Logical replication slot</strong>: <strong>not carried across the major version</strong> when upgrading from pg14, because slot migration requires the old cluster to be pg17 or later. Drop the slot before the upgrade and recreate it afterwards (consumers redo their initial load).</li>
<li>Consumers such as <a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">Debezium CDC</a> must re-initialize</li>
</ul>
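<p>The slot query above can be turned into a reviewable drop plan. This sketch assumes the output is exported with <code>psql -At -F '|'</code>, which prints booleans as <code>t</code>/<code>f</code>. The function name is hypothetical, and it only prints SQL; it executes nothing. Active slots are flagged rather than dropped, because <code>pg_drop_replication_slot()</code> refuses to drop a slot that is in use.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper: print a pre-upgrade drop plan for logical slots.
# stdin: "slot_name|slot_type|active" rows, e.g. from
#   psql -At -F '|' -c "SELECT slot_name, slot_type, active FROM pg_replication_slots"
plan_slot_drops() {
  local name type active
  while IFS='|' read -r name type active; do
    [ "$type" = logical ] || continue          # physical slots are handled with the standbys
    if [ "$active" = t ]; then
      echo "-- $name is active: stop its consumer first"
    else
      echo "SELECT pg_drop_replication_slot('$name');"
    fi
  done
}</code></pre></div>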
<h3 id="audit-4config-參數變更">Audit 4: Config parameter changes</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># diff postgresql.conf defaults between 14 and 17
# Focus: shared_preload_libraries / autovacuum_* / wal_level / synchronous_commit</code></pre></div><p>Defaults and the set of available parameters change between major versions. For pg14 → 17, <code>vacuum_defer_cleanup_age</code> was removed in 16 and <code>old_snapshot_threshold</code> in 17, and a pg17 server will not start with a <code>postgresql.conf</code> that still sets them. Review every custom setting one by one.</p>
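<p>One way to run this diff is to export <code>pg_settings</code> from both clusters and compare the two exports. This minimal sketch assumes the pg17 cluster has been initialized with the configuration you plan to use. The helper name and the <code>name|setting</code> file format are illustrative. <code>REMOVED</code> lines are parameters that no longer exist in pg17 and must be cleared out of the old config. <code>CHANGED</code> lines show where the effective values differ.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper: diff two "name|setting" exports of pg_settings, e.g. from
#   psql -At -F '|' -c "SELECT name, setting FROM pg_settings ORDER BY name"
# run once against pg14 and once against the not-yet-live pg17 cluster.
# Usage: diff_settings OLD_EXPORT NEW_EXPORT
diff_settings() {
  awk -F'|' '
    NR == FNR { old[$1] = $2; next }            # first file: old cluster
    { seen[$1] = 1                              # second file: new cluster
      if (!($1 in old))          print "NEW      " $1 "=" $2
      else if (old[$1] != $2)    print "CHANGED  " $1 ": " old[$1] " to " $2
    }
    END { for (n in old) if (!(n in seen)) print "REMOVED  " n }
  ' "$1" "$2" | sort
}</code></pre></div>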
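<h3 id="audit-5statistics-重建計畫">Audit 5: Statistics rebuild plan</h3>
<p><code>pg_upgrade</code> does not transfer optimizer statistics, so <code>pg_statistic</code> starts out empty. The first query plans run without stats and production performance collapses. The upgrade plan must include:</p>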
<ul>
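<li><code>ANALYZE</code> over the whole database (small DB ~10 minutes, large DB 1-3 hours)</li>
<li>A multi-stage <code>vacuumdb --analyze-in-stages</code> run: quick baseline stats first, then full stats</li>
<li>Time reserved inside the maintenance window for the statistics rebuild</li>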
</ul>
<h2 id="升級方法選擇">Choosing an upgrade method</h2>
<p>There are three mainstream methods; choose by downtime tolerance and database size:</p>
<table>
  <thead>
      <tr>
          <th>Method</th>
          <th>Downtime</th>
          <th>Risk</th>
          <th>Best for</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>pg_upgrade --link</code></td>
          <td>10-30 minutes</td>
          <td>Data dir and OS packages on the same host; rollback is complicated</td>
          <td>&lt; 500GB, 30 minutes of downtime acceptable</td>
      </tr>
      <tr>
          <td>Logical replication</td>
          <td>Only at switchover (&lt; 1 minute)</td>
          <td>Complex setup; long-running migration window</td>
          <td>TB scale, low-downtime requirement</td>
      </tr>
      <tr>
          <td>Blue-green deployment</td>
          <td>Only at switchover</td>
          <td>Double hardware; strict traffic shifting during cutover</td>
          <td>Cloud-managed (built into Aurora / RDS)</td>
      </tr>
  </tbody>
</table>
<h3 id="pg_upgrade---link-流程"><code>pg_upgrade --link</code> 流程</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Assumes Debian/Ubuntu packaging: binaries in /usr/lib/postgresql/N/bin,
# data in /var/lib/postgresql/N/main, config in /etc/postgresql/N/main.
# 1. Install the pg17 packages; an initialized but stopped 17/main cluster must exist
sudo systemctl stop postgresql@17-main

# 2. Stop pg14
sudo systemctl stop postgresql@14-main

# 3. Run pg_upgrade (hard links, no data copy) from a directory postgres can write to
cd /tmp
UPGRADE_ARGS=(
  --old-bindir=/usr/lib/postgresql/14/bin
  --new-bindir=/usr/lib/postgresql/17/bin
  --old-datadir=/var/lib/postgresql/14/main
  --new-datadir=/var/lib/postgresql/17/main
  --old-options='-c config_file=/etc/postgresql/14/main/postgresql.conf'
  --new-options='-c config_file=/etc/postgresql/17/main/postgresql.conf'
  --link
  --jobs=8
)
# --check validates both clusters without changing anything; fix whatever it reports first
sudo -u postgres /usr/lib/postgresql/17/bin/pg_upgrade "${UPGRADE_ARGS[@]}" --check
sudo -u postgres /usr/lib/postgresql/17/bin/pg_upgrade "${UPGRADE_ARGS[@]}"

# 4. Start pg17 (on Debian, check that 17/main now listens on the port clients use)
sudo systemctl start postgresql@17-main

# 5. Rebuild optimizer statistics; since PG14, pg_upgrade recommends this
#    vacuumdb command instead of generating analyze_new_cluster.sh
sudo -u postgres /usr/lib/postgresql/17/bin/vacuumdb --all --analyze-in-stages --jobs=8</code></pre></div><p><code>--link</code> uses hard links instead of copying the data directory, so its run time depends little on data volume, which suits large databases. The drawback is that once the new pg17 cluster has been started, the old pg14 cluster shares its data files and can no longer be used safely. <em>Rolling back to pg14</em> then means restoring from backup, so a complete backup plus a tested restore is mandatory.</p>
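<p>The pg_upgrade documentation describes one narrow fallback for link mode. If the new cluster has <em>never</em> been started, pg_upgrade has only renamed the old <code>global/pg_control</code> to <code>pg_control.old</code>, and removing that suffix makes the old cluster startable again. Below is a sketch of that step as a hypothetical bash helper. It cannot detect whether pg17 was ever started, which is exactly the condition that makes the revert unsafe, so that check stays with the operator.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper for the documented --link fallback.
# ONLY valid if the new pg17 cluster has never been started; if it has run,
# restore pg14 from backup instead. Run as postgres with both clusters stopped.
# Usage: revert_link_upgrade OLD_DATADIR
revert_link_upgrade() {
  local ctl="$1/global/pg_control"
  if [ -f "$ctl" ]; then
    echo "pg_control present: linking never started, old cluster is untouched"
    return 0
  fi
  if [ -f "$ctl.old" ]; then
    mv "$ctl.old" "$ctl"                    # undo pg_upgrade's rename
    echo "restored $ctl; the old cluster can be started again"
    return 0
  fi
  echo "neither pg_control nor pg_control.old under $1: restore from backup"
  return 1
}</code></pre></div>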
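<h2 id="故障演練">Failure drills</h2>
<h3 id="case-1extension-相容性沒先-auditupgrade-後啟動失敗">Case 1: Extension compatibility not audited, pg_upgrade fails</h3>
<p><strong>Symptom</strong>: pg_upgrade stops at "Checking for presence of required libraries" and reports that the installation references loadable libraries that are missing from the new installation. The <code>loadable_libraries.txt</code> it writes contains an entry like <code>could not load library &quot;$libdir/timescaledb-2.13.0&quot;</code>.</p>
<p><strong>Root cause</strong>: the installed TimescaleDB (2.13) has no pg17 build, because pg17 support starts with TimescaleDB 2.17. No matching TimescaleDB version was installed for pg17, so the library that the old cluster's functions reference does not exist in the new installation.</p>
<p><strong>Fix</strong>:</p>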
<ol>
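<li><strong>Pre-upgrade audit</strong>: map every extension to a version that supports the target. Upgrade the extension first on pg14 with <code>ALTER EXTENSION ... UPDATE</code>, then install the same version's pg17 package.</li>
<li><strong>Rollback</strong>: if pg_upgrade aborts in its check phase, the pg14 cluster has not been touched and can simply be restarted. Once the new cluster has been started after a <code>--link</code> upgrade, the only way back is restoring from backup and retrying.</li>
<li><strong>Prevention</strong>: do a full dry run in staging (including <code>pg_upgrade --check</code>) so that every known path has been verified before the production upgrade</li>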
</ol>
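<h3 id="case-2application-用-deprecated-sql跑壞">Case 2: Application uses deprecated SQL and breaks</h3>
<p><strong>Symptom</strong>: after the upgrade, some application or monitoring queries fail outright, for example with <code>ERROR: column &quot;checkpoints_timed&quot; does not exist</code> from a query against <code>pg_stat_bgwriter</code>.</p>
<p><strong>Root cause</strong>: functions, columns and parameters were removed or renamed between pg15 and pg17, and the application still uses the old names. For example, pg17 moved the checkpoint counters from <code>pg_stat_bgwriter</code> to <code>pg_stat_checkpointer</code>, and pg15 renamed <code>pg_start_backup()</code>/<code>pg_stop_backup()</code> to <code>pg_backup_start()</code>/<code>pg_backup_stop()</code>.</p>
<p><strong>Fix</strong>:</p>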
<ol>
<li><strong>Pre-upgrade</strong>: run the application test suite against a pg17 staging instance to catch incompatible queries</li>
<li><strong>Urgent</strong>: fix the queries found in staging in the application code and deploy, and only then upgrade the database</li>
<li><strong>Long term</strong>: use an ORM / query builder in application code, and avoid raw SQL that depends on version-specific PostgreSQL behavior</li>
</ol>
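<h3 id="case-3analyze-沒跑production-query-性能崩">Case 3: <code>ANALYZE</code> not run, production query performance collapses</h3>
<p><strong>Symptom</strong>: 5 minutes after the upgrade, application p99 latency jumps from 50ms to 5000ms, and query plans degrade from index scans to seq scans.</p>
<p><strong>Root cause</strong>: <code>pg_upgrade</code> does not carry <code>pg_statistic</code> over. Without statistics the planner cannot estimate selectivity, falls back to default estimates, and picks plans such as seq scans that do not fit the real data.</p>
<p><strong>Fix</strong>:</p>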





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Run right after the upgrade; the three stages run in order
vacuumdb --all --analyze-in-stages --jobs=4
# Stage 1: minimal stats, statistics target 1 (fast, ~5 minutes)
# Stage 2: medium stats, statistics target 10 (~30 minutes)
# Stage 3: full stats, default_statistics_target (1-3 hours)</code></pre></div><p><code>--analyze-in-stages</code> runs in 3 stages. After stage 1 the planner can already make roughly correct decisions, so it is acceptable to close the maintenance window while stage 3 is still running.</p>
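<h3 id="case-4logical-replication-slot-漏-dropdebezium-卡死">Case 4: Logical replication slot not dropped, Debezium stalls</h3>
<p><strong>Symptom</strong>: once the upgraded server is up, the Debezium connector log reports <code>slot not found</code> and consumption stops; downstream Kafka messages stop flowing.</p>
<p><strong>Root cause</strong>: <code>pg_upgrade</code> only carries logical replication slots over when the old cluster is pg17 or later, so no logical slot survives an upgrade from pg14. Nothing was prepared before the upgrade, so the slot does not exist on the new cluster.</p>
<p><strong>Fix</strong>:</p>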
<ol>
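<li><strong>Pre-upgrade</strong>: list every logical replication slot. Stop writes, let Debezium drain the slot, pause the connector, then drop the slot.</li>
<li><strong>Rebuild after the upgrade</strong>: create the slot on the new cluster; it starts at the new cluster's current LSN. Restart Debezium with a <code>snapshot.mode</code> that skips the data snapshot instead of <code>initial</code>, to avoid a full reload. That mode is <code>no_data</code> in recent Debezium releases and <code>never</code> in older ones. This is only safe if the slot was fully drained before the upgrade.</li>
<li><strong>Architecture</strong>: consider the <em>outbox pattern</em> so that CDC only tracks the outbox table, which lowers the cost of rebuilding logical slots</li>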
</ol>
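<p>"Drained" in the pre-upgrade step can be made concrete. After writes have stopped, compare the slot's <code>confirmed_flush_lsn</code> with <code>pg_current_wal_lsn()</code>; in SQL, <code>pg_wal_lsn_diff()</code> does this directly. The bash sketch below does the same arithmetic for use in an upgrade script, assuming LSNs in the usual <code>HI/LO</code> hex text form. The function names are hypothetical. The tolerance parameter is also an assumption: WAL can advance slightly without producing changes a consumer would confirm, so a small non-zero allowance may be needed.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Convert an LSN such as "16/B374D848" (hex HI/LO) to a byte position.
lsn_to_bytes() {
  local hi=${1%/*} lo=${1#*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))   # HI * 2^32 + LO
}

# Hypothetical gate: succeed if the slot is within MAX_LAG_BYTES (default 0)
# of the current WAL position.
# Usage: slot_drained CURRENT_LSN CONFIRMED_FLUSH_LSN [MAX_LAG_BYTES]
slot_drained() {
  local lag=$(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
  local max=${3:-0}
  echo "lag_bytes=$lag"
  [ "$lag" -le "$max" ]
}</code></pre></div>
<p>The two LSNs could come from, for example, <code>SELECT pg_current_wal_lsn(), confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name = 'debezium'</code>, where the slot name is hypothetical.</p>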
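<h3 id="case-5standby-沒同步升replication-斷">Case 5: Standby not upgraded alongside the primary, replication breaks</h3>
<p><strong>Symptom</strong>: after the primary moves to pg17 the standby is still on pg14 and replication does not work; <code>pg_stat_replication</code> shows no standby connection.</p>
<p><strong>Root cause</strong>: streaming replication does not work across major versions. A standby must either be upgraded <em>together with the primary</em> or be <em>rebuilt from a new base backup after the upgrade</em>.</p>
<p><strong>Fix</strong>:</p>
<p>Two strategies:</p>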
<ol>
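<li><strong>Upgrade the standby in place</strong>: the standby does not run <code>pg_upgrade</code> itself. After <code>pg_upgrade --link</code> on the primary, and before the primary is started, stop the standby and use the <code>rsync --hard-links</code> procedure from the pg_upgrade documentation to bring its data directory to pg17. Then restore its standby configuration (<code>standby.signal</code>, <code>primary_conninfo</code>) and start it.</li>
<li><strong>Rebuild the standby</strong>: once the primary is upgraded, rebuild the standby with <code>pg_basebackup</code> (suits small standbys and fast networks)</li>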
</ol>
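<p>In a Patroni HA setup there is no true rolling upgrade, because physical streaming replication cannot span major versions. One option is to upgrade the leader with <code>pg_upgrade</code> and rsync or reinitialize the replicas, which means a short downtime. The other is to build a new pg17 cluster fed by logical replication and switch over to it. Both are complex and need staging rehearsals.</p>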
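<p>The rsync step in strategy 1 comes from the pg_upgrade documentation. The sketch below only prints the commands for review; the function and the standby host name are hypothetical. It assumes the Debian-style layout used earlier (<code>/var/lib/postgresql/&lt;major&gt;/main</code>) and no tablespaces or <code>pg_wal</code> outside the data directory, since those need their own rsync. Run the printed commands only when both primary and standby are stopped, after <code>pg_upgrade --link</code> on the primary and before the primary is started.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper: print the documented standby rsync step for review.
# Run on the primary, from the directory that contains both version dirs;
# the remote path is the directory above the standby's version dirs.
# Usage: standby_rsync_cmd STANDBY_HOST [OLD_MAJOR] [NEW_MAJOR] [BASE_DIR]
standby_rsync_cmd() {
  local standby_host=$1 old=${2:-14} new=${3:-17} base=${4:-/var/lib/postgresql}
  echo "cd $base"
  echo "rsync --archive --delete --hard-links --size-only --no-inc-recursive $old $new $standby_host:$base"
}</code></pre></div>
<p>Printing instead of executing is deliberate: the step is only valid under the preconditions above, so a human should confirm them before running it.</p>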
<h2 id="capacity--downtime-trade-off">Capacity / downtime trade-off</h2>
<table>
  <thead>
      <tr>
          <th>Method</th>
          <th>Downtime estimate (500GB DB)</th>
          <th>Hardware cost</th>
          <th>Risk</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>pg_upgrade --link</code></td>
          <td>15-30 minutes (incl. ANALYZE stage 1)</td>
          <td>Same as today</td>
          <td>High (irreversible once pg17 has started)</td>
      </tr>
      <tr>
          <td><code>pg_upgrade</code> (copy mode, default)</td>
          <td>1-3 hours</td>
          <td>Temporary 2x storage</td>
          <td>Medium</td>
      </tr>
      <tr>
          <td>Logical replication</td>
          <td>&lt; 1 minute cutover</td>
          <td>Temporary 2x compute + storage</td>
          <td>Medium (complex)</td>
      </tr>
      <tr>
          <td>Blue-green</td>
          <td>Only at switchover (&lt; 30 seconds)</td>
          <td>2x throughout (can be torn down after cutover)</td>
          <td>Low (cloud managed)</td>
      </tr>
  </tbody>
</table>
<p>Practical defaults:</p>
<ul>
<li>&lt; 100GB and 30 minutes of downtime acceptable: <code>pg_upgrade --link</code></li>
<li>100GB - 1TB and &lt; 5 minutes of downtime required: logical replication (vanilla PostgreSQL)</li>
<li>1TB+ or a strict SLA: blue-green via Aurora / RDS (cloud managed)</li>
</ul>
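<p>Writing these defaults down as a small decision function makes the gaps in the rule of thumb visible; for example, 500GB with a 1-hour downtime budget matches no rule. This is a sketch with hypothetical names. It takes 1TB as 1024GB and treats "strict SLA" as a yes/no input you define yourself.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical encoding of the practical defaults above.
# Usage: choose_upgrade_method SIZE_GB MAX_DOWNTIME_MIN [STRICT_SLA: yes|no]
choose_upgrade_method() {
  local size_gb=$1 downtime_min=$2 strict_sla=${3:-no}
  if [ "$size_gb" -ge 1024 ] || [ "$strict_sla" = yes ]; then
    echo "blue-green (Aurora / RDS)"
  elif [ "$size_gb" -lt 100 ]; then
    if [ "$downtime_min" -ge 30 ]; then
      echo "pg_upgrade --link"
    else
      echo "no default: see trade-off table"
    fi
  elif [ "$downtime_min" -lt 5 ]; then
    echo "logical replication"
  else
    echo "no default: see trade-off table"
  fi
}</code></pre></div>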
<h2 id="整合--下一步">Integration / next steps</h2>
<h3 id="跟-patroni-ha-整合">Integrating with <a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni HA</a></h3>
<p>HA cluster upgrade flow:</p>
<ol>
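<li>Build a new pg17 cluster outside the current one and feed it with logical replication (physical replication cannot cross major versions)</li>
<li>Switch over to the new cluster and move traffic off the old one</li>
<li>Rebuild the remaining standbys</li>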
</ol>
<p>PostgreSQL 17 adds <a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">failover-capable logical slots</a> that are synchronized to standbys. In addition, <code>pg_upgrade</code> preserves logical slots when the old cluster is pg17 or later. Once you are on pg17, later major version upgrades and failovers therefore disrupt logical consumers less.</p>
<h3 id="跟-monitoring-整合">Integrating with monitoring</h3>
<p>Metrics to watch closely during the upgrade:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Pre-upgrade baseline
SELECT pg_database_size('myapp'), version();

-- Post-upgrade verification
SELECT pg_database_size('myapp'), version();
-- Run in every database: tables that have no statistics yet
SELECT count(*) FROM pg_stat_user_tables
WHERE last_analyze IS NULL AND last_autoanalyze IS NULL;
-- Should be 0; anything else means ANALYZE has not finished</code></pre></div><p>Three Prometheus alert thresholds: after the upgrade, <code>pg_database_size</code> should differ by &lt; 1%, <code>pg_stat_replication</code> lag should stay &lt; 10s, and <code>pg_query_p99_latency</code> should stay &lt; 1.5x the baseline. Metric names depend on your exporter; <code>pg_query_p99_latency</code> is not a built-in PostgreSQL metric.</p>
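<p>The three thresholds can be checked with integer arithmetic in a post-upgrade gate script. This sketch assumes you already collect the inputs: database size in bytes from <code>pg_database_size()</code>, replication lag in seconds (for example from <code>pg_stat_replication.replay_lag</code>), and p99 latency in ms from your own metrics. The function name is hypothetical. To avoid floating point, the 1.5x ratio is checked as 2 × current against 3 × baseline.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical post-upgrade gate for the three thresholds above.
# Usage: check_post_upgrade OLD_SIZE_BYTES NEW_SIZE_BYTES LAG_S BASE_P99_MS CUR_P99_MS
check_post_upgrade() {
  local old=$1 new=$2 lag=$3 base=$4 cur=$5 fail=0 diff
  diff=$(( new - old )); diff=${diff#-}      # absolute size difference
  if [ $(( diff * 100 )) -ge "$old" ]; then
    echo "FAIL size drift: $diff bytes (1% or more)"; fail=1
  fi
  if [ "$lag" -ge 10 ]; then
    echo "FAIL replication lag ${lag}s"; fail=1
  fi
  # cur/base must stay under 1.5, i.e. 2*cur must stay under 3*base
  if [ $(( cur * 2 )) -ge $(( base * 3 )) ]; then
    echo "FAIL p99 ${cur}ms vs baseline ${base}ms"; fail=1
  fi
  [ "$fail" -eq 0 ] || return 1
  echo "OK"
}</code></pre></div>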
<h3 id="下一步議題">Next topics</h3>
<ul>
<li><strong>Aurora major version upgrade</strong>: blue-green deployment is the default, and the flow is completely different from self-managed PostgreSQL; see the corresponding section of <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">PostgreSQL → Aurora migration</a></li>
<li><strong>Cross-major skip upgrade</strong>: pg13 → pg17 spans 4 majors and the breaking changes accumulate, so upgrading <em>one major at a time</em> is recommended over a <em>single hop</em></li>
<li><strong>Extension lifecycle management</strong>: automatically audit extension / PostgreSQL version compatibility, and run a dry run every quarter</li>
</ul>
<h2 id="相關連結">Related links</h2>
<ul>
<li>Upstream vendor page: <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a></li>
<li>Parallel deep articles: <a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni HA</a> / <a href="/blog/backend/01-database/vendors/postgresql/pitr-wal-archiving/" data-link-title="PostgreSQL PITR &#43; WAL archiving：從 base backup 到 point-in-time recovery 的完整鏈" data-link-desc="Base backup &#43; WAL archive 構成 PITR 的雙軌資料、archive_command &#43; restore_command 配置、用 pgBackRest / WAL-G 替代手寫腳本、5 個 production 踩雷（archive 靜默失敗 / archive lag / 錯誤 target time / base backup 過期未清 / timeline 分歧 recovery 模糊）、跟 Patroni &#43; monitoring 整合">PITR + WAL Archiving</a> / <a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">Logical Replication + Debezium</a></li>
<li>Related migration: <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">PostgreSQL → Aurora</a></li>
<li>Methodology: <a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">Writing methodology for vendor deep technical articles</a> / <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a> (this article validates a <em>missing category</em>)</li>
</ul>
]]></content:encoded></item></channel></rss>