<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Schema-Migration on Tarragon</title><link>https://tarrragon.github.io/blog/tags/schema-migration/</link><description>Recent content in Schema-Migration on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Wed, 27 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/schema-migration/index.xml" rel="self" type="application/rss+xml"/><item><title>MySQL Online Schema Change：gh-ost 跟 pt-online-schema-change 兩條完全不同的 ghost table 路徑</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/online-schema-change-tools/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/online-schema-change-tools/</guid><description>&lt;blockquote>
&lt;p>This article is an implementation-layer deep dive under the &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL&lt;/a> overview. The overview covers where MySQL sits in the OLTP landscape; this article focuses on &lt;em>online schema change&lt;/em> and compares the mechanisms behind two tools, gh-ost and pt-online-schema-change.&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Mechanism&lt;/th>
 &lt;th>pt-online-schema-change (Percona)&lt;/th>
 &lt;th>gh-ost (GitHub)&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Sync mechanism&lt;/td>
 &lt;td>&lt;strong>MySQL triggers&lt;/strong> (INSERT/UPDATE/DELETE on the original table fire writes to the ghost table)&lt;/td>
 &lt;td>&lt;strong>Binlog stream&lt;/strong> (reads the binlog as a replication client, from a replica by default, and applies the changes to the ghost table)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Write overhead on the primary&lt;/td>
 &lt;td>Trigger cost inside every writing transaction&lt;/td>
 &lt;td>No trigger cost (the binlog already exists); ghost-table writes are applied asynchronously&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Replica lag impact&lt;/td>
 &lt;td>Trigger-driven writes replicate like any other write and add lag; &lt;code>--max-lag&lt;/code> pauses only the row copy&lt;/td>
 &lt;td>Reads the binlog from a replica and throttles actively on lag&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Foreign keys&lt;/td>
 &lt;td>Partial support (drop/recreate strategies)&lt;/td>
 &lt;td>Not supported (FKs must be dropped first)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Rollback (mid-run)&lt;/td>
 &lt;td>Harder (triggers already exist and must be cleaned up)&lt;/td>
 &lt;td>Easier (drop the ghost table)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Pause / resume&lt;/td>
 &lt;td>Limited (&lt;code>--pause-file&lt;/code>; no interactive control)&lt;/td>
 &lt;td>Supported (gh-ost interactive commands)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Lock held at cut-over&lt;/td>
 &lt;td>Metadata lock during the rename (milliseconds once acquired)&lt;/td>
 &lt;td>Metadata lock during the rename (milliseconds once acquired)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Tool binary&lt;/td>
 &lt;td>Perl script (Percona Toolkit)&lt;/td>
 &lt;td>Go binary (single executable)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Year released&lt;/td>
 &lt;td>2011&lt;/td>
 &lt;td>2016&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
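&lt;p>Both tools end in the same state (the ghost table replaces the original), but &lt;em>their impact on production along the way differs sharply&lt;/em>. The choice depends on whether trigger overhead is acceptable, whether foreign keys are involved, how much throttle and pause control you need, and which toolchain the team already knows.&lt;/p>
&lt;h2 id="為什麼-alter-table-需要-online-path">Why ALTER TABLE needs an online path&lt;/h2>
&lt;p>Before MySQL 8.0, most &lt;code>ALTER TABLE&lt;/code> operations &lt;em>rebuilt the whole table&lt;/em>. With the older copy algorithm, &lt;em>writes to the table block for the duration&lt;/em>; the in-place online DDL added in 5.6 permits concurrent DML for many operations but still rebuilds the table and is replayed serially on replicas. An ALTER on a 100 GB table can run for hours, and with the copy algorithm production writes to that table stall the whole time.&lt;/p>
&lt;p>MySQL 8.0 added &lt;em>Instant DDL&lt;/em> (some ALTERs skip the rebuild, change only metadata, and finish in milliseconds), but &lt;em>only a subset of ALTERs qualifies&lt;/em>:&lt;/p>
&lt;ul>
&lt;li>Supported: ADD COLUMN (last position only before 8.0.29), DROP COLUMN (in some cases, 8.0.29+), RENAME COLUMN&lt;/li>
&lt;li>Not supported: ADD INDEX, changing a column's type, ADD/DROP PRIMARY KEY, ADD FOREIGN KEY&lt;/li>
&lt;/ul>
&lt;p>Cases that Instant DDL does not cover typically still go through a ghost table. Percona and GitHub each started from their own production pain and produced pt-osc (2011) and gh-ost (2016).&lt;/p>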
&lt;h2 id="pt-online-schema-change用-trigger-同步寫入">pt-online-schema-change: syncing writes with triggers&lt;/h2>
&lt;p>pt-osc flow:&lt;/p>
&lt;ol>
&lt;li>CREATE the ghost table (same schema as the original plus the ALTER you want)&lt;/li>
&lt;li>&lt;em>Create 3 triggers&lt;/em> on the original table: INSERT / UPDATE / DELETE&lt;/li>
&lt;li>Every transaction that writes to the original table &lt;em>also fires a trigger&lt;/em> that writes the matching change to the ghost table&lt;/li>
&lt;li>Copy the existing rows to the ghost table in the background, chunk by chunk&lt;/li>
&lt;li>Once the copy is complete, &lt;code>RENAME TABLE&lt;/code>: original → archive, ghost → original name (atomic; the metadata lock lasts milliseconds)&lt;/li>
&lt;li>Drop the triggers, then drop the archive table&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Trade-off&lt;/strong>：&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is an implementation-layer deep dive under the <a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL</a> overview. The overview covers where MySQL sits in the OLTP landscape; this article focuses on <em>online schema change</em> and compares the mechanisms behind two tools, gh-ost and pt-online-schema-change.</p></blockquote>
<hr>
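<table>
  <thead>
      <tr>
          <th>Mechanism</th>
          <th>pt-online-schema-change (Percona)</th>
          <th>gh-ost (GitHub)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Sync mechanism</td>
          <td><strong>MySQL triggers</strong> (INSERT/UPDATE/DELETE on the original table fire writes to the ghost table)</td>
          <td><strong>Binlog stream</strong> (reads the binlog as a replication client, from a replica by default, and applies the changes to the ghost table)</td>
      </tr>
      <tr>
          <td>Write overhead on the primary</td>
          <td>Trigger cost inside every writing transaction</td>
          <td>No trigger cost (the binlog already exists); ghost-table writes are applied asynchronously</td>
      </tr>
      <tr>
          <td>Replica lag impact</td>
          <td>Trigger-driven writes replicate like any other write and add lag; <code>--max-lag</code> pauses only the row copy</td>
          <td>Reads the binlog from a replica and throttles actively on lag</td>
      </tr>
      <tr>
          <td>Foreign keys</td>
          <td>Partial support (drop/recreate strategies)</td>
          <td>Not supported (FKs must be dropped first)</td>
      </tr>
      <tr>
          <td>Rollback (mid-run)</td>
          <td>Harder (triggers already exist and must be cleaned up)</td>
          <td>Easier (drop the ghost table)</td>
      </tr>
      <tr>
          <td>Pause / resume</td>
          <td>Limited (<code>--pause-file</code>; no interactive control)</td>
          <td>Supported (gh-ost interactive commands)</td>
      </tr>
      <tr>
          <td>Lock held at cut-over</td>
          <td>Metadata lock during the rename (milliseconds once acquired)</td>
          <td>Metadata lock during the rename (milliseconds once acquired)</td>
      </tr>
      <tr>
          <td>Tool binary</td>
          <td>Perl script (Percona Toolkit)</td>
          <td>Go binary (single executable)</td>
      </tr>
      <tr>
          <td>Year released</td>
          <td>2011</td>
          <td>2016</td>
      </tr>
  </tbody>
</table>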
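<p>Both tools end in the same state (the ghost table replaces the original), but <em>their impact on production along the way differs sharply</em>. The choice depends on whether trigger overhead is acceptable, whether foreign keys are involved, how much throttle and pause control you need, and which toolchain the team already knows.</p>
<h2 id="為什麼-alter-table-需要-online-path">Why ALTER TABLE needs an online path</h2>
<p>Before MySQL 8.0, most <code>ALTER TABLE</code> operations <em>rebuilt the whole table</em>. With the older copy algorithm, <em>writes to the table block for the duration</em>; the in-place online DDL added in 5.6 permits concurrent DML for many operations but still rebuilds the table and is replayed serially on replicas. An ALTER on a 100 GB table can run for hours, and with the copy algorithm production writes to that table stall the whole time.</p>
<p>MySQL 8.0 added <em>Instant DDL</em> (some ALTERs skip the rebuild, change only metadata, and finish in milliseconds), but <em>only a subset of ALTERs qualifies</em>:</p>
<ul>
<li>Supported: ADD COLUMN (last position only before 8.0.29), DROP COLUMN (in some cases, 8.0.29+), RENAME COLUMN</li>
<li>Not supported: ADD INDEX, changing a column's type, ADD/DROP PRIMARY KEY, ADD FOREIGN KEY</li>
</ul>
<p>Cases that Instant DDL does not cover typically still go through a ghost table. Percona and GitHub each started from their own production pain and produced pt-osc (2011) and gh-ost (2016).</p>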
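<p>One way to find out whether a given ALTER qualifies is to request the algorithm explicitly: with <code>ALGORITHM=INSTANT</code>, MySQL 8.0 returns an error instead of silently falling back to a rebuild. The sketch below is a minimal illustration, not a tool from either project; the helper name <code>instant_probe_sql</code> is hypothetical, and it only prints the statement so it can be reviewed before being piped to the <code>mysql</code> client.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># instant_probe_sql is a hypothetical helper; it builds the statement and runs nothing.
# Usage: instant_probe_sql TABLE "ALTER CLAUSE"
instant_probe_sql() {
  local table="$1" clause="$2"
  # ALGORITHM=INSTANT makes MySQL 8.0 fail fast when the change would need a rebuild.
  printf 'ALTER TABLE %s %s, ALGORITHM=INSTANT;\n' "$table" "$clause"
}

instant_probe_sql orders "ADD COLUMN status VARCHAR(20) DEFAULT NULL"
instant_probe_sql orders "ADD INDEX idx_status (status)"</code></pre></div>
<p>Run the output against a staging copy rather than production: when the change does qualify, the statement is not a dry run and actually applies the ALTER.</p>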
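<h2 id="pt-online-schema-change用-trigger-同步寫入">pt-online-schema-change: syncing writes with triggers</h2>
<p>pt-osc flow:</p>
<ol>
<li>CREATE the ghost table (same schema as the original plus the ALTER you want)</li>
<li><em>Create 3 triggers</em> on the original table: INSERT / UPDATE / DELETE</li>
<li>Every transaction that writes to the original table <em>also fires a trigger</em> that writes the matching change to the ghost table</li>
<li>Copy the existing rows to the ghost table in the background, chunk by chunk</li>
<li>Once the copy is complete, <code>RENAME TABLE</code>: original → archive, ghost → original name (atomic; the metadata lock lasts milliseconds)</li>
<li>Drop the triggers, then drop the archive table</li>
</ol>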
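<p>Steps 2 and 3 can be pictured as three triggers that mirror each write into the ghost table. The sketch below is a simplification under stated assumptions: a single-column primary key <code>id</code>, one extra column <code>status</code>, and hypothetical trigger and table names (<code>osc_ins</code>, <code>_orders_new</code>). The triggers pt-osc really generates cover every column, handle primary-key changes, and differ in detail.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Prints simplified trigger DDL for review; nothing is executed against a database.
# Assumes columns (id, status) and ignores updates that change the primary key.
print_sync_triggers() {
  local src="$1" ghost="$2"
  # INSERT: REPLACE keeps the ghost row current even if the copier wrote it first.
  printf 'CREATE TRIGGER osc_ins AFTER INSERT ON %s FOR EACH ROW REPLACE INTO %s (id, status) VALUES (NEW.id, NEW.status);\n' "$src" "$ghost"
  # UPDATE: the new row image overwrites the ghost row.
  printf 'CREATE TRIGGER osc_upd AFTER UPDATE ON %s FOR EACH ROW REPLACE INTO %s (id, status) VALUES (NEW.id, NEW.status);\n' "$src" "$ghost"
  # DELETE: remove the row from the ghost table if it is already there.
  printf 'CREATE TRIGGER osc_del AFTER DELETE ON %s FOR EACH ROW DELETE IGNORE FROM %s WHERE id = OLD.id;\n' "$src" "$ghost"
}

print_sync_triggers orders _orders_new</code></pre></div>
<p>Because each trigger body runs inside the application's own transaction, its cost lands directly on write latency, which is the trade-off this section goes on to discuss.</p>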
<p><strong>Trade-off</strong>：</p>
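<ul>
<li><em>Write overhead</em>: every write transaction on the primary also runs a trigger; write throughput typically drops by roughly 10-30%, depending on workload</li>
<li><em>Replica lag</em>: the trigger's write shares the original transaction, so replicas must apply the extra ghost-table changes as well (with statement-based replication the triggers themselves fire again on each replica). Lag can spike, and <code>--max-lag</code> pauses only the row copy, not the trigger writes</li>
<li><em>Harder rollback</em>: if the tool dies midway, the triggers are already in place and must be dropped by hand before a retry</li>
<li><em>FK handling</em>: when other tables have FKs pointing at the original table, those FKs must be dropped and recreated against the new table, which complicates the operation</li>
</ul>
<p><strong>Good fit</strong>:</p>
<ul>
<li>Write throughput &lt; 50% of capacity (enough headroom to absorb the trigger overhead)</li>
<li>No FKs, or only simple ones</li>
<li>No reads that are sensitive to replica lag (replicas apply the trigger-driven writes too)</li>
</ul>
<p><strong>Poor fit</strong>:</p>
<ul>
<li>High write throughput (&gt; 80% of capacity): the trigger overhead saturates the server</li>
<li>Heavily FK-linked schemas</li>
<li>A need for interactive throttle / pause / resume</li>
</ul>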
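<h2 id="gh-ost用-binlog-stream-同步寫入">gh-ost: syncing writes from the binlog stream</h2>
<p>gh-ost flow:</p>
<ol>
<li>CREATE the ghost table</li>
<li><em>Read the binlog from a replica</em> (no triggers are added on the primary)</li>
<li>Replay <em>writes made on the primary</em> onto the ghost table from the binlog events</li>
<li>Copy the existing rows to the ghost table in the background, chunk by chunk</li>
<li>Once the copy is complete, swap with <code>RENAME TABLE</code></li>
<li>Drop the archive table (gh-ost leaves it in place unless told otherwise)</li>
</ol>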
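<p>The chunk-by-chunk copy in step 4 walks the key in bounded ranges so that no single statement holds locks for long. A minimal sketch of that iteration, assuming a dense integer key and a hypothetical helper named <code>chunk_ranges</code>; the real tools derive range boundaries from the table's unique key and pause between chunks when throttled.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># chunk_ranges MIN_ID MAX_ID CHUNK_SIZE prints one inclusive id range per line.
# Hypothetical helper for illustration; assumes a dense integer key.
chunk_ranges() {
  local lo="$1" max="$2" size="$3" hi
  while [ "$lo" -le "$max" ]; do
    hi=$((lo + size - 1))
    # Clamp the last chunk to the table's current maximum id.
    if [ "$hi" -gt "$max" ]; then hi="$max"; fi
    printf '%s-%s\n' "$lo" "$hi"
    lo=$((hi + 1))
  done
}

chunk_ranges 1 2500 1000</code></pre></div>
<p>Each printed range corresponds to one bounded copy statement; a smaller chunk size means shorter locks per statement but more round trips.</p>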
<p><strong>Trade-off</strong>：</p>
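<ul>
<li><em>Write overhead</em>: no trigger cost on the write path (the binlog is written anyway; gh-ost is only a consumer). The ghost-table writes it applies still add some load, but outside the application's transactions</li>
<li><em>Replica lag impact</em>: gh-ost monitors replica lag and throttles the copy automatically once a threshold is exceeded (writes to the primary are unaffected)</li>
<li><em>Easy rollback</em>: on cancel, simply drop the ghost table; the original table was never modified</li>
<li><em>No FK support</em>: gh-ost does not handle FKs by design; tables with FKs must drop or restructure them first</li>
</ul>
<p><strong>Good fit</strong>:</p>
<ul>
<li>High-write-throughput production (trigger overhead is unacceptable)</li>
<li>A need for throttle / pause / resume (gh-ost interactive commands can adjust the chunk size and the cut-over timing on the fly)</li>
<li>Teams already following a GitHub-flavored MySQL operations workflow</li>
</ul>
<p><strong>Poor fit</strong>:</p>
<ul>
<li>Complex FK structures that you do not want to touch</li>
<li>Replicas that cannot produce a binlog (rare; gh-ost needs binary logging with <code>log_slave_updates</code> on the replica it reads)</li>
</ul>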
<h2 id="配置-step-by-stepgh-ost">Configuration step by step (gh-ost)</h2>
<p>In practice many production teams use gh-ost (GitHub, Slack, Booking.com, and others). pt-osc is used where FKs must stay or on legacy systems.</p>
<h3 id="gh-ost-一個-alter-命令">A single ALTER with gh-ost</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># A comment cannot follow a line-continuation backslash, so the flag notes live here:
#   --host                           replica to connect to; gh-ost reads its binlog
#   --allow-on-master=false          do not connect to the primary to read the binlog
#   --chunk-size                     rows copied per batch
#   --max-load                       throttle while the load exceeds this
#   --critical-load                  abort once the load exceeds this
#   --max-lag-millis                 replica lag limit before throttling
#   --throttle-additional-flag-file  touch this file to throttle
#   --postpone-cut-over-flag-file    cut-over is postponed while this file exists
#   --execute                        really run; without it gh-ost only does a dry run
gh-ost \
  --host=replica.example.com \
  --user=ghost \
  --password=... \
  --database=production \
  --table=orders \
  --alter='ADD COLUMN status VARCHAR(20) DEFAULT NULL, ADD INDEX idx_status (status)' \
  --allow-on-master=false \
  --chunk-size=1000 \
  --max-load='Threads_running=50' \
  --critical-load='Threads_running=200' \
  --max-lag-millis=1500 \
  --throttle-additional-flag-file=/tmp/throttle \
  --postpone-cut-over-flag-file=/tmp/postpone \
  --execute</code></pre></div><h3 id="interactive-commandgh-ost-跑起來後">Interactive commands (while gh-ost is running)</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Talk to gh-ost over its Unix socket (the path below follows the default naming for this schema and table)
echo "status" | nc -U /tmp/gh-ost.production.orders.sock
# Adjust the chunk size on the fly
echo "chunk-size=500" | nc -U /tmp/gh-ost.production.orders.sock
# Trigger the cut-over now instead of waiting on the postpone flag file
echo "unpostpone" | nc -U /tmp/gh-ost.production.orders.sock
# Abort immediately; the ghost table may be left behind and need manual cleanup
echo "panic" | nc -U /tmp/gh-ost.production.orders.sock</code></pre></div><h2 id="配置-step-by-steppt-osc">Configuration step by step (pt-osc)</h2>
<p>Compared with gh-ost's binlog reader, the pt-osc command is shorter but demands just as much configuration care:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># A comment cannot follow a line-continuation backslash, so the flag notes live here:
#   --check-replication-filters  abort if replication filters are set (they could drop the trigger writes)
#   --alter-foreign-keys-method  auto / rebuild_constraints / drop_swap / none
pt-online-schema-change \
  --host=primary.example.com \
  --user=ghost \
  --password=... \
  --alter='ADD COLUMN status VARCHAR(20) DEFAULT NULL, ADD INDEX idx_status (status)' \
  D=production,t=orders \
  --chunk-size=1000 \
  --max-load='Threads_running=50' \
  --critical-load='Threads_running=200' \
  --max-lag=1.5 \
  --check-replication-filters \
  --alter-foreign-keys-method=auto \
  --execute</code></pre></div><p><code>--alter-foreign-keys-method</code> selects how pt-osc handles FKs, and the four choices affect production very differently: <code>rebuild_constraints</code> re-creates the child FKs against the new table, <code>drop_swap</code> is faster but gives up the atomic swap, <code>none</code> leaves the FKs untouched, and <code>auto</code> lets the tool pick between the first two.</p>
<h2 id="5-個-production-踩雷">Five production pitfalls</h2>
<h3 id="1-pt-osc-trigger-overhead-不可預期">1. pt-osc trigger overhead is hard to predict</h3>
<p><code>--max-load='Threads_running=50'</code> looks like it protects the server, but the triggers run inside the transaction, so <em>every write</em> in production pays the trigger cost. <code>Threads_running</code> is a <em>point-in-time</em> number and does not show the latency the triggers accumulate. A common scenario: pt-osc is started at peak, 30% overhead was expected, the real figure is 60%, and p99 latency jumps 5x.</p>
<p>Fixes:</p>
<ul>
<li>Do not run pt-osc at peak; schedule an off-peak window</li>
<li>Replay production-like load in a <em>staging environment</em> to estimate the trigger overhead</li>
<li>Switch to gh-ost for servers whose write throughput is &gt; 50% of capacity</li>
</ul>
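<h3 id="2-gh-ost-binlog-lag-跟-primary-寫入率追不上">2. gh-ost binlog consumption cannot keep up with the primary's write rate</h3>
<p>gh-ost reads the binlog from a replica, and there is a ceiling on how fast it can consume and apply events. If <em>the primary's write rate exceeds that rate</em> (a few thousand transactions per second is already the ceiling on some servers), gh-ost never catches up and the cut-over stalls for a long time.</p>
<p>Fixes:</p>
<ul>
<li>gh-ost reads the <em>replica's binlog</em> by default; with <code>--allow-on-master</code> it reads directly from the primary (if the primary has headroom)</li>
<li>Raise <code>--chunk-size</code> to speed up the copy (with <code>--max-load</code> guarding against overload)</li>
<li>If it still cannot catch up, consider <em>temporarily shedding part of the write traffic</em> (throttle the traffic rather than the tool)</li>
</ul>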
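<h3 id="3-foreign-key-constraint--兩工具都尷尬">3. Foreign key constraints: awkward for both tools</h3>
<p>When other tables have FKs that reference the table being altered, <em>nothing references the new ghost table</em>. <code>RENAME TABLE</code> makes those FKs follow the renamed original, so right after cut-over they point at the archive table instead of the live one, and the constraint no longer protects the live data.</p>
<p>Fixes (pt-osc):</p>
<ul>
<li><code>--alter-foreign-keys-method=rebuild_constraints</code>: drops and re-adds the child tables' FKs with <code>ALTER TABLE</code> so they reference the new table; each child table pays for its own ALTER, which can be slow when the child is large</li>
<li>Or <code>drop_swap</code>: disables FK checks, drops the original table, then renames the new table into place (faster, but the swap is not atomic and the table briefly does not exist)</li>
</ul>
<p>Fixes (gh-ost):</p>
<ul>
<li>gh-ost does not handle this; drop the FKs by hand and set them up again afterwards</li>
<li>Or restructure the schema so the relationship is enforced at the application layer instead of by an FK</li>
</ul>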
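<p>Before choosing either path it helps to list which tables reference the one being altered. The sketch below only prints a query against the standard <code>information_schema.KEY_COLUMN_USAGE</code> view; the helper name <code>child_fk_sql</code> is hypothetical, and the schema and table names are the ones used in this article's examples.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># child_fk_sql SCHEMA TABLE prints a query listing FKs that reference SCHEMA.TABLE.
# Hypothetical helper; pipe the output to the mysql client to run it.
child_fk_sql() {
  local schema="$1" table="$2"
  printf "SELECT TABLE_NAME, CONSTRAINT_NAME FROM information_schema.KEY_COLUMN_USAGE WHERE REFERENCED_TABLE_SCHEMA = '%s' AND REFERENCED_TABLE_NAME = '%s';\n" "$schema" "$table"
}

child_fk_sql production orders</code></pre></div>
<p>An empty result means no other table references this one by FK, so the cut-over concern described above does not arise; FKs declared on the altered table itself are a separate check.</p>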
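<h3 id="4-pt-osc-trigger-跟-application-既有-trigger-衝突">4. pt-osc triggers clash with existing application triggers</h3>
<p>If the original table already has application triggers, pt-osc adds <em>three more</em> on top. MySQL 5.7.2+ allows several triggers on the same event and fires them in creation order unless <code>FOLLOWS</code> / <code>PRECEDES</code> says otherwise; older versions do not allow it at all, and pt-osc refuses to run on a table with triggers unless <code>--preserve-triggers</code> is used. Either way the interaction between old and new triggers is easy to get wrong, and application behavior can break subtly.</p>
<p>Fixes:</p>
<ul>
<li>Audit the table's triggers before running pt-osc (<code>SHOW TRIGGERS FROM production LIKE 'orders'</code>)</li>
<li>If application triggers exist, consider <em>dropping them for the duration and recreating them afterwards</em> (MySQL has no way to disable a trigger), or switch to gh-ost</li>
<li>gh-ost adds no triggers of its own to the original table, but check its requirements first: older releases refuse to migrate a table that has triggers</li>
</ul>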
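<h3 id="5-cut-over-瞬間-deadlock--兩工具都有但表現不同">5. Lock contention at cut-over: both tools hit it, in different ways</h3>
<p>Cut-over runs <code>RENAME TABLE original TO archive, ghost TO original</code> (an atomic operation). It needs a <em>metadata lock</em>, and it waits behind any <em>long-running transaction</em> that touches the table. While that transaction stays open the cut-over keeps waiting, new queries queue behind the pending rename, and eventually the lock wait times out and the cut-over fails.</p>
<p>Fixes (gh-ost):</p>
<ul>
<li><code>--cut-over-lock-timeout-seconds=3</code>: abort on timeout and retry later</li>
<li><code>--postpone-cut-over-flag-file</code>: finish the copy first, then trigger the cut-over during a quiet period</li>
</ul>
<p>Fixes (pt-osc):</p>
<ul>
<li><code>--set-vars=&quot;lock_wait_timeout=60&quot;</code>: lets the cut-over wait longer (risk: the longer a long transaction holds out, the more lock waits pile up on the server)</li>
<li>Or schedule it for a window in which long transactions are known not to run (for example after the nightly backup)</li>
</ul>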
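<p>Either tool benefits from a quick look at open transactions just before cut-over. A minimal sketch that prints a query against InnoDB's <code>information_schema.INNODB_TRX</code> view, oldest transaction first; the helper name <code>oldest_trx_sql</code> and the default limit of 5 are assumptions made for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># oldest_trx_sql [LIMIT] prints a query listing the oldest open InnoDB transactions.
# Hypothetical helper; pipe the output to the mysql client to run it.
oldest_trx_sql() {
  local limit="${1:-5}"
  printf 'SELECT trx_mysql_thread_id, trx_started, trx_state FROM information_schema.INNODB_TRX ORDER BY trx_started LIMIT %s;\n' "$limit"
}

oldest_trx_sql 3</code></pre></div>
<p>If the oldest entry has been open for minutes, the rename is likely to queue behind it; waiting, or ending that session after checking with its owner, is safer than letting the cut-over time out under load.</p>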
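<h2 id="容量--時間估算">Capacity and time estimates</h2>
<p>Taking a 100 GB table and an ALTER that adds a column plus an index as the example (rough figures that depend heavily on hardware and load):</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>pt-osc</th>
          <th>gh-ost</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Estimated total time</td>
          <td>6-12 hours (depends on chunk size and load)</td>
          <td>5-10 hours (same factors; adjustable on the fly)</td>
      </tr>
      <tr>
          <td>Write throughput impact</td>
          <td>-10% to -30% (trigger overhead)</td>
          <td>&lt; 5% (the binlog already exists)</td>
      </tr>
      <tr>
          <td>Replica lag</td>
          <td>1-10 seconds (replicas apply the trigger-driven writes)</td>
          <td>Auto-throttled to stay within the threshold</td>
      </tr>
      <tr>
          <td>Extra disk needed</td>
          <td>Roughly the original table size plus indexes (for the ghost table)</td>
          <td>Same</td>
      </tr>
      <tr>
          <td>Rollback cost</td>
          <td>Medium (clean up triggers)</td>
          <td>Low (drop the ghost table)</td>
      </tr>
  </tbody>
</table>
<p>Total time is similar for both tools; <em>the difference in production impact is large</em>.</p>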
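<p>A back-of-the-envelope version of the time estimate is chunks times time per chunk. The sketch below assumes a hypothetical helper <code>estimate_copy</code> and made-up inputs (500 million rows, 1000-row chunks, 50 ms per chunk); measure the per-chunk time on staging, and expect throttling pauses to push the real figure higher.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># estimate_copy ROWS CHUNK_SIZE MS_PER_CHUNK prints an "Xh Ym" copy-time estimate.
# All inputs are assumptions; throttle pauses and binlog apply time are not modeled.
estimate_copy() {
  local rows="$1" chunk="$2" ms="$3" chunks total_s
  chunks=$(( (rows + chunk - 1) / chunk ))   # round up to whole chunks
  total_s=$(( chunks * ms / 1000 ))
  printf '%dh %dm\n' $(( total_s / 3600 )) $(( total_s % 3600 / 60 ))
}

# Hypothetical table: 500 million rows, 1000-row chunks, 50 ms per chunk.
estimate_copy 500000000 1000 50</code></pre></div>
<p>With these inputs the helper prints <code>6h 56m</code>, which falls inside the range in the table; halving the per-chunk time or doubling the chunk size roughly halves the estimate.</p>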
<h2 id="跟其他模組整合">Integration with other modules</h2>
<h3 id="跟-gtid--replication-topology">GTID / replication topology</h3>
<p>The two tools relate to replication differently: pt-osc's trigger-driven writes reach replicas through ordinary replication, while gh-ost itself consumes the binlog, from a replica by default. Prerequisites:</p>
<ul>
<li>Binlog <code>ROW</code> format (required by gh-ost on the server it reads from; pt-osc works with any format)</li>
<li>GTID enabled (not strictly required, but it makes re-pointing gh-ost across the topology easier)</li>
<li>See <a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">Replication Topology</a></li>
</ul>
<h3 id="跟-vitess">Vitess</h3>
<p>Vitess has its own <em>VReplication-based online DDL</em>, so gh-ost and pt-osc are not needed. Within a shard it uses a binlog-stream mechanism similar to gh-ost, with Vitess-aware schema management on top. See the <em>Vitess sharding design</em> article (not yet written).</p>
<h3 id="跟-aurora-mysql">Aurora MySQL</h3>
<p>Aurora MySQL still supports gh-ost / pt-osc, but <em>Aurora's own fast DDL</em> (for some ALTERs) may cover cases beyond 8.0 Instant DDL, depending on the engine version. Check the Aurora documentation first; when native fast DDL applies, no ghost table tool is needed. See the <a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora vendor page</a>.</p>
<h3 id="跟-planetscale">PlanetScale</h3>
<p>PlanetScale (managed Vitess) uses <em>branch-based schema migration</em>: create a schema branch, apply the schema change there, and merge it atomically at deploy time. The schema change itself is carried out by PlanetScale's built-in process. See the <a href="/blog/backend/01-database/vendors/mysql/migrate-to-planetscale/" data-link-title="MySQL → PlanetScale：managed Vitess &#43; branch-based schema workflow 的 hybrid shift" data-link-desc="自管 MySQL → PlanetScale 加上 Vitess sharding 跟 branch-based schema workflow。本文走 6 維 audit（Paradigm &#43; Operational &#43; Schema 多軸）、4-phase migration、5 production 踩雷、何時不要遷。">PlanetScale migration playbook</a>.</p>
<h2 id="production-casegh-ost-operation-workflow">Production case: gh-ost operation workflow</h2>
<p>In production, the job of an online schema change is to break a large-table DDL into a data-movement process that can be paused, throttled, and switched over. gh-ost, GitHub's open-source tool, turns a schema change into a ghost-table copy, binlog tailing, and a controlled cut-over; this lets the operator adjust the pace when replica lag, application load, or the deployment window changes.</p>
<p>The case comes down to three operating criteria. First, throttle signals should be tied to production SLOs, such as replica lag, running threads, application latency, or error rate, rather than to copied rows per second alone. Second, pause / resume is a change-governance capability: it lets a schema change yield to incident response, deploy freezes, and business peak windows. Third, the cut-over needs a rollback window and a named owner, because the moment of the table rename remains a high-risk control point.</p>
<p>The sibling route to the gh-ost workflow is <a href="/blog/backend/01-database/vendors/postgresql/online-schema-change/" data-link-title="PostgreSQL Online Schema Change：先用 ALTER 內建特性、不能解才 pg_repack / pg-osc" data-link-desc="PostgreSQL ALTER TABLE 對多數變更已是 *fast catalog-only*（add column nullable / drop column / 改 default），不必走 ghost table tool。本文走 PG 內建 fast DDL 行為、何時必須走 pg_repack / pg-osc、兩工具機制對比（trigger-based vs WAL-shipping）、配置 step-by-step、5 production 踩雷（lock 升級 / VACUUM FULL 誤用 / pg_repack version mismatch / concurrent index 失敗清理 / generated stored column 不能 online）、跟 MySQL gh-ost / pt-osc sibling 對比">PostgreSQL Online Schema Change</a>. PostgreSQL usually meets the same need with fast ALTER, MVCC, and extension tools; in MySQL the ghost table tool is more often the standard path, mainly because large-table DDL, metadata locks, and replication events combine into a different kind of pressure.</p>
<h2 id="何時用哪一個">When to use which</h2>
<table>
  <thead>
      <tr>
          <th>Situation</th>
          <th>Choice</th>
          <th>Reason</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Standard production, writes &lt; 50% of capacity</td>
          <td>gh-ost (default)</td>
          <td>No trigger overhead on writes, finer control</td>
      </tr>
      <tr>
          <td>High write throughput (&gt; 80% of capacity)</td>
          <td>gh-ost (required)</td>
          <td>pt-osc trigger overhead saturates the server</td>
      </tr>
      <tr>
          <td>FK constraints must be kept</td>
          <td>pt-osc</td>
          <td>gh-ost does not handle FKs</td>
      </tr>
      <tr>
          <td>Application-side triggers on the original table</td>
          <td>gh-ost</td>
          <td>pt-osc triggers interact unpredictably with existing ones (confirm your gh-ost release accepts tables with triggers)</td>
      </tr>
      <tr>
          <td>Interactive pause / resume needed</td>
          <td>gh-ost</td>
          <td>pt-osc offers only a file-based pause</td>
      </tr>
      <tr>
          <td>Already on the full Percona Toolkit (pt-table-checksum / pt-archiver)</td>
          <td>pt-osc</td>
          <td>Consistent toolchain</td>
      </tr>
      <tr>
          <td>Already on Vitess</td>
          <td>Vitess online DDL</td>
          <td>Stay within the Vitess schema workflow</td>
      </tr>
      <tr>
          <td>Already on PlanetScale</td>
          <td>branch-based</td>
          <td>Stay within the PlanetScale schema workflow</td>
      </tr>
      <tr>
          <td>Already on Aurora MySQL and native fast DDL covers the change</td>
          <td>No ghost table</td>
          <td>Run the ALTER directly</td>
      </tr>
  </tbody>
</table>
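<p>The table can be encoded as a small decision helper for runbooks. This is a sketch under assumptions, not a rule from either project: the function name <code>choose_osc_tool</code>, its argument order, and the "conflict" outcome for tables that need both FK preservation and a gh-ost-only capability are all hypothetical, and the Percona Toolkit and Aurora rows are left out for brevity.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># choose_osc_tool PLATFORM HAS_FK HAS_APP_TRIGGERS NEEDS_PAUSE
# PLATFORM is one of: mysql, vitess, planetscale; the other arguments are yes/no.
choose_osc_tool() {
  local platform="$1" has_fk="$2" has_triggers="$3" needs_pause="$4"
  # Managed platforms bring their own schema workflow.
  case "$platform" in
    vitess)      echo "Vitess online DDL"; return ;;
    planetscale) echo "branch-based workflow"; return ;;
  esac
  if [ "$has_fk" = yes ]; then
    # FK preservation plus a gh-ost-only requirement has no clean answer in the table.
    if [ "$has_triggers" = yes ] || [ "$needs_pause" = yes ]; then
      echo "conflict: review manually"
    else
      echo "pt-osc"
    fi
  else
    echo "gh-ost"
  fi
}

choose_osc_tool mysql yes no no
choose_osc_tool mysql no yes yes</code></pre></div>
<p>The two calls print <code>pt-osc</code> and <code>gh-ost</code> respectively. Surfacing the conflict case explicitly is a design choice: a table with both FKs and application triggers usually needs a schema decision before any tool is picked.</p>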
<h2 id="相關連結">Related links</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">MySQL Replication Topology</a> (binlog ROW format and GTID as prerequisites)</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/online-schema-change/" data-link-title="PostgreSQL Online Schema Change：先用 ALTER 內建特性、不能解才 pg_repack / pg-osc" data-link-desc="PostgreSQL ALTER TABLE 對多數變更已是 *fast catalog-only*（add column nullable / drop column / 改 default），不必走 ghost table tool。本文走 PG 內建 fast DDL 行為、何時必須走 pg_repack / pg-osc、兩工具機制對比（trigger-based vs WAL-shipping）、配置 step-by-step、5 production 踩雷（lock 升級 / VACUUM FULL 誤用 / pg_repack version mismatch / concurrent index 失敗清理 / generated stored column 不能 online）、跟 MySQL gh-ost / pt-osc sibling 對比">PostgreSQL Online Schema Change</a> (the PG sibling; why PG needs ghost tables less often than MySQL: fast ALTER covers most changes)</li>
<li><a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora vendor page</a> (managed MySQL fast DDL)</li>
<li><a href="https://planetscale.com/">PlanetScale</a> (branch-based, no ghost table)</li>
<li><a href="/blog/backend/01-database/database-migration-playbook/" data-link-title="1.6 資料庫轉換實作：雙寫、回填、切流與回滾" data-link-desc="同 DB 內 schema 演進與資料變更的可分段驗證流程、跟 1.12 cross-DB migration 分工">1.6 Database Migration Playbook</a> (schema migration governance)</li>
<li><a href="/blog/backend/knowledge-cards/expand-contract/" data-link-title="Expand / Contract" data-link-desc="說明先擴充相容面、再收斂舊路徑的遷移做法">Expand / Contract card</a> (schema migration design principle)</li>
<li>Official docs: <a href="https://github.com/github/gh-ost">gh-ost</a> / <a href="https://docs.percona.com/percona-toolkit/pt-online-schema-change.html">pt-online-schema-change</a></li>
</ul>
]]></content:encoded></item><item><title>PostgreSQL Online Schema Change：先用 ALTER 內建特性、不能解才 pg_repack / pg-osc</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/online-schema-change/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/online-schema-change/</guid><description>&lt;blockquote>
&lt;p>This article is an implementation-layer deep dive under the &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL&lt;/a> overview. The overview covers where PG sits in the OLTP landscape; this article focuses on &lt;em>online schema change&lt;/em>: first which PG ALTERs are already fast and catalog-only, then when pg_repack / pg-osc become necessary.&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;p>Unlike MySQL, PG has fast catalog-only behavior &lt;em>built in&lt;/em> for many schema changes, so no ghost table tool is needed. The counterparts of MySQL's gh-ost / pt-online-schema-change are, for PG, &lt;em>an escape hatch for a minority of cases&lt;/em>, not standard practice.&lt;/p>
&lt;p>When planning an OSC, &lt;em>look at PG's own ALTER behavior first&lt;/em> and bring in pg_repack / pg-osc only once the need is confirmed; otherwise you only add complexity.&lt;/p>
&lt;h2 id="pg-alter-table-的-fast--slow-分類">Fast vs. slow PG ALTER TABLE operations&lt;/h2>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="cl">&lt;span class="c1">-- ALTER TABLE operations fall roughly into three classes&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="類-afast-catalog-only-1-秒metadata-改">Class A: fast catalog-only (&amp;lt; 1 second, metadata change)&lt;/h3>
&lt;p>On current PG versions (some items below need 11+), many ALTERs are catalog-only:&lt;/p>
&lt;ul>
&lt;li>&lt;code>ADD COLUMN col TYPE NULL DEFAULT NULL&lt;/code>: pure metadata, no rewrite&lt;/li>
&lt;li>&lt;code>ADD COLUMN col TYPE NOT NULL DEFAULT &amp;lt;constant&amp;gt;&lt;/code> (PG 11+): the default is stored in the catalog and returned on the fly when old rows are read, no rewrite&lt;/li>
&lt;li>&lt;code>DROP COLUMN&lt;/code>: the column is marked dropped in metadata and rows are not rewritten (the space is reclaimed gradually as rows are later rewritten)&lt;/li>
&lt;li>&lt;code>ALTER COLUMN ... SET DEFAULT &amp;lt;constant&amp;gt;&lt;/code>: metadata&lt;/li>
&lt;li>&lt;code>RENAME COLUMN&lt;/code> / &lt;code>RENAME TABLE&lt;/code>: metadata&lt;/li>
&lt;li>&lt;code>ADD CONSTRAINT ... NOT VALID&lt;/code>: records the constraint without validating existing rows; the scan happens later in &lt;code>VALIDATE CONSTRAINT&lt;/code>&lt;/li>
&lt;li>&lt;code>ALTER COLUMN ... TYPE&lt;/code> between binary-compatible types (&lt;code>VARCHAR(10) → VARCHAR(20)&lt;/code>, &lt;code>TEXT → VARCHAR&lt;/code> without a length limit, and so on): catalog-only&lt;/li>
&lt;/ul>
&lt;p>ALTERs in this class can be &lt;em>run directly, with no tooling&lt;/em>.&lt;/p>
&lt;h3 id="類-block-heavyrewrites-tableproduction-慎用">Class B: lock heavy (rewrites the table; use with care in production)&lt;/h3>
&lt;p>These &lt;em>rewrite or scan the whole table&lt;/em> while holding an ACCESS EXCLUSIVE lock for the entire ALTER:&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is an implementation-layer deep dive under the <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> overview. The overview covers where PG sits in the OLTP landscape; this article focuses on <em>online schema change</em>: first which PG ALTERs are already fast and catalog-only, then when pg_repack / pg-osc become necessary.</p></blockquote>
<hr>
<p>Unlike MySQL, PG has fast catalog-only behavior <em>built in</em> for many schema changes, so no ghost table tool is needed. The counterparts of MySQL's gh-ost / pt-online-schema-change are, for PG, <em>an escape hatch for a minority of cases</em>, not standard practice.</p>
<p>When planning an OSC, <em>look at PG's own ALTER behavior first</em> and bring in pg_repack / pg-osc only once the need is confirmed; otherwise you only add complexity.</p>
<h2 id="pg-alter-table-的-fast--slow-分類">Fast vs. slow PG ALTER TABLE operations</h2>





<p>ALTER TABLE operations fall into roughly three classes.</p><h3 id="類-afast-catalog-only-1-秒metadata-改">Class A: Fast catalog-only (&lt; 1 second, metadata change)</h3>
<p>Most ALTERs are already catalog-only, some on every supported release and some only from PG 11:</p>
<ul>
<li><code>ADD COLUMN col TYPE NULL DEFAULT NULL</code>: metadata only, no rewrite</li>
<li><code>ADD COLUMN col TYPE NOT NULL DEFAULT &lt;constant&gt;</code> (PG 11+): the default is stored in the catalog (<code>pg_attribute.attmissingval</code>) and filled in when old rows are read, so no rewrite</li>
<li><code>DROP COLUMN</code>: the column is marked dropped in the catalog and rows are not rewritten; the space is reclaimed gradually as rows are updated or when the table is next rewritten</li>
<li><code>ALTER COLUMN ... SET DEFAULT &lt;constant&gt;</code>: metadata only, applies to future inserts</li>
<li><code>RENAME COLUMN</code> / <code>RENAME TO</code>: metadata only</li>
<li><code>ADD CONSTRAINT ... NOT VALID</code>: the constraint is recorded without validating existing rows; a later <code>VALIDATE CONSTRAINT</code> does the scan under a lighter lock (SHARE UPDATE EXCLUSIVE)</li>
<li><code>ALTER COLUMN ... TYPE</code> between binary-compatible types (<code>VARCHAR(10) → VARCHAR(20)</code>, <code>VARCHAR → TEXT</code>, etc.): catalog-only</li>
</ul>
<p>Run these ALTERs <em>directly; no tool is needed</em>. Each still takes a brief ACCESS EXCLUSIVE lock, so it can queue behind a long-running transaction on the table.</p>
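<p>A minimal sketch of class A on a hypothetical <code>orders</code> table (the table, column and constraint names are assumptions for illustration). On PG 11+ the last query should show the stored default in <code>attmissingval</code>, which is how old rows get a value without being rewritten:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Nullable column without a default: catalog-only
ALTER TABLE orders ADD COLUMN note text;

-- Constant default: catalog-only on PG 11+, full rewrite on PG 10 and older
ALTER TABLE orders ADD COLUMN status varchar(20) NOT NULL DEFAULT 'pending';

-- Constraint recorded without a scan, then validated under a lighter lock
ALTER TABLE orders ADD CONSTRAINT orders_qty_positive CHECK (qty &gt; 0) NOT VALID;
ALTER TABLE orders VALIDATE CONSTRAINT orders_qty_positive;

-- Inspect the stored "missing value" for the new column
SELECT attname, atthasmissing, attmissingval
FROM pg_attribute
WHERE attrelid = 'orders'::regclass
  AND attname = 'status';</code></pre></div>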
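<h3 id="類-block-heavyrewrites-tableproduction-慎用">Class B: Lock heavy (rewrites or scans the table; use with care in production)</h3>
<p>These <em>rewrite or scan the whole table</em> while holding ACCESS EXCLUSIVE for the duration of the ALTER:</p>
<ul>
<li><code>ALTER COLUMN ... TYPE</code> to a binary-incompatible type (<code>INT → BIGINT</code> always rewrites, as does <code>TEXT → INT</code>): although the change is semantically a "widening", 4-byte and 8-byte storage differ, so the full rewrite under ACCESS EXCLUSIVE cannot be avoided</li>
<li><code>ALTER COLUMN ... SET NOT NULL</code> on an existing nullable column: no rewrite, but a full-table scan under the lock (PG 12+ skips the scan when a validated <code>CHECK (col IS NOT NULL)</code> constraint already exists)</li>
<li><code>ALTER COLUMN ... DROP IDENTITY</code>: takes ACCESS EXCLUSIVE, although the change itself only touches the catalog and the backing sequence</li>
<li><code>ALTER TABLE ... SET TABLESPACE</code>: copies the whole table into the new tablespace</li>
</ul>
<p>On a large table these <em>cannot be run directly in production</em>; use a ghost-table tool or an expand / contract migration.</p>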
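<p>One tool-free way around a rewriting type change is an expand / contract migration. The sketch below widens a hypothetical <code>orders.customer_id</code> from <code>INT</code> to <code>BIGINT</code>; every object name and the batch range are assumptions, <code>EXECUTE FUNCTION</code> needs PG 11+, and indexes or constraints on the old column would have to be re-created on the new one before the swap:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Expand: add the wider column (catalog-only)
ALTER TABLE orders ADD COLUMN customer_id_new bigint;

-- Keep the new column in sync for rows written during the backfill
CREATE FUNCTION orders_sync_customer_id() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  NEW.customer_id_new := NEW.customer_id;
  RETURN NEW;
END $$;

CREATE TRIGGER orders_sync_customer_id_trg
BEFORE INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION orders_sync_customer_id();

-- Backfill in small batches; repeat with the next id range
UPDATE orders
SET customer_id_new = customer_id
WHERE id BETWEEN 1 AND 10000
  AND customer_id_new IS NULL;

-- Contract: swap the names in one short transaction
BEGIN;
SET LOCAL lock_timeout = '5s';
DROP TRIGGER orders_sync_customer_id_trg ON orders;
ALTER TABLE orders RENAME COLUMN customer_id TO customer_id_old;
ALTER TABLE orders RENAME COLUMN customer_id_new TO customer_id;
COMMIT;</code></pre></div>
<p>The old column can be dropped later with a catalog-only <code>DROP COLUMN</code> once nothing reads it.</p>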
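<h3 id="類-cconcurrent-index--online-operation無-table-lock">Class C: Concurrent index / online operation (no write-blocking table lock)</h3>
<ul>
<li><code>CREATE INDEX CONCURRENTLY</code>: does not block writes; builds in the background with two table scans, so it is slower but safe. It cannot run inside a transaction block</li>
<li><code>REINDEX INDEX CONCURRENTLY</code> (PG 12+): same behaviour</li>
<li><code>DROP INDEX CONCURRENTLY</code>: avoids the ACCESS EXCLUSIVE lock that a plain <code>DROP INDEX</code> takes on the table, and waits for conflicting transactions to finish instead</li>
</ul>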
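<p>As a rough illustration (the index and table names are hypothetical), a concurrent build can be watched from a second session through the <code>pg_stat_progress_create_index</code> view, available from PG 12:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Session 1: must run outside a transaction block
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);

-- Session 2 (PG 12+): watch the build move through its phases
SELECT pid, phase, blocks_done, blocks_total, tuples_done, tuples_total
FROM pg_stat_progress_create_index;</code></pre></div>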
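<h2 id="何時需要-ghost-table-tool">When you need a ghost-table tool</h2>
<p>Only the following cases call for pg_repack / pg-osc:</p>
<ol>
<li><strong>Rewrite-required type change</strong> (class B <code>ALTER COLUMN TYPE</code>) on a large table: use pg-osc, which applies the ALTER to a shadow table. pg_repack rebuilds a table with its existing definition and does not apply DDL</li>
<li><strong>VACUUM FULL replacement</strong>: pg_repack is safer than VACUUM FULL because it holds ACCESS EXCLUSIVE only briefly at the start and at the final swap, not for the whole rewrite</li>
<li><strong>Bloat reorganisation</strong>: a large table has accumulated dead tuples and you want a full rewrite</li>
</ol>
<p>For "add column", "drop column", "create index" and similar cases, <em>PG's native fast paths are enough</em> and no ghost-table tool is needed.</p>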
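<p>To judge whether a table is a bloat candidate at all, a first look at the statistics views can help. This is only a rough proxy, since dead-tuple counts are estimates and do not measure free space inside pages; the <code>pgstattuple</code> contrib extension gives exact figures at the cost of a scan:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Ten tables with the most dead tuples, with their share and total size
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;</code></pre></div>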
<h2 id="tool-1pg_repack--trigger-based--雙-table-swap">Tool 1: pg_repack (trigger-based + two-table swap)</h2>
<p>pg_repack is the PG community's standard tool for online table rewrites:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">pg_repack -h primary.example.com -p <span class="m">5432</span> -d production -U postgres <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  --table<span class="o">=</span>orders --no-superuser-check</span></span></code></pre></div><p><strong>Mechanism</strong>：</p>
<ol>
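<li>Create a log table <code>repack.log_&lt;oid&gt;</code> to record changes made to the original table</li>
<li>Add a row-level trigger on the original table that logs every INSERT / UPDATE / DELETE into the log table (brief ACCESS EXCLUSIVE lock)</li>
<li>Create <code>repack.table_&lt;oid&gt;</code> with the same definition and copy all rows into it</li>
<li>Build the indexes on the new table</li>
<li>Apply the changes accumulated in the log table to the new table</li>
<li>Swap: exchange the two tables, including indexes and TOAST, in the system catalogs (atomic, brief ACCESS EXCLUSIVE lock)</li>
<li>Drop the old copy together with the trigger and the log table</li>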
</ol>
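<p>The trigger-and-log idea behind these steps can be sketched in plain SQL. This is an illustration of the technique under assumed names (<code>orders_change_log</code>, <code>orders_capture</code>), not pg_repack's actual internal schema or trigger code, and <code>EXECUTE FUNCTION</code> assumes PG 11+:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Log table: one row per change, replayed later in audit_id order
CREATE TABLE orders_change_log (
  audit_id bigserial PRIMARY KEY,
  op       char(1) NOT NULL,   -- 'I', 'U' or 'D'
  row_data jsonb   NOT NULL
);

CREATE FUNCTION orders_capture() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  IF TG_OP = 'DELETE' THEN
    INSERT INTO orders_change_log (op, row_data) VALUES ('D', to_jsonb(OLD));
    RETURN OLD;
  END IF;
  -- left('INSERT', 1) = 'I', left('UPDATE', 1) = 'U'
  INSERT INTO orders_change_log (op, row_data) VALUES (left(TG_OP, 1), to_jsonb(NEW));
  RETURN NEW;
END $$;

-- Every write to orders now also writes one log row in the same transaction
CREATE TRIGGER orders_capture_trg
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION orders_capture();</code></pre></div>
<p>The extra insert per write is where the trigger overhead of this family of tools comes from.</p>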
<p><strong>Trade-off</strong>：</p>
<ul>
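<li><em>Trigger overhead</em>: every write on the primary also runs the trigger (roughly a 10-30% drop in write throughput, depending on the workload)</li>
<li><em>FK handling and lock windows</em>: the swap happens in the system catalogs, so foreign keys that reference the table stay attached; there is still a brief ACCESS EXCLUSIVE window when the trigger is installed and again at the swap, and a long-running transaction can stall either step</li>
<li><em>Version coupling</em>: the pg_repack client binary must match the extension version installed in the database, and the extension must be built for the server's PG major version</li>
</ul>
<p><strong>Setup</strong>:</p>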





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Install on the primary, in the target database
CREATE EXTENSION pg_repack;</code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Repack orders
pg_repack -d production --table=orders
# To watch locks, query pg_stat_activity / pg_locks from another session</code></pre></div><h2 id="tool-2pg-osc--pg-online-schema-change--wal-shipping-style">Tool 2: pg-osc / pg-online-schema-change (trigger + audit table)</h2>
<p><a href="https://github.com/shayonj/pg-osc">pg-osc</a> (by Shayon Mukherjee) is a newer tool. It applies an arbitrary ALTER to a shadow table and, like pg_repack and pt-online-schema-change, captures concurrent changes with a trigger; it does not read the WAL:</p>
<p><strong>Mechanism</strong>:</p>
<ol>
<li>Create an audit table and, under a brief ACCESS EXCLUSIVE lock, add a trigger on the parent table that records inserts, updates and deletes into it</li>
<li>Create the shadow table and run the ALTER on it</li>
<li>Copy all rows from the parent table, build the indexes, then replay the audit table against the shadow table</li>
<li>Once the remaining delta is small, take ACCESS EXCLUSIVE, swap the table names and re-create the referencing foreign keys as NOT VALID before validating them</li>
</ol>
<p><strong>Trade-off</strong>:</p>
<ul>
<li><em>Primary write overhead</em>: trigger cost on every write, comparable to pg_repack</li>
<li>Newer than pg_repack, so it has seen less community validation</li>
<li>Suited to <em>ALTERs that need a rewrite</em>, which pg_repack cannot apply</li>
</ul>
<p><strong>Setup</strong>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Install the gem
gem install pg_online_schema_change

# Run (supply the password through the PGPASSWORD environment variable)
pg-online-schema-change perform \
  --alter-statement=&#34;ALTER TABLE orders ADD COLUMN status VARCHAR(20)&#34; \
  --schema=public \
  --dbname=production \
  --host=primary.example.com \
  --username=postgres</code></pre></div><h2 id="配置-step-by-steppg_repack-為主">Setup step by step (pg_repack focus)</h2>
<p>In practice most online table rewrites on PG use pg_repack. pg-osc is the escape hatch for ALTERs that require a rewrite.</p>
<h3 id="step-1安裝--確認版本">Step 1: Install and confirm the version</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Install pg_repack (the package must be built for this PG major version)
CREATE EXTENSION pg_repack;
SELECT * FROM pg_available_extensions WHERE name = &#39;pg_repack&#39;;
-- installed_version must match the client binary (pg_repack --version)</code></pre></div><h3 id="step-2跑-pg_repack">Step 2: Run pg_repack</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># --jobs: extra connections used to rebuild indexes in parallel
# --wait-timeout: seconds to wait for the exclusive lock
# --no-kill-backend: on timeout, skip the table instead of cancelling blockers
# (comments sit above the command: a comment after a trailing backslash breaks the continuation)
pg_repack -h primary -d production -U postgres \
  --table=orders \
  --jobs=4 \
  --wait-timeout=60 \
  --no-kill-backend</code></pre></div><h3 id="step-3監控">Step 3: Monitor</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Sessions running repack statements (shows activity, not a progress percentage)
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE query LIKE &#39;%repack%&#39;
  AND pid &lt;&gt; pg_backend_pid();

-- Locks on orders and on pg_repack's working objects.
-- relname holds no schema prefix, so match the schema through pg_namespace.
SELECT l.pid, l.mode, l.granted, n.nspname, c.relname
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE (n.nspname = &#39;public&#39; AND c.relname = &#39;orders&#39;)
   OR n.nspname = &#39;repack&#39;;</code></pre></div><h3 id="step-4驗證">Step 4: Verify</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- After the run, compare the row count and spot-check a few queries
SELECT count(*) FROM orders;
-- Compare with the count taken before pg_repack</code></pre></div><h2 id="5-個-production-踩雷">5 production pitfalls</h2>
<h3 id="1-alter-直接跑沒看是不是-fast-變-lock-heavy">1. Running an ALTER without checking whether it is fast, and it turns lock heavy</h3>
<p><code>ALTER TABLE orders ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'pending'</code> is expected to be catalog-only (PG 11+), but on PG 10 the same statement rewrites the whole table and holds ACCESS EXCLUSIVE for hours.</p>
<p>Fix:</p>
<ul>
<li><em>Confirm the PG version</em> before writing the schema migration</li>
<li>Read the <a href="https://www.postgresql.org/docs/current/sql-altertable.html">PG ALTER doc</a>; the <em>Notes</em> section states which subcommands need a rewrite or a scan</li>
<li>Test on staging before production and watch <code>pg_stat_activity</code> for lock waits</li>
</ul>
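<p>A small guard along these lines keeps a surprise rewrite or a lock queue from turning into an outage. The timeout value is an assumption to tune per system; if the lock is not acquired in time the ALTER fails and the transaction rolls back, so it can simply be retried:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- 110000 or higher means PG 11+, where the constant default is catalog-only
SHOW server_version_num;

BEGIN;
-- Give up instead of queueing behind a long transaction and blocking everyone else
SET LOCAL lock_timeout = '3s';
ALTER TABLE orders ADD COLUMN status varchar(20) NOT NULL DEFAULT 'pending';
COMMIT;</code></pre></div>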
<h3 id="2-vacuum-full-誤用--production-downtime">2. Misusing VACUUM FULL: production downtime</h3>
<p><code>VACUUM FULL</code> means "rewrite the whole table under an ACCESS EXCLUSIVE lock". Run in production, it makes the table unavailable for minutes to hours.</p>
<p>Fix:</p>
<ul>
<li><em>Use pg_repack</em> instead of VACUUM FULL outside a maintenance window</li>
<li>For recurring bloat, run pg_repack on a schedule</li>
<li>Tune autovacuum first (details in <a href="/blog/backend/01-database/vendors/postgresql/autovacuum-tuning/" data-link-title="PostgreSQL autovacuum tuning：為什麼你的 autovacuum 永遠追不上 bloat" data-link-desc="MVCC 怎麼產生 dead tuple、autovacuum cost-based throttle 為什麼預設保守、per-table tuning 怎麼設、5 個 production 踩雷（cost_limit 太低 / 長 transaction blocks vacuum / anti-wraparound 在 peak / partition vacuum 滿 worker / index bloat 沒處理）、跟 partitioning &#43; monitoring 整合">autovacuum-tuning</a>)</li>
</ul>
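<h3 id="3-pg_repack-version-mismatch">3. pg_repack version mismatch</h3>
<p>After the cluster is upgraded (say to PG 14), the pg_repack installed in the database is still the old build while the client binary is the new one. Running the <code>pg_repack</code> command then fails with an error along the lines of <code>program &#34;pg_repack V1&#34; does not match database library &#34;pg_repack V2&#34;</code>, where V1 and V2 are the two pg_repack versions.</p>
<p>Fix:</p>
<ul>
<li>After upgrading the cluster, install the pg_repack package built for the new PG major version and <em>re-create the extension right away</em> (<code>DROP EXTENSION pg_repack</code>, then <code>CREATE EXTENSION pg_repack</code>)</li>
<li>If pg_repack has not yet released a build for the new PG version (early upgrades), use pg-osc in the meantime or wait</li>
<li>Record in the upgrade runbook that pg_repack is <em>an extension that must be upgraded in lockstep</em></li>
</ul>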
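<p>A quick way to see which side is out of date is to compare the version recorded in the database with the output of <code>pg_repack --version</code> on the client host; the queries below only read catalog views:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Version of the extension currently created in this database
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_repack';

-- Version shipped by the package installed on the server
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'pg_repack';</code></pre></div>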
<h3 id="4-create-index-concurrently-失敗清理">4. Cleaning up after a failed CREATE INDEX CONCURRENTLY</h3>
<p>When <code>CREATE INDEX CONCURRENTLY</code> is cancelled halfway (user Ctrl-C or a dropped connection), it leaves an <em>invalid index</em> behind under the name you gave it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;
-- Lists e.g. idx_orders_status (a failed REINDEX CONCURRENTLY leaves a _ccnew suffix instead)</code></pre></div><p>An invalid index still takes disk space and is still maintained on every write, but the planner will not use it.</p>
<p>Fix:</p>
<ul>
<li>Run <code>DROP INDEX CONCURRENTLY idx_orders_status</code></li>
<li>Then run <code>CREATE INDEX CONCURRENTLY</code> again</li>
<li>Avoid running a long CREATE INDEX CONCURRENTLY from a session with an unstable connection; run it from cron or the deploy pipeline instead</li>
</ul>
<h3 id="5-generated-stored-column-不能-online-add">5. A stored generated column cannot be added online</h3>
<p><code>ADD COLUMN total NUMERIC GENERATED ALWAYS AS (price * qty) STORED</code>: a <em>stored</em> generated column has to rewrite the whole table to compute its values, so it is not catalog-only.</p>
<p>Fix:</p>
<ul>
<li>
<p>Use <code>GENERATED ALWAYS AS (...) VIRTUAL</code> (PG 18+): no value is stored, so the change is catalog-only</p>
</li>
<li>
<p>Or <em>add a nullable column, backfill it, then add the NOT NULL constraint</em>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">ALTER TABLE orders ADD COLUMN total NUMERIC;
UPDATE orders SET total = price * qty WHERE id BETWEEN ...;  -- chunked
-- PG 12+: a validated CHECK lets SET NOT NULL skip its full-table scan
ALTER TABLE orders ADD CONSTRAINT orders_total_not_null CHECK (total IS NOT NULL) NOT VALID;
ALTER TABLE orders VALIDATE CONSTRAINT orders_total_not_null;
ALTER TABLE orders ALTER COLUMN total SET NOT NULL;
-- Afterwards keep total up to date with a trigger or in the application</code></pre></div></li>
<li>
<p>Or run the <code>ADD ... GENERATED ... STORED</code> through pg-osc, which applies the ALTER on a shadow table (pg_repack cannot apply DDL)</p>
</li>
</ul>
<h2 id="容量--時間估算">Capacity / time estimates</h2>
<p>Rough order-of-magnitude figures for a 100 GB table, using ADD COLUMN plus an index as the example; actual times depend on hardware and write load:</p>
<table>
  <thead>
      <tr>
          <th>Operation</th>
          <th>Time</th>
          <th>Lock impact</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>ADD COLUMN col TYPE NULL</code></td>
          <td>&lt; 1 s</td>
          <td>ACCESS EXCLUSIVE (milliseconds)</td>
      </tr>
      <tr>
          <td><code>ADD COLUMN col TYPE NOT NULL DEFAULT 0</code> (PG 11+)</td>
          <td>&lt; 1 s</td>
          <td>ACCESS EXCLUSIVE (milliseconds)</td>
      </tr>
      <tr>
          <td><code>CREATE INDEX CONCURRENTLY</code></td>
          <td>2-6 hours</td>
          <td>No write-blocking table lock</td>
      </tr>
      <tr>
          <td><code>pg_repack table</code></td>
          <td>4-8 hours</td>
          <td>Brief ACCESS EXCLUSIVE (trigger install and swap)</td>
      </tr>
      <tr>
          <td><code>ALTER COLUMN TYPE</code> rewrite</td>
          <td>4-8 hours</td>
          <td>ACCESS EXCLUSIVE throughout</td>
      </tr>
      <tr>
          <td><code>VACUUM FULL</code></td>
          <td>Same as pg_repack</td>
          <td>ACCESS EXCLUSIVE throughout (do not run it online)</td>
      </tr>
  </tbody>
</table>
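<p>Time is not the only budget: pg_repack builds a full second copy, and its documentation asks for free disk space of about twice the size of the target table and its indexes. A quick size check before starting, with <code>orders</code> as the assumed table name:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Heap + TOAST, indexes, and the total that the rebuilt copy will need again
SELECT pg_size_pretty(pg_table_size('orders'))          AS table_and_toast,
       pg_size_pretty(pg_indexes_size('orders'))        AS indexes,
       pg_size_pretty(pg_total_relation_size('orders')) AS total;</code></pre></div>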
<h2 id="跟-mysql-gh-ost--pt-osc-對照">Comparison with MySQL gh-ost / pt-osc</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>PG pg_repack</th>
          <th>PG pg-osc</th>
          <th>MySQL gh-ost</th>
          <th>MySQL pt-osc</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Mechanism</td>
          <td>Trigger + log table</td>
          <td>Trigger + audit table</td>
          <td>Binlog stream</td>
          <td>Trigger (writes straight to the ghost table)</td>
      </tr>
      <tr>
          <td>Primary write overhead</td>
          <td>Medium (trigger)</td>
          <td>Medium (trigger)</td>
          <td>0 (binlog already exists)</td>
          <td>Medium (trigger)</td>
      </tr>
      <tr>
          <td>Throttle support</td>
          <td>Partial</td>
          <td>Supported</td>
          <td>Strong</td>
          <td>Partial</td>
      </tr>
      <tr>
          <td>Pause / Resume</td>
          <td>Not supported</td>
          <td>Not supported</td>
          <td>Supported</td>
          <td>Pause only (<code>--pause-file</code>)</td>
      </tr>
      <tr>
          <td>Tool maturity</td>
          <td>High</td>
          <td>Medium (newer)</td>
          <td>High</td>
          <td>High</td>
      </tr>
      <tr>
          <td>Typical use</td>
          <td>PG mainstream (most cases)</td>
          <td>ALTERs that need a rewrite</td>
          <td>MySQL mainstream (dev)</td>
          <td>MySQL legacy + FK</td>
      </tr>
  </tbody>
</table>
<p>PG OSC tools are used less often than their MySQL counterparts because PG's native fast ALTERs already cover most schema changes; ghost-table tools are only for the <em>few rewrite-required</em> cases.</p>
<p>See <a href="/blog/backend/01-database/vendors/mysql/online-schema-change-tools/" data-link-title="MySQL Online Schema Change：gh-ost 跟 pt-online-schema-change 兩條完全不同的 ghost table 路徑" data-link-desc="MySQL ALTER TABLE 可能鎖整張表，production 需要 online schema change 流程。gh-ost（GitHub）跟 pt-online-schema-change（Percona）都用 ghost table 解決、但底層機制完全不同：pt-osc 用 trigger 同步、gh-ost 用 binlog stream 同步。本文走兩工具機制對照表 → trigger vs binlog 各自取捨 → 配置 step-by-step → 5 production 踩雷（trigger overhead / binlog 延遲 / FK constraint / hot trigger lock / 切換瞬間 deadlock）→ 何時用哪一個">MySQL Online Schema Change Tools</a>: the sibling article, with a different use-case mix.</p>
<h2 id="跟其他模組整合">Integration with other modules</h2>
<h3 id="跟-replication-topology">With Replication topology</h3>
<p>ALTER TABLE / pg_repack / pg-osc all generate WAL that is replicated to standbys. When the ACCESS EXCLUSIVE lock is replayed on a standby it can conflict with long-running queries there, which are cancelled once <code>max_standby_streaming_delay</code> expires. With <code>hot_standby_feedback</code> on, those standby queries also hold back vacuum on the primary. See <a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">Replication Topology</a>.</p>
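<p>Whether a schema change actually cancelled queries on a standby can be checked there after the fact. A sketch, to be run on the standby itself:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Queries cancelled by recovery conflicts, by cause, for this database
SELECT datname, confl_lock, confl_snapshot, confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts
WHERE datname = current_database();

-- The two settings that decide how such conflicts play out
SHOW max_standby_streaming_delay;
SHOW hot_standby_feedback;</code></pre></div>
<p>A rising <code>confl_lock</code> after a DDL run points at replayed ACCESS EXCLUSIVE locks as the cause.</p>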
<h3 id="跟-autovacuum-tuning">With Autovacuum Tuning</h3>
<p>Schema changes often leave dead tuples behind (a chunked backfill, for example), and autovacuum has to catch up afterwards. See <a href="/blog/backend/01-database/vendors/postgresql/autovacuum-tuning/" data-link-title="PostgreSQL autovacuum tuning：為什麼你的 autovacuum 永遠追不上 bloat" data-link-desc="MVCC 怎麼產生 dead tuple、autovacuum cost-based throttle 為什麼預設保守、per-table tuning 怎麼設、5 個 production 踩雷（cost_limit 太低 / 長 transaction blocks vacuum / anti-wraparound 在 peak / partition vacuum 滿 worker / index bloat 沒處理）、跟 partitioning &#43; monitoring 整合">Autovacuum Tuning</a>.</p>
<h3 id="跟-logical-replication">With Logical Replication</h3>
<p>Logical replication synchronises data through publications and subscriptions. The built-in implementation does <em>not</em> replicate DDL, so every schema change must be <em>run separately on the publisher and on each subscriber</em>. See <a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">Logical Replication + Debezium</a>.</p>
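<p>The order of those manual runs matters, because a subscriber table may have extra columns but cannot apply a row that carries a column it lacks. A sketch for an additive change on a hypothetical <code>orders</code> table:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Step 1, on every subscriber: the extra column is harmless until data arrives
ALTER TABLE orders ADD COLUMN status varchar(20);

-- Step 2, on the publisher: rows that include status can now be applied downstream
ALTER TABLE orders ADD COLUMN status varchar(20);</code></pre></div>
<p>For a column drop the order reverses: publisher first, subscribers afterwards.</p>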
<h3 id="跟-patroni-ha">With Patroni HA</h3>
<p>A pg_repack run is driven by a client connection to the primary and keeps no state that survives a failover, so a Patroni promotion in the middle of a run aborts it. After the failover, clean up any leftover repack objects (<code>DROP EXTENSION pg_repack CASCADE</code> followed by <code>CREATE EXTENSION pg_repack</code>) and start pg_repack again on the new primary. See <a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni HA</a>.</p>
<h2 id="何時用哪個">Which one to use when</h2>
<table>
  <thead>
      <tr>
          <th>Scenario</th>
          <th>Choice</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>ADD COLUMN nullable / DROP COLUMN / RENAME, etc.</td>
          <td>Plain ALTER (fast catalog-only)</td>
      </tr>
      <tr>
          <td>CREATE INDEX on a large table</td>
          <td><code>CREATE INDEX CONCURRENTLY</code></td>
      </tr>
      <tr>
          <td>ALTER COLUMN TYPE rewrite (large table)</td>
          <td>pg-osc, or an expand / contract migration</td>
      </tr>
      <tr>
          <td>Bloat reorganisation</td>
          <td>pg_repack</td>
      </tr>
      <tr>
          <td>High write throughput where trigger overhead is unacceptable</td>
          <td>Expand / contract with a batched backfill (pg_repack and pg-osc are both trigger-based)</td>
      </tr>
      <tr>
          <td>ADD GENERATED STORED column</td>
          <td>nullable + backfill + constraint</td>
      </tr>
      <tr>
          <td>Cluster on Cloud (RDS / Aurora)</td>
          <td>The same native fast ALTERs apply; pg_repack availability depends on the vendor</td>
      </tr>
  </tbody>
</table>
<h2 id="相關連結">Related links</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">PG Replication Topology</a> (how ALTER interacts with streaming replication)</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/autovacuum-tuning/" data-link-title="PostgreSQL autovacuum tuning：為什麼你的 autovacuum 永遠追不上 bloat" data-link-desc="MVCC 怎麼產生 dead tuple、autovacuum cost-based throttle 為什麼預設保守、per-table tuning 怎麼設、5 個 production 踩雷（cost_limit 太低 / 長 transaction blocks vacuum / anti-wraparound 在 peak / partition vacuum 滿 worker / index bloat 沒處理）、跟 partitioning &#43; monitoring 整合">PG Autovacuum Tuning</a> (vacuum after schema changes)</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">PG Logical Replication + Debezium</a> (DDL is not replicated)</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">PG Patroni HA</a> (HA and pg_repack)</li>
<li><a href="/blog/backend/01-database/vendors/mysql/online-schema-change-tools/" data-link-title="MySQL Online Schema Change：gh-ost 跟 pt-online-schema-change 兩條完全不同的 ghost table 路徑" data-link-desc="MySQL ALTER TABLE 可能鎖整張表，production 需要 online schema change 流程。gh-ost（GitHub）跟 pt-online-schema-change（Percona）都用 ghost table 解決、但底層機制完全不同：pt-osc 用 trigger 同步、gh-ost 用 binlog stream 同步。本文走兩工具機制對照表 → trigger vs binlog 各自取捨 → 配置 step-by-step → 5 production 踩雷（trigger overhead / binlog 延遲 / FK constraint / hot trigger lock / 切換瞬間 deadlock）→ 何時用哪一個">MySQL Online Schema Change Tools</a> (sibling article, different tool ecosystem)</li>
<li><a href="/blog/backend/knowledge-cards/expand-contract/" data-link-title="Expand / Contract" data-link-desc="說明先擴充相容面、再收斂舊路徑的遷移做法">Expand / Contract card</a> (schema migration design principle)</li>
<li>Official: <a href="https://www.postgresql.org/docs/current/sql-altertable.html">ALTER TABLE</a> / <a href="https://github.com/reorg/pg_repack">pg_repack GitHub</a> / <a href="https://github.com/shayonj/pg-osc">pg-osc GitHub</a></li>
</ul>
</ul>
]]></content:encoded></item><item><title>Spanner Schema Migration Without Downtime + Interleaved Tables</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/spanner/schema-migration-interleaved-tables/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/spanner/schema-migration-interleaved-tables/</guid><description>&lt;blockquote>
&lt;p>This article is the implementation-layer deep dive for the &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/spanner/" data-link-title="Google Cloud Spanner" data-link-desc="全球分散式 strong-consistency OLTP、TrueTime API、線性擴展到 10 億 req/sec">Cloud Spanner&lt;/a> overview. The overview covers where Spanner sits in the global OLTP landscape; this article focuses on &lt;em>schema migration without downtime + interleaved tables&lt;/em>, the two Spanner schema mechanisms that differ most from traditional SQL.&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="問題情境ddl-不停機跟-parent-child-物理-layout-的兩個疑問">Problem scenario: two questions about zero-downtime DDL and parent-child physical layout&lt;/h2>
&lt;p>Traditional PostgreSQL / MySQL DDL takes ACCESS EXCLUSIVE / metadata locks; an ALTER TABLE run online can lock a table for minutes, and large schema changes need external tools such as pt-osc / gh-ost / pg_repack. Spanner advertises "schema changes without downtime", but teams often do not know the actual mechanism or its limits. The symptoms usually surface as questions like "Does a Spanner ALTER really not block writes?", "Is it normal for an index backfill to run for 12 hours?", "What is the magic behind parent-child INTERLEAVE IN PARENT?" and "Why is ON DELETE CASCADE on an interleaved table storage-level rather than application-level?".&lt;/p>
&lt;p>The real pressure: a multi-tenant SaaS has to add a column and an index to an orders table with 10 billion rows, without downtime and without pushing p99 write latency past the SLA. Teams assume that "Spanner schema changes need no downtime" means "DDL completes instantly". In reality the ALTER is a long-running operation, an index backfill on a large table runs for hours to days, and capacity planning has to account for the extra CPU during the backfill.&lt;/p>
&lt;p>Case anchor: &lt;strong>no case available&lt;/strong>. 9.C10 is a Google internal dogfood case that does not go into schema migration details and is not a customer-facing capacity reference. This article relies on general patterns, the official documentation, and a comparison back to &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/online-schema-change/" data-link-title="PostgreSQL Online Schema Change：先用 ALTER 內建特性、不能解才 pg_repack / pg-osc" data-link-desc="PostgreSQL ALTER TABLE 對多數變更已是 *fast catalog-only*（add column nullable / drop column / 改 default），不必走 ghost table tool。本文走 PG 內建 fast DDL 行為、何時必須走 pg_repack / pg-osc、兩工具機制對比（trigger-based vs WAL-shipping）、配置 step-by-step、5 production 踩雷（lock 升級 / VACUUM FULL 誤用 / pg_repack version mismatch / concurrent index 失敗清理 / generated stored column 不能 online）、跟 MySQL gh-ost / pt-osc sibling 對比">PostgreSQL Online Schema Change&lt;/a>, pending a later customer case audit.&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is the implementation-layer deep dive for the <a href="/blog/backend/01-database/vendors/spanner/" data-link-title="Google Cloud Spanner" data-link-desc="全球分散式 strong-consistency OLTP、TrueTime API、線性擴展到 10 億 req/sec">Cloud Spanner</a> overview. The overview covers where Spanner sits in the global OLTP landscape; this article focuses on <em>schema migration without downtime + interleaved tables</em>, the two Spanner schema mechanisms that differ most from traditional SQL.</p></blockquote>
<hr>
<h2 id="問題情境ddl-不停機跟-parent-child-物理-layout-的兩個疑問">Problem context: two questions about zero-downtime DDL and parent-child physical layout</h2>
<p>Traditional PostgreSQL / MySQL DDL takes an ACCESS EXCLUSIVE lock or a metadata lock, an online ALTER TABLE can lock a table for minutes, and large schema changes need add-on tools such as pt-osc / gh-ost / pg_repack. Spanner advertises "schema changes without downtime", but teams often do not know the actual mechanism or its limits. The symptoms usually surface as questions like: "Does a Spanner ALTER really not block writes?", "Is a 12-hour index backfill normal?", "What kind of black magic is the parent-child INTERLEAVE IN PARENT?", and "Why is ON DELETE CASCADE on an interleaved table storage-level rather than application-level?"</p>
<p>The real pressure: a multi-tenant SaaS has to add a column and an index to a 10-billion-row orders table, with no downtime and without pushing p99 write latency past its SLA. The team assumes "Spanner schema changes are zero-downtime" means "DDL finishes instantly". In reality ALTER is a long-running operation, index backfill on a large table runs for hours to days, and capacity planning has to account for the CPU increase during backfill.</p>
<p>Case anchor: <strong>no case available</strong>. 9.C10 is a Google internal dogfood case that does not detail schema migration, and it is not a customer-facing capacity reference. This article relies on general patterns and official documentation, cross-checked against <a href="/blog/backend/01-database/vendors/postgresql/online-schema-change/" data-link-title="PostgreSQL Online Schema Change：先用 ALTER 內建特性、不能解才 pg_repack / pg-osc" data-link-desc="PostgreSQL ALTER TABLE 對多數變更已是 *fast catalog-only*（add column nullable / drop column / 改 default），不必走 ghost table tool。本文走 PG 內建 fast DDL 行為、何時必須走 pg_repack / pg-osc、兩工具機制對比（trigger-based vs WAL-shipping）、配置 step-by-step、5 production 踩雷（lock 升級 / VACUUM FULL 誤用 / pg_repack version mismatch / concurrent index 失敗清理 / generated stored column 不能 online）、跟 MySQL gh-ost / pt-osc sibling 對比">PostgreSQL Online Schema Change</a>, pending a later customer case audit.</p>
<h2 id="核心機制ddl-是-long-runningtruetime-對齊-schema-version">Core mechanism: DDL is long-running, schema versions are aligned by TrueTime</h2>
<h3 id="schema-change-的-lifecycle">Schema change lifecycle</h3>
<p>Spanner DDL is not a synchronous ALTER; it is a <em>long-running operation</em>. TrueTime assigns each schema change a version timestamp, and every read / write uses its own transaction timestamp to determine which schema version it sees. The key point: DDL is not "lock the table → change → unlock". It is "broadcast the new schema version, let in-flight transactions finish on the old schema, run new transactions on the new schema, and backfill physical data in the background".</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Timeline:
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">T0 (DDL starts)
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">     |
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">     | ──── old schema still usable, new schema metadata being broadcast ────
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">     |
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">T1 (metadata complete)
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">     |
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">     | ──── new transactions use the new schema, old transactions finish on the old one ────
</span></span><span class="line"><span class="ln">10</span><span class="cl">     | ──── backfill starts (background) ────
</span></span><span class="line"><span class="ln">11</span><span class="cl">     |
</span></span><span class="line"><span class="ln">12</span><span class="cl">T2 (backfill complete)
</span></span><span class="line"><span class="ln">13</span><span class="cl">     |
</span></span><span class="line"><span class="ln">14</span><span class="cl">     | ──── new schema fully serving ────</span></span></code></pre></div><p>The part of a DDL that completes almost instantly is the <em>metadata broadcast</em> (milliseconds to seconds); the slow part is the <em>backfill</em> (depends on data volume, possibly hours to days). A common misunderstanding is to treat metadata completion as "DDL done", when queries are not yet using the new index because the backfill has not finished.</p>
<h3 id="不停機的關鍵不同-ddl-的兩階段行為">The key to zero downtime: two-phase behavior of different DDL types</h3>
<table>
  <thead>
      <tr>
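          <th>DDL type</th>
          <th>Metadata behavior</th>
          <th>Backfill behavior</th>
          <th>Blocking?</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>ADD COLUMN</code> (without NOT NULL)</td>
          <td>Metadata-only, takes effect immediately</td>
          <td>No backfill needed (the new column defaults to NULL)</td>
          <td>Does not block writes</td>
      </tr>
      <tr>
          <td><code>ADD COLUMN</code> (NOT NULL)</td>
          <td>Two phases: first ADD COLUMN with a default, then add the NOT NULL constraint</td>
          <td>The default has to be backfilled between the two phases</td>
          <td>Does not block writes, but the two phases cannot be merged</td>
      </tr>
      <tr>
          <td><code>CREATE INDEX</code></td>
          <td>Metadata is immediate</td>
          <td>Background backfill that does not block writes; the index serves queries only after backfill completes</td>
          <td>Does not block writes; blocks queries that need this index</td>
      </tr>
      <tr>
          <td><code>DROP COLUMN</code></td>
          <td>Metadata is immediate</td>
          <td>Dead column is garbage-collected in the background</td>
          <td>Does not block</td>
      </tr>
      <tr>
          <td><code>ALTER COLUMN TYPE</code></td>
          <td>Heavily restricted; check the latest documentation</td>
          <td>-</td>
          <td>-</td>
      </tr>
  </tbody>
</table>
<p>What to remember: <strong>until index backfill completes, a query that would use the index falls back to a table scan</strong>. The change only counts as complete once <code>EXPLAIN</code> confirms that the query plan uses the new index. Without this check, a team may believe CREATE INDEX has succeeded while p99 query latency is still at table-scan levels.</p>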
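<p>The plan check described above can also be scripted from the command line. The following is a minimal sketch, not a definitive implementation: <code>plan_mentions_index</code> is a hypothetical helper name, and it assumes that the JSON plan printed by <code>gcloud spanner databases execute-sql</code> with <code>--query-mode=PLAN</code> mentions the scanned index by name, so a plain text match is only a rough heuristic.</p>

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit 0) if the query plan text mentions the given index name.
# Assumption: the plan printed as JSON names the scanned index somewhere in its
# node metadata; this is a heuristic text match, not a structured plan parser.
plan_mentions_index() {
  local instance="$1" database="$2" index_name="$3" sql="$4"
  gcloud spanner databases execute-sql "$database" \
    --instance="$instance" \
    --query-mode=PLAN \
    --format=json \
    --sql="$sql" | grep -q -- "$index_name"
}
```

<p>A call such as <code>plan_mentions_index prod app OrdersByCustomer "SELECT * FROM Orders WHERE customer_id = 'c123'"</code> could then gate the "index is really serving" step of a rollout checklist.</p>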
<h3 id="interleaved-table-的設計">Interleaved table design</h3>
<p><a href="/blog/backend/knowledge-cards/interleaved-table/" data-link-title="Interleaved Table" data-link-desc="Spanner 把 parent / child table row 物理交錯儲存、parent &#43; child JOIN 不跨 split">Interleaved Table</a> stores the rows of a parent table (e.g. <code>Customer</code>) and a child table (e.g. <code>Order</code>) <em>physically interleaved</em> at the storage layer: each child row lives in the same split as its parent row. This is not a plain foreign key; it is a storage layout:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Traditional PostgreSQL FK design (two independent tables):
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">Customer table:  [c1, c2, c3, ...]  → one table, one storage range
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">Order table:     [o1, o2, o3, ...]  → another table, another storage range
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">The planner stitches rows together at JOIN time, possibly across pages / segments
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">Spanner interleaved design (physically interleaved):
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">Storage layout: [c1, c1.o1, c1.o2, c2, c2.o1, c2.o2, c2.o3, c3, ...]
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">                 |____________________|  |________________|
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">                  c1 + its children       c2 + its children
</span></span><span class="line"><span class="ln">10</span><span class="cl">                  in the same split       in the same split</span></span></code></pre></div><p>The effect of interleaving: a parent + child JOIN completes within a single <a href="/blog/backend/knowledge-cards/range-sharding/" data-link-title="Range Sharding" data-link-desc="分散式 SQL 把 key space 切成可自動 split / merge 的 range、每個 range 自己的 consensus group、application 透明">Range Sharding</a> split. Not crossing splits means not crossing Paxos groups, which means low-latency transactions. This design turns "FK is a logical constraint" into "the parent-child access pattern is physical co-location", a large latency benefit for workloads with a fixed access pattern (customer → orders, user → posts, tenant → records).</p>
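<p>The layout above falls out of ordering every row by its full primary key. The sketch below is only an analogy in shell, not how Spanner stores data: it assumes hypothetical keys written as <code>customer_id</code> for parent rows and <code>customer_id/order_id</code> for child rows, and shows that a byte-order sort places each parent directly before its own children.</p>

```shell
#!/usr/bin/env bash
# Sketch: emulate an interleaved layout by sorting rows on their full key.
# Input: one key per line, "customer_id" (parent) or "customer_id/order_id" (child).
# LC_ALL=C forces plain byte ordering, so "c1" sorts before "c1/o1",
# and every "c1/..." child sorts before the next parent "c2".
interleaved_layout() {
  LC_ALL=C sort
}
```

<p>Feeding the keys <code>c2/o1</code>, <code>c1</code>, <code>c1/o2</code>, <code>c2</code>, <code>c1/o1</code> through <code>interleaved_layout</code> groups each customer with its orders, which is the property the child-key prefix requirement exists to guarantee.</p>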
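<h3 id="interleaved-的硬限">Hard limits of interleaving</h3>
<table>
  <thead>
      <tr>
          <th>Limit</th>
          <th>Impact</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>The child key must be prefixed by the parent primary key</td>
          <td>The leading part of the child PK must be the parent PK; the key cannot be chosen freely</td>
      </tr>
      <tr>
          <td>At most 7 levels deep</td>
          <td>Deeply nested relationships have to choose which levels to interleave</td>
      </tr>
      <tr>
          <td><code>ON DELETE</code> can only be CASCADE or NO ACTION</td>
          <td>Unlike PG FKs, there is no SET NULL / SET DEFAULT</td>
      </tr>
      <tr>
          <td>Once created, interleaving cannot be changed directly with ALTER</td>
          <td>Changing it means export + recreate + import, not an ALTER</td>
      </tr>
  </tbody>
</table>
<p>The last limit is the trap readers hit most often: if a table was not interleaved from the start, changing it later means exporting and re-importing 10 billion rows, which is a major project rather than an ALTER. Audit access patterns at schema design time and decide which parent-child pairs should be interleaved.</p>
<h3 id="跟通用-fk-概念的差異">Differences from the general FK concept</h3>
<p>A PostgreSQL FK is a logical constraint and the planner handles the JOIN; Spanner interleaving is a physical layout, so JOIN cost is close to single-table access. This maps to the <a href="/blog/backend/knowledge-cards/transaction-boundary/" data-link-title="Transaction Boundary" data-link-desc="說明哪些資料變更應在同一個交易中一起成功或一起回復">transaction-boundary</a> card: interleaving aligns the transaction boundary with the storage boundary, so there are fewer cross-split transactions, and the commit wait + Paxos round trips they would cost are saved.</p>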
<h2 id="操作流程ddl-跟-interleaved-table-的具體步驟">Operational flow: concrete steps for DDL and interleaved tables</h2>
<h3 id="加-column">Adding a column</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">Orders</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">COLUMN</span><span class="w"> </span><span class="n">tax_amount</span><span class="w"> </span><span class="n">FLOAT64</span><span class="p">;</span></span></span></code></pre></div><p>After execution you receive a long-running operation id. Use <code>gcloud spanner operations list</code> to watch its status:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">gcloud spanner operations list --instance<span class="o">=</span>prod --database<span class="o">=</span>app
</span></span><span class="line"><span class="ln">2</span><span class="cl">gcloud spanner operations describe projects/.../operations/&lt;op-id&gt;</span></span></code></pre></div><p>Verification: once the operation shows <code>done: true</code>, run <code>SELECT tax_amount FROM Orders LIMIT 1</code> to confirm the column is queryable.</p>
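<p>Waiting for <code>done: true</code> by hand does not scale to backfills that run for hours, so the check is usually wrapped in a polling loop. This is a sketch under stated assumptions: <code>wait_for_operation</code> is a hypothetical helper, and it assumes that <code>--format='value(done)'</code> prints a true-like value once the operation has finished. Note that <code>done</code> only means finished, not succeeded, so the operation's error field still needs a separate check.</p>

```shell
#!/usr/bin/env bash
# Sketch: poll a Spanner long-running operation until it reports done.
# Args: instance, database, operation id, poll interval in seconds (default 30),
#       maximum number of polls (default 120).
# Returns 0 once done is true, 1 if max_polls is exhausted first.
wait_for_operation() {
  local instance="$1" database="$2" op_id="$3"
  local interval="${4:-30}" max_polls="${5:-120}"
  local polls=0 done_flag
  while [ "$polls" -lt "$max_polls" ]; do
    # Assumption: value(done) renders as "True"/"true" when the operation has finished.
    done_flag="$(gcloud spanner operations describe "$op_id" \
      --instance="$instance" --database="$database" \
      --format='value(done)' | tr '[:upper:]' '[:lower:]')"
    if [ "$done_flag" = "true" ]; then
      return 0
    fi
    polls=$(( polls + 1 ))
    sleep "$interval"
  done
  return 1
}
```

<p>Bounding the loop with <code>max_polls</code> is a deliberate choice: a stuck operation then surfaces as a non-zero exit instead of a script that never returns.</p>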
<h3 id="加-index">Adding an index</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">INDEX</span><span class="w"> </span><span class="n">OrdersByCustomer</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">Orders</span><span class="p">(</span><span class="n">customer_id</span><span class="p">);</span></span></span></code></pre></div><p>Take the operation id, then track progress with the Monitoring metric <code>spanner.googleapis.com/instance/indexes/backfill_progress</code> (or its current equivalent; check the official documentation). Queries will not use the new index until backfill completes, so confirm with <code>EXPLAIN</code>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">EXPLAIN</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">Orders</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">customer_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;c123&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="c1">-- The plan should use the OrdersByCustomer index, not a table scan</span></span></span></code></pre></div><h3 id="創建-interleaved-table">Creating an interleaved table</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="o">`</span><span class="k">Order</span><span class="o">`</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">    </span><span class="n">customer_id</span><span class="w"> </span><span class="n">INT64</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="n">order_id</span><span class="w"> </span><span class="n">INT64</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="n">amount</span><span class="w"> </span><span class="n">FLOAT64</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">    </span><span class="n">created_at</span><span class="w"> </span><span class="k">TIMESTAMP</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="p">)</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">customer_id</span><span class="p">,</span><span class="w"> </span><span class="n">order_id</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w">  </span><span class="n">INTERLEAVE</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="n">PARENT</span><span class="w"> </span><span class="n">Customer</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="k">DELETE</span><span class="w"> </span><span class="k">CASCADE</span><span class="p">;</span></span></span></code></pre></div><p>Key constraints:</p>
<ul>
<li>The child PK <code>(customer_id, order_id)</code> starts with the parent PK</li>
<li><code>ON DELETE CASCADE</code> is storage-level: deleting a parent row automatically deletes its child rows. Spanner handles this internally; it is not a trigger</li>
</ul>
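<h3 id="從-non-interleaved-改成-interleaved">Changing from non-interleaved to interleaved</h3>
<p><em>This cannot be done with a direct ALTER</em>; it requires export, recreate, and import:</p>
<ol>
<li>Export the old table to GCS with Dataflow / <code>gcloud spanner databases export</code> (verify the exact command against the current gcloud reference)</li>
<li>Create the new table with the interleaved schema</li>
<li>Load the data back with Dataflow / <code>gcloud spanner databases import</code> (same caveat)</li>
<li>Cut over at the application layer (feature flag / dual write)</li>
</ol>
<p>This flow is a mini-migration and should follow the full phase plan in the <a href="../migrate-from-cloud-sql-pg/">migration playbook</a>. Decide on interleaving at schema design time to avoid the cost of changing your mind later.</p>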
<h3 id="rollback-boundary">Rollback boundary</h3>
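<p>Before a DDL operation completes, it can be cancelled with <code>gcloud spanner operations cancel</code>. After it completes, undoing an added index requires DROP INDEX and undoing an added column requires DROP COLUMN (both are long-running as well). First confirm which stage the DDL is in: cancel and reverse DDL are two different paths.</p>
<h2 id="失敗模式5-個-production-踩雷">Failure modes: 5 production pitfalls</h2>
<h3 id="backfill-時間沒估event-window-撞牆">Backfill time not estimated, collides with an event window</h3>
<p>Adding an index to 10 billion rows was expected to take 1 hour and actually took 12, because there was no up-front <code>cost</code> estimate and no monitoring of a progress metric. Incident scenario: the team starts CREATE INDEX a week before Black Friday and expects it to finish over the weekend. The backfill is still running after the weekend, CPU rises during the event, and query latency degrades.</p>
<p>Fixes:</p>
<ul>
<li>Before the DDL, benchmark backfill speed (rows/sec) on a small table and extrapolate the time for the large table</li>
<li>During the DDL, monitor <code>instance/cpu/smoothed_utilization</code>; if it exceeds 80%, pause or shed traffic</li>
<li>Schedule large DDL for periods with ample capacity headroom and avoid event windows</li>
</ul>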
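<p>The first fix is simple arithmetic, and writing it down makes the estimate reviewable. The sketch below assumes backfill throughput scales linearly from the benchmark to the large table, which is an optimistic simplification; <code>estimate_backfill_hours</code> and the sample rate of 250000 rows/sec are hypothetical, not measured values.</p>

```shell
#!/usr/bin/env bash
# Sketch: extrapolate backfill duration from a measured rows/sec benchmark.
# Args: total rows in the large table, benchmarked rows per second.
# Prints whole hours, rounded up. Assumes linear scaling (an approximation).
estimate_backfill_hours() {
  local total_rows="$1" rows_per_sec="$2"
  # Ceiling division to seconds, then ceiling division to hours.
  local seconds=$(( (total_rows + rows_per_sec - 1) / rows_per_sec ))
  echo $(( (seconds + 3599) / 3600 ))
}
```

<p>For example, <code>estimate_backfill_hours 10000000000 250000</code> prints <code>12</code>: at that hypothetical rate a 10-billion-row table needs about half a day, which is the kind of number that should be on the table before choosing a start date near an event window.</p>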
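<h3 id="interleaved-table-一開始沒設後悔時要-recreate">Table not interleaved at the start, recreate required later</h3>
<p>Exporting and re-importing 10 billion rows plus a cutover is a major project, not an ALTER. Incident scenario: the team initially creates Customer / Order as independent tables. A year after launch they find that customer → orders accounts for 99% of queries and that the JOIN crosses splits, paying commit wait + Paxos cost. They want to switch to interleaved and discover that it requires a mini-migration.</p>
<p>Fixes:</p>
<ul>
<li>Audit access patterns at schema design time and decide which parent-child pairs should be interleaved</li>
<li>Write an ADR that ties the interleave decision to the business access pattern, to avoid the cost of changing it later</li>
</ul>
<h3 id="把-interleaved-跟-fk-混為一談">Confusing interleaving with FKs</h3>
<p>An interleaved table's <code>ON DELETE CASCADE</code> is storage-level: deleting the parent automatically deletes the children. A non-interleaved FK does not cascade unless it explicitly declares a delete action (check whether your Spanner dialect and version support <code>ON DELETE CASCADE</code> on foreign keys); otherwise child cleanup is left to the application. Incident scenario: the team assumes "I added an FK, so it will CASCADE". In fact the FK on the non-interleaved table is only a constraint check, so child rows are never cleaned up automatically when the parent is removed, and reconciliation blows up.</p>
<p>Fixes:</p>
<ul>
<li>Classify explicitly at schema design time: interleaved (storage-level CASCADE) vs FK constraint (checks only, no cascade unless declared)</li>
<li>For non-interleaved parent-child pairs, put the delete logic in the application layer and cover it with reconciliation tests</li>
</ul>
<h3 id="加-not-null-一步到位">Adding NOT NULL in one step</h3>
<p>A direct <code>ALTER ADD COLUMN x INT64 NOT NULL</code> fails; it has to be done in two phases. Incident scenario: in the development environment the schema is a freshly created empty table, so <code>ADD COLUMN NOT NULL</code> works. The production table has data, the ADD fails, and the team concludes that Spanner does not support it and rolls back.</p>
<p>Fix:</p>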





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Phase 1: ADD with default (GoogleSQL requires parentheses around the default expression)
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">Orders</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">COLUMN</span><span class="w"> </span><span class="n">tax_amount</span><span class="w"> </span><span class="n">FLOAT64</span><span class="w"> </span><span class="k">DEFAULT</span><span class="w"> </span><span class="p">(</span><span class="mf">0.0</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="c1">-- Wait for the backfill to finish
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"></span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="c1">-- Phase 2: add NOT NULL (GoogleSQL restates the column type; the default is repeated to keep it explicit)
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">Orders</span><span class="w"> </span><span class="k">ALTER</span><span class="w"> </span><span class="k">COLUMN</span><span class="w"> </span><span class="n">tax_amount</span><span class="w"> </span><span class="n">FLOAT64</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">DEFAULT</span><span class="w"> </span><span class="p">(</span><span class="mf">0.0</span><span class="p">);</span></span></span></code></pre></div><h3 id="schema-change-期間舊-client-還在用舊-schema">Old clients still on the old schema during a schema change</h3>
<p>TrueTime guarantees that a read sees the schema version matching its own timestamp, but a client SDK whose cached schema has gone stale will retry, and if that is not handled the application sees transient errors. Incident scenario: after the DDL completes, old client sessions see a transient <code>FAILED_PRECONDITION</code>, the team concludes that the DDL failed, and rolls back.</p>
<p>Fixes:</p>
<ul>
<li>Handle transient retries in the application layer (exponential backoff)</li>
<li>Redeploy app instances after the DDL completes, to avoid a long-lived stale schema cache</li>
</ul>
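<p>The exponential-backoff fix can be sketched as a small wrapper. This is a generic illustration, not the Spanner client library's retry policy: <code>retry_with_backoff</code> is a hypothetical helper, the wrapped command stands in for whatever call hit the transient error, and which error codes are actually safe to retry has to come from the client library documentation.</p>

```shell
#!/usr/bin/env bash
# Sketch: retry a command with exponential backoff.
# Args: max attempts, initial delay in seconds, then the command and its arguments.
# Returns 0 as soon as the command succeeds, 1 once max attempts are used up.
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))      # double the wait after every failed attempt
    attempt=$(( attempt + 1 ))
  done
  return 0
}
```

<p>A call such as <code>retry_with_backoff 5 1 ./run_post_ddl_check.sh</code> (the script name is hypothetical) waits 1, 2, 4, and 8 seconds between attempts, so a briefly stale schema cache does not get reported as a failed migration.</p>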
<h2 id="容量與觀測backfill-是-cpu--io-的額外負載">Capacity and observability: backfill is extra CPU + I/O load</h2>
<p>Metrics to watch:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">spanner.googleapis.com/instance/cpu/smoothed_utilization
</span></span><span class="line"><span class="ln">2</span><span class="cl">   → CPU increase during backfill; shows whether you are hitting headroom
</span></span><span class="line"><span class="ln">3</span><span class="cl">api/api_request_count for ExecuteSql
</span></span><span class="line"><span class="ln">4</span><span class="cl">   → whether application traffic is affected by backfill
</span></span><span class="line"><span class="ln">5</span><span class="cl">long-running operation API progress
</span></span><span class="line"><span class="ln">6</span><span class="cl">   → progress of the DDL itself (not query progress)</span></span></code></pre></div><p>Capacity impact during backfill: DDL runs at background priority but still consumes CPU, so the instance needs enough headroom (suggested: only start a large backfill when baseline CPU is &lt; 65%). Capacity planning should budget a buffer for schema migration; see <a href="/blog/backend/09-performance-capacity/capacity-planning/" data-link-title="9.6 容量規劃模型" data-link-desc="peak forecast、headroom budget、growth curve、autoscaling sizing">9.6 Capacity Planning Model</a>.</p>
<p>Observability evidence: backfill start timestamp, operation id, predicted duration, actual duration, and CPU peak all go into the incident decision log; see <a href="/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">4.20 Observability Evidence Package</a>.</p>
<p>Monitoring blind spot: a failed DDL operation fails silently and is only visible through <code>gcloud spanner operations describe</code>; Cloud Monitoring has no direct alert for it. Teams need their own polling script that alerts when an operation fails, rather than relying on Cloud Monitoring defaults.</p>
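<p>A polling script of that kind can stay small. The sketch below makes two labeled assumptions: that failed operations carry an <code>error</code> field that the generic gcloud <code>--filter='error:*'</code> expression can match, and that printing to stderr is an acceptable stand-in for a real alert hook. The helper names <code>failed_ddl_operations</code> and <code>alert_on_failed_ddl</code> are hypothetical.</p>

```shell
#!/usr/bin/env bash
# Sketch: list Spanner database operations that ended with an error.
# Assumption: a failed operation exposes an "error" field matched by 'error:*'.
failed_ddl_operations() {
  local instance="$1" database="$2"
  gcloud spanner operations list \
    --instance="$instance" --database="$database" \
    --filter='error:*' \
    --format='value(name)'
}

# Returns 1 and prints the failed operation names to stderr if any exist.
# Replace the echo lines with a call to your own paging or chat hook.
alert_on_failed_ddl() {
  local failed
  failed="$(failed_ddl_operations "$1" "$2")"
  if [ -n "$failed" ]; then
    echo "ALERT: failed Spanner schema operations:" >&2
    echo "$failed" >&2
    return 1
  fi
  return 0
}
```

<p>Run from cron or a CI schedule as <code>alert_on_failed_ddl prod app</code>, the non-zero exit code is usually enough for the scheduler itself to raise the alarm.</p>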
<h2 id="邊界與整合何時不用-interleaved怎麼跟-pg-對照">Boundaries and integration: when not to interleave, and how it compares with PG</h2>
<h3 id="何時不用-interleaved">When not to interleave</h3>
<ul>
<li>Small tables (&lt; 1M rows, fits on a single machine): no need to interleave; a standard FK is enough</li>
<li>Over-interleaving up to 7 levels: splits become narrow and turn hot, which defeats the purpose</li>
<li>The access pattern is not a parent-child JOIN: interleaving brings no benefit and only adds schema complexity</li>
</ul>
<h3 id="跟-postgresql-的對照">Comparison with PostgreSQL</h3>
<p><a href="/blog/backend/01-database/vendors/postgresql/online-schema-change/" data-link-title="PostgreSQL Online Schema Change：先用 ALTER 內建特性、不能解才 pg_repack / pg-osc" data-link-desc="PostgreSQL ALTER TABLE 對多數變更已是 *fast catalog-only*（add column nullable / drop column / 改 default），不必走 ghost table tool。本文走 PG 內建 fast DDL 行為、何時必須走 pg_repack / pg-osc、兩工具機制對比（trigger-based vs WAL-shipping）、配置 step-by-step、5 production 踩雷（lock 升級 / VACUUM FULL 誤用 / pg_repack version mismatch / concurrent index 失敗清理 / generated stored column 不能 online）、跟 MySQL gh-ost / pt-osc sibling 對比">PostgreSQL Online Schema Change</a> uses the pg_repack / pg-osc workflow to simulate "zero downtime": in practice a trigger + shadow table + cutover squeezes lock time down to seconds, which is not truly instantaneous. Spanner supports DDL natively as a long-running operation and needs no add-on tool, but backfill on a large table still takes a long time, on the same order as pg_repack on a large table.</p>
<p>Differences:</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>PostgreSQL (pg_repack / pg-osc)</th>
          <th>Spanner</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Lock time</td>
          <td>Seconds (short lock at cutover)</td>
          <td>Milliseconds (metadata broadcast)</td>
      </tr>
      <tr>
          <td>Backfill time</td>
          <td>Hours</td>
          <td>Hours</td>
      </tr>
      <tr>
          <td>Tooling</td>
          <td>Add-on</td>
          <td>Native</td>
      </tr>
      <tr>
          <td>Schema version</td>
          <td>Single version</td>
          <td>Multiple versions coexist, aligned by TrueTime timestamp</td>
      </tr>
      <tr>
          <td>Adding NOT NULL on a large table</td>
          <td>One step (with a default)</td>
          <td>Must be two phases</td>
      </tr>
  </tbody>
</table>
<p>Readers choose Spanner not for "faster DDL" but for "no add-on dependency + coexisting schema versions". On large tables the elapsed time is about the same on both sides.</p>
<h3 id="sibling-deep-articles">Sibling deep articles</h3>
<ul>
<li><a href="../truetime-api-depth/">truetime-api-depth</a>: a schema version is also a TrueTime timestamp, the same mechanism layer as transaction timestamps</li>
<li><a href="../migrate-from-cloud-sql-pg/">migrate-from-cloud-sql-pg</a>: the target schema design includes interleaving, so this article is required reading for Phase 1</li>
<li><a href="../consistency-models-comparison/">consistency-models-comparison</a>: consistency guarantees while multiple schema versions coexist during a schema change</li>
</ul>
<h3 id="跟-1x-章節">Relation to the 1.x chapters</h3>
<p><a href="/blog/backend/01-database/schema-design/" data-link-title="1.2 Schema Design 與資料建模" data-link-desc="整理 table、index、key、partition、denormalization 與命名規則">Schema Design</a>: interleaving is a physical-layer decision in schema design, not purely logical design. Compare with <a href="/blog/backend/01-database/schema-migration-rollout-evidence/" data-link-title="1.7 Schema Migration Rollout 證據（Schema Migration Rollout Evidence）實作示範" data-link-desc="以訂單付款狀態欄位演進示範 schema migration 如何產出 evidence、release gate 與 incident decision log。">schema-migration-rollout-evidence</a> for the evidence-collection pattern of a schema rollout.</p>
<h3 id="anti-recommendation">Anti-recommendation</h3>
<p>After reading this article, readers should be able to judge that interleaving is not a "must use" feature but "a latency benefit when the access pattern is fixed". For small-scale OLTP or workloads whose access pattern is still uncertain, a standard PostgreSQL FK is enough, and the bar for accepting the schema-regret cost of interleaving is high.</p>
]]></content:encoded></item><item><title>MySQL Online Schema Change Lab</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/online-schema-change-lab/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/hands-on/online-schema-change-lab/</guid><description>&lt;p>The core job of this MySQL online schema change lab is to let readers see the metadata lock, algorithm, copy / cutover, and validation evidence of a schema change. It builds on &lt;a href="../../online-schema-change-tools/">Online Schema Change Tools&lt;/a> and &lt;a href="../../metadata-lock-deep-dive/">Metadata Lock Deep Dive&lt;/a>.&lt;/p>
&lt;p>The acceptance criteria for this article: you can run a low-risk ALTER, observe a metadata lock, record validation queries, and understand the cutover evidence of gh-ost / pt-osc.&lt;/p>
&lt;h2 id="direct-alter-baseline">Direct ALTER Baseline&lt;/h2>
&lt;p>The Direct ALTER baseline first shows how MySQL's native DDL behaves.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u app_user -papp_pw appdb &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">ALTER TABLE accounts ADD COLUMN email VARCHAR(255) NULL;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">SHOW CREATE TABLE accounts\G
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>記錄 ALTER duration、algorithm、lock impact 與 table size。不同 MySQL 版本與 DDL 類型會有不同行為，production 要在 staging dry run。&lt;/p>
&lt;h2 id="metadata-lock-observation">Metadata Lock Observation&lt;/h2>
&lt;p>Metadata lock observation 的核心責任是看到 blocker。&lt;/p>
&lt;p>開 Session A：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">START&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">TRANSACTION&lt;/span>&lt;span class="p">;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">accounts&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">id&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="mi">1&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>保持 transaction 開啟。Session B 執行：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">ALTER&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">TABLE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">accounts&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">ADD&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">COLUMN&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">note&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="nb">VARCHAR&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="mi">255&lt;/span>&lt;span class="p">)&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">NULL&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Session C 查：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">OBJECT_SCHEMA&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">OBJECT_NAME&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">LOCK_TYPE&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">LOCK_STATUS&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">OWNER_THREAD_ID&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">performance_schema&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">metadata_locks&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">OBJECT_SCHEMA&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;appdb&amp;#39;&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>完成觀察後，Session A &lt;code>COMMIT&lt;/code>。這段 lab 展示 long transaction 如何讓 DDL 等待。&lt;/p>
&lt;h2 id="osc-frame">OSC Frame&lt;/h2>
&lt;p>OSC frame 的核心責任是理解 gh-ost / pt-online-schema-change 的證據，而非要求每個 lab 都安裝工具。&lt;/p>
&lt;p>OSC runbook 要記錄：&lt;/p>
&lt;ol>
&lt;li>Source table、ghost table、migration statement。&lt;/li>
&lt;li>Copy progress、chunk size、throttle condition。&lt;/li>
&lt;li>Replication lag / load threshold。&lt;/li>
&lt;li>Cutover pre-check：long transaction、metadata lock、traffic。&lt;/li>
&lt;li>Cutover duration 與 validation query。&lt;/li>
&lt;li>Rollback / drop ghost table policy。&lt;/li>
&lt;/ol>
&lt;p>Cutover 前最重要的是 metadata lock pre-check。工具能降低大部分 copy 風險，但最後 rename / swap 仍需要短暫鎖。&lt;/p>
&lt;h2 id="validation">Validation&lt;/h2>
&lt;p>Validation 的核心責任是證明 schema change 後資料與 query 仍正確。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">mysql -h 127.0.0.1 -P &lt;span class="m">33069&lt;/span> -u app_user -papp_pw appdb &lt;span class="s">&amp;lt;&amp;lt;&amp;#39;SQL&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT COUNT(*) FROM accounts;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="s">SELECT COUNT(*) FROM ledger_entries;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="s">EXPLAIN SELECT * FROM accounts WHERE tenant_id = &amp;#39;tenant-a&amp;#39;;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="s">SQL&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>正式 migration 要補 row checksum、null rate、index usage、replication lag 與 application smoke test。&lt;/p></description><content:encoded><![CDATA[<p>MySQL online schema change lab 的核心責任是讓讀者看到 schema change 的 metadata lock、algorithm、copy / cutover 與 validation evidence。這篇承接 <a href="../../online-schema-change-tools/">Online Schema Change Tools</a> 與 <a href="../../metadata-lock-deep-dive/">Metadata Lock Deep Dive</a>。</p>
<p>The acceptance criteria for this lab: you can run a low-risk ALTER, observe a metadata lock, record validation queries, and understand the cutover evidence from gh-ost / pt-osc.</p>
<h2 id="direct-alter-baseline">Direct ALTER Baseline</h2>
<p>The direct ALTER baseline shows how MySQL's native DDL behaves before any tool is involved.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u app_user -papp_pw appdb <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">ALTER TABLE accounts ADD COLUMN email VARCHAR(255) NULL;
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">SHOW CREATE TABLE accounts\G
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>Record the ALTER duration, algorithm, lock impact, and table size. Behavior differs across MySQL versions and DDL types, so do a dry run in staging before touching production.</p>
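<p>One way to capture the algorithm and lock impact is to request them explicitly, so the statement fails fast instead of silently falling back to a more expensive path. The sketch below assumes MySQL 8.0 with InnoDB. The column name <code>email_verified_at</code> is hypothetical, and whether <code>ALGORITHM=INSTANT</code> is accepted depends on the server version and the DDL type.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Table size before the change (approximate; derived from InnoDB statistics).
SELECT table_rows,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'appdb' AND table_name = 'accounts';

-- Request the cheapest algorithm; MySQL returns an error if it cannot honor it.
-- email_verified_at is a hypothetical column used only for illustration.
ALTER TABLE accounts
  ADD COLUMN email_verified_at DATETIME NULL,
  ALGORITHM=INSTANT;

-- Fallback to try in staging if INSTANT is rejected:
-- in-place change that still allows concurrent reads and writes.
-- ALTER TABLE accounts ADD COLUMN email_verified_at DATETIME NULL,
--   ALGORITHM=INPLACE, LOCK=NONE;
</code></pre></div>
<p>Stating the clauses explicitly makes the recorded algorithm a fact about the run rather than a guess about what the server chose.</p>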
<h2 id="metadata-lock-observation">Metadata Lock Observation</h2>
<p>The point of the metadata lock observation is to see the blocker.</p>
<p>Open Session A:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">START</span><span class="w"> </span><span class="k">TRANSACTION</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span></span></span></code></pre></div><p>Keep the transaction open. In Session B, run:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="k">COLUMN</span><span class="w"> </span><span class="n">note</span><span class="w"> </span><span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">255</span><span class="p">)</span><span class="w"> </span><span class="k">NULL</span><span class="p">;</span></span></span></code></pre></div><p>In Session C, query:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">OBJECT_SCHEMA</span><span class="p">,</span><span class="w"> </span><span class="n">OBJECT_NAME</span><span class="p">,</span><span class="w"> </span><span class="n">LOCK_TYPE</span><span class="p">,</span><span class="w"> </span><span class="n">LOCK_STATUS</span><span class="p">,</span><span class="w"> </span><span class="n">OWNER_THREAD_ID</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">performance_schema</span><span class="p">.</span><span class="n">metadata_locks</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">OBJECT_SCHEMA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;appdb&#39;</span><span class="p">;</span></span></span></code></pre></div><p>Once you have finished observing, run <code>COMMIT</code> in Session A. This part of the lab shows how a long transaction makes DDL wait.</p>
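<p>While Session A is still open, the raw <code>metadata_locks</code> rows should show a <code>GRANTED</code> lock owned by Session A and a <code>PENDING</code> lock owned by the ALTER in Session B. To pair the waiter with its blocker directly, a minimal sketch using the <code>sys</code> schema follows. It assumes the metadata lock instrument is enabled (the default in MySQL 8.0) and that the querying account can read <code>sys</code> and <code>performance_schema</code>, which the lab's <code>app_user</code> may not be allowed to do.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- One row per (waiting session, blocking session) pair on a table metadata lock.
SELECT object_schema,
       object_name,
       waiting_pid,                   -- processlist id of the blocked DDL
       waiting_query,
       blocking_pid,                  -- processlist id of the session holding the lock
       sql_kill_blocking_connection   -- ready-made KILL statement; review before running
FROM sys.schema_table_lock_waits
WHERE object_schema = 'appdb';
</code></pre></div>
<p>In this lab, committing in Session A is the right way to release the lock. The generated <code>KILL</code> statement is only worth recording as evidence of which session would have to go in a real incident.</p>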
<h2 id="osc-frame">OSC Frame</h2>
<p>The OSC frame is about understanding the evidence that gh-ost / pt-online-schema-change produce, not about requiring every lab to install the tools.</p>
<p>An OSC runbook should record:</p>
<ol>
<li>Source table, ghost table, and migration statement.</li>
<li>Copy progress, chunk size, and throttle condition.</li>
<li>Replication lag / load threshold.</li>
<li>Cutover pre-check: long transactions, metadata locks, and traffic.</li>
<li>Cutover duration and validation queries.</li>
<li>Rollback / drop ghost table policy.</li>
</ol>
<p>The most important step before cutover is the metadata lock pre-check. The tools remove most of the risk from the copy phase, but the final rename / swap still needs a brief lock.</p>
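<p>The long-transaction part of that pre-check can be sketched as a query against <code>information_schema.innodb_trx</code>. The 30-second threshold is an assumption chosen for illustration, not a recommendation from the tools. The query only sees transactions that InnoDB itself is tracking.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Transactions open for longer than the (assumed) 30-second threshold.
-- Any of these that touched the target table can hold a metadata lock
-- and stall the final rename / swap.
SELECT trx_mysql_thread_id AS processlist_id,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds,
       trx_query                      -- NULL when the session is idle inside the transaction
FROM information_schema.innodb_trx
WHERE trx_started &lt; NOW() - INTERVAL 30 SECOND
ORDER BY trx_started;
</code></pre></div>
<p>An empty result is the evidence to record in the runbook. A non-empty one is a reason to delay cutover or to chase down the owning session first.</p>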
<h2 id="validation">Validation</h2>
<p>Validation exists to prove that data and queries are still correct after the schema change.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mysql -h 127.0.0.1 -P <span class="m">33069</span> -u app_user -papp_pw appdb <span class="s">&lt;&lt;&#39;SQL&#39;
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="s">SELECT COUNT(*) FROM accounts;
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="s">SELECT COUNT(*) FROM ledger_entries;
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="s">EXPLAIN SELECT * FROM accounts WHERE tenant_id = &#39;tenant-a&#39;;
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="s">SQL</span></span></span></code></pre></div><p>A production migration should additionally check row checksums, null rates, index usage, replication lag, and an application smoke test.</p>
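<p>Two of those extra checks, null rate and checksum, can be sketched in plain SQL. The example uses the <code>email</code> column added in the direct ALTER step. Treat <code>CHECKSUM TABLE</code> as a coarse signal only: it is meaningful when comparing copies that share the same schema (for example a primary and a replica at a quiet moment), not for comparing a table before and after a column was added.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Null rate of the new column. Right after ADD COLUMN ... NULL every existing
-- row is NULL, so this number is the baseline for tracking a later backfill.
SELECT COUNT(*)                       AS total_rows,
       SUM(email IS NULL)             AS null_rows,
       SUM(email IS NULL) / COUNT(*)  AS null_rate   -- NULL if the table is empty
FROM accounts;

-- Table-level checksum; reads every row, so run it on small tables or off-peak.
CHECKSUM TABLE accounts, ledger_entries;
</code></pre></div>
<p>Recording these numbers next to the row counts gives the release gate something concrete to compare against on the next run.</p>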
<h2 id="release-gate">Release Gate</h2>
<p>The release gate turns the evidence into a deliverable artifact.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Migration:
</span></span><span class="line"><span class="ln">2</span><span class="cl">DDL / OSC command:
</span></span><span class="line"><span class="ln">3</span><span class="cl">Table size:
</span></span><span class="line"><span class="ln">4</span><span class="cl">MDL pre-check:
</span></span><span class="line"><span class="ln">5</span><span class="cl">Duration:
</span></span><span class="line"><span class="ln">6</span><span class="cl">Validation:
</span></span><span class="line"><span class="ln">7</span><span class="cl">Rollback:
</span></span><span class="line"><span class="ln">8</span><span class="cl">Owner:</span></span></code></pre></div><p>After this lab, read <a href="../../metadata-lock-deep-dive/">Metadata Lock Deep Dive</a> for MDL incidents and <a href="../../online-schema-change-tools/">Online Schema Change Tools</a> for tool selection.</p>
]]></content:encoded></item></channel></rss>