<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Gtid on Tarragon</title><link>https://tarrragon.github.io/blog/tags/gtid/</link><description>Recent content in Gtid on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/gtid/index.xml" rel="self" type="application/rss+xml"/><item><title>MySQL Replication Topology: async / semi-sync / GTID is not a pick-one-of-three but a stack of three trade-off axes</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/replication-topology/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/replication-topology/</guid><description>&lt;blockquote>
&lt;p>This article is the implementation-layer deep dive under the &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL&lt;/a> overview. The overview already covers where MySQL sits in the OLTP landscape. This article focuses on &lt;em>replication topology&lt;/em>: the 3 trade-off axes and the 5-step configuration for going from a single primary to a multi-replica deployment.&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="replication-的-3-個-trade-off-軸--mode-選擇">Replication 的 3 個 trade-off 軸 + mode 選擇&lt;/h2>
&lt;p>Replication mode 選擇看起來是「選 async 還是 semi-sync」、但決策實際是 3 個獨立 trade-off 軸的權衡、async / semi-sync 是這些軸的兩個常見組合 &lt;em>名稱&lt;/em>：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Axis&lt;/th>
 &lt;th>End A&lt;/th>
 &lt;th>End B&lt;/th>
 &lt;th>MySQL knob&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;strong>Durability&lt;/strong>&lt;/td>
 &lt;td>primary commits as soon as its own write is done&lt;/td>
 &lt;td>commit only after at least one standby has received the transaction&lt;/td>
 &lt;td>&lt;code>rpl_semi_sync_master_enabled&lt;/code> / &lt;code>rpl_semi_sync_master_wait_for_slave_count&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>Latency&lt;/strong>&lt;/td>
 &lt;td>client waits only for the primary's write&lt;/td>
 &lt;td>client also waits for a standby ack (one extra RTT)&lt;/td>
 &lt;td>&lt;code>rpl_semi_sync_master_timeout&lt;/code> (caps the wait)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>Consistency&lt;/strong>&lt;/td>
 &lt;td>a replica may be stale at any moment&lt;/td>
 &lt;td>reads from a replica are guaranteed to match the primary&lt;/td>
 &lt;td>application read-routing rules (not a replication knob)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>"Async vs semi-sync" is really a choice along the &lt;em>durability + latency axes&lt;/em>. It leaves the &lt;em>consistency axis&lt;/em> untouched, because consistency is decided in the read-routing layer. Group Replication / MySQL NDB Cluster (consensus-based or synchronous clustering) move all three axes at once. That is a different story and out of scope here.&lt;/p>
&lt;p>Independent of these three axes is the &lt;em>maintainability of the replication mechanism itself&lt;/em>. Position-based binlog replication tracks replica progress as &lt;code>(file, position)&lt;/code>, and aligning positions during failover is error-prone. &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/gtid/" data-link-title="GTID" data-link-desc="說明全域交易識別碼如何讓複製進度與故障切換不依賴實體 log 位置">&lt;strong>GTID (Global Transaction Identifier)&lt;/strong>&lt;/a> tracks progress with a global transaction ID instead, so failover and re-pointing do not require computing positions. GTID is &lt;em>infrastructure that spans modes&lt;/em>, not a third mode.&lt;/p>
&lt;h2 id="async-replicationdefault--高-throughput-的代價">Async replication: the default, and the cost of high throughput&lt;/h2>
&lt;p>Async is MySQL's default. How it works:&lt;/p>
&lt;ol>
&lt;li>The primary writes the binlog, commits immediately, and returns OK to the client&lt;/li>
&lt;li>The replica's IO thread pulls binlog events from the primary into its local relay log&lt;/li>
&lt;li>The replica's SQL thread applies the relay log (single-threaded or multi-threaded parallel apply)&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Trade-off&lt;/strong>:&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is the implementation-layer deep dive under the <a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL</a> overview. The overview already covers where MySQL sits in the OLTP landscape. This article focuses on <em>replication topology</em>: the 3 trade-off axes and the 5-step configuration for going from a single primary to a multi-replica deployment.</p></blockquote>
<hr>
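<h2 id="replication-的-3-個-trade-off-軸--mode-選擇">The 3 trade-off axes of replication + mode selection</h2>
<p>Choosing a replication mode looks like a question of "async or semi-sync?". The decision is really a balance across 3 independent trade-off axes, and async and semi-sync are just the <em>names</em> of two common combinations along those axes:</p>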
<table>
  <thead>
      <tr>
          <th>Axis</th>
          <th>End A</th>
          <th>End B</th>
          <th>MySQL knob</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Durability</strong></td>
          <td>primary commits as soon as its own write is done</td>
          <td>commit only after at least one standby has received the transaction</td>
          <td><code>rpl_semi_sync_master_enabled</code> / <code>rpl_semi_sync_master_wait_for_slave_count</code></td>
      </tr>
      <tr>
          <td><strong>Latency</strong></td>
          <td>client waits only for the primary's write</td>
          <td>client also waits for a standby ack (one extra RTT)</td>
          <td><code>rpl_semi_sync_master_timeout</code> (caps the wait)</td>
      </tr>
      <tr>
          <td><strong>Consistency</strong></td>
          <td>a replica may be stale at any moment</td>
          <td>reads from a replica are guaranteed to match the primary</td>
          <td>application read-routing rules (not a replication knob)</td>
      </tr>
  </tbody>
</table>
<p>"Async vs semi-sync" is really a choice along the <em>durability + latency axes</em>. It leaves the <em>consistency axis</em> untouched, because consistency is decided in the read-routing layer. Group Replication / MySQL NDB Cluster (consensus-based or synchronous clustering) move all three axes at once. That is a different story and out of scope here.</p>
<p>Independent of these three axes is the <em>maintainability of the replication mechanism itself</em>. Position-based binlog replication tracks replica progress as <code>(file, position)</code>, and aligning positions during failover is error-prone. <a href="/blog/backend/knowledge-cards/gtid/" data-link-title="GTID" data-link-desc="說明全域交易識別碼如何讓複製進度與故障切換不依賴實體 log 位置"><strong>GTID (Global Transaction Identifier)</strong></a> tracks progress with a global transaction ID instead, so failover and re-pointing do not require computing positions. GTID is <em>infrastructure that spans modes</em>, not a third mode.</p>
<h2 id="async-replicationdefault--高-throughput-的代價">Async replication: the default, and the cost of high throughput</h2>
<p>Async is MySQL's default. How it works:</p>
<ol>
<li>The primary writes the binlog, commits immediately, and returns OK to the client</li>
<li>The replica's IO thread pulls binlog events from the primary into its local relay log</li>
<li>The replica's SQL thread applies the relay log (single-threaded or multi-threaded parallel apply)</li>
</ol>
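<p>The three steps can be sketched as a toy in-memory model. The class and method names below are hypothetical, not MySQL internals. The only point is to show which transactions vanish if the primary is lost: those committed on the primary that the IO thread has not pulled yet.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">from collections import deque


class AsyncReplicationToy:
    """Hypothetical model of async replication: binlog, IO thread, relay log, SQL thread."""

    def __init__(self) -> None:
        self.binlog: list[str] = []           # transactions committed on the primary
        self.pulled = 0                       # how many binlog events the IO thread has copied
        self.relay_log: deque[str] = deque()  # copied to the replica but not yet applied
        self.applied: list[str] = []          # applied by the replica's SQL thread

    def commit(self, event: str) -> str:
        # Step 1: write the binlog, commit, reply OK without waiting for any replica
        self.binlog.append(event)
        return "OK"

    def io_thread_pull(self) -> None:
        # Step 2: copy every binlog event not yet pulled into the relay log
        self.relay_log.extend(self.binlog[self.pulled:])
        self.pulled = len(self.binlog)

    def sql_thread_apply(self) -> None:
        # Step 3: apply relay-log events in order (single-threaded in this toy)
        while self.relay_log:
            self.applied.append(self.relay_log.popleft())

    def lost_if_primary_dies(self) -> list[str]:
        # Committed and acknowledged on the primary, but never pulled by the replica
        return self.binlog[self.pulled:]
</code></pre></div>
<p>For example, the sequence <code>commit("t1")</code>, <code>io_thread_pull()</code>, <code>commit("t2")</code> leaves <code>["t2"]</code> in <code>lost_if_primary_dies()</code>. Once the IO thread has copied an event, it survives a primary crash in the relay log, even if the SQL thread has not applied it yet.</p>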
<p><strong>Trade-off</strong>:</p>
<ul>
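<li>Durability: the primary commits, then crashes and is permanently lost before any replica has pulled the transaction. The result is <em>data loss</em>: a committed transaction that does not exist on any replica</li>
<li>Latency: the client does not wait for replicas, so write latency is the primary's own commit cost (flushing the binlog and redo log). With <code>innodb_flush_log_at_trx_commit=1</code> and <code>sync_binlog=1</code> this is often around 1ms or less on low-latency SSD storage</li>
<li>Consistency: replicas can lag, so reads from a replica can be stale. Check <code>Seconds_Behind_Master</code> in <code>SHOW SLAVE STATUS</code> (<code>Seconds_Behind_Source</code> in <code>SHOW REPLICA STATUS</code> on 8.0.22+)</li>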
</ul>
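<p>The durability bullet turns into a back-of-the-envelope estimate: transactions at risk ≈ write rate × how far the replica's IO thread trails the primary. The function name and the example numbers below are illustrative assumptions, not measurements. Events already in the relay log survive a primary crash, so the lag that matters is the IO thread's, not the SQL thread's.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def transactions_at_risk(writes_per_second: float, unreceived_lag_seconds: float) -> int:
    """Rough count of committed-but-unreplicated transactions lost if the primary dies.

    unreceived_lag_seconds: how far the replica's IO thread (not its SQL thread)
    trails the primary; events already copied to the relay log are not counted.
    """
    if writes_per_second < 0 or unreceived_lag_seconds < 0:
        raise ValueError("inputs must be non-negative")
    return round(writes_per_second * unreceived_lag_seconds)


# Illustrative: 2,000 writes/s with the IO thread 0.5 s behind
print(transactions_at_risk(2000, 0.5))  # 1000
</code></pre></div>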
<p><strong>When it fits</strong>:</p>
<ul>
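<li>The mainstream choice for most deployments</li>
<li>Losing the most recent writes on failover is acceptable (many web applications can tolerate about 1-2 seconds of lost writes)</li>
<li>Read scaling is the main driver, and absolute durability is not the top priority</li>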
</ul>
<p><strong>When it does not fit</strong>:</p>
<ul>
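<li>Financial transactions / order systems that cannot tolerate any data loss</li>
<li>Compliance or contractual requirements for zero data loss (RPO = 0), as in some regulated environments</li>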
</ul>
<h2 id="semi-sync-replication至少一個-standby-ack-才-commit">Semi-sync replication: commit only after at least one standby acks</h2>
<p>Semi-sync adds one step on top of async: <em>the primary waits for acks from at least N replicas before committing</em>:</p>
<ol>
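<li>The primary writes the binlog</li>
<li>The primary's dump threads send the binlog events to the connected replicas</li>
<li><em>The primary waits for acks from at least N replicas</em> (N is <code>rpl_semi_sync_master_wait_for_slave_count</code>, default 1). A replica acks after writing the events to its relay log and flushing it to disk</li>
<li>The primary commits and responds to the client</li>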
</ol>
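<p>The wait-for-N-acks step, including the timeout fallback covered in the pitfalls below, can be modeled as one function. This is a hypothetical sketch of the decision logic only. The real plugin works on binlog events and network acks, not on a list of ack delays. The parameter names only mirror the MySQL variables they stand in for.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def semi_sync_commit(ack_delays_ms: list[float], wait_for_count: int, timeout_ms: float) -> tuple[str, float]:
    """Decide how one commit completes under semi-sync (simplified, hypothetical model).

    ack_delays_ms: per-replica ack latency; float("inf") means the replica never acks.
    wait_for_count: plays the role of rpl_semi_sync_master_wait_for_slave_count.
    timeout_ms: plays the role of rpl_semi_sync_master_timeout.
    Returns (mode, how long the commit waited in ms).
    """
    if wait_for_count < 1:
        raise ValueError("wait_for_count must be at least 1")
    in_time = sorted(d for d in ack_delays_ms if d <= timeout_ms)
    if len(in_time) >= wait_for_count:
        # The commit proceeds as soon as the N-th fastest ack arrives
        return "semi-sync", in_time[wait_for_count - 1]
    # Too few acks before the timeout: commit anyway and fall back to async
    return "async-fallback", timeout_ms
</code></pre></div>
<p>This models a single commit. In the real plugin, once the primary has fallen back, later commits stop waiting until enough replicas have caught up again.</p>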
<p><strong>Trade-off</strong>:</p>
<ul>
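<li>Durability: at least N replicas have received the binlog (not necessarily applied it), so after a primary crash a replica still holds the events and can be promoted. While semi-sync stays active, acknowledged transactions are not lost, but only at the <em>binlog level</em>, not the <em>applied level</em></li>
<li>Latency: the client waits for the primary plus one replica-ack round trip. That is typically +1-3ms across AZs and possibly +50-200ms across regions</li>
<li>Consistency: same as async. Replicas still apply asynchronously, so reads from a replica can still be stale</li>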
</ul>
<p><strong>MySQL 5.7+ distinguishes <em>standard</em> and <em>Loss-Less</em> semi-sync</strong>:</p>
<ul>
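<li>Standard semi-sync (<code>AFTER_COMMIT</code>, the only behavior in 5.5-5.6): the primary commits to the storage engine first and then waits for the ack. Other sessions can therefore read a transaction that no replica has yet, and if the primary crashes at that moment, data other clients have already seen can be lost. On ack timeout it falls back to async, so it <em>can still lose data</em></li>
<li>Loss-Less semi-sync (5.7+, <code>rpl_semi_sync_master_wait_point=AFTER_SYNC</code>, the default): the primary writes and syncs the binlog, then <em>waits for the ack before committing</em> to the storage engine, so no other session sees the transaction before a replica has it. Once the ack times out and the primary falls back to async, this guarantee no longer holds</li>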
</ul>
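<p>The difference between the two wait points is the order of "commit in the storage engine" and "wait for the ack". The toy below is hypothetical and simplified, not the server's code. It records the step order and asks one question: could another session read the change before any replica had acked it?</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def commit_steps(wait_point: str) -> list[str]:
    """Simplified order of steps for one transaction under a semi-sync wait point."""
    if wait_point == "AFTER_SYNC":
        return ["write+sync binlog", "wait for replica ack", "engine commit (visible)", "reply to client"]
    if wait_point == "AFTER_COMMIT":
        return ["write+sync binlog", "engine commit (visible)", "wait for replica ack", "reply to client"]
    raise ValueError(f"unknown wait point: {wait_point}")


def visible_before_ack(wait_point: str) -> bool:
    """True if other sessions can read the change before any replica has acked it."""
    steps = commit_steps(wait_point)
    return steps.index("engine commit (visible)") < steps.index("wait for replica ack")
</code></pre></div>
<p><code>visible_before_ack("AFTER_COMMIT")</code> is true and <code>visible_before_ack("AFTER_SYNC")</code> is false. That gap is exactly the window in which standard semi-sync can lose data that clients have already read.</p>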
<p>In production, use Loss-Less semi-sync, not standard.</p>
<p><strong>When it fits</strong>:</p>
<ul>
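<li>Financial transactions / orders / payment ledgers</li>
<li>Data loss is not acceptable, but an extra 1-3ms of write latency is</li>
<li>A multi-AZ / multi-region deployment already exists, and the replicas are physically independent and reliable</li>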
</ul>
<p><strong>When it does not fit</strong>:</p>
<ul>
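<li>Cross-region semi-sync (RTT 50-200ms) is usually not worth it. Every commit waits a cross-region round trip, which cuts per-session write throughput sharply. Use <em>region-local semi-sync replicas + a cross-region async chain</em> instead</li>
<li>Write throughput above roughly 50K writes/s where sub-second loss is tolerable: async is enough</li>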
</ul>
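<p>The point about per-session throughput is simple arithmetic. If each commit waits <code>local_commit_ms + ack_rtt_ms</code>, one session issuing commits back to back can do at most <code>1000 / (local + rtt)</code> commits per second. This is a minimal sketch under that assumption. It ignores group commit and parallel sessions, which is why aggregate throughput drops less than per-session throughput. The example latencies are the ranges quoted above, not measurements.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def max_serial_commits_per_second(local_commit_ms: float, ack_rtt_ms: float = 0.0) -> float:
    """Upper bound for ONE session issuing commits back to back (no group commit)."""
    per_commit_ms = local_commit_ms + ack_rtt_ms
    if per_commit_ms <= 0:
        raise ValueError("commit latency must be positive")
    return 1000.0 / per_commit_ms


for label, rtt in [("async", 0.0), ("semi-sync, cross-AZ", 2.0), ("semi-sync, cross-region", 100.0)]:
    print(f"{label:25s} {max_serial_commits_per_second(1.0, rtt):8.1f} commits/s")
</code></pre></div>
<p>With a 1ms local commit, a single session drops from about 1000 commits/s (async) to about 333 (2ms cross-AZ ack) and about 10 (100ms cross-region ack).</p>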
<h2 id="gtid-based-replication機制升級跨-mode-都需要">GTID-based replication: a mechanism upgrade that every mode needs</h2>
<p>GTID labels every transaction with a global ID: <code>&lt;server_uuid&gt;:&lt;transaction_id&gt;</code>. The replica records the set of GTIDs it has applied (<code>gtid_executed</code>) instead of a <code>(binlog_file, position)</code> pair.</p>
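<p>A GTID set such as the value of <code>gtid_executed</code> is written as <code>uuid:interval[:interval...]</code>, with several UUIDs separated by commas. The parser below is a minimal sketch with hypothetical helper names, for illustration only. It handles only the plain <code>uuid:intervals</code> form and merges adjacent intervals.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def _merge(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Sort and merge overlapping or adjacent intervals, e.g. (1, 5) + (6, 6) -> (1, 6)."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def parse_gtid_set(text: str) -> dict[str, list[tuple[int, int]]]:
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: [(start, end), ...]} (untagged GTIDs only)."""
    raw: dict[str, list[tuple[int, int]]] = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = raw.setdefault(uuid.lower(), [])
        for r in ranges:
            start, _, end = r.partition("-")
            intervals.append((int(start), int(end or start)))
    return {uuid: _merge(intervals) for uuid, intervals in raw.items()}


def count_transactions(gtid_set: dict[str, list[tuple[int, int]]]) -> int:
    """Number of individual transactions in a parsed GTID set."""
    return sum(end - start + 1 for intervals in gtid_set.values() for start, end in intervals)
</code></pre></div>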
<p><strong>Why GTID is better than binlog positions</strong>:</p>
<ul>
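<li><strong>Simpler failover re-pointing</strong>: after a new primary is promoted, the other replicas re-attach without computing <code>MASTER_LOG_FILE</code> + <code>MASTER_LOG_POS</code>. <code>CHANGE MASTER TO MASTER_AUTO_POSITION=1</code> is enough (<code>CHANGE REPLICATION SOURCE TO SOURCE_AUTO_POSITION=1</code> on 8.0.23+)</li>
<li><strong>Easier multi-source replication</strong>: one replica pulls from several primaries, and each primary's transactions are tracked independently under its own server_uuid</li>
<li><strong>Easy consistency checks</strong>: comparing the GTID sets of two servers shows who is behind and whether there are gaps</li>
<li><strong>Required by Group Replication</strong>: GTID is a prerequisite for Group Replication, in both single-primary and multi-primary mode</li>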
</ul>
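<p>MySQL can run the consistency check server-side with its <code>GTID_SUBSET()</code> and <code>GTID_SUBTRACT()</code> functions. The Python sketch below only mirrors that set arithmetic to make the idea concrete. The function names are hypothetical, and expanding a set into individual transactions is only sensible for small examples.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def expand(gtid_set: str) -> set[tuple[str, int]]:
    """Expand 'uuid:1-3:5,uuid2:7' into {(uuid, 1), (uuid, 2), ...}; fine for small examples."""
    out: set[tuple[str, int]] = set()
    for part in filter(None, (p.strip() for p in gtid_set.split(","))):
        uuid, *ranges = part.split(":")
        for r in ranges:
            lo, _, hi = r.partition("-")
            out.update((uuid.lower(), n) for n in range(int(lo), int(hi or lo) + 1))
    return out


def compare(primary: str, replica: str) -> dict[str, set[tuple[str, int]]]:
    """Like GTID_SUBTRACT in both directions: what the replica lacks, and what only it has."""
    p, r = expand(primary), expand(replica)
    return {"missing_on_replica": p - r, "errant_on_replica": r - p}
</code></pre></div>
<p>A non-empty <code>missing_on_replica</code> means the replica is behind. A non-empty <code>errant_on_replica</code> means it executed transactions the primary never had, which must be resolved before it can be promoted or re-pointed safely.</p>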
<p><strong>Setup</strong> (phased; it cannot be switched on in one step):</p>
<ol>
<li>
<p><strong>Phase 1 (preparation, every server in the same mode)</strong>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="na">enforce_gtid_consistency</span> <span class="o">=</span> <span class="s">ON</span>  <span class="c1"># reject statements GTID cannot represent (e.g. CREATE TABLE ... SELECT before 8.0.21)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="na">gtid_mode</span> <span class="o">=</span> <span class="s">ON_PERMISSIVE</span>  <span class="c1"># new transactions get GTIDs; replicated non-GTID (anonymous) transactions are still accepted</span></span></span></code></pre></div></li>
<li>
<p><strong>Phase 2 (rolling, after every server has finished Phase 1)</strong>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln">1</span><span class="cl"><span class="na">gtid_mode</span> <span class="o">=</span> <span class="s">ON</span>  <span class="c1"># accept GTID transactions only</span></span></span></code></pre></div></li>
</ol>
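<p>The rule that <code>gtid_mode</code> moves one step at a time can be written as a small planning helper. <code>rollout_plan</code> is a hypothetical function that only mirrors the documented value order <code>OFF</code> → <code>OFF_PERMISSIVE</code> → <code>ON_PERMISSIVE</code> → <code>ON</code>. The server enforces the rule itself, and this sketch just lists the statements a rollout script would run on every server, step by step.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">GTID_MODES = ["OFF", "OFF_PERMISSIVE", "ON_PERMISSIVE", "ON"]


def rollout_plan(current: str, target: str) -> list[str]:
    """Return the SET @@GLOBAL.gtid_mode statements needed, one step at a time."""
    i, j = GTID_MODES.index(current), GTID_MODES.index(target)
    step = 1 if j >= i else -1
    # Walk from the value after `current` up to and including `target`
    return [f"SET @@GLOBAL.gtid_mode = {GTID_MODES[k]};" for k in range(i + step, j + step, step)]
</code></pre></div>
<p>Between the <code>ON_PERMISSIVE</code> and <code>ON</code> statements, a real rollout waits until <code>Ongoing_anonymous_transaction_count</code> reads 0 on every server.</p>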
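<p>Jumping straight to <code>gtid_mode=ON</code> breaks replication, because existing non-GTID transactions cannot be handled. An online change only lets <code>gtid_mode</code> move one step at a time (<code>OFF</code> → <code>OFF_PERMISSIVE</code> → <code>ON_PERMISSIVE</code> → <code>ON</code>), applied with <code>SET @@GLOBAL</code> on every server before the next step. Running with <code>enforce_gtid_consistency=WARN</code> first and checking the error log shows which statements would be rejected. Before Phase 2, <code>Ongoing_anonymous_transaction_count</code> should read 0 on every server. In production, schedule a maintenance window for enabling GTID, and observe Phase 1 for 1-2 days before moving on to Phase 2.</p>
<h2 id="配置-step-by-steploss-less-semi-sync--gtid-組合">Step-by-step configuration (Loss-Less semi-sync + GTID)</h2>
<p>The most common combination in practice is Loss-Less semi-sync + GTID. Configure it in this order:</p>
<h3 id="step-1primary--replica-都開-gtid兩-phase-跑完">Step 1: enable GTID on the primary and the replicas (complete both phases)</h3>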





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># my.cnf on the primary AND every replica</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="k">[mysqld]</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="na">server_id</span> <span class="o">=</span> <span class="s">1</span>                       <span class="c1"># must be unique per server (e.g. 1 on the primary, 2, 3, ... on replicas)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="na">gtid_mode</span> <span class="o">=</span> <span class="s">ON</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="na">enforce_gtid_consistency</span> <span class="o">=</span> <span class="s">ON</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="na">log_bin</span> <span class="o">=</span> <span class="s">mysql-bin</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="na">log_slave_updates</span> <span class="o">=</span> <span class="s">1</span>               <span class="c1"># replicas also write a binlog (needed for chaining and promotion; log_replica_updates on 8.0.26+)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="na">binlog_format</span> <span class="o">=</span> <span class="s">ROW</span>                 <span class="c1"># ROW is safer than STATEMENT</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="na">sync_binlog</span> <span class="o">=</span> <span class="s">1</span>                     <span class="c1"># fsync the binlog on every commit</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="na">innodb_flush_log_at_trx_commit</span> <span class="o">=</span> <span class="s">1</span>  <span class="c1"># fsync the InnoDB redo log on every commit</span></span></span></code></pre></div><h3 id="step-2primary-安裝-semi-sync-plugin">Step 2: install the semi-sync plugin on the primary</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 8.0.26+ also ships rpl_semi_sync_source (&#39;semisync_source.so&#39;) with rpl_semi_sync_source_* variables</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">INSTALL</span> <span class="n">PLUGIN</span> <span class="n">rpl_semi_sync_master</span> <span class="n">SONAME</span> <span class="s1">&#39;semisync_master.so&#39;</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="k">SET</span> <span class="k">GLOBAL</span> <span class="n">rpl_semi_sync_master_enabled</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="k">SET</span> <span class="k">GLOBAL</span> <span class="n">rpl_semi_sync_master_wait_for_slave_count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>  <span class="c1">-- at least 1 ack</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="k">SET</span> <span class="k">GLOBAL</span> <span class="n">rpl_semi_sync_master_wait_point</span> <span class="o">=</span> <span class="n">AFTER_SYNC</span><span class="p">;</span>   <span class="c1">-- Loss-Less</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="k">SET</span> <span class="k">GLOBAL</span> <span class="n">rpl_semi_sync_master_timeout</span> <span class="o">=</span> <span class="mi">10000</span><span class="p">;</span>           <span class="c1">-- 10s timeout; on expiry, fall back to async</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1">-- SET GLOBAL does not survive a restart: also add these settings to my.cnf (or use SET PERSIST on 8.0)</span></span></span></code></pre></div><h3 id="step-3replica-安裝-semi-sync-plugin">Step 3: install the semi-sync plugin on the replica</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="n">INSTALL</span><span class="w"> </span><span class="n">PLUGIN</span><span class="w"> </span><span class="n">rpl_semi_sync_slave</span><span class="w"> </span><span class="n">SONAME</span><span class="w"> </span><span class="s1">&#39;semisync_slave.so&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SET</span><span class="w"> </span><span class="k">GLOBAL</span><span class="w"> </span><span class="n">rpl_semi_sync_slave_enabled</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="n">STOP</span><span class="w"> </span><span class="n">SLAVE</span><span class="w"> </span><span class="n">IO_THREAD</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">START</span><span class="w"> </span><span class="n">SLAVE</span><span class="w"> </span><span class="n">IO_THREAD</span><span class="p">;</span><span class="w">  </span><span class="c1">-- restart the IO thread so semi-sync takes effect</span></span></span></code></pre></div><h3 id="step-4replica-attach-primary">Step 4: attach the replica to the primary</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="n">CHANGE</span><span class="w"> </span><span class="n">MASTER</span><span class="w"> </span><span class="k">TO</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">  </span><span class="n">MASTER_HOST</span><span class="o">=</span><span class="s1">&#39;primary.example.com&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">  </span><span class="n">MASTER_PORT</span><span class="o">=</span><span class="mi">3306</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">  </span><span class="n">MASTER_USER</span><span class="o">=</span><span class="s1">&#39;repl&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">  </span><span class="n">MASTER_PASSWORD</span><span class="o">=</span><span class="s1">&#39;...&#39;</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">  </span><span class="n">MASTER_AUTO_POSITION</span><span class="o">=</span><span class="mi">1</span><span class="p">;</span><span class="w">  </span><span class="c1">-- use GTID auto-positioning
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"></span><span class="k">START</span><span class="w"> </span><span class="n">SLAVE</span><span class="p">;</span></span></span></code></pre></div><h3 id="step-5驗證">Step 5: verify</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">-- Primary: confirm semi-sync is enabled and has active semi-sync replicas</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="k">SHOW</span> <span class="n">STATUS</span> <span class="k">LIKE</span> <span class="s1">&#39;Rpl_semi_sync_master_status&#39;</span><span class="p">;</span>      <span class="c1">-- ON</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">SHOW</span> <span class="n">STATUS</span> <span class="k">LIKE</span> <span class="s1">&#39;Rpl_semi_sync_master_clients&#39;</span><span class="p">;</span>     <span class="c1">-- ≥ 1</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="k">SHOW</span> <span class="n">STATUS</span> <span class="k">LIKE</span> <span class="s1">&#39;Rpl_semi_sync_master_yes_tx&#39;</span><span class="p">;</span>      <span class="c1">-- &gt; 0 (transactions are going through semi-sync)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">SHOW</span> <span class="n">STATUS</span> <span class="k">LIKE</span> <span class="s1">&#39;Rpl_semi_sync_master_no_tx&#39;</span><span class="p">;</span>       <span class="c1">-- should stay at 0 (cumulative: any increase means commits fell back to async)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1">-- Replica: confirm GTID and the IO / SQL threads are healthy</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">SHOW</span> <span class="n">SLAVE</span> <span class="n">STATUS</span><span class="err">\</span><span class="k">G</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1">-- Slave_IO_Running: Yes</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1">-- Slave_SQL_Running: Yes</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1">-- Retrieved_Gtid_Set / Executed_Gtid_Set: close to the primary&#39;s Executed_Gtid_Set</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1">-- Seconds_Behind_Master: watch the lag</span></span></span></code></pre></div><h2 id="5-個-production-踩雷">5 production pitfalls</h2>
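<h3 id="1-replication-lag-暴衝--單-sql-thread-bottleneck">1. Replication lag spikes: the single SQL thread bottleneck</h3>
<p>Before MySQL 8.0.27 a replica applies events with a <em>single</em> SQL thread by default (8.0.27 changed the default to 4 parallel workers). When the primary writes from many threads, the replica cannot keep up, and lag jumps from &lt; 100ms to minutes. Common triggers: batch UPDATE / DELETE, large transactions, index rebuilds.</p>
<p>Fixes:</p>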
<ul>
<li>Enable <em>multi-threaded replication</em>: <code>slave_parallel_workers = 8</code>, parallelized per database or by logical clock as selected with <code>slave_parallel_type</code></li>
<li>5.7+: <code>slave_parallel_type = LOGICAL_CLOCK</code> parallelizes automatically based on the group-commit concurrency on the primary</li>
<li>8.0+: <em>writeset-based parallelism</em> with <code>binlog_transaction_dependency_tracking = WRITESET</code> on the primary, for finer-grained parallel apply</li>
</ul>
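<p>The idea behind writeset-based parallelism is that two transactions may be applied in parallel if the sets of rows they modify do not overlap. The greedy batching below is a hypothetical toy that illustrates only that idea. It is not the server's algorithm: MySQL records dependencies in the binlog as <code>last_committed</code> / <code>sequence_number</code> and schedules workers from those. The row keys in the usage note are made up.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def parallel_batches(transactions: list[tuple[str, set[str]]]) -> list[list[str]]:
    """Group transactions (given in commit order) into batches with non-overlapping write sets.

    A transaction joins the first batch after every batch that touched one of its rows,
    so conflicting transactions keep their original relative order.
    """
    batches: list[list[str]] = []
    last_batch_for_row: dict[str, int] = {}
    for name, writeset in transactions:
        earliest = max((last_batch_for_row[row] + 1 for row in writeset if row in last_batch_for_row), default=0)
        if earliest == len(batches):
            batches.append([])
        batches[earliest].append(name)
        for row in writeset:
            last_batch_for_row[row] = earliest
    return batches
</code></pre></div>
<p>Take <code>t1</code> (orders:1), <code>t2</code> (orders:2), <code>t3</code> (orders:1, users:9) and <code>t4</code> (users:7). The result is <code>[["t1", "t2", "t4"], ["t3"]]</code>. <code>t3</code> must wait for <code>t1</code>, but <code>t4</code> can run alongside the first batch even though it committed later.</p>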
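<p>Monitoring: <code>Seconds_Behind_Master</code> is a <em>surface metric</em>. It is derived from the timestamp of the event the SQL thread is applying, so it can read 0 while the IO thread is still behind, and NULL when a thread is stopped. Comparing the replica's <code>Executed_Gtid_Set</code> with the primary's to find the GTID gap is more accurate.</p>
<h3 id="2-semi-sync-timeout-fallback-成-async沒監控就看不見">2. Semi-sync timeout falls back to async (invisible without monitoring)</h3>
<p><code>rpl_semi_sync_master_timeout</code> defaults to 10000ms (10 seconds). When it expires, the primary <em>automatically falls back to async</em> and stays there until enough semi-sync replicas catch up again. The application sees no error, but <em>the durability guarantee is already gone</em>.</p>
<p>Fixes:</p>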
<ul>
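<li>Monitor <code>Rpl_semi_sync_master_status</code>: it turns OFF after a fallback</li>
<li>Monitor <code>Rpl_semi_sync_master_no_tx</code>: it counts every transaction committed without a replica ack during the fallback</li>
<li>Alert rule: alert if <code>no_tx</code> increases at all within 5 minutes</li>
<li>Timeout tuning: too short (&lt; 5s) flips to async on transient network hiccups (false positives). Too long (&gt; 30s) means every write stalls for the full timeout whenever no replica can ack. A shorter timeout trades durability for availability</li>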
</ul>
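<p>The alert rule can be expressed as a check over periodic samples of the two status variables. The <code>Sample</code> shape and <code>should_alert</code> are hypothetical, standing in for whatever your monitoring agent scrapes from <code>SHOW GLOBAL STATUS</code>. Only the variable names come from MySQL. The sketch also treats "no samples in the window" as alert-worthy, which is a design choice rather than a MySQL rule.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">from dataclasses import dataclass


@dataclass
class Sample:
    ts: float        # seconds since epoch when scraped
    status_on: bool  # Rpl_semi_sync_master_status == 'ON'
    no_tx: int       # Rpl_semi_sync_master_no_tx (cumulative counter)


def should_alert(samples: list[Sample], now: float, window_s: float = 300.0) -> bool:
    """Alert if semi-sync is OFF in the latest sample, or no_tx grew at all inside the window."""
    recent = sorted((s for s in samples if now - s.ts <= window_s), key=lambda s: s.ts)
    if not recent:
        return True  # no data is itself a monitoring problem
    if not recent[-1].status_on:
        return True
    # Compare consecutive samples; a counter reset after a restart (a drop) is ignored
    return any(b.no_tx > a.no_tx for a, b in zip(recent, recent[1:]))
</code></pre></div>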
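<h3 id="3-gtid-gap--replica-無法-attach">3. GTID gap: the replica cannot attach</h3>
<p>When a replica re-attaches to the primary it reports <code>ERROR 1236: ... transactions you need from master are purged</code>. The cause is that the primary's <code>binlog_expire_logs_seconds</code> is too short, so the binlogs the replica needs have already been purged. In GTID mode the error is more obvious because the GTID gap is visible directly, but position-based mode has the same problem.</p>
<p>Fixes:</p>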
<ul>
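<li>Use <code>binlog_expire_logs_seconds = 604800</code> (7 days) as a baseline (the 8.0 default is 30 days)</li>
<li>On high-traffic servers, confirm the disk can hold 7 days of binlog (a single peak hour can produce GBs of binlog)</li>
<li>If the gap really is too large, rebuild the replica from a <em>base backup + binlog replay</em>. Do not force a GTID reset (e.g. by rewriting <code>gtid_purged</code>) to skip the missing transactions</li>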
</ul>
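<p>Whether 7 days of binlog fit on disk is arithmetic on your own measured binlog rate. In this minimal sketch, the 2 GB/hour average and the 30% headroom factor are illustrative assumptions, not recommendations. Peak hours should still be checked separately against free space.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def binlog_disk_needed_gb(avg_gb_per_hour: float, retention_seconds: int = 604800, headroom: float = 1.3) -> float:
    """Disk needed to keep retention_seconds of binlog at an average write rate, plus headroom."""
    hours = retention_seconds / 3600  # 604800 s = 168 h = 7 days
    return avg_gb_per_hour * hours * headroom


# Illustrative: 2 GB/hour on average, 7 days of retention, 30% headroom
print(round(binlog_disk_needed_gb(2.0)))  # 437
</code></pre></div>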
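<h3 id="4-loss-less-semi-sync-不一定真的-loss-less">4. Loss-Less semi-sync is not always loss-less</h3>
<p>In <code>AFTER_SYNC</code> mode the order is <em>primary writes binlog → waits for ack → commits</em>, which looks like zero loss. Now consider a failure case: the <em>primary crashes after writing the binlog while still waiting for the ack</em>, the replica <em>happens not to have received that binlog event</em>, and the replica is promoted. The event does not exist on the new primary. Yet the old primary's binlog records it as <em>written to binlog but not committed</em>, and when the old primary restarts, crash recovery commits it, leaving an errant transaction the new topology does not have. The client only sees <em>connection lost</em> and cannot tell whether the transaction succeeded.</p>
<p>Fixes:</p>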
<ul>
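<li>Accept this <em>edge-case unknown state</em>, and let the application handle it with an idempotency key + retry</li>
<li>What Loss-Less semi-sync guarantees is that <em>transactions acknowledged to the client are not lost</em>, not that <em>every write gets an ack-and-tell</em>. Check the old primary for errant GTIDs before letting it rejoin as a replica</li>
<li>Group Replication / Galera Cluster / MySQL NDB Cluster avoid the divergent old primary, but a client whose connection drops before the OK still does not know the outcome, so idempotent retries are needed there too</li>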
</ul>
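<p>The idempotency-key-plus-retry approach can be sketched as below. <code>FakeLedger</code> is a hypothetical stand-in for a table with a UNIQUE <code>idempotency_key</code> column. In MySQL the same idea is an <code>INSERT</code> guarded by a UNIQUE key, so retrying a write that already committed hits a duplicate-key error instead of applying twice.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import uuid


class FakeLedger:
    """Stands in for a table with a UNIQUE idempotency_key column (hypothetical)."""

    def __init__(self) -> None:
        self.rows: dict[str, int] = {}
        self.fail_after_commit_once = False  # simulate "committed, but the OK never arrived"

    def insert(self, key: str, amount: int) -> str:
        if key in self.rows:
            return "duplicate"  # like a duplicate-key error on the UNIQUE key
        self.rows[key] = amount  # the write commits...
        if self.fail_after_commit_once:
            self.fail_after_commit_once = False
            raise ConnectionError("connection lost before OK")  # ...but the client never hears
        return "inserted"


def pay(ledger: FakeLedger, amount: int, retries: int = 3) -> str:
    key = str(uuid.uuid4())  # generated once, reused on every retry
    for _ in range(retries):
        try:
            return ledger.insert(key, amount)
        except ConnectionError:
            continue  # unknown outcome: retry with the SAME key
    raise RuntimeError("gave up; outcome still unknown")
</code></pre></div>
<p>A <code>"duplicate"</code> result on retry means an earlier attempt did commit, so the caller treats it as success. The row exists exactly once either way.</p>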
<h3 id="5-chained-replication-雪崩">5. Chained replication cascade</h3>
<p>The topology is <code>primary → replica1 → replica2 → ...</code>, an alternative to hub-and-spoke that saves primary egress bandwidth. If replica1's SQL thread stalls, replica2 and replica3 are blocked behind it, and the whole chain cascades.</p>
<p>Fixes:</p>
<ul>
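<li>Avoid chains deeper than 2 levels (primary → tier1 replica → tier2 replica is the limit)</li>
<li>Run multi-threaded appliers on intermediate replicas (5.7+: <code>slave_parallel_workers</code>, with <code>slave_pending_jobs_size_max</code> large enough for big events) so the middle of the chain does not block</li>
<li>At real scale, use a <em>binlog server</em> (e.g. the MaxScale binlog router) to decouple the chain dependency</li>
<li>Across regions, use a <em>region-local hub + cross-region async</em>, not a long chain</li>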
</ul>
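<p>A stall cascades because each tier can only apply what its upstream has already applied. In this toy, lag behind the primary at depth <em>d</em> is approximated as the sum of the per-hop lags above it. The per-hop values are hypothetical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">from itertools import accumulate


def chain_lag(per_hop_lag_s: list[float]) -> list[float]:
    """Approximate lag behind the primary at each tier of primary -> tier1 -> tier2 -> ..."""
    return list(accumulate(per_hop_lag_s))


# Normal operation: about 0.2 s per hop
print(chain_lag([0.2, 0.2, 0.2]))
# tier1's SQL thread stalls for 120 s: every tier below inherits the stall
print(chain_lag([120.0, 0.2, 0.2]))
</code></pre></div>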
<h2 id="容量--cost-對照">Capacity / cost comparison</h2>
<table>
  <thead>
      <tr>
          <th>Configuration</th>
          <th>Write throughput impact</th>
          <th>Replica overhead</th>
          <th>Suitable workload</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Async + binlog position</td>
          <td>baseline</td>
          <td>Low (IO + SQL threads)</td>
          <td>High throughput, tolerates sub-second loss</td>
      </tr>
      <tr>
          <td>Async + GTID</td>
          <td>baseline</td>
          <td>Same as above, easier failover</td>
          <td>The default for most production setups</td>
      </tr>
      <tr>
          <td>Loss-Less semi-sync + GTID (1 ack)</td>
          <td>-10% ~ -20%</td>
          <td>Same as above + ack RTT</td>
          <td>Finance, orders, no tolerance for data loss</td>
      </tr>
      <tr>
          <td>Loss-Less semi-sync + GTID (2 acks)</td>
          <td>-15% ~ -30%</td>
          <td>Same as above, across AZs</td>
          <td>Strong durability + multi-AZ HA</td>
      </tr>
      <tr>
          <td>Group Replication (virtually synchronous)</td>
          <td>-30% ~ -50%</td>
          <td>High (group consensus on every transaction)</td>
          <td>Multi-primary writes, or built-in consensus-based failover</td>
      </tr>
  </tbody>
</table>
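<p>The throughput figures above are rough, workload-dependent ranges. Semi-sync across AZs usually adds 1-3ms and across regions 50-200ms. For write-heavy workloads, cross-region semi-sync is therefore usually not worth it; use <em>region-local sync + a cross-region async chain</em> instead.</p>
<h2 id="整合--下一步">Integration / next steps</h2>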
<h3 id="aurora-mysql">Aurora MySQL</h3>
<p>Aurora MySQL uses an <em>AWS-managed storage layer</em>. Storage automatically keeps 6 copies across 3 AZs, so there is no application-side semi-sync to configure. When moving from self-managed MySQL to Aurora, all the semi-sync configuration above <em>goes away</em>, replaced by the Aurora storage quorum (4 of 6 for writes, 3 of 6 for reads).</p>
<p>On the trade-off axes, <em>durability</em> is handed entirely to Aurora, and the application only has to care about <em>latency</em> + <em>consistency</em>. See the <a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora vendor page</a>.</p>
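<p>The 4-of-6 write / 3-of-6 read choice is safe because every read quorum overlaps every write quorum (4 + 3 &gt; 6), and any two write quorums overlap (4 + 4 &gt; 6). The brute-force check below only illustrates that arithmetic. Aurora's storage protocol involves much more than this, and <code>quorums_overlap</code> is a hypothetical helper.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Check by enumeration that every write quorum meets every read quorum and every other write quorum."""
    copies = range(n)
    writes = [set(c) for c in combinations(copies, w)]
    reads = [set(c) for c in combinations(copies, r)]
    reads_see_latest_write = all(not wq.isdisjoint(rq) for wq in writes for rq in reads)
    writes_overlap = all(not a.isdisjoint(b) for a in writes for b in writes)
    return reads_see_latest_write and writes_overlap
</code></pre></div>
<p><code>quorums_overlap(6, 4, 3)</code> holds, while <code>quorums_overlap(6, 3, 3)</code> fails, because two disjoint sets of 3 copies exist.</p>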
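<h3 id="vitesssharding-layer">Vitess (sharding layer)</h3>
<p>Inside each shard, Vitess still uses MySQL replication (async or semi-sync). Vitess does not replace the replication topology; it is an <em>upper routing layer</em>. Each Vitess shard has its own primary + replicas, each run behind a <code>vttablet</code>, which matches the topology design in this article.</p>
<p>The bigger Vitess topics are <em>cross-shard</em> concerns, such as cross-shard transactions and VReplication's binlog streams for resharding and moving data between shards, not replication topology. See the <em>Vitess sharding design</em> article in the MySQL backlog (to be written).</p>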
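<h3 id="proxysqlread-replica-routing">ProxySQL (read replica routing)</h3>
<p>ProxySQL is the de facto <em>connection pool + query routing</em> layer in the MySQL ecosystem. Its query rules route by query type (SELECT vs DML), and it takes replica lag into account. Writes go to the primary and reads to replicas. A replica whose lag exceeds the configured threshold (<code>max_replication_lag</code>) is temporarily shunned, and reads move to the remaining servers, including the primary if it is in the reader hostgroup, to preserve consistency.</p>
<p>ProxySQL and the replication topology in this article are <em>complementary, not overlapping</em>: replication decides which servers hold which data, and ProxySQL decides how queries are distributed. See the <em>ProxySQL configuration</em> article in the MySQL backlog (to be written).</p>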
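<p>The routing behavior described above can be reduced to a toy function. This is not ProxySQL's implementation or configuration syntax. The host names, the "starts with SELECT" rule and the threshold are hypothetical, and a real setup expresses this with <code>mysql_query_rules</code> and per-server <code>max_replication_lag</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">from typing import Optional


def route(sql: str, replica_lag_s: dict[str, Optional[float]], max_lag_s: float, primary: str = "primary") -> str:
    """Toy read/write split: writes to the primary, reads to the least-lagged healthy replica."""
    text = sql.lstrip().upper()
    is_read = text.startswith("SELECT") and "FOR UPDATE" not in text
    if not is_read:
        return primary
    # Shun replicas whose lag is unknown (None) or above the threshold
    healthy = {host: lag for host, lag in replica_lag_s.items() if lag is not None and lag <= max_lag_s}
    if not healthy:
        return primary  # every replica shunned: fall back to the primary for consistency
    return min(healthy, key=lambda host: healthy[host])
</code></pre></div>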
<h3 id="orchestratorha-failover">Orchestrator (HA failover)</h3>
<p>Orchestrator is a MySQL HA topology manager and automatic failover tool. It uses GTID (or Pseudo-GTID on non-GTID topologies) to read replica progress, and during failover it promotes the most up-to-date replica. Compare it with PostgreSQL's Patroni (see <a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni HA</a>). Both play the same role, but Orchestrator works best with GTID + familiarity with MySQL behavior, while Patroni needs a DCS (etcd / Consul) + familiarity with PG behavior.</p>
<p>See the <em>Orchestrator failover design</em> article in the MySQL backlog (to be written).</p>
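<p>The "promote the most up-to-date replica" step can be reduced to GTID set arithmetic. Pick the replica that executed the most transactions, and check whether it already holds everything any other replica has. This is a hypothetical sketch, not Orchestrator's selection logic, which also weighs promotion rules, data centers and binlog settings. GTID sets are passed already expanded into Python sets for brevity.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def pick_promotion_candidate(replicas: dict[str, set[tuple[str, int]]]) -> tuple[str, bool]:
    """Return (candidate, has_everything).

    candidate: the replica that executed the most transactions (ties: alphabetical).
    has_everything: True if the candidate already holds every transaction any replica has;
    if False, other replicas hold transactions it lacks and must be reconciled first.
    """
    if not replicas:
        raise ValueError("no replicas to choose from")
    candidate = max(sorted(replicas), key=lambda name: len(replicas[name]))
    union_of_all = set().union(*replicas.values())
    return candidate, replicas[candidate] >= union_of_all
</code></pre></div>
<p>A <code>False</code> second value is the signal to stop and reconcile (or let a human decide) before promoting, for example when another replica carries an errant transaction.</p>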
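<h3 id="cdcmaxwell--debezium">CDC (Maxwell / Debezium)</h3>
<p>Maxwell (from Zendesk, MySQL only) and Debezium (Red Hat; supports MySQL / PG / MongoDB) both read the MySQL binlog and turn it into an event stream (Kafka / Kinesis / Pulsar). The binlog must be in <code>ROW</code> format. With GTID enabled, connector offsets are easier to maintain and survive a switch to another server, because there is no binlog position to translate. Delivery is still typically at-least-once rather than exactly-once.</p>
<p>Compare this with PG logical replication + Debezium. The MySQL binlog is a row-level <em>logical</em> log written by the server, not PG-style logical decoding of a physical WAL. By default its row events do not carry full schema metadata, so <em>CDC consumers must be schema-aware</em> and track DDL when schemas change. See the <em>Binary log + Maxwell / Debezium CDC</em> article in the MySQL backlog (to be written).</p>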
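<p>With GTID, a CDC consumer can store "which GTIDs I have already processed" instead of a file/position pair. After a restart or a switch to another server, it drops re-delivered events. This minimal sketch shows that at-least-once-plus-dedupe idea. The event tuple shape and <code>consume</code> are hypothetical, not Maxwell's or Debezium's API.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def consume(events: list[tuple[str, int, str]], processed: set[tuple[str, int]], sink: list[str]) -> None:
    """events: (server_uuid, transaction_id, payload), possibly re-delivered after a restart.

    processed plays the role of a stored GTID-set offset, so each GTID reaches sink once.
    """
    for server_uuid, txn_id, payload in events:
        gtid = (server_uuid, txn_id)
        if gtid in processed:
            continue  # already handled before the restart: skip the duplicate
        sink.append(payload)  # in a real system, the write and the offset update must be atomic or idempotent
        processed.add(gtid)
</code></pre></div>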
<h2 id="相關連結">Related links</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">PostgreSQL Replication Topology</a> (PG sibling: streaming + LSN + slot mechanisms mapped against the MySQL binlog)</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">PostgreSQL Patroni HA</a> (PG sibling, a different HA mechanism)</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">PostgreSQL Logical Replication + Debezium</a> (PG CDC sibling, a different replication abstraction layer)</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/replication-slot-management/" data-link-title="PostgreSQL Replication Slot Management：Physical / Logical / Failover Slot 治理" data-link-desc="PG replication slot 是 *primary 端的 standby 進度紀錄*、防 WAL premature deletion。但 orphan slot 會吃 disk、failover 後 logical slot 不會自動跟新 primary、是 PG 操作的 hidden complexity。本文走 physical / logical slot 差異、slot lifecycle、failover slot synchronization（PG 17&#43; 新特性）、orphan slot 治理、5 production 踩雷（orphan slot disk 爆 / logical slot lag / failover 後 slot 丟 / wal_keep_size 跟 slot 衝突 / connection 同時打 slot 數量限制）">PostgreSQL Replication Slot Management</a> (PG slot governance; MySQL has no equivalent concept)</li>
<li><a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora vendor page</a> (managed MySQL; replication handed to the storage layer)</li>
<li><a href="/blog/backend/01-database/transaction-boundary/" data-link-title="1.3 Transaction 與一致性邊界" data-link-desc="交易邊界、isolation level、retry 策略、distributed transaction（2PC、Saga）與跨 region 強一致取捨">1.3 Transaction Boundary</a> (how transaction behavior interacts with replication)</li>
<li><a href="/blog/backend/01-database/kv-document-capacity-planning/" data-link-title="1.10 KV / Document DB 容量規劃" data-link-desc="DynamoDB / Cosmos DB / Bigtable / MongoDB 等 KV / Document DB 的容量設計、partition key 取捨、capacity mode 選擇">1.10 KV / Document DB capacity planning</a> / <a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 Globally distributed OLTP</a> (alternative paths)</li>
<li><a href="/blog/backend/knowledge-cards/quorum/" data-link-title="Quorum" data-link-desc="分散式系統以多數節點同意作為提交或讀取有效性的門檻">quorum card</a> / <a href="/blog/backend/knowledge-cards/eventual-consistency/" data-link-title="Eventual Consistency" data-link-desc="允許短暫不一致、最終收斂到同一資料狀態的一致性語意">eventual consistency card</a> / <a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">stale read card</a></li>
<li>Official: <a href="https://dev.mysql.com/doc/refman/8.0/en/replication.html">MySQL Replication</a> / <a href="https://dev.mysql.com/doc/refman/8.0/en/replication-semisync.html">Semi-Sync</a> / <a href="https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html">GTID</a></li>
</ul>
]]></content:encoded></item></channel></rss>