<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Innodb-Cluster on Tarragon</title><link>https://tarrragon.github.io/blog/tags/innodb-cluster/</link><description>Recent content in Innodb-Cluster on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/innodb-cluster/index.xml" rel="self" type="application/rss+xml"/><item><title>MySQL Group Replication / InnoDB Cluster：single-primary vs multi-primary mode 對 transaction certification 的影響</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/group-replication/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/group-replication/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL&lt;/a> overview 的 implementation-layer deep article。Overview 已說明 MySQL 在 OLTP 譜系的定位、本文聚焦 &lt;em>Group Replication + InnoDB Cluster&lt;/em> — synchronous multi-primary 的 transaction model + 部署模型。&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;p>把「Group Replication multi-primary mode」當成「multi-primary 直接線性 scale write」是常見誤解。&lt;/p>
&lt;p>Single-primary 跟 multi-primary 共用同一套 GR 機制（GCE atomic broadcast + certification + applier）— 切換 mode 是 &lt;em>配置變更&lt;/em>。但 &lt;em>性能效果&lt;/em> 經常跟讀者預期不同：在 single-primary cluster 上加開 &lt;code>group_replication_single_primary_mode=OFF&lt;/code>、預期 &lt;em>3 個 instance 都可以接受 write&lt;/em> 帶來吞吐倍增、實際上每個寫入仍要全 cluster GCE broadcast + certification、寫吞吐沒爆增 / latency 飆高 / certification 衝突回退增加。&lt;/p>
&lt;p>這篇 deep article 把 GR 的 &lt;em>certification 流程&lt;/em> 講清楚 — 為什麼「multi-primary」聽起來像「線性 scale」、實際是「保 strong consistency 的 multi-entry」。然後展開 InnoDB Cluster（GR + MySQL Shell + MySQL Router）作為 production deployment 工具。&lt;/p>
&lt;h2 id="group-replication-的-transaction-model">Group Replication 的 transaction model&lt;/h2>
&lt;p>GR 用 &lt;em>Group Communication Engine (GCE)&lt;/em>（Paxos 變種）達成 &lt;em>atomic broadcast&lt;/em> — 任何 write transaction 必須先 broadcast 到所有 member、所有 member 確認 &lt;em>certification pass&lt;/em> 才 commit。&lt;/p>
&lt;p>每個 transaction 的 GR lifecycle：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">1. Client → Member A: BEGIN; UPDATE ...; COMMIT;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">2. Member A: 先 local execute、收集 write_set（被改的 row + PK + transaction GTID）
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">3. Member A: write_set + binlog event → GCE broadcast to all members
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">4. GCE: Paxos consensus、所有 member 收到 broadcast、按 *相同順序*
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">5. Each Member: certification phase — 看 write_set 跟 *尚未 apply 的 incoming transactions* 是否有 PK 衝突
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">6. 若無衝突 → apply 該 transaction（local + remote member 都 apply）、回 client COMMIT OK
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">7. 若衝突 → certification fail、Member A 對 client 回 ERR_LOCK_DEADLOCK / GR_CONFLICT、application 必須 retry&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>核心結論&lt;/strong>：&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL</a> overview 的 implementation-layer deep article。Overview 已說明 MySQL 在 OLTP 譜系的定位、本文聚焦 <em>Group Replication + InnoDB Cluster</em> — synchronous multi-primary 的 transaction model + 部署模型。</p></blockquote>
<hr>
<p>把「Group Replication multi-primary mode」當成「multi-primary 直接線性 scale write」是常見誤解。</p>
<p>Single-primary 跟 multi-primary 共用同一套 GR 機制（GCE atomic broadcast + certification + applier）— 切換 mode 是 <em>配置變更</em>。但 <em>性能效果</em> 經常跟讀者預期不同：在 single-primary cluster 上加開 <code>group_replication_single_primary_mode=OFF</code>、預期 <em>3 個 instance 都可以接受 write</em> 帶來吞吐倍增、實際上每個寫入仍要全 cluster GCE broadcast + certification、寫吞吐沒爆增 / latency 飆高 / certification 衝突回退增加。</p>
<p>這篇 deep article 把 GR 的 <em>certification 流程</em> 講清楚 — 為什麼「multi-primary」聽起來像「線性 scale」、實際是「保 strong consistency 的 multi-entry」。然後展開 InnoDB Cluster（GR + MySQL Shell + MySQL Router）作為 production deployment 工具。</p>
<h2 id="group-replication-的-transaction-model">Group Replication 的 transaction model</h2>
<p>GR 用 <em>Group Communication Engine (GCE)</em>（Paxos 變種）達成 <em>atomic broadcast</em> — 任何 write transaction 必須先 broadcast 到所有 member、所有 member 確認 <em>certification pass</em> 才 commit。</p>
<p>每個 transaction 的 GR lifecycle：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">1. Client → Member A: BEGIN; UPDATE ...; COMMIT;
</span></span><span class="line"><span class="ln">2</span><span class="cl">2. Member A: 先 local execute、收集 write_set（被改的 row + PK + transaction GTID）
</span></span><span class="line"><span class="ln">3</span><span class="cl">3. Member A: write_set + binlog event → GCE broadcast to all members
</span></span><span class="line"><span class="ln">4</span><span class="cl">4. GCE: Paxos consensus、所有 member 收到 broadcast、按 *相同順序*
</span></span><span class="line"><span class="ln">5</span><span class="cl">5. Each Member: certification phase — 看 write_set 跟 *尚未 apply 的 incoming transactions* 是否有 PK 衝突
</span></span><span class="line"><span class="ln">6</span><span class="cl">6. 若無衝突 → apply 該 transaction（local + remote member 都 apply）、回 client COMMIT OK
</span></span><span class="line"><span class="ln">7</span><span class="cl">7. 若衝突 → certification fail、Member A 對 client 回 ERR_LOCK_DEADLOCK / GR_CONFLICT、application 必須 retry</span></span></code></pre></div><p><strong>核心結論</strong>：</p>
<ul>
<li><em>Single-primary mode</em>：只有指定 member 接受 write、其他 member 純 apply、certification 仍跑（但衝突極少、因只有一個寫入源）</li>
<li><em>Multi-primary mode</em>：所有 member 都接受 write、certification 衝突常見、application 必須處理 conflict retry</li>
</ul>
<p><strong>「multi-primary 不會線性 scale write」的原因</strong>：</p>
<ul>
<li>每個 write 仍要全 cluster GCE broadcast + certification</li>
<li>寫吞吐 ceiling 受 <em>最慢 member + 網路延遲</em> 限制（不是「N members × M throughput」）</li>
<li>多寫入源增加 certification 衝突機率、衝突 retry 反而拖 throughput</li>
</ul>
<p><strong>「multi-primary 真實價值」</strong>：</p>
<ul>
<li><em>跨 region multi-active deploy</em>（每個 region local member 接受 local write、無 cross-region write latency）— 但需求極少、多數場景 single-primary + Aurora DSQL / Spanner 更實際</li>
<li><em>零停機 maintenance</em>（任一 member 下線、其他繼續接 write、不必 failover）— 但 single-primary mode 也提供同等 HA</li>
</ul>
<p>對 99% production case：<strong>single-primary mode</strong> 才是正確選擇。Multi-primary 是 <em>特殊 use case 工具</em>、不是 <em>預設 mode</em>。</p>
<h2 id="group-communication-enginegce">Group Communication Engine（GCE）</h2>
<p>GR 內建 GCE、基於 <em>XCom</em> protocol（Paxos 變種）。GCE 責任：</p>
<ul>
<li>Atomic broadcast：保證 message 到所有 member、按相同順序</li>
<li>Group membership：偵測 member join / leave / fail、reconfigure consensus</li>
<li>Network partition handling：minority partition 自動 fence（read-only）、majority 繼續服務</li>
</ul>
<p><strong>GCE 跟 Raft 對比</strong>：</p>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>GR XCom (Paxos-like)</th>
          <th>Raft</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Leader</td>
          <td>沒固定 leader、每個 message 選一個 sender</td>
          <td>固定 leader、其他 follower</td>
      </tr>
      <tr>
          <td>配置複雜度</td>
          <td>高（cluster member 列表 + IP allowlist）</td>
          <td>中（更易理解）</td>
      </tr>
      <tr>
          <td>Member 數量</td>
          <td>預設 3 (max 9)</td>
          <td>預設 3-5</td>
      </tr>
      <tr>
          <td>Performance</td>
          <td>高吞吐、低延遲（不必每次選 leader）</td>
          <td>Leader bottleneck 偶有</td>
      </tr>
      <tr>
          <td>工程實作</td>
          <td>XCom 在 MySQL 內部、不暴露 API</td>
          <td>etcd / Consul / TiKV 等獨立工具</td>
      </tr>
  </tbody>
</table>
<p>GR 的設計取捨：<em>緊耦合 MySQL</em>（不必外部 DCS）、<em>Paxos-like consensus</em>（不像 Raft 那麼簡單但效率更高）。trade-off 是 <em>對 ops 的 transparency 較低</em> — XCom 內部行為對 DBA 是 black box。</p>
<h2 id="innodb-clustergr--mysql-shell--mysql-router">InnoDB Cluster：GR + MySQL Shell + MySQL Router</h2>
<p>純 GR 是 <em>底層 replication mechanism</em>、要組成 production deployment 需要：</p>
<ul>
<li><em>MySQL Shell</em> (<code>mysqlsh</code>)：CLI 工具、提供 <code>dba.createCluster()</code> / <code>cluster.addInstance()</code> 等 cluster 管理 API</li>
<li><em>MySQL Router</em>：connection routing layer、自動發現 cluster topology、寫入 routing 給 primary、讀取 routing replica</li>
<li><em>MySQL Group Replication plugin</em>：在每個 MySQL instance 啟用</li>
</ul>
<p><strong>InnoDB Cluster = GR + Shell + Router</strong>、是 Oracle 推薦的 production GR deployment 方式。</p>
<h3 id="起始部署3-member-single-primary-cluster">起始部署（3 member single-primary cluster）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Step 1: 在每個 instance 啟 GR plugin + 配 my.cnf</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="o">[</span>mysqld<span class="o">]</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="nv">server_id</span> <span class="o">=</span> <span class="m">1</span>                          <span class="c1"># 各 instance 不同</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="nv">gtid_mode</span> <span class="o">=</span> ON
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="nv">enforce_gtid_consistency</span> <span class="o">=</span> ON
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="nv">log_bin</span> <span class="o">=</span> mysql-bin
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="nv">binlog_format</span> <span class="o">=</span> ROW
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="nv">master_info_repository</span> <span class="o">=</span> TABLE
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="nv">relay_log_info_repository</span> <span class="o">=</span> TABLE
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="nv">transaction_write_set_extraction</span> <span class="o">=</span> XXHASH64
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="nv">plugin_load_add</span> <span class="o">=</span> <span class="s1">&#39;group_replication.so&#39;</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="nv">group_replication_group_name</span> <span class="o">=</span> <span class="s2">&#34;aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee&#34;</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="nv">group_replication_start_on_boot</span> <span class="o">=</span> OFF
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="nv">group_replication_local_address</span> <span class="o">=</span> <span class="s2">&#34;node1.example.com:33061&#34;</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="nv">group_replication_group_seeds</span> <span class="o">=</span> <span class="s2">&#34;node1:33061,node2:33061,node3:33061&#34;</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="nv">group_replication_bootstrap_group</span> <span class="o">=</span> OFF
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="nv">group_replication_single_primary_mode</span> <span class="o">=</span> ON       <span class="c1"># 99% 場景用 ON</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="nv">group_replication_enforce_update_everywhere_checks</span> <span class="o">=</span> OFF
</span></span><span class="line"><span class="ln">20</span><span class="cl">
</span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="c1"># Step 2: 用 MySQL Shell 從第一個 member bootstrap cluster</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">mysqlsh --user<span class="o">=</span>root --host<span class="o">=</span>node1.example.com
</span></span><span class="line"><span class="ln">23</span><span class="cl">&gt; dba.configureInstance<span class="o">(</span><span class="s1">&#39;root@node1:3306&#39;</span><span class="o">)</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">&gt; var <span class="nv">cluster</span> <span class="o">=</span> dba.createCluster<span class="o">(</span><span class="s1">&#39;prodCluster&#39;</span><span class="o">)</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">&gt; cluster.addInstance<span class="o">(</span><span class="s1">&#39;root@node2:3306&#39;</span><span class="o">)</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">&gt; cluster.addInstance<span class="o">(</span><span class="s1">&#39;root@node3:3306&#39;</span><span class="o">)</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">&gt; cluster.status<span class="o">()</span>  <span class="c1"># 應該顯示 3 member、1 PRIMARY + 2 SECONDARY</span>
</span></span><span class="line"><span class="ln">28</span><span class="cl">
</span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="c1"># Step 3: 部署 MySQL Router</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">mysqlrouter --bootstrap root@node1:3306 --directory /etc/mysql-router --user<span class="o">=</span>mysqlrouter
</span></span><span class="line"><span class="ln">31</span><span class="cl">systemctl start mysql-router
</span></span><span class="line"><span class="ln">32</span><span class="cl">
</span></span><span class="line"><span class="ln">33</span><span class="cl"><span class="c1"># 完成 — application 連 mysql-router:6446 (R/W) 或 :6447 (R/O)</span></span></span></code></pre></div><p>Application 連 Router、Router 自動發現 cluster topology + 自動 failover routing。Application 不必知道哪個 instance 是 primary。</p>
<h2 id="5-個-production-踩雷">5 個 Production 踩雷</h2>
<h3 id="1-certification-lag--multi-primary-模式-retry-storm">1. Certification lag — Multi-primary 模式 retry storm</h3>
<p>Multi-primary mode 下、3 個 instance 同時收到 <em>相同 row</em> 的 conflicting write、certification 階段必有 N-1 個 transaction 被退回。Application 看到 <code>ER_GR_CONFLICT_TRANSACTION_ABORTED</code>、retry、若不智能 retry（exponential backoff）會 retry storm、整個 cluster 寫吞吐暴降。</p>
<p>修法：</p>
<ul>
<li>99% 場景用 <em>single-primary mode</em>、避開 conflict</li>
<li>真的需要 multi-primary：application 必須 sharding-aware（不同 entry 寫不同 row range）、本質上跟 Vitess sharding 同概念但用 GR 機制</li>
<li>Application retry 用 <em>jitter exponential backoff</em>、不直接 retry</li>
</ul>
<h3 id="2-certification-queue-爆炸--single-primary-mode-仍受-cert-backlog-影響">2. Certification queue 爆炸 — Single-primary mode 仍受 cert backlog 影響</h3>
<p>Single-primary mode 下 primary 接受 write、broadcast 到 secondary。Secondary 跟 primary network latency / 處理速度差時、cert queue 累積。Cert queue 滿 → primary write 也被卡（GR 設計：所有 member 同步前不接受新 write、保 consistency）。</p>
<p>修法：</p>
<ul>
<li>監控 <code>group_replication_member_stats</code> view：<code>COUNT_TRANSACTIONS_IN_QUEUE</code> 持續 &gt; 0 是警訊</li>
<li>提高 <code>group_replication_message_cache_size</code>（預設 1 GB）給 large transaction 緩衝</li>
<li>確認 <em>所有 member 同 instance class</em>、不要混 spec</li>
<li>跨 region GR：完全不推薦（network latency 殺 cert throughput）</li>
</ul>
<h3 id="3-large-transaction--全-cluster-卡住">3. Large transaction — 全 cluster 卡住</h3>
<p>GR 必須把整個 transaction（含所有 write_set）一次 broadcast。10 GB transaction（大批量 UPDATE）必須一次塞滿 GCE buffer、cluster 內所有 member 都暫停接受新 transaction 直到 broadcast / apply 完成。常見場景：批次 archive / 大 backfill / <code>INSERT ... SELECT 1 億 row</code>。</p>
<p>修法：</p>
<ul>
<li><code>group_replication_transaction_size_limit</code>（預設 150 MB）超過直接 reject、不要設 unlimited</li>
<li>大批量寫入拆 chunk（每 chunk &lt; 100 MB）、用 application 層 loop</li>
<li>對 archive / backfill 用 <code>INSERT INTO archive SELECT ... LIMIT 10000</code> chunked、不是一個 transaction</li>
</ul>
<h3 id="4-network-partition--minority-partition-自動-read-only">4. Network partition — Minority partition 自動 read-only</h3>
<p>3 member cluster、network partition 把 1 個 member 隔離。被隔離 member 是 <em>minority</em>、自動進入 <em>read-only mode</em>（不接受 write）、防 split-brain。Application 連到 minority member 寫入會失敗。</p>
<p>修法：</p>
<ul>
<li>MySQL Router 自動發現 cluster topology、自動 route write 到 majority partition primary</li>
<li>Application 必須處理 connection error + retry（甚至 connection string 改成 <em>Router endpoint</em> 而非個別 instance）</li>
<li>監控 <code>group_replication_primary_member</code> UDF、確認哪個是真 primary</li>
</ul>
<h3 id="5-member-加入-catch-up--大量-binlog-阻擋-cluster-service">5. Member 加入 catch-up — 大量 binlog 阻擋 cluster service</h3>
<p>新 member 加入 cluster（new instance / 復原 failed member）必須 <em>catch-up</em> — apply 從 GR cluster start 到當前所有 binlog 才能 join consensus。如果 cluster 已運作 1 個月、binlog 累積 100 GB、catch-up 可能 6-12 小時、catch-up 期間 <em>該 member 不投票、其他 member 仍 service</em>、但 majority 安全邊界縮小（3 → 2 member working）。</p>
<p>修法：</p>
<ul>
<li>
<p>用 <em>MySQL Shell clone plugin</em> 直接 physical-snapshot 一個 existing member、跳過 binlog replay：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">&gt; cluster.addInstance<span class="o">(</span><span class="s1">&#39;root@node4:3306&#39;</span>, <span class="o">{</span>recoveryMethod: <span class="s1">&#39;clone&#39;</span><span class="o">})</span></span></span></code></pre></div></li>
<li>
<p>Clone 期間原 member 暫不接 write traffic（用 Router temporarily 排除）</p>
</li>
<li>
<p>規劃 maintenance window 加 member、不要在 peak load 期間</p>
</li>
</ul>
<h2 id="何時用-gr--innodb-cluster">何時用 GR / InnoDB Cluster</h2>
<table>
  <thead>
      <tr>
          <th>條件</th>
          <th>建議</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>需要 <em>zero-data-loss HA</em>（不容忍任何 binlog gap）</td>
          <td>GR single-primary</td>
      </tr>
      <tr>
          <td>需要 <em>自動 failover 而不必 Orchestrator + fence script</em></td>
          <td>GR / InnoDB Cluster</td>
      </tr>
      <tr>
          <td>需要 <em>跨 region multi-active</em>（且 conflict 可接受 / sharding-aware）</td>
          <td>GR multi-primary</td>
      </tr>
      <tr>
          <td>流量 &lt; 50K WPS、無嚴格 zero-loss 需求</td>
          <td>傳統 Orchestrator + Semi-sync 更簡單</td>
      </tr>
      <tr>
          <td>已用 Aurora / Cloud SQL 等 managed</td>
          <td>不用 GR、用 managed offering</td>
      </tr>
      <tr>
          <td>需要分散式 SQL（跨 region linearizable）</td>
          <td>Spanner / CockroachDB / Aurora DSQL（GR 不解決這個）</td>
      </tr>
  </tbody>
</table>
<h2 id="跟其他模組整合">跟其他模組整合</h2>
<h3 id="跟-replication-topology">跟 Replication topology</h3>
<p>GR 取代傳統 async / semi-sync replication、不是 <em>加在上面</em>。啟用 GR 後不要再配 <code>master-slave</code> style replication。詳見 <a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">Replication Topology</a>。</p>
<h3 id="跟-orchestrator">跟 Orchestrator</h3>
<p>Orchestrator 跟 InnoDB Cluster 不該 <em>同時用</em> — 兩者都會 trigger failover、會打架。GR / InnoDB Cluster 內建 failover、不需要 Orchestrator。詳見 <a href="/blog/backend/01-database/vendors/mysql/orchestrator-failover/" data-link-title="MySQL Orchestrator Failover：HA 工具自己怎麼 HA？raft cluster &#43; GTID-based promotion 的兩段 paradox" data-link-desc="Orchestrator 是 MySQL HA 自動 failover 的 de facto standard、但讀者第一個問題往往是「HA 工具自己會壞嗎」。本文走 Orchestrator 的雙層架構（管 MySQL 的 raft cluster &#43; 被 raft 管的 orchestrator instance）→ topology discovery → failure detection → failover decision tree → promote action → 5 production 踩雷（split-brain 跟 fencing / pre-failover hook 失敗 / anti-flapping window / GTID errant transaction / VIP 跟 ProxySQL 整合斷層）→ 跟 ProxySQL / Patroni / RDS 對比">Orchestrator Failover</a>。</p>
<h3 id="跟-proxysql--mysql-router">跟 ProxySQL / MySQL Router</h3>
<p>ProxySQL 可以連 GR cluster（自動偵測 read_only flag）、但 <em>MySQL Router</em> 是 GR 原生的 routing layer、跟 InnoDB Cluster 緊耦合（透過 MySQL Shell metadata）。</p>
<p>選擇邏輯：</p>
<ul>
<li><em>純 MySQL stack, 想 Oracle-supported 整套</em> → MySQL Router</li>
<li><em>已用 ProxySQL（包含其他非 GR cluster）+ 統一 routing</em> → 仍用 ProxySQL</li>
</ul>
<p>詳見 <a href="/blog/backend/01-database/vendors/mysql/proxysql-config/" data-link-title="MySQL ProxySQL 配置：connection / query / route / response 四段 lifecycle 跟 query rule 設計" data-link-desc="ProxySQL 是 MySQL 生態的 connection pool &#43; query routing 標準。本文走 connection → query parse → route → response 四段 lifecycle、query rule engine 的 rule chain 設計、Hostgroup / Server / User 三層 schema、配置 step-by-step（讀寫分離 &#43; replica lag-aware routing）、5 production 踩雷（query rule 順序錯亂 / connection 漂移 / write 路由到 replica / runtime / disk schema drift / mirror traffic 副作用）、跟 Replication / Orchestrator / HAProxy 整合">ProxySQL 配置</a>。</p>
<h3 id="跟-innodb-tuning">跟 InnoDB Tuning</h3>
<p>GR 對 <code>innodb_flush_log_at_trx_commit</code> / <code>sync_binlog</code> 行為更敏感 — GR 要求 binlog 必須 <em>fsync to disk</em>（<code>sync_binlog=1</code>）保 zero-loss、不能用 <code>sync_binlog=0</code> 換速度。詳見 <a href="/blog/backend/01-database/vendors/mysql/innodb-tuning/" data-link-title="MySQL InnoDB Tuning：為什麼一個 100 GB DB 在 64 GB RAM server 上 query 慢 5 倍" data-link-desc="InnoDB 是 MySQL 預設 storage engine、預設值給 256 MB buffer pool（早期 default）。本文從一個常見痛點開場（DB &gt; RAM 但 server 仍 swap）、走 4 個 critical knob（buffer pool / redo log / flush method / IO capacity）、各自如何影響讀寫吞吐、配置 step-by-step、5 production 踩雷（buffer pool warm-up / log file 大小 / 設 sync_binlog=0 換速度 / IO scheduler / undo log 膨脹）、跟 SSD / NVMe / EBS 的 IO 假設">InnoDB Tuning</a>。</p>
<h3 id="跟-postgresql-patroni-對比">跟 PostgreSQL Patroni 對比</h3>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>InnoDB Cluster</th>
          <th>Patroni + PostgreSQL</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Consensus</td>
          <td>GCE (Paxos-like) 內建</td>
          <td>依賴外部 DCS (etcd / Consul)</td>
      </tr>
      <tr>
          <td>Multi-primary</td>
          <td>支援（但少用）</td>
          <td>不支援（PG single-primary）</td>
      </tr>
      <tr>
          <td>HA tooling</td>
          <td>MySQL Shell + Router 整套</td>
          <td>Patroni + HAProxy + pgBouncer</td>
      </tr>
      <tr>
          <td>Setup 複雜度</td>
          <td>中（MySQL Shell 帶很多 abstraction）</td>
          <td>中（Patroni config + DCS）</td>
      </tr>
      <tr>
          <td>5-year production maturity</td>
          <td>Oracle-backed</td>
          <td>community-driven、廣用</td>
      </tr>
  </tbody>
</table>
<p>兩者角色相同、設計取捨不同。詳見 <a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">PostgreSQL Patroni HA</a>。</p>
<h2 id="容量規劃要點">容量規劃要點</h2>
<table>
  <thead>
      <tr>
          <th>元件</th>
          <th>配置建議</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Member 數量</td>
          <td>3 (預設、容忍 1 failure)、5 (容忍 2 failure)</td>
      </tr>
      <tr>
          <td>Member 間 network latency</td>
          <td>&lt; 5ms（同 region 同 AZ 或跨 AZ）</td>
      </tr>
      <tr>
          <td>Network bandwidth</td>
          <td>至少 1 Gbps、broadcast traffic 重</td>
      </tr>
      <tr>
          <td>Transaction size limit</td>
          <td><code>group_replication_transaction_size_limit=150M</code></td>
      </tr>
      <tr>
          <td>Message cache</td>
          <td><code>group_replication_message_cache_size=1G</code>（預設）+ 看 lag 調</td>
      </tr>
      <tr>
          <td>MySQL Router instance</td>
          <td>至少 2 個（HA）、放 application 同 LB 後</td>
      </tr>
  </tbody>
</table>
<p>Member 跨 region：<em>不推薦</em>。GR 對 latency 敏感、跨 region 50-200ms RTT 嚴重影響 cert throughput。multi-region 需求用 Aurora Global Database / Spanner 等專為跨 region 設計的方案。</p>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">MySQL Replication Topology</a>（GR 取代傳統 replication）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/orchestrator-failover/" data-link-title="MySQL Orchestrator Failover：HA 工具自己怎麼 HA？raft cluster &#43; GTID-based promotion 的兩段 paradox" data-link-desc="Orchestrator 是 MySQL HA 自動 failover 的 de facto standard、但讀者第一個問題往往是「HA 工具自己會壞嗎」。本文走 Orchestrator 的雙層架構（管 MySQL 的 raft cluster &#43; 被 raft 管的 orchestrator instance）→ topology discovery → failure detection → failover decision tree → promote action → 5 production 踩雷（split-brain 跟 fencing / pre-failover hook 失敗 / anti-flapping window / GTID errant transaction / VIP 跟 ProxySQL 整合斷層）→ 跟 ProxySQL / Patroni / RDS 對比">MySQL Orchestrator Failover</a>（GR / InnoDB Cluster 不必 Orchestrator）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/proxysql-config/" data-link-title="MySQL ProxySQL 配置：connection / query / route / response 四段 lifecycle 跟 query rule 設計" data-link-desc="ProxySQL 是 MySQL 生態的 connection pool &#43; query routing 標準。本文走 connection → query parse → route → response 四段 lifecycle、query rule engine 的 rule chain 設計、Hostgroup / Server / User 三層 schema、配置 step-by-step（讀寫分離 &#43; replica lag-aware routing）、5 production 踩雷（query rule 順序錯亂 / connection 漂移 / write 路由到 replica / runtime / disk schema drift / mirror traffic 副作用）、跟 Replication / Orchestrator / HAProxy 整合">MySQL ProxySQL 配置</a>（routing layer 對比）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/innodb-tuning/" data-link-title="MySQL InnoDB Tuning：為什麼一個 100 GB DB 在 64 GB RAM server 上 query 慢 5 倍" data-link-desc="InnoDB 是 MySQL 預設 storage engine、預設值給 256 MB buffer pool（早期 default）。本文從一個常見痛點開場（DB &gt; RAM 但 server 仍 swap）、走 4 個 critical knob（buffer pool / redo log / flush method / IO capacity）、各自如何影響讀寫吞吐、配置 step-by-step、5 production 踩雷（buffer pool warm-up / log file 大小 / 設 sync_binlog=0 換速度 / IO scheduler / undo log 膨脹）、跟 SSD / NVMe / EBS 的 IO 假設">MySQL InnoDB Tuning</a>（GR durability 需求）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/bdr-multi-master/" data-link-title="PostgreSQL BDR / Multi-Master：active-active 寫入的 3 種路徑跟 conflict 治理" data-link-desc="PG 預設是 single-primary、active-active 多寫入入口需要 *BDR (EDB)* / *pgEdge* / *Bucardo* 等 extension。本文走 3 種 multi-master 方案對比、conflict detection &#43; resolution model、async vs sync 取捨、配置 step-by-step（pgEdge 為主）、5 production 踩雷（last-write-wins data loss / sequence collision / DDL replication / conflict log 治理 / failover 後 timeline 分歧）、跟 MySQL Group Replication sibling 對比">PostgreSQL BDR / Multi-Master</a>（PG sibling、active-active 寫入 3 種路徑跟 conflict 治理）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">PostgreSQL Patroni HA</a>（PG sibling、不同 consensus）</li>
<li><a href="/blog/backend/knowledge-cards/quorum/" data-link-title="Quorum" data-link-desc="分散式系統以多數節點同意作為提交或讀取有效性的門檻">quorum 卡片</a> / <a href="/blog/backend/knowledge-cards/quorum/" data-link-title="Quorum" data-link-desc="分散式系統以多數節點同意作為提交或讀取有效性的門檻">Paxos / Raft 對比</a></li>
<li>官方：<a href="https://dev.mysql.com/doc/refman/8.0/en/group-replication.html">MySQL Group Replication</a> / <a href="https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-innodb-cluster.html">InnoDB Cluster</a></li>
</ul>
]]></content:encoded></item></channel></rss>