<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Multi-Region on Tarragon</title><link>https://tarrragon.github.io/blog/tags/multi-region/</link><description>Recent content in Multi-Region on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 02 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/multi-region/index.xml" rel="self" type="application/rss+xml"/><item><title>DynamoDB Global Tables：multi-region active-active、LWW conflict 與 cross-device sync 正向用例</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/dynamodb/global-tables-conflict/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/dynamodb/global-tables-conflict/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/dynamodb/" data-link-title="DynamoDB" data-link-desc="AWS managed key-value、partition-based scaling、9000 萬 RPS sustained 實戰證據">DynamoDB&lt;/a> overview 的 implementation-layer deep article。寫作參照 &lt;a href="https://tarrragon.github.io/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">vendor deep article methodology&lt;/a>。&lt;/p>&lt;/blockquote>
&lt;p>B2B SaaS 跟客戶 SLA 寫 99.99%、單 region 跑了一年遇過兩次 region-level outage、合計 downtime 已逼近 SLA 上限。team 要把核心 table 改 Global Tables active-active、首問是「multi-region write 之後資料還會一致嗎」。這個問題的答案是：&lt;em>不會、但有工程解法&lt;/em>；DynamoDB Global Tables 用 LWW（Last Writer Wins）跨 region async 同步、conflict 偵測跟 reconciliation 要 application 自己加。&lt;/p>
&lt;p>但 Global Tables 不只是 conflict 痛點。Disney+ 用同一個機制處理 cross-device sync（手機看一半回家用電視繼續）、Genesys 用同一個機制做 15 region B2B 客服平台的 99.999% 可用性。本文先講正向 access pattern（避免讓讀者誤以為 Global Tables 只是「跨 region 寫入會 conflict、所以痛苦」）、再展開 conflict resolution 跟 reconciliation 設計。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Workload 適配本 vendor 才繼續&lt;/strong>：DynamoDB 4 軸判讀（PK 天然均勻 / control plane vs data plane / consistency 可接受 eventual / access pattern 穩定）軸見 &lt;a href="../single-table-design-pattern/#dynamodb-%e9%81%a9%e7%94%a8%e5%ba%a6%e5%89%8d%e7%bd%ae%e5%88%a4%e8%ae%804-%e8%bb%b8">single-table-design-pattern 開頭 4 軸前置判讀&lt;/a>。Global Tables 是 &lt;em>已選 DynamoDB 後&lt;/em> 的拓樸決策；strong global consistency 必要的 workload 應走 Spanner / Cosmos DB strong consistency level、不是用 LWW 補。&lt;/p>&lt;/blockquote>
&lt;h2 id="b2b-saas-vs-b2c-業務-driver-對比">B2B SaaS vs B2C 業務 driver 對比&lt;/h2>
&lt;p>Global Tables 不是預設選擇、是 &lt;em>業務性質&lt;/em> 決定的工程投資。&lt;code>9.C24 Genesys&lt;/code> 揭露兩條關鍵 frame — 可用性目標的業務 driver、跟每多一個 9 的 cost 指數成長。&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>業務性質&lt;/th>
 &lt;th>典型可用性目標&lt;/th>
 &lt;th>年停機容忍&lt;/th>
 &lt;th>Multi-region 投資邏輯&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>B2C 大型網站&lt;/td>
 &lt;td>99.9%&lt;/td>
 &lt;td>8.76 小時&lt;/td>
 &lt;td>通常單 region + PITR / cross-region backup 划算&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>B2B SaaS&lt;/td>
 &lt;td>99.95% 或 99.99%（合約）&lt;/td>
 &lt;td>4.4 小時 / 52.6 分鐘&lt;/td>
 &lt;td>合約義務、客戶 SLA 違約有金錢損失、ROI 正向&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>客服平台類&lt;/td>
 &lt;td>99.999%（合約客戶）&lt;/td>
 &lt;td>5.26 分鐘&lt;/td>
 &lt;td>客戶停線損失極大、15 region 投資合理（Genesys）&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>B2C 大型網站&lt;/strong>通常 99.9% SLA、年停機 8.76 小時可接受、單 region + PITR + cross-region backup 是常見配置；改 Global Tables 邊際成本高、ROI 通常不正向。&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/dynamodb/" data-link-title="DynamoDB" data-link-desc="AWS managed key-value、partition-based scaling、9000 萬 RPS sustained 實戰證據">DynamoDB</a> overview 的 implementation-layer deep article。寫作參照 <a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">vendor deep article methodology</a>。</p></blockquote>
<p>B2B SaaS 跟客戶 SLA 寫 99.99%、單 region 跑了一年遇過兩次 region-level outage、合計 downtime 已逼近 SLA 上限。team 要把核心 table 改 Global Tables active-active、首問是「multi-region write 之後資料還會一致嗎」。這個問題的答案是：<em>不會、但有工程解法</em>；DynamoDB Global Tables 用 LWW（Last Writer Wins）跨 region async 同步、conflict 偵測跟 reconciliation 要 application 自己加。</p>
<p>但 Global Tables 不只是 conflict 痛點。Disney+ 用同一個機制處理 cross-device sync（手機看一半回家用電視繼續）、Genesys 用同一個機制做 15 region B2B 客服平台的 99.999% 可用性。本文先講正向 access pattern（避免讓讀者誤以為 Global Tables 只是「跨 region 寫入會 conflict、所以痛苦」）、再展開 conflict resolution 跟 reconciliation 設計。</p>
<blockquote>
<p><strong>Workload 適配本 vendor 才繼續</strong>：DynamoDB 4 軸判讀（PK 天然均勻 / control plane vs data plane / consistency 可接受 eventual / access pattern 穩定）軸見 <a href="../single-table-design-pattern/#dynamodb-%e9%81%a9%e7%94%a8%e5%ba%a6%e5%89%8d%e7%bd%ae%e5%88%a4%e8%ae%804-%e8%bb%b8">single-table-design-pattern 開頭 4 軸前置判讀</a>。Global Tables 是 <em>已選 DynamoDB 後</em> 的拓樸決策；strong global consistency 必要的 workload 應走 Spanner / Cosmos DB strong consistency level、不是用 LWW 補。</p></blockquote>
<h2 id="b2b-saas-vs-b2c-業務-driver-對比">B2B SaaS vs B2C 業務 driver 對比</h2>
<p>Global Tables 不是預設選擇、是 <em>業務性質</em> 決定的工程投資。<code>9.C24 Genesys</code> 揭露兩條關鍵 frame — 可用性目標的業務 driver、跟每多一個 9 的 cost 指數成長。</p>
<table>
  <thead>
      <tr>
          <th>業務性質</th>
          <th>典型可用性目標</th>
          <th>年停機容忍</th>
          <th>Multi-region 投資邏輯</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>B2C 大型網站</td>
          <td>99.9%</td>
          <td>8.76 小時</td>
          <td>通常單 region + PITR / cross-region backup 划算</td>
      </tr>
      <tr>
          <td>B2B SaaS</td>
          <td>99.95% 或 99.99%（合約）</td>
          <td>4.4 小時 / 52.6 分鐘</td>
          <td>合約義務、客戶 SLA 違約有金錢損失、ROI 正向</td>
      </tr>
      <tr>
          <td>客服平台類</td>
          <td>99.999%（合約客戶）</td>
          <td>5.26 分鐘</td>
          <td>客戶停線損失極大、15 region 投資合理（Genesys）</td>
      </tr>
  </tbody>
</table>
<p><strong>B2C 大型網站</strong>通常 99.9% SLA、年停機 8.76 小時可接受、單 region + PITR + cross-region backup 是常見配置；改 Global Tables 邊際成本高、ROI 通常不正向。</p>
<p><strong>B2B SaaS</strong> 99.95% 或 99.99% SLA 多半寫進合約、違約有具體金錢損失；Global Tables 的 N region cost 對比 SLA 違約成本通常 ROI 正向。critical 的是 <em>合約義務</em> 不是 <em>技術完美</em>。</p>
<p><strong>客服平台類</strong> 99.999% 是極端可用性目標、年停機 5.26 分鐘、Genesys 撐 8000+ orgs 的客服平台、客戶停線損失極大、跨 15 region 的 active-active 是合理投資。但 <em>不是每個 SaaS 都該追 99.999%</em>、是 <em>業務性質決定下限</em>。</p>
<p><strong>成本對比</strong>（<code>9.C24</code> 揭露）：15 region 成本約 = 1 region 的 15x（base table cost）+ 跨 region replication WCU。每多一個 9、容量規劃跟運維成本指數成長。</p>
<blockquote>
<p><strong>Scope warning（指標口徑紀律）</strong>：99.999% 是「12 個月滾動歷史值、不代表未來持續達成」（<code>9.C24</code> 警惕段第 1 條）。可用性是滾動指標、不是恆久承諾。引用 Genesys 99.999% 數字時要明示口徑（滾動 / customer-facing），不要寫成「DynamoDB 保證 99.999%」。</p></blockquote>
<h2 id="正向-access-pattern不只-conflict-議題">正向 access pattern：不只 conflict 議題</h2>
<p>Global Tables 不只是 DR / availability、也是正向 access pattern 的工程方案。先建立正向用例的判讀、再進 conflict 細節。</p>
<p><strong>Cross-device sync</strong>（<code>9.C27 Disney+</code> 揭露）：用戶在手機看到一半、晚上回家用電視繼續、播放進度跨裝置同步。Global Tables 自然解這個 access pattern — 用戶在不同 region 登入同帳號、寫入自動同步、最終一致性可接受場景。</p>
<p><strong>Global read（latency 優化）</strong>：跨地域用戶讀取就近 region 副本、latency 從 200ms 降到 &lt; 10ms。read 比 write 多很多倍的 workload（feed / catalog / user profile）受益最大。</p>
<p><strong>DR failover</strong>：region-level outage 時 application 切到 secondary region 繼續服務、RTO 通常 &lt; 5 分鐘（DNS / routing 切換時間、不含 application 端 reconnect）。</p>
<p><strong>B2C 也可能划算的場景</strong>：cross-device sync 是 <em>user-facing experience</em>、不是合規 / SLA driver。B2C 大規模平台（Disney+ / Spotify 類）也可能投資 Global Tables。判讀軸是「sync 體驗是否核心 UX」、不只「合約 SLA」。</p>
<h2 id="核心機制lww-conflict-resolution">核心機制：LWW conflict resolution</h2>
<p>Global Tables 的 first-class concept：</p>
<ul>
<li><strong>Multi-region active-active</strong>：每個 region 都能寫、async replication；typical replication latency &lt; 1s 但 <em>無 SLA</em></li>
<li><strong>LWW by wall clock</strong>：conflict 由 attribute <code>aws:rep:updatetime</code> 決定、純物理時間；不是 logical clock、不是 vector clock</li>
<li><strong>同 region read-your-write</strong>：本 region 寫立即可讀（同 region quorum 內）、其他 region 看到要等 replication</li>
<li><strong>Capacity 獨立</strong>：每個 region 自己的 RCU/WCU、<code>ReplicatedWriteCapacityUnits</code> 是跨 region replication 額外 WCU、按 region 數倍計</li>
</ul>
<p>對應 knowledge card：<a href="/blog/backend/knowledge-cards/consistency-level/" data-link-title="Consistency Level" data-link-desc="資料系統對讀寫一致性語意的可選擇層級">consistency level</a>、<a href="/blog/backend/knowledge-cards/rto/" data-link-title="RTO" data-link-desc="說明恢復時間目標如何約束事故回復策略">rto</a>、<a href="/blog/backend/knowledge-cards/rpo/" data-link-title="RPO" data-link-desc="說明恢復點目標如何定義可接受資料損失範圍">rpo</a>。</p>
<h2 id="設計流程">設計流程</h2>
<p>從 access pattern 分類到 reconciliation pipeline 的 6 步流程。</p>
<h4 id="step-1access-pattern-分類">Step 1：access pattern 分類</h4>
<p>把 table 中的資料分兩類：</p>
<ul>
<li><strong>region-pinned data</strong>：user 主要 region（合規 / 地理 affinity）；不啟用 Global Tables、用 region-pinned cluster</li>
<li><strong>global data</strong>：跨 region read / cross-device sync；啟用 Global Tables</li>
</ul>
<p>不是所有 table 都該上 Global Tables；user profile 跨 region 同步、但用戶交易紀錄可能該 pin 在合規 region。</p>
<h4 id="step-2啟用-global-tables">Step 2：啟用 Global Tables</h4>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">aws dynamodb update-table <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  --table-name orders <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --replica-updates <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  <span class="s1">&#39;[{&#34;Create&#34;: {&#34;RegionName&#34;: &#34;us-east-1&#34;}}]&#39;</span></span></span></code></pre></div><p>加 region 後 vendor 自動 backfill；backfill 期間 capacity 雙倍（原 region + 新 region 同步流量）、要預留 capacity buffer。</p>
<h4 id="step-3application-寫入策略">Step 3：application 寫入策略</h4>
<p>兩種寫入策略：</p>
<ul>
<li><strong>home region write</strong>：每 user 固定一個 home region 寫、避免 conflict；user 跨 region 漫遊時透過 routing 仍寫 home</li>
<li><strong>nearest region write</strong>：latency 優先、user 寫就近 region；conflict 機率高、必須加 idempotency 跟 reconciliation</li>
</ul>
<p>選擇：</p>
<table>
  <thead>
      <tr>
          <th>場景</th>
          <th>寫入策略</th>
          <th>理由</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>user profile / 設定</td>
          <td>home region write</td>
          <td>conflict 少、簡單</td>
      </tr>
      <tr>
          <td>cross-device sync</td>
          <td>nearest region write</td>
          <td>用戶在不同裝置同時操作、容忍 LWW</td>
      </tr>
      <tr>
          <td>訂單 / 金流</td>
          <td>home region write</td>
          <td>業務不容許 conflict 損失</td>
      </tr>
  </tbody>
</table>
<h4 id="step-4idempotency-設計">Step 4：idempotency 設計</h4>
<p>每筆 write 加 <code>request_id</code> 或 <code>client_timestamp</code>、application 端去重：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">write_with_idempotency</span><span class="p">(</span><span class="n">user_id</span><span class="p">,</span> <span class="n">action</span><span class="p">,</span> <span class="n">request_id</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">table</span><span class="o">.</span><span class="n">put_item</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="n">Item</span><span class="o">=</span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">            <span class="s2">&#34;PK&#34;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&#34;USER#</span><span class="si">{</span><span class="n">user_id</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">            <span class="s2">&#34;SK&#34;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&#34;ACTION#</span><span class="si">{</span><span class="n">action</span><span class="si">}</span><span class="s2">#</span><span class="si">{</span><span class="n">request_id</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">            <span class="s2">&#34;ts&#34;</span><span class="p">:</span> <span class="n">datetime</span><span class="o">.</span><span class="n">utcnow</span><span class="p">()</span><span class="o">.</span><span class="n">isoformat</span><span class="p">(),</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">            <span class="s2">&#34;request_id&#34;</span><span class="p">:</span> <span class="n">request_id</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="n">ConditionExpression</span><span class="o">=</span><span class="s2">&#34;attribute_not_exists(request_id)&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="p">)</span></span></span></code></pre></div><p><code>ConditionExpression</code> 在同一 region 內擋重複；跨 region eventual 仍可能 race，conflict 落到 LWW + reconciliation。</p>
<blockquote>
<p><strong>Scope warning（重要）</strong>：「加 request_id 或 client_timestamp」具體實作屬通用工程知識、<code>9.C26 PayPay</code> case 揭露「通知不可丟失」的需求分層、<em>沒有</em> 揭露具體 idempotency 實作。引用 PayPay 時要降溫成「PayPay 揭露需求分層（通知 vs 訊息）、idempotency 為通用工程實作」、不寫成「PayPay 使用 request_id」（陷阱 4：把通用工程實作寫成 case 揭露）。</p></blockquote>
<h4 id="step-5conflict-detection">Step 5：conflict detection</h4>
<p>DynamoDB Streams 訂閱、Lambda 比較 <code>aws:rep:updatetime</code> 跟 application timestamp、抓出可疑 conflict 進 reconciliation queue：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">detect_conflict</span><span class="p">(</span><span class="n">stream_event</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">new_image</span> <span class="o">=</span> <span class="n">stream_event</span><span class="p">[</span><span class="s2">&#34;dynamodb&#34;</span><span class="p">][</span><span class="s2">&#34;NewImage&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">repl_time</span> <span class="o">=</span> <span class="n">new_image</span><span class="p">[</span><span class="s2">&#34;aws:rep:updatetime&#34;</span><span class="p">][</span><span class="s2">&#34;S&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">app_time</span> <span class="o">=</span> <span class="n">new_image</span><span class="p">[</span><span class="s2">&#34;client_timestamp&#34;</span><span class="p">][</span><span class="s2">&#34;S&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="k">if</span> <span class="nb">abs</span><span class="p">(</span><span class="n">parse</span><span class="p">(</span><span class="n">repl_time</span><span class="p">)</span> <span class="o">-</span> <span class="n">parse</span><span class="p">(</span><span class="n">app_time</span><span class="p">))</span> <span class="o">&gt;</span> <span class="n">timedelta</span><span class="p">(</span><span class="n">seconds</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="c1"># 可疑 conflict、進 reconciliation</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="n">sqs</span><span class="o">.</span><span class="n">send_message</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">            <span class="n">QueueUrl</span><span class="o">=</span><span class="n">RECONCILIATION_QUEUE</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">            <span class="n">MessageBody</span><span class="o">=</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">stream_event</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="p">)</span></span></span></code></pre></div><blockquote>
<p><strong>Scope warning</strong>：DynamoDB Streams 用法屬通用工程實作、<code>9.C26 PayPay</code> case <em>沒有</em> 明示用 Streams、引用時要分層（PayPay 揭露需求、Streams 是工程實作的標準解）。</p></blockquote>
<h4 id="step-6reconciliation-pipeline">Step 6：reconciliation pipeline</h4>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Conflict event → SQS queue → Lambda / human review → merge logic → write back</span></span></code></pre></div><p>merge logic 視業務而定：</p>
<ul>
<li>訂單金額 conflict：抓最大值（避免少收）</li>
<li>用戶設定 conflict：抓最新（user-facing 行為一致）</li>
<li>watchlist conflict：union（兩裝置加的都保留）</li>
</ul>
<p><strong>驗證點</strong>：DR drill 演 region outage、確認 secondary region 接管後 read / write 都正常；<code>ReplicationLatency</code> p99 &lt; 1s。</p>
<p><strong>Rollback boundary</strong>：region 可逐個移除、但 active-active 改 active-passive 期間 application 需配合路由切換；先 application 切再移 region、不可同時做。</p>
<h2 id="失敗模式">失敗模式</h2>
<p>實際部署常見的 5 種失敗：</p>
<h4 id="case-1lww-默默吃掉-write">Case 1：LWW 默默吃掉 write</h4>
<p>跨 region 同一 record concurrent update、後到的 write 因 timestamp 較大蓋過先到的；business 看到「我送出的更新沒了」、稽核 log 才發現 conflict。修法：critical write 加 <code>ConditionExpression</code> 比較 <code>version</code> attribute、conflict 時 application 端 retry + merge；不要依賴 LWW 作為 conflict 解。</p>
<h4 id="case-2clock-skew-讓-lww-倒置">Case 2：Clock skew 讓 LWW 倒置</h4>
<p>region A 寫入 timestamp 因 NTP skew 比 region B 後寫快 200ms、結果舊資料贏。修法：依靠 application timestamp + monotonic counter、不依賴 server wall clock；critical write 用 conditional version + retry。</p>
<blockquote>
<p><strong>Scope warning</strong>：「200ms NTP skew」具體數字屬通用工程估算、case 未揭露具體 skew 範圍。</p></blockquote>
<h4 id="case-3replication-lag-撞-slo">Case 3：Replication lag 撞 SLO</h4>
<p>大 batch write 期間 replication lag 從 1s 變 30s、跨 region read 看到 30s 前資料、application 端 user 操作異常。修法：偵測 <code>ReplicationLatency</code> 升高時 application 端切 home region read、避免跨 region eventual read；把 replication lag 加進 SLO 監控、設 alarm。</p>
<h4 id="case-4dr-切換後-stale-data-持續-propagate">Case 4：DR 切換後 stale data 持續 propagate</h4>
<p>primary region outage 切到 secondary、舊 primary 恢復後仍把 outdated data 推回去、覆蓋 secondary 期間的新寫入。修法：DR runbook 含「舊 primary 恢復後人工 reconciliation 或重建」step、不可全自動 catch-up；舊 primary 恢復前先確認 replication 方向是「從 secondary catch up」而非「推舊資料回 secondary」。</p>
<h4 id="case-5跨-region-transaction-失敗">Case 5：跨 region transaction 失敗</h4>
<p>application 試圖跨 region <code>TransactWriteItems</code>、API 不支援跨 region transaction、原子性破裂。修法：transaction 限同 region 內、跨 region 用 <a href="/blog/backend/knowledge-cards/saga/" data-link-title="Saga" data-link-desc="處理跨服務分散事務的補償型 transaction 序列、用最終一致換 ACID atomic">saga</a> + idempotent + reconciliation；不要把同 region 的 transaction 假設搬到跨 region。</p>
<p><strong>Anti-recommendation</strong>：single-region availability 已達 99.95% + RTO 可接受 1 小時 + 預算敏感（特別 B2C 場景）→ 用 PITR + 跨 region backup 而非 Global Tables；Global Tables cost = N × single region cost 不止（對應 B2B vs B2C driver 對比）。</p>
<h2 id="容量與觀測">容量與觀測</h2>
<p>CloudWatch metric：</p>
<ul>
<li><code>ReplicationLatency</code>：p99 通常 &lt; 1s、建議 SLO 設 5s alarm</li>
<li><code>PendingReplicationCount</code>：積壓量、batch write 期間會升高</li>
<li><code>ReplicatedWriteCapacityUnits</code>：跨 region replication 額外 WCU、按 region 數倍計</li>
</ul>
<p>DynamoDB Streams + Lambda：抓 conflict event、寫進獨立 audit table；reconciliation job 從 audit table 跑、不直接動 base table。</p>
<p><strong>Region-level dashboard</strong>：每個 region 獨立 capacity / latency / error rate panel；DR drill 看是否能在 RTO 內切換。</p>
<p><strong>Cost monitoring</strong>：</p>
<ul>
<li>Global Tables cost ≈ N region × base cost + replication WCU</li>
<li>4 region 成本約 4.5x single region；15 region（Genesys 規模）約 15x</li>
<li>每多一個 region 都要重新算 ROI（軸 6 vendor crossover 的延伸）</li>
</ul>
<p><strong>指標口徑紀律</strong>（重要）：99.99% / 99.999% SLA 是 <em>滾動指標 + 歷史值</em>、不是永久承諾；引用 Genesys 99.999% 時明示「12 個月滾動 / customer-facing」、不寫成「DynamoDB 保證 99.999%」。</p>
<p>接回 <a href="/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">4.20 Observability Evidence Package</a>、<a href="/blog/backend/09-performance-capacity/capacity-planning/" data-link-title="9.6 容量規劃模型" data-link-desc="peak forecast、headroom budget、growth curve、autoscaling sizing">9.6 容量規劃模型</a>。</p>
<h2 id="邊界與整合">邊界與整合</h2>
<h3 id="frame-5region-pinned-global-tables-吸收合規邊界">Frame 5：region-pinned Global Tables 吸收合規邊界</h3>
<p>Global Tables 不只是高可用工具、也是 <em>合規邊界</em>（<a href="/blog/backend/knowledge-cards/data-residency/" data-link-title="Data Residency" data-link-desc="合規要求資料留在特定地理邊界內、跨境複製違反合規、推動 fleet 拓樸決策">Data Residency</a> 拓樸）的吸收層。DynamoDB 在 vendor capability 層級支援 <em>region-pinned replication</em> — 每張 table 可獨立決定哪些 region 參與 replication group、部分 region 可不加入。這個 capability 同時服務三類場景：合規分離（受監管市場資料不跨境）、cost / latency 取捨（資料只在主要服務 region 同步）、災備拓樸（少數 region 純讀備援）。<code>9.C24 Genesys</code> 15 region 揭露的是 <em>延遲就近接入</em> 的 B2B SaaS 拓樸（客戶服務延遲敏感、必須在客戶所在地有 region）— case 原文沒明示合規應用、但 region-pinned capability 在 Genesys 規模下天然能容納合規市場分離、是同 capability 的 <em>可能應用維度</em>、不是 case 已驗證的具體實踐。</p>
<p>跨 vendor 對照：</p>
<table>
  <thead>
      <tr>
          <th>Vendor</th>
          <th>合規吸收機制</th>
          <th>拓樸特性</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>DynamoDB</td>
          <td>region-pinned Global Tables（按 region 開關 replication、各市場可分離）</td>
          <td>仍是 active-active、但 replication 範圍可控</td>
      </tr>
      <tr>
          <td>Aurora</td>
          <td>fleet 拓樸（每市場獨立 cluster、合規禁止跨境 = Global Database 反指標）</td>
          <td>active-passive per market、跨市場不複製</td>
      </tr>
      <tr>
          <td>CockroachDB</td>
          <td>locality + placement（邏輯一個 cluster + region pinning + Outposts）</td>
          <td>單 logical cluster、physical row 鎖在合規 region</td>
      </tr>
      <tr>
          <td>MongoDB / Cosmos DB</td>
          <td>cluster-per-region（無 row-level locality 等價物、整 cluster 切割）</td>
          <td>各 region 獨立 cluster、application 層做市場 routing</td>
      </tr>
  </tbody>
</table>
<p><strong>為什麼 DynamoDB 在這個 frame 退化得最輕</strong>：Global Tables 的 region 開關是 <em>attribute 級</em> 設計（每張 table 可獨立決定哪些 region 參與）、不像 Aurora 必須整 cluster 拆。讀者要把「跨境合規 + 高可用」雙重需求兼顧時、DynamoDB 是最少結構性改造的路徑 — 但代價是 LWW conflict 跟 reconciliation 設計仍要自己做。</p>
<p><strong>何時 region-pinned 而非 active-active</strong>：受監管金融 / 個資跨境禁止的市場（如 GDPR strict 條款區、中國個資法 PIPL、巴西 LGPD）— 該 region 仍開 DynamoDB table、但 <em>不加入 Global Tables replication group</em>、跟其他 region 完全切割。capability 設計上支援這種按 region 開關 replication 的拓樸；具體是否套用、要看 <em>讀者自己的市場合規清單</em>、不是把 Genesys 規模當必然證據（Genesys case 揭露的是延遲就近接入、未明示合規分離實踐）。</p>
<h3 id="disney-vs-genesys兩種-global-tables-工程動機">Disney+ vs Genesys：兩種 Global Tables 工程動機</h3>
<p><code>9.C27 Disney+</code> 跟 <code>9.C24 Genesys</code> 是 Global Tables 兩種不同的工程動機：</p>
<ul>
<li><strong>Disney+</strong>：cross-device sync 是 user-facing UX、watchlist + 播放進度跨裝置同步、B2C 但 sync 是 core experience</li>
<li><strong>Genesys</strong>：99.999% B2B SaaS 合約義務、15 region active-active、客服平台停線損失極大</li>
</ul>
<p>兩個 case 都用 Global Tables、但動機完全不同 — Disney+ 是 UX driver、Genesys 是合約 driver。寫進你自己的設計時要明示自己屬哪一型，因為兩種型別的 cost 容忍度跟 conflict 容忍度完全不同。</p>
<h3 id="sibling-與-cross-link">Sibling 與 cross-link</h3>
<ul>
<li><a href="/blog/backend/01-database/vendors/dynamodb/consistency-model-optimization/" data-link-title="DynamoDB Strongly Consistent → Eventually Consistent：same protocol, different contract" data-link-desc="DynamoDB consistency model 從 strongly consistent read 改 eventually consistent read 是 50% cost 優化但風險集中在 application contract — 同 vendor / 同 protocol / 同 table / 不同 read consistency；驗證 [#128](/report/data-topology-as-audit-dimension/) self-aware limitation 提出的 consistency axis 候選；涵蓋 read pattern audit / 5 個 production 踩雷">consistency-model-optimization</a> — 同 region eventual / strong 取捨、本篇是跨 region 延伸</li>
<li><a href="/blog/backend/01-database/vendors/dynamodb/on-demand-vs-provisioned/" data-link-title="DynamoDB On-Demand vs Provisioned：6 軸決策、auto-scaling 邊界與 cost crossover" data-link-desc="capacity mode 選擇不是單軸 peak/avg ratio；本文展開 6 軸決策（peak/avg / 讀寫比 trend / surge 暫時 vs 永久 baseline / predictable-peak vs flash-sale / DBA 工時釋放 / vendor vs 自管 cost crossover），含 Zomato 50% 成本下降、Zoom 30x permanent surge、Amazon Ads sustained workload 等 case 分軸 anchor">on-demand-vs-provisioned</a> — 多 region capacity 規劃放大、軸 5 工時釋放在 multi-region 更顯著</li>
<li><a href="/blog/backend/01-database/vendors/dynamodb/partition-key-antipatterns/" data-link-title="DynamoDB Partition Key 反模式與 Write Sharding：composite key 修復跟 mode × partition 交叉判讀" data-link-desc="DynamoDB partition 上限 1000 WCU 是 hot partition 的根因；composite key（event_id &#43; shard suffix）跟 calculated shard（hash % N）兩種修法、mode × partition 在 provisioned / on-demand 不同表現，以及 9.C15 Tixcraft 6750x 擴展的工程細節">partition-key-antipatterns</a> — hot partition 跨 region 同樣存在、每個 region 的 partition 都要均勻</li>
<li><a href="/blog/backend/01-database/vendors/dynamodb/single-table-design-pattern/" data-link-title="DynamoDB Single-Table Design：從適用度前置判讀到 access pattern 反推 PK/SK" data-link-desc="DynamoDB single-table 設計不是「資料表越少越好」，而是 access pattern 反推 PK/SK 跟 GSI；本文先做 DynamoDB 適用度 4 軸前置判讀（PK 天然均勻 / control plane vs data plane / consistency / access pattern 穩定），再展開設計流程、failure modes 與 durable queue 正向用例">single-table-design-pattern</a> — single-table 設計在 multi-region 仍適用、access pattern 反推 PK/SK 不變</li>
<li>替代路由：global strong consistency 必要 → Spanner / Cosmos DB strong consistency level</li>
<li>Migration playbook：single-region → Global Tables 屬 topology re-layout、對應 <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">migration playbook methodology</a> Type F</li>
<li>跟 <a href="/blog/backend/09-performance-capacity/cases/genesys-dynamodb-99999-availability/" data-link-title="9.C24 Genesys：用 DynamoDB 在 15 region 跑出 99.999% 可用性" data-link-desc="Genesys 客服平台用 DynamoDB 為預設資料層、跨 15 主 region &#43; 5 衛星 region、達成 12 個月 99.999% 可用性">Genesys 9.C24</a> 互引：15 region 5 個 9 可用性的工程實踐 + B2B SaaS 業務 driver</li>
<li>跟 <a href="/blog/backend/09-performance-capacity/cases/disney-plus-content-metadata/" data-link-title="9.C27 Disney&#43;：DynamoDB 撐每日數十億動作的觀看歷史" data-link-desc="Disney&#43; 用 DynamoDB 撐每日數十億動作的觀看歷史、watchlist、播放進度等串流 metadata">Disney+ 9.C27</a> 互引：cross-device sync 作為正向 access pattern</li>
<li>跟 <a href="/blog/backend/09-performance-capacity/cases/paypay-mobile-payment-messaging/" data-link-title="9.C26 PayPay：行動支付每日 3 億訊息的 DynamoDB 後端" data-link-desc="日本最大行動支付 PayPay 每日 3 億訊息、用 DynamoDB 處理通知與訊息功能、支撐次秒級反應">PayPay 9.C26</a> 互引：揭露需求分層（通知 vs 訊息）、idempotency / Streams 為通用工程實作、PayPay 未公開揭露具體實作</li>
</ul>
]]></content:encoded></item><item><title>PostgreSQL Multi-Region GDPR Rollout：政策驅動的 migration 屬本 methodology 嗎</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/multi-region-gdpr-rollout/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/multi-region-gdpr-rollout/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL&lt;/a> overview 的 implementation-layer deep article。同時是 &lt;a href="https://tarrragon.github.io/blog/report/data-topology-as-audit-dimension/" data-link-title="Data topology 是 process content 的第 6 audit 維度" data-link-desc="Process content 的 diff dimension audit 原本 5 維（schema / operational / paradigm / components / application change）漏了 *data topology* — 資料在 cluster / partition / region 之間的分佈拓樸；topology 不在既有 5 維任一個、但決定 re-sharding / partition redesign / multi-region rollout 的結構；本卡擴 audit 到 6 維、新增 Type F「Topology re-layout」結構">#128 self-aware limitation&lt;/a> 第 1 點「6 維仍可能漏類（identity / consistency / residency 三軸候選）」的 &lt;em>residency 軸驗證&lt;/em>、跟 &lt;a href="https://tarrragon.github.io/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">migration playbook methodology「何時不該套」段&lt;/a> 對「政策合規驅動」是否在 methodology scope 的反思。&lt;/p>&lt;/blockquote>
&lt;h2 id="政策驅動的-migration-屬本-methodology-嗎">政策驅動的 migration 屬本 methodology 嗎&lt;/h2>
&lt;p>&lt;a href="https://tarrragon.github.io/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology&lt;/a> 「何時不該套」段曾把「compliance-driven migration」歸為排除情境、後來改寫為「不在排除範圍 — 法規驅動只是 driver、資料層仍走 type A-E 之一」。本文是該改寫的 &lt;em>正面實證&lt;/em> — GDPR EU residency 強制需求驅動 single-region → multi-region rollout、本文是 &lt;em>政策驅動但仍走 audit + type 對映流程&lt;/em> 的 case study。&lt;/p>
&lt;p>但 reviewer D 在第三輪 audit 提出：residency 不只是 &lt;em>driver&lt;/em>、本身是 &lt;em>cross-cutting constraint&lt;/em>、反向約束 topology + operational + schema；該不該升 &lt;em>獨立 audit 軸&lt;/em>？本文是該議題的 dogfood。&lt;/p>
&lt;h2 id="三層約束driver--topology--contract">三層約束：driver / topology / contract&lt;/h2>
&lt;p>GDPR 對 PostgreSQL multi-region rollout 的影響在三個層次：&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> overview 的 implementation-layer deep article。同時是 <a href="/blog/report/data-topology-as-audit-dimension/" data-link-title="Data topology 是 process content 的第 6 audit 維度" data-link-desc="Process content 的 diff dimension audit 原本 5 維（schema / operational / paradigm / components / application change）漏了 *data topology* — 資料在 cluster / partition / region 之間的分佈拓樸；topology 不在既有 5 維任一個、但決定 re-sharding / partition redesign / multi-region rollout 的結構；本卡擴 audit 到 6 維、新增 Type F「Topology re-layout」結構">#128 self-aware limitation</a> 第 1 點「6 維仍可能漏類（identity / consistency / residency 三軸候選）」的 <em>residency 軸驗證</em>、跟 <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">migration playbook methodology「何時不該套」段</a> 對「政策合規驅動」是否在 methodology scope 的反思。</p></blockquote>
<h2 id="政策驅動的-migration-屬本-methodology-嗎">政策驅動的 migration 屬本 methodology 嗎</h2>
<p><a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a> 「何時不該套」段曾把「compliance-driven migration」歸為排除情境、後來改寫為「不在排除範圍 — 法規驅動只是 driver、資料層仍走 type A-E 之一」。本文是該改寫的 <em>正面實證</em> — GDPR EU residency 強制需求驅動 single-region → multi-region rollout、本文是 <em>政策驅動但仍走 audit + type 對映流程</em> 的 case study。</p>
<p>但 reviewer D 在第三輪 audit 提出：residency 不只是 <em>driver</em>、本身是 <em>cross-cutting constraint</em>、反向約束 topology + operational + schema；該不該升 <em>獨立 audit 軸</em>？本文是該議題的 dogfood。</p>
<h2 id="三層約束driver--topology--contract">三層約束：driver / topology / contract</h2>
<p>GDPR 對 PostgreSQL multi-region rollout 的影響在三個層次：</p>
<ol>
<li><strong>Driver layer</strong>：EU 客戶資料必須 <em>物理上儲存在 EU</em>（GDPR Article 44-49）— 觸發 multi-region migration 的根本理由</li>
<li><strong>Topology layer</strong>：跨 region replication 不能 <em>自由跨 region 複製</em> EU 客戶資料、必須按 GDPR scope 分區；topology 設計受合規約束</li>
<li><strong>Contract layer</strong>：審計能 <em>demonstrate</em> 「EU 資料在 EU」、操作日誌 + replication evidence 必須可追溯；application + ops contract 多出合規 obligation</li>
</ol>
<p>跑 <a href="/blog/report/content-structure-by-max-diff-dimension/" data-link-title="Process content 結構由最大差異維度決定、不是 universal phased" data-link-desc="跨 X process content（migration / upgrade / rollout / playbook）的結構由 source / target 之間 *差異維度組合* 決定、不存在 universal phased 模板；6 種 migration / process type 實證（schema 差 / drop-in / operational / multi-tool / paradigm / topology re-layout）跑出 6 種不同結構；寫作前必須做 *6 維 diff dimension audit* 才能決定結構、跳過會套錯模板">6 維 diff dimension audit</a> 對「single us-east → us-east + eu-west」：</p>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>評估</th>
          <th>等級</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Schema / API</td>
          <td>同 PostgreSQL、可能加 region column</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Operational model</td>
          <td>HA / backup / monitoring 跨 region 重設計</td>
          <td><strong>High</strong></td>
      </tr>
      <tr>
          <td>Paradigm</td>
          <td>同 OLTP RDBMS</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Components</td>
          <td>同 PostgreSQL instance + Patroni</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Application change</td>
          <td>Routing logic by user region、必改</td>
          <td>Medium</td>
      </tr>
      <tr>
          <td>Data topology</td>
          <td>Single → multi-region replication</td>
          <td><strong>High</strong></td>
      </tr>
      <tr>
          <td><strong>Residency contract</strong></td>
          <td><strong>EU 資料禁止離開 EU、log + replication 範圍受約束</strong></td>
          <td><strong>High</strong></td>
      </tr>
  </tbody>
</table>
<p>6 維 audit 抓不到「Residency contract = High」這軸。用既有 6 維歸類、會走 Type F multi-axis（topology + operational + application change 多 High）+ 政策合規補強段；但這個歸類 <em>漏掉合規對 topology / operational / application 的反向約束</em>：</p>
<ul>
<li>Topology layer：6 維只 audit 「topology 是否變動」、漏 audit 「topology 範圍是否受合規約束」</li>
<li>Operational layer：6 維只 audit 「operational 是否重設計」、漏 audit 「audit log / encryption / access control 是否符合合規要求」</li>
<li>Application layer：6 維只 audit 「application code 是否改」、漏 audit 「資料 routing 是否符合 residency rule」</li>
</ul>
<p><strong>Residency 不只是 driver、是 cross-cutting constraint</strong>、會反向約束其他 3-4 維、且帶獨立工作量（合規 evidence collection / DPIA / audit prep）。</p>
<h2 id="residency-axis-是否獨立3-個論據">Residency axis 是否獨立：3 個論據</h2>
<p><strong>Yes、residency 是獨立軸</strong>：</p>
<ol>
<li><strong>可獨立發生</strong>：原本 multi-region setup、新增「PCI 強制信用卡資料只能 us-east」、是 <em>純 residency 變更</em>、其他 6 維皆 Low（topology 不重設計、operational 不重設計、application 加 routing rule 即可）；但 residency 約束 routing + log 範圍</li>
<li><strong>驅動工作量分佈</strong>：本文 multi-region GDPR rollout 工作量分佈：
<ul>
<li>Topology setup（logical replication / region setup）：~25%</li>
<li>Operational redesign（HA / backup / monitoring）：~20%</li>
<li>Application routing change（region detection / data filter）：~15%</li>
<li><strong>Residency compliance（DPIA / audit log / access control / encryption / evidence）：~40%</strong></li>
</ul>
</li>
<li><strong>Cross-cutting nature</strong>：residency 不只影響「資料放哪」、影響：
<ul>
<li>Backup 可不可以 cross-region store（多數 GDPR 不允許）</li>
<li>Audit log 是否包含 EU PII（需 EU 端 log + 跨 region log filter）</li>
<li>Encryption key 是否可 cross-region share（多數情境不允許）</li>
<li>Application access logs 是否含 EU IP / user ID</li>
</ul>
</li>
</ol>
<p><strong>No、residency 可塞 operational + driver</strong>：</p>
<ul>
<li>反論：residency 是 operational 子議題、加 audit + replication scope 規則就好</li>
<li>拒絕：residency 反向約束 topology / application / operational、且帶獨立合規工作量（DPIA / cross-border transfer agreement / data subject rights）；不是單純 operational 子議題</li>
</ul>
<p>實證：本文 migration 工作量 40% 在 compliance、確認 residency 是 <em>獨立工作量主軸</em>。</p>
<h2 id="結構type-f-multi-axis--residency-compliance-獨立段">結構：Type F multi-axis + residency compliance 獨立段</h2>
<p>本文結構是 <em>Type F 為主</em>（topology high + operational high）+ <em>residency compliance 獨立段</em>（不在 6 維任一個）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">1. 政策驅動的 migration 屬本 methodology 嗎（meta-reflection 開頭）
</span></span><span class="line"><span class="ln">2</span><span class="cl">2. 三層約束：driver / topology / contract
</span></span><span class="line"><span class="ln">3</span><span class="cl">3. Residency axis 是否獨立的論據
</span></span><span class="line"><span class="ln">4</span><span class="cl">4. 結構 differentiator（Type F multi-axis + residency compliance 段）
</span></span><span class="line"><span class="ln">5</span><span class="cl">5. EU residency 對 topology / operational / application 的反向約束
</span></span><span class="line"><span class="ln">6</span><span class="cl">6. Migration 流程（含 DPIA 跟 evidence collection 階段）
</span></span><span class="line"><span class="ln">7</span><span class="cl">7. Production 故障演練
</span></span><span class="line"><span class="ln">8</span><span class="cl">8. Capacity / cost（含合規 audit cost）
</span></span><span class="line"><span class="ln">9</span><span class="cl">9. 整合 / 下一步</span></span></code></pre></div><p>9 章節、240-270 行。比標準 Type F 多 1 段（residency compliance）+ 1 段（meta-reflection）。</p>
<h2 id="eu-residency-對其他維度的反向約束">EU residency 對其他維度的反向約束</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Residency rule → Topology constraint:
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">- EU customer data 不能 replicate to us-east
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">- Backup of EU table 不能 store in non-EU region
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">- Logical replication subscriber 在 us-east 必須 filter out EU data
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">Residency rule → Operational constraint:
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">- Cross-region monitoring 不能 export EU PII to global SaaS (Datadog)
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">- Audit log 含 EU user_id 必須 store 在 EU
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">- Encryption key (KMS) 不能 share 跨 region（EU 端用 EU KMS）
</span></span><span class="line"><span class="ln">10</span><span class="cl">- DBA / SRE access EU data 必須 from EU jurisdiction + 記 audit trail
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">Residency rule → Application constraint:
</span></span><span class="line"><span class="ln">13</span><span class="cl">- Application 必須 detect user region + route 對應 DB endpoint
</span></span><span class="line"><span class="ln">14</span><span class="cl">- Cross-region join / aggregate 對 EU user 必須走 EU 端 query
</span></span><span class="line"><span class="ln">15</span><span class="cl">- Data export feature 必須 reject 跨 region export request</span></span></code></pre></div><p>每條反向約束都是 <em>新工作量</em>、不在 6 維 audit 內。</p>
<h2 id="migration-流程含-dpia--evidence-collection">Migration 流程（含 DPIA + evidence collection）</h2>
<p>10 step、跨 5 個月：</p>
<table>
  <thead>
      <tr>
          <th>Phase</th>
          <th>Step</th>
          <th>對應 6 維 / 合規</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>0 Pre-migration</td>
          <td>1. DPIA（Data Protection Impact Assessment）</td>
          <td>Compliance pre-requisite</td>
      </tr>
      <tr>
          <td>0</td>
          <td>2. 法務 review 跨境傳輸 agreement</td>
          <td>Compliance</td>
      </tr>
      <tr>
          <td>1 Setup</td>
          <td>3. EU PostgreSQL cluster build + Patroni</td>
          <td>Operational + Topology</td>
      </tr>
      <tr>
          <td>1</td>
          <td>4. EU KMS + audit log + monitoring stack</td>
          <td>Operational + Residency</td>
      </tr>
      <tr>
          <td>2 Data</td>
          <td>5. Logical replication 設 filter（exclude EU table from us-east）</td>
          <td>Topology + Residency</td>
      </tr>
      <tr>
          <td>2</td>
          <td>6. Initial sync EU table 到 EU cluster</td>
          <td>Topology</td>
      </tr>
      <tr>
          <td>3 App</td>
          <td>7. Application 端加 region detection + routing</td>
          <td>Application change</td>
      </tr>
      <tr>
          <td>3</td>
          <td>8. Cross-region query banning（cross-region join 拒絕 EU table）</td>
          <td>Application + Residency</td>
      </tr>
      <tr>
          <td>4 Verify</td>
          <td>9. Compliance audit + evidence package</td>
          <td>Residency</td>
      </tr>
      <tr>
          <td>4</td>
          <td>10. DPO sign-off + DR drill</td>
          <td>Residency + Operational</td>
      </tr>
  </tbody>
</table>
<p>Step 1 + 9 + 10 是 <em>residency-specific</em>、不在既有 6 維內。</p>
<h2 id="production-故障演練">Production 故障演練</h2>
<h3 id="case-1replication-filter-漏-tableeu-資料-leak-到-us-east">Case 1：Replication filter 漏 table、EU 資料 leak 到 us-east</h3>
<p><strong>徵兆</strong>：6 個月後 internal audit 發現 us-east 端 <code>customers</code> table 含 EU 客戶資料；replication filter 設定漏改、新加的 <code>eu_customer_extensions</code> table 被自動 replicate 到 us-east。</p>
<p><strong>根因</strong>：PostgreSQL logical replication publication 預設 <code>FOR ALL TABLES</code>、新加的 table 自動納入；應該明示 <code>FOR TABLE list...</code> 並 GDPR review。</p>
<p><strong>修法</strong>：</p>
<ol>
<li><strong>Publication 改 explicit table list</strong>：<code>CREATE PUBLICATION xxx FOR TABLE users, orders, ...</code>、不用 <code>FOR ALL TABLES</code></li>
<li><strong>Schema change review 加 GDPR check</strong>：每個 DDL PR 必須答「新 table 是否含 EU PII、是否該 filter」</li>
<li><strong>Replication monitor</strong>：定期跑 <code>SELECT * FROM pg_publication_tables</code> 對照 expected list、漂移立刻 alert</li>
<li><strong>Evidence collection</strong>：filter 配置 + audit log 留檔、出事 DPO 知道何時 leak</li>
</ol>
<h3 id="case-2backup-跨-region-store合規違規">Case 2：Backup 跨 region store、合規違規</h3>
<p><strong>徵兆</strong>：跑 1 年後 GDPR audit 抓到 EU table 的 backup 存在 us-west S3 bucket；違反 Article 44-49 限制。</p>
<p><strong>根因</strong>：pgBackRest 預設用 <em>global S3 bucket</em>（在 us-east-1）；EU PostgreSQL cluster backup 跑去 us-east、跨境傳輸無 transfer mechanism。</p>
<p><strong>修法</strong>：</p>
<ol>
<li><strong>Per-region backup config</strong>：EU cluster 用 EU S3 bucket（eu-west-1）、寫進 pgBackRest config</li>
<li><strong>Backup test</strong>：每月跑一次 backup restore drill、validate backup 是 from EU region</li>
<li><strong>Bucket policy 強 enforce</strong>：EU bucket 加 <code>aws:RequestedRegion=eu-west-1</code> 強制 region match</li>
<li><strong>Audit log archive 同理</strong>：log shipping 也必須 region-respect</li>
</ol>
<h3 id="case-3monitor-saas-收集-eu-pii合規-alert">Case 3：Monitor SaaS 收集 EU PII、合規 alert</h3>
<p><strong>徵兆</strong>：Datadog APM 收集了 EU customer 端 request 含 user_email 在 trace、被 DPO catch、required to delete 過去 90 天的 Datadog data。</p>
<p><strong>根因</strong>：APM trace 預設收集 application context、含 PII；Datadog 是 us-east SaaS、PII 跨境到 Datadog us-east、違規。</p>
<p><strong>修法</strong>：</p>
<ol>
<li><strong>APM scrub PII</strong>：application 端在 trace 前 scrub user_email / user_id 替換成 hash</li>
<li><strong>EU-specific monitor stack</strong>：EU PostgreSQL + APM 用 Grafana on EU EKS、不送 Datadog</li>
<li><strong>跨 region SaaS use 必須 audit</strong>：所有外部 SaaS（Datadog / Sentry / NewRelic）必須 GDPR-friendly 配置</li>
<li><strong>Privacy by design</strong>：log / trace 預設 scrub PII、不是 opt-in</li>
</ol>
<h3 id="case-4cross-region-query-跑-eu--us-資料residency-違規">Case 4：Cross-region query 跑 EU + US 資料、residency 違規</h3>
<p><strong>徵兆</strong>：BI dashboard 跑跨 region aggregation query（EU sales + US sales）、PostgreSQL FDW 從 us-east cluster query EU cluster、EU 端 server log 顯示「PII export to us-east」。</p>
<p><strong>根因</strong>：開發者用 PostgreSQL Foreign Data Wrapper（FDW）方便跑跨 region query、不知道這在 GDPR 視為跨境 PII export。</p>
<p><strong>修法</strong>：</p>
<ol>
<li><strong>Architecture: aggregate at edge</strong>：BI 跑 <em>per-region aggregate</em>、再在 BI layer compose（無 PII）；不直接跨 region join</li>
<li><strong>FDW 限制</strong>：disable FDW from us-east → EU cluster、enforce one-way data flow</li>
<li><strong>DBA access policy</strong>：DBA 不能直接 query EU cluster 從 us-east jumpbox</li>
<li><strong>Query audit</strong>：production query log 跑 PII detection（regex / NER）、發現跨境 export 立即 alert</li>
</ol>
<h3 id="case-5dr-drill-跨-region-failover暴露-residency-assumption-失敗">Case 5：DR drill 跨 region failover、暴露 residency assumption 失敗</h3>
<p><strong>徵兆</strong>：DR drill「EU 完全不可用、切到 us-east」執行後、發現 us-east 端 <em>沒 EU 資料</em> — 因為一直 strict residency filter；business 端 EU 客戶 24 小時無法服務。</p>
<p><strong>根因</strong>：strict GDPR residency 跟 strict DR availability 衝突 — 要 <em>跨 region DR</em> 就要 <em>跨 region 持有資料</em>、要 <em>strict residency</em> 就 <em>DR 範圍受限</em>。</p>
<p><strong>修法</strong>：</p>
<ol>
<li><strong>DR strategy revision</strong>：EU 端 multi-AZ within EU、不靠跨 region；EU region 全不可用情境接受 longer RTO</li>
<li><strong>Compliance + DR negotiation</strong>：跟 DPO / 法務談 <em>DR 跨境 short-window 是否可接受</em>、簽 cross-border transfer agreement</li>
<li><strong>Backup recovery 在 EU 內</strong>：EU 端 backup 跨 AZ store、不跨 region；EU AZ 災難用 EU 另一個 AZ 重建</li>
<li><strong>明示 RTO trade-off</strong>：EU customer SLA 寫「regional DR 內 RTO 1 小時、global DR 24-48 小時」、residency 跟 DR 是 <em>互斥取捨</em></li>
</ol>
<h2 id="capacity--cost">Capacity / cost</h2>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>Single region</th>
          <th>Multi-region GDPR-compliant</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Infrastructure cost</td>
          <td>baseline</td>
          <td>+60-100%（雙 cluster + cross-region replication）</td>
      </tr>
      <tr>
          <td>Operational FTE</td>
          <td>0.5-1</td>
          <td>1-2 FTE（雙 region SRE + compliance）</td>
      </tr>
      <tr>
          <td>Compliance cost</td>
          <td>0</td>
          <td>$50-200K USD setup（DPIA / audit / DPO time）+ ongoing</td>
      </tr>
      <tr>
          <td>Egress cost</td>
          <td>Low</td>
          <td>High（cross-region replication 流量）</td>
      </tr>
      <tr>
          <td>Application latency</td>
          <td>Single AZ</td>
          <td>EU customer 連 EU、低；US customer 連 US、低</td>
      </tr>
      <tr>
          <td>DR RTO</td>
          <td>30 分鐘 (single region)</td>
          <td>EU regional 1 小時 / global 24-48 小時</td>
      </tr>
      <tr>
          <td>Audit cost</td>
          <td>Minimal</td>
          <td>季度 DPIA + 年度 compliance audit</td>
      </tr>
  </tbody>
</table>
<p><strong>判讀</strong>：GDPR multi-region 成本 1.5-2.5x、但合規是 <em>必要 spend</em>、用 cost optimization 的框架看會誤判；多數歐洲業務 7+ 年回本（避免 4% revenue fine）。</p>
<h2 id="整合--下一步">整合 / 下一步</h2>
<h3 id="跟-postgresql--aurora-對位">跟 <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">PostgreSQL → Aurora</a> 對位</h3>
<p>Aurora Global Database 可簡化跨 region setup、但 residency filter 仍需 application 端；不是「Aurora 就解決 GDPR」。</p>
<h3 id="跟-multi-dc-mongodb-對位">跟 <a href="/blog/backend/01-database/vendors/mongodb/shard-expansion-multi-dc/" data-link-title="MongoDB Shard Expansion &#43; Multi-DC：Type F「不需要 parallel run」的 multi-region 例外" data-link-desc="MongoDB sharded cluster 加 shard &#43; 跨 DC expansion 是 Type F「topology re-layout」第 3 個 dogfood — 同時改 sharding &#43; replication topology &#43; region distribution；驗證 [#128](/report/data-topology-as-audit-dimension/) self-aware limitation 第 3 點「Type F 不需要 parallel run」claim 的例外（multi-region rollout 必須 parallel run &#43; 切流量）；涵蓋 chunk migration / replica set add member / cross-DC routing">Multi-DC MongoDB</a> 對位</h3>
<p>兩篇都是 multi-region rollout、但本文加合規維度；MongoDB 篇純 capacity + DR driver、本文加 residency constraint、結構不同。</p>
<h3 id="跟-128-self-aware-limitation-第-1-點對位">跟 #128 self-aware limitation 第 1 點對位</h3>
<p>本文驗證 <em>residency axis 候選</em>：</p>
<ul>
<li><strong>Yes 軸獨立</strong>：reverse-constrain topology + operational + application、且帶獨立 compliance 工作量（DPIA / evidence collection / DPO sign-off）</li>
<li><strong>作為 driver 不夠</strong>：methodology 把 residency 歸為 driver 太窄、忽略 cross-cutting constraint 性質</li>
</ul>
<p>未來 audit 可能擴 7 維（加 residency / compliance contract）；累積 PCI / HIPAA / SOX 等不同合規 case 後再評估。</p>
<h3 id="下一步議題">下一步議題</h3>
<ul>
<li><strong>Identity + Consistency + Residency 三軸候選統合</strong>：本批 3 篇分別驗證、未來累積 evidence 後考慮獨立 #129 卡 / 擴 audit 到 7-8 維</li>
<li><strong>Schrems II + new EU data transfer rules</strong>：跨大西洋資料傳輸法規變動快、playbook 半衰期短</li>
<li><strong>Data localization in China / Russia / India</strong>：類似 GDPR 但細節不同、未來 case 累積後評估</li>
</ul>
<h2 id="相關連結">相關連結</h2>
<ul>
<li>上游 vendor 頁：<a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a></li>
<li>平行 multi-region case：<a href="/blog/backend/01-database/vendors/mongodb/shard-expansion-multi-dc/" data-link-title="MongoDB Shard Expansion &#43; Multi-DC：Type F「不需要 parallel run」的 multi-region 例外" data-link-desc="MongoDB sharded cluster 加 shard &#43; 跨 DC expansion 是 Type F「topology re-layout」第 3 個 dogfood — 同時改 sharding &#43; replication topology &#43; region distribution；驗證 [#128](/report/data-topology-as-audit-dimension/) self-aware limitation 第 3 點「Type F 不需要 parallel run」claim 的例外（multi-region rollout 必須 parallel run &#43; 切流量）；涵蓋 chunk migration / replica set add member / cross-DC routing">MongoDB Shard + Multi-DC</a></li>
<li>平行 axis 候選驗證：<a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/migrate-to-aws-secrets-manager/" data-link-title="Vault → AWS Secrets Manager：「secret」不是「secret」、identity model 才是核心差異" data-link-desc="Vault → AWS Secrets Manager migration 表面是 secret store 替換、實際核心是 identity model 對位（Vault token &#43; policy vs AWS IAM &#43; resource policy）；驗證 [#128](/report/data-topology-as-audit-dimension/) self-aware limitation 提出的 identity axis 候選 — identity 是否獨立 audit 軸；5 個 production 踩雷（IAM principal 對位 / dynamic credential 對等失敗 / lease lifecycle 模型不同 / audit log 結構差 / 計費模型反轉）">Vault → AWS Secrets Manager</a>（identity 候選）/ <a href="/blog/backend/01-database/vendors/dynamodb/consistency-model-optimization/" data-link-title="DynamoDB Strongly Consistent → Eventually Consistent：same protocol, different contract" data-link-desc="DynamoDB consistency model 從 strongly consistent read 改 eventually consistent read 是 50% cost 優化但風險集中在 application contract — 同 vendor / 同 protocol / 同 table / 不同 read consistency；驗證 [#128](/report/data-topology-as-audit-dimension/) self-aware limitation 提出的 consistency axis 候選；涵蓋 read pattern audit / 5 個 production 踩雷">DynamoDB Consistency Model</a>（consistency 候選）</li>
<li>Methodology：<a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a> / <a href="/blog/report/data-topology-as-audit-dimension/" data-link-title="Data topology 是 process content 的第 6 audit 維度" data-link-desc="Process content 的 diff dimension audit 原本 5 維（schema / operational / paradigm / components / application change）漏了 *data topology* — 資料在 cluster / partition / region 之間的分佈拓樸；topology 不在既有 5 維任一個、但決定 re-sharding / partition redesign / multi-region rollout 的結構；本卡擴 audit 到 6 維、新增 Type F「Topology re-layout」結構">#128 self-aware limitation 第 1 點</a>（residency axis 候選驗證、本文是該驗證的 dogfood）</li>
</ul>
]]></content:encoded></item><item><title>Cosmos DB Multi-Region Write：active-active、LWW、custom merge、Strong + multi-region 互斥的 AP 取捨</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/cosmosdb/multi-region-write-conflict/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/cosmosdb/multi-region-write-conflict/</guid><description>&lt;p>Cosmos DB 是 &lt;em>AP 系統&lt;/em>（&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/cap/" data-link-title="CAP Theorem" data-link-desc="分散式系統在網路分區時一致性與可用性的取捨框架">CAP&lt;/a> 三選二、放棄跨 region linearizability 換取 multi-region write 可用性）。跨 region 寫同一筆 document 必然有 conflict、Cosmos DB 提供三種 resolution policy 處理：LWW（Last-Writer-Wins）、custom merge stored procedure、conflict feed manual reconciliation。本文先講 AP 取捨的硬約束（為什麼 Strong consistency 跟 multi-region write 互斥）、再進三種 resolution 機制、再進廣告 SLA vs 實測可用性的鏈路拆解（DB 端 SLA 不等於使用者體驗）。&lt;/p>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/cosmosdb/" data-link-title="Azure Cosmos DB" data-link-desc="全球分散式 multi-model DB、5 個 consistency levels、Microsoft 自家 dogfood 證據">Cosmos DB vendor 頁&lt;/a> 的深度展開、也是 &lt;em>Strong + multi-region 互斥&lt;/em> 議題的 SSoT 主寫位置（&lt;a href="../consistency-levels-engineering/">consistency-levels-engineering&lt;/a> cross-link 過來、不展開）。Case anchor 是 &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/minecraft-earth-cosmos-db-global/" data-link-title="9.C11 Minecraft Earth：Azure Cosmos DB 上的全球分散式 AR 遊戲" data-link-desc="Minecraft Earth 用 Cosmos DB 跨地區分散、測試到 100 萬 RU/s 仍維持承諾延遲">9.C11 Minecraft Earth&lt;/a>（AR 遊戲跨 region 寫入、5 consistency level + multi-region SLA）+ &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/asos-cosmos-db-black-friday/" data-link-title="9.C21 ASOS：Cosmos DB 在 Black Friday 撐 1.67 億請求" data-link-desc="ASOS 在 2016 Black Friday 用 Azure Cosmos DB 撐 24 小時 1.67 億請求、3500 req/sec、48ms 平均延遲">9.C21 ASOS&lt;/a>（Black Friday 全球零售）+ &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/toyota-connected-mongodb-telematics-iot/" data-link-title="9.C38 Toyota Connected：MongoDB Atlas 撐 900 萬車輛 telematics、月 180 億 transaction" data-link-desc="Toyota Connected 用 MongoDB Atlas 撐 Safety Connect 900 萬車、月 180 億 transaction、緊急訊號 3 秒內到 agent">9.C38 Toyota Connected&lt;/a>（鏈路 SLA 拆解、跨 vendor 適用做 frame anchor）。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Cosmos DB 適用度前置判讀&lt;/strong>：本篇假設 workload 已通過 Cosmos DB 適用度四層 framing（API model 三型遷移路徑 / RU 思維轉換成本 / multi-model 差異化是否真用上 / 跨雲 hedging vs 單雲 lock-in）— 詳見 &lt;a href="../mongodb-api-vs-sql-api/#%e5%9b%9b%e5%b1%a4-framingvendor-selection-%e7%9a%84%e7%9c%9f%e5%af%a6%e6%b1%ba%e7%ad%96%e8%bb%b8">mongodb-api-vs-sql-api 開頭四層 framing&lt;/a>、本篇不重複展開。Multi-region write + conflict resolution 是 &lt;em>已選 Cosmos DB 後&lt;/em> 的拓樸決策；strong global consistency 必要的 workload 應走 Spanner 或 Cosmos DB Strong（單一 write region）、不是用 LWW 補。&lt;/p></description><content:encoded><![CDATA[<p>Cosmos DB 是 <em>AP 系統</em>（<a href="/blog/backend/knowledge-cards/cap/" data-link-title="CAP Theorem" data-link-desc="分散式系統在網路分區時一致性與可用性的取捨框架">CAP</a> 三選二、放棄跨 region linearizability 換取 multi-region write 可用性）。跨 region 寫同一筆 document 必然有 conflict、Cosmos DB 提供三種 resolution policy 處理：LWW（Last-Writer-Wins）、custom merge stored procedure、conflict feed manual reconciliation。本文先講 AP 取捨的硬約束（為什麼 Strong consistency 跟 multi-region write 互斥）、再進三種 resolution 機制、再進廣告 SLA vs 實測可用性的鏈路拆解（DB 端 SLA 不等於使用者體驗）。</p>
<p>本文是 <a href="/blog/backend/01-database/vendors/cosmosdb/" data-link-title="Azure Cosmos DB" data-link-desc="全球分散式 multi-model DB、5 個 consistency levels、Microsoft 自家 dogfood 證據">Cosmos DB vendor 頁</a> 的深度展開、也是 <em>Strong + multi-region 互斥</em> 議題的 SSoT 主寫位置（<a href="../consistency-levels-engineering/">consistency-levels-engineering</a> cross-link 過來、不展開）。Case anchor 是 <a href="/blog/backend/09-performance-capacity/cases/minecraft-earth-cosmos-db-global/" data-link-title="9.C11 Minecraft Earth：Azure Cosmos DB 上的全球分散式 AR 遊戲" data-link-desc="Minecraft Earth 用 Cosmos DB 跨地區分散、測試到 100 萬 RU/s 仍維持承諾延遲">9.C11 Minecraft Earth</a>（AR 遊戲跨 region 寫入、5 consistency level + multi-region SLA）+ <a href="/blog/backend/09-performance-capacity/cases/asos-cosmos-db-black-friday/" data-link-title="9.C21 ASOS：Cosmos DB 在 Black Friday 撐 1.67 億請求" data-link-desc="ASOS 在 2016 Black Friday 用 Azure Cosmos DB 撐 24 小時 1.67 億請求、3500 req/sec、48ms 平均延遲">9.C21 ASOS</a>（Black Friday 全球零售）+ <a href="/blog/backend/09-performance-capacity/cases/toyota-connected-mongodb-telematics-iot/" data-link-title="9.C38 Toyota Connected：MongoDB Atlas 撐 900 萬車輛 telematics、月 180 億 transaction" data-link-desc="Toyota Connected 用 MongoDB Atlas 撐 Safety Connect 900 萬車、月 180 億 transaction、緊急訊號 3 秒內到 agent">9.C38 Toyota Connected</a>（鏈路 SLA 拆解、跨 vendor 適用做 frame anchor）。</p>
<blockquote>
<p><strong>Cosmos DB 適用度前置判讀</strong>：本篇假設 workload 已通過 Cosmos DB 適用度四層 framing（API model 三型遷移路徑 / RU 思維轉換成本 / multi-model 差異化是否真用上 / 跨雲 hedging vs 單雲 lock-in）— 詳見 <a href="../mongodb-api-vs-sql-api/#%e5%9b%9b%e5%b1%a4-framingvendor-selection-%e7%9a%84%e7%9c%9f%e5%af%a6%e6%b1%ba%e7%ad%96%e8%bb%b8">mongodb-api-vs-sql-api 開頭四層 framing</a>、本篇不重複展開。Multi-region write + conflict resolution 是 <em>已選 Cosmos DB 後</em> 的拓樸決策；strong global consistency 必要的 workload 應走 Spanner 或 Cosmos DB Strong（單一 write region）、不是用 LWW 補。</p></blockquote>
<h2 id="問題情境active-active-的-conflict-是必然代價">問題情境：active-active 的 conflict 是必然代價</h2>
<p>典型觸發場景：產品要 global active-active（每個 region 都能寫、低延遲）、Cosmos DB 是 AP 系統、不像 Spanner 用 quorum 強一致；跨 region 寫同一筆 document 必然有 conflict、團隊不知道「conflict 真的發生時、誰贏 / 怎麼處理 / 業務語義保不保得住」。</p>
<p>讀者徵兆：</p>
<ul>
<li>「multi-region write 開了、user 在 A region 寫『加入購物車』、B region 寫『移除購物車』、最後哪個贏」</li>
<li>「LWW 用 timestamp 決定、client clock skew 不就破壞了嗎」</li>
<li>「conflict feed 是什麼、要不要消費」</li>
<li>「multi-region write 開了之後 consistency level 還能設 Strong 嗎」</li>
<li>「廣告寫 99.999%、為什麼實測只有 99%」</li>
</ul>
<p>真實壓力：購物車跨 region 寫入丟失、遊戲玩家狀態跨 region 衝突回滾、IoT device 跨 region 寫 telemetry 後消失。這些事故的根因不是 bug、是 multi-region write 的 <em>設計取捨</em>、需要在 selection 階段就決定 conflict resolution policy。</p>
<h2 id="核心機制">核心機制</h2>
<h3 id="ap-取捨的硬約束為什麼-strong--multi-region-write-互斥">AP 取捨的硬約束：為什麼 Strong + multi-region write 互斥</h3>
<p>Cosmos DB 是 AP 系統（在 partition 的情況下選 availability 跟 partition tolerance、放棄 cross-region linearizability）。multi-region write 的兩個前置條件：</p>
<ul>
<li>account 開啟 <code>enableMultipleWriteLocations = true</code></li>
<li>consistency level <em>不能設 Strong</em>（multi-region write 跟 Strong 互斥、時間敏感 claim、查 <a href="https://learn.microsoft.com/azure/cosmos-db/consistency-levels">最新文件</a>）</li>
</ul>
<p>為什麼互斥（CAP 三選二的硬約束）：</p>
<ul>
<li><strong>Strong consistency</strong> 在 Cosmos DB 的實作是 quorum-based linearizable read — 確保 read 拿到最新 commit、需要 <em>單一 write region</em> 來保證寫入順序</li>
<li><strong>Multi-region write</strong> 是 active-active、每個 region 都能寫 — 不存在「單一 write region」、寫入是 LWW-based eventual consistency</li>
<li>兩者在技術上 <em>不能同時成立</em> — 不是 Microsoft 工程選擇問題、是 distributed system 的基本限制（跟 Spanner 用 Paxos quorum + TrueTime 不同的設計路徑）</li>
</ul>
<p>對 selection 的意義：產品要「全球都能寫」就接受 eventual consistency；產品要「全球 linearizable」就轉 Spanner / Aurora DSQL、Cosmos DB 不是替代品。把 Cosmos DB Strong 跟 Spanner external consistency 等同視之是 <em>常見的選型誤判</em>。</p>
<p><a href="../consistency-levels-engineering/">consistency-levels-engineering</a> 的 Strong 段只 cross-link 過來、不展開 conflict resolution 細節 — 本篇是 SSoT 主寫位置。</p>
<h3 id="conflict-偵測">Conflict 偵測</h3>
<p>同一 document（partition key + id）在多 region 並發寫入、Cosmos DB 偵測為 conflict。偵測機制基於 LSN（log sequence number）、不是 timestamp — 兩個 region 對同一 document 寫入時、replication 過程比對 LSN 發現分歧、進 resolution。</p>
<h3 id="三種-conflict-resolution-policy">三種 conflict resolution policy</h3>
<h4 id="lwwlast-writer-wins預設">LWW（Last-Writer-Wins、預設）</h4>
<ul>
<li>機制：用 <code>_ts</code>（system timestamp）或自訂 numeric property、value 大的贏</li>
<li>副作用：clock skew 在 ms 級就能讓「先寫的反而贏」、業務邏輯破洞</li>
<li>適合：純覆寫場景（如玩家位置最新值、IoT 最新讀數）— write 順序不影響業務語義</li>
</ul>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="s2">&#34;conflictResolutionPolicy&#34;</span><span class="err">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="nt">&#34;mode&#34;</span><span class="p">:</span> <span class="s2">&#34;LastWriterWins&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="nt">&#34;conflictResolutionPath&#34;</span><span class="p">:</span> <span class="s2">&#34;/customTimestamp&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><h4 id="custom-merge-stored-procedure">Custom merge stored procedure</h4>
<ul>
<li>機制：寫一個 JavaScript stored proc、conflict 時 Cosmos DB 呼叫、proc 回傳 merge 結果</li>
<li>適合：要保留業務語義的場景（購物車 merge = union 兩邊 items、計數器 merge = sum、status 機器 merge = 狀態圖規則）</li>
<li>風險：stored proc 在 Cosmos DB JavaScript runtime 跑、有 timeout / RU 限制；複雜 merge 邏輯難 debug</li>
</ul>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="s2">&#34;conflictResolutionPolicy&#34;</span><span class="err">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="nt">&#34;mode&#34;</span><span class="p">:</span> <span class="s2">&#34;Custom&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="nt">&#34;conflictResolutionProcedure&#34;</span><span class="p">:</span> <span class="s2">&#34;dbs/mydb/colls/mycoll/sprocs/resolveCart&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><h4 id="conflict-feed-manual-reconciliation">Conflict feed manual reconciliation</h4>
<ul>
<li>機制：Cosmos DB 把 conflict 寫入 conflict feed、不自動解決、app 自行消費並 reconcile</li>
<li>適合：conflict 需要人工 / 業務流程判斷、不能 auto-resolve（如金融交易、合規場景）</li>
<li>風險：feed 不消費就累積、後續分析失準；app 需要實作 reconcile 流程</li>
</ul>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="s2">&#34;conflictResolutionPolicy&#34;</span><span class="err">:</span> <span class="p">{</span> <span class="nt">&#34;mode&#34;</span><span class="p">:</span> <span class="s2">&#34;Custom&#34;</span> <span class="p">}</span></span></span></code></pre></div><p>（沒指 procedure、conflict 全進 feed、app 用 SDK <code>ReadConflictsAsync()</code> / Change Feed Processor pattern 消費）</p>
<h3 id="跟其他-vendor-對比">跟其他 vendor 對比</h3>
<ul>
<li><strong>DynamoDB Global Tables</strong>：也是 LWW、<em>無</em> custom merge、<em>無</em> conflict feed — 行為比 Cosmos DB 簡單但彈性少</li>
<li><strong>Spanner</strong>：用 Paxos quorum、<em>不會有 conflict</em>（CP 系統、可用性換一致性）— 跨 region write 需 quorum、latency 100-200ms</li>
<li><strong>Aurora Global Database</strong>：single-primary（一個 region 寫、其他 region 讀）、不是真 multi-region write、無 conflict</li>
</ul>
<p>對應 knowledge cards：<a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">stale-read</a>、<a href="/blog/backend/knowledge-cards/rpo/" data-link-title="RPO" data-link-desc="說明恢復點目標如何定義可接受資料損失範圍">rpo</a>、<a href="/blog/backend/knowledge-cards/rto/" data-link-title="RTO" data-link-desc="說明恢復時間目標如何約束事故回復策略">rto</a>。</p>
<h2 id="操作流程">操作流程</h2>
<h3 id="開啟-multi-region-write">開啟 multi-region write</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">az cosmosdb update --name mycosmos --resource-group myrg <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  --enable-multiple-write-locations <span class="nb">true</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --locations <span class="nv">regionName</span><span class="o">=</span>eastus <span class="nv">failoverPriority</span><span class="o">=</span><span class="m">0</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  --locations <span class="nv">regionName</span><span class="o">=</span>westeurope <span class="nv">failoverPriority</span><span class="o">=</span><span class="m">1</span></span></span></code></pre></div><p>開啟後 <em>不能直接關回</em>、要 disable + 改 region 配置 + re-enable、有停機窗口。</p>
<h3 id="設定-lww-policycontainer-層">設定 LWW policy（container 層）</h3>
<p>建 container 時指定、可事後改但 conflict 行為以新 policy 為準（既有 conflict 不會重 resolve）。預設用 <code>_ts</code> 比較；改成 customTimestamp 時要保證 application 寫入時 <em>用單調遞增</em> 的 timestamp source（不能用 client clock）。</p>
<h3 id="設定-custom-merge">設定 custom merge</h3>
<p>建 stored proc：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="ln">1</span><span class="cl"><span class="kd">function</span> <span class="nx">resolveCart</span><span class="p">(</span><span class="nx">incomingItem</span><span class="p">,</span> <span class="nx">existingItem</span><span class="p">,</span> <span class="nx">isTombstone</span><span class="p">,</span> <span class="nx">conflictingItems</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="c1">// 範例：merge 購物車 items（取 union）
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"></span>  <span class="kd">var</span> <span class="nx">merged</span> <span class="o">=</span> <span class="nx">existingItem</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">  <span class="nx">merged</span><span class="p">.</span><span class="nx">items</span> <span class="o">=</span> <span class="nx">mergeArrays</span><span class="p">(</span><span class="nx">existingItem</span><span class="p">.</span><span class="nx">items</span><span class="p">,</span> <span class="nx">incomingItem</span><span class="p">.</span><span class="nx">items</span><span class="p">);</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">  <span class="nx">merged</span><span class="p">.</span><span class="nx">_ts</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">max</span><span class="p">(</span><span class="nx">existingItem</span><span class="p">.</span><span class="nx">_ts</span><span class="p">,</span> <span class="nx">incomingItem</span><span class="p">.</span><span class="nx">_ts</span><span class="p">);</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">  <span class="nx">__</span><span class="p">.</span><span class="nx">response</span><span class="p">.</span><span class="nx">setBody</span><span class="p">(</span><span class="nx">merged</span><span class="p">);</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="s2">&#34;conflictResolutionPolicy&#34;</span><span class="err">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="nt">&#34;mode&#34;</span><span class="p">:</span> <span class="s2">&#34;Custom&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="nt">&#34;conflictResolutionProcedure&#34;</span><span class="p">:</span> <span class="s2">&#34;dbs/mydb/colls/mycoll/sprocs/resolveCart&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>驗證：proc 內處理 timeout / exception；測 edge case（空 array / null / 並發 3+ region 寫入）。</p>
<h3 id="消費-conflict-feed">消費 conflict feed</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-csharp" data-lang="csharp"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">// .NET SDK</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="kt">var</span> <span class="n">iterator</span> <span class="p">=</span> <span class="n">container</span><span class="p">.</span><span class="n">GetItemQueryIterator</span><span class="p">&lt;</span><span class="n">ConflictProperties</span><span class="p">&gt;(</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="s">&#34;SELECT * FROM c&#34;</span><span class="p">);</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="k">while</span> <span class="p">(</span><span class="n">iterator</span><span class="p">.</span><span class="n">HasMoreResults</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="kt">var</span> <span class="n">response</span> <span class="p">=</span> <span class="k">await</span> <span class="n">iterator</span><span class="p">.</span><span class="n">ReadNextAsync</span><span class="p">();</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">conflict</span> <span class="k">in</span> <span class="n">response</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">        <span class="k">await</span> <span class="n">ProcessConflict</span><span class="p">(</span><span class="n">conflict</span><span class="p">);</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>用 Change Feed Processor pattern 把 conflict feed 當 stream 消費、寫到 reconcile queue、由業務流程處理。</p>
<h3 id="驗證點">驗證點</h3>
<ul>
<li>跨 region 並發寫測試（synthetic load）、觀察 conflict count / resolution result</li>
<li>Custom merge stored proc 跑過 edge case（exception / null / 並發 3+）</li>
<li>Conflict feed 不積壓（lag &lt; 5 min）</li>
<li>Region 故障時 application 仍能寫（active-active 設計、不需 manual failover）</li>
</ul>
<h2 id="失敗模式">失敗模式</h2>
<h3 id="failure-1全用-lww--用-server-timestamp">Failure 1：全用 LWW + 用 server timestamp</h3>
<p>clock skew 在 ms 級可能讓「先寫的反而贏」、業務邏輯破洞。常見徵兆：使用者反映「我明明先按確認、後來改的反而是舊的」、debug 才發現是跨 region clock skew。</p>
<p>修：</p>
<ul>
<li>用 <code>customTimestamp</code> 從 application 端 monotonic source 取（如 Snowflake ID、HLC、Lamport clock）</li>
<li>或改用 custom merge stored proc、用業務邏輯而非 timestamp 決勝</li>
<li>或拆 collection、把 conflict 高的 collection 用 stored proc、低的用 LWW</li>
</ul>
<h3 id="failure-2業務語義不適合-lww">Failure 2：業務語義不適合 LWW</h3>
<p>購物車（要 union）、計數器（要 sum）、status 機器（要狀態圖）全用 LWW = <em>資料丟失</em>。LWW 的設計假設是「最新 write 就是正確答案」、但很多業務語義不是覆寫關係。</p>
<p>修：盤點 collection 的業務語義、選對應 resolution policy：</p>
<ul>
<li>覆寫關係 → LWW</li>
<li>累積關係 → custom merge stored proc（union / sum / set 合併）</li>
<li>狀態機 → custom merge stored proc（按狀態圖規則 resolve）</li>
<li>需要人工裁決 → conflict feed</li>
</ul>
<h3 id="failure-3custom-merge-stored-proc-沒測-edge-case">Failure 3：Custom merge stored proc 沒測 edge case</h3>
<p>proc throw exception 時 Cosmos DB 行為：conflict 留 feed、不會自動 retry。團隊以為 proc 跑了就沒事、實際 conflict 累積在 feed、後續分析失準。</p>
<p>修：proc 內部 try-catch、log exception、確保 <em>任何輸入都能 return 一個合理結果</em>（即使是 fallback 到 LWW）；定期掃 conflict feed 檢查積壓。</p>
<h3 id="failure-4不消費-conflict-feed">Failure 4：不消費 conflict feed</h3>
<p>選 manual mode 後忘記實作 feed consumer、conflict 累積、後續分析失準。常見徵兆：feed lag metric alert、或業務反映「資料對不上」、最後發現 conflict feed 裡躺著一堆未處理的 conflict。</p>
<p>修：選 conflict feed mode 前先實作 consumer pipeline（Azure Function trigger on Change Feed / 自建 worker）；設 alert：feed lag &gt; 5 min 通知。</p>
<h3 id="failure-5期待-multi-region-write-還有-strong-consistency">Failure 5：期待 multi-region write 還有 Strong consistency</h3>
<p>兩者互斥、開啟 multi-region write 後 Strong 自動 downgrade（或拒絕設定、時間敏感、查最新文件）。團隊以為「multi-region + Strong = 全球 linearizable」、底層是設計 incompatibility。</p>
<p>修：在 selection 階段就決定「要 active-active write 還是要 Strong」 — 兩者只能擇一。要全球 linearizable 轉 Spanner / Aurora DSQL、要 active-active 就接受 eventual / session / bounded staleness。</p>
<h3 id="failure-6跨-region-寫入後立即同-session-read-看不到">Failure 6：跨 region 寫入後立即同 session read 看不到</h3>
<p>session token 沒跨 region 傳遞、看似 inconsistency 其實是 session 沒對齊。典型 anti-pattern：service A 在 region 1 寫、用 region 1 session token；service B 在 region 2 讀、沒拿到 A 的 token、看不到 A 的寫。</p>
<p>修：session token 隨 request 傳遞（通常進 HTTP header）；或改 account 層 Bounded staleness（提供跨 session 的 K/T bound）；見 <a href="../consistency-levels-engineering/">consistency-levels-engineering</a> 的 session token 管理段。</p>
<h3 id="failure-7region-故障時的-failover-邏輯誤判">Failure 7：Region 故障時的 failover 邏輯誤判</h3>
<p>multi-region write 已是 active-active、<em>不需要 manual failover</em> — 一個 region 掛、其他 region 自動承接寫入。但若用了 <code>failoverPriority</code> 配置、failover 邏輯仍要審 — priority 是 <em>當 multi-region read 切到哪個 region 為 primary</em>、不是 active-active 的 routing。</p>
<p>修：multi-region write 場景不用依賴 failoverPriority、用 Traffic Manager / Front Door 做 region routing；application 端 SDK 配置 <code>PreferredLocations</code> 讓 SDK 自己選 nearest region。</p>
<h2 id="容量與觀測">容量與觀測</h2>
<ul>
<li>必看 metric：<code>ConflictCount</code>、<code>ReplicationLatency</code> per region pair、conflict feed lag</li>
<li>Conflict rate 監控：正常 &lt; 0.01%、突增代表 hot key 或 region 同步異常</li>
<li>Cost 影響：multi-region write 開啟後、寫入成本 × region 數（每個 region 都 replicate）— 3 region active-active = 3x write <a href="/blog/backend/knowledge-cards/request-unit/" data-link-title="Request Unit" data-link-desc="Cosmos DB 的容量抽象單位、1 RU = 1KB document strong-consistent read 的 CPU &#43; memory &#43; IOPS 綜合 cost、寫 ~5 RU、複雜 query 數百 RU">Request Unit</a> cost</li>
<li>對應 <a href="/blog/backend/09-performance-capacity/capacity-planning/" data-link-title="9.6 容量規劃模型" data-link-desc="peak forecast、headroom budget、growth curve、autoscaling sizing">9.6 容量規劃模型</a>：multi-region write multiplier 進 sizing</li>
<li>對應 <a href="/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">4.20 Observability Evidence Package</a>：conflict rate 當 reliability evidence</li>
<li>Alert：conflict rate &gt; 0.1%、conflict feed lag &gt; 5 min、cross-region replication lag &gt; SLA</li>
</ul>
<h3 id="廣告-sla-vs-實測可用性鏈路拆解本章合成-frame">廣告 SLA vs 實測可用性鏈路拆解（本章合成 frame）</h3>
<p>9.C11 Minecraft Earth 平台揭露的 Cosmos DB SLA：</p>
<ul>
<li>single-region 99.99%</li>
<li>multi-region 99.999%</li>
</ul>
<p>這是 <em>DB 端 SLA</em>、不是 <em>端到端系統 SLA</em>。真實 production 系統的可用性是鏈路乘積：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">實測可用性 = DB SLA × 網路 SLA × 應用層 SLA × 客戶端可達性</span></span></code></pre></div><p><a href="/blog/backend/09-performance-capacity/cases/toyota-connected-mongodb-telematics-iot/" data-link-title="9.C38 Toyota Connected：MongoDB Atlas 撐 900 萬車輛 telematics、月 180 億 transaction" data-link-desc="Toyota Connected 用 MongoDB Atlas 撐 Safety Connect 900 萬車、月 180 億 transaction、緊急訊號 3 秒內到 agent">9.C38 Toyota Connected</a> 揭露「99.99% target vs 99% 實測」段的觀察：兩個 9 的差距 <em>不是</em> MongoDB / Atlas 自身問題、是 end-to-end 鏈路（車輛無線網路 / cellular tower / cloud network / event bus / microservice / DB cluster 任一環節掉都會打掉可用性）。Cosmos DB multi-region write 同模型：</p>
<ul>
<li>多 region active-active 可解 <em>DB 端可用性</em>、但網路 / 應用層任一掉、實測仍 &lt; 99.99%</li>
<li>廣告 99.999% 是 multi-region availability zone 級、<em>不是</em> 「使用者 request 成功率」</li>
</ul>
<p>引用時必須明示：Cosmos DB multi-region 廣告 99.999% 是 DB 端、要算實測可用性必須補網路 / 應用層 SLA 乘積、Toyota case 的「99% 實測」揭露的就是這個鏈路問題、跨 vendor 都適用。</p>
<p>跟 conflict resolution 的關係：多 region 高可用性 <em>買來</em> 的代價是 conflict、conflict rate 是 reliability 的暗稅 — 廣告 SLA 不計 conflict 處理成本。production 設計要把「conflict resolution 的工程成本」加進 multi-region write 的 ROI 評估。</p>
<h2 id="邊界與整合">邊界與整合</h2>
<ul>
<li>Sibling deep articles：<a href="../consistency-levels-engineering/">consistency-levels-engineering</a>（multi-region write 跟 Strong 互斥的 cross-link 來源）、<a href="../partition-key-design/">partition-key-design</a>（hot partition 會放大 conflict）、<a href="../ru-cost-model-sizing/">ru-cost-model-sizing</a>（multi-region cost × region 數）</li>
<li>跟 <a href="/blog/backend/01-database/vendors/spanner/" data-link-title="Google Cloud Spanner" data-link-desc="全球分散式 strong-consistency OLTP、TrueTime API、線性擴展到 10 億 req/sec">Spanner vendor</a> 對比：CP vs AP、無 conflict vs LWW / custom</li>
<li>跟 DynamoDB Global Tables 對比：兩者都 LWW、Cosmos DB 多 custom merge + conflict feed</li>
<li>跟 1.x 章節：<a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 全球分散式 OLTP</a> 把 multi-region write 模式並陳</li>
<li>Knowledge cards：<a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">stale-read</a> / <a href="/blog/backend/knowledge-cards/rpo/" data-link-title="RPO" data-link-desc="說明恢復點目標如何定義可接受資料損失範圍">rpo</a> / <a href="/blog/backend/knowledge-cards/rto/" data-link-title="RTO" data-link-desc="說明恢復時間目標如何約束事故回復策略">rto</a></li>
<li>Anti-recommendation：single-region write + cross-region read replica 在大多數情況更便宜、更易推理；只有 <em>write residency</em> 是產品契約（合規 / latency / 業務需求）時才升 multi-region write</li>
</ul>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/cosmosdb/" data-link-title="Azure Cosmos DB" data-link-desc="全球分散式 multi-model DB、5 個 consistency levels、Microsoft 自家 dogfood 證據">Cosmos DB vendor overview</a> — 本文是該頁尾 multi-region write + conflict resolution backlog 的深度展開</li>
<li><a href="/blog/backend/09-performance-capacity/cases/minecraft-earth-cosmos-db-global/" data-link-title="9.C11 Minecraft Earth：Azure Cosmos DB 上的全球分散式 AR 遊戲" data-link-desc="Minecraft Earth 用 Cosmos DB 跨地區分散、測試到 100 萬 RU/s 仍維持承諾延遲">9.C11 Minecraft Earth case</a> — multi-region 99.999% / single-region 99.99% SLA 來源</li>
<li><a href="/blog/backend/09-performance-capacity/cases/asos-cosmos-db-black-friday/" data-link-title="9.C21 ASOS：Cosmos DB 在 Black Friday 撐 1.67 億請求" data-link-desc="ASOS 在 2016 Black Friday 用 Azure Cosmos DB 撐 24 小時 1.67 億請求、3500 req/sec、48ms 平均延遲">9.C21 ASOS case</a> — 全球零售 multi-region 補充</li>
<li><a href="/blog/backend/09-performance-capacity/cases/toyota-connected-mongodb-telematics-iot/" data-link-title="9.C38 Toyota Connected：MongoDB Atlas 撐 900 萬車輛 telematics、月 180 億 transaction" data-link-desc="Toyota Connected 用 MongoDB Atlas 撐 Safety Connect 900 萬車、月 180 億 transaction、緊急訊號 3 秒內到 agent">9.C38 Toyota Connected case</a> — 鏈路 SLA 拆解 frame anchor（跨 vendor 適用）</li>
<li><a href="../consistency-levels-engineering/">consistency-levels-engineering</a> — Strong + multi-region 互斥的 cross-link 目的地</li>
<li><a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">Stale Read 卡片</a> / <a href="/blog/backend/knowledge-cards/rpo/" data-link-title="RPO" data-link-desc="說明恢復點目標如何定義可接受資料損失範圍">RPO 卡片</a> / <a href="/blog/backend/knowledge-cards/rto/" data-link-title="RTO" data-link-desc="說明恢復時間目標如何約束事故回復策略">RTO 卡片</a> — 概念基底</li>
<li>官方：<a href="https://learn.microsoft.com/azure/cosmos-db/conflict-resolution-policies">Cosmos DB conflict resolution</a> / <a href="https://learn.microsoft.com/azure/cosmos-db/how-to-multi-master">Multi-region writes</a></li>
</ul>
]]></content:encoded></item><item><title>Aurora Global Database：跨 region async replication、&lt; 1 秒 lag 與合規 anti-recommendation</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/aurora/global-database-multi-region/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/aurora/global-database-multi-region/</guid><description>&lt;p>Aurora Global Database 是 &lt;em>跨 region async replication&lt;/em>、&amp;lt; 1 秒 typical lag、最多 5 個 secondary region — 看起來是 multi-region OLTP 的標準解、但 &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/standard-chartered-aurora-banking/" data-link-title="9.C14 Standard Chartered：受監管銀行的 Aurora 4000 TPS 容量提升" data-link-desc="Standard Chartered 銀行遷移到 Aurora 後吞吐量提升 10 倍至 4000 TPS、跨 7 個受監管市場">9.C14 Standard Chartered&lt;/a> 揭露一個受監管產業的 anti-recommendation：合規禁止跨境複製場景下、Global Database &lt;em>違反合規&lt;/em>、要改用每市場獨立 cluster + 應用層市場切換。本文展開 Global Database 適用條件、跟 cross-AZ failover 的 RTO 數量級差、合規邊界、跟 Aurora DSQL / Spanner / CockroachDB 的決策樹。&lt;/p>
&lt;p>本文不是 Aurora overview（請看 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&amp;#43;75% 效能改善的 production 證據">Aurora vendor 頁&lt;/a>）— 而是 Global Database 的實作層教學。前置閱讀建議 &lt;a href="../storage-architecture/">Aurora storage architecture&lt;/a>（理解 storage-level replication）、&lt;a href="../cross-az-failover-rto/">Aurora cross-AZ failover RTO&lt;/a>（對照單 region failover）。&lt;/p>
&lt;h2 id="問題情境">問題情境&lt;/h2>
&lt;p>典型觸發場景：global SaaS / 跨地理金融服務、需要 region-level DR（us-east-1 整 region 失效時 &amp;lt; 5 分鐘恢復寫入）、或跨地理 read（歐洲用戶查美國 primary 延遲 100ms+ 不可接受）、但又不到「multi-region active-active write」需求。&lt;/p>
&lt;p>讀者常見的具體疑問：&lt;/p>
&lt;ul>
&lt;li>「Global Database 是 sync 還是 async？lag 多少？」&lt;/li>
&lt;li>「Secondary region 可以寫嗎？」&lt;/li>
&lt;li>「Region failover 流程跟 cross-AZ 一樣嗎？」&lt;/li>
&lt;li>「跟 Aurora DSQL / Spanner / CockroachDB 怎麼選？」&lt;/li>
&lt;li>「合規場景一定要用 Global Database 嗎？」&lt;/li>
&lt;/ul>
&lt;p>進一步問題：Global Database 對一般 SaaS 是合理的 DR + 跨地理 read 工具、但對 &lt;em>受監管產業&lt;/em> 是反指標。&lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/standard-chartered-aurora-banking/" data-link-title="9.C14 Standard Chartered：受監管銀行的 Aurora 4000 TPS 容量提升" data-link-desc="Standard Chartered 銀行遷移到 Aurora 後吞吐量提升 10 倍至 4000 TPS、跨 7 個受監管市場">9.C14 Standard Chartered&lt;/a> 7 個受監管市場、各自獨立 Aurora cluster、不用 Global Database — 不是技術不夠、是合規要求「資料不能跨境複製」。讀者規劃 multi-region 架構時、合規維度要在技術維度之前判斷。&lt;/p>
&lt;h2 id="核心機制跨-region-async-storage-replication">核心機制：跨 region async storage replication&lt;/h2>
&lt;p>Aurora Global Database 的 first-class concept 是 &lt;em>跨 region storage-level async replication&lt;/em>。跟 logical replication / streaming replication 不同、Global Database 在 storage layer 複製、lag 上限相對穩定。&lt;/p></description><content:encoded><![CDATA[<p>Aurora Global Database 是 <em>跨 region async replication</em>、&lt; 1 秒 typical lag、最多 5 個 secondary region — 看起來是 multi-region OLTP 的標準解、但 <a href="/blog/backend/09-performance-capacity/cases/standard-chartered-aurora-banking/" data-link-title="9.C14 Standard Chartered：受監管銀行的 Aurora 4000 TPS 容量提升" data-link-desc="Standard Chartered 銀行遷移到 Aurora 後吞吐量提升 10 倍至 4000 TPS、跨 7 個受監管市場">9.C14 Standard Chartered</a> 揭露一個受監管產業的 anti-recommendation：合規禁止跨境複製場景下、Global Database <em>違反合規</em>、要改用每市場獨立 cluster + 應用層市場切換。本文展開 Global Database 適用條件、跟 cross-AZ failover 的 RTO 數量級差、合規邊界、跟 Aurora DSQL / Spanner / CockroachDB 的決策樹。</p>
<p>本文不是 Aurora overview（請看 <a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora vendor 頁</a>）— 而是 Global Database 的實作層教學。前置閱讀建議 <a href="../storage-architecture/">Aurora storage architecture</a>（理解 storage-level replication）、<a href="../cross-az-failover-rto/">Aurora cross-AZ failover RTO</a>（對照單 region failover）。</p>
<h2 id="問題情境">問題情境</h2>
<p>典型觸發場景：global SaaS / 跨地理金融服務、需要 region-level DR（us-east-1 整 region 失效時 &lt; 5 分鐘恢復寫入）、或跨地理 read（歐洲用戶查美國 primary 延遲 100ms+ 不可接受）、但又不到「multi-region active-active write」需求。</p>
<p>讀者常見的具體疑問：</p>
<ul>
<li>「Global Database 是 sync 還是 async？lag 多少？」</li>
<li>「Secondary region 可以寫嗎？」</li>
<li>「Region failover 流程跟 cross-AZ 一樣嗎？」</li>
<li>「跟 Aurora DSQL / Spanner / CockroachDB 怎麼選？」</li>
<li>「合規場景一定要用 Global Database 嗎？」</li>
</ul>
<p>進一步問題：Global Database 對一般 SaaS 是合理的 DR + 跨地理 read 工具、但對 <em>受監管產業</em> 是反指標。<a href="/blog/backend/09-performance-capacity/cases/standard-chartered-aurora-banking/" data-link-title="9.C14 Standard Chartered：受監管銀行的 Aurora 4000 TPS 容量提升" data-link-desc="Standard Chartered 銀行遷移到 Aurora 後吞吐量提升 10 倍至 4000 TPS、跨 7 個受監管市場">9.C14 Standard Chartered</a> 7 個受監管市場、各自獨立 Aurora cluster、不用 Global Database — 不是技術不夠、是合規要求「資料不能跨境複製」。讀者規劃 multi-region 架構時、合規維度要在技術維度之前判斷。</p>
<h2 id="核心機制跨-region-async-storage-replication">核心機制：跨 region async storage replication</h2>
<p>Aurora Global Database 的 first-class concept 是 <em>跨 region storage-level async replication</em>。跟 logical replication / streaming replication 不同、Global Database 在 storage layer 複製、lag 上限相對穩定。</p>
<p><strong>Architecture</strong>：</p>
<ul>
<li>Primary region：1 個 writer cluster + N read replica</li>
<li>Secondary region：最多 5 個 secondary region、每 region N 個 reader-only cluster（最多 16 個 reader 含 1 個 headless）</li>
<li>Storage replication：primary region 寫 storage 後 <em>async</em> push 到 secondary region storage、不等 ack</li>
</ul>
<p><strong>Write path</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Application
</span></span><span class="line"><span class="ln">2</span><span class="cl">    ↓ writer endpoint (primary region only)
</span></span><span class="line"><span class="ln">3</span><span class="cl">Primary region compute
</span></span><span class="line"><span class="ln">4</span><span class="cl">    ↓ redo log
</span></span><span class="line"><span class="ln">5</span><span class="cl">Primary region storage (4-of-6 quorum)
</span></span><span class="line"><span class="ln">6</span><span class="cl">    ↓ async replication (typical &lt; 1 秒)
</span></span><span class="line"><span class="ln">7</span><span class="cl">Secondary region storage</span></span></code></pre></div><p><strong>Read path</strong>：</p>
<ul>
<li>Secondary region 直接從 local storage 讀、不需要跨 region 拉</li>
<li>Read latency 是 secondary region local latency、不是跨 region</li>
</ul>
<p><strong>DR 切換 RTO 跟 cross-AZ 對比</strong>：</p>
<table>
  <thead>
      <tr>
          <th>場景</th>
          <th>RTO</th>
          <th>機制</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Cross-AZ failover</td>
          <td>&lt; 30 秒</td>
          <td>storage 跨 AZ 共享、replica 升 primary 即可</td>
      </tr>
      <tr>
          <td>Planned failover</td>
          <td>&lt; 2 分鐘</td>
          <td>managed graceful failover、無資料丟失</td>
      </tr>
      <tr>
          <td>Unplanned failover</td>
          <td>5-15 分鐘</td>
          <td>整 region 失效、手動 promote secondary</td>
      </tr>
  </tbody>
</table>
<p>數量級不同 — cross-AZ 是 <em>seconds</em>、cross-region planned 是 <em>minutes</em>、unplanned 是 <em>tens of minutes</em>。</p>
<p><strong>對應 knowledge card</strong>：<a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">stale-read</a>、<a href="/blog/backend/knowledge-cards/rpo/" data-link-title="RPO" data-link-desc="說明恢復點目標如何定義可接受資料損失範圍">rpo</a>、<a href="/blog/backend/knowledge-cards/rto/" data-link-title="RTO" data-link-desc="說明恢復時間目標如何約束事故回復策略">rto</a>。</p>
<p><strong>跟通用 cross-region replication 差在哪</strong>：Aurora 在 storage layer 複製、lag 上限更穩定；vs PostgreSQL logical replication lag 受寫速度影響大、heavy write 期間可能秒級到分鐘級。</p>
<h2 id="step-by-step-配置">Step-by-step 配置</h2>
<p><strong>建 global cluster</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Step 1：在 primary region 建 global cluster</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">aws rds create-global-cluster <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  --global-cluster-identifier myglobal <span class="se">\
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="se"></span>  --source-db-cluster-identifier arn:aws:rds:us-east-1:123:cluster:primary-cluster <span class="se">\
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="se"></span>  --region us-east-1
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># Step 2：在 secondary region 加 reader cluster</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">aws rds create-db-cluster <span class="se">\
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="se"></span>  --db-cluster-identifier secondary-cluster <span class="se">\
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="se"></span>  --global-cluster-identifier myglobal <span class="se">\
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="se"></span>  --engine aurora-postgresql <span class="se">\
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="se"></span>  --source-region us-east-1 <span class="se">\
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="se"></span>  --region eu-west-1
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="c1"># Step 3：在 secondary region 建 db instance</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">aws rds create-db-instance <span class="se">\
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="se"></span>  --db-cluster-identifier secondary-cluster <span class="se">\
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="se"></span>  --db-instance-identifier secondary-reader-01 <span class="se">\
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="se"></span>  --db-instance-class db.r6g.4xlarge <span class="se">\
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="se"></span>  --engine aurora-postgresql <span class="se">\
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="se"></span>  --region eu-west-1</span></span></code></pre></div><p><strong>Application routing</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln">1</span><span class="cl"><span class="c"># 寫永遠去 primary region writer endpoint</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="nt">primary</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">  </span><span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l">jdbc:postgresql://primary-cluster.cluster-xxx.us-east-1.rds.amazonaws.com/mydb</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="c"># read 可走 secondary region reader endpoint（靠近用戶的 region）</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="nt">secondary-eu</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w">  </span><span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l">jdbc:postgresql://secondary-cluster.cluster-ro-xxx.eu-west-1.rds.amazonaws.com/mydb</span></span></span></code></pre></div><p><strong>DR 切換（planned failover）</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">aws rds failover-global-cluster <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  --global-cluster-identifier myglobal <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --target-db-cluster-identifier arn:aws:rds:eu-west-1:123:cluster:secondary-cluster</span></span></code></pre></div><p>切換後 application 端要 <em>reconfigure connection string</em> — DNS 不自動切跨 region（vs cross-AZ failover writer endpoint 自動跟）。</p>
<p><strong>Application reconfiguration 模式</strong>：</p>
<ul>
<li>Connection string 用 service discovery（Consul / Route53 health check）動態解析</li>
<li>或在 application config 加入 region-aware logic、failover 後切換 active region</li>
<li>不能假設 application 自動 reconnect 到新 primary region</li>
</ul>
<p><strong>驗證點</strong>：</p>
<ul>
<li><code>AuroraGlobalDBReplicationLag</code> &lt; 1 秒</li>
<li>Planned failover RTO 量測（手動 trigger + heartbeat timestamp diff）</li>
<li>Application 跨 region read 路徑 latency 符合預期</li>
</ul>
<p><strong>Rollback boundary</strong>：promote secondary 後原 primary 變 secondary、不會自動 fallback；rollback 要再做一次 failover。</p>
<h2 id="故障模式--邊界-case">故障模式 / 邊界 case</h2>
<h3 id="case-1期待-multi-region-active-active-write">Case 1：期待 multi-region active-active write</h3>
<p>徵兆：team 在 secondary region application 直連 secondary cluster 寫資料、收到 <code>cannot execute INSERT in a read-only transaction</code> 錯誤。</p>
<p>原因：Global Database secondary 是 <em>reader-only</em>、寫只能去 primary region。要 active-active write 必須改用其他服務（Aurora DSQL / Spanner / CockroachDB）。</p>
<p>修：</p>
<ul>
<li>Application 設計時明確區分 read region vs write region</li>
<li>寫操作永遠路由到 primary region、容忍跨 region write latency</li>
<li>真的需要 active-active write 才考慮 Aurora DSQL（2024-12 preview / 2025-05 GA）</li>
</ul>
<h3 id="case-2dns-不跨-region-自動切">Case 2：DNS 不跨 region 自動切</h3>
<p>徵兆：手動 failover trigger 後、application 端 connection string 仍指向舊 primary region、寫操作全失敗。</p>
<p>原因：cross-AZ failover writer endpoint DNS 自動跟、cross-region 不會 — Global Database 切換要 application 端管 region-specific connection string。</p>
<p>修：</p>
<ul>
<li>Application 用 service discovery（Route53 / Consul / etcd）解析 active primary region</li>
<li>部署 region-aware DNS（Route53 latency-based routing + health check）</li>
<li>Failover 演練要包含 application reconfiguration step、不只是 DB layer</li>
</ul>
<h3 id="case-3跨-region-read-假設-strong-consistency">Case 3：跨 region read 假設 strong consistency</h3>
<p>徵兆：用戶在 primary region 寫資料、隨即在 secondary region read、看到舊資料、客訴 inconsistency。</p>
<p>原因：Global Database 是 async replication、&lt; 1 秒 lag 不是 zero、read-after-write 場景仍會看到 stale data。</p>
<p>修：</p>
<ul>
<li>用戶寫操作後短期內 read 走 primary region（read-after-write window）</li>
<li>接受最終一致性、application 端做 versioning / timestamp 比對</li>
<li>強一致性需求改 Aurora DSQL / Spanner</li>
</ul>
<h3 id="case-4lag-spike-during-bulk-operation">Case 4：Lag spike during bulk operation</h3>
<p>徵兆：DDL 或 bulk insert 期間 cross-region lag 從 &lt; 1 秒跳到秒級到分鐘級、secondary region read 大量 stale。</p>
<p>原因：Global Database 「&lt; 1 秒」是 typical、heavy write 期間 lag 拉大。Storage-level replication 比 logical 穩定、但 <em>不是 zero variance</em>。</p>
<p>修：</p>
<ul>
<li>DDL 跟 bulk insert 在低峰期跑、避開跨 region read traffic</li>
<li>監測 <code>AuroraGlobalDBReplicationLag</code>、spike 超過閾值 trigger application 端 fallback（read 切回 primary region）</li>
<li>重要 DDL 用 <a href="https://github.com/reorg/pg_repack">pg_repack</a> 避免長時間 lag</li>
</ul>
<h3 id="case-5合規邊界誤用-global-database--standard-chartered-anti-pattern">Case 5：合規邊界誤用 Global Database — Standard Chartered anti-pattern</h3>
<p>徵兆：team 以為 Global Database 是受監管金融的標準 DR 解、配置完才發現監管機構不接受跨境資料複製、被迫拆掉 Global Database 重建獨立 cluster。</p>
<p><a href="/blog/backend/09-performance-capacity/cases/standard-chartered-aurora-banking/" data-link-title="9.C14 Standard Chartered：受監管銀行的 Aurora 4000 TPS 容量提升" data-link-desc="Standard Chartered 銀行遷移到 Aurora 後吞吐量提升 10 倍至 4000 TPS、跨 7 個受監管市場">9.C14 Standard Chartered case</a> 「判讀」段第 1 點原文：「7 個受監管市場代表 7 個獨立 cluster（資料不能跨境）、容量規劃變成『7 個獨立規劃 × 各自合規門檻』」。</p>
<p>原因：受監管市場資料 <em>不能跨境複製</em>（<a href="/blog/backend/knowledge-cards/data-residency/" data-link-title="Data Residency" data-link-desc="合規要求資料留在特定地理邊界內、跨境複製違反合規、推動 fleet 拓樸決策">Data Residency</a> 硬約束）、Global Database 本質上就是跨 region storage replication、配置了就違反合規。Standard Chartered 的選擇是 <em>每市場獨立 cluster</em>、跨市場 DR 走應用層市場切換、不靠 Global Database。</p>
<p>修：</p>
<ul>
<li>規劃 multi-region 前先確認合規要求（資料駐留、跨境複製禁令、稽核要求）</li>
<li>合規禁止跨境複製場景：每市場獨立 cluster + cross-AZ failover 吸收 RTO（見 <a href="../cross-az-failover-rto/">cross-az-failover-rto</a>）</li>
<li>跨市場 DR 設計成 <em>市場切換</em>（用戶從 A 市場切到 B 市場）、不是 <em>資料切換</em></li>
<li>Fleet 拓樸（多市場 → 多 cluster）詳見 <a href="../read-replica-scaling/">Aurora read replica scaling</a> fleet 治理 SSoT</li>
</ul>
<p><strong>scope warning（必明示）</strong>：Standard Chartered case 未公開是 PostgreSQL 還是 MySQL、未公開具體 cost 數字、屬「相關 case study」匿名對照。引用時不能擴寫具體 engine。</p>
<h3 id="case-6cost-trap--cross-region-data-transfer">Case 6：Cost trap — cross-region data transfer</h3>
<p>徵兆：開了 Global Database 後月帳變高 50%、發現 cross-region data transfer 是主要費用、不是 instance。</p>
<p>原因：Aurora 跨 region replication 走 AWS 內部網路、但 <em>cross-region data transfer 仍計費</em>。Heavy write workload 月費可能 doubled。</p>
<p>修：</p>
<ul>
<li>用 <code>AuroraGlobalDBReplicatedWriteIO</code> × per-region transfer rate 估月費</li>
<li>Write-heavy workload 評估 Global Database ROI（保險、低費用版本是用 cross-region snapshot 做冷備）</li>
<li>Cost 跟 RTO 一起看 — 如果接受 hours RTO、cross-region snapshot 更便宜</li>
</ul>
<h3 id="case-7fanduel-雙峰-case-對照避免-over-extrapolate">Case 7：FanDuel 雙峰 case 對照（避免 over-extrapolate）</h3>
<p>如果 team 引用 <a href="/blog/backend/09-performance-capacity/cases/fanduel-dual-peak-betting-streaming/" data-link-title="9.C28 FanDuel：體育直播 &#43; 投注的雙重峰值" data-link-desc="FanDuel 3.5M MAU、Super Bowl 期間擴容 5-10 倍、用 AWS Local Zones &#43; Wavelength &#43; Outposts 處理 20&#43; 州的雙重峰值">9.C28 FanDuel</a> 規劃 multi-region 部署、要明示 scope warning。</p>
<p><strong>case「判讀」段第 1 點原文</strong>：「直播跟投注是兩種完全不同 SLO：直播容忍秒級延遲（用 CDN + ABR 串流）、投注必須毫秒級成交。兩個服務必須各自獨立擴容、各自獨立 SLO」。</p>
<p><strong>scope warning（必明示）</strong>：</p>
<ul>
<li>FanDuel 5-10x 是 <em>betting 服務的 Aurora 擴容倍數</em>、不是 streaming（streaming 走 CDN、不走 Aurora）</li>
<li>不能壓成「Aurora 撐 5-10x」單一數字</li>
<li>案例自承：betting transaction TPS 跟 concurrent streams 未公開、不能 over-extrapolate</li>
</ul>
<p>引用 FanDuel 規劃自家 multi-region betting workload 時、看 <em>策略</em>（事件型分級 + 雙 SLO 拆分 + 多層 edge）、不套用 <em>具體數字</em>。</p>
<h2 id="跟-aurora-dsql--spanner--cockroachdb-的決策樹">跟 Aurora DSQL / Spanner / CockroachDB 的決策樹</h2>
<p>Global Database 是 <em>async + reader-only secondary</em>、不是 multi-region active-active。當 active-active write 是核心需求時、要看 distributed SQL 方案。</p>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>Aurora Global Database</th>
          <th>Aurora DSQL</th>
          <th>Spanner</th>
          <th>CockroachDB</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Replication</td>
          <td>Async storage-level</td>
          <td>Sync distributed</td>
          <td>Sync TrueTime</td>
          <td>Sync Raft consensus</td>
      </tr>
      <tr>
          <td>Secondary</td>
          <td>Reader-only</td>
          <td>Active-active</td>
          <td>Active-active</td>
          <td>Active-active</td>
      </tr>
      <tr>
          <td>Lag</td>
          <td>&lt; 1 秒 typical</td>
          <td>None (sync)</td>
          <td>None (sync)</td>
          <td>None (sync)</td>
      </tr>
      <tr>
          <td>Write</td>
          <td>Primary region only</td>
          <td>Multi-region</td>
          <td>Multi-region</td>
          <td>Multi-region</td>
      </tr>
      <tr>
          <td>Strong consistency cross-region</td>
          <td>No</td>
          <td>Yes</td>
          <td>Yes</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>適用</td>
          <td>DR + 跨地理 read</td>
          <td>Multi-region OLTP</td>
          <td>Global scale OLTP</td>
          <td>Cross-cloud OLTP</td>
      </tr>
      <tr>
          <td>邊界</td>
          <td>active-active 不支援、合規 反指標</td>
          <td>AWS-only、新服務</td>
          <td>GCP-only、學習曲線</td>
          <td>跨雲、operational 複雜</td>
      </tr>
  </tbody>
</table>
<p><strong>何時選 Global Database</strong>：</p>
<ul>
<li>DR + 跨地理 read 是主要需求</li>
<li>寫流量集中在一個 region（單 region write 撐得住）</li>
<li>合規允許跨境複製（一般 SaaS、非受監管）</li>
<li>從 single-region Aurora 升級、不想換 engine</li>
</ul>
<p><strong>何時改 Aurora DSQL / Spanner / CockroachDB</strong>：</p>
<ul>
<li>Multi-region active-active write</li>
<li>跨 region strong consistency 是業務需求</li>
<li>跨雲 / on-prem 需求（CockroachDB）</li>
</ul>
<p><strong>何時不用 Global Database</strong>：</p>
<ul>
<li>合規禁止跨境複製（Standard Chartered case）→ 每市場獨立 cluster</li>
<li>Single-region 已滿足 DR / read 需求</li>
<li>跨 region cost 不划算（write-heavy workload）</li>
</ul>
<h2 id="容量與觀測">容量與觀測</h2>
<p><strong>核心 metric</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">AuroraGlobalDBReplicationLag       # secondary lag、&lt; 1 秒 typical
</span></span><span class="line"><span class="ln">2</span><span class="cl">AuroraGlobalDBReplicatedWriteIO    # cross-region data transfer 量
</span></span><span class="line"><span class="ln">3</span><span class="cl">AuroraGlobalDBProgressLag          # storage replication progress</span></span></code></pre></div><p><strong>容量上限</strong>：</p>
<ul>
<li>1 primary region + 5 secondary region</li>
<li>每 secondary region 16 個 reader 含 1 個 headless（可升 writer）</li>
</ul>
<p><strong>Cost signal</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">月費 ≈ AuroraGlobalDBReplicatedWriteIO × per-region transfer rate
</span></span><span class="line"><span class="ln">2</span><span class="cl">     + secondary region instance + storage
</span></span><span class="line"><span class="ln">3</span><span class="cl">     + cross-region snapshot (optional)</span></span></code></pre></div><p>Write 量大的 workload 月費可能 doubled（primary region + secondary region 都計費）、要在規劃時估準。</p>
<p><strong>驗證 DR</strong>：</p>
<ul>
<li>Planned failover drill 每季一次、量測 RTO / RPO</li>
<li>受監管產業：每月一次、有合規 sign-off 記錄</li>
<li>重大版本升級前必跑一次</li>
</ul>
<p><strong>回路徑</strong>：<a href="/blog/backend/09-performance-capacity/capacity-planning/" data-link-title="9.6 容量規劃模型" data-link-desc="peak forecast、headroom budget、growth curve、autoscaling sizing">9.6 容量規劃模型</a> cross-region cost、<a href="/blog/backend/08-incident-response/" data-link-title="模組八：事故處理與復盤" data-link-desc="用 IR 領域詞彙建問題節點、以服務級案例庫累積事故脈絡，先建概念與案例庫再進實作交接">8.x DR playbook</a> region-level failover decision。</p>
<h2 id="邊界與整合--下一步">邊界與整合 / 下一步</h2>
<p><strong>Sibling deep articles</strong>：</p>
<ul>
<li><a href="../storage-architecture/">Aurora storage architecture</a> — cross-region replication 是 storage-level 延伸</li>
<li><a href="../cross-az-failover-rto/">Aurora cross-AZ failover RTO</a> — cross-AZ 跟 cross-region failover RTO 數量級對比</li>
<li><a href="../read-replica-scaling/">Aurora read replica scaling</a> — fleet 治理 SSoT、合規驅動 fleet 拓樸的展開</li>
</ul>
<p><strong>Migration playbook</strong>：</p>
<ul>
<li><a href="../migrate-from-self-managed-pg-mysql/">PostgreSQL / MySQL → Aurora</a> — 從 PostgreSQL streaming replication 跨 region 升級的差異</li>
</ul>
<p><strong>1.x 章節互引</strong>：</p>
<ul>
<li><a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 全球分散式 OLTP</a> — Global Database vs distributed SQL 對比</li>
</ul>
<p><strong>何時不用本文</strong>：single-region OLTP、無跨 region DR / read 需求時可跳過、看 <a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora vendor overview</a> 即可。</p>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora vendor overview</a> — 服務定位、適用 / 不適用場景</li>
<li><a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">Stale Read 卡片</a> — read-after-write 容忍度</li>
<li><a href="/blog/backend/knowledge-cards/rpo/" data-link-title="RPO" data-link-desc="說明恢復點目標如何定義可接受資料損失範圍">RPO 卡片</a> — DR RPO 判讀</li>
<li><a href="/blog/backend/09-performance-capacity/cases/standard-chartered-aurora-banking/" data-link-title="9.C14 Standard Chartered：受監管銀行的 Aurora 4000 TPS 容量提升" data-link-desc="Standard Chartered 銀行遷移到 Aurora 後吞吐量提升 10 倍至 4000 TPS、跨 7 個受監管市場">9.C14 Standard Chartered</a> — 合規驅動的 Global Database anti-pattern</li>
<li><a href="/blog/backend/09-performance-capacity/cases/fanduel-dual-peak-betting-streaming/" data-link-title="9.C28 FanDuel：體育直播 &#43; 投注的雙重峰值" data-link-desc="FanDuel 3.5M MAU、Super Bowl 期間擴容 5-10 倍、用 AWS Local Zones &#43; Wavelength &#43; Outposts 處理 20&#43; 州的雙重峰值">9.C28 FanDuel</a> — 雙 SLO 並行的 multi-region 策略對照</li>
<li><a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">Vendor 深度技術文章方法論</a> — 本文遵循的 6 規格面寫作模板</li>
<li>官方：<a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database.html">Aurora Global Database</a></li>
</ul>
]]></content:encoded></item><item><title>CockroachDB Locality-Aware Schema：跨州合規 + 邏輯一個 cluster 的 region placement 策略</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/cockroachdb/locality-aware-schema/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/cockroachdb/locality-aware-schema/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/cockroachdb/" data-link-title="CockroachDB" data-link-desc="分散式 SQL、PostgreSQL 相容、跨區強一致、Spanner 的開源 / 跨雲替代">CockroachDB vendor overview&lt;/a> 的 implementation-layer deep article。Overview 已界定 CockroachDB 的 multi-region 能力、本文聚焦 &lt;em>locality 配置怎麼解合規地理邊界 + 跨 boundary 業務邏輯需求&lt;/em> — 用 Hard Rock Digital 跨 8 州單一邏輯 cluster 作為 concrete framing。Replica placement 機制屬前置、見 &lt;a href="../hlc-raft-consensus/">HLC + Raft consensus&lt;/a>、survival goal 互動見 &lt;a href="../survival-goals/">survival goals&lt;/a>。&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="問題情境hard-rock-的跨州-sportsbook-拓樸創新">問題情境：Hard Rock 的跨州 sportsbook 拓樸創新&lt;/h2>
&lt;p>美國 sportsbook 受 &lt;em>Wire Act&lt;/em> 規範、betting data 必須在下注州內處理 → 每個營運州都要有州內運算資源。傳統路徑是「每州一個獨立 silo、each silo 一個獨立 DB cluster」、合規上沒問題、但撞牆於三個業務需求：&lt;/p>
&lt;ul>
&lt;li>&lt;em>跨州統一帳戶&lt;/em>：玩家在 NJ 跟 FL 兩州都有帳戶、登入要看到統一 portfolio&lt;/li>
&lt;li>&lt;em>跨州 reporting&lt;/em>：總公司 BI / 財務 reporting 要橫跨所有州、不能 query N 個 cluster 後再合&lt;/li>
&lt;li>&lt;em>跨州欺詐偵測&lt;/em>：同一張身分證在不同州 IP 同時下注 → 風控引擎要看 &lt;em>cross-state aggregated&lt;/em> 資料&lt;/li>
&lt;/ul>
&lt;p>&lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/hard-rock-digital-cockroachdb-sports-betting/" data-link-title="9.C41 Hard Rock Digital：CockroachDB on AWS Outposts、Wire Act 合規 &amp;#43; 跨州單一邏輯 DB" data-link-desc="Hard Rock Digital 用 CockroachDB 跨 AWS Outposts &amp;#43; US-East-1、Wire Act 強制資料留州、單一邏輯 DB 解多州 sportsbook、100 node 32 vCPU 撐 Super Bowl">9.C41 Hard Rock Digital&lt;/a> 跨 8 州（AZ / IN / TN / FL / OH / IL / NJ / VA）用 AWS Outposts 把運算放進州內、但邏輯上仍是 &lt;em>一個&lt;/em> CockroachDB cluster — region placement 配置決定哪些 range 釘在哪個 Outpost / AWS region。case 觀察段直接揭露「跨所有 region 一個 logical database」這個拓樸 fact。&lt;/p>
&lt;p>讀者常問：&lt;/p>
&lt;ul>
&lt;li>合規逼我每州一 cluster、但跨州帳戶 / 風控 / 欺詐偵測撞牆怎麼辦？&lt;/li>
&lt;li>&lt;code>REGIONAL BY ROW&lt;/code> 跟 &lt;code>REGIONAL BY TABLE&lt;/code> 怎麼選、&lt;code>GLOBAL&lt;/code> 又在什麼場景？&lt;/li>
&lt;li>&lt;code>GLOBAL&lt;/code> table 為什麼讀快但寫慢、預設為什麼不全部用？&lt;/li>
&lt;li>AWS Outposts 是 latency 工具還是合規工具？&lt;/li>
&lt;/ul>
&lt;p>對照 &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/netflix-cockroachdb-multi-region-fleet/" data-link-title="9.C40 Netflix：380&amp;#43; CockroachDB cluster 的 multi-active 拓樸艦隊" data-link-desc="Netflix 把 Cassandra 不夠用的 transactional workload 移到 CockroachDB、380&amp;#43; cluster / 60&amp;#43; 跨 region、含 Open Connect、studio cloud drive、gaming control plane">9.C40 Netflix&lt;/a>：60+ multi-region cluster、最大 Gaming cluster 48-node 跨 4 region、locality 配置直接影響 cluster 規模治理。&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/cockroachdb/" data-link-title="CockroachDB" data-link-desc="分散式 SQL、PostgreSQL 相容、跨區強一致、Spanner 的開源 / 跨雲替代">CockroachDB vendor overview</a> 的 implementation-layer deep article。Overview 已界定 CockroachDB 的 multi-region 能力、本文聚焦 <em>locality 配置怎麼解合規地理邊界 + 跨 boundary 業務邏輯需求</em> — 用 Hard Rock Digital 跨 8 州單一邏輯 cluster 作為 concrete framing。Replica placement 機制屬前置、見 <a href="../hlc-raft-consensus/">HLC + Raft consensus</a>、survival goal 互動見 <a href="../survival-goals/">survival goals</a>。</p></blockquote>
<hr>
<h2 id="問題情境hard-rock-的跨州-sportsbook-拓樸創新">問題情境：Hard Rock 的跨州 sportsbook 拓樸創新</h2>
<p>美國 sportsbook 受 <em>Wire Act</em> 規範、betting data 必須在下注州內處理 → 每個營運州都要有州內運算資源。傳統路徑是「每州一個獨立 silo、each silo 一個獨立 DB cluster」、合規上沒問題、但撞牆於三個業務需求：</p>
<ul>
<li><em>跨州統一帳戶</em>：玩家在 NJ 跟 FL 兩州都有帳戶、登入要看到統一 portfolio</li>
<li><em>跨州 reporting</em>：總公司 BI / 財務 reporting 要橫跨所有州、不能 query N 個 cluster 後再合</li>
<li><em>跨州欺詐偵測</em>：同一張身分證在不同州 IP 同時下注 → 風控引擎要看 <em>cross-state aggregated</em> 資料</li>
</ul>
<p><a href="/blog/backend/09-performance-capacity/cases/hard-rock-digital-cockroachdb-sports-betting/" data-link-title="9.C41 Hard Rock Digital：CockroachDB on AWS Outposts、Wire Act 合規 &#43; 跨州單一邏輯 DB" data-link-desc="Hard Rock Digital 用 CockroachDB 跨 AWS Outposts &#43; US-East-1、Wire Act 強制資料留州、單一邏輯 DB 解多州 sportsbook、100 node 32 vCPU 撐 Super Bowl">9.C41 Hard Rock Digital</a> 跨 8 州（AZ / IN / TN / FL / OH / IL / NJ / VA）用 AWS Outposts 把運算放進州內、但邏輯上仍是 <em>一個</em> CockroachDB cluster — region placement 配置決定哪些 range 釘在哪個 Outpost / AWS region。case 觀察段直接揭露「跨所有 region 一個 logical database」這個拓樸 fact。</p>
<p>讀者常問：</p>
<ul>
<li>合規逼我每州一 cluster、但跨州帳戶 / 風控 / 欺詐偵測撞牆怎麼辦？</li>
<li><code>REGIONAL BY ROW</code> 跟 <code>REGIONAL BY TABLE</code> 怎麼選、<code>GLOBAL</code> 又在什麼場景？</li>
<li><code>GLOBAL</code> table 為什麼讀快但寫慢、預設為什麼不全部用？</li>
<li>AWS Outposts 是 latency 工具還是合規工具？</li>
</ul>
<p>對照 <a href="/blog/backend/09-performance-capacity/cases/netflix-cockroachdb-multi-region-fleet/" data-link-title="9.C40 Netflix：380&#43; CockroachDB cluster 的 multi-active 拓樸艦隊" data-link-desc="Netflix 把 Cassandra 不夠用的 transactional workload 移到 CockroachDB、380&#43; cluster / 60&#43; 跨 region、含 Open Connect、studio cloud drive、gaming control plane">9.C40 Netflix</a>：60+ multi-region cluster、最大 Gaming cluster 48-node 跨 4 region、locality 配置直接影響 cluster 規模治理。</p>
<p>對照 <a href="/blog/backend/09-performance-capacity/cases/standard-chartered-aurora-banking/" data-link-title="9.C14 Standard Chartered：受監管銀行的 Aurora 4000 TPS 容量提升" data-link-desc="Standard Chartered 銀行遷移到 Aurora 後吞吐量提升 10 倍至 4000 TPS、跨 7 個受監管市場">9.C14 Standard Chartered</a> Aurora 7 cluster fleet：銀行業跨國合規邊界、走的是「每市場獨立 Aurora cluster」路徑 — 跟 Hard Rock 邏輯一個 cluster 的拓樸完全不同。兩條路徑沒有對錯、trigger 條件不同（合規顆粒 × 跨 boundary 業務邏輯需求）。</p>
<h2 id="核心機制三種-table-locality--row-level-region-標記">核心機制：三種 table locality + row-level region 標記</h2>
<h3 id="三種-locality-模式">三種 locality 模式</h3>
<p>CockroachDB 用 <a href="/blog/backend/knowledge-cards/range-sharding/" data-link-title="Range Sharding" data-link-desc="分散式 SQL 把 key space 切成可自動 split / merge 的 range、每個 range 自己的 consensus group、application 透明">Range Sharding</a> 把 multi-region table 抽象成三種 locality、配合 <a href="/blog/backend/knowledge-cards/data-residency/" data-link-title="Data Residency" data-link-desc="合規要求資料留在特定地理邊界內、跨境複製違反合規、推動 fleet 拓樸決策">Data Residency</a> 合規邊界決定 row 落在哪個 region：</p>
<table>
  <thead>
      <tr>
          <th>Locality</th>
          <th>Read 行為</th>
          <th>Write 行為</th>
          <th>適用場景</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>REGIONAL BY TABLE</code></td>
          <td>本 region 快、其他 region 走 follower read</td>
          <td>本 region 快、其他 region 慢</td>
          <td>整 table 服務單一 region（如：us-orders）</td>
      </tr>
      <tr>
          <td><code>REGIONAL BY ROW</code></td>
          <td>該 row 所在 region 快、其他 follower</td>
          <td>該 row 所在 region 快、其他慢</td>
          <td>用戶資料跟地理綁定（玩家 / 訂單 / 帳戶）</td>
      </tr>
      <tr>
          <td><code>GLOBAL</code></td>
          <td>每 region local（快）</td>
          <td>跨 region quorum（慢）</td>
          <td>reference data（國碼、貨幣、規則表）</td>
      </tr>
  </tbody>
</table>
<h3 id="regional-by-row每-row-帶-crdb_region-隱含欄位">REGIONAL BY ROW：每 row 帶 <code>crdb_region</code> 隱含欄位</h3>
<p><code>REGIONAL BY ROW</code> 是 Hard Rock 場景的主要選擇。每 row 自動帶一個 <code>crdb_region</code> 隱含欄位、根據這個欄位把 row 對應的 range 釘在指定 region：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">DATABASE</span><span class="w"> </span><span class="n">sportsbook</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="n">REGION</span><span class="w"> </span><span class="s2">&#34;us-east1-az&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">DATABASE</span><span class="w"> </span><span class="n">sportsbook</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="n">REGION</span><span class="w"> </span><span class="s2">&#34;us-east1-nj&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">DATABASE</span><span class="w"> </span><span class="n">sportsbook</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="n">REGION</span><span class="w"> </span><span class="s2">&#34;us-east1-fl&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">bets</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="n">REGIONAL</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">ROW</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="c1">-- 寫入時指定 row 屬哪個 region
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="c1"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">bets</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">user_id</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">,</span><span class="w"> </span><span class="n">crdb_region</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="w"></span><span class="k">VALUES</span><span class="w"> </span><span class="p">(...,</span><span class="w"> </span><span class="p">...,</span><span class="w"> </span><span class="p">...,</span><span class="w"> </span><span class="s1">&#39;us-east1-nj&#39;</span><span class="p">);</span></span></span></code></pre></div><p>CockroachDB planner 自動感知 <code>crdb_region</code>、把 read / write 路由到 row 所在 region 的 leaseholder。application 不用手動配 shard key、不用 application 端路由邏輯 — 這是 distributed SQL 的「宣告式 locality」優勢。</p>
<h3 id="global每-region-local-read跨-region-sync-write">GLOBAL：每 region local read、跨 region sync write</h3>
<p><code>GLOBAL</code> table 適合 <em>reference data</em> — 變更少、read 頻繁、需要全球 local read latency：</p>
<ul>
<li>read：每 region 都有 leaseholder、本地 read p99 跟 single-region 一樣</li>
<li>write：跨 region quorum、p99 100ms+</li>
</ul>
<p>實務上 <code>GLOBAL</code> 只放國家代碼、貨幣表、規則 lookup 等 <em>變更頻率低</em> 的 reference data。把 high-write workload 設成 <code>GLOBAL</code> 是典型錯配（見失敗模式段）。</p>
<h3 id="follower-readnon-voting-replica-提供本地-read">Follower read：non-voting replica 提供本地 read</h3>
<p>CockroachDB 區分 voting 跟 non-voting replica：</p>
<ul>
<li>voting replica 參與 Raft majority、決定 commit</li>
<li>non-voting replica 不參與 commit、只 serve <a href="/blog/backend/knowledge-cards/follower-read/" data-link-title="Follower Read" data-link-desc="分散式 SQL 從 non-voting replica 讀 closed timestamp 之前的資料、不參與 Raft commit、低 latency 但 read-after-write 場景仍可能 stale">Follower Read</a></li>
</ul>
<p><code>REGIONAL BY ROW</code> + <code>SURVIVE REGION FAILURE</code> 配合時：row 所在 region 是 voting + <a href="/blog/backend/knowledge-cards/leaseholder/" data-link-title="Leaseholder" data-link-desc="分散式 SQL 每個 range 在任一時間點的 read / write entry point、通常等於 Raft leader、承擔該 range 的 coordination">Leaseholder</a>、其他 region 有 voting replica（survival 需要）+ non-voting replica（本地 follower read）。</p>
<p>Follower read 讀到的是 <em>closed timestamp</em> 之前的資料 — strong consistency 場景不能用（read-after-write 會 stale）、但 dashboard / reporting / 風控分析等 <em>容忍 stale</em> 場景大幅降低 cross-region latency。</p>
<h3 id="配置語法跟驗證">配置語法跟驗證</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">-- 設 database 的 region
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">DATABASE</span><span class="w"> </span><span class="n">mydb</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="n">REGION</span><span class="w"> </span><span class="s2">&#34;us-east1&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">DATABASE</span><span class="w"> </span><span class="n">mydb</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="n">REGION</span><span class="w"> </span><span class="s2">&#34;europe-west1&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w"></span><span class="c1">-- 設 table locality
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="n">REGIONAL</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">ROW</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">country_codes</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="k">GLOBAL</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders_us</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="n">REGIONAL</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="s2">&#34;us-east1&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w"></span><span class="c1">-- 驗證
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"></span><span class="k">SHOW</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w"></span><span class="k">SHOW</span><span class="w"> </span><span class="n">RANGES</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="p">;</span><span class="w">  </span><span class="c1">-- 看 replica 分佈
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"></span><span class="k">EXPLAIN</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w">  </span><span class="c1">-- 看 query plan 是否 local</span></span></span></code></pre></div><p>對應 <a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">stale read 卡</a>、<a href="/blog/backend/knowledge-cards/table-partitioning/" data-link-title="Table Partitioning" data-link-desc="說明單一資料庫內如何把大表拆成多個分區，並由查詢規劃器只掃相關片段">table partitioning 卡</a> 的具體機制實現。</p>
<h2 id="操作流程從合規-boundary-到-schema-配置">操作流程：從合規 boundary 到 schema 配置</h2>
<h3 id="配置-multi-region-database">配置 multi-region database</h3>
<p>第一步是把所有 region 加入 database：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 假設 cluster 已跨 8 個州（透過 AWS Outposts 在每州內）
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">DATABASE</span><span class="w"> </span><span class="n">sportsbook</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="n">REGION</span><span class="w"> </span><span class="s2">&#34;us-east1-virginia&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">DATABASE</span><span class="w"> </span><span class="n">sportsbook</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="n">REGION</span><span class="w"> </span><span class="s2">&#34;us-east1-nj&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">DATABASE</span><span class="w"> </span><span class="n">sportsbook</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="n">REGION</span><span class="w"> </span><span class="s2">&#34;us-east1-fl&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">DATABASE</span><span class="w"> </span><span class="n">sportsbook</span><span class="w"> </span><span class="k">ADD</span><span class="w"> </span><span class="n">REGION</span><span class="w"> </span><span class="s2">&#34;us-east1-az&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="c1">-- ...其他州</span></span></span></code></pre></div><p>每個「region」對應一個 Outpost / AWS region 的 locality tag、CockroachDB Raft 根據 locality 自動分佈 replica。</p>
<h3 id="table-level-locality-配置">Table-level locality 配置</h3>
<p>bet placement / settlement table 走 <code>REGIONAL BY ROW</code>（資料跟玩家所在州綁定）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">bets</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="n">REGIONAL</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">ROW</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">settlements</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="n">REGIONAL</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">ROW</span><span class="p">;</span></span></span></code></pre></div><p>account / user profile 跨州統一帳戶 — 玩家可能在多州下注、但 <em>主檔</em> 留 single region：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="n">REGIONAL</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="s2">&#34;us-east1-virginia&#34;</span><span class="p">;</span></span></span></code></pre></div><p>reference data（運動類別、賽事 metadata）— 全球變更少、每州都要快速 read：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">sports_metadata</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="k">GLOBAL</span><span class="p">;</span></span></span></code></pre></div><h3 id="application-端寫入">Application 端寫入</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 顯式指定 row 所在 region（推薦、明確）
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">bets</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">user_id</span><span class="p">,</span><span class="w"> </span><span class="k">state</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">,</span><span class="w"> </span><span class="n">crdb_region</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">VALUES</span><span class="w"> </span><span class="p">(...,</span><span class="w"> </span><span class="p">...,</span><span class="w"> </span><span class="s1">&#39;NJ&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">.</span><span class="mi">00</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;us-east1-nj&#39;</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="c1">-- 或用 gateway_region() default（依 application 連到的 region）
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">bets</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">user_id</span><span class="p">,</span><span class="w"> </span><span class="k">state</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">VALUES</span><span class="w"> </span><span class="p">(...,</span><span class="w"> </span><span class="p">...,</span><span class="w"> </span><span class="s1">&#39;NJ&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">.</span><span class="mi">00</span><span class="p">);</span><span class="w">  </span><span class="c1">-- crdb_region 自動填 gateway 端</span></span></span></code></pre></div><p><code>gateway_region()</code> 是便利但有風險的 default — 如果 application server 在 us-east1-fl 但 user 在 NJ 下注、row 會被放到 FL 而不是 NJ、違反 Wire Act 合規。Hard Rock 場景下顯式指定 <code>crdb_region</code> 是更安全的做法。</p>
<h3 id="rollback-邊界">Rollback 邊界</h3>
<p>locality 變更即時生效、Raft 自動 rebalance — 無不可逆動作。但 rebalance 期間 cross-region traffic 暴增、p99 短期 spike。production 環境改 locality 應該選低流量時段、並監控 rebalance queue。</p>
<h2 id="失敗模式">失敗模式</h2>
<h3 id="拆獨立-cluster-解合規但破壞業務邏輯反模式hard-rock-對比-standard-charteredf410">「拆獨立 cluster 解合規但破壞業務邏輯」反模式（Hard Rock 對比 Standard Chartered、F4.10）</h3>
<p>直覺路徑是「合規要求資料留某地理邊界 → 每邊界開一個獨立 cluster」、合規上沒問題。但獨立 cluster 之間：</p>
<ul>
<li>玩家統一帳戶撞牆 — 每 cluster 各自有 user table、跨 cluster query 麻煩</li>
<li>跨州 reporting 要 N 個 cluster + ETL pipeline</li>
<li>欺詐偵測要 <em>cross-state aggregated view</em> — 獨立 cluster 拼不出</li>
</ul>
<p>Hard Rock 選擇 <em>邏輯一個 cluster + 物理跨州 Outpost placement</em> — 合規 boundary 用 region placement 表達、不是 cluster fragmentation。對比 Standard Chartered：</p>
<ul>
<li><strong>Standard Chartered Aurora 7 cluster fleet</strong>：銀行業跨國合規邊界、<em>跨 cluster 業務邏輯需求弱</em>（每市場用戶獨立、跨境統一帳戶不是核心 driver）→ 用 fleet 拓樸吸收合規可行</li>
<li><strong>Hard Rock Wire Act 跨州</strong>：跨州統一帳戶 + 跨州 reporting + 欺詐偵測是 <em>核心業務需求</em> → 必須邏輯一個 cluster、用 locality + placement 吸收合規</li>
</ul>
<p>兩條路徑沒有對錯、trigger 條件不同。判讀軸線：</p>
<ul>
<li>合規顆粒（跨國 vs 跨州 vs 跨 AZ）</li>
<li>跨 boundary 業務邏輯需求強度（強 → CockroachDB locality / 弱 → 拆獨立 cluster 可行）</li>
<li>團隊運維能力（CockroachDB 邏輯一個 cluster vs Aurora 多 cluster fleet 的人月成本）</li>
</ul>
<h3 id="outposts-是-latency-工具動機誤判f413case-反直覺判讀">「Outposts 是 latency 工具」動機誤判（F4.13、case 反直覺判讀）</h3>
<p>AWS Outposts 主要為「資料留某地理邊界」存在、latency 改善是 <em>副作用</em>。Hard Rock 策略段 2 明確警告：「決策時先看合規驅動力、latency 改善列為 bonus」。</p>
<p>若把 Outposts 當跨州 latency 改善工具、會在沒合規驅動的場景過度投資 — Outposts 硬體成本 + 維運複雜度遠高於純 AWS region 部署。實務判讀：</p>
<ul>
<li>有合規驅動（Wire Act / GDPR / 各州博彩牌照）→ Outposts 是合理投資</li>
<li>純 latency 優化 → 用 AWS Local Zones、用 CDN、用 edge cache、不要碰 Outposts</li>
<li>兩者並存 → Outposts 投資按 <em>合規</em> 計算、latency 改善是 ROI 加分項</li>
</ul>
<h3 id="global-table-write-太慢"><code>GLOBAL</code> table write 太慢</h3>
<p><code>GLOBAL</code> table 每次 write 跨 region quorum、p99 100ms+。用在 high-write workload 是典型錯配 — 該用在 reference data（國家代碼、貨幣表、規則 lookup）。</p>
<p>判讀：</p>
<ul>
<li>write QPS &lt; 10 + read QPS 跨 region 高 → <code>GLOBAL</code> 合理</li>
<li>write QPS &gt; 100 → 不要用 <code>GLOBAL</code>、改 <code>REGIONAL BY ROW</code> + 接受 cross-region read 偶爾走 follower</li>
</ul>
<h3 id="regional-by-row-但-row-沒設-crdb_region"><code>REGIONAL BY ROW</code> 但 row 沒設 <code>crdb_region</code></h3>
<p>application 寫入時忘了設 <code>crdb_region</code>、default 走 <code>gateway_region()</code> — application server 所在 region 變成 row 的 region。常見後果：</p>
<ul>
<li>application server 集中部署 → 所有 row 跑同一 region、locality 失效</li>
<li>application server 跟 user 不同 region → 合規 violation（Wire Act 場景）</li>
</ul>
<p>修法：顯式指定 <code>crdb_region</code>、把 user 的合規區域當業務欄位明確管理。</p>
<h3 id="cross-region-join-跑爆-latency">Cross-region join 跑爆 latency</h3>
<p>兩個 <code>REGIONAL BY ROW</code> table join、planner 要跨 region 拉資料、p99 暴漲。</p>
<p>修法：</p>
<ul>
<li>兩個 table partition by <em>同樣</em> 的 key（如：user_id）、保證 join 對應 row 在同 region</li>
<li>不能保證 co-location 時、考慮用 follower read 接受 stale 資料</li>
<li>query 重寫成多步：先在各 region 算 local 結果、application 端 merge</li>
</ul>
<h3 id="follower-read-假設-strong-consistency">Follower read 假設 strong consistency</h3>
<p>non-voting replica 是 <em>closed timestamp</em> 之前的資料、read-after-write 場景仍會 stale。</p>
<p>修法：</p>
<ul>
<li>read-after-write critical（如：剛下注立刻顯示「下注成功」）→ 不能走 follower、要走 leaseholder</li>
<li>dashboard / 分析 / reporting 容忍 stale → follower read 安全、大幅降 latency</li>
</ul>
<h3 id="data-residency-違規">Data residency 違規</h3>
<p>受監管州 / 國資料應留 boundary 內、但 application 從別 region 寫入 row、沒設 <code>crdb_region</code>、資料跑出 boundary、合規 violation（Wire Act / GDPR / 各州博彩牌照都有類似條款）。</p>
<p>修法（schema-level + application-level 雙保險）：</p>
<ul>
<li>schema：<code>REGIONAL BY ROW</code> + <code>crdb_region</code> 是 NOT NULL + CHECK constraint 限制可選值</li>
<li>application：寫入前明確驗證 <code>crdb_region</code> 對應 user 所在合規區</li>
<li>監控：定期跑 <code>SELECT crdb_region, count(*) FROM bets GROUP BY crdb_region</code> 確認分佈符合預期</li>
</ul>
<h3 id="hard-rock-場景的組合配置9c41">Hard Rock 場景的組合配置（9.C41）</h3>
<p>bet placement / settlement / account management 都需要跨州資料存取 + 州內合規 placement。Hard Rock 案例揭露的具體組合：</p>
<ul>
<li><code>REGIONAL BY ROW</code> + <code>crdb_region</code> 標州別 + region placement pin Outpost</li>
<li>account 跨州統一 → <code>REGIONAL BY TABLE</code> IN primary region、其他州走 follower read</li>
<li>sports metadata → <code>GLOBAL</code>、reference data 全州 local read</li>
</ul>
<p>這是滿足 Wire Act + 跨州業務邏輯的組合、不是唯一解、但揭露了 schema 設計的 <em>判讀軸</em> — 不是「locality 越強越好」、是「locality 對應業務 + 合規邊界」。</p>
<h2 id="容量與觀測">容量與觀測</h2>
<h3 id="必看-metric">必看 metric</h3>
<ul>
<li><code>Range locality distribution</code>：range 分佈跟 locality 配置是否一致</li>
<li><code>Cross-region query count</code>：cross-region query 數量、locality 失效訊號</li>
<li><code>Follower read rate</code>：follower read 命中率、降 latency 效果</li>
<li><code>Leaseholder distribution by region</code>：leaseholder 在 region 間是否均勻</li>
</ul>
<h3 id="容量公式">容量公式</h3>
<ul>
<li>cross-region traffic = <code>GLOBAL</code> table write QPS × region count</li>
<li><code>REGIONAL BY ROW</code> 跨 region read = follower read rate × QPS</li>
<li>storage 用量 = base storage × replication factor × (voting + non-voting replica count)</li>
</ul>
<h3 id="容量上限">容量上限</h3>
<ul>
<li>region count：建議 ≤ 5（多 region 增加 quorum latency + 維運複雜度）</li>
<li><code>GLOBAL</code> table 數量：建議只放 reference data、總 row 數 &lt; 10 萬</li>
<li>single range 寫 throughput ~1000 QPS（通用估算、見 <a href="../hlc-raft-consensus/">HLC + Raft consensus</a>）</li>
</ul>
<h3 id="回路徑">回路徑</h3>
<ul>
<li><a href="/blog/backend/09-performance-capacity/" data-link-title="模組九：效能工程與容量規劃" data-link-desc="把『目前配置能撐多少、要加多少機器』變成可量化、可驗證、可改進的工程流程">9.5 瓶頸定位流程</a> 判斷 cross-region-bound vs CPU-bound</li>
<li><a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 全球分散式 OLTP</a> 上游合規 / latency 取捨</li>
</ul>
<h2 id="邊界與整合">邊界與整合</h2>
<h3 id="sibling-deep-articles">Sibling deep articles</h3>
<ul>
<li><a href="../survival-goals/">survival goals</a>：locality + survival goal 一起決定 replica placement</li>
<li><a href="../transaction-retry-pattern/">transaction retry pattern</a>：partition 降低 hot row contention 的 schema 路徑</li>
<li><a href="../hlc-raft-consensus/">HLC + Raft consensus</a>：leaseholder 跟 locality 的關係</li>
</ul>
<h3 id="跟-aurora-global-database-對照">跟 Aurora Global Database 對照</h3>
<p>Aurora 不支援 row-level locality — 跨 region 只能 cluster-per-region + async replication。CockroachDB 在一個 cluster 內可以 fine-grained locality、application 不需要管 cross-cluster 路由。Aurora Global Database 適合 <em>async DR</em> 場景、不適合 <em>跨 region 強一致 + row-level locality</em> 需求。</p>
<h3 id="跟-spanner-interleaved-tables-對照">跟 Spanner interleaved tables 對照</h3>
<p>Spanner 的 <a href="/blog/backend/knowledge-cards/interleaved-table/" data-link-title="Interleaved Table" data-link-desc="Spanner 把 parent / child table row 物理交錯儲存、parent &#43; child JOIN 不跨 split">Interleaved Table</a> 跟 CockroachDB 的 <code>REGIONAL BY ROW</code> 概念類似（parent-child row co-location）、語法不同。Spanner 在 GCP region 內 placement、無 Outposts 等效 — Hard Rock 場景下 Spanner 不能直接套用。</p>
<h3 id="aurora-dsql--spanner-對比">Aurora DSQL / Spanner 對比</h3>
<p>完整三家 distributed SQL 在 locality / multi-region placement 的取捨、見 <a href="../aurora-dsql-spanner-decision-tree/">aurora-dsql-spanner-decision-tree</a>。</p>
<h3 id="1x-章節互引">1.x 章節互引</h3>
<ul>
<li><a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 全球分散式 OLTP</a> 上游</li>
<li><a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">stale read 卡</a></li>
<li><a href="/blog/backend/knowledge-cards/table-partitioning/" data-link-title="Table Partitioning" data-link-desc="說明單一資料庫內如何把大表拆成多個分區，並由查詢規劃器只掃相關片段">table partitioning 卡</a></li>
</ul>
<h3 id="何時不用本文">何時不用本文</h3>
<ul>
<li>single-region 部署、無 data residency 需求 → 用 default locality 即可</li>
<li>合規邊界 <em>禁止</em> 跨境 replica（如 Standard Chartered 模式）→ 拆 cluster-per-市場、不走本文 locality 路徑</li>
<li>純 latency 優化、無合規驅動 → 用 CDN / cache / Local Zones、不必動 schema</li>
</ul>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/cockroachdb/" data-link-title="CockroachDB" data-link-desc="分散式 SQL、PostgreSQL 相容、跨區強一致、Spanner 的開源 / 跨雲替代">CockroachDB vendor overview</a></li>
<li><a href="/blog/backend/09-performance-capacity/cases/hard-rock-digital-cockroachdb-sports-betting/" data-link-title="9.C41 Hard Rock Digital：CockroachDB on AWS Outposts、Wire Act 合規 &#43; 跨州單一邏輯 DB" data-link-desc="Hard Rock Digital 用 CockroachDB 跨 AWS Outposts &#43; US-East-1、Wire Act 強制資料留州、單一邏輯 DB 解多州 sportsbook、100 node 32 vCPU 撐 Super Bowl">9.C41 Hard Rock Digital</a>（concrete framing — 跨 8 州 + Outposts + 邏輯一個 cluster）</li>
<li><a href="/blog/backend/09-performance-capacity/cases/netflix-cockroachdb-multi-region-fleet/" data-link-title="9.C40 Netflix：380&#43; CockroachDB cluster 的 multi-active 拓樸艦隊" data-link-desc="Netflix 把 Cassandra 不夠用的 transactional workload 移到 CockroachDB、380&#43; cluster / 60&#43; 跨 region、含 Open Connect、studio cloud drive、gaming control plane">9.C40 Netflix</a>（多 region locality 規模治理）</li>
<li><a href="/blog/backend/09-performance-capacity/cases/standard-chartered-aurora-banking/" data-link-title="9.C14 Standard Chartered：受監管銀行的 Aurora 4000 TPS 容量提升" data-link-desc="Standard Chartered 銀行遷移到 Aurora 後吞吐量提升 10 倍至 4000 TPS、跨 7 個受監管市場">9.C14 Standard Chartered</a>（fleet 拓樸對照、不同合規邊界）</li>
<li><a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">stale read 卡</a> / <a href="/blog/backend/knowledge-cards/table-partitioning/" data-link-title="Table Partitioning" data-link-desc="說明單一資料庫內如何把大表拆成多個分區，並由查詢規劃器只掃相關片段">table partitioning 卡</a></li>
<li>官方：<a href="https://www.cockroachlabs.com/docs/stable/multiregion-overview.html">CockroachDB Multi-Region Capabilities</a> / <a href="https://www.cockroachlabs.com/docs/stable/table-localities.html">Table Localities</a> / <a href="https://www.cockroachlabs.com/docs/stable/follower-reads.html">Follower Reads</a></li>
</ul>
]]></content:encoded></item><item><title>CockroachDB Multi-region Table 配置：三種 table locality 的選擇與 latency / 一致性取捨</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/cockroachdb/multi-region-table-config/</link><pubDate>Tue, 02 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/cockroachdb/multi-region-table-config/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/cockroachdb/" data-link-title="CockroachDB" data-link-desc="分散式 SQL、PostgreSQL 相容、跨區強一致、Spanner 的開源 / 跨雲替代">CockroachDB vendor overview&lt;/a> 的 implementation-layer deep article。寫作參照 &lt;a href="https://tarrragon.github.io/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">vendor deep article methodology&lt;/a>。本文聚焦 &lt;em>三種 table locality 怎麼選、選錯的 latency / 一致性後果與重配代價&lt;/em>。Schema 怎麼配合 locality 設計（合規 boundary、跨州業務邏輯、Outposts 拓樸）主寫於 &lt;a href="../locality-aware-schema/">locality-aware schema&lt;/a>、survival goal 的存活機制主寫於 &lt;a href="../survival-goals/">survival goals&lt;/a>、本文兩者都 cross-link、不重複展開。&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="問題情境multi-region-cluster-起來了每張-table-該設哪種-locality">問題情境：multi-region cluster 起來了、每張 table 該設哪種 locality&lt;/h2>
&lt;p>團隊把 CockroachDB 跨 region 拉起來、&lt;code>ALTER DATABASE ... ADD REGION&lt;/code> 也跑完了，接下來面對的是逐張 table 的 locality 決策。這個決策的成本結構很不對稱：設對了，read / write 走本地 leaseholder、latency 貼著單區水準；設錯了，每次寫入或讀取都吃一趟跨 region round trip，p99 從個位數毫秒跳到上百毫秒。&lt;/p>
&lt;p>multi-region table locality 是 &lt;em>把「資料的地理歸屬」跟「讀寫路徑」綁在一起&lt;/em> 的宣告。CockroachDB 提供三種 locality，對應三種「資料屬於誰、誰要快」的業務形狀：&lt;/p>
&lt;ul>
&lt;li>&lt;code>REGIONAL BY TABLE&lt;/code>：整張 table 歸屬單一 region，該 region 讀寫快、其他 region 慢。&lt;/li>
&lt;li>&lt;code>REGIONAL BY ROW&lt;/code>：每一 row 各自歸屬一個 region，row 所在 region 讀寫快。&lt;/li>
&lt;li>&lt;code>GLOBAL&lt;/code>：資料屬於所有 region，每個 region 本地讀都快，但寫入要跨 region 達成共識。&lt;/li>
&lt;/ul>
&lt;p>讀者進來最常卡的三題：&lt;/p>
&lt;ul>
&lt;li>三種 locality 對應什麼業務形狀、判讀軸是什麼？&lt;/li>
&lt;li>&lt;code>GLOBAL&lt;/code> 既然每區讀都快，為什麼不全部設 &lt;code>GLOBAL&lt;/code>？&lt;/li>
&lt;li>上線後發現 locality 設錯，重配的代價有多高、能不能無痛改？&lt;/li>
&lt;/ul>
&lt;p>這三題都是 &lt;em>把業務的資料歸屬與讀寫熱點，翻譯成副本拓樸&lt;/em> 的設計決策，語法層面反而簡單。&lt;/p>
&lt;p>問題情境最常見的 trigger：&lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/netflix-cockroachdb-multi-region-fleet/" data-link-title="9.C40 Netflix：380&amp;#43; CockroachDB cluster 的 multi-active 拓樸艦隊" data-link-desc="Netflix 把 Cassandra 不夠用的 transactional workload 移到 CockroachDB、380&amp;#43; cluster / 60&amp;#43; 跨 region、含 Open Connect、studio cloud drive、gaming control plane">9.C40 Netflix&lt;/a> 的 60+ multi-region cluster、最大 Gaming cluster 48-node 跨 4 region。case 揭露一個反直覺判讀 — multi-region 的主要動機是 &lt;em>region failure 0 downtime&lt;/em>、不是降 latency；跨 region quorum 物理上會 &lt;em>增&lt;/em> 寫入 latency。這條判讀直接決定 table locality 怎麼設：當 multi-region 的目的是 survival 而非 latency，把高寫入 table 設成 &lt;code>GLOBAL&lt;/code>（跨區同步寫）就是把成本花在錯的地方。&lt;/p>
&lt;p>&lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/hard-rock-digital-cockroachdb-sports-betting/" data-link-title="9.C41 Hard Rock Digital：CockroachDB on AWS Outposts、Wire Act 合規 &amp;#43; 跨州單一邏輯 DB" data-link-desc="Hard Rock Digital 用 CockroachDB 跨 AWS Outposts &amp;#43; US-East-1、Wire Act 強制資料留州、單一邏輯 DB 解多州 sportsbook、100 node 32 vCPU 撐 Super Bowl">9.C41 Hard Rock Digital&lt;/a> 則提供 row-level 歸屬的 concrete framing：跨 8 州 sportsbook、bet 資料按下注州歸屬、邏輯上仍是一個 cluster。case 觀察段揭露「跨所有 region 一個 logical database」這個拓樸 fact — 也就是 row-level locality 撐起了「合規分州 placement + 單一邏輯 DB」的組合。Hard Rock 的合規驅動與 schema 設計細節在 &lt;a href="../locality-aware-schema/">locality-aware schema&lt;/a> 展開，本文只取「row-level 歸屬」這個 locality 選擇本身。&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是 <a href="/blog/backend/01-database/vendors/cockroachdb/" data-link-title="CockroachDB" data-link-desc="分散式 SQL、PostgreSQL 相容、跨區強一致、Spanner 的開源 / 跨雲替代">CockroachDB vendor overview</a> 的 implementation-layer deep article。寫作參照 <a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">vendor deep article methodology</a>。本文聚焦 <em>三種 table locality 怎麼選、選錯的 latency / 一致性後果與重配代價</em>。Schema 怎麼配合 locality 設計（合規 boundary、跨州業務邏輯、Outposts 拓樸）主寫於 <a href="../locality-aware-schema/">locality-aware schema</a>、survival goal 的存活機制主寫於 <a href="../survival-goals/">survival goals</a>、本文兩者都 cross-link、不重複展開。</p></blockquote>
<hr>
<h2 id="問題情境multi-region-cluster-起來了每張-table-該設哪種-locality">問題情境：multi-region cluster 起來了、每張 table 該設哪種 locality</h2>
<p>團隊把 CockroachDB 跨 region 拉起來、<code>ALTER DATABASE ... ADD REGION</code> 也跑完了，接下來面對的是逐張 table 的 locality 決策。這個決策的成本結構很不對稱：設對了，read / write 走本地 leaseholder、latency 貼著單區水準；設錯了，每次寫入或讀取都吃一趟跨 region round trip，p99 從個位數毫秒跳到上百毫秒。</p>
<p>multi-region table locality 是 <em>把「資料的地理歸屬」跟「讀寫路徑」綁在一起</em> 的宣告。CockroachDB 提供三種 locality，對應三種「資料屬於誰、誰要快」的業務形狀：</p>
<ul>
<li><code>REGIONAL BY TABLE</code>：整張 table 歸屬單一 region，該 region 讀寫快、其他 region 慢。</li>
<li><code>REGIONAL BY ROW</code>：每一 row 各自歸屬一個 region，row 所在 region 讀寫快。</li>
<li><code>GLOBAL</code>：資料屬於所有 region，每個 region 本地讀都快，但寫入要跨 region 達成共識。</li>
</ul>
<p>讀者進來最常卡的三題：</p>
<ul>
<li>三種 locality 對應什麼業務形狀、判讀軸是什麼？</li>
<li><code>GLOBAL</code> 既然每區讀都快，為什麼不全部設 <code>GLOBAL</code>？</li>
<li>上線後發現 locality 設錯，重配的代價有多高、能不能無痛改？</li>
</ul>
<p>這三題都是 <em>把業務的資料歸屬與讀寫熱點，翻譯成副本拓樸</em> 的設計決策，語法層面反而簡單。</p>
<p>問題情境最常見的 trigger：<a href="/blog/backend/09-performance-capacity/cases/netflix-cockroachdb-multi-region-fleet/" data-link-title="9.C40 Netflix：380&#43; CockroachDB cluster 的 multi-active 拓樸艦隊" data-link-desc="Netflix 把 Cassandra 不夠用的 transactional workload 移到 CockroachDB、380&#43; cluster / 60&#43; 跨 region、含 Open Connect、studio cloud drive、gaming control plane">9.C40 Netflix</a> 的 60+ multi-region cluster、最大 Gaming cluster 48-node 跨 4 region。case 揭露一個反直覺判讀 — multi-region 的主要動機是 <em>region failure 0 downtime</em>、不是降 latency；跨 region quorum 物理上會 <em>增</em> 寫入 latency。這條判讀直接決定 table locality 怎麼設：當 multi-region 的目的是 survival 而非 latency，把高寫入 table 設成 <code>GLOBAL</code>（跨區同步寫）就是把成本花在錯的地方。</p>
<p><a href="/blog/backend/09-performance-capacity/cases/hard-rock-digital-cockroachdb-sports-betting/" data-link-title="9.C41 Hard Rock Digital：CockroachDB on AWS Outposts、Wire Act 合規 &#43; 跨州單一邏輯 DB" data-link-desc="Hard Rock Digital 用 CockroachDB 跨 AWS Outposts &#43; US-East-1、Wire Act 強制資料留州、單一邏輯 DB 解多州 sportsbook、100 node 32 vCPU 撐 Super Bowl">9.C41 Hard Rock Digital</a> 則提供 row-level 歸屬的 concrete framing：跨 8 州 sportsbook、bet 資料按下注州歸屬、邏輯上仍是一個 cluster。case 觀察段揭露「跨所有 region 一個 logical database」這個拓樸 fact — 也就是 row-level locality 撐起了「合規分州 placement + 單一邏輯 DB」的組合。Hard Rock 的合規驅動與 schema 設計細節在 <a href="../locality-aware-schema/">locality-aware schema</a> 展開，本文只取「row-level 歸屬」這個 locality 選擇本身。</p>
<h2 id="核心機制三種-locality-的判讀軸--survival-goal-互動">核心機制：三種 locality 的判讀軸 + survival goal 互動</h2>
<p>三種 table locality 的差異，本質是 <em>leaseholder（讀寫入口）跟資料歸屬 region 之間的關係</em>。leaseholder 機制屬前置、見 <a href="../hlc-raft-consensus/">HLC + Raft consensus</a>；本文聚焦三種 locality 把 leaseholder 放在哪、因此誰快誰慢。</p>
<h3 id="判讀軸資料歸屬的顆粒--讀寫熱點分佈">判讀軸：資料歸屬的顆粒 × 讀寫熱點分佈</h3>
<p>選 locality 的第一個判讀軸是 <em>資料歸屬的顆粒</em>：整張 table 屬於一個 region（table 級），還是每 row 各屬一個 region（row 級），還是屬於所有 region（global）。第二個判讀軸是 <em>讀寫熱點落在哪</em>：本地讀為主、本地寫為主、還是全球讀為主。</p>
<table>
  <thead>
      <tr>
          <th>Locality</th>
          <th>資料歸屬顆粒</th>
          <th>Read 快的條件</th>
          <th>Write 快的條件</th>
          <th>對應業務形狀</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>REGIONAL BY TABLE</code></td>
          <td>整張 table 一個 region</td>
          <td>從歸屬 region 讀</td>
          <td>從歸屬 region 寫</td>
          <td>整張表服務單一市場（例：日本訂單表）</td>
      </tr>
      <tr>
          <td><code>REGIONAL BY ROW</code></td>
          <td>每 row 一個 region</td>
          <td>從 row 歸屬 region 讀</td>
          <td>從 row 歸屬 region 寫</td>
          <td>資料跟用戶地理綁定（玩家、帳戶、訂單）</td>
      </tr>
      <tr>
          <td><code>GLOBAL</code></td>
          <td>所有 region 共有</td>
          <td>任何 region 本地讀都快</td>
          <td>沒有「快」的寫（跨區共識）</td>
          <td>reference data（國碼、貨幣、規則表）</td>
      </tr>
  </tbody>
</table>
<p>每一格的判讀都要回到該情境，不能只看表。</p>
<p><code>REGIONAL BY TABLE</code> 適合 <em>整張表的讀寫熱點集中在單一 region</em> 的情況。例如一張只服務日本市場的訂單表，把整張表的 leaseholder 釘在 <code>asia-northeast1</code>，日本端的應用讀寫都走本地 leaseholder，跨區應用偶爾讀則走 follower read 接受 stale。判讀訊號：這張表的寫入請求是否 95% 以上來自同一 region。如果不是，table 級歸屬會讓多數寫入吃跨區延遲。</p>
<p><code>REGIONAL BY ROW</code> 適合 <em>每一 row 跟某個地理位置強綁定、但整張表跨多 region</em> 的情況。玩家帳戶、訂單、下注紀錄都屬於這類 — 每筆資料屬於某個用戶所在 region，但整張表服務所有 region 的用戶。row 透過隱含的 <code>crdb_region</code> 欄位決定歸屬，leaseholder 跟著 row 走。判讀訊號：同一張表的不同 row，讀寫熱點是否分散在不同 region。是的話，row 級歸屬讓每個 row 都貼著自己的用戶。</p>
<p><code>GLOBAL</code> 適合 <em>讀遠多於寫、且每個 region 都要本地快讀</em> 的 reference data。國家代碼、貨幣表、運動賽事 metadata 這類資料變更稀少、但每個 region 的每次查詢都要用到。<code>GLOBAL</code> 讓每個 region 都能本地讀（讀到 closed timestamp 前的一致快照），代價是寫入要跨 region 達成共識。判讀訊號：寫入頻率是否低到「跨區寫的慢可以忽略」。</p>
<h3 id="為什麼不全部設-global">為什麼不全部設 GLOBAL</h3>
<p><code>GLOBAL</code> 的「每區讀都快」看似適合全表套用，但它對 <em>寫入</em> 收取跨 region quorum 的全額成本。<code>GLOBAL</code> table 的讀之所以能本地完成，是因為 CockroachDB 維護一個全球同步的 closed timestamp，讓每個 region 都能安全地本地讀稍早的快照；維護這個 timestamp 的代價是每次寫入都要跟所有 region 協調。</p>
<blockquote>
<p><strong>Scope warning</strong>：<code>GLOBAL</code> table 的跨 region 寫入 p99、<code>REGIONAL BY ROW</code> 的本地寫入 p99、closed timestamp 的傳播間隔等具體數字，屬 vendor 規格與部署拓樸（region 距離、replica 數）的函數，三個 anchor case（DoorDash / Netflix / Hard Rock）都未揭露單一 table 的 latency 數字。本文只給量級判讀（本地 quorum vs 跨洲 quorum 差一到兩個數量級），具體值需 benchmark 自身拓樸並 cross-verify <a href="https://www.cockroachlabs.com/docs/stable/table-localities.html">CockroachDB Table Localities 文件</a>。</p></blockquote>
<p>因此「全部設 <code>GLOBAL</code>」會把所有寫入推上跨 region 路徑，等於放棄了 distributed SQL 把寫入分散到各 region 的核心優勢。<code>GLOBAL</code> 的正確用法是限定在 <em>變更頻率低、全球都要快讀</em> 的 reference data。</p>
<h3 id="survival-goal-怎麼跟-locality-一起決定副本拓樸">Survival goal 怎麼跟 locality 一起決定副本拓樸</h3>
<p>table locality 決定 <em>leaseholder 放哪、讀寫走哪條路徑</em>；survival goal 決定 <em>副本要分佈到幾個 failure domain 才能在故障後存活</em>。兩者一起決定每張 table 的副本拓樸。</p>
<p>survival goal 的存活機制本身（<code>SURVIVE ZONE FAILURE</code> vs <code>SURVIVE REGION FAILURE</code>、怎麼從業務 SLO 倒推、RTO / RPO 怎麼算）是 <a href="../survival-goals/">survival goals</a> 的 SSoT，本文不重複展開。本文只取兩者 <em>互動</em> 的一個關鍵後果：把 <code>SURVIVE REGION FAILURE</code> 套到 <code>REGIONAL BY ROW</code> table 時，每個 region 的 row 不只需要本地 voting replica，還需要在 <em>其他 region</em> 放足夠的 voting replica 才能在整個 region 失效後仍達成 quorum。這會把跨 region 的 voting replica 數量推高，間接增加寫入要協調的範圍。</p>
<p>判讀路線：先依業務的資料歸屬與讀寫熱點選 locality（本文），再依業務的 region failure 容忍度選 survival goal（<a href="../survival-goals/">survival goals</a>），兩者疊加後才得到最終副本拓樸與 latency 結構。</p>
<h2 id="操作流程配置驗證每步檢查生效">操作流程：配置、驗證、每步檢查生效</h2>
<h3 id="第一步確認-database-已加入所有-region">第一步：確認 database 已加入所有 region</h3>
<p>table locality 的前提是 database 已宣告 region。先確認 region 列表正確，再設 table locality。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 看 database 已有哪些 region、哪個是 primary
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SHOW</span><span class="w"> </span><span class="n">REGIONS</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">DATABASE</span><span class="w"> </span><span class="n">mydb</span><span class="p">;</span></span></span></code></pre></div><p>驗證點：輸出的 region 數量與名稱要對齊實際部署的 region。少一個 region，後面把 table 設成該 region 的 <code>REGIONAL BY TABLE</code> 會直接報錯。</p>
<h3 id="第二步依判讀軸設定每張-table-的-locality">第二步：依判讀軸設定每張 table 的 locality</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 整張表服務單一市場
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders_jp</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="n">REGIONAL</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="s2">&#34;asia-northeast1&#34;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="c1">-- 資料跟用戶地理綁定
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="n">REGIONAL</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">ROW</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="c1">-- 低寫入、全球本地讀的 reference data
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="c1"></span><span class="k">ALTER</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">currency_codes</span><span class="w"> </span><span class="k">SET</span><span class="w"> </span><span class="n">LOCALITY</span><span class="w"> </span><span class="k">GLOBAL</span><span class="p">;</span></span></span></code></pre></div><p>驗證點：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 確認每張 table 的 locality 設定符合預期
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SHOW</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">accounts</span><span class="p">;</span><span class="w">   </span><span class="c1">-- locality 子句會出現在輸出尾段</span></span></span></code></pre></div><h3 id="第三步驗證讀寫路徑真的走本地">第三步：驗證讀寫路徑真的走本地</h3>
<p>設了 locality 不代表查詢真的走本地路徑 — 寫入時 row 的 <code>crdb_region</code> 沒設對、或 query 沒帶上對應條件，仍會跨區。用 <code>EXPLAIN ANALYZE</code> 看實際 plan。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 看 query 是否在 row 歸屬 region 本地完成、有沒有跨 region 拉資料
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">EXPLAIN</span><span class="w"> </span><span class="k">ANALYZE</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">accounts</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">$</span><span class="mi">1</span><span class="p">;</span></span></span></code></pre></div><p>驗證點：plan 中不應出現大量跨 region 的 distributed scan；<code>REGIONAL BY ROW</code> 的點查應落在 row 歸屬 region 的單一 leaseholder。</p>
<h3 id="第四步驗證副本分佈符合-locality--survival-goal">第四步：驗證副本分佈符合 locality + survival goal</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- 看每張 table 的 range 副本實際分佈在哪些 region
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SHOW</span><span class="w"> </span><span class="n">RANGES</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">accounts</span><span class="p">;</span></span></span></code></pre></div><p>驗證點：副本分佈要同時滿足 locality（leaseholder 在歸屬 region）跟 survival goal（跨足夠 failure domain）。兩者衝突時，CockroachDB 以 survival goal 為硬約束調整副本數，這會反過來影響 latency — 對應 <a href="../survival-goals/">survival goals</a> 的 latency 暴漲失敗模式。</p>
<h2 id="失敗模式locality-選錯的高代價回退">失敗模式：locality 選錯的高代價回退</h2>
<h3 id="global-套到高寫入-table"><code>GLOBAL</code> 套到高寫入 table</h3>
<p>把高寫入 table（訂單、下注、status 變更）設成 <code>GLOBAL</code>，每筆寫入都跨 region 共識，寫入 p99 結構性暴漲、寫入吞吐被跨區協調卡死。徵兆：CockroachDB Console 的跨 region network traffic 隨寫入量線性成長、寫入 p99 跟 region 距離正相關。</p>
<p>修法：把 table 改成 <code>REGIONAL BY ROW</code>（按用戶歸屬）或 <code>REGIONAL BY TABLE</code>（按市場歸屬）。</p>
<p>Anti-recommendation：reference data 之外的任何 table，預設都不要設 <code>GLOBAL</code>。<code>GLOBAL</code> 的判準是「寫入頻率低到跨區寫的慢可以忽略」，高寫入 workload 直接排除。</p>
<h3 id="regional-by-row-但-row-沒帶正確-crdb_region"><code>REGIONAL BY ROW</code> 但 row 沒帶正確 <code>crdb_region</code></h3>
<p><code>REGIONAL BY ROW</code> 靠 <code>crdb_region</code> 決定 row 歸屬。寫入時沒顯式指定，default 走 <code>gateway_region()</code> — application server 所在 region 變成 row 歸屬。後果是 row 被釘在 application server 那一區，而非用戶所在區，locality 形同失效（甚至在合規場景違反 data residency，見 <a href="../locality-aware-schema/">locality-aware schema</a>）。</p>
<p>修法：寫入時顯式指定 <code>crdb_region</code> 為用戶所在 region，並用 NOT NULL + CHECK constraint 把可選值鎖死。</p>
<h3 id="選錯-locality-的重配代價高代價不可逆情境的回退敘事">選錯 locality 的重配代價（高代價不可逆情境的回退敘事）</h3>
<p>table locality 選錯，重配本身語法上一行就能改（<code>ALTER TABLE ... SET LOCALITY ...</code>），但 <em>資料層面的重配代價高且有持續影響</em>，需要專屬回退計畫，不能比照「改個 config 重啟」對待。</p>
<p>重配 locality 會觸發 CockroachDB 把受影響 range 的副本搬到新拓樸對應的位置。把一張大 table 從 <code>GLOBAL</code> 改成 <code>REGIONAL BY ROW</code>，或從 single region 改成 row-level 跨多 region，意味著大量 range 要 rebalance — 期間跨 region network 流量暴增、leaseholder 反覆換手、p99 持續波動，table 越大、region 越多，rebalance 窗口越長。這是隨資料量延長的背景過程，遠非秒級操作。</p>
<p>更關鍵的是 <code>REGIONAL BY ROW</code> 的 <code>crdb_region</code> 是 <em>資料內容</em>，不只是 metadata。如果原本 row 的歸屬區設錯（例如全部落到 application server 那一區），重配 locality 不會自動把 row 搬到正確的用戶 region — 還要 <em>回填 <code>crdb_region</code> 欄位</em>，這是一次 data migration，不是 schema 變更。合規場景下，錯誤歸屬期間寫入的資料可能已經違反 data residency，回退時要連同合規證據一起盤點。</p>
<p>回退計畫的要素：</p>
<ul>
<li>重配前估算受影響 range 數量與資料量，換算 rebalance 窗口，選低流量時段執行。</li>
<li>重配 <code>REGIONAL BY ROW</code> 時，分開處理「locality 宣告變更」與「<code>crdb_region</code> 回填」兩個動作，回填走分批 update 並監控 contention。</li>
<li>重配期間監控 rebalance queue 與跨 region traffic，設好「波動超過閾值就暫停 rebalance」的 tripwire。</li>
<li>合規場景下，先盤點錯誤歸屬期間的資料是否已違規，再決定回填策略與是否需要合規通報。</li>
</ul>
<p>Anti-recommendation：不要在 production 高峰時段直接對大 table 改 locality 試效果。locality 是「上線前依業務形狀想清楚再設」的決策，不是「線上 A/B 試」的旋鈕。</p>
<h3 id="cross-region-join-跑爆-latency">Cross-region join 跑爆 latency</h3>
<p>兩張 <code>REGIONAL BY ROW</code> table join，若 join key 不保證兩邊 row 在同 region，planner 要跨 region 拉資料，p99 暴漲。</p>
<p>修法：兩張 table 用同一個歸屬 key（如 user_id），讓 join 對應的 row co-locate 在同 region；無法 co-locate 時，對容忍 stale 的查詢改走 follower read。</p>
<h2 id="容量與觀測">容量與觀測</h2>
<h3 id="必看-metric">必看 metric</h3>
<ul>
<li><code>Cross-region query count</code>：locality 是否生效的直接訊號，數值高代表查詢在跨區拉資料。</li>
<li><code>Leaseholder distribution by region</code>：leaseholder 是否落在資料歸屬 region，不均代表 locality 配置或 <code>crdb_region</code> 有偏。</li>
<li><code>Rebalance queue size</code>：locality 重配 / 副本搬遷期間的進度訊號，持續非零代表 rebalance 未收斂。</li>
<li><code>Cross-region network bytes</code>：<code>GLOBAL</code> table 寫入與 cross-region join 的成本訊號。</li>
</ul>
<h3 id="容量判讀">容量判讀</h3>
<ul>
<li><code>GLOBAL</code> table 的跨區寫入成本 ≈ 寫入 QPS × region 數，region 越多成本越高，所以 <code>GLOBAL</code> 只放低寫入 reference data。</li>
<li><code>REGIONAL BY ROW</code> 的跨區讀成本 ≈ 落到非歸屬 region 的讀 QPS，這部分若高，代表 <code>crdb_region</code> 歸屬與實際讀熱點不一致。</li>
<li>region 數量建議維持精簡 — 每多一個 region，跨區協調與重配窗口都變長。</li>
</ul>
<blockquote>
<p><strong>Scope warning</strong>：region 數量上限建議、單 range 寫入吞吐量級、closed timestamp 傳播間隔等為 vendor 通用估算，非 case 揭露數字，容量規劃前以 <a href="https://www.cockroachlabs.com/docs/stable/multiregion-overview.html">CockroachDB Multi-Region 文件</a> cross-verify 並 benchmark 自身拓樸。</p></blockquote>
<h3 id="回路徑">回路徑</h3>
<ul>
<li><a href="/blog/backend/09-performance-capacity/" data-link-title="模組九：效能工程與容量規劃" data-link-desc="把『目前配置能撐多少、要加多少機器』變成可量化、可驗證、可改進的工程流程">9.5 瓶頸定位流程</a> 判斷 cross-region-bound vs CPU-bound。</li>
<li><a href="/blog/backend/09-performance-capacity/" data-link-title="模組九：效能工程與容量規劃" data-link-desc="把『目前配置能撐多少、要加多少機器』變成可量化、可驗證、可改進的工程流程">9.6 容量規劃模型</a> region count × replica × latency budget。</li>
<li><a href="/blog/backend/knowledge-cards/latency-budget/" data-link-title="Latency Budget" data-link-desc="把 user-perceived latency 拆到每個 stage 的配額、反推架構選擇">latency budget 卡</a> 跨 region quorum 預算。</li>
</ul>
<h2 id="邊界與整合">邊界與整合</h2>
<h3 id="sibling-deep-articles">Sibling deep articles</h3>
<ul>
<li><a href="../locality-aware-schema/">locality-aware schema</a>：schema 怎麼配合 locality 設計 — 合規 boundary、跨州業務邏輯、Outposts 拓樸、<code>crdb_region</code> 作為合規欄位的管理。本文是「三種 locality 怎麼選」、該文是「選好後 schema 怎麼配合」，兩者互補不重複。</li>
<li><a href="../survival-goals/">survival goals</a>：survival goal 的存活機制與 SLO 倒推 — 本文只取「survival goal 與 locality 互動如何影響副本拓樸」這一個交點，存活機制本身以該文為 SSoT。</li>
<li><a href="../hlc-raft-consensus/">HLC + Raft consensus</a>：leaseholder 與 range 機制 — locality 決定 leaseholder 放哪，前置機制在該文。</li>
</ul>
<h3 id="跟-spanner--aurora-對照">跟 Spanner / Aurora 對照</h3>
<p>Spanner 在 GCP region 內做 placement，無 AWS Outposts 等效；Aurora 不支援 row-level locality，跨 region 只能 cluster-per-region + async replication。完整三家 distributed SQL 在 multi-region placement 的選型對比，是 <a href="../aurora-dsql-spanner-decision-tree/">aurora-dsql-spanner-decision-tree</a> 的 SSoT，本文不重展三方對比。</p>
<h3 id="1x-章節互引">1.x 章節互引</h3>
<ul>
<li><a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 全球分散式 OLTP</a> 上游 latency / 一致性取捨。</li>
<li><a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">stale read 卡</a>、<a href="/blog/backend/knowledge-cards/follower-read/" data-link-title="Follower Read" data-link-desc="分散式 SQL 從 non-voting replica 讀 closed timestamp 之前的資料、不參與 Raft commit、低 latency 但 read-after-write 場景仍可能 stale">follower read 卡</a> — <code>GLOBAL</code> 與跨區讀的一致性語意。</li>
</ul>
<h3 id="何時不用本文">何時不用本文</h3>
<ul>
<li>single-region 部署：用 default locality 即可，三種 locality 在單區無差異。</li>
<li>從 PostgreSQL 遷到 CockroachDB 的整體流程：見 <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-cockroachdb/" data-link-title="PostgreSQL → CockroachDB：三維皆 High 的多重歸類 migration" data-link-desc="PostgreSQL → CockroachDB 是 Schema / Operational / Paradigm 三維皆 High 的 multi-axis migration、實證 [#127](/report/content-structure-by-max-diff-dimension/) 的「多重歸類跟 tie-breaking」規則；主結構走 Type E paradigm shift、Schema 差 &#43; Operational redesign 抽出獨立段；涵蓋 transaction model 重設計、SQL dialect gap、5 個 production 踩雷">PostgreSQL → CockroachDB migration</a>，本文只處理遷移後的 table locality 配置。</li>
</ul>
<h2 id="相關連結">相關連結</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/cockroachdb/" data-link-title="CockroachDB" data-link-desc="分散式 SQL、PostgreSQL 相容、跨區強一致、Spanner 的開源 / 跨雲替代">CockroachDB vendor overview</a></li>
<li><a href="/blog/backend/09-performance-capacity/cases/netflix-cockroachdb-multi-region-fleet/" data-link-title="9.C40 Netflix：380&#43; CockroachDB cluster 的 multi-active 拓樸艦隊" data-link-desc="Netflix 把 Cassandra 不夠用的 transactional workload 移到 CockroachDB、380&#43; cluster / 60&#43; 跨 region、含 Open Connect、studio cloud drive、gaming control plane">9.C40 Netflix</a>（multi-region 動機是 survival 非 latency）</li>
<li><a href="/blog/backend/09-performance-capacity/cases/hard-rock-digital-cockroachdb-sports-betting/" data-link-title="9.C41 Hard Rock Digital：CockroachDB on AWS Outposts、Wire Act 合規 &#43; 跨州單一邏輯 DB" data-link-desc="Hard Rock Digital 用 CockroachDB 跨 AWS Outposts &#43; US-East-1、Wire Act 強制資料留州、單一邏輯 DB 解多州 sportsbook、100 node 32 vCPU 撐 Super Bowl">9.C41 Hard Rock Digital</a>（row-level 歸屬 + 單一邏輯 cluster）</li>
<li><a href="/blog/backend/knowledge-cards/distributed-sql/" data-link-title="Distributed SQL" data-link-desc="把 SQL 與交易語意延伸到多節點與多區域的資料庫形態">distributed SQL 卡</a></li>
<li>官方：<a href="https://www.cockroachlabs.com/docs/stable/table-localities.html">CockroachDB Table Localities</a> / <a href="https://www.cockroachlabs.com/docs/stable/multiregion-overview.html">Multi-Region Overview</a> / <a href="https://www.cockroachlabs.com/docs/stable/follower-reads.html">Follower Reads</a></li>
</ul>
]]></content:encoded></item></channel></rss>