<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Sharding on Tarragon</title><link>https://tarrragon.github.io/blog/tags/sharding/</link><description>Recent content in Sharding on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Wed, 27 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/sharding/index.xml" rel="self" type="application/rss+xml"/><item><title>MySQL Vitess Sharding: How VTGate, VTTablet, VReplication, and VSchema Work Together</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/vitess-sharding/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/vitess-sharding/</guid><description>&lt;blockquote>
&lt;p>This article is the implementation-layer deep dive for the &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL&lt;/a> overview. The overview covers where MySQL sits in the OLTP landscape; this article focuses on &lt;em>Vitess sharding&lt;/em>, a complete sharding system built from four cooperating components.&lt;/p>&lt;/blockquote>
</description><content:encoded><![CDATA[<blockquote>
<p>This article is the implementation-layer deep dive for the <a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL</a> overview. The overview covers where MySQL sits in the OLTP landscape; this article focuses on <em>Vitess sharding</em>, a complete sharding system built from four cooperating components.</p></blockquote>
<hr>
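<h2 id="問題情境mysql-寫吞吐撞上-single-primary-上限">Problem: MySQL write throughput hits the single-primary ceiling</h2>
<p>A single MySQL primary tops out at roughly 50K-100K writes per second (WPS), depending on schema and hardware. Beyond that level there are three options:</p>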
<ol>
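<li><em>Application-level sharding</em>: each table decides its own sharding scheme, the application implements the routing logic, and cross-shard queries and migrations are handled by hand</li>
<li><em>Vitess</em>: the proxy layer routes automatically, cross-shard queries can be scattered to the relevant shards and merged automatically, and resharding is automated</li>
<li><em>Distributed SQL</em> (CockroachDB / Spanner / Aurora DSQL): a different engine from MySQL; the application has to change drivers</li>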
</ol>
<p>The core drivers for choosing Vitess: <em>keep the MySQL wire protocol, change the application very little, and get transparent sharding</em>. The cost is the operational complexity of four components: Vitess is a complete distributed system, not just a proxy.</p>
<p>Before reading, it helps to review the shard key, routing, resharding, and cross-shard query semantics in <a href="/blog/backend/knowledge-cards/database-sharding/" data-link-title="Database Sharding" data-link-desc="說明資料庫如何依 shard key 分散資料、路由請求與承擔跨 shard 查詢成本">Database Sharding</a>; when capacity becomes unbalanced, continue with <a href="/blog/backend/knowledge-cards/hot-partition/" data-link-title="Hot Partition" data-link-desc="說明分散式 KV / OLTP 中、單一 partition 流量遠超其他的容量問題">Hot Partition</a>.</p>
<h2 id="vitess-四件套每個-component-的責任">The Vitess four-piece set: what each component owns</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">                        ┌───────────────────────────────────────┐
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">   Application ────→    │                VTGate                 │  ← speaks the MySQL wire protocol
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">                        │  (proxy + parse + route + aggregate)  │
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">                        └────┬─────┬────────────────────────────┘
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">                             │     │
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">                ┌────────────┘     └──────────────┐
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">                ▼                                 ▼
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        ┌──────────────┐                  ┌──────────────┐
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        │   VTTablet   │                  │   VTTablet   │
</span></span><span class="line"><span class="ln">10</span><span class="cl">        │ (per-MySQL   │                  │ (per-MySQL   │
</span></span><span class="line"><span class="ln">11</span><span class="cl">        │  sidecar)    │                  │  sidecar)    │
</span></span><span class="line"><span class="ln">12</span><span class="cl">        └─────┬────────┘                  └─────┬────────┘
</span></span><span class="line"><span class="ln">13</span><span class="cl">              │                                 │
</span></span><span class="line"><span class="ln">14</span><span class="cl">              ▼                                 ▼
</span></span><span class="line"><span class="ln">15</span><span class="cl">        ┌──────────────┐                  ┌──────────────┐
</span></span><span class="line"><span class="ln">16</span><span class="cl">        │    MySQL     │                  │    MySQL     │
</span></span><span class="line"><span class="ln">17</span><span class="cl">        │  (Shard -80) │                  │  (Shard 80-) │
</span></span><span class="line"><span class="ln">18</span><span class="cl">        └──────────────┘                  └──────────────┘
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl">   Topology Service (etcd / Consul / ZooKeeper)
</span></span><span class="line"><span class="ln">21</span><span class="cl">   ↑↓ metadata shared by all components
</span></span><span class="line"><span class="ln">22</span><span class="cl">   VSchema: keyspace structure, shard ranges, Vindex definitions</span></span></code></pre></div><h3 id="vtgate--query-routing-layer">VTGate: query routing layer</h3>
<p>To the application it looks like MySQL (same port, same wire protocol, same query syntax), but it is actually a stateless proxy. For each query, VTGate:</p>
<ol>
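<li>Parses the SQL and extracts the routing key (from the WHERE columns)</li>
<li>Looks up the VSchema and computes which shard the routing key maps to</li>
<li>Sends the query to that shard's VTTablet</li>
<li>Waits for the response, aggregates results if the query is cross-shard, and returns them to the application</li>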
</ol>
<p>Because VTGate is stateless, it scales freely: run N instances behind a load balancer. Most production deployments run 3-10 VTGates per region.</p>
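<p>The routing steps above can be sketched in a few lines of Python. This is a minimal model, not VTGate's implementation: <code>toy_vindex</code> uses MD5 purely as a stand-in for a real Vindex, and the function names are hypothetical. What it does mirror is the key-range rule: a shard named <code>start-end</code> owns every keyspace ID in the half-open byte range from <code>start</code> up to <code>end</code>, and a query without a routing key is scattered to all shards.</p>

```python
import hashlib


def parse_shard(name):
    """Turn a range-style shard name such as '-80' or '80-' into (start, end) bytes."""
    start_hex, end_hex = name.split("-")
    return bytes.fromhex(start_hex), bytes.fromhex(end_hex)


def shard_for(keyspace_id, shards):
    """Return the shard whose half-open key range contains keyspace_id."""
    for name in shards:
        start, end = parse_shard(name)
        # An empty end means "up to the maximum keyspace ID".
        if start <= keyspace_id and (end == b"" or keyspace_id < end):
            return name
    raise ValueError("no shard covers this keyspace ID")


def toy_vindex(user_id):
    """Stand-in Vindex: column value -> 8-byte keyspace ID (MD5 is illustrative only)."""
    return hashlib.md5(str(user_id).encode()).digest()[:8]


def route(user_id, shards):
    """Single-shard route when the sharding column is known, scatter otherwise."""
    if user_id is None:
        return list(shards)  # no routing key: every shard must be queried
    return [shard_for(toy_vindex(user_id), shards)]
```

<p>With <code>shards = ["-80", "80-"]</code>, <code>route(42, shards)</code> returns exactly one shard, while <code>route(None, shards)</code> returns both, which is the scatter-gather case VTGate has to aggregate.</p>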
<h3 id="vttablet--per-mysql-agent">VTTablet: per-MySQL agent</h3>
<p>A VTTablet runs next to every MySQL instance. Its responsibilities:</p>
<ul>
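<li>Records which MySQL instance is the primary and reports it to the topology service</li>
<li>Accepts queries from VTGate and forwards them to the local MySQL</li>
<li>Runs a <em>connection pool</em> (a small number of connections between VTGate and VTTablet; VTTablet shares pooled connections to the local MySQL)</li>
<li>Runs the <em>query plan cache</em> and <em>transactional consistency checks</em></li>
<li>Handles <em>online schema change</em> (Vitess has built-in OSC)</li>
<li>Works with VTOrc (a fork of Orchestrator) to perform failover</li>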
</ul>
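<p>VTTablet is the only connection point between Vitess and MySQL: anything that bypasses VTTablet and connects to MySQL directly is outside Vitess management.</p>
<h3 id="vreplication--跨-shard-資料移動">VReplication: cross-shard data movement</h3>
<p>VReplication is the Vitess engine for moving data <em>across shards, keyspaces, and clusters</em>, built on the MySQL binlog. Uses:</p>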
<ul>
<li><em>Resharding</em>: split shard -80 into -40 + 40-80; VReplication routes each binlog event to the target shard that owns the row</li>
<li><em>Materialized view</em>: precompute cross-shard aggregations</li>
<li><em>MoveTables</em>: move tables across keyspaces (schema-level migration)</li>
<li><em>VStream</em>: CDC; streams binlog events to external consumers (for example Kafka / Debezium)</li>
</ul>
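<p>VReplication is driven mainly by the <em>Vitess operator</em> rather than by application code, yet it directly affects application behavior (writes are split across shards while resharding is in progress).</p>
<h3 id="vschema--sharding-metadata">VSchema: sharding metadata</h3>
<p>VSchema defines <em>how each table in a keyspace is sharded</em>. It is stored as JSON in the topology service. For example:</p>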





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  <span class="nt">&#34;sharded&#34;</span><span class="p">:</span> <span class="kc">true</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nt">&#34;vindexes&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="nt">&#34;hash&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">      <span class="nt">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;hash&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">  <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">  <span class="nt">&#34;tables&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="nt">&#34;orders&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">      <span class="nt">&#34;column_vindexes&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">        <span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">          <span class="nt">&#34;column&#34;</span><span class="p">:</span> <span class="s2">&#34;user_id&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">          <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;hash&#34;</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">      <span class="p">]</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="nt">&#34;users&#34;</span><span class="p">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">      <span class="nt">&#34;column_vindexes&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">        <span class="p">{</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">          <span class="nt">&#34;column&#34;</span><span class="p">:</span> <span class="s2">&#34;user_id&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">          <span class="nt">&#34;name&#34;</span><span class="p">:</span> <span class="s2">&#34;hash&#34;</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">      <span class="p">]</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p><code>orders.user_id</code> and <code>users.user_id</code> use the same Vindex (hash) on the same column, so the orders and users rows for a given user_id land on the same shard and can be JOINed without crossing shards.</p>
<h2 id="vindexvitess-的-sharding-function">Vindex: Vitess's sharding function</h2>
<p>A Vindex is the function Vitess uses to compute the <em>shard key</em>: it maps a column value to a keyspace ID, which in turn selects a shard. Several are built in:</p>
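<p>The co-location rule can be checked mechanically from the VSchema. The sketch below is a simplified model with hypothetical helper names, not a Vitess API: it treats the first entry of <code>column_vindexes</code> as the table's primary Vindex and says an equi-join stays on one shard when both join columns are the sharding columns and both use the same Vindex.</p>

```python
import json

# The same VSchema as above, condensed.
VSCHEMA = json.loads("""
{
  "sharded": true,
  "vindexes": {"hash": {"type": "hash"}},
  "tables": {
    "orders": {"column_vindexes": [{"column": "user_id", "name": "hash"}]},
    "users":  {"column_vindexes": [{"column": "user_id", "name": "hash"}]}
  }
}
""")


def primary_vindex(vschema, table):
    """Return (column, vindex name) of the table's first (primary) Vindex."""
    first = vschema["tables"][table]["column_vindexes"][0]
    return first["column"], first["name"]


def join_is_single_shard(vschema, left, left_col, right, right_col):
    """True when an equi-join on the given columns can be served by one shard."""
    l_col, l_vdx = primary_vindex(vschema, left)
    r_col, r_vdx = primary_vindex(vschema, right)
    # Both sides must join on their sharding column and share the Vindex.
    return (left_col, right_col) == (l_col, r_col) and l_vdx == r_vdx
```

<p>Joining <code>orders</code> to <code>users</code> on <code>user_id</code> passes the check; joining on <code>orders.id</code> does not, and in this model that join would have to be scattered.</p>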
<table>
  <thead>
      <tr>
          <th>Vindex type</th>
          <th>How it computes</th>
          <th>Best for</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>hash</code></td>
          <td>3DES-based null-key hash (not MD5), mapped onto shard ranges</td>
          <td>Default; even distribution; suits primary keys</td>
      </tr>
      <tr>
          <td><code>binary_md5</code></td>
          <td>MD5(binary)</td>
          <td>Binary keys</td>
      </tr>
      <tr>
          <td><code>unicode_loose_xxhash</code></td>
          <td>xxHash over a case-insensitive Unicode form</td>
          <td>String keys</td>
      </tr>
      <tr>
          <td><code>numeric</code></td>
          <td>Uses the numeric value directly</td>
          <td>Contiguous numeric ranges (suits time-based keys)</td>
      </tr>
      <tr>
          <td><code>numeric_static_map</code></td>
          <td>Predefined map</td>
          <td>Small enums such as country code or region</td>
      </tr>
      <tr>
          <td><code>lookup_hash</code></td>
          <td>Finds the shard through a lookup table</td>
          <td>Tables accessed by more than one column that need a secondary index</td>
      </tr>
  </tbody>
</table>
<p>The most common combination: <code>hash</code> (primary key) plus <code>lookup_hash</code> (secondary access pattern).</p>
<h2 id="keyspace--shard--tablet-階層">Keyspace / Shard / Tablet hierarchy</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Keyspace (logical database)
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">   └── Shards
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        ├── -80 (keyspace IDs with first byte 0x00-0x7f)
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        │     ├── Primary tablet (1 MySQL primary)
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        │     ├── Replica tablet × 2
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        │     └── RDOnly tablet × 1 (analytics)
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        └── 80- (keyspace IDs with first byte 0x80-0xff)
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">              ├── Primary tablet
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">              ├── Replica tablet × 2
</span></span><span class="line"><span class="ln">10</span><span class="cl">              └── RDOnly tablet × 1</span></span></code></pre></div><p>Shard ranges are written as <em>hex prefixes of the keyspace ID</em> (<code>-80</code> covers everything below 0x80, <code>80-</code> covers 0x80 and above), which leaves room to split during resharding (<code>-80</code> can be cut into <code>-40</code> + <code>40-80</code>).</p>
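<p>The split arithmetic is simple enough to show directly. The helper below is a hypothetical sketch (Vitess does not require midpoint splits; you choose target shard names yourself): it halves a range-style shard name at its midpoint and adds one more byte of precision when the range is too narrow to halve.</p>

```python
def split_shard(name):
    """Split a range-style shard name ('-80', '40-80', '80-') into two halves."""
    start_hex, end_hex = name.split("-")
    width = max(len(start_hex), len(end_hex), 2)
    while True:
        # Pad both bounds to the same number of hex digits before comparing.
        start = int(start_hex.ljust(width, "0"), 16)
        end = int(end_hex.ljust(width, "0"), 16) if end_hex else 16 ** width
        if end - start >= 2:
            break
        width += 2  # range too narrow at this precision: add one more byte
    mid = format((start + end) // 2, "0{}x".format(width))
    return "{}-{}".format(start_hex, mid), "{}-{}".format(mid, end_hex)
```

<p>For example, <code>split_shard("-80")</code> yields <code>-40</code> and <code>40-80</code>, matching the split described above, and <code>split_shard("80-")</code> yields <code>80-c0</code> and <code>c0-</code>.</p>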
<p>Tablet types:</p>
<ul>
<li><em>Primary</em>: the entry point for writes</li>
<li><em>Replica</em>: serves read traffic (controlled by Vitess query rules)</li>
<li><em>RDOnly</em>: analytics, backup, and VReplication source only; low SLA; takes no production read traffic</li>
</ul>
<h2 id="配置-step-by-steplocal-cluster">Step-by-step setup (local cluster)</h2>
<p>Production deployments usually use the Kubernetes operator (vitess-operator), but a local cluster is the fastest way to learn the concepts:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Use vtctldclient (replaces the legacy vtctlclient)</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"># 1. Create an unsharded keyspace</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">vtctldclient CreateKeyspace --durability-policy<span class="o">=</span>semi_sync commerce
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># 2. Start from a single MySQL primary (unsharded) and create the table</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">vtctldclient ApplySchema --sql<span class="o">=</span><span class="s2">&#34;CREATE TABLE orders (id INT PRIMARY KEY, user_id INT)&#34;</span> commerce
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># 3. Mark the keyspace as sharded and define the VSchema</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">vtctldclient ApplyVSchema --vschema<span class="o">=</span><span class="s1">&#39;{
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s1">  &#34;sharded&#34;: true,
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s1">  &#34;vindexes&#34;: {&#34;hash&#34;: {&#34;type&#34;: &#34;hash&#34;}},
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s1">  &#34;tables&#34;: {
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s1">    &#34;orders&#34;: {
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s1">      &#34;column_vindexes&#34;: [{&#34;column&#34;: &#34;user_id&#34;, &#34;name&#34;: &#34;hash&#34;}]
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s1">    }
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="s1">  }
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="s1">}&#39;</span> commerce
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="c1"># 4. Start resharding: unsharded → 2 shards (-80, 80-); the target shard tablets must already be running</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">vtctldclient Reshard --workflow<span class="o">=</span>initial-shard --target-keyspace<span class="o">=</span>commerce create <span class="se">\
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="se"></span>  --source-shards<span class="o">=</span><span class="s2">&#34;0&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="se"></span>  --target-shards<span class="o">=</span><span class="s2">&#34;-80,80-&#34;</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">
</span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="c1"># 5. Wait for the data copy to finish (VReplication is running)</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">vtctldclient Workflow --keyspace<span class="o">=</span>commerce show --workflow<span class="o">=</span>initial-shard
</span></span><span class="line"><span class="ln">27</span><span class="cl">
</span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="c1"># 6. SwitchTraffic: switch RDOnly and Replica reads first, then the Primary</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl">vtctldclient Reshard --workflow<span class="o">=</span>initial-shard --target-keyspace<span class="o">=</span>commerce switchtraffic <span class="se">\
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="se"></span>  --tablet-types<span class="o">=</span><span class="s2">&#34;rdonly,replica&#34;</span>
</span></span><span class="line"><span class="ln">31</span><span class="cl">vtctldclient Reshard --workflow<span class="o">=</span>initial-shard --target-keyspace<span class="o">=</span>commerce switchtraffic <span class="se">\
</span></span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="se"></span>  --tablet-types<span class="o">=</span><span class="s2">&#34;primary&#34;</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">
</span></span><span class="line"><span class="ln">34</span><span class="cl"><span class="c1"># 7. Finish and clean up the old shard</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">vtctldclient Reshard --workflow<span class="o">=</span>initial-shard --target-keyspace<span class="o">=</span>commerce complete</span></span></code></pre></div><p>Real production deployments go through the <em>Vitess Kubernetes operator</em>: you declare the desired state in a <code>VitessCluster</code> CRD, and the operator drives the steps above.</p>
<h2 id="5-個-production-踩雷">5 production pitfalls</h2>
<h3 id="1-cross-shard-transaction--vitess-不支援-atomic預設">1. Cross-shard transactions: not atomic in Vitess (by default)</h3>
<p>Two users' orders live on different shards, so <code>BEGIN; UPDATE orders WHERE user_id=1; UPDATE orders WHERE user_id=2; COMMIT;</code> spans two shards. By default Vitess <em>does not guarantee atomicity</em>: each shard commits on its own, one can succeed while the other fails, and the application sees partial state.</p>
<p>Fixes:</p>
<ul>
<li><em>Avoid cross-shard transactions</em>: design the schema so that each transaction boundary falls inside a single shard</li>
<li>Enable <em>atomic two-phase commit</em> (Vitess <code>transaction_mode=TWOPC</code>; experimental, with a large performance penalty)</li>
<li>Workloads that need atomicity at scale should move to distributed SQL (CockroachDB / Spanner) and let the database layer own cross-node consistency</li>
</ul>
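<p>The partial-state failure is easy to reproduce in a toy model. The sketch below uses a hypothetical in-memory <code>Shard</code> class, not Vitess code, to show what an uncoordinated, shard-by-shard commit does when the second shard fails: the first shard's change is already durable and nothing rolls it back.</p>

```python
class Shard:
    """In-memory stand-in for one shard's MySQL primary."""

    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.committed = {}
        self.pending = {}

    def update(self, key, value):
        # Buffer the write until commit, like an open transaction.
        self.pending[key] = value

    def commit(self):
        if self.fail_on_commit:
            self.pending.clear()
            raise RuntimeError("commit failed on shard " + self.name)
        self.committed.update(self.pending)
        self.pending.clear()


def best_effort_commit(shards):
    """Commit each shard in turn with no coordination between them.

    A failure on a later shard does not undo commits on earlier shards.
    """
    for shard in shards:
        shard.commit()
```

<p>If shard <code>-80</code> commits and shard <code>80-</code> then raises, <code>-80</code> keeps its update while <code>80-</code> has none. Two-phase commit exists to close exactly this gap, at the cost of an extra prepare round trip per shard.</p>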
<h3 id="2-vstream-lag--resharding-期間-cdc-落後">2. VStream lag: CDC falls behind during resharding</h3>
<p>During resharding, VReplication generates a large volume of binlog events. Any VStream the application <em>already depends on</em> (feeding Kafka, for example) shares the same binlog stream and can lag. Downstream consumers may see data that is 1-2 hours stale.</p>
<p>Fixes:</p>
<ul>
<li><em>Pause non-critical VStreams</em> during resharding (analytics ETL can pause; real-time recommendation has to stay up)</li>
<li>Confirm that binlog disk capacity &gt; the estimated binlog volume for the resharding window × 2 (buffer)</li>
<li>After resharding completes, <em>manually verify</em> that VStream offsets have caught up, and keep the result as cutover evidence</li>
</ul>
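<p>The disk-capacity rule of thumb is plain arithmetic. The function below is a hypothetical helper with assumed inputs (free disk in GB, average binlog write rate in MB/s, window length in hours); it only encodes the "estimated volume × 2" check from the list above.</p>

```python
def binlog_headroom_ok(free_disk_gb, write_rate_mb_s, hours, safety_factor=2.0):
    """Check that free binlog disk exceeds the estimated binlog volume x safety_factor.

    Returns (ok, estimated_gb) so the estimate can be logged alongside the verdict.
    """
    estimated_gb = write_rate_mb_s * 3600 * hours / 1024
    return free_disk_gb > estimated_gb * safety_factor, estimated_gb
```

<p>At 20 MB/s over a 3-hour window the estimate is about 211 GB, so 500 GB of free disk passes the check and 400 GB does not.</p>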
<h3 id="3-vindex-不均勻--hot-shard">3. Uneven Vindex: hot shard</h3>
<p>The default <code>hash</code> Vindex distributes <em>primary keys evenly</em>, but <em>natural keys</em> (country, region, company_id, and so on) can be skewed. With 10 countries where one accounts for 80% of traffic, a single shard stays permanently hot.</p>
<p>Fixes:</p>
<ul>
<li><em>Composite Vindex</em>: combine the <code>country + user_id</code> columns into the shard key so distribution stays even at the user level</li>
<li><em>Synthetic shard key</em>: add <code>sharding_key=hash(actual_key) % N</code> in the application layer to control distribution</li>
<li>Monitor <em>per-shard QPS</em>: scrape the metrics that VTGate and VTTablet export with Prometheus and break them down by shard</li>
<li>Once a hot shard appears, Vitess can fix it by resharding (splitting the hot shard into two smaller shards), but that is a lot of work</li>
</ul>
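<p>The effect of the key choice can be measured before touching production. The sketch below is a simulation under stated assumptions: 1,000 synthetic requests with 80% from one country, four shards, and MD5 as a stand-in hash rather than Vitess's own. It compares sharding by <code>country</code> alone against a composite <code>country:user_id</code> key.</p>

```python
import hashlib
from collections import Counter


def bucket(key, shards=4):
    """Illustrative hash-to-shard mapping (MD5 is a stand-in, not Vitess's hash)."""
    first_byte = hashlib.md5(key.encode()).digest()[0]
    return first_byte * shards // 256


def load_per_shard(requests, key_fn, shards=4):
    """Count how many requests land on each shard for a given shard-key function."""
    counts = Counter(bucket(key_fn(r), shards) for r in requests)
    return [counts.get(i, 0) for i in range(shards)]


def skew(loads):
    """Hottest shard's load divided by the mean load (1.0 means perfectly even)."""
    return max(loads) / (sum(loads) / len(loads))


# Synthetic traffic: 80% of requests come from one country.
OTHERS = ["US", "JP", "DE", "FR", "BR", "IN", "KR", "GB", "CA"]
REQUESTS = [("TW", uid) for uid in range(800)]
REQUESTS += [(OTHERS[uid % 9], uid) for uid in range(800, 1000)]

by_country = load_per_shard(REQUESTS, lambda r: r[0])
by_country_user = load_per_shard(REQUESTS, lambda r: "{}:{}".format(r[0], r[1]))
```

<p>Sharding by country puts at least 800 of the 1,000 requests on one shard, so <code>skew(by_country)</code> is above 3; the composite key spreads the same traffic close to evenly. The same <code>skew</code> ratio works as an alert threshold on real per-shard QPS.</p>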
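<h3 id="4-resharding-切流量瞬間-deadlock">4. Deadlock at the moment of the resharding cutover</h3>
<p>In the final SwitchTraffic phase that cuts over the primary, the old shard is still accepting writes while Vitess changes routing. For a moment the application talks to both shards, so writes for the same user_id can land on both sides and cause deadlocks or lost updates.</p>
<p>Fixes:</p>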
<ul>
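<li><em>Keep ReverseTraffic ready alongside SwitchTraffic</em>: switch first, and reverse if a problem is confirmed</li>
<li>Cut traffic <em>only in a known quiet period</em> (overnight or early on a weekend morning)</li>
<li>VTGate <code>--retry-count=2</code> + <code>--track-vtgate-deadlock-events</code>: deadlocks are retried automatically and not surfaced to the application</li>
<li>If the cutover genuinely fails, use <code>Reshard cancel</code> to return to the old state so the workflow is back in a verifiable state</li>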
</ul>
<h3 id="5-vreplication-workflow-卡住--cancel-前需要保護狀態">5. Stuck VReplication workflow: protect state before you cancel</h3>
<p>A VReplication workflow reaches 50% and then hits <em>a row it cannot process</em> (schema mismatch, or a blob over the size limit). The workflow gets stuck, the progress bar freezes, and there is no timeout, so the whole resharding flow halts.</p>
<p>Fixes:</p>
<ul>
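<li>Run regular <em>dry-runs against staging data</em> to surface schema and blob boundary problems</li>
<li>When a workflow is stuck, check last_message / row_state with <code>vtctldclient Workflow show</code></li>
<li>Fix the offending row by hand (directly in MySQL), then <em>resume the workflow</em></li>
<li>On large clusters, <em>run a SchemaApply audit before starting VReplication</em> to confirm the source and target schemas are compatible</li>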
</ul>
<h2 id="vitess-跟自管-sharding-對照">Vitess vs. self-managed sharding</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Vitess</th>
          <th>Application-level sharding</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Application changes</td>
          <td>Almost none (keeps the MySQL wire protocol)</td>
          <td>Major (routing logic lives in the application)</td>
      </tr>
      <tr>
          <td>Cross-shard query</td>
          <td>VTGate scatters and merges automatically (with limits)</td>
          <td>Handled by the application</td>
      </tr>
      <tr>
          <td>Resharding</td>
          <td>Automated by VReplication</td>
          <td>Hand-written scripts; complex to operate</td>
      </tr>
      <tr>
          <td>Online schema change</td>
          <td>Built into Vitess (VReplication-based)</td>
          <td>gh-ost / pt-osc</td>
      </tr>
      <tr>
          <td>Failover</td>
          <td>Integrated VTOrc</td>
          <td>Self-managed Orchestrator</td>
      </tr>
      <tr>
          <td>Operational cost</td>
          <td>High (four components to understand)</td>
          <td>Medium (fewer abstractions, but more application logic)</td>
      </tr>
      <tr>
          <td>Vindex shared across keyspaces</td>
          <td>Built in (lookup_hash across keyspaces)</td>
          <td>Write your own</td>
      </tr>
  </tbody>
</table>
<p>The <em>operational complexity</em> is the price of Vitess. A 10-20 person SRE team can carry it; for a 5-person team, <em>managed Vitess (PlanetScale)</em> is more realistic.</p>
<h2 id="跟其他模組整合">Integration with other modules</h2>
<h3 id="跟-replication-topology">With replication topology</h3>
<p>Inside each shard, Vitess still uses MySQL replication (<a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">Replication Topology</a>): every shard has a primary, replicas, and rdonly tablets. The Vitess durability-policy controls whether a primary write waits for a replica ack (semi-sync).</p>
<h3 id="跟-osc-tool">With OSC tools</h3>
<p>Vitess <em>does not rely on gh-ost / pt-osc</em> here; it uses VReplication-based online DDL:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">vtctldclient ApplySchema --strategy<span class="o">=</span>vitess <span class="se">\
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="se"></span>  --sql<span class="o">=</span><span class="s2">&#34;ALTER TABLE orders ADD COLUMN status VARCHAR(20)&#34;</span> commerce</span></span></code></pre></div><p>詳見 <a href="/blog/backend/01-database/vendors/mysql/online-schema-change-tools/" data-link-title="MySQL Online Schema Change：gh-ost 跟 pt-online-schema-change 兩條完全不同的 ghost table 路徑" data-link-desc="MySQL ALTER TABLE 可能鎖整張表，production 需要 online schema change 流程。gh-ost（GitHub）跟 pt-online-schema-change（Percona）都用 ghost table 解決、但底層機制完全不同：pt-osc 用 trigger 同步、gh-ost 用 binlog stream 同步。本文走兩工具機制對照表 → trigger vs binlog 各自取捨 → 配置 step-by-step → 5 production 踩雷（trigger overhead / binlog 延遲 / FK constraint / hot trigger lock / 切換瞬間 deadlock）→ 何時用哪一個">Online Schema Change Tools</a>。</p>
<h3 id="跟-proxysql">跟 ProxySQL</h3>
<p><em>Vitess 取代 ProxySQL</em>。VTGate 本身做 connection pool + query routing、不再需要 ProxySQL。混用會造成 routing 衝突（VTGate 期待自己決定 shard、ProxySQL 跟 VTGate 競爭）。詳見 <a href="/blog/backend/01-database/vendors/mysql/proxysql-config/" data-link-title="MySQL ProxySQL 配置：connection / query / route / response 四段 lifecycle 跟 query rule 設計" data-link-desc="ProxySQL 是 MySQL 生態的 connection pool &#43; query routing 標準。本文走 connection → query parse → route → response 四段 lifecycle、query rule engine 的 rule chain 設計、Hostgroup / Server / User 三層 schema、配置 step-by-step（讀寫分離 &#43; replica lag-aware routing）、5 production 踩雷（query rule 順序錯亂 / connection 漂移 / write 路由到 replica / runtime / disk schema drift / mirror traffic 副作用）、跟 Replication / Orchestrator / HAProxy 整合">ProxySQL 配置</a>。</p>
<h3 id="跟-orchestrator">跟 Orchestrator</h3>
<p>Vitess 用 <em>VTOrc</em>（fork of Orchestrator）作 failover、跟 Vitess topology metadata 整合。不用獨立 Orchestrator。詳見 <a href="/blog/backend/01-database/vendors/mysql/orchestrator-failover/" data-link-title="MySQL Orchestrator Failover：HA 工具自己怎麼 HA？raft cluster &#43; GTID-based promotion 的兩段 paradox" data-link-desc="Orchestrator 是 MySQL HA 自動 failover 的 de facto standard、但讀者第一個問題往往是「HA 工具自己會壞嗎」。本文走 Orchestrator 的雙層架構（管 MySQL 的 raft cluster &#43; 被 raft 管的 orchestrator instance）→ topology discovery → failure detection → failover decision tree → promote action → 5 production 踩雷（split-brain 跟 fencing / pre-failover hook 失敗 / anti-flapping window / GTID errant transaction / VIP 跟 ProxySQL 整合斷層）→ 跟 ProxySQL / Patroni / RDS 對比">Orchestrator failover 設計</a>。</p>
<h3 id="跟-planetscalemanaged-vitess">With PlanetScale (managed Vitess)</h3>
<p>PlanetScale is a <em>managed Vitess service</em>: it hides the operational complexity of the four components and adds a branch-based schema workflow. See the <a href="/blog/backend/01-database/vendors/mysql/migrate-to-planetscale/" data-link-title="MySQL → PlanetScale：managed Vitess &#43; branch-based schema workflow 的 hybrid shift" data-link-desc="自管 MySQL → PlanetScale 加上 Vitess sharding 跟 branch-based schema workflow。本文走 6 維 audit（Paradigm &#43; Operational &#43; Schema 多軸）、4-phase migration、5 production 踩雷、何時不要遷。">PlanetScale migration playbook</a>.</p>
<h3 id="跟-aurora-mysql">With Aurora MySQL</h3>
<p>Aurora and Vitess are <em>different scaling paths</em>:</p>
<ul>
<li>Aurora: single-region scaling (storage / compute separation, up to ~128 TB)</li>
<li>Vitess: horizontal sharding (no fixed ceiling; scales by adding shards)</li>
</ul>
<p>The two take on different capacity and operational responsibilities. Consider Vitess only once the workload exceeds Aurora's single-region ceiling. See the <a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora vendor page</a>.</p>
<h2 id="production-caseyoutube--vitess">Production case: YouTube / Vitess</h2>
<p>In production, Vitess's job is to turn a MySQL shard topology into a database layer that applications can query, migrate, and operate. The engineering signal from the public YouTube / Vitess history is the division of labor among VTGate, VTTablet, VReplication, and VSchema: application queries enter through VTGate, the tablet layer wraps MySQL, VSchema describes routing / sharding rules, and VReplication supports resharding and data movement.</p>
<p>This case distills into three operational criteria. First, Vitess is a database control plane rather than a single proxy; adopting it means taking ownership of the topology service, tablet lifecycle, backup, failover, and schema workflow together. Second, VSchema is an application contract: the shard key, lookup vindexes, and cross-shard queries all shape product feature design. Third, VReplication makes resharding operable, but it still needs a capacity window, backfill monitoring, and a cutover plan.</p>
<p>The sibling paths to Vitess are <a href="/blog/backend/01-database/vendors/postgresql/citus-distributed/" data-link-title="PostgreSQL Citus Distributed：用 extension 把 PG 變成 sharded cluster" data-link-desc="Citus 是 PG extension、把單機 PG 變成 *coordinator &#43; worker* sharded cluster、保留 PG SQL &#43; 加 distributed table &#43; reference table &#43; columnar storage。本文走 Citus 架構（coordinator / worker / distribution column）、3 種 table type（distributed / reference / local）、配置 step-by-step、5 production 踩雷（distribution column 選錯 / cross-shard transaction / reference table 過大 / colocate 不對齊 / worker failover）、跟 MySQL Vitess sharding sibling 對比">PostgreSQL Citus Distributed</a> and <a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 Global Distributed OLTP</a>. Citus keeps the PostgreSQL ecosystem and splits data across a coordinator and workers; CockroachDB / Spanner use distributed SQL to redefine transaction and consistency boundaries. When choosing, first decide whether you are extending an existing MySQL investment or picking a new global OLTP model.</p>
<h2 id="何時用-vitess">When to use Vitess</h2>
<table>
  <thead>
      <tr>
          <th>Condition</th>
          <th>Assessment</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Traffic &gt; 50K WPS and a single primary can no longer keep up</td>
          <td>In Vitess scope</td>
      </tr>
      <tr>
          <td>Large existing MySQL investment; no wish to switch to distributed SQL</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>A dedicated SRE / DBA team of 5-10 people</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Traffic &lt; 10K WPS</td>
          <td>No (over-engineering; use a single MySQL + replicas)</td>
      </tr>
      <tr>
          <td>5-person team that does not want to staff DBAs</td>
          <td>No (use managed PlanetScale)</td>
      </tr>
      <tr>
          <td>Strongly consistent multi-region transactions are required</td>
          <td>No (CockroachDB / Spanner are the right fit)</td>
      </tr>
      <tr>
          <td>Complex cross-shard analytics are needed</td>
          <td>No (pair with BigQuery / Snowflake)</td>
      </tr>
  </tbody>
</table>
<h2 id="相關連結">Related links</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/mysql/" data-link-title="MySQL" data-link-desc="高併發網路服務常用關聯式資料庫、Vitess / PlanetScale 分片生態、GitHub / Shopify / Facebook 規模驗證">MySQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/mysql/replication-topology/" data-link-title="MySQL Replication Topology：async / semi-sync / GTID 不是三選一、是三個 trade-off 軸的疊加" data-link-desc="MySQL replication 不是「選 async 還是 semi-sync」、是 *durability / latency / consistency* 三個 trade-off 軸的疊加；GTID 是跨 mode 的 infrastructure layer、不是第三種 mode。本文走 3 軸取捨模型 → async / semi-sync 行為對比 → GTID 替代 binlog-position 的好處 → 配置 step-by-step → 5 production 踩雷（lag 暴衝 / semi-sync 退回 async / GTID gap / Loss-Less semi-sync 真的 loss-less / chained replication 雪崩）→ 跟 Aurora MySQL / Vitess / ProxySQL / Orchestrator 整合">MySQL Replication Topology</a>（Vitess shard 內部）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/online-schema-change-tools/" data-link-title="MySQL Online Schema Change：gh-ost 跟 pt-online-schema-change 兩條完全不同的 ghost table 路徑" data-link-desc="MySQL ALTER TABLE 可能鎖整張表，production 需要 online schema change 流程。gh-ost（GitHub）跟 pt-online-schema-change（Percona）都用 ghost table 解決、但底層機制完全不同：pt-osc 用 trigger 同步、gh-ost 用 binlog stream 同步。本文走兩工具機制對照表 → trigger vs binlog 各自取捨 → 配置 step-by-step → 5 production 踩雷（trigger overhead / binlog 延遲 / FK constraint / hot trigger lock / 切換瞬間 deadlock）→ 何時用哪一個">MySQL Online Schema Change Tools</a>（Vitess 不用 gh-ost / pt-osc）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/proxysql-config/" data-link-title="MySQL ProxySQL 配置：connection / query / route / response 四段 lifecycle 跟 query rule 設計" data-link-desc="ProxySQL 是 MySQL 生態的 connection pool &#43; query routing 標準。本文走 connection → query parse → route → response 四段 lifecycle、query rule engine 的 rule chain 設計、Hostgroup / Server / User 三層 schema、配置 step-by-step（讀寫分離 &#43; replica lag-aware routing）、5 production 踩雷（query rule 順序錯亂 / connection 漂移 / write 路由到 replica / runtime / disk schema drift / mirror traffic 副作用）、跟 Replication / Orchestrator / HAProxy 整合">MySQL ProxySQL 配置</a>（Vitess 取代 ProxySQL）</li>
<li><a href="/blog/backend/01-database/vendors/mysql/orchestrator-failover/" data-link-title="MySQL Orchestrator Failover：HA 工具自己怎麼 HA？raft cluster &#43; GTID-based promotion 的兩段 paradox" data-link-desc="Orchestrator 是 MySQL HA 自動 failover 的 de facto standard、但讀者第一個問題往往是「HA 工具自己會壞嗎」。本文走 Orchestrator 的雙層架構（管 MySQL 的 raft cluster &#43; 被 raft 管的 orchestrator instance）→ topology discovery → failure detection → failover decision tree → promote action → 5 production 踩雷（split-brain 跟 fencing / pre-failover hook 失敗 / anti-flapping window / GTID errant transaction / VIP 跟 ProxySQL 整合斷層）→ 跟 ProxySQL / Patroni / RDS 對比">MySQL Orchestrator failover</a>（VTOrc fork）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/citus-distributed/" data-link-title="PostgreSQL Citus Distributed：用 extension 把 PG 變成 sharded cluster" data-link-desc="Citus 是 PG extension、把單機 PG 變成 *coordinator &#43; worker* sharded cluster、保留 PG SQL &#43; 加 distributed table &#43; reference table &#43; columnar storage。本文走 Citus 架構（coordinator / worker / distribution column）、3 種 table type（distributed / reference / local）、配置 step-by-step、5 production 踩雷（distribution column 選錯 / cross-shard transaction / reference table 過大 / colocate 不對齊 / worker failover）、跟 MySQL Vitess sharding sibling 對比">PostgreSQL Citus Distributed</a>（PG sibling、coordinator + worker 模型 vs Vitess VTGate + tablet）</li>
<li><a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 全球分散式 OLTP</a>（Vitess vs CockroachDB vs Spanner）</li>
<li><a href="/blog/backend/knowledge-cards/database-sharding/" data-link-title="Database Sharding" data-link-desc="說明資料庫如何依 shard key 分散資料、路由請求與承擔跨 shard 查詢成本">Database Sharding</a>（shard key、routing、resharding 與 cross-shard query）</li>
<li>Official: <a href="https://vitess.io/docs/">Vitess Documentation</a> / <a href="https://github.com/planetscale/vitess-operator">Vitess Operator</a></li>
</ul>
]]></content:encoded></item><item><title>PostgreSQL Citus Distributed：用 extension 把 PG 變成 sharded cluster</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/citus-distributed/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/citus-distributed/</guid><description>&lt;blockquote>
&lt;p>本文是 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL&lt;/a> overview 的 implementation-layer deep article。Overview 已說明 PG 在 OLTP 譜系的定位、本文聚焦 &lt;em>Citus distributed extension&lt;/em> — 把 PG 變成 sharded cluster 的方式。&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;p>當 PG single-primary 寫吞吐撞上單機極限（50K-100K WPS）、選項三條：&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Application 層 sharding&lt;/strong>：應用層自管 shard routing&lt;/li>
&lt;li>&lt;strong>Citus&lt;/strong>：PG extension、自動 routing + cross-shard query&lt;/li>
&lt;li>&lt;strong>Distributed SQL&lt;/strong>（CockroachDB / Aurora DSQL / Spanner）：不同 engine&lt;/li>
&lt;/ol>
&lt;p>選 Citus 的核心 driver：&lt;em>保留 PG SQL syntax + extension 生態&lt;/em>。但「應用層幾乎不必改」是樂觀說法 — 實際上 application 必須圍繞 distribution column 重設計（query 加 filter / transaction 限定同 shard / reference table 量控制）、跟 Vitess 比 cross-shard query 自動化弱。代價是 &lt;em>coordinator / worker 部署複雜度 + cross-shard query 限制 + application schema 改造工作量&lt;/em>。&lt;/p>
&lt;p>閱讀本文前可先對齊 &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/database-sharding/" data-link-title="Database Sharding" data-link-desc="說明資料庫如何依 shard key 分散資料、路由請求與承擔跨 shard 查詢成本">Database Sharding&lt;/a> 的 shard key、routing、resharding 與 cross-shard query 語意；容量失衡時再接 &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/hot-partition/" data-link-title="Hot Partition" data-link-desc="說明分散式 KV / OLTP 中、單一 partition 流量遠超其他的容量問題">Hot Partition&lt;/a>。&lt;/p>
&lt;p>跟 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/vitess-sharding/" data-link-title="MySQL Vitess Sharding：VTGate / VTTablet / VReplication / VSchema 四件套協作" data-link-desc="Vitess 不只是 MySQL sharding proxy、是 4 個 component 協作的完整 sharding 系統 — VTGate（query routing layer）、VTTablet（per-MySQL agent）、VReplication（跨 shard 資料移動）、VSchema（sharding metadata）。本文走 4 件套各自責任、keyspace / shard / tablet 架構、shard key 設計（Vindex）、配置 step-by-step、5 production 踩雷（cross-shard transaction / VStream lag / Vindex 不均勻 / resharding 切流 / VReplication 卡住）、跟自管 sharding 跟 PlanetScale 的對比">MySQL Vitess sharding&lt;/a> 的核心差異：Citus 是 &lt;em>PG extension&lt;/em>（PG 自己跑）、Vitess 是 &lt;em>獨立 proxy + tablet 系統&lt;/em>（包 MySQL）。Citus 用 PG 原生機制（FDW / extension hook）、Vitess 是 &lt;em>外部包裝&lt;/em>。&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is the implementation-layer deep dive for the <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> overview. The overview covers where PG sits in the OLTP landscape; this article focuses on the <em>Citus distributed extension</em>, the way to turn PG into a sharded cluster.</p></blockquote>
<hr>
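<p>When write throughput on a single PG primary hits the single-node ceiling (50K-100K WPS), there are three options:</p>
<ol>
<li><strong>Application-level sharding</strong>: the application manages shard routing itself</li>
<li><strong>Citus</strong>: a PG extension with automatic routing and cross-shard queries</li>
<li><strong>Distributed SQL</strong> (CockroachDB / Aurora DSQL / Spanner): a different engine</li>
</ol>
<p>The core driver for choosing Citus: <em>keeping PG SQL syntax and the extension ecosystem</em>. "The application barely needs to change" is the optimistic version, though. In practice the application has to be redesigned around the distribution column (queries filter on it, transactions stay within one shard, reference table size is kept in check), and cross-shard query automation is weaker than in Vitess. The cost is <em>coordinator / worker deployment complexity + cross-shard query restrictions + application schema rework</em>.</p>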
<p>Before reading, it helps to align on the shard key, routing, resharding, and cross-shard query semantics in <a href="/blog/backend/knowledge-cards/database-sharding/" data-link-title="Database Sharding" data-link-desc="說明資料庫如何依 shard key 分散資料、路由請求與承擔跨 shard 查詢成本">Database Sharding</a>; when capacity becomes unbalanced, continue with <a href="/blog/backend/knowledge-cards/hot-partition/" data-link-title="Hot Partition" data-link-desc="說明分散式 KV / OLTP 中、單一 partition 流量遠超其他的容量問題">Hot Partition</a>.</p>
<p>The core difference from <a href="/blog/backend/01-database/vendors/mysql/vitess-sharding/" data-link-title="MySQL Vitess Sharding：VTGate / VTTablet / VReplication / VSchema 四件套協作" data-link-desc="Vitess 不只是 MySQL sharding proxy、是 4 個 component 協作的完整 sharding 系統 — VTGate（query routing layer）、VTTablet（per-MySQL agent）、VReplication（跨 shard 資料移動）、VSchema（sharding metadata）。本文走 4 件套各自責任、keyspace / shard / tablet 架構、shard key 設計（Vindex）、配置 step-by-step、5 production 踩雷（cross-shard transaction / VStream lag / Vindex 不均勻 / resharding 切流 / VReplication 卡住）、跟自管 sharding 跟 PlanetScale 的對比">MySQL Vitess sharding</a>: Citus is a <em>PG extension</em> (it runs inside PG itself), while Vitess is a <em>standalone proxy + tablet system</em> (wrapped around MySQL). Citus builds on PG's native extension mechanisms (planner / executor hooks); Vitess is an <em>external wrapper</em>.</p>
<h2 id="citus-架構coordinator--worker">Citus architecture: Coordinator + Worker</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">                ┌─────────────────┐
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">   Application  │   Coordinator   │  ← client-facing PG wire protocol, planner, routing
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">                │   (Citus + PG)  │
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">                └────┬─────┬──────┘
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">                     │     │
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">              ┌──────┘     └──────┐
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">              ▼                   ▼
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        ┌──────────┐         ┌──────────┐
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        │ Worker 1 │         │ Worker 2 │  ← each runs PG + Citus extension
</span></span><span class="line"><span class="ln">10</span><span class="cl">        │  (PG)    │         │  (PG)    │
</span></span><span class="line"><span class="ln">11</span><span class="cl">        │ shard 1,3│         │ shard 2,4│
</span></span><span class="line"><span class="ln">12</span><span class="cl">        └──────────┘         └──────────┘</span></span></code></pre></div><p><strong>Coordinator</strong>:</p>
<ul>
<li>Looks like a regular PG server to the application (same port, same wire protocol)</li>
<li>Receives SQL; the Citus planner decomposes the query and routes it to workers</li>
<li>Holds no distributed-table data (the shards of a distributed table live on the workers)</li>
<li>Stores <em>metadata</em> (which shard lives on which worker)</li>
</ul>
<p><strong>Worker</strong>:</p>
<ul>
<li>A standard PG instance with the Citus extension</li>
<li>Each one stores a number of shards</li>
<li>Receives queries from the coordinator, executes them locally, and returns the results</li>
</ul>
<p><strong>Shard</strong>:</p>
<ul>
<li>A distributed table is split into N shards (32 by default)</li>
<li>Each shard is a <em>physical PG table</em> on a worker (with a <code>_&lt;shardid&gt;</code> suffix)</li>
<li>It behaves like an ordinary PG table; you can connect to the worker directly and access it with PG tools</li>
</ul>
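<p>One way to see this layout on a running cluster is the <code>citus_shards</code> view on the coordinator (available in Citus 10 and later, to my knowledge). The sketch below assumes a distributed table named <code>orders</code>, like the one created in the next section:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Run on the coordinator: one row per shard placement of orders
SELECT shardid,
       shard_name,                        -- physical table name on the worker (orders_&lt;shardid&gt;)
       nodename,
       nodeport,
       pg_size_pretty(shard_size) AS size
FROM citus_shards
WHERE table_name = &#39;orders&#39;::regclass
ORDER BY shardid;</code></pre></div>
<p>The <code>nodename</code> / <code>nodeport</code> pair tells you which worker to connect to if you want to inspect a shard table directly.</p>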
<h2 id="3-種-table-type">3 table types</h2>
<h3 id="distributed-table--跨-shard-切分">Distributed table: split across shards</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">-- Create a regular PG table
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="n">BIGSERIAL</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">    </span><span class="n">user_id</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="n">amount</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">2</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">    </span><span class="n">created_at</span><span class="w"> </span><span class="k">TIMESTAMP</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">    </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">user_id</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w">  </span><span class="c1">-- PK must include the distribution column
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w"></span><span class="c1">-- Use Citus to turn it into a distributed table
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">create_distributed_table</span><span class="p">(</span><span class="s1">&#39;orders&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;user_id&#39;</span><span class="p">);</span></span></span></code></pre></div><p><code>user_id</code> is the <em>distribution column</em>: Citus hashes its value to decide which shard a row belongs to. The primary key must include the distribution column (the same requirement MySQL partitioning places on its partitioning key).</p>
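<p>To check the hash mapping for a specific key, Citus provides <code>get_shard_id_for_distribution_column</code>. A minimal sketch, assuming the <code>orders</code> table above has been distributed; the <code>user_id</code> value is arbitrary:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Which shard does user_id = 12345 hash into?
SELECT get_shard_id_for_distribution_column(&#39;orders&#39;, 12345);

-- Which worker holds that shard?
SELECT shardid, nodename, nodeport
FROM citus_shards
WHERE shardid = get_shard_id_for_distribution_column(&#39;orders&#39;, 12345);</code></pre></div>
<p>Running the first query for a handful of hot keys is a cheap way to spot whether they pile up on the same shard.</p>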
<p>Compared with Vitess Vindexes:</p>
<ul>
<li>Citus: hash of the distribution column → shard (a single hash function; the algorithm is not selectable)</li>
<li>Vitess: several Vindex types to choose from (hash / lookup_hash / xxhash / null)</li>
</ul>
<h3 id="reference-table--全-shard-共有">Reference table: replicated to every worker</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">products</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="n">name</span><span class="w"> </span><span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">100</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="n">price</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">create_reference_table</span><span class="p">(</span><span class="s1">&#39;products&#39;</span><span class="p">);</span></span></span></code></pre></div><p><code>products</code> has a <em>full copy on every worker</em>; writes go through the coordinator, which propagates them to all workers.</p>
<p>Use cases:</p>
<ul>
<li>Small lookup tables (country codes, product categories, and so on)</li>
<li>Joins with distributed tables: the reference table is already present on every worker, so the join stays local with no cross-shard traffic</li>
<li>Low write frequency (broadcast cost grows linearly with the number of workers)</li>
</ul>
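<p>The local-join behavior can be sketched as follows. <code>order_items</code> and its columns are hypothetical, introduced only for illustration; <code>products</code> is the reference table created above:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Hypothetical distributed table: line items sharded by user_id
CREATE TABLE order_items (
    user_id    BIGINT NOT NULL,
    order_id   BIGINT NOT NULL,
    product_id INT    NOT NULL,
    quantity   INT    NOT NULL,
    PRIMARY KEY (user_id, order_id, product_id)
);
SELECT create_distributed_table(&#39;order_items&#39;, &#39;user_id&#39;);

-- Each worker joins its own order_items shards against its local copy of products;
-- the coordinator only merges the partial aggregates
SELECT p.name, sum(i.quantity) AS units
FROM order_items i
JOIN products p ON p.id = i.product_id
GROUP BY p.name;</code></pre></div>
<p>If <code>products</code> were a distributed table instead, the same join would not be on the distribution column and Citus would have to move data between workers or reject the query, depending on configuration.</p>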
<h3 id="local-table--coordinator-上的-pg-table">Local table: a plain PG table on the coordinator</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">audit_log</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="n">event</span><span class="w"> </span><span class="n">JSONB</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="c1">-- No Citus function is called, so the table stays on the coordinator by default</span></span></span></code></pre></div><p>It behaves like an ordinary PG table. Use it for tables that <em>do not need to be distributed</em> (such as admin metadata).</p>
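<p>To confirm which type each table ended up as, the <code>citus_tables</code> view on the coordinator is a convenient check. A sketch, assuming the three tables above were created as shown:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Distributed and reference tables are listed here.
-- A plain local table such as audit_log normally does not appear
-- unless it has been added to Citus metadata.
SELECT table_name,
       citus_table_type,        -- e.g. distributed / reference
       distribution_column,
       shard_count
FROM citus_tables
ORDER BY table_name;</code></pre></div>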
<h2 id="colocation跨-distributed-table-同-shard-對齊">Colocation: aligning shards across distributed tables</h2>
<p>When two distributed tables use the <em>same distribution column</em> (for example <code>user_id</code>, with a matching data type) and the same shard count, Citus colocates them automatically:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">create_distributed_table</span><span class="p">(</span><span class="s1">&#39;orders&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;user_id&#39;</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">create_distributed_table</span><span class="p">(</span><span class="s1">&#39;user_addresses&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;user_id&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">colocate_with</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="s1">&#39;orders&#39;</span><span class="p">);</span></span></span></code></pre></div><p>After colocation:</p>
<ul>
<li>The orders and user_addresses rows for <code>user_id = 100</code> sit in matching shards on the <em>same worker</em></li>
<li>JOINs do not cross workers, so they are efficient</li>
<li>Native PG FK constraints work (across tables, but within the same shard)</li>
</ul>
<p>Colocation is the core <em>cross-table consistency</em> mechanism in the Citus design. Without it, a cross-table query becomes a cross-worker query and performance drops sharply.</p>
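<p>A sketch of how to verify colocation and what a colocated join looks like, assuming <code>orders</code> and <code>user_addresses</code> were distributed as above. The <code>city</code> column is hypothetical, since this article does not define the columns of <code>user_addresses</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Tables in the same colocation group share a colocation_id
SELECT table_name, colocation_id
FROM citus_tables
WHERE table_name IN (&#39;orders&#39;::regclass, &#39;user_addresses&#39;::regclass);

-- Join on the distribution column: the worker holding user_id = 100
-- joins its own pair of shards, with no data movement between workers
SELECT o.id, o.amount, a.city            -- a.city is a hypothetical column
FROM orders o
JOIN user_addresses a ON a.user_id = o.user_id
WHERE o.user_id = 100;</code></pre></div>
<p>If the two <code>colocation_id</code> values differ, the tables are not colocated and the join above would need cross-worker data movement.</p>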
<h2 id="配置-step-by-steplocal-cluster">Step-by-step setup (local cluster)</h2>
<p>For production, use Citus Cloud (Microsoft-managed) or Azure Cosmos DB for PostgreSQL (same engine). For self-hosting:</p>
<h3 id="step-1coordinator--worker-都裝-pg--citus">Step 1: Install PG + Citus on the coordinator and the workers</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># On every node (coordinator + 2 workers)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">apt install postgresql-14
</span></span><span class="line"><span class="ln">3</span><span class="cl">apt install postgresql-14-citus-12.0
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># postgresql.conf</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="nv">shared_preload_libraries</span> <span class="o">=</span> <span class="s1">&#39;citus&#39;</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">
</span></span><span class="line"><span class="ln">8</span><span class="cl">systemctl restart postgresql</span></span></code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Run on every node
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="n">citus</span><span class="p">;</span></span></span></code></pre></div><h3 id="step-2coordinator-註冊-worker">Step 2: Register the workers on the coordinator</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Run on the coordinator
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">citus_add_node</span><span class="p">(</span><span class="s1">&#39;worker1.example.com&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">5432</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">citus_add_node</span><span class="p">(</span><span class="s1">&#39;worker2.example.com&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">5432</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="c1">-- Verify
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">citus_get_active_worker_nodes</span><span class="p">();</span></span></span></code></pre></div><h3 id="step-3建-distributed-table">Step 3: Create a distributed table</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="n">BIGSERIAL</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="n">user_id</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="n">amount</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">2</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">    </span><span class="n">created_at</span><span class="w"> </span><span class="k">TIMESTAMP</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">    </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">user_id</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">create_distributed_table</span><span class="p">(</span><span class="s1">&#39;orders&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;user_id&#39;</span><span class="p">);</span></span></span></code></pre></div><p>Citus automatically splits <code>orders</code> into 32 shards (<code>orders_102008</code> and so on) and places them across the workers.</p>
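<p>The shard count comes from the <code>citus.shard_count</code> setting at the time the table is distributed. A small sketch for checking both the setting and the result, assuming the <code>orders</code> table from this step:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Shard count that create_distributed_table uses for new tables (32 unless changed)
SHOW citus.shard_count;

-- How many shards orders actually got
SELECT count(DISTINCT shardid) AS shards
FROM citus_shards
WHERE table_name = &#39;orders&#39;::regclass;</code></pre></div>
<p>To use a different count, set <code>citus.shard_count</code> in the session before calling <code>create_distributed_table</code>; changing it afterwards does not affect tables that are already distributed.</p>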
<h3 id="step-4application-連-coordinator">Step 4: Point the application at the coordinator</h3>
<p>The application's connection string targets the coordinator's IP / port; the application does not need to know that the workers exist.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Queries issued by the application; Citus routes them transparently
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="p">(</span><span class="n">user_id</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">12345</span><span class="p">,</span><span class="w"> </span><span class="mi">50</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="c1">-- → Citus hashes user_id=12345, finds it belongs to (say) shard 17, and routes to the worker that holds it
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"></span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">12345</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="c1">-- → Single-shard query, very fast
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"></span><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="w"></span><span class="c1">-- → Cross-shard aggregation: Citus runs it on the shards in parallel and merges the results</span></span></span></code></pre></div><h2 id="5-個-production-踩雷">5 Production Pitfalls</h2>
<h3 id="1-distribution-column-選錯--cross-shard-query-變主流">1. Wrong distribution column: cross-shard queries become the norm</h3>
<p>Choosing <code>created_at</code> or an auto-increment <code>id</code> as the distribution column looks evenly distributed, but if <em>most application queries filter on user_id</em>, then <em>every query becomes cross-shard</em> and performance collapses.</p>
<p>Fixes:</p>
<ul>
<li><em>Pick the column the application filters and joins on most often as the distribution column</em> (usually <code>tenant_id</code> / <code>user_id</code>)</li>
<li>Audit the application's top queries and confirm that the distribution column matches the query pattern</li>
<li>Changing the distribution column means <em>rewriting every shard</em>; like resharding, it is a major project</li>
</ul>
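<p>A minimal sketch of why the distribution column decides how many shards a query touches. It is a toy model, not Citus internals: the FNV-1a hash, the <code>shardsTouched</code> helper, and the shape of the <code>filter</code> object are all assumptions made for illustration; only the shard count of 32 matches the Citus default.</p>

```javascript
// Toy model of hash-distributed routing. The hash below is FNV-1a, chosen only
// for illustration; it is NOT the hash function Citus uses internally.
const SHARD_COUNT = 32; // Citus default shard count

function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (const ch of String(text)) {
    hash ^= ch.codePointAt(0);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Returns the list of shard ids a query must touch.
// `filter` maps column names to equality values (hypothetical shape).
function shardsTouched(distributionColumn, filter) {
  if (Object.prototype.hasOwnProperty.call(filter, distributionColumn)) {
    // Equality predicate on the distribution column: exactly one shard.
    return [fnv1a(filter[distributionColumn]) % SHARD_COUNT];
  }
  // No predicate on the distribution column: fan out to every shard.
  return Array.from({ length: SHARD_COUNT }, (_, i) => i);
}

const byUser = { user_id: 12345 };
console.log(shardsTouched("user_id", byUser).length);    // 1
console.log(shardsTouched("created_at", byUser).length); // 32
```

<p>The same <code>WHERE user_id = 12345</code> query touches one shard when the table is distributed by <code>user_id</code> and all 32 when it is distributed by <code>created_at</code>, which is the failure described above.</p>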
<h3 id="2-cross-shard-transaction-限制">2. Cross-shard transaction limits</h3>
<p>For transactions that span multiple shards (for example, an UPDATE touching two rows with different user_id values), Citus uses <em>2PC</em> (two-phase commit), with limitations:</p>
<ul>
<li>Multi-statement transactions that span shards need <code>SET citus.multi_shard_modify_mode = 'sequential'</code> to be set explicitly</li>
<li>Some isolation levels are not guaranteed to be serializable across shards</li>
<li>Cross-shard DDL runs sequentially</li>
</ul>
<p>Fixes:</p>
<ul>
<li>Design the schema to avoid cross-shard transactions (transactions within one colocation group are fine)</li>
<li>Where a cross-shard transaction is unavoidable, set the multi-shard mode explicitly</li>
<li>For <em>strict cross-shard consistency</em>, consider distributed SQL (CockroachDB / Aurora DSQL)</li>
</ul>
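<h3 id="3-reference-table-過大--寫入廣播-cost-爆">3. Oversized reference table: write broadcast cost explodes</h3>
<p>A reference table has a copy on every worker, and each write is <em>broadcast to all workers</em>. A reference table with 100K rows and frequent writes means every write is executed on N workers, so the write cost is N times that of a single-node write.</p>
<p>Fixes:</p>
<ul>
<li>Limit reference tables to lookup data that is <em>small and rarely written</em></li>
<li>Very large tables should not be reference tables; consider distributing them instead</li>
<li>Monitor the write rate on reference tables and re-evaluate when it crosses a threshold</li>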
</ul>
<h3 id="4-colocate-沒對齊--隱性-cross-shard-join">4. Misaligned colocation: hidden cross-shard JOINs</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Looks fine, but is actually a slow cross-shard JOIN
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">user_addresses</span><span class="w"> </span><span class="n">ua</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ua</span><span class="p">.</span><span class="n">user_id</span><span class="p">;</span></span></span></code></pre></div><p>If <code>user_addresses</code> was not created with <code>colocate_with =&gt; 'orders'</code>, the shards of the two tables are placed independently and the JOIN crosses workers.</p>
<p>Fixes:</p>
<ul>
<li>Align <code>colocate_with</code> when creating related tables</li>
<li>Run <code>SELECT * FROM citus_tables</code> and compare colocation_id values to confirm the tables are aligned</li>
<li>For JOINs across non-colocated tables, use a <em>materialized view</em> or split the query in the application layer</li>
</ul>
<h3 id="5-worker-failover--coordinator-必須知道">5. Worker failover: the coordinator has to know</h3>
<p>When a worker fails, Citus by default does <em>not fail over automatically; the coordinator simply sees queries fail</em>.</p>
<p>Fixes (Citus 11+):</p>
<ul>
<li>Use <em>shard replication</em> (<code>citus.shard_replication_factor = 2</code>), so each shard has a copy on 2 workers</li>
<li>Run PG streaming replication at the worker layer, with Patroni managing failover</li>
<li>If the coordinator fails, the whole cluster is unavailable, so the coordinator needs HA too (Patroni)</li>
</ul>
<p>Compared with Vitess, Citus has a weaker HA story, so production deployments need to plan it deliberately.</p>
<h2 id="何時用-citus">When to use Citus</h2>
<table>
  <thead>
      <tr>
          <th>Condition</th>
          <th>Recommendation</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Multi-tenant SaaS where tenant_id is the natural distribution column</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Write throughput &gt; 50K WPS, more than a single PG can sustain</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Need to keep PG SQL and extensions (pgvector / TimescaleDB)</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>80% of the application's queries use the same distribution column</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Heavy ad-hoc cross-tenant aggregation</td>
          <td>No (cross-shard is slow)</td>
      </tr>
      <tr>
          <td>Strong cross-shard consistency requirements</td>
          <td>No (use CockroachDB)</td>
      </tr>
      <tr>
          <td>Want zero-ops managed service</td>
          <td>Azure Cosmos DB for PostgreSQL (same engine)</td>
      </tr>
  </tbody>
</table>
<h2 id="容量規劃">Capacity planning</h2>
<ul>
<li>Coordinator: moderate CPU and RAM; metadata is small and it stores no data</li>
<li>Worker: size each worker like a single production PG instance</li>
<li>Shard count: default 32; in practice often set to worker count × 4-8</li>
<li>Replication factor: at least 2 in production</li>
</ul>
<h2 id="跟其他模組整合">Integration with other modules</h2>
<h3 id="跟-replication-topology">With replication topology</h3>
<p>The coordinator and the workers each run PG streaming replication; Citus does not replace PG replication. Worker failover relies on Patroni and streaming replication. See <a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">Replication Topology</a>.</p>
<h3 id="跟-pg-extensions">With PG extensions</h3>
<p>Citus is compatible with most other PG extensions (pgvector / TimescaleDB / pg_stat_statements): it remains an <em>extension</em> and keeps the PostgreSQL ecosystem's integration points. See the <em>PG Extension Ecosystem</em> article (not yet written).</p>
<h3 id="跟-mysql-vitess">Compared with MySQL Vitess</h3>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Citus</th>
          <th>Vitess</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Deployment model</td>
          <td>PG extension</td>
          <td>Standalone proxy + tablet</td>
      </tr>
      <tr>
          <td>Primary use case</td>
          <td>Multi-tenant SaaS</td>
          <td>Very large-scale sharding</td>
      </tr>
      <tr>
          <td>Cross-shard JOIN</td>
          <td>Aligned colocation + reference tables</td>
          <td>VTGate splits and aggregates automatically</td>
      </tr>
      <tr>
          <td>FK</td>
          <td>Usable within one colocation group</td>
          <td>Supported in Vitess 18+, with cross-shard restrictions</td>
      </tr>
      <tr>
          <td>HA</td>
          <td>Relies on Patroni + replication factor</td>
          <td>VTOrc + replication</td>
      </tr>
      <tr>
          <td>Learning curve</td>
          <td>Medium (PG ops experience is enough)</td>
          <td>High (4 components)</td>
      </tr>
  </tbody>
</table>
<p>Citus is the smoother fit for <em>PG-native</em> setups and Vitess for <em>MySQL-native</em> ones; they do not compete directly. See <a href="/blog/backend/01-database/vendors/mysql/vitess-sharding/" data-link-title="MySQL Vitess Sharding：VTGate / VTTablet / VReplication / VSchema 四件套協作" data-link-desc="Vitess 不只是 MySQL sharding proxy、是 4 個 component 協作的完整 sharding 系統 — VTGate（query routing layer）、VTTablet（per-MySQL agent）、VReplication（跨 shard 資料移動）、VSchema（sharding metadata）。本文走 4 件套各自責任、keyspace / shard / tablet 架構、shard key 設計（Vindex）、配置 step-by-step、5 production 踩雷（cross-shard transaction / VStream lag / Vindex 不均勻 / resharding 切流 / VReplication 卡住）、跟自管 sharding 跟 PlanetScale 的對比">MySQL Vitess Sharding</a>.</p>
<h2 id="相關連結">Related links</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">PG Replication Topology</a>（per-worker replication）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/mvcc-lock-model/" data-link-title="PostgreSQL MVCC &#43; Lock Model：為什麼 PG 比 MySQL 少 deadlock、但 vacuum 是別的代價" data-link-desc="PG 用 *MVCC-heavy &#43; 少 explicit lock* 的並行控制、跟 MySQL InnoDB 的 *lock-based*（record / gap / next-key）相反。本文走 MVCC 機制（tuple version &#43; xmin/xmax &#43; visibility）、PG 4 種 lock（row-level / table-level / advisory / predicate）、預測 SERIALIZABLE 行為、5 production 踩雷（idle transaction 卡 vacuum / SELECT FOR UPDATE 跨 transaction / advisory lock 沒釋放 / bloat 不是 vacuum 問題 / predicate lock 在 SSI 下 rollback）、跟 MySQL lock-contention sibling 對比">PG MVCC + Lock Model</a> (lock behavior of cross-shard transactions)</li>
<li><a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 Globally distributed OLTP</a> (Citus vs CockroachDB vs Spanner)</li>
<li><a href="/blog/backend/01-database/vendors/mysql/vitess-sharding/" data-link-title="MySQL Vitess Sharding：VTGate / VTTablet / VReplication / VSchema 四件套協作" data-link-desc="Vitess 不只是 MySQL sharding proxy、是 4 個 component 協作的完整 sharding 系統 — VTGate（query routing layer）、VTTablet（per-MySQL agent）、VReplication（跨 shard 資料移動）、VSchema（sharding metadata）。本文走 4 件套各自責任、keyspace / shard / tablet 架構、shard key 設計（Vindex）、配置 step-by-step、5 production 踩雷（cross-shard transaction / VStream lag / Vindex 不均勻 / resharding 切流 / VReplication 卡住）、跟自管 sharding 跟 PlanetScale 的對比">MySQL Vitess Sharding</a> (sibling, different implementation)</li>
<li><a href="/blog/backend/01-database/vendors/cosmosdb/" data-link-title="Azure Cosmos DB" data-link-desc="全球分散式 multi-model DB、5 個 consistency levels、Microsoft 自家 dogfood 證據">Cosmos DB vendor</a>（Azure Cosmos DB for PostgreSQL = managed Citus）</li>
<li>Official: <a href="https://docs.citusdata.com/">Citus Documentation</a> / <a href="https://github.com/citusdata/citus">Citus on GitHub</a></li>
</ul>
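]]></content:encoded></item><item><title>MongoDB Shard Key Selection: hashed vs ranged, sharding a single cluster vs splitting blast radius across multiple clusters</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/shard-key-selection/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/shard-key-selection/</guid><description>&lt;p>The MongoDB shard key is the hardest decision to walk back when a sharded cluster goes live. If the shard key is wrong, it could not be replaced before 5.0; from 5.0 on, &lt;code>reshardCollection&lt;/code> can change it, but that still means a long-running operation, extra disk, and a window in which writes are paused. The shard key is also not the only horizontal-scaling option in production: there is the "multiple clusters" path (as the Toyota Connected case shows), and the two solve entirely different problems. This article sets the three shard key properties (cardinality / frequency / monotonicity) side by side with "single cluster vs multiple clusters", and discusses both together with the cross-vendor discipline around partition key reversibility.&lt;/p>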
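&lt;p>This article does not repeat the sharding introduction already covered in the &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview&lt;/a>; it is an implementation-level guide to production design and failure recovery.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Pre-check: is MongoDB a fit?&lt;/strong> Before designing a shard key, confirm that the workload sits in MongoDB's sweet spot (document shape dominates / where the contract layer belongs / whether cross-cloud hedging is needed). See the &lt;a href="../schema-design-pattern/#%e5%95%8f%e9%a1%8c%e6%83%85%e5%a2%83document-%e8%87%aa%e7%94%b1%e7%9a%84%e5%be%8c%e5%ba%a7%e5%8a%9b">three-axis pre-check at the start of schema-design-pattern&lt;/a>; it is not repeated here. A sharded cluster is a capacity decision made &lt;em>after MongoDB has been chosen&lt;/em>, not a vendor-selection decision.&lt;/p>&lt;/blockquote>
&lt;h2 id="問題情境橫向擴展不是只有-sharded-cluster-一條路">Problem context: a sharded cluster is not the only way to scale horizontally&lt;/h2>
&lt;p>Typical trigger: a single replica set has hit its ceiling, writes have pushed the primary to 90% CPU or saturated disk IO, and the working set no longer fits in RAM. The reflex is to "add shards", but "split into clusters" is also on the table, and the two have entirely different triggers:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Sharding a single cluster&lt;/strong>: solves &lt;em>write saturation within one data domain&lt;/em>, where a collection has outgrown a single replica set&lt;/li>
&lt;li>&lt;strong>Splitting databases across multiple clusters&lt;/strong>: solves &lt;em>&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/blast-radius/" data-link-title="Blast Radius" data-link-desc="說明事故影響面如何估算與隔離">blast radius&lt;/a> / ownership / compliance boundaries&lt;/em>, which is not necessarily a throughput problem&lt;/li>
&lt;/ul>
&lt;p>Confusing the two has a cost. If throughput has not hit a wall but blast radius is the issue, forcing shards raises the cost of aggregation, transactions, and &lt;code>$lookup&lt;/code> by a notch while business ownership stays tangled. The reverse also hurts: if throughput has hit a wall but you split into clusters, there are no cross-cluster transactions, and a single collection spread over several clusters has to be stitched together in the application layer.&lt;/p>
&lt;p>Symptoms to look for:&lt;/p>
&lt;ul>
&lt;li>The ratio of &lt;code>targeted query / scatter-gather query&lt;/code> on &lt;code>mongos&lt;/code> is out of balance&lt;/li>
&lt;li>One shard's CPU is far above the others, and the balancer cannot move chunks as fast as writes arrive&lt;/li>
&lt;li>&lt;code>chunkMigrated&lt;/code> events are abnormally frequent, and &lt;code>sh.status()&lt;/code> shows a skewed chunk distribution&lt;/li>
&lt;li>Microservice ownership does not line up with collection boundaries, so a failure in one microservice hits other services&lt;/li>
&lt;/ul>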
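&lt;p>Case anchor: &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/toyota-connected-mongodb-telematics-iot/" data-link-title="9.C38 Toyota Connected：MongoDB Atlas 撐 900 萬車輛 telematics、月 180 億 transaction" data-link-desc="Toyota Connected 用 MongoDB Atlas 撐 Safety Connect 900 萬車、月 180 億 transaction、緊急訊號 3 秒內到 agent">9.C38 Toyota Connected&lt;/a> shows that "the 20 Atlas databases are a split along business boundaries, not a split for throughput" (the single-cluster vs multi-cluster contrast). Concrete incident details for hot shards in e-commerce flash sales, new game-server launches, or a large B2B customer monopolizing a chunk still need future cases; this article treats them as "common failure patterns" and does not invent incident numbers.&lt;/p></description><content:encoded><![CDATA[<p>The MongoDB shard key is the hardest decision to walk back when a sharded cluster goes live. If the shard key is wrong, it could not be replaced before 5.0; from 5.0 on, <code>reshardCollection</code> can change it, but that still means a long-running operation, extra disk, and a window in which writes are paused. The shard key is also not the only horizontal-scaling option in production: there is the "multiple clusters" path (as the Toyota Connected case shows), and the two solve entirely different problems. This article sets the three shard key properties (cardinality / frequency / monotonicity) side by side with "single cluster vs multiple clusters", and discusses both together with the cross-vendor discipline around partition key reversibility.</p>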
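<p>This article does not repeat the sharding introduction already covered in the <a href="/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview</a>; it is an implementation-level guide to production design and failure recovery.</p>
<blockquote>
<p><strong>Pre-check: is MongoDB a fit?</strong> Before designing a shard key, confirm that the workload sits in MongoDB's sweet spot (document shape dominates / where the contract layer belongs / whether cross-cloud hedging is needed). See the <a href="../schema-design-pattern/#%e5%95%8f%e9%a1%8c%e6%83%85%e5%a2%83document-%e8%87%aa%e7%94%b1%e7%9a%84%e5%be%8c%e5%ba%a7%e5%8a%9b">three-axis pre-check at the start of schema-design-pattern</a>; it is not repeated here. A sharded cluster is a capacity decision made <em>after MongoDB has been chosen</em>, not a vendor-selection decision.</p></blockquote>
<h2 id="問題情境橫向擴展不是只有-sharded-cluster-一條路">Problem context: a sharded cluster is not the only way to scale horizontally</h2>
<p>Typical trigger: a single replica set has hit its ceiling, writes have pushed the primary to 90% CPU or saturated disk IO, and the working set no longer fits in RAM. The reflex is to "add shards", but "split into clusters" is also on the table, and the two have entirely different triggers:</p>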
<ul>
<li><strong>Sharding a single cluster</strong>: solves <em>write saturation within one data domain</em>, where a collection has outgrown a single replica set</li>
<li><strong>Splitting databases across multiple clusters</strong>: solves <em><a href="/blog/backend/knowledge-cards/blast-radius/" data-link-title="Blast Radius" data-link-desc="說明事故影響面如何估算與隔離">blast radius</a> / ownership / compliance boundaries</em>, which is not necessarily a throughput problem</li>
</ul>
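<p>Confusing the two has a cost. If throughput has not hit a wall but blast radius is the issue, forcing shards raises the cost of aggregation, transactions, and <code>$lookup</code> by a notch while business ownership stays tangled. The reverse also hurts: if throughput has hit a wall but you split into clusters, there are no cross-cluster transactions, and a single collection spread over several clusters has to be stitched together in the application layer.</p>
<p>Symptoms to look for:</p>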
<ul>
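<li>The ratio of <code>targeted query / scatter-gather query</code> on <code>mongos</code> is out of balance</li>
<li>One shard's CPU is far above the others, and the balancer cannot move chunks as fast as writes arrive</li>
<li><code>chunkMigrated</code> events are abnormally frequent, and <code>sh.status()</code> shows a skewed chunk distribution</li>
<li>Microservice ownership does not line up with collection boundaries, so a failure in one microservice hits other services</li>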
</ul>
<p>Case anchor: <a href="/blog/backend/09-performance-capacity/cases/toyota-connected-mongodb-telematics-iot/" data-link-title="9.C38 Toyota Connected：MongoDB Atlas 撐 900 萬車輛 telematics、月 180 億 transaction" data-link-desc="Toyota Connected 用 MongoDB Atlas 撐 Safety Connect 900 萬車、月 180 億 transaction、緊急訊號 3 秒內到 agent">9.C38 Toyota Connected</a> shows that "the 20 Atlas databases are a split along business boundaries, not a split for throughput" (the single-cluster vs multi-cluster contrast). Concrete incident details for hot shards in e-commerce flash sales, new game-server launches, or a large B2B customer monopolizing a chunk still need future cases; this article treats them as "common failure patterns" and does not invent incident numbers.</p>
<h2 id="核心機制shard-keychunkbalancer">Core mechanics: shard key, chunk, balancer</h2>
<p>Three properties of the shard key determine how a sharded cluster behaves:</p>
<ul>
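<li><strong>Cardinality</strong>: the number of distinct shard key values. <code>status: &quot;active&quot; | &quot;inactive&quot;</code> has only two values, so cardinality = 2 and the data can never be split into more than two chunks</li>
<li><strong>Frequency</strong>: how evenly the values are distributed. For <code>country</code>, one or two countries often account for around 80% of global traffic</li>
<li><strong>Monotonicity</strong>: whether values increase monotonically. <code>_id</code> (ObjectId), timestamps, and auto-increment IDs are all monotonic</li>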
</ul>
<p>These properties play out differently for each shard key strategy:</p>
<ul>
<li><strong>Hashed shard key</strong>: a hash function scatters the keys so writes are evenly distributed, but range queries become scatter-gather (every shard is queried)</li>
<li><strong>Ranged shard key</strong>: adjacent key values land in the same chunk, so range queries are efficient; but a monotonic key with ranged sharding sends every write to the last chunk</li>
<li><strong>Compound shard key</strong> (a common practice on 5.0+, and the MongoDB counterpart of <a href="/blog/backend/knowledge-cards/composite-partition-key/" data-link-title="Composite Partition Key" data-link-desc="多欄位合成 partition key 把單一 logical hot key 拆成多個物理 shard、寫入分散讀取 fan-out">Composite Partition Key</a>): for example <code>{ tenantId: 1, _id: &quot;hashed&quot; }</code>, which isolates by tenant first and then hashes to avoid hotspots within a tenant</li>
<li><strong>Zone sharding</strong>: pins specific shard key ranges to specific shards (geography / compliance / hardware tiers)</li>
</ul>
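<p>The monotonic-key behavior above can be sketched with a small simulation. Everything here is a simplification: four shards with fixed range boundaries stand in for real chunk ranges, and FNV-1a stands in for MongoDB's hashed-index function (it is not the same function). The helper names are hypothetical.</p>

```javascript
// Toy simulation: 4 shards, inserts carry a monotonically increasing key.
// Assumptions: fixed range boundaries and an FNV-1a hash stand in for
// MongoDB's real chunk ranges and hashed-index function.
const SHARDS = 4;
const EXISTING_MAX_KEY = 1000000; // ranges were cut over keys that already exist

function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (const ch of String(text)) {
    hash ^= ch.codePointAt(0);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Ranged: contiguous key ranges map to shards; keys beyond the last
// boundary fall into the last (open-ended) range.
function rangedShard(key) {
  const width = EXISTING_MAX_KEY / SHARDS;
  return Math.min(SHARDS - 1, Math.floor(key / width));
}

// Hashed: the shard depends on the hash of the key, not on its order.
function hashedShard(key) {
  return fnv1a(key) % SHARDS;
}

function writesPerShard(route, keys) {
  const counts = new Array(SHARDS).fill(0);
  for (const key of keys) counts[route(key)] += 1;
  return counts;
}

// New inserts all have keys above every existing key (ObjectId / timestamp behavior).
const newKeys = Array.from({ length: 10000 }, (_, i) => EXISTING_MAX_KEY + i);

console.log(writesPerShard(rangedShard, newKeys)); // [ 0, 0, 0, 10000 ]
console.log(writesPerShard(hashedShard, newKeys)); // every shard receives a share of the writes
```

<p>With ranged routing all 10,000 new writes land on the last shard; with hashed routing each shard receives a share. The price, as noted above, is that a range scan over the same keys would have to visit every shard.</p>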
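<p>A chunk is a logical range that MongoDB carves out of a collection; the default size is 64MB before 6.0 and 128MB from 6.0 on. The balancer moves chunks between shards to even out the load. <strong>A chunk cannot be split</strong> when the shard key has only one value within its range (low cardinality, or a large tenant monopolizing the range): if the chunk cannot be split, the balancer cannot spread it out either.</p>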
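<p><code>reshardCollection</code> (5.0+): works through a temporary collection, re-cutting chunks, dual writes, and a cutover. Duration scales with data volume, and it needs roughly 1.2x extra disk. It means "a design mistake can still be repaired", but it is not a free lunch.</p>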
<p>Related knowledge cards: <a href="/blog/backend/knowledge-cards/database-sharding/" data-link-title="Database Sharding" data-link-desc="說明資料庫如何依 shard key 分散資料、路由請求與承擔跨 shard 查詢成本">database-sharding</a>, <a href="/blog/backend/knowledge-cards/hot-partition/" data-link-title="Hot Partition" data-link-desc="說明分散式 KV / OLTP 中、單一 partition 流量遠超其他的容量問題">hot-partition</a>, <a href="/blog/backend/knowledge-cards/partition/" data-link-title="Partition" data-link-desc="說明事件流如何切分成多個可並行處理的有序片段">partition</a>.</p>
<h3 id="單-cluster-切-shard-vs-多-cluster-切-blast-radius">Sharding a single cluster vs splitting blast radius across clusters</h3>
<p>A frame synthesized across cases (synthesized in this chapter; 9.C38 Toyota discloses the facts, but the case itself does not state this frame): a sharded cluster is not the only route to horizontal scaling; multiple clusters are another.</p>
<p>Facts disclosed by 9.C38 Toyota Connected:</p>
<ul>
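<li>18B transactions per month ÷ 30 days ÷ 86,400 seconds ≈ 7K txn/sec (basis: monthly rolling average, not instantaneous peak)</li>
<li>A single MongoDB cluster can comfortably sustain this throughput</li>
<li>Toyota's split into 20 Atlas databases is <strong>not a split for throughput</strong>; it is a split by <em>microservice ownership</em> and <em>blast radius</em></li>
<li>"Each microservice owns its own DB, and a failure in one DB does not affect other services"</li>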
</ul>
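<p>The conversion in the first bullet can be checked in a few lines. This is only the arithmetic from the text, assuming a 30-day month; the function name is hypothetical.</p>

```javascript
// Back-of-the-envelope conversion from a monthly transaction count to an
// average per-second rate. A 30-day month is assumed, as in the text.
function averageTxnPerSecond(txnPerMonth, daysPerMonth = 30) {
  const secondsPerMonth = daysPerMonth * 86400;
  return txnPerMonth / secondsPerMonth;
}

const rate = averageTxnPerSecond(18e9);
console.log(Math.round(rate)); // 6944, i.e. roughly 7K txn/sec
```

<p>Because this is an average over the whole month, it says nothing about peak load; peak-to-average ratios have to come from separate measurements.</p>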
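<p>The two paths have different decision criteria:</p>
<table>
  <thead>
      <tr>
          <th>Path</th>
          <th>Trigger</th>
          <th>Cost</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Sharded cluster (split into shards)</td>
          <td>Write saturation on a single collection, storage exceeding a single replica set, access patterns within one data domain</td>
          <td>Aggregation, transaction, and <code>$lookup</code> costs all go up a notch</td>
      </tr>
      <tr>
          <td>Multiple clusters (split into DBs)</td>
          <td>Microservice ownership boundaries, blast radius isolation, compliance boundaries, risk of different workload shapes sharing a cluster</td>
          <td>No cross-cluster transactions; cross-DB joins must be done in the application layer</td>
      </tr>
  </tbody>
</table>
<p>The two can be combined: each microservice gets its own cluster, and a cluster is still sharded internally when it needs to be. When writing design documents, avoid leaving readers with the impression that "a sharded cluster is the only horizontal-scaling option".</p>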
<h3 id="partition-key-可逆性跨-vendor-對照">Partition key reversibility across vendors</h3>
<blockquote>
<p><strong>SSoT for cross-vendor reversibility</strong>: MongoDB, DynamoDB, and Cosmos DB do not sit at the same point on the reversibility spectrum. The single source of truth for the cross-vendor comparison lives in <a href="/blog/backend/01-database/vendors/db3-vendor-selection/#%e4%b8%89-vendor-%e5%b0%8d%e6%af%94-10-%e8%bb%b8" data-link-title="DB3 Vendor Selection：document / KV / multi-model 三方選型 &#43; workload shape 前置判讀" data-link-desc="MongoDB / DynamoDB / Cosmos DB 三家 NoSQL 選型 entry point：workload shape × access pattern × consistency 三軸前置判讀、migration path 三型、federated DB 視角、三 vendor 對比 10 軸">DB3 entry: three-vendor comparison on 10 axes</a> plus the corresponding <a href="/blog/backend/01-database/vendors/db3-vendor-selection/#%e8%bb%b8%e7%9a%84%e5%bb%b6%e4%bc%b8%e5%ad%90%e6%ae%b5" data-link-title="DB3 Vendor Selection：document / KV / multi-model 三方選型 &#43; workload shape 前置判讀" data-link-desc="MongoDB / DynamoDB / Cosmos DB 三家 NoSQL 選型 entry point：workload shape × access pattern × consistency 三軸前置判讀、migration path 三型、federated DB 視角、三 vendor 對比 10 軸">per-axis extension subsections</a>. This section focuses on how <code>reshardCollection</code> in MongoDB 5.0+ affects shard key design and does not repeat the full three-vendor comparison.</p></blockquote>
<p>Vendors differ widely in how reversible a partition key is:</p>
<table>
  <thead>
      <tr>
          <th>Vendor</th>
          <th>Mechanism</th>
          <th>Reversibility</th>
          <th>Cost</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>MongoDB</td>
          <td>Shard key (<code>shardCollection</code>)</td>
          <td>Changeable with <code>reshardCollection</code> on 5.0+; before 5.0 the key cannot be replaced (4.4 only allows adding suffix fields via <code>refineCollectionShardKey</code>)</td>
          <td>Scales with data volume, ~1.2x disk, dual writes + cutover</td>
      </tr>
      <tr>
          <td>DynamoDB</td>
          <td>Partition key</td>
          <td>Not changeable in place; changed by backfilling into a new table</td>
          <td>Redesigning access patterns, plus traffic cutover cost</td>
      </tr>
      <tr>
          <td>Cosmos DB</td>
          <td>Partition key</td>
          <td>Not changeable (requires export-recreate-import)</td>
          <td>Full reload and dual-write verification; the highest migration cost</td>
      </tr>
  </tbody>
</table>
<p>Design documents must state the vendor and version, so that readers neither assume "the partition key can never be changed" for all three vendors nor treat <code>reshardCollection</code> in MongoDB 5.0+ as a "cheap migration".</p>
<h2 id="操作流程">Procedure</h2>
<p><strong>Step 1: choose the horizontal-scaling path</strong>. First ask whether the problem is <em>write saturation within one data domain</em> or <em>blast radius / ownership</em>, then choose between sharding and splitting clusters. If both apply, settle the cluster boundaries first and then shard inside each cluster.</p>
<p><strong>Step 2: access pattern audit</strong>. List every read and write query, and mark which must hit a single shard (targeted) and which can tolerate scatter-gather.</p>
<p><strong>Step 3: candidate key scorecard</strong>. Score each candidate on cardinality, frequency, and monotonicity:</p>
<table>
  <thead>
      <tr>
          <th>Candidate key</th>
          <th>Cardinality</th>
          <th>Frequency</th>
          <th>Monotonicity</th>
          <th>Suitable?</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>_id</code> (ObjectId)</td>
          <td>Very high</td>
          <td>Even</td>
          <td>Monotonic</td>
          <td>No (monotonic write hotspot)</td>
      </tr>
      <tr>
          <td><code>tenantId</code></td>
          <td>Medium</td>
          <td>Skewed</td>
          <td>Not monotonic</td>
          <td>Depends on tenant distribution</td>
      </tr>
      <tr>
          <td><code>{ tenantId: 1, _id: &quot;hashed&quot; }</code></td>
          <td>High</td>
          <td>Even</td>
          <td>Not monotonic</td>
          <td>Usually suitable</td>
      </tr>
      <tr>
          <td><code>country</code></td>
          <td>Very low (~200)</td>
          <td>Severely skewed</td>
          <td>Not monotonic</td>
          <td>No</td>
      </tr>
  </tbody>
</table>
<p><strong>Step 4: dry-run sampling</strong>. Sample the existing data and run <code>db.coll.aggregate([{$sample:{size:100000}}, {$group:{_id:&quot;$candidateKey&quot;, c:{$sum:1}}}, {$sort:{c:-1}}])</code> to inspect the distribution, and confirm that no single key value accounts for more than 20% of the traffic.</p>
<p><strong>Step 5: shardCollection</strong>.</p>
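<p>A minimal sketch of the check applied to the output of that sampling pipeline. It assumes the results have been collected into an array of <code>{ _id, c }</code> documents; <code>findDominantKeys</code> is a hypothetical helper and the sample numbers are made up. Note that a document sample measures share of stored documents, which is only a proxy for share of traffic.</p>

```javascript
// `buckets` mirrors the shape produced by the $group stage:
// one { _id: keyValue, c: count } document per distinct key value.
// The 20% threshold comes from the text; the helper name is hypothetical.
function findDominantKeys(buckets, maxShare = 0.2) {
  const total = buckets.reduce((sum, b) => sum + b.c, 0);
  if (total === 0) return [];
  return buckets
    .map((b) => ({ key: b._id, share: b.c / total }))
    .filter((b) => b.share > maxShare);
}

const sample = [
  { _id: "tenant-a", c: 45000 },
  { _id: "tenant-b", c: 30000 },
  { _id: "tenant-c", c: 15000 },
  { _id: "tenant-d", c: 10000 },
];

console.log(findDominantKeys(sample));
// tenant-a (45%) and tenant-b (30%) exceed the 20% threshold
```

<p>An empty result means no single value dominates the sample; any entry returned is a candidate hot key that argues for adding a second, higher-cardinality field to the shard key.</p>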





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">sh</span><span class="p">.</span><span class="nx">enableSharding</span><span class="p">(</span><span class="s2">&#34;shop&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nx">sh</span><span class="p">.</span><span class="nx">shardCollection</span><span class="p">(</span><span class="s2">&#34;shop.orders&#34;</span><span class="p">,</span> <span class="p">{</span> <span class="nx">tenantId</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">_id</span><span class="o">:</span> <span class="s2">&#34;hashed&#34;</span> <span class="p">})</span></span></span></code></pre></div><p>Replay traffic in staging first, and confirm that chunks are evenly distributed and the targeted query ratio is above 90%.</p>
<p><strong>Step 6: monitoring</strong>.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">sh</span><span class="p">.</span><span class="nx">status</span><span class="p">()</span>                              <span class="c1">// overall cluster state
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="nx">db</span><span class="p">.</span><span class="nx">orders</span><span class="p">.</span><span class="nx">getShardDistribution</span><span class="p">()</span>         <span class="c1">// data and chunk distribution per shard
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"></span><span class="nx">db</span><span class="p">.</span><span class="nx">adminCommand</span><span class="p">({</span> <span class="nx">balancerStatus</span><span class="o">:</span> <span class="mi">1</span> <span class="p">})</span>   <span class="c1">// balancer state
</span></span></span></code></pre></div><p><strong>Step 7: if the wrong key is already in production</strong>. Weigh <code>reshardCollection</code> (5.0+) against an application-level dual-write migration:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">db</span><span class="p">.</span><span class="nx">adminCommand</span><span class="p">({</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="nx">reshardCollection</span><span class="o">:</span> <span class="s2">&#34;shop.orders&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="nx">key</span><span class="o">:</span> <span class="p">{</span> <span class="nx">tenantId</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">region</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">_id</span><span class="o">:</span> <span class="s2">&#34;hashed&#34;</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="p">})</span></span></span></code></pre></div><p>Once <code>reshardCollection</code> enters cutover it cannot be rolled back, so estimate duration, disk, and IO impact with a dry run before running it in production.</p>
<p>Validation points: targeted query ratio &gt; 90%, coefficient of variation of per-shard QPS &lt; 20%, and balancer migration rate keeping up with the write rate.</p>
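<p>The first two validation points can be expressed as a small gate. This is a sketch under assumptions: the counters are taken to be already collected from monitoring, the function and field names are hypothetical, and the input numbers are illustrative. Only the 90% and 20% thresholds come from the text.</p>

```javascript
// Coefficient of variation = population standard deviation / mean.
function coefficientOfVariation(values) {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  if (mean === 0) return 0;
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance) / mean;
}

// Thresholds are the ones quoted in the text; the input shape is hypothetical.
function checkShardHealth({ targetedQueries, scatterGatherQueries, qpsPerShard }) {
  const targetedRatio = targetedQueries / (targetedQueries + scatterGatherQueries);
  const qpsCv = coefficientOfVariation(qpsPerShard);
  return {
    targetedRatio,
    qpsCv,
    pass: targetedRatio > 0.9 && qpsCv < 0.2,
  };
}

console.log(checkShardHealth({
  targetedQueries: 9500,
  scatterGatherQueries: 500,
  qpsPerShard: [1000, 1100, 950, 1050],
}));
// targetedRatio is 0.95 and qpsCv is about 0.05, so pass is true
```

<p>Feeding the same gate a skewed distribution such as <code>[4000, 100, 100, 100]</code> fails on the coefficient of variation even though the targeted ratio is unchanged, which is why both checks are needed.</p>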
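<p>Rollback boundary: <code>shardCollection</code> is irreversible (before 5.0 the key cannot be replaced at all; on 5.0+ it can be changed with <code>reshardCollection</code>, at the cost of redoing the data layout); <code>reshardCollection</code> cannot be rolled back once it enters cutover.</p>
<h2 id="失敗模式">Failure modes</h2>
<p><strong>Monotonic key write hotspot</strong>: using <code>_id</code> (ObjectId), a timestamp, or an auto-increment ID as a ranged shard key sends every write to the last chunk, so scale-out gains are zero. The fix is a hashed key, or a compound key that breaks up the monotonic axis.</p>
<p><strong>Low-cardinality key</strong>: with <code>country</code> as the shard key and one country carrying 80% of the traffic, the chunk cannot be split any further and that shard stays hot permanently. The fix is to add a high-cardinality axis (compound key) so chunks can keep splitting.</p>
<p><strong>Tenant skew</strong>: in B2B workloads a large customer monopolizes a chunk, that tenant's chunk keeps growing, and the balancer cannot move it. The fix is a compound key such as <code>{ tenantId: 1, _id: &quot;hashed&quot; }</code>: tenants stay isolated, but data within a tenant is spread by hash.</p>
<p><strong>Too much scatter-gather</strong>: the key is hashed <code>_id</code> but most business queries are range queries on <code>tenantId</code>, so every query hits every shard and p99 degrades linearly with the number of shards. The fix is a compound key with the most-queried axis first, so targeted queries can land on a single shard.</p>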
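<p><strong>Resharding stuck in the build phase</strong>: not enough disk (1.2x the source size is required), IO saturation hurting the online workload, and a run that was expected to take 4 hours taking 14. The fix is to expand disk first, measure the real duration with a dry run in staging, and start the production run during off-peak hours.</p>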
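<p>A minimal pre-flight sketch for the disk requirement above. It assumes the 1.2x factor quoted in the text and treats the cluster as a single pool for simplicity; in practice free space has to be checked on each shard that will receive data. The function and field names are hypothetical.</p>

```javascript
// Pre-flight check for the ~1.2x disk requirement mentioned in the text.
// Sizes are in GB; the function and field names are hypothetical.
function reshardingDiskCheck(collectionSizeGb, freeDiskGb, factor = 1.2) {
  const requiredGb = collectionSizeGb * factor;
  return {
    requiredGb,
    shortfallGb: Math.max(0, requiredGb - freeDiskGb),
    ok: freeDiskGb >= requiredGb,
  };
}

// A 500 GB collection with 450 GB free does not have enough headroom.
console.log(reshardingDiskCheck(500, 450));
// A 500 GB collection with 700 GB free does.
console.log(reshardingDiskCheck(500, 700));
```

<p>The check only covers disk. Duration and IO impact still have to be measured with a staging run, as the paragraph above recommends.</p>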
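<p><strong>Conflicting zone sharding rules</strong>: compliance rules (data must stay in a given region) conflict with load-balancing rules, the balancer cannot move chunks, and hotspots become permanent. The fix is to settle zone rules against balancer behavior at design time rather than adding zones after the fact.</p>
<p><strong>Solving a multi-cluster problem with shards</strong>: a blast radius problem is pushed into a sharded cluster, and a failure of that single cluster still takes down every microservice. What should be split into clusters should be split into clusters, not packed into shards. 9.C38 Toyota shows that the trigger for splitting into 20 DBs at 7K txn/sec was microservice ownership, not throughput.</p>
<p><strong>Overly optimistic cluster expansion estimates</strong>: expanding a MongoDB cluster is a planning-horizon concern measured in hours to days, not a few clicks in a console. 9.C36 Coinbase reports that cluster expansion took 70 minutes (basis: Coinbase's specific cluster tier, data volume, and Atlas API conditions, measured from the start of reactive scaling to completion; not a general MongoDB commitment). Predictable traffic must go through predictive or scheduled scaling; dynamic horizontal expansion of a sharded cluster alone cannot absorb a surge (see <a href="../connection-management-and-cache-layer/">connection management and cache layer</a>).</p>
<p>Anti-recommendation:</p>
<ul>
<li>With writes &lt; 5K WPS, storage &lt; 1TB, and a single replica set still coping, do not shard; once sharded, the cost of aggregation, transactions, <code>$lookup</code>, and indexes all go up a notch</li>
<li><strong>Shards vs multiple clusters</strong>: if throughput has not hit a wall but blast radius or ownership is the issue, go multi-cluster instead of forcing shards (the trigger behind 9.C38 Toyota's 20 DBs at 7K txn/sec)</li>
<li>Frame synthesized across cases: "not all data belongs in the same MongoDB cluster"; split by microservice ownership, blast radius, and compliance boundaries</li>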
<h2 id="容量與觀測">Capacity and observability</h2>
<p>Key metrics:</p>
<ul>
<li><strong>Shard distribution health</strong>: coefficient of variation of per-shard QPS / CPU / disk usage (&lt; 20% is reasonable)</li>
<li><strong>Query routing</strong>: ratio of targeted to scatter-gather queries (targeted &gt; 90% is reasonable)</li>
<li><strong>Balancer health</strong>: chunk migration rate, balancer round duration</li>
<li><strong>Cluster boundaries</strong>: cluster-to-cluster ownership boundaries, share of cross-cluster queries</li>
</ul>
<p>Mongo commands:</p>
<ul>
<li><code>sh.status()</code>: overall cluster state</li>
<li><code>db.coll.getShardDistribution()</code>: how a collection is distributed across shards</li>
<li><code>db.adminCommand({balancerStatus:1})</code>: balancer state</li>
<li><code>db.serverStatus().sharding</code>: sharding metrics</li>
</ul>
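<p>Per-query routing through <code>mongos</code>: run the query with <code>explain(&quot;executionStats&quot;)</code> and inspect <code>executionStats.executionStages.shards[]</code> to see whether it was served by a single shard.</p>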
<p>Back to <a href="/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">4.20 observability evidence</a>: treat shard distribution, targeted ratio, and resharding progress as the three-piece evidence set.</p>
<p>Back to <a href="/blog/backend/09-performance-capacity/saturation-discovery/" data-link-title="9.4 Saturation Discovery" data-link-desc="找出 throughput plateau 與 latency knee 的方法">9.4 saturation discovery</a>: a hot shard is a textbook example of partition-level saturation.</p>
<p>Back to <a href="/blog/backend/09-performance-capacity/bottleneck-localization/" data-link-title="9.5 瓶頸定位流程" data-link-desc="從 app 到 DB / cache / broker / 第三方 quota 的逐層瓶頸定位">9.5 bottleneck localization</a>: the cluster as a whole may appear to use only 25% CPU while one of its four shards is actually at 100%.</p>
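<p>The 25% vs 100% effect is easy to reproduce: a cluster-wide average hides the hot shard, while the per-shard maximum exposes it. The numbers below are illustrative and <code>summarizeCpu</code> is a hypothetical helper.</p>

```javascript
// Per-shard CPU utilisation in percent; the numbers are illustrative only.
function summarizeCpu(cpuPerShard) {
  const mean = cpuPerShard.reduce((s, v) => s + v, 0) / cpuPerShard.length;
  const max = Math.max(...cpuPerShard);
  return { mean, max, hotShardIndex: cpuPerShard.indexOf(max) };
}

console.log(summarizeCpu([100, 0, 0, 0]));
// { mean: 25, max: 100, hotShardIndex: 0 }
```

<p>Dashboards and alerts for sharded clusters are therefore better keyed on the per-shard maximum (or the coefficient of variation) than on the cluster mean.</p>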
<h2 id="邊界與整合">Boundaries and integration</h2>
<p>Sibling deep articles:</p>
<ul>
<li><a href="../schema-design-pattern/">schema design pattern</a>: document shape determines the space of viable shard keys</li>
<li><a href="../aggregation-pipeline-optimization/">aggregation pipeline optimization</a>: <code>$out</code> / <code>$merge</code> restrictions for cross-shard aggregation</li>
<li><a href="../change-streams-kafka/">change streams + Kafka</a>: how cluster-wide and collection-level change streams differ on a sharded cluster</li>
<li><a href="../connection-management-and-cache-layer/">connection management and cache layer</a>: cluster expansion time is measured in hours to days, so it has to be paired with predictive scaling and a proxy layer</li>
</ul>
<p>Migration playbook:</p>
<ul>
<li>To avoid self-managed sharding, follow <a href="/blog/backend/01-database/vendors/mongodb/migrate-to-atlas/" data-link-title="MongoDB → Atlas：Atlas 不是 MongoDB &#43; managed、是另一個 product" data-link-desc="Atlas 號稱「MongoDB managed」但 operational model 完全不同（auto-scaling / VPC peering / IAM-driven access / 內建 backup / billing 模型）；本文採用 Type C operational redesign hybrid 結構、4-phase operational migration &#43; drop-in cutover、5 個 production 踩雷（連線數限制 / IP whitelist / backup retention / IAM token 過期 / billing 暴漲）">→ Atlas</a> and use the managed shard tier</li>
<li>For a full re-partitioning, follow <a href="/blog/backend/01-database/vendors/mongodb/shard-expansion-multi-dc/" data-link-title="MongoDB Shard Expansion &#43; Multi-DC：Type F「不需要 parallel run」的 multi-region 例外" data-link-desc="MongoDB sharded cluster 加 shard &#43; 跨 DC expansion 是 Type F「topology re-layout」第 3 個 dogfood — 同時改 sharding &#43; replication topology &#43; region distribution；驗證 [#128](/report/data-topology-as-audit-dimension/) self-aware limitation 第 3 點「Type F 不需要 parallel run」claim 的例外（multi-region rollout 必須 parallel run &#43; 切流量）；涵蓋 chunk migration / replica set add member / cross-DC routing">shard expansion + multi-DC</a></li>
</ul>
<p>Cross-references with 1.x: <a href="/blog/backend/01-database/kv-document-capacity-planning/" data-link-title="1.10 KV / Document DB 容量規劃" data-link-desc="DynamoDB / Cosmos DB / Bigtable / MongoDB 等 KV / Document DB 的容量設計、partition key 取捨、capacity mode 選擇">1.10 KV / Document DB capacity planning</a> treats the shard key as a capacity decision; <a href="/blog/backend/01-database/large-scale-db-migration/" data-link-title="1.12 大規模 DB 遷移實戰" data-link-desc="跨 DB 遷移的 dual-write、[shadow read](/backend/knowledge-cards/shadow-read/)、cutover、rollback 流程 — 從實戰案例提煉的工程做法">1.12 large-scale DB migration in practice</a> collects retrospectives on failed resharding.</p>
<p>Cross-vendor comparison: <a href="/blog/backend/01-database/vendors/dynamodb/" data-link-title="DynamoDB" data-link-desc="AWS managed key-value、partition-based scaling、9000 萬 RPS sustained 實戰證據">DynamoDB vendor page</a> (partition key + adaptive capacity; the key can be changed through a backfill), <a href="/blog/backend/01-database/vendors/cosmosdb/" data-link-title="Azure Cosmos DB" data-link-desc="全球分散式 multi-model DB、5 個 consistency levels、Microsoft 自家 dogfood 證據">Cosmos DB vendor page</a> (partition key cannot be changed).</p>
<h2 id="相關連結">Related Links</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview</a>: this article is the in-depth expansion of the "shard key selection" backlog item at the bottom of that page</li>
<li><a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">Methodology for vendor deep-dive technical articles</a></li>
<li><a href="/blog/backend/09-performance-capacity/cases/toyota-connected-mongodb-telematics-iot/" data-link-title="9.C38 Toyota Connected：MongoDB Atlas 撐 900 萬車輛 telematics、月 180 億 transaction" data-link-desc="Toyota Connected 用 MongoDB Atlas 撐 Safety Connect 900 萬車、月 180 億 transaction、緊急訊號 3 秒內到 agent">9.C38 Toyota Connected</a>: 20 Atlas databases used to limit the blast radius</li>
<li><a href="/blog/backend/09-performance-capacity/cases/coinbase-mongodb-document-platform/" data-link-title="9.C36 Coinbase：MongoDB 撐 Ruby 單體 &#43; 1.5M reads/sec identity 服務" data-link-desc="Coinbase 以 MongoDB 為主資料層、自建 mongobetween connection proxy、users 服務在加密貨幣 surge 時撐 1.5M reads/sec">9.C36 Coinbase</a>: the 70-minute cluster scale-out figure, which is specific to that environment</li>
<li>Official docs: <a href="https://www.mongodb.com/docs/manual/sharding/">MongoDB Sharding</a>, <a href="https://www.mongodb.com/docs/manual/core/sharding-shard-key/">Choosing a Shard Key</a>, <a href="https://www.mongodb.com/docs/manual/core/sharding-reshard-a-collection/">Resharding</a></li>
</ul>
]]></content:encoded></item></channel></rss>