<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Distributed on Tarragon</title><link>https://tarrragon.github.io/blog/tags/distributed/</link><description>Recent content in Distributed on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/distributed/index.xml" rel="self" type="application/rss+xml"/><item><title>CockroachDB</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/cockroachdb/</link><pubDate>Wed, 13 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/cockroachdb/</guid><description>&lt;p>CockroachDB is a distributed SQL database that is compatible with the PostgreSQL wire protocol and strongly consistent across regions. Its design is close to Spanner's (linearizability, cross-region &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/quorum/" data-link-title="Quorum" data-link-desc="分散式系統以多數節點同意作為提交或讀取有效性的門檻">quorum&lt;/a>), but it uses HLC + Raft instead of TrueTime hardware, which makes it an open-source, cross-cloud option for global OLTP.&lt;/p>
&lt;h2 id="教學路線distributed-sql-與跨雲一致性">Learning path: Distributed SQL and cross-cloud consistency&lt;/h2>
&lt;p>This CockroachDB service page aims to explain what sits behind the PostgreSQL-like interface: range sharding, Raft replication, serializable transactions, leaseholders, and region placement. After reading it, you should be able to judge when distributed SQL can replace self-managed sharding and when it pushes latency and retry pressure back to the application layer.&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Stage&lt;/th>
 &lt;th>Core question&lt;/th>
 &lt;th>Related sections&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Distributed SQL&lt;/td>
 &lt;td>How the SQL interface hides range sharding and Raft replication&lt;/td>
 &lt;td>Positioning, Capacity characteristics&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Serializable default&lt;/td>
 &lt;td>How transaction retries, contention, and latency shape application design&lt;/td>
 &lt;td>Capacity planning essentials, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/isolation-level/" data-link-title="Isolation Level" data-link-desc="說明資料庫交易隔離級別如何影響並發讀寫結果">Isolation Level&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Region placement&lt;/td>
 &lt;td>How multi-region tables, leaseholders, and survival goals serve product requirements&lt;/td>
 &lt;td>Suitable scenarios, Trade-offs versus other vendors&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Migration pressure&lt;/td>
 &lt;td>What differences to check when coming from PostgreSQL / MySQL or self-managed sharding&lt;/td>
 &lt;td>Planned implementation topics, Case comparison&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Alternative routes&lt;/td>
 &lt;td>When to stay on PostgreSQL, or use Spanner, Aurora DSQL, or application sharding&lt;/td>
 &lt;td>Unsuitable scenarios, Next steps&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="定位spanner-的開源--跨雲替代">Positioning: an open-source, cross-cloud alternative to Spanner&lt;/h2>
&lt;p>CockroachDB and Spanner solve the same problem (cross-region strongly consistent SQL) but are positioned differently:&lt;/p>
&lt;ul>
&lt;li>Spanner: GCP managed service; uses TrueTime hardware&lt;/li>
&lt;li>CockroachDB: open source (dual-licensed); self-managed or Cockroach Cloud; runs across AWS / GCP / Azure / on-prem; uses HLC + Raft&lt;/li>
&lt;/ul>
&lt;p>The core reasons to choose CockroachDB: you need cross-region strongly consistent SQL and want to avoid cloud-provider lock-in, either by self-managing or by deploying across clouds.&lt;/p>
&lt;p>See the CockroachDB section of &lt;a href="https://tarrragon.github.io/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 Global Distributed OLTP&lt;/a> for details.&lt;/p>
&lt;h2 id="容量特性">Capacity characteristics&lt;/h2>
&lt;p>&lt;strong>The node is the unit of capacity&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Same design as Spanner: node count determines capacity&lt;/li>
&lt;li>Each node handles query + storage + replication&lt;/li>
&lt;li>Linear scaling in theory; in practice it depends on the query pattern&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Cross-region configuration&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>multi-region survival goal (zone-level / region-level)&lt;/li>
&lt;li>A cross-region quorum is required, and it determines latency&lt;/li>
&lt;li>Same physical limits as Spanner (100ms+ across continents)&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Replication&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Raft consensus per range&lt;/li>
&lt;li>3 replicas by default&lt;/li>
&lt;li>The replica count per region is configurable (Survival Goals)&lt;/li>
&lt;/ul>
&lt;h2 id="適用場景">Suitable scenarios&lt;/h2>
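&lt;p>&lt;strong>1. Cross-region strongly consistent SQL plus cross-cloud deployment&lt;/strong>:&lt;/p></description><content:encoded><![CDATA[<p>CockroachDB is a distributed SQL database that is compatible with the PostgreSQL wire protocol and strongly consistent across regions. Its design is close to Spanner's (linearizability, cross-region <a href="/blog/backend/knowledge-cards/quorum/" data-link-title="Quorum" data-link-desc="分散式系統以多數節點同意作為提交或讀取有效性的門檻">quorum</a>), but it uses HLC + Raft instead of TrueTime hardware, which makes it an open-source, cross-cloud option for global OLTP.</p>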
<h2 id="教學路線distributed-sql-與跨雲一致性">Learning path: Distributed SQL and cross-cloud consistency</h2>
<p>This CockroachDB service page aims to explain what sits behind the PostgreSQL-like interface: range sharding, Raft replication, serializable transactions, leaseholders, and region placement. After reading it, you should be able to judge when distributed SQL can replace self-managed sharding and when it pushes latency and retry pressure back to the application layer.</p>
<table>
  <thead>
      <tr>
          <th>Stage</th>
          <th>Core question</th>
          <th>Related sections</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Distributed SQL</td>
          <td>How the SQL interface hides range sharding and Raft replication</td>
          <td>Positioning, Capacity characteristics</td>
      </tr>
      <tr>
          <td>Serializable default</td>
          <td>How transaction retries, contention, and latency shape application design</td>
          <td>Capacity planning essentials, <a href="/blog/backend/knowledge-cards/isolation-level/" data-link-title="Isolation Level" data-link-desc="說明資料庫交易隔離級別如何影響並發讀寫結果">Isolation Level</a></td>
      </tr>
      <tr>
          <td>Region placement</td>
          <td>How multi-region tables, leaseholders, and survival goals serve product requirements</td>
          <td>Suitable scenarios, Trade-offs versus other vendors</td>
      </tr>
      <tr>
          <td>Migration pressure</td>
          <td>What differences to check when coming from PostgreSQL / MySQL or self-managed sharding</td>
          <td>Planned implementation topics, Case comparison</td>
      </tr>
      <tr>
          <td>Alternative routes</td>
          <td>When to stay on PostgreSQL, or use Spanner, Aurora DSQL, or application sharding</td>
          <td>Unsuitable scenarios, Next steps</td>
      </tr>
  </tbody>
</table>
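<h2 id="定位spanner-的開源--跨雲替代">Positioning: an open-source, cross-cloud alternative to Spanner</h2>
<p>CockroachDB and Spanner solve the same problem (cross-region strongly consistent SQL) but are positioned differently:</p>
<ul>
<li>Spanner: GCP managed service; uses TrueTime hardware</li>
<li>CockroachDB: open source (dual-licensed); self-managed or Cockroach Cloud; runs across AWS / GCP / Azure / on-prem; uses HLC + Raft</li>
</ul>
<p>The core reasons to choose CockroachDB: you need cross-region strongly consistent SQL and want to avoid cloud-provider lock-in, either by self-managing or by deploying across clouds.</p>
<p>See the CockroachDB section of <a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 Global Distributed OLTP</a> for details.</p>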
<h2 id="容量特性">Capacity characteristics</h2>
<p><strong>The node is the unit of capacity</strong>:</p>
<ul>
<li>Same design as Spanner: node count determines capacity</li>
<li>Each node handles query + storage + replication</li>
<li>Linear scaling in theory; in practice it depends on the query pattern</li>
</ul>
<p><strong>Cross-region configuration</strong>:</p>
<ul>
<li>multi-region survival goal (zone-level / region-level)</li>
<li>A cross-region quorum is required, and it determines latency</li>
<li>Same physical limits as Spanner (100ms+ across continents)</li>
</ul>
<p><strong>Replication</strong>:</p>
<ul>
<li>Raft consensus per range</li>
<li>3 replicas by default</li>
<li>The replica count per region is configurable (Survival Goals)</li>
</ul>
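<p>As a minimal sketch of how a survival goal is expressed, the statements below use CockroachDB's multi-region SQL. The database name <code>app</code> and the region names are hypothetical; real region names must match the localities the nodes were started with, and region-level survival generally needs at least three regions.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Hypothetical database and region names; adjust to your cluster.
ALTER DATABASE app SET PRIMARY REGION "us-east1";
ALTER DATABASE app ADD REGION "us-west1";
ALTER DATABASE app ADD REGION "europe-west1";

-- Zone-level survival is the default; opt in to region-level survival.
ALTER DATABASE app SURVIVE REGION FAILURE;

-- Inspect the resulting configuration.
SHOW REGIONS FROM DATABASE app;
SHOW SURVIVAL GOAL FROM DATABASE app;
</code></pre></div>
<p>Moving from zone-level to region-level survival spreads voting replicas over more regions, so writes pay for a cross-region quorum. That is the latency cost described in the list above.</p>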
<h2 id="適用場景">Suitable scenarios</h2>
<p><strong>1. Cross-region strongly consistent SQL plus cross-cloud deployment</strong>:</p>
<ul>
<li>multi-region active-active write</li>
<li>GCP-only (Spanner) or AWS-only (Aurora DSQL) does not fit the deployment strategy</li>
<li>Corresponds to the selection decision in <a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 Global Distributed OLTP</a></li>
</ul>
<p><strong>2. PostgreSQL wire protocol compatibility path</strong>:</p>
<ul>
<li>An existing PostgreSQL application wants to move to a distributed database</li>
<li>Application-layer changes are small (the PostgreSQL driver / ORM is kept)</li>
<li>Note: validate PostgreSQL compatibility with real queries, extensions, and migration tests</li>
</ul>
<p><strong>3. Self-managed on-prem / hybrid</strong>:</p>
<ul>
<li>Finance and other regulated industries need on-prem</li>
<li>Spanner / Aurora DSQL are primarily cloud services</li>
<li>CockroachDB can be self-managed</li>
</ul>
<p><strong>4. Avoiding lock-in to a single vendor's global distributed database</strong>:</p>
<ul>
<li>Open source + cross-cloud gives high portability</li>
<li>But enterprise features are paid (CockroachDB Cloud or an Enterprise license)</li>
</ul>
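<h2 id="不適用場景">Unsuitable scenarios</h2>
<p><strong>1. single-region OLTP is enough</strong>:</p>
<ul>
<li>PostgreSQL / Aurora is already enough for roughly 90% of scenarios</li>
<li>CockroachDB carries distributed overhead (every write goes through Raft)</li>
<li>Alternatives: PostgreSQL, Aurora, MySQL</li>
</ul>
<p><strong>2. Extreme single-query throughput</strong>:</p>
<ul>
<li>CockroachDB writes carry Raft overhead, so single-node throughput is lower than PostgreSQL's</li>
<li>Aggregate throughput comes from scale-out; the latency of an individual query is higher</li>
</ul>
<p><strong>3. Low latency across continents (&lt; 50ms)</strong>:</p>
<ul>
<li>Same physical limits as Spanner</li>
<li>100ms+ for a cross-continent quorum is a physical cost</li>
</ul>
<p><strong>4. Small, highly budget-sensitive workloads</strong>:</p>
<ul>
<li>CockroachDB needs at least 3 nodes (Raft quorum)</li>
<li>More expensive than single-instance PostgreSQL</li>
</ul>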
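<p><strong>5. Need for advanced PostgreSQL features</strong>:</p>
<ul>
<li>Some PostgreSQL extensions or behaviors need workarounds</li>
<li>Some features, such as exclusion constraints, may be missing; check partial index behavior and similar details against the current compatibility documentation</li>
</ul>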
<h2 id="跟其他-vendor-的取捨">Trade-offs versus other vendors</h2>
<p><strong>vs Spanner (GCP)</strong>:</p>
<ul>
<li>CockroachDB: open source, cross-cloud, can be self-managed</li>
<li>Spanner: GCP-only, TrueTime hardware, proven at Google scale</li>
<li>Choose CockroachDB: cross-cloud / on-prem requirements</li>
<li>Choose Spanner: GCP ecosystem + managed operation + maturity proven at Google scale</li>
</ul>
<p><strong>vs Aurora DSQL (AWS, 2024)</strong>:</p>
<ul>
<li>CockroachDB: cross-cloud, longer production track record</li>
<li>Aurora DSQL: AWS-only, serverless, new (2024)</li>
<li>Choose CockroachDB: cross-cloud, want to avoid AWS lock-in</li>
<li>Choose Aurora DSQL: AWS ecosystem + already on PostgreSQL + want serverless</li>
</ul>
<p><strong>vs TiDB</strong>:</p>
<ul>
<li>CockroachDB: PostgreSQL wire, deep English-language / Western ecosystem</li>
<li>TiDB: MySQL wire, deep Asian ecosystem, HTAP (OLTP + OLAP in one database)</li>
<li>Choose CockroachDB: PostgreSQL applications, cross-cloud</li>
<li>Choose TiDB: MySQL applications, need for OLAP integration, Asian market</li>
</ul>
<p><strong>vs PostgreSQL (traditional)</strong>:</p>
<ul>
<li>CockroachDB: distributed, strongly consistent across regions</li>
<li>PostgreSQL: single-primary; cross-region means async replication</li>
<li>Choose CockroachDB: cross-region strong consistency is required</li>
<li>Choose PostgreSQL: single-region is enough (90% of scenarios)</li>
</ul>
<p><strong>vs Aurora (single-region scaling)</strong>:</p>
<ul>
<li>CockroachDB: multi-region strong consistency</li>
<li>Aurora: single-region scaling; cross-region means the async Global Database</li>
<li>Choose CockroachDB: multi-region write is required</li>
<li>Choose Aurora: single-region scaling + AWS ecosystem</li>
</ul>
<p><strong>vs MySQL + Vitess (self-managed distributed MySQL)</strong>:</p>
<ul>
<li>CockroachDB: PostgreSQL wire, transparent (range-based) sharding, built-in cross-region strong consistency</li>
<li>MySQL + Vitess: MySQL wire; keyspace + shard key are configured at the application layer; cross-region relies on the application + async replication</li>
<li>Choose CockroachDB: PostgreSQL application + transparent multi-region + want to avoid the Vitess operation burden</li>
<li>Choose MySQL + Vitess: MySQL application + DBAs available to run Vitess + already at YouTube / Slack scale</li>
</ul>
<h2 id="容量規劃要點">Capacity planning essentials</h2>
<p><strong>1. Node count + zone / region configuration</strong>:</p>
<ul>
<li>At least 3 nodes (Raft quorum)</li>
<li>multi-region usually means 9+ nodes (3 regions × 3 nodes per region)</li>
<li>The Survival Goals configuration determines how much failure each region can absorb</li>
</ul>
<p><strong>2. Range (CockroachDB's partition)</strong>:</p>
<ul>
<li>Same category as a DynamoDB partition or a Spanner split</li>
<li>CockroachDB automatically splits large ranges</li>
<li>The application mainly manages query locality, transaction retry, and region placement</li>
</ul>
<p><strong>3. Locality configuration</strong>:</p>
<ul>
<li>As with Spanner, you can specify the voting regions</li>
<li>Write locality affects cross-region latency</li>
</ul>
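<p>One place where write locality becomes concrete is the per-table locality setting. The sketch below assumes a database that already has regions configured; the table names are hypothetical and only illustrate the three localities.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Hypothetical tables in a multi-region database.

-- Each row is homed in the region named by its hidden crdb_region column.
ALTER TABLE users SET LOCALITY REGIONAL BY ROW;

-- The whole table is homed in one region (the default locality).
ALTER TABLE invoices SET LOCALITY REGIONAL BY TABLE IN PRIMARY REGION;

-- Read-mostly reference data: local reads in every region, slower writes.
ALTER TABLE currencies SET LOCALITY GLOBAL;
</code></pre></div>
<p>Writes to a regional table or row are fastest from its home region; requests from other regions pay the cross-region round trip.</p>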
<p><strong>4. Backup / restore</strong>:</p>
<ul>
<li>CockroachDB's native backup supports cluster-level snapshots</li>
<li>Incremental backups are supported</li>
<li>Note: an incremental backup chain can grow long, so take full backups periodically</li>
</ul>
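<p>A minimal sketch of keeping the incremental chain short, using CockroachDB's <code>BACKUP</code> and backup schedule statements. The bucket URI and schedule label are hypothetical, and <code>AUTH=implicit</code> assumes credentials come from the environment.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Hypothetical storage URI; replace with your own collection location.

-- Full cluster backup into a new subdirectory of the collection.
BACKUP INTO 's3://example-bucket/crdb-backups?AUTH=implicit';

-- Incremental backup appended to the most recent full backup.
BACKUP INTO LATEST IN 's3://example-bucket/crdb-backups?AUTH=implicit';

-- Scheduled variant: hourly incrementals, with a daily full backup
-- so that no chain grows beyond one day of incrementals.
CREATE SCHEDULE daily_full_hourly_incr
  FOR BACKUP INTO 's3://example-bucket/crdb-backups?AUTH=implicit'
  RECURRING '@hourly'
  FULL BACKUP '@daily'
  WITH SCHEDULE OPTIONS first_run = 'now';
</code></pre></div>
<p>The right full-backup cadence depends on the recovery time you need: a restore has to replay the full backup plus every incremental after it.</p>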
<p><strong>5. Self-managed vs Cockroach Cloud</strong>:</p>
<ul>
<li>Self-managed: needs an ops team; can run cross-cloud / on-prem</li>
<li>Cockroach Cloud: managed, cross-cloud (AWS / GCP / Azure); the serverless tier is worth considering</li>
</ul>
<h2 id="deep-article已完成">Deep articles (completed)</h2>
<p>This batch of deep articles covers CockroachDB's core production topics, from the consensus mechanism and multi-region configuration to choosing a managed offering:</p>
<table>
  <thead>
      <tr>
          <th>Topic</th>
          <th>Article</th>
          <th>Related production issue</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>HLC + per-range Raft, leaseholder, structure of write latency</td>
          <td><a href="hlc-raft-consensus/">hlc-raft-consensus</a></td>
          <td>DoorDash's signal of hitting the Aurora wall (1.636 M QPS); Netflix's 380+ artery of small DBs as the granularity of capacity planning</td>
      </tr>
      <tr>
          <td>Working backward from SURVIVE ZONE / REGION FAILURE; business SLOs determine the replica topology</td>
          <td><a href="survival-goals/">survival-goals</a></td>
          <td>Hard Rock working backward from RPO=0; Netflix Gaming's 48-node cluster across 4 regions, counterintuitively built "for survival, not latency"</td>
      </tr>
      <tr>
          <td>Serializable default, applications must wrap a retry loop, SAVEPOINT syntax</td>
          <td><a href="transaction-retry-pattern/">transaction-retry-pattern</a></td>
          <td>Reshaping the PG → CockroachDB application contract; 5 retry failure modes (a frame synthesized across cases)</td>
      </tr>
      <tr>
          <td>REGIONAL BY ROW / TABLE / GLOBAL; cross-state compliance + one logical cluster</td>
          <td><a href="locality-aware-schema/">locality-aware-schema</a></td>
          <td>Hard Rock's sportsbook across 8 states + AWS Outposts; the counterintuitive reading that Outposts is a compliance tool, not a latency tool</td>
      </tr>
      <tr>
          <td>Choosing among the three table localities, latency / consistency trade-offs, cost of reconfiguring after a wrong choice</td>
          <td><a href="multi-region-table-config/">multi-region-table-config</a></td>
          <td>Netflix's motivation for multi-region is survival, not latency; Hard Rock's row-level ownership + single logical cluster</td>
      </tr>
      <tr>
          <td>Cockroach Cloud serverless vs dedicated, RU billing, cold start / scale</td>
          <td><a href="cloud-serverless/">cloud-serverless</a></td>
          <td>Netflix needing a Platform Team, read in reverse, is the case for managed; Hard Rock's predictable seasonal scaling vs the bursty sweet spot of serverless</td>
      </tr>
      <tr>
          <td>Distributed SQL three-way decision tree: classifying wall-hitting signals + seven questions</td>
          <td><a href="aurora-dsql-spanner-decision-tree/">aurora-dsql-spanner-decision-tree</a></td>
          <td>DB4 cross-vendor entry: identifying the DoorDash / Netflix / Hard Rock driver paths + sizing barrier</td>
      </tr>
  </tbody>
</table>
<p>DB4 cross-vendor entry: start with <a href="aurora-dsql-spanner-decision-tree/">aurora-dsql-spanner-decision-tree</a> to identify the driver path, then go into the individual vendor deep dives.</p>
<p>How multi-region-table-config and locality-aware-schema divide the material: the former covers how to choose among the three table localities and what it costs to reconfigure after a wrong choice; the latter covers how the schema is designed around locality (compliance boundary, cross-state business logic, Outposts topology). The two are complementary, and survival-goals is the SSoT for the survival goal mechanism.</p>
<h2 id="後續擴充仍待補">Future additions (still pending)</h2>
<ul>
<li>PostgreSQL compatibility audit (list of partial index / extension / SQL behavior gaps)</li>
<li>Backup / restore and PITR operations (incremental chain management, restore drills)</li>
<li>Changefeed / CDC configuration (CockroachDB's native CDC to Kafka / sink)</li>
</ul>
<blockquote>
<p>"Migrating from PostgreSQL to CockroachDB (playbook)" is already covered by <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-cockroachdb/" data-link-title="PostgreSQL → CockroachDB：三維皆 High 的多重歸類 migration" data-link-desc="PostgreSQL → CockroachDB 是 Schema / Operational / Paradigm 三維皆 High 的 multi-axis migration、實證 [#127](/report/content-structure-by-max-diff-dimension/) 的「多重歸類跟 tie-breaking」規則；主結構走 Type E paradigm shift、Schema 差 &#43; Operational redesign 抽出獨立段；涵蓋 transaction model 重設計、SQL dialect gap、5 個 production 踩雷">PostgreSQL → CockroachDB migration</a> and is no longer listed as pending.</p></blockquote>
<h2 id="anti-recommendation-與升級路由">Anti-recommendations and upgrade routes</h2>
<p>CockroachDB's PostgreSQL-like interface lowers the barrier to adoption, but the costs of distributed SQL show up in transaction retries, range leases, multi-region latency, and operational topology. This section first covers when to stay on PostgreSQL / Aurora, then when to move to CockroachDB, Cockroach Cloud, Spanner, Aurora DSQL, or Vitess.</p>
<table>
  <thead>
      <tr>
          <th>Mechanism / route</th>
          <th>Conditions for keeping the simpler design</th>
          <th>Upgrade signal</th>
          <th>Main reference path</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>PostgreSQL / Aurora</td>
          <td>single-region primary, async DR, and read replicas already meet the need</td>
          <td>multi-region write, region failure survival, or cross-cloud deployment is a hard requirement</td>
          <td><a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor</a>、<a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora vendor</a></td>
      </tr>
      <tr>
          <td>CockroachDB single-region</td>
          <td>Horizontal scaling or future multi-region is needed, but the system currently runs in one region</td>
          <td>Raft overhead makes the cost higher than PostgreSQL, and there is no region requirement</td>
          <td><a href="/blog/backend/knowledge-cards/distributed-sql/" data-link-title="Distributed SQL" data-link-desc="把 SQL 與交易語意延伸到多節點與多區域的資料庫形態">Distributed SQL</a></td>
      </tr>
      <tr>
          <td>CockroachDB multi-region</td>
          <td>cross-cloud / on-prem, PostgreSQL wire, and strong consistency are the main requirements</td>
          <td>The cross-continent p99 target is too low; transaction retries affect the user flow</td>
          <td><a href="/blog/backend/knowledge-cards/quorum/" data-link-title="Quorum" data-link-desc="分散式系統以多數節點同意作為提交或讀取有效性的門檻">Quorum</a>、<a href="/blog/backend/knowledge-cards/latency-budget/" data-link-title="Latency Budget" data-link-desc="把 user-perceived latency 拆到每個 stage 的配額、反推架構選擇">Latency Budget</a></td>
      </tr>
      <tr>
          <td>Cockroach Cloud</td>
          <td>The team can still self-manage Raft, backup, upgrade, and node failure</td>
          <td>The team wants to transfer operations to the vendor</td>
          <td><a href="/blog/backend/knowledge-cards/rto/" data-link-title="RTO" data-link-desc="說明恢復時間目標如何約束事故回復策略">RTO</a>、<a href="/blog/backend/knowledge-cards/rpo/" data-link-title="RPO" data-link-desc="說明恢復點目標如何定義可接受資料損失範圍">RPO</a></td>
      </tr>
      <tr>
          <td>Spanner</td>
          <td>cross-cloud or self-managed deployment is a hard requirement</td>
          <td>GCP managed service, TrueTime maturity, and Google-scale evidence are the main appeal</td>
          <td><a href="/blog/backend/01-database/vendors/spanner/" data-link-title="Google Cloud Spanner" data-link-desc="全球分散式 strong-consistency OLTP、TrueTime API、線性擴展到 10 億 req/sec">Spanner vendor</a></td>
      </tr>
      <tr>
          <td>Aurora DSQL</td>
          <td>cross-cloud / on-prem is a hard requirement</td>
          <td>AWS-only, serverless, PostgreSQL compatibility, and the AWS operation model are the main appeal</td>
          <td><a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora-dsql/" data-link-title="PostgreSQL → Aurora DSQL Migration：PG wire-compatible Distributed SQL 的 Paradigm Shift" data-link-desc="Aurora DSQL（2024-12 re:Invent preview / 2025-05 GA）是 AWS 推的 PG wire-compatible *active-active distributed SQL*、跟 self-managed PG / Aurora PG 不同 paradigm（OCC &#43; snapshot isolation &#43; multi-region strong consistency）。Migration 結構是 *protocol drop-in &#43; paradigm shift*：app SQL 不太改、但 transaction retry / extension 缺位 / 多 region 一致性需重設計。本文走 DSQL vs Aurora PG vs self-managed PG 三軸對比、為什麼遷的三條 driver（global write / operational zero-touch / region resiliency）、Type E phased plan、5 production 踩雷（transaction retry 沒處理 / extension 缺位 / sequence throughput 限制 / Aurora PG 直升 DSQL 不可行 / region failover semantic）、跟 PG → Aurora 跟 PG → CockroachDB 對比">PG → Aurora DSQL Migration</a></td>
      </tr>
      <tr>
          <td>MySQL + Vitess</td>
          <td>PostgreSQL-like SQL and strong consistency are the main requirements</td>
          <td>The MySQL ecosystem, application sharding, and Vitess ops are already mature</td>
          <td><a href="/blog/backend/01-database/vendors/mysql/vitess-sharding/" data-link-title="MySQL Vitess Sharding：VTGate / VTTablet / VReplication / VSchema 四件套協作" data-link-desc="Vitess 不只是 MySQL sharding proxy、是 4 個 component 協作的完整 sharding 系統 — VTGate（query routing layer）、VTTablet（per-MySQL agent）、VReplication（跨 shard 資料移動）、VSchema（sharding metadata）。本文走 4 件套各自責任、keyspace / shard / tablet 架構、shard key 設計（Vindex）、配置 step-by-step、5 production 踩雷（cross-shard transaction / VStream lag / Vindex 不均勻 / resharding 切流 / VReplication 卡住）、跟自管 sharding 跟 PlanetScale 的對比">MySQL Vitess Sharding</a>、<a href="/blog/backend/knowledge-cards/database-sharding/" data-link-title="Database Sharding" data-link-desc="說明資料庫如何依 shard key 分散資料、路由請求與承擔跨 shard 查詢成本">Database Sharding</a></td>
      </tr>
  </tbody>
</table>
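<p>The simple path for CockroachDB is to first prove that the value of distributed SQL outweighs its retry and latency costs. If the workload is still single-region OLTP, PostgreSQL / Aurora usually costs less; CockroachDB becomes the primary candidate only when cross-region writes and consistency are a product commitment.</p>
<p>The upgrade path for transaction retries has to become part of the application contract. The serializable default protects consistency, but retries bring idempotency, timeouts, user-visible latency, and workflow compensation back to the application layer; take stock of these before starting the migration playbook.</p>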
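<p>To make the retry contract concrete, here is a minimal sketch of CockroachDB's client-side retry protocol at the SQL level. The <code>accounts</code> table and the amounts are hypothetical; in a real application a driver or helper library would usually run this loop, and the statements must be safe to re-run.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">BEGIN;
SAVEPOINT cockroach_restart;

-- Application statements (hypothetical table and values).
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- If any statement above, or the RELEASE below, fails with SQLSTATE 40001
-- (serialization failure), the client issues
--   ROLLBACK TO SAVEPOINT cockroach_restart;
-- and re-runs the statements inside the same transaction.
RELEASE SAVEPOINT cockroach_restart;
COMMIT;
</code></pre></div>
<p>Because the body may run more than once, side effects outside the database (emails, payment calls) belong after <code>COMMIT</code> or behind an idempotency key.</p>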
<h2 id="已知-limitation-與後續路由">Known limitations and follow-up routes</h2>
<p>The CockroachDB overview currently completes the distributed SQL assessment. The next round of deep articles / playbooks should add HLC + Raft, range / leaseholder, multi-region table locality, transaction retry pattern, PostgreSQL compatibility audit, Cockroach Cloud operation, and PostgreSQL → CockroachDB migration.</p>
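<h2 id="案例對照">Case comparison</h2>
<p>CockroachDB already has three direct case axes in the 09 case library (OLTP write scaling, polyglot gap-filling, compliance boundary); two comparison reference axes (Spanner's design philosophy, regulated finance) are kept as well.</p>
<h3 id="direct-casecockroachdb-為主角">Direct cases (CockroachDB as the protagonist)</h3>
<table>
  <thead>
      <tr>
          <th>Case</th>
          <th>Main engineering issue</th>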
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/backend/09-performance-capacity/cases/doordash-cockroachdb-orders-platform/" data-link-title="9.C39 DoorDash：Aurora Postgres 寫入瓶頸 → CockroachDB 多主寫入" data-link-desc="DoorDash 從 Aurora Postgres 遷到 CockroachDB、解 1.6 M QPS 單主寫入瓶頸、外送平台爆量壓力下重做 OLTP 拓樸">9.C39 DoorDash</a></td>
          <td>Aurora Postgres single-primary hits the wall at 1.6 M QPS → multi-primary to relieve writes</td>
      </tr>
      <tr>
          <td><a href="/blog/backend/09-performance-capacity/cases/netflix-cockroachdb-multi-region-fleet/" data-link-title="9.C40 Netflix：380&#43; CockroachDB cluster 的 multi-active 拓樸艦隊" data-link-desc="Netflix 把 Cassandra 不夠用的 transactional workload 移到 CockroachDB、380&#43; cluster / 60&#43; 跨 region、含 Open Connect、studio cloud drive、gaming control plane">9.C40 Netflix</a></td>
          <td>A fleet of 380+ clusters; fills in for transactional workloads that Cassandra could not handle</td>
      </tr>
      <tr>
          <td><a href="/blog/backend/09-performance-capacity/cases/hard-rock-digital-cockroachdb-sports-betting/" data-link-title="9.C41 Hard Rock Digital：CockroachDB on AWS Outposts、Wire Act 合規 &#43; 跨州單一邏輯 DB" data-link-desc="Hard Rock Digital 用 CockroachDB 跨 AWS Outposts &#43; US-East-1、Wire Act 強制資料留州、單一邏輯 DB 解多州 sportsbook、100 node 32 vCPU 撐 Super Bowl">9.C41 Hard Rock Digital</a></td>
          <td>AWS Outposts + a single logical DB across states; Wire Act compliance + seasonal scale-up and scale-down</td>
      </tr>
  </tbody>
</table>
<h3 id="對比參考案例">Comparison reference cases</h3>
<table>
  <thead>
      <tr>
          <th>Case (comparison reference)</th>
          <th>Relationship to CockroachDB</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/backend/09-performance-capacity/cases/spanner-planetary-scale-database-gcp/" data-link-title="9.C10 Cloud Spanner：每秒 10 億請求的全球一致性資料庫" data-link-desc="Google Cloud Spanner 內部峰值 10 億 req/sec、跨地區強一致 — 全球分散式 OLTP 容量參考">9.C10 Spanner</a></td>
          <td>Design philosophy benchmark; CockroachDB is the open-source counterpart</td>
      </tr>
      <tr>
          <td><a href="/blog/backend/09-performance-capacity/cases/standard-chartered-aurora-banking/" data-link-title="9.C14 Standard Chartered：受監管銀行的 Aurora 4000 TPS 容量提升" data-link-desc="Standard Chartered 銀行遷移到 Aurora 後吞吐量提升 10 倍至 4000 TPS、跨 7 個受監管市場">9.C14 Standard Chartered</a></td>
          <td>Regulated finance; CockroachDB can serve as an on-prem alternative candidate</td>
      </tr>
  </tbody>
</table>
<p>Read the CockroachDB direct cases as three axes: write scaling (DoorDash) → polyglot gap-filling (Netflix) → compliance boundary (Hard Rock Digital). The comparison cases serve as reminders: Spanner offers a mature reference for global consistency, and the regulated-finance cases show that deployment location, compliance boundaries, and self-management capability often decide the vendor together with consistency requirements.</p>
<h2 id="反向-sibling-路由">Reverse sibling routes</h2>
<p>CockroachDB's reverse sibling routes separate PostgreSQL compatibility from distributed SQL responsibilities. If you arrive from the PostgreSQL chapter, read <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-cockroachdb/" data-link-title="PostgreSQL → CockroachDB：三維皆 High 的多重歸類 migration" data-link-desc="PostgreSQL → CockroachDB 是 Schema / Operational / Paradigm 三維皆 High 的 multi-axis migration、實證 [#127](/report/content-structure-by-max-diff-dimension/) 的「多重歸類跟 tie-breaking」規則；主結構走 Type E paradigm shift、Schema 差 &#43; Operational redesign 抽出獨立段；涵蓋 transaction model 重設計、SQL dialect gap、5 個 production 踩雷">PostgreSQL → CockroachDB migration</a> first; if you only need managed SQL and storage autoscaling, go back to <a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora vendor</a>; if you want Google Cloud-native external consistency and a fully managed control plane, compare against <a href="/blog/backend/01-database/vendors/spanner/" data-link-title="Google Cloud Spanner" data-link-desc="全球分散式 strong-consistency OLTP、TrueTime API、線性擴展到 10 億 req/sec">Spanner vendor</a>.</p>
<p>The criterion for this route is whether the application can absorb the semantic differences of distributed transactions. A similar SQL dialect only lowers the migration entry cost; the real delivery risk lies in transaction retries, hot ranges, survival goals, backup and restore, and locality design.</p>
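<h2 id="常見陷阱">Common pitfalls</h2>
<ul>
<li><strong>Using CockroachDB for single-region</strong>: pays the distributed overhead for nothing; PostgreSQL is far cheaper</li>
<li><strong>Expecting low latency from cross-continent active-active</strong>: a physical limit; a cross-continent quorum takes 100ms+</li>
<li><strong>PostgreSQL extension assumptions</strong>: some extensions or SQL behaviors need workarounds, and the application must verify them</li>
<li><strong>Not planning Survival Goals</strong>: the default configuration may not meet RTO / RPO requirements</li>
<li><strong>Backup chain too long</strong>: incrementals without periodic full backups lengthen recovery time</li>
</ul>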
<h2 id="下一步路由">Next steps</h2>
<ul>
<li>Full T1 comparison: <a href="/blog/backend/01-database/vendors/" data-link-title="資料庫 Vendor 清單" data-link-desc="規劃 SQL、managed SQL、document、KV 與 distributed SQL 的服務頁撰寫順序與教學大綱">01-database vendors index</a></li>
<li>Parallel: <a href="/blog/backend/01-database/vendors/spanner/" data-link-title="Google Cloud Spanner" data-link-desc="全球分散式 strong-consistency OLTP、TrueTime API、線性擴展到 10 億 req/sec">Spanner vendor</a>, <a href="/blog/backend/01-database/vendors/aurora/" data-link-title="AWS Aurora" data-link-desc="AWS managed PostgreSQL / MySQL、storage / compute 分離、&#43;75% 效能改善的 production 證據">Aurora vendor</a>, <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor</a></li>
<li>Upstream: <a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 Global Distributed OLTP</a>, the full selection comparison</li>
<li>Cross-module: <a href="/blog/backend/09-performance-capacity/capacity-planning/" data-link-title="9.6 容量規劃模型" data-link-desc="peak forecast、headroom budget、growth curve、autoscaling sizing">9.6 Capacity Planning Model</a>, <a href="/blog/backend/09-performance-capacity/slo-performance-budget/" data-link-title="9.12 SLO 與 Performance Budget" data-link-desc="performance budget 跟 SLO / error budget 的對接">9.12 SLO and Performance Budget</a></li>
<li>Last reviewed: 2026-05-22 (PostgreSQL compatibility / survival goal / managed offering are time-sensitive claims)</li>
<li>Official: <a href="https://www.cockroachlabs.com/docs/">CockroachDB Documentation</a></li>
</ul>
]]></content:encoded></item><item><title>PostgreSQL Citus Distributed: Turning PG into a Sharded Cluster with an Extension</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/citus-distributed/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/citus-distributed/</guid><description>&lt;blockquote>
&lt;p>This article is an implementation-layer deep article for the &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL&lt;/a> overview. The overview already explains where PG sits in the OLTP landscape; this article focuses on the &lt;em>Citus distributed extension&lt;/em>, the way to turn PG into a sharded cluster.&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;p>When PG single-primary write throughput hits the single-machine limit (50K-100K writes per second, WPS), there are three options:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Application-layer sharding&lt;/strong>: the application manages shard routing itself&lt;/li>
&lt;li>&lt;strong>Citus&lt;/strong>: a PG extension with automatic routing + cross-shard queries&lt;/li>
&lt;li>&lt;strong>Distributed SQL&lt;/strong> (CockroachDB / Aurora DSQL / Spanner): a different engine&lt;/li>
&lt;/ol>
&lt;p>The core driver for choosing Citus: &lt;em>keeping PG SQL syntax + the extension ecosystem&lt;/em>. But "the application barely needs to change" is an optimistic claim. In practice the application has to be redesigned around the distribution column (adding filters to queries, confining transactions to a single shard, keeping reference tables small), and cross-shard query automation is weaker than in Vitess. The cost is &lt;em>coordinator / worker deployment complexity + cross-shard query limits + application schema rework&lt;/em>.&lt;/p>
&lt;p>Before reading, it helps to align on the shard key, routing, resharding, and cross-shard query semantics in &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/database-sharding/" data-link-title="Database Sharding" data-link-desc="說明資料庫如何依 shard key 分散資料、路由請求與承擔跨 shard 查詢成本">Database Sharding&lt;/a>; when capacity becomes imbalanced, continue to &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/hot-partition/" data-link-title="Hot Partition" data-link-desc="說明分散式 KV / OLTP 中、單一 partition 流量遠超其他的容量問題">Hot Partition&lt;/a>.&lt;/p>
&lt;p>The core difference from &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/vitess-sharding/" data-link-title="MySQL Vitess Sharding：VTGate / VTTablet / VReplication / VSchema 四件套協作" data-link-desc="Vitess 不只是 MySQL sharding proxy、是 4 個 component 協作的完整 sharding 系統 — VTGate（query routing layer）、VTTablet（per-MySQL agent）、VReplication（跨 shard 資料移動）、VSchema（sharding metadata）。本文走 4 件套各自責任、keyspace / shard / tablet 架構、shard key 設計（Vindex）、配置 step-by-step、5 production 踩雷（cross-shard transaction / VStream lag / Vindex 不均勻 / resharding 切流 / VReplication 卡住）、跟自管 sharding 跟 PlanetScale 的對比">MySQL Vitess sharding&lt;/a>: Citus is a &lt;em>PG extension&lt;/em> (it runs inside PG itself), whereas Vitess is a &lt;em>standalone proxy + tablet system&lt;/em> (wrapping MySQL). Citus uses native PG mechanisms (FDW / extension hooks); Vitess is an &lt;em>external wrapper&lt;/em>.&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is an implementation-layer deep article for the <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> overview. The overview already explains where PG sits in the OLTP landscape; this article focuses on the <em>Citus distributed extension</em>, the way to turn PG into a sharded cluster.</p></blockquote>
<hr>
<p>When PG single-primary write throughput hits the single-machine limit (50K-100K writes per second, WPS), there are three options:</p>
<ol>
<li><strong>Application-layer sharding</strong>: the application manages shard routing itself</li>
<li><strong>Citus</strong>: a PG extension with automatic routing + cross-shard queries</li>
<li><strong>Distributed SQL</strong> (CockroachDB / Aurora DSQL / Spanner): a different engine</li>
</ol>
<p>The core driver for choosing Citus: <em>keeping PG SQL syntax + the extension ecosystem</em>. But "the application barely needs to change" is an optimistic claim. In practice the application has to be redesigned around the distribution column (adding filters to queries, confining transactions to a single shard, keeping reference tables small), and cross-shard query automation is weaker than in Vitess. The cost is <em>coordinator / worker deployment complexity + cross-shard query limits + application schema rework</em>.</p>
<p>Before reading, it helps to align on the shard key, routing, resharding, and cross-shard query semantics in <a href="/blog/backend/knowledge-cards/database-sharding/" data-link-title="Database Sharding" data-link-desc="說明資料庫如何依 shard key 分散資料、路由請求與承擔跨 shard 查詢成本">Database Sharding</a>; when capacity becomes imbalanced, continue to <a href="/blog/backend/knowledge-cards/hot-partition/" data-link-title="Hot Partition" data-link-desc="說明分散式 KV / OLTP 中、單一 partition 流量遠超其他的容量問題">Hot Partition</a>.</p>
<p>The core difference from <a href="/blog/backend/01-database/vendors/mysql/vitess-sharding/" data-link-title="MySQL Vitess Sharding：VTGate / VTTablet / VReplication / VSchema 四件套協作" data-link-desc="Vitess 不只是 MySQL sharding proxy、是 4 個 component 協作的完整 sharding 系統 — VTGate（query routing layer）、VTTablet（per-MySQL agent）、VReplication（跨 shard 資料移動）、VSchema（sharding metadata）。本文走 4 件套各自責任、keyspace / shard / tablet 架構、shard key 設計（Vindex）、配置 step-by-step、5 production 踩雷（cross-shard transaction / VStream lag / Vindex 不均勻 / resharding 切流 / VReplication 卡住）、跟自管 sharding 跟 PlanetScale 的對比">MySQL Vitess sharding</a>: Citus is a <em>PG extension</em> (it runs inside PG itself), whereas Vitess is a <em>standalone proxy + tablet system</em> (wrapping MySQL). Citus uses native PG mechanisms (FDW / extension hooks); Vitess is an <em>external wrapper</em>.</p>
<h2 id="citus-架構coordinator--worker">Citus architecture: Coordinator + Worker</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">                ┌─────────────────┐
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">   Application  │   Coordinator   │  ← 對外 PG wire protocol、planner、routing
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">                │   (Citus + PG)  │
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">                └────┬─────┬──────┘
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">                     │     │
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">              ┌──────┘     └──────┐
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">              ▼                   ▼
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        ┌──────────┐         ┌──────────┐
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        │ Worker 1 │         │ Worker 2 │  ← each runs PG + the Citus extension
</span></span><span class="line"><span class="ln">10</span><span class="cl">        │  (PG)    │         │  (PG)    │
</span></span><span class="line"><span class="ln">11</span><span class="cl">        │ shard 1,3│         │ shard 2,4│
</span></span><span class="line"><span class="ln">12</span><span class="cl">        └──────────┘         └──────────┘</span></span></code></pre></div><p><strong>Coordinator</strong>：</p>
<ul>
<li>Looks like PG to the application (same port, same wire protocol)</li>
<li>Receives SQL; the Citus planner decomposes the query and routes it to the workers</li>
<li>Stores no distributed data (the shards of distributed tables live on the workers)</li>
<li>Stores <em>metadata</em> (which shard lives on which worker)</li>
</ul>
<p><strong>Worker</strong>:</p>
<ul>
<li>A standard PG instance plus the Citus extension</li>
<li>Each holds a number of shards</li>
<li>Receives queries from the coordinator, executes them locally, and returns the results</li>
</ul>
<p><strong>Shard</strong>:</p>
<ul>
<li>A distributed table is split into N shards (32 by default)</li>
<li>Each shard is a <em>physical PG table</em> on a worker (with a <code>_&lt;shardid&gt;</code> suffix)</li>
<li>It behaves like an ordinary PG table; you can connect directly to a worker and access it with PG tools</li>
</ul>
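<p>The coordinator's shard-to-worker mapping can be inspected directly. The query below is a minimal sketch, assuming a distributed table named <code>orders</code> already exists and that you are connected to the coordinator; it joins the Citus metadata catalogs <code>pg_dist_shard</code>, <code>pg_dist_placement</code>, and <code>pg_dist_node</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Run on the coordinator. Assumes a distributed table named orders.
-- Lists each shard of the table and the primary worker that holds it.
SELECT s.logicalrelid AS table_name,
       s.shardid,
       n.nodename,
       n.nodeport
FROM pg_dist_shard s
JOIN pg_dist_placement p USING (shardid)
JOIN pg_dist_node n USING (groupid)
WHERE s.logicalrelid = &#39;orders&#39;::regclass
  AND n.noderole = &#39;primary&#39;
ORDER BY s.shardid;</code></pre></div>
<p>Each returned <code>shardid</code> corresponds to a physical table on the listed worker, named after the distributed table with the shard id as suffix.</p>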
<h2 id="3-種-table-type">3 table types</h2>
<h3 id="distributed-table--跨-shard-切分">Distributed table: split across shards</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">-- Create an ordinary PG table
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="n">BIGSERIAL</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">    </span><span class="n">user_id</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="n">amount</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">2</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">    </span><span class="n">created_at</span><span class="w"> </span><span class="k">TIMESTAMP</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">    </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">user_id</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w">  </span><span class="c1">-- PK must include the distribution column
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w"></span><span class="c1">-- Turn it into a distributed table with Citus
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">create_distributed_table</span><span class="p">(</span><span class="s1">&#39;orders&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;user_id&#39;</span><span class="p">);</span></span></span></code></pre></div><p><code>user_id</code> is the <em>distribution column</em>: Citus hashes it to decide which shard a row belongs to. The primary key (and any unique constraint) must include the distribution column, the same requirement MySQL partitioning imposes.</p>
<p>Compared with Vitess Vindexes:</p>
<ul>
<li>Citus: hash of the distribution column → shard (a single hash function; the algorithm is not selectable)</li>
<li>Vitess: several Vindex types to choose from (hash / lookup_hash / xxhash / null)</li>
</ul>
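<p>To see the hash routing concretely, Citus exposes the helper <code>get_shard_id_for_distribution_column</code>, and the hash range owned by each shard is recorded in <code>pg_dist_shard</code>. The sketch below assumes the <code>orders</code> table above is distributed on <code>user_id</code>; the value 12345 is only an illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Which shard does user_id = 12345 hash to? (12345 is an arbitrary example value)
SELECT get_shard_id_for_distribution_column(&#39;orders&#39;, 12345);

-- The hash range each shard of orders is responsible for.
-- shardminvalue / shardmaxvalue are stored as text, so cast them for sorting.
SELECT shardid, shardminvalue, shardmaxvalue
FROM pg_dist_shard
WHERE logicalrelid = &#39;orders&#39;::regclass
ORDER BY shardminvalue::integer;</code></pre></div>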
<h3 id="reference-table--全-shard-共有">Reference table: shared by all shards</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">products</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="n">name</span><span class="w"> </span><span class="nb">VARCHAR</span><span class="p">(</span><span class="mi">100</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="n">price</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">create_reference_table</span><span class="p">(</span><span class="s1">&#39;products&#39;</span><span class="p">);</span></span></span></code></pre></div><p><code>products</code> has <em>a full copy on every worker</em>; writes go through the coordinator, which broadcasts them to all workers.</p>
<p>Uses:</p>
<ul>
<li>Small lookup tables (country codes, product categories, and so on)</li>
<li>JOINs against distributed tables: the reference table is present on every worker, so the JOIN does not have to go cross-shard</li>
<li>Data with a low write rate (broadcast cost grows linearly with the number of workers)</li>
</ul>
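<p>A minimal sketch of the local JOIN this enables. The <code>order_items</code> table and its columns are hypothetical, introduced only for illustration; <code>products</code> is the reference table created above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Hypothetical distributed table, sharded on user_id.
CREATE TABLE order_items (
    user_id    BIGINT NOT NULL,
    order_id   BIGINT NOT NULL,
    product_id INT    NOT NULL,
    quantity   INT    NOT NULL,
    PRIMARY KEY (user_id, order_id, product_id)
);
SELECT create_distributed_table(&#39;order_items&#39;, &#39;user_id&#39;);

-- The filter on user_id routes the query to one shard; products is
-- available on that same worker, so the JOIN runs locally there.
SELECT p.name, sum(oi.quantity) AS units
FROM order_items oi
JOIN products p ON p.id = oi.product_id
WHERE oi.user_id = 12345
GROUP BY p.name;</code></pre></div>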
<h3 id="local-table--coordinator-上的-pg-table">Local table: an ordinary PG table on the coordinator</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">audit_log</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="nb">SERIAL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="n">event</span><span class="w"> </span><span class="n">JSONB</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="c1">-- No Citus function is called, so the table stays on the coordinator by default</span></span></span></code></pre></div><p>It behaves like an ordinary PG table. Use it for tables that <em>do not need to be distributed</em> (such as admin metadata).</p>
<h2 id="colocation跨-distributed-table-同-shard-對齊">Colocation: aligning shards across distributed tables</h2>
<p>When two distributed tables use the <em>same distribution column</em> (for example <code>user_id</code>) and the same shard count, Citus colocates them automatically:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="n">create_distributed_table</span><span class="p">(</span><span class="s1">&#39;orders&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;user_id&#39;</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">create_distributed_table</span><span class="p">(</span><span class="s1">&#39;user_addresses&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;user_id&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">colocate_with</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="s1">&#39;orders&#39;</span><span class="p">);</span></span></span></code></pre></div><p>After colocation:</p>
<ul>
<li>The orders and user_addresses rows for <code>user_id = 100</code> sit on <em>the same worker, in matching shards</em></li>
<li>JOINs do not cross workers, so they are efficient</li>
<li>Native PG FK constraints are usable (across tables, but within the same shard)</li>
</ul>
<p>Colocation is the core <em>cross-table consistency</em> mechanism in Citus's design. A cross-table query without colocation becomes a cross-worker query and slows down sharply.</p>
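<p>A sketch of a foreign key between colocated tables. The <code>order_events</code> table is hypothetical; the example assumes <code>orders</code> is distributed on <code>user_id</code> with primary key <code>(user_id, id)</code>. The key point is that the FK includes the distribution column, so each shard can enforce it locally.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Hypothetical child table, colocated with orders.
CREATE TABLE order_events (
    user_id    BIGINT NOT NULL,
    order_id   BIGINT NOT NULL,
    event_type TEXT   NOT NULL,
    created_at TIMESTAMP DEFAULT now()
);
SELECT create_distributed_table(&#39;order_events&#39;, &#39;user_id&#39;, colocate_with =&gt; &#39;orders&#39;);

-- Add the FK after both tables are distributed. It references
-- (user_id, id), so it contains the distribution column.
ALTER TABLE order_events
    ADD CONSTRAINT order_events_order_fk
    FOREIGN KEY (user_id, order_id) REFERENCES orders (user_id, id);</code></pre></div>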
<h2 id="配置-step-by-steplocal-cluster">Step-by-step setup (local cluster)</h2>
<p>For production, the managed option is Azure Cosmos DB for PostgreSQL (hosted by Microsoft, same engine; the earlier Citus Cloud service has been retired). For a self-hosted setup:</p>
<h3 id="step-1coordinator--worker-都裝-pg--citus">Step 1: Install PG + Citus on the coordinator and the workers</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># On every node (coordinator + 2 workers); assumes the Citus package repository is already configured</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">apt install postgresql-14
</span></span><span class="line"><span class="ln">3</span><span class="cl">apt install postgresql-14-citus-12.0
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># In postgresql.conf (a config setting, not a shell command):</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="nv">shared_preload_libraries</span> <span class="o">=</span> <span class="s1">&#39;citus&#39;</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">
</span></span><span class="line"><span class="ln">8</span><span class="cl">systemctl restart postgresql</span></span></code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Run on every node
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">CREATE</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="n">citus</span><span class="p">;</span></span></span></code></pre></div><h3 id="step-2coordinator-註冊-worker">Step 2: Register the workers on the coordinator</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Run on the coordinator
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">citus_add_node</span><span class="p">(</span><span class="s1">&#39;worker1.example.com&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">5432</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">citus_add_node</span><span class="p">(</span><span class="s1">&#39;worker2.example.com&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">5432</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="c1">-- Verify
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">citus_get_active_worker_nodes</span><span class="p">();</span></span></span></code></pre></div><h3 id="step-3建-distributed-table">Step 3: Create a distributed table</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="p">(</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">    </span><span class="n">id</span><span class="w"> </span><span class="n">BIGSERIAL</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="n">user_id</span><span class="w"> </span><span class="nb">BIGINT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="n">amount</span><span class="w"> </span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">2</span><span class="p">),</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">    </span><span class="n">created_at</span><span class="w"> </span><span class="k">TIMESTAMP</span><span class="p">,</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">    </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="w"> </span><span class="p">(</span><span class="n">user_id</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="p">)</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="n">create_distributed_table</span><span class="p">(</span><span class="s1">&#39;orders&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;user_id&#39;</span><span class="p">);</span></span></span></code></pre></div><p>Citus automatically splits <code>orders</code> into 32 shards (<code>orders_102008</code> and so on) and assigns them to the workers.</p>
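<p>One way to confirm the result is to count shards per worker through the <code>citus_shards</code> view. This is a sketch, assuming the two-worker cluster and the <code>orders</code> table from the steps above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Run on the coordinator: how many shards of orders landed on each worker?
SELECT nodename, nodeport, count(*) AS shard_count
FROM citus_shards
WHERE table_name = &#39;orders&#39;::regclass
GROUP BY nodename, nodeport
ORDER BY nodename;</code></pre></div>
<p>A roughly even count per worker is what you would expect right after creation; a skewed count suggests the shards need rebalancing.</p>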
<h3 id="step-4application-連-coordinator">Step 4: Connect the application to the coordinator</h3>
<p>The application's connection string points at the coordinator's IP / port (it does not need to know the workers exist).</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Run queries from the application; Citus routes them transparently
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="p">(</span><span class="n">user_id</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">12345</span><span class="p">,</span><span class="w"> </span><span class="mi">50</span><span class="p">);</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="c1">-- → Citus hashes user_id=12345, finds the owning shard (say shard 17), and routes to the worker that holds it
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"></span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">12345</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w"></span><span class="c1">-- → Single-shard query, very fast
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"></span><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w"></span><span class="k">SELECT</span><span class="w"> </span><span class="k">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="w"></span><span class="c1">-- → Cross-shard aggregation: Citus runs it in parallel and merges the results</span></span></span></code></pre></div><h2 id="5-個-production-踩雷">5 production pitfalls</h2>
<h3 id="1-distribution-column-選錯--cross-shard-query-變主流">1. Wrong distribution column: cross-shard queries become the norm</h3>
<p>Picking <code>created_at</code> or an auto-increment <code>id</code> as the distribution column looks evenly spread, but <em>application queries mostly filter by user_id</em>, so <em>every query goes cross-shard</em> and performance collapses.</p>
<p>Fixes:</p>
<ul>
<li><em>Choose the column the application filters and joins on most often</em> as the distribution column (usually <code>tenant_id</code> / <code>user_id</code>)</li>
<li>Audit the application's top queries and confirm the distribution column matches the query pattern</li>
<li>Changing the distribution column means <em>rewriting every shard</em>, much like resharding; it is a major undertaking</li>
</ul>
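<p>A rough way to run this audit is to <code>EXPLAIN</code> the top queries and check how many tasks Citus plans. The sketch below assumes the <code>orders</code> table from earlier, plus a hypothetical <code>events</code> table that was mistakenly distributed on <code>id</code> and has a <code>user_id</code> column.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Filter on the distribution column: the Citus plan should report a
-- single task (one shard).
EXPLAIN SELECT * FROM orders WHERE user_id = 12345;

-- No filter on the distribution column: expect one task per shard.
EXPLAIN SELECT * FROM orders WHERE created_at &gt;= now() - interval &#39;1 day&#39;;

-- Hypothetical repair for a table distributed on the wrong column.
-- This rewrites every shard, so schedule it like a resharding; any
-- primary key / unique constraint must include the new column first.
SELECT alter_distributed_table(&#39;events&#39;, distribution_column =&gt; &#39;user_id&#39;);</code></pre></div>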
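<h3 id="2-cross-shard-transaction-限制">2. Cross-shard transaction limits</h3>
<p>A transaction that spans multiple shards (for example, UPDATEs on two rows with different user_id values) is committed by Citus with <em>2PC</em> (two-phase commit), but with limitations:</p>
<ul>
<li>Some multi-statement cross-shard transactions need <code>SET citus.multi_shard_modify_mode = 'sequential'</code> set explicitly; Citus suggests it when a parallel multi-shard command conflicts with a later statement in the same transaction</li>
<li>There is no serializable guarantee across shards: isolation is enforced per worker, so a multi-shard read may not see one consistent snapshot</li>
<li>Cross-shard DDL runs sequentially</li>
</ul>
<p>Fixes:</p>
<ul>
<li>Design the schema to avoid cross-shard transactions (transactions on one distribution value within a colocation group are fine)</li>
<li>Where a cross-shard transaction is unavoidable, set the multi-shard mode explicitly</li>
<li>For <em>strict cross-shard consistency</em>, consider distributed SQL (CockroachDB / Aurora DSQL)</li>
</ul>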
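<p>The contrast can be sketched as follows, assuming the <code>orders</code> table distributed on <code>user_id</code>; the ids and amounts are made-up values.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Single distribution value: every statement targets the same shard,
-- so this behaves like an ordinary local PG transaction on one worker.
BEGIN;
UPDATE orders SET amount = amount - 10 WHERE user_id = 12345 AND id = 1;
INSERT INTO orders (user_id, amount) VALUES (12345, 10);
COMMIT;

-- Two distribution values: the rows may live on different workers,
-- so Citus has to coordinate the commit across them (2PC).
BEGIN;
-- Optional: run multi-shard commands one shard at a time in this transaction.
SET LOCAL citus.multi_shard_modify_mode TO &#39;sequential&#39;;
UPDATE orders SET amount = amount - 10 WHERE user_id = 12345 AND id = 1;
UPDATE orders SET amount = amount + 10 WHERE user_id = 67890 AND id = 2;
COMMIT;</code></pre></div>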
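<h3 id="3-reference-table-過大--寫入廣播-cost-爆">3. Oversized reference table: broadcast write cost explodes</h3>
<p>A reference table has a copy on every worker, and each write is <em>broadcast to all workers</em>. With a 100K-row reference table and frequent writes, each write is applied on all N workers, so the cost is N×.</p>
<p>Fixes:</p>
<ul>
<li>Restrict reference tables to lookup data that is <em>small and rarely written</em></li>
<li>A very large table should not be a reference table; consider distributing it</li>
<li>Monitor the write rate on reference tables and re-evaluate once it exceeds a threshold</li>
</ul>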
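<p>As a starting point for that review, the <code>citus_tables</code> view lists each table's type and size. A minimal sketch, run on the coordinator:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- List all reference tables with their reported size.
-- Anything unexpectedly large is a candidate for becoming a distributed table.
SELECT table_name, table_size
FROM citus_tables
WHERE citus_table_type = &#39;reference&#39;
ORDER BY table_name;</code></pre></div>
<p>This only covers size; the write rate still has to come from your own query or application metrics.</p>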
<h3 id="4-colocate-沒對齊--隱性-cross-shard-join">4. Misaligned colocation: hidden cross-shard JOINs</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1">-- Looks fine, but is actually a slow cross-shard JOIN
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">orders</span><span class="w"> </span><span class="n">o</span><span class="w"> </span><span class="k">JOIN</span><span class="w"> </span><span class="n">user_addresses</span><span class="w"> </span><span class="n">ua</span><span class="w"> </span><span class="k">ON</span><span class="w"> </span><span class="n">o</span><span class="p">.</span><span class="n">user_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ua</span><span class="p">.</span><span class="n">user_id</span><span class="p">;</span></span></span></code></pre></div><p>If <code>user_addresses</code> is not in the same colocation group as <code>orders</code> (for example, a different shard count or distribution column type, or <code>colocate_with =&gt; 'none'</code>), the two tables' shards are placed independently and the JOIN crosses workers.</p>
<p>Fixes:</p>
<ul>
<li>Align related tables with <code>colocate_with</code> when creating them</li>
<li>Use <code>SELECT * FROM citus_tables</code> to inspect colocation_id and confirm the tables line up</li>
<li>For JOINs across non-colocated tables, work around them with a <em>materialized view</em> or by splitting the query in the application layer</li>
</ul>
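<p>The check and one possible repair can be sketched as below, assuming both <code>orders</code> and <code>user_addresses</code> are already distributed on <code>user_id</code>. Two tables are colocated when their <code>colocation_id</code> values match.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Compare colocation metadata for the two tables.
SELECT table_name, distribution_column, colocation_id, shard_count
FROM citus_tables
WHERE table_name IN (&#39;orders&#39;::regclass, &#39;user_addresses&#39;::regclass);

-- If the colocation_id values differ, move user_addresses into the
-- same colocation group as orders. This rewrites the table&#39;s shards.
SELECT alter_distributed_table(&#39;user_addresses&#39;, colocate_with =&gt; &#39;orders&#39;);</code></pre></div>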
<h3 id="5-worker-failover--coordinator-必須知道">5. Worker failover: the coordinator has to know</h3>
<p>When a worker fails, by default <em>the coordinator simply sees queries fail; Citus does not fail over automatically</em>.</p>
<p>Fixes (Citus 11+):</p>
<ul>
<li>Use <em>shard replication</em> (<code>citus.shard_replication_factor = 2</code>), so each shard has a copy on 2 workers</li>
<li>Run PG streaming replication at the worker layer, with Patroni managing failover</li>
<li>If the coordinator fails, the whole cluster is unavailable, so the coordinator needs HA as well (Patroni)</li>
</ul>
<p>Compared with Vitess, Citus's HA story is weaker, so production deployments must plan for it carefully.</p>
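<p>Whatever HA approach is chosen, it helps to know what the coordinator currently believes about its nodes. A minimal sketch, run on the coordinator, using the <code>pg_dist_node</code> catalog:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Nodes registered with the coordinator, their role, and whether
-- Citus currently treats them as active.
SELECT nodename, nodeport, noderole, isactive
FROM pg_dist_node
ORDER BY groupid, noderole;

-- The replication factor that will apply to newly created distributed tables.
SHOW citus.shard_replication_factor;</code></pre></div>
<p>Note that <code>isactive</code> reflects Citus metadata rather than a live health probe, so external monitoring is still needed to detect a worker that has just gone down.</p>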
<h2 id="何時用-citus">When to use Citus</h2>
<table>
  <thead>
      <tr>
          <th>Condition</th>
          <th>Recommendation</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Multi-tenant SaaS where tenant_id is the natural distribution column</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Write throughput &gt; 50K WPS, more than a single PG can sustain</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Need to keep PG SQL + extensions (pgvector / TimescaleDB)</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>80% of the application's queries use the same distribution column</td>
          <td>Yes</td>
      </tr>
      <tr>
          <td>Heavy ad-hoc cross-tenant aggregation</td>
          <td>No (cross-shard is slow)</td>
      </tr>
      <tr>
          <td>Strong cross-shard consistency requirements</td>
          <td>No (use CockroachDB)</td>
      </tr>
      <tr>
          <td>Want a zero-ops managed service</td>
          <td>Azure Cosmos DB for PostgreSQL (same engine)</td>
      </tr>
  </tbody>
</table>
<h2 id="容量規劃">Capacity planning</h2>
<ul>
<li>Coordinator: moderate CPU + RAM; the metadata is small and it stores no distributed data</li>
<li>Worker: size each worker like a single production PG instance</li>
<li>Shard count: 32 by default; in practice often set to worker count × 4-8</li>
<li>Replication factor: at least 2 in production</li>
</ul>
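<p>A sketch of how the shard-count guideline might be applied. It assumes <code>orders</code> has not been distributed yet, a two-worker cluster, and a hypothetical third worker at <code>worker3.example.com</code> added later.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Shard count is fixed when a table is distributed, so set it first.
-- 2 workers × 8 = 16 shards (an illustrative choice, not a recommendation).
SET citus.shard_count = 16;
SELECT create_distributed_table(&#39;orders&#39;, &#39;user_id&#39;);

-- Later, when capacity runs out: register a new worker and
-- spread the existing shards onto it.
SELECT citus_add_node(&#39;worker3.example.com&#39;, 5432);
SELECT rebalance_table_shards();</code></pre></div>
<p>Planning more shards than workers up front is what lets the rebalancer move whole shards to new nodes instead of splitting data.</p>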
<h2 id="跟其他模組整合">Integration with other modules</h2>
<h3 id="跟-replication-topology">With replication topology</h3>
<p>The coordinator and each worker run their own PG streaming replication; Citus does not replace PG replication. Worker failover relies on Patroni / streaming replication. See <a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">Replication Topology</a> for details.</p>
<h3 id="跟-pg-extensions">With PG extensions</h3>
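<p>Citus is compatible with most other PG extensions (pgvector / TimescaleDB / pg_stat_statements): it remains an <em>extension</em> itself and so keeps the PostgreSQL ecosystem's integration points. Compatibility is per extension and per version, so verify combinations such as TimescaleDB hypertables on distributed tables before relying on them. See the <em>PG Extension Ecosystem</em> article (not yet written).</p>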
<h3 id="跟-mysql-vitess">With MySQL Vitess</h3>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Citus</th>
          <th>Vitess</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Deployment model</td>
          <td>PG extension</td>
          <td>Separate proxy + tablet</td>
      </tr>
      <tr>
          <td>Primary use case</td>
          <td>Multi-tenant SaaS</td>
          <td>Very large-scale sharding</td>
      </tr>
      <tr>
          <td>Cross-shard JOIN</td>
          <td>Colocation alignment + reference tables</td>
          <td>VTGate splits and aggregates automatically</td>
      </tr>
      <tr>
          <td>FK</td>
          <td>Usable within a colocation group</td>
          <td>Supported in Vitess 18+, with cross-shard limits</td>
      </tr>
      <tr>
          <td>HA</td>
          <td>Relies on Patroni + replication factor</td>
          <td>VTOrc + replication</td>
      </tr>
      <tr>
          <td>Learning curve</td>
          <td>Medium (PG ops experience is enough)</td>
          <td>High (4 components)</td>
      </tr>
  </tbody>
</table>
<p>Citus is the smoother fit for <em>PG-native</em> environments and Vitess for <em>MySQL-native</em> ones; they do not compete directly. See <a href="/blog/backend/01-database/vendors/mysql/vitess-sharding/" data-link-title="MySQL Vitess Sharding：VTGate / VTTablet / VReplication / VSchema 四件套協作" data-link-desc="Vitess 不只是 MySQL sharding proxy、是 4 個 component 協作的完整 sharding 系統 — VTGate（query routing layer）、VTTablet（per-MySQL agent）、VReplication（跨 shard 資料移動）、VSchema（sharding metadata）。本文走 4 件套各自責任、keyspace / shard / tablet 架構、shard key 設計（Vindex）、配置 step-by-step、5 production 踩雷（cross-shard transaction / VStream lag / Vindex 不均勻 / resharding 切流 / VReplication 卡住）、跟自管 sharding 跟 PlanetScale 的對比">MySQL Vitess Sharding</a> for details.</p>
<h2 id="相關連結">Related links</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL vendor overview</a></li>
<li><a href="/blog/backend/01-database/vendors/postgresql/replication-topology/" data-link-title="PostgreSQL Replication Topology：async / sync / quorum 三模式跟 LSN &#43; replication slot 的三軸組合" data-link-desc="PostgreSQL streaming replication 不是「sync 或 async」、是 *durability / latency / consistency* 三軸組合 &#43; LSN-based 進度追蹤 &#43; replication slot 治理。本文走 3 軸取捨模型、async / sync / quorum-based sync 行為對比、LSN &#43; replication slot 機制、配置 step-by-step、5 production 踩雷（standby lag 暴衝 / sync standby 退回 async / orphan replication slot / cascading replication 雪崩 / failover 後 timeline 分歧）、跟 Patroni HA &#43; logical replication 整合">PG Replication Topology</a>（per-worker replication）</li>
<li><a href="/blog/backend/01-database/vendors/postgresql/mvcc-lock-model/" data-link-title="PostgreSQL MVCC &#43; Lock Model：為什麼 PG 比 MySQL 少 deadlock、但 vacuum 是別的代價" data-link-desc="PG 用 *MVCC-heavy &#43; 少 explicit lock* 的並行控制、跟 MySQL InnoDB 的 *lock-based*（record / gap / next-key）相反。本文走 MVCC 機制（tuple version &#43; xmin/xmax &#43; visibility）、PG 4 種 lock（row-level / table-level / advisory / predicate）、預測 SERIALIZABLE 行為、5 production 踩雷（idle transaction 卡 vacuum / SELECT FOR UPDATE 跨 transaction / advisory lock 沒釋放 / bloat 不是 vacuum 問題 / predicate lock 在 SSI 下 rollback）、跟 MySQL lock-contention sibling 對比">PG MVCC + Lock Model</a> (cross-shard transaction lock behavior)</li>
<li><a href="/blog/backend/01-database/global-distributed-oltp/" data-link-title="1.11 全球分散式 OLTP" data-link-desc="Spanner / Aurora DSQL / Cosmos DB multi-region write / CockroachDB / TiDB 的全球一致性取捨">1.11 Globally Distributed OLTP</a> (Citus vs CockroachDB vs Spanner)</li>
<li><a href="/blog/backend/01-database/vendors/mysql/vitess-sharding/" data-link-title="MySQL Vitess Sharding：VTGate / VTTablet / VReplication / VSchema 四件套協作" data-link-desc="Vitess 不只是 MySQL sharding proxy、是 4 個 component 協作的完整 sharding 系統 — VTGate（query routing layer）、VTTablet（per-MySQL agent）、VReplication（跨 shard 資料移動）、VSchema（sharding metadata）。本文走 4 件套各自責任、keyspace / shard / tablet 架構、shard key 設計（Vindex）、配置 step-by-step、5 production 踩雷（cross-shard transaction / VStream lag / Vindex 不均勻 / resharding 切流 / VReplication 卡住）、跟自管 sharding 跟 PlanetScale 的對比">MySQL Vitess Sharding</a> (sibling, different implementation)</li>
<li><a href="/blog/backend/01-database/vendors/cosmosdb/" data-link-title="Azure Cosmos DB" data-link-desc="全球分散式 multi-model DB、5 個 consistency levels、Microsoft 自家 dogfood 證據">Cosmos DB vendor</a>（Azure Cosmos DB for PostgreSQL = managed Citus）</li>
<li>Official: <a href="https://docs.citusdata.com/">Citus Documentation</a> / <a href="https://github.com/citusdata/citus">Citus on GitHub</a></li>
</ul>
]]></content:encoded></item></channel></rss>