<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Multi-Axis on Tarragon</title><link>https://tarrragon.github.io/blog/tags/multi-axis/</link><description>Recent content in Multi-Axis on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/multi-axis/index.xml" rel="self" type="application/rss+xml"/><item><title>PostgreSQL → CockroachDB：三維皆 High 的多重歸類 migration</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/migrate-to-cockroachdb/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/migrate-to-cockroachdb/</guid><description>&lt;blockquote>
&lt;p>本文是跨 vendor &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/migration/" data-link-title="Migration" data-link-desc="說明系統如何把資料、流量或結構從舊狀態移到新狀態">migration&lt;/a> playbook、cross-link 到 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL&lt;/a> 跟 &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/cockroachdb/" data-link-title="CockroachDB" data-link-desc="分散式 SQL、PostgreSQL 相容、跨區強一致、Spanner 的開源 / 跨雲替代">CockroachDB&lt;/a>。本文是 &lt;a href="https://tarrragon.github.io/blog/report/content-structure-by-max-diff-dimension/" data-link-title="Process content 結構由最大差異維度決定、不是 universal phased" data-link-desc="跨 X process content（migration / upgrade / rollout / playbook）的結構由 source / target 之間 *差異維度組合* 決定、不存在 universal phased 模板；6 種 migration / process type 實證（schema 差 / drop-in / operational / multi-tool / paradigm / topology re-layout）跑出 6 種不同結構；寫作前必須做 *6 維 diff dimension audit* 才能決定結構、跳過會套錯模板">#127 多重歸類跟 tie-breaking&lt;/a> 規則的實證 — 三維皆 High 配對的處理方式不是「選 type A 或 type C 或 type E」、是 &lt;em>主導維度走 Type E、其他高維度獨立加段&lt;/em>。每階段切換用 &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/migration-gate/" data-link-title="Migration Gate" data-link-desc="說明遷移流程何時可以進入下一階段或正式切換">migration gate&lt;/a> 把關。&lt;/p>&lt;/blockquote>
&lt;h2 id="三維皆-high決策矩陣">三維皆 High：決策矩陣&lt;/h2>
&lt;p>跑 &lt;a href="https://tarrragon.github.io/blog/report/content-structure-by-max-diff-dimension/" data-link-title="Process content 結構由最大差異維度決定、不是 universal phased" data-link-desc="跨 X process content（migration / upgrade / rollout / playbook）的結構由 source / target 之間 *差異維度組合* 決定、不存在 universal phased 模板；6 種 migration / process type 實證（schema 差 / drop-in / operational / multi-tool / paradigm / topology re-layout）跑出 6 種不同結構；寫作前必須做 *6 維 diff dimension audit* 才能決定結構、跳過會套錯模板">diff dimension audit&lt;/a> 對 PostgreSQL → CockroachDB：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>維度&lt;/th>
 &lt;th>評估&lt;/th>
 &lt;th>等級&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Schema / API&lt;/td>
 &lt;td>PostgreSQL wire protocol 兼容、但 SQL feature set 部分缺（CTE recursive 部分 / window function 部分 / extension 完全缺）&lt;/td>
 &lt;td>&lt;strong>High&lt;/strong>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Operational model&lt;/td>
 &lt;td>Single-node + Patroni → distributed Raft + 自動 rebalance；HA / backup / topology 全換&lt;/td>
 &lt;td>&lt;strong>High&lt;/strong>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Abstraction / paradigm&lt;/td>
 &lt;td>Single-node MVCC + transaction → distributed Serializable Snapshot Isolation (SSI)&lt;/td>
 &lt;td>&lt;strong>High&lt;/strong>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Number of components&lt;/td>
 &lt;td>同 1 個 DB cluster&lt;/td>
 &lt;td>Low&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Application change&lt;/td>
 &lt;td>Transaction retry pattern 必須改、ORM 可能需 patch&lt;/td>
 &lt;td>Medium&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>3 維 High + 1 維 Medium。按 &lt;a href="https://tarrragon.github.io/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">methodology audit Step 5&lt;/a> 的多重歸類處理規則：&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>本文是跨 vendor <a href="/blog/backend/knowledge-cards/migration/" data-link-title="Migration" data-link-desc="說明系統如何把資料、流量或結構從舊狀態移到新狀態">migration</a> playbook、cross-link 到 <a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> 跟 <a href="/blog/backend/01-database/vendors/cockroachdb/" data-link-title="CockroachDB" data-link-desc="分散式 SQL、PostgreSQL 相容、跨區強一致、Spanner 的開源 / 跨雲替代">CockroachDB</a>。本文是 <a href="/blog/report/content-structure-by-max-diff-dimension/" data-link-title="Process content 結構由最大差異維度決定、不是 universal phased" data-link-desc="跨 X process content（migration / upgrade / rollout / playbook）的結構由 source / target 之間 *差異維度組合* 決定、不存在 universal phased 模板；6 種 migration / process type 實證（schema 差 / drop-in / operational / multi-tool / paradigm / topology re-layout）跑出 6 種不同結構；寫作前必須做 *6 維 diff dimension audit* 才能決定結構、跳過會套錯模板">#127 多重歸類跟 tie-breaking</a> 規則的實證 — 三維皆 High 配對的處理方式不是「選 type A 或 type C 或 type E」、是 <em>主導維度走 Type E、其他高維度獨立加段</em>。每階段切換用 <a href="/blog/backend/knowledge-cards/migration-gate/" data-link-title="Migration Gate" data-link-desc="說明遷移流程何時可以進入下一階段或正式切換">migration gate</a> 把關。</p></blockquote>
<h2 id="三維皆-high決策矩陣">三維皆 High：決策矩陣</h2>
<p>跑 <a href="/blog/report/content-structure-by-max-diff-dimension/" data-link-title="Process content 結構由最大差異維度決定、不是 universal phased" data-link-desc="跨 X process content（migration / upgrade / rollout / playbook）的結構由 source / target 之間 *差異維度組合* 決定、不存在 universal phased 模板；6 種 migration / process type 實證（schema 差 / drop-in / operational / multi-tool / paradigm / topology re-layout）跑出 6 種不同結構；寫作前必須做 *6 維 diff dimension audit* 才能決定結構、跳過會套錯模板">diff dimension audit</a> 對 PostgreSQL → CockroachDB：</p>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>評估</th>
          <th>等級</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Schema / API</td>
          <td>PostgreSQL wire protocol 兼容、但 SQL feature set 部分缺（CTE recursive 部分 / window function 部分 / extension 完全缺）</td>
          <td><strong>High</strong></td>
      </tr>
      <tr>
          <td>Operational model</td>
          <td>Single-node + Patroni → distributed Raft + 自動 rebalance；HA / backup / topology 全換</td>
          <td><strong>High</strong></td>
      </tr>
      <tr>
          <td>Abstraction / paradigm</td>
          <td>Single-node MVCC + transaction → distributed Serializable Snapshot Isolation (SSI)</td>
          <td><strong>High</strong></td>
      </tr>
      <tr>
          <td>Number of components</td>
          <td>同 1 個 DB cluster</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Application change</td>
          <td>Transaction retry pattern 必須改、ORM 可能需 patch</td>
          <td>Medium</td>
      </tr>
  </tbody>
</table>
<p>3 維 High + 1 維 Medium。按 <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">methodology audit Step 5</a> 的多重歸類處理規則：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">主導維度判讀 (優先序): Schema &gt; Paradigm &gt; Operational &gt; Components
</span></span><span class="line"><span class="ln">2</span><span class="cl">
</span></span><span class="line"><span class="ln">3</span><span class="cl">實際應用: Schema High + Paradigm High + Operational High
</span></span><span class="line"><span class="ln">4</span><span class="cl">- Schema 是 High、但 CRDB 提供 PostgreSQL wire protocol 兼容
</span></span><span class="line"><span class="ln">5</span><span class="cl">- Paradigm 是 High、是 *單機 → 分散式* 的根本轉變、讀者最關心
</span></span><span class="line"><span class="ln">6</span><span class="cl">- Operational 是 High、但很大程度是 Paradigm 的 downstream
</span></span><span class="line"><span class="ln">7</span><span class="cl">
</span></span><span class="line"><span class="ln">8</span><span class="cl">→ 主結構選 Paradigm（Type E）、Schema + Operational 抽獨立段補充</span></span></code></pre></div><p>不強迫單一 type 標籤 — 本文是 <em>Type E 為主 + Type A / C 高維度增補</em> 的 multi-axis 形態。</p>
<h2 id="結構-differentiatortype-e-主結構--多軸增補段">結構 differentiator：Type E 主結構 + 多軸增補段</h2>
<p>跟前批 5 個 migration playbook 對照：</p>
<table>
  <thead>
      <tr>
          <th>結構元素</th>
          <th>Type A Splunk → Elastic</th>
          <th>Type B Redis → DragonflyDB</th>
          <th>Type C PostgreSQL → Aurora</th>
          <th>Type D Datadog → Grafana</th>
          <th>Type E Kafka ↔ NATS</th>
          <th><strong>本文（三維 High）</strong></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Phased translation</td>
          <td>yes</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>partial</td>
      </tr>
      <tr>
          <td>Compatibility audit</td>
          <td>-</td>
          <td>yes</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>yes</td>
      </tr>
      <tr>
          <td>Operational redesign 對位</td>
          <td>-</td>
          <td>-</td>
          <td>yes</td>
          <td>-</td>
          <td>-</td>
          <td><strong>yes（獨立段）</strong></td>
      </tr>
      <tr>
          <td>Schema gap 對位</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td><strong>yes（獨立段）</strong></td>
      </tr>
      <tr>
          <td>Parallel streams</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>yes</td>
          <td>-</td>
          <td>-</td>
      </tr>
      <tr>
          <td>Paradigm contrast</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>yes</td>
          <td>yes</td>
      </tr>
      <tr>
          <td>Application 重設計</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>yes</td>
          <td>yes</td>
      </tr>
      <tr>
          <td>混合架構 long-term</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>-</td>
          <td>yes</td>
          <td>partial（部分 workload）</td>
      </tr>
  </tbody>
</table>
<p>本文是「Type E 為主 + Type A schema gap 段 + Type C operational redesign 段」混合形態、9-10 章節、260-300 行。</p>
<h2 id="維度-1paradigm-shift主導">維度 1：Paradigm shift（主導）</h2>
<p>CRDB 是 <em>distributed SQL DB</em>、不是「PostgreSQL 多節點版」。核心差異：</p>
<table>
  <thead>
      <tr>
          <th>概念</th>
          <th>PostgreSQL</th>
          <th>CockroachDB</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Transaction isolation</td>
          <td>MVCC、Read Committed default</td>
          <td>Serializable Snapshot Isolation (SSI)、強一致</td>
      </tr>
      <tr>
          <td>Transaction conflict</td>
          <td>First writer wins</td>
          <td>Retry-on-conflict、application 必須處理 <code>40001</code> retry code</td>
      </tr>
      <tr>
          <td>Replication</td>
          <td>Streaming replication + standby</td>
          <td>Raft consensus、每筆寫 quorum + 自動 rebalance</td>
      </tr>
      <tr>
          <td>Partition</td>
          <td>Declarative partitioning（手動）</td>
          <td>Automatic range-based + locality-aware</td>
      </tr>
      <tr>
          <td>Latency p99</td>
          <td>1-10ms（單 region）</td>
          <td>5-50ms（cross-AZ Raft quorum）</td>
      </tr>
      <tr>
          <td>Throughput limit</td>
          <td>單 primary 上限 ~10-50K TPS</td>
          <td>Linear scale by adding node、~5K TPS / node</td>
      </tr>
  </tbody>
</table>
<p>關鍵 paradigm 改變：<em>transaction 是 retry-able 操作、不是 atomic guaranteed</em>。所有 transaction code 需要包 retry loop（CRDB 提供 <code>cockroach_restart</code> savepoint）。</p>
<h2 id="維度-2schema-gappostgresql-features-crdb-不支援">維度 2：Schema gap（PostgreSQL features CRDB 不支援）</h2>
<p>CRDB 號稱 PostgreSQL-compatible、但 <em>covergence rate 80-90%</em>；常見 gap：</p>
<table>
  <thead>
      <tr>
          <th>PostgreSQL feature</th>
          <th>CRDB 狀態</th>
          <th>影響</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Stored procedure / function (PL/pgSQL)</td>
          <td>Limited（CRDB 22.2+ 部分支援）</td>
          <td>Migration scope 內必須 audit + 改寫</td>
      </tr>
      <tr>
          <td>Common Table Expression (CTE) recursive</td>
          <td>Limited (depth + structure)</td>
          <td>複雜 CTE 可能跑不通、必須 query refactor</td>
      </tr>
      <tr>
          <td>Window function 全集</td>
          <td>Partial</td>
          <td>報表 query 需逐 case 驗證</td>
      </tr>
      <tr>
          <td>Extensions (pg_repack / pgaudit / TimescaleDB)</td>
          <td><strong>不支援</strong></td>
          <td>用 CRDB 自家 alternative 或自管 application 層</td>
      </tr>
      <tr>
          <td>Triggers</td>
          <td>Limited</td>
          <td>Audit / data integrity 邏輯遷到 application 層</td>
      </tr>
      <tr>
          <td>Custom types / domain</td>
          <td>Partial</td>
          <td>用 CHECK constraint 替代</td>
      </tr>
      <tr>
          <td>Geographic types (PostGIS)</td>
          <td>CRDB native geo support（語法不同）</td>
          <td>Spatial query 改寫</td>
      </tr>
      <tr>
          <td><code>SELECT FOR UPDATE</code> semantics</td>
          <td>對等但底層機制不同（distributed lock）</td>
          <td>注意 deadlock pattern 差異</td>
      </tr>
      <tr>
          <td>Advisory locks</td>
          <td><strong>不支援</strong></td>
          <td>Application 端用其他 distributed lock（Redis / Consul）</td>
      </tr>
  </tbody>
</table>
<p>Migration 必須 <em>先 audit 完整 SQL feature 使用</em>、列出 gap、評估解法或退役。</p>
<h2 id="維度-3operational-redesign">維度 3：Operational redesign</h2>
<p>CRDB operational model 完全不同：</p>
<table>
  <thead>
      <tr>
          <th>Operational concept</th>
          <th>PostgreSQL self-managed</th>
          <th>CRDB</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Cluster bootstrap</td>
          <td>Patroni / Stolon + manual</td>
          <td><code>cockroach init</code> + 自動 Raft formation</td>
      </tr>
      <tr>
          <td>HA</td>
          <td>Patroni + DCS + watchdog</td>
          <td>內建 Raft、無 single primary</td>
      </tr>
      <tr>
          <td>Failover</td>
          <td>Patroni-managed、15-60s</td>
          <td>透明 Raft re-election、&lt; 5s</td>
      </tr>
      <tr>
          <td>Backup</td>
          <td>pgBackRest + WAL archive</td>
          <td><code>BACKUP TO</code> (incremental + full)</td>
      </tr>
      <tr>
          <td>Restore</td>
          <td><code>pgBackRest restore</code> + PITR</td>
          <td><code>RESTORE FROM</code></td>
      </tr>
      <tr>
          <td>Replication</td>
          <td>Streaming + logical</td>
          <td>Built-in、無 logical replication 對等概念</td>
      </tr>
      <tr>
          <td>Schema migration</td>
          <td><code>pg_dump</code> / Flyway / Liquibase</td>
          <td><code>cockroach sql</code> + online schema change（無 lock）</td>
      </tr>
      <tr>
          <td>Monitoring</td>
          <td>pg_stat_* views + Prometheus exporter</td>
          <td>CRDB admin UI + Prometheus（schema 不同）</td>
      </tr>
      <tr>
          <td>Sizing</td>
          <td>Vertical scale（單 node big spec）</td>
          <td>Horizontal scale（多 node 小 spec）</td>
      </tr>
  </tbody>
</table>
<p>SRE 心智模型完全重訓：<em>無 primary 概念 / 無 streaming lag 概念 / 無 standby promote 概念</em>。</p>
<h2 id="migration-流程混合形態">Migration 流程（混合形態）</h2>
<p>不是線性 phased、是 <em>phased + parallel + partial</em> 混合：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Phase 0: scope 判讀
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  - 列 application、區分「適合 CRDB」vs「保留 PostgreSQL」
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  - SQL feature audit
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  - Application transaction pattern audit
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">Phase 1: schema port + application 改寫
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">  - DDL 轉成 CRDB syntax
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">  - 不支援 extension 找 alternative
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  - Application transaction code 加 retry loop
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl">Phase 2: 雙寫期（部分 application 開始走 CRDB）
</span></span><span class="line"><span class="ln">12</span><span class="cl">  - 新 application 走 CRDB
</span></span><span class="line"><span class="ln">13</span><span class="cl">  - 舊 application 持續 PostgreSQL
</span></span><span class="line"><span class="ln">14</span><span class="cl">  - CDC bridge（Debezium → Kafka → CRDB consumer）
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl">Phase 3: cutover 適合的 application
</span></span><span class="line"><span class="ln">17</span><span class="cl">  - 每個 application 獨立 cutover
</span></span><span class="line"><span class="ln">18</span><span class="cl">  - 不是「全 DB 一次切」
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl">Phase 4: 長期混合架構
</span></span><span class="line"><span class="ln">21</span><span class="cl">  - 某些 workload 永遠保留 PostgreSQL（不適合分散式）
</span></span><span class="line"><span class="ln">22</span><span class="cl">  - CRDB 跑 distributed 適配 workload</span></span></code></pre></div><p>整體 3-6 個月、不收斂到全 CRDB。</p>
<h2 id="production-故障演練">Production 故障演練</h2>
<h3 id="case-1transaction-retry-沒處理application-大量-40001-error">Case 1：Transaction retry 沒處理、application 大量 <code>40001</code> error</h3>
<p><strong>徵兆</strong>：cutover 後 application 5-10% transaction 報 <code>restart transaction: TransactionRetryWithProtoRefreshError</code>、業務 fail。</p>
<p><strong>根因</strong>：PostgreSQL Read Committed 不要求 application 處理 conflict、CRDB Serializable Isolation 必須 <em>retry-on-conflict</em>；application code 沒 retry loop。</p>
<p><strong>修法</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">// CRDB transaction with retry</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="k">for</span> <span class="nx">retries</span> <span class="o">:=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">retries</span> <span class="p">&lt;</span> <span class="mi">10</span><span class="p">;</span> <span class="nx">retries</span><span class="o">++</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="nx">tx</span><span class="p">,</span> <span class="nx">_</span> <span class="o">:=</span> <span class="nx">db</span><span class="p">.</span><span class="nf">Begin</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="c1">// ... transaction logic ...</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="nx">err</span> <span class="o">:=</span> <span class="nx">tx</span><span class="p">.</span><span class="nf">Commit</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="k">if</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="o">&amp;&amp;</span> <span class="nx">strings</span><span class="p">.</span><span class="nf">Contains</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nf">Error</span><span class="p">(),</span> <span class="s">&#34;40001&#34;</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="nx">time</span><span class="p">.</span><span class="nf">Sleep</span><span class="p">(</span><span class="nf">backoff</span><span class="p">(</span><span class="nx">retries</span><span class="p">))</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="k">continue</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">break</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>framework-level：用 CRDB-provided client lib（go-cockroachdb / crdb-jdbc）有 retry helper。</p>
<h3 id="case-2extension-缺位application-feature-整段掉">Case 2：Extension 缺位、application feature 整段掉</h3>
<p><strong>徵兆</strong>：cutover 後 application 某個地理計算功能直接報錯、PostGIS 函數不存在；migrate 計畫漏看。</p>
<p><strong>根因</strong>：CRDB native geo 不同 syntax / API、PostGIS extension 不能直接搬。</p>
<p><strong>修法</strong>：</p>
<ol>
<li><strong>Pre-migration 必跑 extension audit</strong>：列所有 <code>pg_extension</code>、找對應 CRDB feature 或退役</li>
<li><strong>PostGIS 替代</strong>：CRDB native ST_* functions、部分 syntax 對齊但 spatial index 不同</li>
<li><strong>退役不能換的 feature</strong>：評估保留 PostgreSQL（混合架構）</li>
</ol>
<h3 id="case-3sequential-pk-撞-raft-quorum-瓶頸">Case 3：Sequential PK 撞 Raft quorum 瓶頸</h3>
<p><strong>徵兆</strong>：cutover 後寫入吞吐量 / latency 不如預期、CRDB cluster CPU &lt; 30% 但 write latency p99 high。</p>
<p><strong>根因</strong>：application 用 <code>AUTO_INCREMENT</code> / <code>SERIAL</code> 連續 PK；CRDB 把連續 key 放 <em>同一 range</em> / 同一 Raft group、寫入串行化、無法平行 scale。</p>
<p><strong>修法</strong>：</p>
<ol>
<li><strong>改 UUID v7 / <code>unique_rowid()</code></strong>：時序排序但散佈跨 range、自動 partition by hash</li>
<li><strong><code>PRIMARY KEY (region, id)</code></strong>：multi-region 場景 multi-tenancy 自然拆分</li>
<li><strong>不適合的 workload 留 PostgreSQL</strong>：不是所有 schema 都適合 distributed</li>
</ol>
<h3 id="case-4long-transaction-對-raft-衝擊">Case 4：Long transaction 對 Raft 衝擊</h3>
<p><strong>徵兆</strong>：跨 1 分鐘+ 的 transaction（batch processing / 大 ETL）大量 retry、最後失敗；同期間其他短 transaction 也 retry rate 上升。</p>
<p><strong>根因</strong>：CRDB long transaction holds intent on touched ranges、阻塞其他 transaction；SSI conflict 機率隨 transaction 時間平方增長。</p>
<p><strong>修法</strong>：</p>
<ol>
<li><strong>Long transaction 拆短</strong>：batch 用多個 short transaction、checkpoint 在 application 層</li>
<li><strong>Heavy ETL 不跑 CRDB</strong>：用 CRDB CDC export 到 OLAP（Snowflake / BigQuery）跑 batch</li>
<li><strong>Read-only long transaction 用 follower read</strong>：<code>AS OF SYSTEM TIME</code> 不 hold intent、適合 reporting</li>
</ol>
<h3 id="case-5backup--restore-行為跟-postgresql-不同sre-runbook-失效">Case 5：Backup / restore 行為跟 PostgreSQL 不同、SRE runbook 失效</h3>
<p><strong>徵兆</strong>：DBA 嘗試 <code>pg_restore</code> 失敗、CRDB 端 backup format 完全不同；incident response 卡關 1-2 小時。</p>
<p><strong>根因</strong>：CRDB backup 是 <em>cluster-internal format</em>、不能用 PostgreSQL tooling；SRE runbook 仍是 PostgreSQL world、應急時心智模型錯位。</p>
<p><strong>修法</strong>：</p>
<ol>
<li><strong>Runbook 重寫</strong>：CRDB-specific backup / restore 流程、SRE training</li>
<li><strong>DR drill</strong>：cutover 前跑完整 DR drill、用 CRDB tooling 完成、不依賴 PostgreSQL 經驗</li>
<li><strong>Multi-region backup</strong>：CRDB 跨 region backup 配置、避免單 region 故障</li>
</ol>
<h2 id="capacity-規劃">Capacity 規劃</h2>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>PostgreSQL self-managed</th>
          <th>CockroachDB</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Single-node 上限</td>
          <td>~10-50K TPS（vertical scale 到 32-128 vCPU）</td>
          <td>~5K TPS / node（horizontal scale by adding node）</td>
      </tr>
      <tr>
          <td>跨 region</td>
          <td>高 latency 跨區 streaming</td>
          <td>設計 native、Locality-aware queries</td>
      </tr>
      <tr>
          <td>Sharding</td>
          <td>手動 partition / pg_partman</td>
          <td>自動 range-based</td>
      </tr>
      <tr>
          <td>Storage / TPS ratio</td>
          <td>不變</td>
          <td>Storage 跨 node 3x（Raft quorum 3-replica default）</td>
      </tr>
      <tr>
          <td>Total cost (10TB)</td>
          <td>$2-4K USD / month（self-managed）</td>
          <td>$5-10K USD / month（CRDB Cloud + 3x storage）</td>
      </tr>
  </tbody>
</table>
<p><strong>判讀</strong>：CRDB cost 顯著高、選 CRDB 必須是 <em>paradigm 需求</em>（distributed transaction / multi-region / linear scale）；單純成本 / availability 改善走 <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">Aurora</a> 更划算。</p>
<h2 id="整合--下一步">整合 / 下一步</h2>
<h3 id="跟-postgresql--aurora-migration-對比">跟 <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">PostgreSQL → Aurora migration</a> 對比</h3>
<p>兩條 PostgreSQL 出路：</p>
<ul>
<li><strong>Aurora</strong>：operational simplification、protocol drop-in、cost 中等漲；適合 <em>不需 distributed transaction</em> 的 production</li>
<li><strong>CRDB</strong>：distributed paradigm shift、application 必須改、cost 顯著漲；適合 <em>真的需要 distributed</em> 的 workload</li>
</ul>
<p>多數 application 不需要 distributed transaction、Aurora 更合理；真正需要 cross-region 強一致 / linear scale by adding node 才走 CRDB。</p>
<h3 id="跟-application-transaction-pattern-重設計">跟 application transaction pattern 重設計</h3>
<p>CRDB 強制 application 改 transaction code、retry loop 必加。團隊心智模型轉換是 migration 主要 effort、技術部分相對少。</p>
<h3 id="下一步議題">下一步議題</h3>
<ul>
<li><strong>CRDB → PostgreSQL reverse migration</strong>：當業務 simplify 後 distributed 不必要、reverse migration cost 高、實務上 CRDB 是 <em>single-direction lock-in</em></li>
<li><strong>CRDB Serverless</strong>：cost 起點低、burst workload 適合；steady workload 仍是 dedicated cluster</li>
<li><strong>Multi-region active-active</strong>：CRDB 真正強項、但網路成本爆、僅金融 / 政府客戶 ROI 合理</li>
</ul>
<h2 id="相關連結">相關連結</h2>
<ul>
<li>Source / target vendor：<a href="/blog/backend/01-database/vendors/postgresql/" data-link-title="PostgreSQL" data-link-desc="多用途 OLTP 主流關聯式資料庫、MVCC、豐富 SQL 特性、是 Aurora / Cosmos DB / Spanner / CockroachDB / Aurora DSQL 的相容目標">PostgreSQL</a> / <a href="/blog/backend/01-database/vendors/cockroachdb/" data-link-title="CockroachDB" data-link-desc="分散式 SQL、PostgreSQL 相容、跨區強一致、Spanner 的開源 / 跨雲替代">CockroachDB</a></li>
<li>對位 migration：<a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">PostgreSQL → Aurora</a>（另一條 PostgreSQL 出路）</li>
<li>平行 deep article：<a href="/blog/backend/01-database/vendors/postgresql/patroni-ha/" data-link-title="PostgreSQL Patroni HA：從 leader 失聯到 client 重連的 5 段 failover lifecycle" data-link-desc="Patroni 把 PostgreSQL HA 拆成 detection / election / promotion / reconfiguration / recovery 五段 lifecycle、每段都有獨立配置跟 failure mode；DCS quorum &#43; watchdog 防 split-brain、async/sync replication 取捨、5 個 production 踩雷、跟 PgBouncer / HAProxy / cert-manager 整合">Patroni HA</a> / <a href="/blog/backend/01-database/vendors/postgresql/logical-replication-debezium/" data-link-title="PostgreSQL Logical Replication &#43; Debezium CDC：replication slot × failure × recovery 對照" data-link-desc="PostgreSQL logical replication slot 跟 Debezium CDC 的失效模式對照表：slot lag 撐爆 primary disk / schema change 斷流 / 初始 COPY 鎖表 / zombie slot 不釋放 / replay storm 後 offset reset；publication / subscription / pgoutput 配置、跟 Kafka outbox pattern 整合">Logical Replication + Debezium</a></li>
<li>Methodology：<a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a> / <a href="/blog/report/content-structure-by-max-diff-dimension/" data-link-title="Process content 結構由最大差異維度決定、不是 universal phased" data-link-desc="跨 X process content（migration / upgrade / rollout / playbook）的結構由 source / target 之間 *差異維度組合* 決定、不存在 universal phased 模板；6 種 migration / process type 實證（schema 差 / drop-in / operational / multi-tool / paradigm / topology re-layout）跑出 6 種不同結構；寫作前必須做 *6 維 diff dimension audit* 才能決定結構、跳過會套錯模板">#127 Process content 結構由最大差異維度決定</a>（本文驗證 <em>多重歸類 multi-axis 處理</em>）</li>
</ul>
]]></content:encoded></item></channel></rss>