<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Etcd on Tarragon</title><link>https://tarrragon.github.io/blog/tags/etcd/</link><description>Recent content in Etcd on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/etcd/index.xml" rel="self" type="application/rss+xml"/><item><title>etcd → Consul: KV + N extras feature matrix</title><link>https://tarrragon.github.io/blog/backend/05-deployment-platform/vendors/consul/migrate-from-etcd/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/05-deployment-platform/vendors/consul/migrate-from-etcd/</guid><description>&lt;blockquote>
&lt;p>This article is a cross-vendor migration playbook that cross-links &lt;a href="https://etcd.io/">etcd&lt;/a> and &lt;a href="https://tarrragon.github.io/blog/backend/05-deployment-platform/vendors/consul/" data-link-title="Consul" data-link-desc="Service registry / mesh / KV / DNS">Consul&lt;/a>. Running the &lt;a href="https://tarrragon.github.io/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">migration-playbook-methodology 6-dimension audit&lt;/a> maps it to &lt;em>Paradigm = High (pure KV → service mesh paradigm) → Type E paradigm shift&lt;/em>. It is the dual of &lt;a href="https://tarrragon.github.io/blog/backend/02-cache-redis/vendors/redis/migrate-to-memcached/" data-link-title="Redis → Memcached：Memcached 不是 simpler Redis、是 cache paradigm" data-link-desc="Redis → Memcached 是 Type E paradigm reduction migration — 從 multi-paradigm（KV &amp;#43; 資料結構 &amp;#43; pub/sub &amp;#43; Lua &amp;#43; streams）退到 pure cache；不是「remove Redis features」、是「重新分配 Redis-specific feature 到對應 specialized 服務」；5 個 production 踩雷 &amp;#43; paradigm reduction 路線">Redis → Memcached&lt;/a> (paradigm reduction): this article covers the &lt;em>paradigm expansion&lt;/em> (upgrade) direction.&lt;/p>&lt;/blockquote>
&lt;h2 id="kv--n-個-extrasfeature-matrix">KV + N extras: feature matrix&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Concept&lt;/th>
 &lt;th>etcd&lt;/th>
 &lt;th>Consul&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Core paradigm&lt;/td>
 &lt;td>Pure KV with Raft consensus&lt;/td>
 &lt;td>Service mesh (KV + 6 others)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Data store&lt;/td>
 &lt;td>KV with versioned values + watch&lt;/td>
 &lt;td>KV + service catalog + health checks + sessions&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>API style&lt;/td>
 &lt;td>gRPC + HTTP/REST&lt;/td>
 &lt;td>HTTP/REST + gRPC (Connect) + DNS&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Service discovery&lt;/td>
 &lt;td>None (managed by the application)&lt;/td>
 &lt;td>Built-in (DNS / HTTP API)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Health check&lt;/td>
 &lt;td>None&lt;/td>
 &lt;td>Built-in (HTTP / TCP / script / TTL)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Service mesh&lt;/td>
 &lt;td>None&lt;/td>
 &lt;td>Connect (mTLS + intentions + service-to-service)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Multi-DC&lt;/td>
 &lt;td>Not supported (per-cluster only)&lt;/td>
 &lt;td>Built-in WAN federation&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>ACL system&lt;/td>
 &lt;td>RBAC (users + roles, v3 auth)&lt;/td>
 &lt;td>Token-based ACL + namespaces (Enterprise)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Lock primitive&lt;/td>
 &lt;td>Lease + transaction&lt;/td>
 &lt;td>Session + KV check-and-set&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Watch event model&lt;/td>
 &lt;td>Event stream (gRPC stream)&lt;/td>
 &lt;td>Long-polling blocking query (X-Consul-Index)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Distributed config&lt;/td>
 &lt;td>KV + watch&lt;/td>
 &lt;td>KV + watch + template rendering (consul-template)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Use case mapping&lt;/td>
 &lt;td>K8s control plane / pure distributed KV&lt;/td>
 &lt;td>Service mesh + service discovery + config + KV&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>The core difference is not that Consul has more features; it is that Consul follows a service mesh paradigm&lt;/strong>: service discovery, health checks, and Connect mTLS are first-class, and KV is just one sub-feature.&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is a cross-vendor migration playbook that cross-links <a href="https://etcd.io/">etcd</a> and <a href="/blog/backend/05-deployment-platform/vendors/consul/" data-link-title="Consul" data-link-desc="Service registry / mesh / KV / DNS">Consul</a>. Running the <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">migration-playbook-methodology 6-dimension audit</a> maps it to <em>Paradigm = High (pure KV → service mesh paradigm) → Type E paradigm shift</em>. It is the dual of <a href="/blog/backend/02-cache-redis/vendors/redis/migrate-to-memcached/" data-link-title="Redis → Memcached：Memcached 不是 simpler Redis、是 cache paradigm" data-link-desc="Redis → Memcached 是 Type E paradigm reduction migration — 從 multi-paradigm（KV &#43; 資料結構 &#43; pub/sub &#43; Lua &#43; streams）退到 pure cache；不是「remove Redis features」、是「重新分配 Redis-specific feature 到對應 specialized 服務」；5 個 production 踩雷 &#43; paradigm reduction 路線">Redis → Memcached</a> (paradigm reduction): this article covers the <em>paradigm expansion</em> (upgrade) direction.</p></blockquote>
<h2 id="kv--n-個-extrasfeature-matrix">KV + N extras: feature matrix</h2>
<table>
  <thead>
      <tr>
          <th>Concept</th>
          <th>etcd</th>
          <th>Consul</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Core paradigm</td>
          <td>Pure KV with Raft consensus</td>
          <td>Service mesh (KV + 6 others)</td>
      </tr>
      <tr>
          <td>Data store</td>
          <td>KV with versioned values + watch</td>
          <td>KV + service catalog + health checks + sessions</td>
      </tr>
      <tr>
          <td>API style</td>
          <td>gRPC + HTTP/REST</td>
          <td>HTTP/REST + gRPC (Connect) + DNS</td>
      </tr>
      <tr>
          <td>Service discovery</td>
          <td>None (managed by the application)</td>
          <td>Built-in (DNS / HTTP API)</td>
      </tr>
      <tr>
          <td>Health check</td>
          <td>None</td>
          <td>Built-in (HTTP / TCP / script / TTL)</td>
      </tr>
      <tr>
          <td>Service mesh</td>
          <td>None</td>
          <td>Connect (mTLS + intentions + service-to-service)</td>
      </tr>
      <tr>
          <td>Multi-DC</td>
          <td>Not supported (per-cluster only)</td>
          <td>Built-in WAN federation</td>
      </tr>
      <tr>
          <td>ACL system</td>
          <td>RBAC (users + roles, v3 auth)</td>
          <td>Token-based ACL + namespaces (Enterprise)</td>
      </tr>
      <tr>
          <td>Lock primitive</td>
          <td>Lease + transaction</td>
          <td>Session + KV check-and-set</td>
      </tr>
      <tr>
          <td>Watch event model</td>
          <td>Event stream (gRPC stream)</td>
          <td>Long-polling blocking query (X-Consul-Index)</td>
      </tr>
      <tr>
          <td>Distributed config</td>
          <td>KV + watch</td>
          <td>KV + watch + template rendering (consul-template)</td>
      </tr>
      <tr>
          <td>Use case mapping</td>
          <td>K8s control plane / pure distributed KV</td>
          <td>Service mesh + service discovery + config + KV</td>
      </tr>
  </tbody>
</table>
<p><strong>The core difference is not that Consul has more features; it is that Consul follows a service mesh paradigm</strong>: service discovery, health checks, and Connect mTLS are first-class, and KV is just one sub-feature.</p>
<p>Running the <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">6-dimension diff audit</a>:</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Assessment</th>
          <th>Level</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Schema / API</td>
          <td>KV API maps across + N extra APIs</td>
          <td>Medium</td>
      </tr>
      <tr>
          <td>Operational model</td>
          <td>Both Raft-based; similar ops</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Paradigm</td>
          <td>Pure KV → service mesh</td>
          <td><strong>High</strong></td>
      </tr>
      <tr>
          <td>Components</td>
          <td>A single cluster in both</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Application change</td>
          <td>KV API changes + new service registration / health checks</td>
          <td>Medium</td>
      </tr>
      <tr>
          <td>Data topology</td>
          <td>Single DC → multi-DC (if federation is used)</td>
          <td>Low-Medium</td>
      </tr>
  </tbody>
</table>
<p>Paradigm = High (the others Low-Medium) → <strong>Type E paradigm shift</strong>; KV is a sub-feature, not the whole migration scope.</p>
<h2 id="為什麼遷3-條-expansion-driver">Why migrate: 3 expansion drivers</h2>
<ul>
<li><strong>Service mesh adoption</strong>: etcd originally served only the K8s control plane; the application side now needs a service mesh (mTLS / intentions / traffic shifting), which Consul covers in one product</li>
<li><strong>Multi-DC strategy</strong>: etcd has no cross-DC support and requires active-passive failover; Consul WAN federation supports active-active multi-DC (KV data stays per-DC, see Case 3)</li>
<li><strong>Configuration management</strong>: consul-template + envconsul are simpler than etcd watch plus a hand-written reloader</li>
</ul>
<p>Reverse drivers (Consul → etcd):</p>
<ul>
<li>Pure K8s control plane scenario with no need for service discovery / health checks / mesh: etcd is simple and sufficient</li>
<li>Resource constraints: a Consul agent consumes more resources than etcd and may not fit on low-end VMs</li>
</ul>
<h2 id="paradigm-expansion-路線">Paradigm expansion path</h2>
<p>This is the dual of <a href="/blog/backend/02-cache-redis/vendors/redis/migrate-to-memcached/" data-link-title="Redis → Memcached：Memcached 不是 simpler Redis、是 cache paradigm" data-link-desc="Redis → Memcached 是 Type E paradigm reduction migration — 從 multi-paradigm（KV &#43; 資料結構 &#43; pub/sub &#43; Lua &#43; streams）退到 pure cache；不是「remove Redis features」、是「重新分配 Redis-specific feature 到對應 specialized 服務」；5 個 production 踩雷 &#43; paradigm reduction 路線">Redis → Memcached paradigm reduction</a> (removing features); Consul <em>adds features</em>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">etcd KV pattern         → Consul KV API (1:1 mapping)
</span></span><span class="line"><span class="cl">etcd watch              → Consul blocking query / consul-template
</span></span><span class="line"><span class="cl">etcd lease + lock       → Consul session + KV CAS
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">(added on top)
</span></span><span class="line"><span class="cl">none                    → Consul service registration (services.json / API)
</span></span><span class="line"><span class="cl">none                    → Consul health check (HTTP / TCP / TTL)
</span></span><span class="line"><span class="cl">none                    → Consul service discovery (DNS / HTTP)
</span></span><span class="line"><span class="cl">none                    → Consul Connect (mTLS + intentions)
</span></span><span class="line"><span class="cl">none                    → Consul WAN federation (multi-DC)
</span></span><span class="line"><span class="cl">none                    → Consul ACL token + policy</span></span></code></pre></div><p>The migration is more than a KV API mapping: it <em>adds capabilities to the application</em>.</p>
<h2 id="api-對位">API mapping</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># etcd basic KV</span>
</span></span><span class="line"><span class="cl">etcdctl put /myapp/config/db_url <span class="s1">&#39;postgres://...&#39;</span>
</span></span><span class="line"><span class="cl">etcdctl get /myapp/config/db_url
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Consul KV (equivalent; keys carry no leading slash)</span>
</span></span><span class="line"><span class="cl">consul kv put myapp/config/db_url <span class="s1">&#39;postgres://...&#39;</span>
</span></span><span class="line"><span class="cl">consul kv get myapp/config/db_url</span></span></code></pre></div>
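<p>The two command pairs above differ only in the key's leading slash. A minimal Python sketch of that key translation, assuming the convention shown here (etcd keys start with <code>/</code>, Consul keys do not). The helper name <code>etcd_key_to_consul</code> is hypothetical and not part of either client library:</p>

```python
def etcd_key_to_consul(key: str) -> str:
    """Map an etcd-style key to a Consul KV key by dropping leading slashes.

    Assumption: the only difference between the two key spaces is the
    leading slash, as in the CLI examples of this article.
    """
    stripped = key.lstrip("/")
    if not stripped:
        raise ValueError("key is empty after removing leading slashes")
    return stripped


print(etcd_key_to_consul("/myapp/config/db_url"))  # myapp/config/db_url
```

<p>Running every key through one helper during a bulk copy keeps the naming rule in a single place, which makes it easier to review than ad-hoc string slicing in each application.</p>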




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># etcd watch</span>
</span></span><span class="line"><span class="cl">etcdctl watch --prefix /myapp/config/
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Consul blocking query (long polling)</span>
</span></span><span class="line"><span class="cl">curl <span class="s1">&#39;http://consul:8500/v1/kv/myapp/config?recurse&amp;index=5&amp;wait=10s&#39;</span>
</span></span><span class="line"><span class="cl"><span class="c1"># The X-Consul-Index response header is the watch cursor; pass it back as index= on the next call</span></span></span></code></pre></div>
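<p>A blocking query only behaves like a watch when the client loops on it and feeds the returned index back in. The sketch below shows one way to structure that loop. It is an illustration under stated assumptions, not the API of any Consul client: <code>fetch</code> is a hypothetical callable standing in for the HTTP GET above (it returns the <code>X-Consul-Index</code> value and the decoded body), which lets the loop run without a Consul agent. Resetting the index when it moves backwards is a defensive step commonly recommended for blocking-query clients.</p>

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for GET /v1/kv/<prefix>?recurse&index=<index>&wait=<wait>.
# It returns (X-Consul-Index header value, decoded JSON body).
Fetch = Callable[[int, str], Tuple[int, Optional[list]]]


def watch_prefix(fetch: Fetch,
                 on_change: Callable[[Optional[list]], None],
                 rounds: int,
                 wait: str = "10s") -> int:
    """Run `rounds` blocking queries; call on_change whenever the index moves."""
    index = 0  # index=0 makes the first request return immediately
    for _ in range(rounds):
        new_index, data = fetch(index, wait)
        if new_index < index:
            # The index went backwards (for example after a snapshot
            # restore): start over from 0 instead of waiting on a stale cursor.
            index = 0
            continue
        if new_index != index:
            on_change(data)
        # An unchanged index means the wait timed out; either way the next
        # request is issued right away, with no sleep in between.
        index = new_index
    return index
```

<p>In a real client <code>rounds</code> would be an endless loop with error backoff. The point of the sketch is that delivery latency depends on re-issuing the request immediately with the latest index, not on the <code>wait</code> value alone.</p>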




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># etcd transaction (multi-key atomic); stdin sections are compares,</span>
</span></span><span class="line"><span class="cl"><span class="c1"># success requests, failure requests, each terminated by a blank line</span>
</span></span><span class="line"><span class="cl">etcdctl txn <span class="s">&lt;&lt;EOF
</span></span></span><span class="line"><span class="cl"><span class="s">mod(&#34;/myapp/lock&#34;) = &#34;0&#34;
</span></span></span><span class="line"><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="cl"><span class="s">put /myapp/lock &#34;owner1&#34;
</span></span></span><span class="line"><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="cl"><span class="s">EOF</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Consul session + KV CAS (equivalent)</span>
</span></span><span class="line"><span class="cl"><span class="nv">SESSION_ID</span><span class="o">=</span><span class="k">$(</span>curl -X PUT <span class="s1">&#39;http://consul:8500/v1/session/create&#39;</span> <span class="p">|</span> jq -r .ID<span class="k">)</span>
</span></span><span class="line"><span class="cl">curl -X PUT <span class="s1">&#39;http://consul:8500/v1/kv/myapp/lock?acquire=&#39;</span><span class="nv">$SESSION_ID</span> -d <span class="s1">&#39;owner1&#39;</span>
</span></span><span class="line"><span class="cl"><span class="c1"># A response of false means another session already holds the lock</span></span></span></code></pre></div><h2 id="application-重設計">Application redesign</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Before: etcd</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">etcd3</span>
</span></span><span class="line"><span class="cl"><span class="n">etcd</span> <span class="o">=</span> <span class="n">etcd3</span><span class="o">.</span><span class="n">client</span><span class="p">(</span><span class="n">host</span><span class="o">=</span><span class="s1">&#39;etcd&#39;</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="mi">2379</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">etcd</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="s1">&#39;/myapp/config/db_url&#39;</span><span class="p">,</span> <span class="s1">&#39;postgres://...&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">db_url</span> <span class="o">=</span> <span class="n">etcd</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;/myapp/config/db_url&#39;</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>  <span class="c1"># get() returns (value, metadata); value is bytes</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># After: Consul (KV-only)</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">consul</span>
</span></span><span class="line"><span class="cl"><span class="n">c</span> <span class="o">=</span> <span class="n">consul</span><span class="o">.</span><span class="n">Consul</span><span class="p">(</span><span class="n">host</span><span class="o">=</span><span class="s1">&#39;consul&#39;</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="mi">8500</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">c</span><span class="o">.</span><span class="n">kv</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="s1">&#39;myapp/config/db_url&#39;</span><span class="p">,</span> <span class="s1">&#39;postgres://...&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">_</span><span class="p">,</span> <span class="n">kv</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="n">kv</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;myapp/config/db_url&#39;</span><span class="p">)</span>  <span class="c1"># returns (index, entry)</span>
</span></span><span class="line"><span class="cl"><span class="n">db_url</span> <span class="o">=</span> <span class="n">kv</span><span class="p">[</span><span class="s1">&#39;Value&#39;</span><span class="p">]</span>  <span class="c1"># Value is bytes</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># (added on top) After: Consul service discovery</span>
</span></span><span class="line"><span class="cl"><span class="n">c</span><span class="o">.</span><span class="n">agent</span><span class="o">.</span><span class="n">service</span><span class="o">.</span><span class="n">register</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">name</span><span class="o">=</span><span class="s1">&#39;myapp&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">service_id</span><span class="o">=</span><span class="s1">&#39;myapp-1&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">address</span><span class="o">=</span><span class="s1">&#39;10.0.0.10&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">port</span><span class="o">=</span><span class="mi">8080</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># positional args after the URL: interval, timeout, deregister-after</span>
</span></span><span class="line"><span class="cl">    <span class="n">check</span><span class="o">=</span><span class="n">consul</span><span class="o">.</span><span class="n">Check</span><span class="o">.</span><span class="n">http</span><span class="p">(</span><span class="s1">&#39;http://10.0.0.10:8080/health&#39;</span><span class="p">,</span> <span class="s1">&#39;10s&#39;</span><span class="p">,</span> <span class="s1">&#39;5s&#39;</span><span class="p">,</span> <span class="s1">&#39;30s&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># DNS-based discovery (how other services find myapp)</span>
</span></span><span class="line"><span class="cl"><span class="c1"># dig +short myapp.service.consul SRV</span></span></span></code></pre></div><h2 id="migration-流程">Migration flow</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">1. Pre-migration audit
</span></span><span class="line"><span class="cl">   - List every application that uses etcd
</span></span><span class="line"><span class="cl">   - Assess whether each application *needs* Consul extras (service discovery / health / mesh)
</span></span><span class="line"><span class="cl">   - Mark pure-KV use cases as *low-effort migration* and those that use extras as *value-add migration*
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">2. Consul cluster build
</span></span><span class="line"><span class="cl">   - Cross-DC design (WAN federation planning)
</span></span><span class="line"><span class="cl">   - ACL system configuration (do not leave it default-open)
</span></span><span class="line"><span class="cl">   - Performance sizing (Consul agents are heavier than etcd)
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">3. Application migration (per-app)
</span></span><span class="line"><span class="cl">   - Pure KV: swap the SDK, map the API, cut over
</span></span><span class="line"><span class="cl">   - Service discovery: add registration + health check + DNS lookup
</span></span><span class="line"><span class="cl">   - Service mesh: add Connect proxy + intentions
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">4. Dual-run period
</span></span><span class="line"><span class="cl">   - etcd keeps running while applications move to Consul incrementally
</span></span><span class="line"><span class="cl">   - Verify after each application cutover
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">5. etcd decommission
</span></span><span class="line"><span class="cl">   - Confirm every application has been cut over
</span></span><span class="line"><span class="cl">   - If the K8s control plane is the only remaining etcd user, keep that etcd in place</span></span></code></pre></div><p>The whole effort takes 2-4 months, depending on the number of applications and how many extras are adopted.</p>
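<p>Step 1 of the flow above can be turned into a small inventory script. The following is a sketch only: the <code>App</code> record, the capability names in <code>EXTRAS</code>, and the sample applications are all hypothetical, chosen to mirror the "low-effort" versus "value-add" labels used in the audit step.</p>

```python
from dataclasses import dataclass, field

# Hypothetical capability names for the Consul extras listed in step 1.
EXTRAS = {"service_discovery", "health_check", "mesh"}


@dataclass
class App:
    name: str
    needs: set = field(default_factory=set)


def classify(app: App) -> str:
    """Label an app 'value-add' if it needs any Consul extra, else 'low-effort'."""
    unknown = app.needs - EXTRAS - {"kv"}
    if unknown:
        raise ValueError(f"{app.name}: unknown capabilities {sorted(unknown)}")
    return "value-add" if app.needs & EXTRAS else "low-effort"


inventory = [
    App("config-reader", {"kv"}),
    App("checkout-api", {"kv", "service_discovery", "health_check"}),
]
for app in inventory:
    print(app.name, classify(app))
```

<p>Keeping the classification in code makes the audit repeatable as the inventory grows, and rejecting unknown capability names catches typos early.</p>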
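<h2 id="production-故障演練">Production failure drills</h2>
<h3 id="case-1kv-api-對位看似-11watch-event-model-不同">Case 1: KV API mapping looks 1:1, but the watch event model differs</h3>
<p><strong>Symptom</strong>: after the application switches from etcd watch to Consul blocking queries, event-handling latency rises from 50ms to 1-5s; the application assumes events are pushed in real time, but it is now polling.</p>
<p><strong>Root cause</strong>: etcd watch is a gRPC stream that pushes events immediately; a Consul blocking query is long-polling with a <code>wait</code> timeout, so an event is delivered promptly only if it arrives while a request is outstanding. Any gap between one response and the next request adds delay.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Lower the <code>wait</code> timeout</strong> to match business needs (default 5min; it can be set to 10s), and re-issue the query immediately with the returned index</li>
<li><strong>Concurrent polling across instances</strong>: each of the N application instances polls on its own, which reduces single-point event delay</li>
<li><strong>Architecture</strong>: send critical events through the Consul event API (<code>PUT /v1/event/fire/&lt;name&gt;</code>) plus a blocking query on the event endpoint, separate from KV changes</li>
<li><strong>Keep etcd for critical watches</strong>: mission-critical watches stay on etcd</li>
</ol>
<h3 id="case-2session-based-lock-跟-etcd-lease-差">Case 2: Session-based locks differ from etcd leases</h3>
<p><strong>Symptom</strong>: with an etcd lease of 5s TTL, the lock was released automatically within 5s when the holder became unreachable; after switching to Consul sessions the session TTL still applies, but the health-check integration is complex and locks are occasionally not released.</p>
<p><strong>Root cause</strong>: a Consul session has two behaviors: <code>release</code> (the default: locks are released when the session is invalidated and the key is kept) vs <code>delete</code> (keys held by the session are deleted when it is invalidated); combining the TTL with health checks makes the behavior complex.</p>
<p><strong>Fix</strong>:</p>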





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># Make the session behavior explicit</span>
</span></span><span class="line"><span class="cl"><span class="n">session_id</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="n">session</span><span class="o">.</span><span class="n">create</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">name</span><span class="o">=</span><span class="s1">&#39;myapp-lock&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">ttl</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span>           <span class="c1"># 15s TTL</span>
</span></span><span class="line"><span class="cl">    <span class="n">behavior</span><span class="o">=</span><span class="s1">&#39;delete&#39;</span> <span class="c1"># delete the lock key when the session is invalidated</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">c</span><span class="o">.</span><span class="n">kv</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="s1">&#39;myapp/lock&#39;</span><span class="p">,</span> <span class="s1">&#39;owner1&#39;</span><span class="p">,</span> <span class="n">acquire</span><span class="o">=</span><span class="n">session_id</span><span class="p">)</span></span></span></code></pre></div><p>The session TTL range is 10s-86400s and cannot go below 10s (etcd allows 1s), so Consul is a poor fit for critical low-latency locks.</p>
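<p>To reason about how long a dead holder can keep a lock, it helps to write the arithmetic down. The sketch below validates the TTL against the 10s-86400s range stated above and estimates a worst-case takeover time. Two inputs are assumptions to verify against the Consul version in use: that a session may be invalidated as late as twice its TTL (<code>ttl_multiplier=2</code>), and that the default lock-delay of 15s applies after invalidation. The function names are hypothetical.</p>

```python
MIN_TTL_S = 10      # lower bound of the session TTL range
MAX_TTL_S = 86400   # upper bound of the session TTL range


def validate_session_ttl(ttl_s: int) -> int:
    """Return ttl_s if it lies inside the allowed session TTL range."""
    if not MIN_TTL_S <= ttl_s <= MAX_TTL_S:
        raise ValueError(f"session TTL must be {MIN_TTL_S}-{MAX_TTL_S}s, got {ttl_s}")
    return ttl_s


def worst_case_takeover_s(ttl_s: int, lock_delay_s: int = 15,
                          ttl_multiplier: int = 2) -> int:
    """Estimate seconds until another client can acquire the lock.

    Assumptions (not confirmed by this article): invalidation may lag up to
    ttl_multiplier x TTL, and lock-delay blocks re-acquisition afterwards.
    """
    return validate_session_ttl(ttl_s) * ttl_multiplier + lock_delay_s


print(worst_case_takeover_s(15))  # 45
```

<p>Under these assumptions a 15s TTL can translate into roughly 45s before another client takes over, which is one plausible reading of the "lock not released" symptom when the application expects etcd-like 5s behavior.</p>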
<h3 id="case-3multi-dc-failoverkv-寫到-wrong-dc">Case 3: Multi-DC failover, KV written to the wrong DC</h3>
<p><strong>Symptom</strong>: after a cross-DC deployment, an application writes a KV entry but cannot read it back; the application turns out to hardcode one DC endpoint, so writes go to us-east while reads come from us-west.</p>
<p><strong>Root cause</strong>: Consul WAN federation does not automatically sync KV across DCs; KV is <em>per-DC</em>, and cross-DC sync requires a <em>Consul Enterprise license</em> or a self-managed <em>consul-replicate</em>.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Connect each application instance to the Consul in its local DC</strong>: write and read in the same DC</li>
<li><strong>Replicate KV across DCs</strong>: run consul-replicate yourself, or upgrade to Enterprise</li>
<li><strong>Architecture</strong>: move config shared across DCs to a <em>DB-backed config</em> (durable + cross-DC) and keep only DC-local config in Consul KV</li>
</ol>
<h3 id="case-4acl-system-預設-opencutover-後曝險">Case 4: ACL system open by default, exposure after cutover</h3>
<p><strong>Symptom</strong>: one month after the Consul cluster went live, the SOC ran an audit and found that any application could read any KV entry; ACLs had never been configured, so every request ran with full permissions.</p>
<p><strong>Root cause</strong>: Consul ACLs are disabled by default and need to be enabled and <em>bootstrapped</em>; many setup tutorials skip ACLs for simplicity, and they are never added after cutover.</p>
<p><strong>Fix</strong>:</p>
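<p>The split-DC symptom in Case 3 is easy to detect from configuration before it reaches production. A minimal sketch, assuming each application's config can be reduced to a write DC and a read DC; the dictionary layout and the function name <code>cross_dc_kv_users</code> are hypothetical:</p>

```python
def cross_dc_kv_users(apps: dict) -> list:
    """Return names of apps whose KV writes and reads target different DCs."""
    return sorted(
        name for name, cfg in apps.items()
        if cfg["write_dc"] != cfg["read_dc"]
    )


# Hypothetical inventory mirroring the us-east / us-west symptom above.
apps = {
    "billing": {"write_dc": "us-east", "read_dc": "us-west"},
    "search": {"write_dc": "us-west", "read_dc": "us-west"},
}
print(cross_dc_kv_users(apps))  # ['billing']
```

<p>A check like this can run in CI against rendered application config, so a hardcoded remote endpoint is flagged before cutover rather than discovered through missing reads.</p>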





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Bootstrap ACL system</span>
</span></span><span class="line"><span class="cl">consul acl bootstrap
</span></span><span class="line"><span class="cl"><span class="c1"># Generates the management token; keep it as the root credential</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Create a policy</span>
</span></span><span class="line"><span class="cl">consul acl policy create -name <span class="s1">&#39;myapp-readonly&#39;</span> <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>  -rules <span class="s1">&#39;key_prefix &#34;myapp/&#34; { policy = &#34;read&#34; }&#39;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Create a token for the application</span>
</span></span><span class="line"><span class="cl">consul acl token create -policy-name <span class="s1">&#39;myapp-readonly&#39;</span></span></span></code></pre></div><p>Bootstrap ACLs as the first step of a production setup; do not defer it.</p>
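<p>When many applications need their own policy, generating the rule text avoids copy-paste mistakes. The sketch below builds the same rule string that is passed to <code>-rules</code> above. The helper name <code>kv_policy_rules</code> is hypothetical, and the set of accepted access levels is deliberately limited to the three used here.</p>

```python
ALLOWED_ACCESS = {"read", "write", "deny"}


def kv_policy_rules(prefix: str, access: str = "read") -> str:
    """Build a key_prefix rule in the form used by `consul acl policy create -rules`."""
    if access not in ALLOWED_ACCESS:
        raise ValueError(f"unsupported access level: {access!r}")
    if not prefix or '"' in prefix:
        raise ValueError("prefix must be non-empty and contain no double quotes")
    return f'key_prefix "{prefix}" {{ policy = "{access}" }}'


print(kv_policy_rules("myapp/"))  # key_prefix "myapp/" { policy = "read" }
```

<p>Scoping each token to its own prefix keeps the blast radius small: a leaked <code>myapp</code> token cannot read another application's keys.</p>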
<h3 id="case-5health-check-failure-連鎖service-discovery-失效">Case 5: Health check failures cascade, service discovery breaks</h3>
<p><strong>Symptom</strong>: an application instance stops responding to health checks for 5 seconds because of a GC pause and is marked failed by Consul; DNS queries stop returning the instance and traffic shifts away; after the GC ends the instance is healthy again, but Consul still shows it as failed and recovery takes minutes.</p>
<p><strong>Root cause</strong>: after a failed health check the instance enters the critical state and needs <em>N consecutive successes</em> to return to passing; by default a single success is enough, but the actual time depends on the check interval.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Keep <code>success_before_passing</code> low</strong> (1) for fast recovery</li>
<li><strong>Raise <code>failures_before_critical</code></strong> (3-5) to tolerate transient failures</li>
<li><strong>Multi-check strategy</strong>: combine HTTP + TCP + script checks instead of relying on a single check</li>
<li><strong>Application-side hint</strong>: for JVM applications, set <code>MaxGCPauseMillis</code> so that the GC pause goal stays below the health check interval (it is a goal, not a hard limit)</li>
</ol>
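<p>The effect of the two thresholds is easier to see by replaying a sequence of check results. The following is a simplified model, not Consul's implementation: it treats each threshold as "N consecutive results flip the state", so exact counting may differ by one from a real agent. The function name <code>replay_checks</code> is hypothetical.</p>

```python
def replay_checks(results, success_before_passing=1,
                  failures_before_critical=3, status="passing"):
    """Replay check results (True = success) and return the status after each.

    Simplified model: a state flips once the run of consecutive successes
    or failures reaches its threshold.
    """
    successes = failures = 0
    timeline = []
    for ok in results:
        if ok:
            successes += 1
            failures = 0
            if successes >= success_before_passing:
                status = "passing"
        else:
            failures += 1
            successes = 0
            if failures >= failures_before_critical:
                status = "critical"
        timeline.append(status)
    return timeline


gc_pause = [True, False, False, True]  # two checks missed during a GC pause
print(replay_checks(gc_pause, failures_before_critical=3))
print(replay_checks(gc_pause, failures_before_critical=1))
```

<p>With a threshold of 3 the two missed checks never take the instance out of rotation, while a threshold of 1 removes it on the first miss. The trade-off is that a genuinely dead instance stays in DNS for up to three intervals.</p>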
<h2 id="capacity--cost">Capacity / cost</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>etcd</th>
          <th>Consul</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Cluster baseline</td>
          <td>3-5 node Raft cluster</td>
          <td>3-5 server + N agent (per host)</td>
      </tr>
      <tr>
          <td>Memory per node</td>
          <td>2-8GB</td>
          <td>4-16GB (including agent)</td>
      </tr>
      <tr>
          <td>Operational FTE</td>
          <td>0.2-0.5</td>
          <td>0.5-1.0 (more features, more ops)</td>
      </tr>
      <tr>
          <td>Feature surface</td>
          <td>Pure KV</td>
          <td>KV + service mesh + multi-DC + ACL</td>
      </tr>
      <tr>
          <td>Setup complexity</td>
          <td>Low</td>
          <td>Medium-High</td>
      </tr>
      <tr>
          <td>Multi-DC support</td>
          <td>Not supported</td>
          <td>Built-in WAN federation</td>
      </tr>
      <tr>
          <td>License</td>
          <td>Apache 2.0 (open)</td>
          <td>BUSL 1.1 (community releases since late 2023; earlier releases MPL 2.0) / commercial (Enterprise)</td>
      </tr>
      <tr>
          <td>Migration cost</td>
          <td>-</td>
          <td>1-3 FTE × 2-4 months</td>
      </tr>
  </tbody>
</table>
<p><strong>Reading</strong>: pure KV use cases stay on etcd; heavy service mesh / multi-DC / discovery needs go to Consul; a mixed deployment is the long-term default (the K8s control plane keeps running etcd while the service mesh runs on Consul).</p>
<h2 id="整合--下一步">Integration / next steps</h2>
<h3 id="跟-kubernetes-對位">Relationship to Kubernetes</h3>
<p>The K8s control plane <em>always</em> uses etcd and is not moved to Consul; Consul provides the service mesh and cross-cluster discovery <em>outside</em> K8s. The two coexist and are not mutually exclusive.</p>
<h3 id="跟-vault-整合">Integrating with <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">Vault</a></h3>
<p>Consul and Vault belong to the same HashiCorp ecosystem: Consul handles service discovery / mesh and Vault handles secrets; Consul ACL tokens can be issued by a Vault dynamic secrets engine.</p>
<h3 id="跟-istio--linkerd-對位">Relationship to <a href="https://istio.io/">Istio / Linkerd</a></h3>
<p>Consul Connect follows the service mesh paradigm and sits alongside Istio / Linkerd; most K8s-native organizations use Istio / Linkerd, while Consul's strength is a mesh that spans <em>K8s + VM + multi-DC</em>.</p>
<h3 id="反向-migrationconsul--etcd">Reverse migration (Consul → etcd)</h3>
<p>A few organizations do this when simplifying their stack. The flow is a mirror image, but <em>dropping the service mesh / multi-DC is a deliberate downgrade</em> and cannot be presented as functionally equivalent.</p>
<h3 id="下一步議題">Next topics</h3>
<ul>
<li><strong>Consul Connect production rollout</strong>: mesh adoption is incremental, with per-service intentions introduced gradually</li>
<li><strong>Multi-DC topology design</strong>: active-active vs active-passive, a trade-off between RPO/RTO and cost</li>
<li><strong>Integration with the Kubernetes Gateway API</strong>: integration strategy for the service mesh paradigm inside vs outside K8s</li>
</ul>
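<p>The "Migration cost" row in the table above is a product of two ranges, so its bounds can be worked out directly. A small sketch of that arithmetic; the helper name <code>fte_months</code> is hypothetical:</p>

```python
def fte_months(fte_range, month_range):
    """Multiply the low and high ends of two ranges to get FTE-month bounds."""
    low = fte_range[0] * month_range[0]
    high = fte_range[1] * month_range[1]
    return (low, high)


# 1-3 FTE over 2-4 months, as in the Migration cost row.
print(fte_months((1, 3), (2, 4)))  # (2, 12)
```

<p>The spread between 2 and 12 FTE-months is wide, which is why the pre-migration audit (how many applications, how many extras) matters for budgeting.</p>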
<h2 id="相關連結">Related links</h2>
<ul>
<li>Target vendor: <a href="/blog/backend/05-deployment-platform/vendors/consul/" data-link-title="Consul" data-link-desc="Service registry / mesh / KV / DNS">Consul</a></li>
<li>Parallel migration playbooks (Type E): <a href="/blog/backend/02-cache-redis/vendors/redis/migrate-to-memcached/" data-link-title="Redis → Memcached：Memcached 不是 simpler Redis、是 cache paradigm" data-link-desc="Redis → Memcached 是 Type E paradigm reduction migration — 從 multi-paradigm（KV &#43; 資料結構 &#43; pub/sub &#43; Lua &#43; streams）退到 pure cache；不是「remove Redis features」、是「重新分配 Redis-specific feature 到對應 specialized 服務」；5 個 production 踩雷 &#43; paradigm reduction 路線">Redis → Memcached</a> (the paradigm-reduction dual) / <a href="/blog/backend/03-message-queue/vendors/kafka/migrate-from-to-nats/" data-link-title="Kafka ↔ NATS：不是 migration、是 messaging paradigm 重設計" data-link-desc="Kafka 跟 NATS 不是同類產品（log-based event streaming vs subject-based messaging）、&#39;migration&#39; 字面上不成立；本文釐清兩家 paradigm 邊界、什麼情境真的能換、application 模式重設計的 5 個踩雷（consumer offset 觀念差 / retention model / exactly-once 假設 / schema registry 缺位 / fan-out 模式差）、跟 JetStream 對位 &#43; 混合架構">Kafka ↔ NATS</a></li>
<li>Parallel integration: <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault</a></li>
<li>Methodology: <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a></li>
</ul>
]]></content:encoded></item></channel></rss>