<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Docker-Swarm on Tarragon</title><link>https://tarrragon.github.io/blog/tags/docker-swarm/</link><description>Recent content in Docker-Swarm on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/docker-swarm/index.xml" rel="self" type="application/rss+xml"/><item><title>Docker Swarm → Kubernetes: where 5 Swarm production clusters hit the wall</title><link>https://tarrragon.github.io/blog/backend/05-deployment-platform/vendors/kubernetes/migrate-from-docker-swarm/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/05-deployment-platform/vendors/kubernetes/migrate-from-docker-swarm/</guid><description>&lt;blockquote>
&lt;p>This article is a cross-vendor migration playbook that cross-links Docker Swarm and &lt;a href="https://tarrragon.github.io/blog/backend/05-deployment-platform/vendors/kubernetes/" data-link-title="Kubernetes" data-link-desc="Container orchestration 主流、GKE / EKS / AKS / 自管">Kubernetes&lt;/a>. Running the &lt;a href="https://tarrragon.github.io/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">migration-playbook-methodology 6-dimension audit&lt;/a> maps it to &lt;em>Paradigm = High (Swarm's simple container orchestration → K8s declarative resource model) → Type E paradigm shift&lt;/em>.&lt;/p>&lt;/blockquote>
&lt;h2 id="5-個-swarm-production-cluster-撞牆數據">Where 5 Swarm production clusters hit the wall&lt;/h2>
&lt;p>Observations from 2020-2024 of the production Swarm cluster lifecycle at 5 mid-sized organizations, and the typical point where each hit the wall:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Cluster&lt;/th>
 &lt;th>Scale (peak)&lt;/th>
 &lt;th>Wall hit&lt;/th>
 &lt;th>Year migration was triggered&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>A (SaaS startup)&lt;/td>
 &lt;td>80 services / 12 nodes&lt;/td>
 &lt;td>service discovery latency rising; no sidecar mesh&lt;/td>
 &lt;td>2022&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>B (E-commerce)&lt;/td>
 &lt;td>150 services / 25 nodes&lt;/td>
 &lt;td>hand-written rolling update + canary logic grew complex&lt;/td>
 &lt;td>2023&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>C (Fintech)&lt;/td>
 &lt;td>60 services / 15 nodes&lt;/td>
 &lt;td>self-managed secret rotation + RBAC; compliance was hard&lt;/td>
 &lt;td>2023&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>D (Media)&lt;/td>
 &lt;td>200 services / 40 nodes&lt;/td>
 &lt;td>hand-written autoscaling; traffic prediction failed&lt;/td>
 &lt;td>2024&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>E (Logistics)&lt;/td>
 &lt;td>100 services / 20 nodes&lt;/td>
 &lt;td>no multi-region support&lt;/td>
 &lt;td>2024&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
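&lt;p>Patterns shared by all 5:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Swarm is simple, but in these clusters its practical ceiling was roughly 100-200 services / 20-40 nodes&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Cross-service governance (mesh / RBAC / secret / autoscale) needs &lt;em>bolt-on&lt;/em> tools, and the resulting complexity overtakes K8s&lt;/strong>&lt;/li>
&lt;li>&lt;strong>No native multi-region&lt;/strong>, so disaster recovery is limited&lt;/li>
&lt;li>&lt;strong>Shrinking ecosystem, low community activity, slow delivery of new features&lt;/strong>&lt;/li>
&lt;/ul>
&lt;p>The wall is not "Swarm cannot run the workload"; it is "Swarm does not solve &lt;em>cross-service governance&lt;/em> for you, so you build it yourself". Kubernetes is not simpler; it &lt;em>brings the governance problems into the framework&lt;/em>.&lt;/p>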
&lt;h2 id="為什麼遷ceiling--ecosystem--multi-region-三條-driver">Why migrate: three drivers (ceiling / ecosystem / multi-region)&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Driver&lt;/th>
 &lt;th>Trigger&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Ceiling&lt;/td>
 &lt;td>Beyond 100-200 services, Swarm service discovery latency and scheduling fall behind&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Ecosystem&lt;/td>
 &lt;td>The K8s ecosystem (Helm / Operator / mesh / GitOps) is mature; Swarm lacks equivalent tools&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Multi-region&lt;/td>
 &lt;td>Not supported by Swarm; K8s has mature multi-cluster (federation) tooling&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Reverse drivers (K8s → Swarm):&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is a cross-vendor migration playbook that cross-links Docker Swarm and <a href="/blog/backend/05-deployment-platform/vendors/kubernetes/" data-link-title="Kubernetes" data-link-desc="Container orchestration 主流、GKE / EKS / AKS / 自管">Kubernetes</a>. Running the <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">migration-playbook-methodology 6-dimension audit</a> maps it to <em>Paradigm = High (Swarm's simple container orchestration → K8s declarative resource model) → Type E paradigm shift</em>.</p></blockquote>
<h2 id="5-個-swarm-production-cluster-撞牆數據">Where 5 Swarm production clusters hit the wall</h2>
<p>Observations from 2020-2024 of the production Swarm cluster lifecycle at 5 mid-sized organizations, and the typical point where each hit the wall:</p>
<table>
  <thead>
      <tr>
          <th>Cluster</th>
          <th>Scale (peak)</th>
          <th>Wall hit</th>
          <th>Year migration was triggered</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>A (SaaS startup)</td>
          <td>80 services / 12 nodes</td>
          <td>service discovery latency rising; no sidecar mesh</td>
          <td>2022</td>
      </tr>
      <tr>
          <td>B (E-commerce)</td>
          <td>150 services / 25 nodes</td>
          <td>hand-written rolling update + canary logic grew complex</td>
          <td>2023</td>
      </tr>
      <tr>
          <td>C (Fintech)</td>
          <td>60 services / 15 nodes</td>
          <td>self-managed secret rotation + RBAC; compliance was hard</td>
          <td>2023</td>
      </tr>
      <tr>
          <td>D (Media)</td>
          <td>200 services / 40 nodes</td>
          <td>hand-written autoscaling; traffic prediction failed</td>
          <td>2024</td>
      </tr>
      <tr>
          <td>E (Logistics)</td>
          <td>100 services / 20 nodes</td>
          <td>no multi-region support</td>
          <td>2024</td>
      </tr>
  </tbody>
</table>
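<p>Patterns shared by all 5:</p>
<ul>
<li><strong>Swarm is simple, but in these clusters its practical ceiling was roughly 100-200 services / 20-40 nodes</strong></li>
<li><strong>Cross-service governance (mesh / RBAC / secret / autoscale) needs <em>bolt-on</em> tools, and the resulting complexity overtakes K8s</strong></li>
<li><strong>No native multi-region</strong>, so disaster recovery is limited</li>
<li><strong>Shrinking ecosystem, low community activity, slow delivery of new features</strong></li>
</ul>
<p>The wall is not "Swarm cannot run the workload"; it is "Swarm does not solve <em>cross-service governance</em> for you, so you build it yourself". Kubernetes is not simpler; it <em>brings the governance problems into the framework</em>.</p>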
<h2 id="為什麼遷ceiling--ecosystem--multi-region-三條-driver">Why migrate: three drivers (ceiling / ecosystem / multi-region)</h2>
<table>
  <thead>
      <tr>
          <th>Driver</th>
          <th>Trigger</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Ceiling</td>
          <td>Beyond 100-200 services, Swarm service discovery latency and scheduling fall behind</td>
      </tr>
      <tr>
          <td>Ecosystem</td>
          <td>The K8s ecosystem (Helm / Operator / mesh / GitOps) is mature; Swarm lacks equivalent tools</td>
      </tr>
      <tr>
          <td>Multi-region</td>
          <td>Not supported by Swarm; K8s has mature multi-cluster (federation) tooling</td>
      </tr>
  </tbody>
</table>
<p>Reverse drivers (K8s → Swarm):</p>
<ul>
<li>Pure internal tools / small scale (&lt; 30 services), where K8s is overly complex</li>
<li>Edge / IoT scenarios, where Swarm's small footprint matters</li>
</ul>
<h2 id="6-維-audit">6-dimension audit</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Level</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Schema / API</td>
          <td><strong>High</strong> (docker-compose stack.yml → K8s YAML; both are YAML, but the schemas are entirely different)</td>
      </tr>
      <tr>
          <td>Operational</td>
          <td>Medium (self-managed Swarm → self-hosted or managed K8s)</td>
      </tr>
      <tr>
          <td>Paradigm</td>
          <td><strong>High</strong> (simple container orchestration → declarative resource model)</td>
      </tr>
      <tr>
          <td>Components</td>
          <td>Low (still a single orchestration system)</td>
      </tr>
      <tr>
          <td>Application change</td>
          <td>Low (container images unchanged)</td>
      </tr>
      <tr>
          <td>Data topology</td>
          <td>Low</td>
      </tr>
  </tbody>
</table>
<p>Schema and Paradigm are both High → primarily a <strong>Type E paradigm shift</strong>, with the high Schema dimension covered in its own section.</p>
<h2 id="paradigm-對位">Paradigm mapping</h2>
<table>
  <thead>
      <tr>
          <th>Concept</th>
          <th>Swarm</th>
          <th>K8s</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Workload unit</td>
          <td>Service</td>
          <td>Deployment + Pod + Service</td>
      </tr>
      <tr>
          <td>Stack definition</td>
          <td>stack.yml (docker-compose format)</td>
          <td>YAML manifest (multiple resources)</td>
      </tr>
      <tr>
          <td>Networking</td>
          <td>Overlay network (built-in)</td>
          <td>CNI plugin (Calico / Cilium / etc)</td>
      </tr>
      <tr>
          <td>Service discovery</td>
          <td>DNS-based built-in</td>
          <td>DNS-based (CoreDNS) + Service object</td>
      </tr>
      <tr>
          <td>Load balancing</td>
          <td>Built-in routing mesh</td>
          <td>Service + Ingress + LoadBalancer</td>
      </tr>
      <tr>
          <td>Secret management</td>
          <td>Docker secrets</td>
          <td>K8s Secret + external Vault / Secrets Manager</td>
      </tr>
      <tr>
          <td>Rolling update</td>
          <td><code>docker service update --image ...</code></td>
          <td>Deployment + rolling update + readiness probe</td>
      </tr>
      <tr>
          <td>Autoscaling</td>
          <td>Manual scaling</td>
          <td>HPA (Horizontal Pod Autoscaler)</td>
      </tr>
      <tr>
          <td>RBAC</td>
          <td>Limited (Swarm enterprise)</td>
          <td>First-class (Role / RoleBinding / ServiceAccount)</td>
      </tr>
      <tr>
          <td>Persistent storage</td>
          <td>Volume + driver plugin</td>
          <td>PV / PVC + CSI driver</td>
      </tr>
      <tr>
          <td>Service mesh</td>
          <td>None (needs an add-on such as Traefik)</td>
          <td>Istio / Linkerd / Cilium</td>
      </tr>
      <tr>
          <td>GitOps</td>
          <td>No native support</td>
          <td>Argo CD / Flux (first-class)</td>
      </tr>
  </tbody>
</table>
<h2 id="schema-gapdocker-compose-vs-k8s-yaml">Schema gap: docker-compose vs K8s YAML</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Docker Swarm stack.yml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;3.8&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">webapp</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">myapp:1.0</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">update_config</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">parallelism</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">restart_policy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">condition</span><span class="p">:</span><span class="w"> </span><span class="kc">on</span>-<span class="l">failure</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">networks</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">frontend</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="s2">&#34;8080:8080&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="c"># Declare the overlay network the service attaches to</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">networks</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">frontend</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">driver</span><span class="p">:</span><span class="w"> </span><span class="l">overlay</span></span></span></code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># K8s equivalent (Deployment + Service; Ingress not shown)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">apps/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">webapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">replicas</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">strategy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">RollingUpdate</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">rollingUpdate</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">maxSurge</span><span class="p">:</span><span class="w"> </span><span class="m">1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">maxUnavailable</span><span class="p">:</span><span class="w"> </span><span class="m">0</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">selector</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">webapp }</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">template</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">labels</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">webapp }</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">containers</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">webapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">myapp:1.0</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span>- <span class="nt">containerPort</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">readinessProbe</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">httpGet</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">path</span><span class="p">:</span><span class="w"> </span><span class="l">/healthz</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">requests</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">cpu</span><span class="p">:</span><span class="w"> </span><span class="l">100m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="l">128Mi</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">            </span><span class="nt">limits</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">cpu</span><span class="p">:</span><span class="w"> </span><span class="l">500m</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">              </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="l">512Mi</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nn">---</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Service</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">webapp</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">selector</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">app</span><span class="p">:</span><span class="w"> </span><span class="l">webapp }</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">ports</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span>- <span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">targetPort</span><span class="p">:</span><span class="w"> </span><span class="m">8080</span></span></span></code></pre></div><p>1 Swarm service → 2-3 K8s resources (Deployment + Service + possibly Ingress / HPA); the application is unchanged, but the <em>deployment-side workload grows 5-10x</em>.</p>
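<p>The "possibly Ingress" part of that mapping could look like the sketch below. It assumes an NGINX ingress controller is already installed and uses a placeholder hostname (<code>webapp.example.com</code>); neither comes from the original stack. It restores the external entry point that the Swarm stack got from publishing <code>8080:8080</code> on the routing mesh.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Hypothetical Ingress in front of the webapp Service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webapp
spec:
  ingressClassName: nginx          # assumption: NGINX ingress controller
  rules:
    - host: webapp.example.com     # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webapp       # the Service defined above
                port:
                  number: 8080
</code></pre></div>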
<h2 id="migration-流程">Migration process</h2>
<h3 id="partial-migration--混合架構">Partial migration + hybrid architecture</h3>
<p>This follows the same Type E pattern as <a href="/blog/backend/03-message-queue/vendors/kafka/migrate-from-to-nats/" data-link-title="Kafka ↔ NATS：不是 migration、是 messaging paradigm 重設計" data-link-desc="Kafka 跟 NATS 不是同類產品（log-based event streaming vs subject-based messaging）、&#39;migration&#39; 字面上不成立；本文釐清兩家 paradigm 邊界、什麼情境真的能換、application 模式重設計的 5 個踩雷（consumer offset 觀念差 / retention model / exactly-once 假設 / schema registry 缺位 / fan-out 模式差）、跟 JetStream 對位 &#43; 混合架構">Kafka ↔ NATS</a> / <a href="/blog/backend/05-deployment-platform/vendors/consul/migrate-from-etcd/" data-link-title="etcd → Consul：KV &#43; N 個 extras feature matrix" data-link-desc="etcd → Consul 是 Type E paradigm shift expansion — 從 pure KV store 升到 service mesh / discovery / health check / multi-DC；本文用對照表 &#43; paradigm expansion 路線、5 個 production 踩雷（API 對位 / lock semantics / watch event model / multi-DC topology / ACL system）">etcd → Consul</a>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">1. Audit applications: list every Swarm stack + service
</span></span><span class="line"><span class="cl">2. Classify and plan:
</span></span><span class="line"><span class="cl">   - Simple stateless: move to K8s first (low risk)
</span></span><span class="line"><span class="cl">   - Stateful (DB / queue): evaluate a K8s operator or keep on Swarm
</span></span><span class="line"><span class="cl">   - Critical service: run both in parallel and confirm K8s behaves equivalently
</span></span><span class="line"><span class="cl">3. Build the K8s cluster:
</span></span><span class="line"><span class="cl">   - Managed (EKS / GKE / AKS) vs self-hosted (kubeadm)
</span></span><span class="line"><span class="cl">   - Set up ingress controller / cert-manager / monitoring
</span></span><span class="line"><span class="cl">4. Migrate applications (per stack):
</span></span><span class="line"><span class="cl">   - Write K8s YAML / Helm chart
</span></span><span class="line"><span class="cl">   - Configure readiness/liveness probes / resource requests
</span></span><span class="line"><span class="cl">   - Map networking + secrets
</span></span><span class="line"><span class="cl">5. Cutover + Swarm decommission:
</span></span><span class="line"><span class="cl">   - Once some stacks are cut over, decide whether to keep Swarm (legacy / edge)
</span></span><span class="line"><span class="cl">   - Most organizations fully decommission Swarm</span></span></code></pre></div><p>Overall this takes 3-6 months, depending on the number of stacks and application complexity.</p>
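<h2 id="production-故障演練">Production failure drills</h2>
<h3 id="case-1networking-model-差cross-service-connectivity-失效">Case 1: Networking model differences break cross-service connectivity</h3>
<p><strong>Symptom</strong>: after cutover, service A fails to connect to service B; the Swarm-side DNS name <code>tasks.service_b</code> does not carry over to the K8s-side <code>service-b.namespace.svc.cluster.local</code>.</p>
<p><strong>Root cause</strong>: inside a Swarm overlay network, service-to-service calls use the short name (<code>service_b</code>). In K8s the short name resolves only within the same namespace, other namespaces need a namespace-qualified name or the FQDN, and Service names cannot contain underscores. The service URL is hard-coded in the application.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>Have the application use the short name and rely on the cluster DNS search domains</li>
<li>On the K8s side, keep the default <code>dnsPolicy: ClusterFirst</code> and confirm the matching Service exists with <code>kubectl get svc -A</code></li>
<li>Use a default deny-all NetworkPolicy with explicit allow rules, and confirm the A → B path is allowed</li>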
</ol>
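<p>A minimal sketch of the NetworkPolicy step, assuming a hypothetical <code>prod</code> namespace, pods labelled <code>app: service-a</code> / <code>app: service-b</code>, and a CNI plugin that enforces NetworkPolicy (for example Calico or Cilium). All names are illustrative, not taken from the clusters above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Deny all ingress to every pod in the namespace by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod                 # hypothetical namespace
spec:
  podSelector: {}                 # empty selector = all pods in the namespace
  policyTypes:
    - Ingress
---
# Explicitly allow service-a pods to reach service-b on TCP 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-a-to-b
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: service-b              # hypothetical label on the target pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: service-a      # hypothetical label on the caller pods
      ports:
        - protocol: TCP
          port: 8080
</code></pre></div>
<p>With no NetworkPolicy at all, K8s allows all pod-to-pod traffic, so a connectivity failure that appears right after the deny-all policy lands usually points at a missing allow rule rather than at DNS.</p>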
<h3 id="case-2secret-rotation-從-swarm-secrets-換-vault--secrets-manager">Case 2: Secret rotation moves from Swarm secrets to Vault / Secrets Manager</h3>
<p><strong>Symptom</strong>: secrets were rotated with <code>docker secret</code> on Swarm; after the switch, a K8s Secret is a <em>static value</em> and is not rotated automatically.</p>
<p><strong>Root cause</strong>: K8s Secret is K8s-native but <em>not auto-rotated</em>; rotation needs an external Vault / Secrets Manager plus an agent (vault-agent-injector / external-secrets-operator).</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>Deploy external-secrets-operator on the K8s side, integrated with AWS Secrets Manager / Vault</li>
<li>Have the application read secrets from a mounted file or environment variable instead of hard-coding them</li>
<li>Rotate on the vendor side and let a K8s-side sidecar reload automatically; values injected as environment variables only change after the pod restarts</li>
</ol>
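<p>A sketch of the external-secrets-operator step, under several assumptions: the operator is installed, a <code>SecretStore</code> named <code>aws-secrets-manager</code> has been defined separately, and the remote secret is called <code>prod/webapp/db</code>. All three names are hypothetical. The <code>apiVersion</code> depends on the operator release, so check which versions the installed CRDs serve.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml">apiVersion: external-secrets.io/v1beta1   # assumption: adjust to the installed CRD version
kind: ExternalSecret
metadata:
  name: webapp-db
spec:
  refreshInterval: 1h               # how often the operator re-reads the remote value
  secretStoreRef:
    name: aws-secrets-manager       # hypothetical SecretStore, defined elsewhere
    kind: SecretStore
  target:
    name: webapp-db                 # K8s Secret the operator creates and keeps in sync
  data:
    - secretKey: password           # key inside the resulting K8s Secret
      remoteRef:
        key: prod/webapp/db         # hypothetical secret name in AWS Secrets Manager
        property: password
</code></pre></div>
<p>The operator only refreshes the K8s Secret. Whether running pods pick up the new value still depends on how they consume it (mounted file vs environment variable).</p>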
<h3 id="case-3readiness-probe-沒設rolling-update-期間-traffic-loss">Case 3: No readiness probe, traffic loss during rolling updates</h3>
<p><strong>Symptom</strong>: after cutover, 5-10% of requests fail during a deploy; pods turn out to receive traffic before startup has finished.</p>
<p><strong>Root cause</strong>: Swarm's simple restart_policy has no equivalent probe concept. K8s treats a pod as ready as soon as its containers have started, so without a readiness probe a slow-starting application receives traffic before it is ready.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Always add a readiness probe</strong>: HTTP / TCP / exec check</li>
<li><strong>Set an initial delay</strong> (<code>initialDelaySeconds</code>): allow 30-60s for JVM applications</li>
<li><strong>Set <code>minReadySeconds</code></strong>: 30s on the Deployment to confirm the pod is stable</li>
</ol>
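<p>Put together, the three settings might look like the following sketch. It reuses the <code>webapp</code> Deployment and <code>/healthz</code> endpoint from the schema section; the <code>periodSeconds</code> and <code>failureThreshold</code> values are illustrative assumptions, not tuned recommendations.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml">apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 3
  minReadySeconds: 30               # pod must stay ready for 30s before it counts as available
  selector:
    matchLabels: { app: webapp }
  template:
    metadata:
      labels: { app: webapp }
    spec:
      containers:
        - name: webapp
          image: myapp:1.0
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 30 # leave room for JVM startup
            periodSeconds: 5        # assumption: probe every 5s
            failureThreshold: 3     # assumption: 3 failures mark the pod not ready
</code></pre></div>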
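<h3 id="case-4hpa-預設不啟autoscaling-失效">Case 4: HPA is not enabled by default, autoscaling stops working</h3>
<p><strong>Symptom</strong>: a cron-based autoscale script written for Swarm stops working after the switch to K8s, and nothing scales up at peak traffic.</p>
<p><strong>Root cause</strong>: K8s HPA is not enabled by default; it needs <em>explicit configuration</em> plus a metrics-server install.</p>
<p><strong>Fix</strong>:</p>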





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">autoscaling/v2</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">HorizontalPodAutoscaler</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w"></span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">webapp-hpa</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w"></span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">  </span><span class="nt">scaleTargetRef</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">    </span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">apps/v1</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">    </span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Deployment</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">    </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">webapp</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">  </span><span class="nt">minReplicas</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">  </span><span class="nt">maxReplicas</span><span class="p">:</span><span class="w"> </span><span class="m">20</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">  </span><span class="nt">metrics</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">    </span>- <span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">Resource</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">      </span><span class="nt">resource</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">        </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">cpu</span><span class="w">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="w">        </span><span class="nt">target</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="w">          </span><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">Utilization</span><span class="w">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="w">          </span><span class="nt">averageUtilization</span><span class="p">:</span><span class="w"> </span><span class="m">70</span></span></span></code></pre></div><p>Install metrics-server (or KEDA for event-driven autoscaling), then configure one HPA per Deployment.</p>
<h3 id="case-5yaml-維護地獄helm--kustomize-配置遲">Case 5: YAML maintenance hell, Helm / Kustomize adopted too late</h3>
<p><strong>Symptom</strong>: after cutover, the 5 files of the Swarm stack become 50+ K8s manifests; changing a single application config means editing N files.</p>
<p><strong>Root cause</strong>: K8s YAML is <em>very verbose</em> compared with the concise docker-compose format, and plain manifests offer no templating or environment abstraction.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Helm chart</strong>: package each application as a chart and use <code>values.yaml</code> to abstract per-environment differences</li>
<li><strong>Kustomize</strong>: base + overlay pattern, no templating required</li>
<li><strong>GitOps with Argo CD / Flux</strong>: declarative deployment that reduces manual kubectl operations</li>
</ol>
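<p>As a minimal sketch of the Kustomize base + overlay pattern from the list above: the directory layout, the <code>webapp-config</code> ConfigMap, the <code>production</code> namespace, the registry name, and the image tag are all hypothetical and only illustrate how one shared base can serve several environments.</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"># Two separate files shown in one listing; all paths and names are illustrative.

# base/kustomization.yaml: manifests shared by every environment
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
  - hpa.yaml
configMapGenerator:
  - name: webapp-config
    literals:
      - LOG_LEVEL=info
---
# overlays/production/kustomization.yaml: only what differs in production
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
resources:
  - ../../base
images:
  # "name" must match the image name used in base/deployment.yaml
  - name: registry.example.com/webapp
    newTag: "1.4.2"
configMapGenerator:
  - name: webapp-config
    behavior: merge        # override a single key, keep the rest of the base
    literals:
      - LOG_LEVEL=warn
</code></pre></div>
<p>Under these assumptions, <code>kubectl kustomize overlays/production</code> renders the merged manifests for review and <code>kubectl apply -k overlays/production</code> applies them, so a config change touches one overlay file instead of N manifests.</p>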
<h2 id="capacity--cost">Capacity / cost</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Docker Swarm</th>
          <th>Kubernetes (managed)</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Cluster cost (mid-tier)</td>
          <td>$300-800 / mo</td>
          <td>$500-1500 / mo (EKS/GKE/AKS control plane + nodes)</td>
      </tr>
      <tr>
          <td>Operational FTE</td>
          <td>0.3-0.8</td>
          <td>0.5-1.5 (0.3-0.7 when using a managed service)</td>
      </tr>
      <tr>
          <td>Ecosystem maturity</td>
          <td>Low, declining</td>
          <td>High, actively growing</td>
      </tr>
      <tr>
          <td>Multi-region</td>
          <td>Not supported</td>
          <td>Mature multi-cluster / federation tooling</td>
      </tr>
      <tr>
          <td>Migration cost</td>
          <td>-</td>
          <td>2-4 FTE × 3-6 months</td>
      </tr>
      <tr>
          <td>Long-term ROI</td>
          <td>Negative (shrinking community)</td>
          <td>Positive (feature growth)</td>
      </tr>
  </tbody>
</table>
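<p><strong>Reading the table</strong>: a small organization with &lt; 30 services can stay on Swarm; at 50+ services you start hitting the Swarm ceiling and a migration is worth evaluating; at 100+ services or multi-region, migration is required.</p>
<h2 id="整合--下一步">Integration / next steps</h2>
<h3 id="跟-service-mesh-整合">Integrating with a service mesh</h3>
<p>After cutover, <em>while you are at it</em>, evaluate a service mesh (Istio / Linkerd / Cilium) to cover mTLS, observability, and traffic policy. Evaluate only: do not roll out a mesh immediately after the Swarm migration, and adopt it in stages.</p>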
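<p>If the evaluation ends up choosing Istio, one way to stage the rollout is to start mTLS in permissive mode for a single namespace. The sketch below assumes Istio is already installed and that <code>webapp</code> is a hypothetical namespace; Linkerd and Cilium use different resources.</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"># Namespace-wide mTLS policy (Istio). The namespace name is an assumption.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: webapp
spec:
  mtls:
    # PERMISSIVE accepts both plaintext and mTLS traffic, so workloads
    # without a sidecar keep working during the transition.
    mode: PERMISSIVE
</code></pre></div>
<p>Once every workload in the namespace has a sidecar, switching <code>mode</code> to <code>STRICT</code> rejects plaintext traffic; doing this namespace by namespace keeps the mesh rollout separate from the migration itself.</p>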
<h3 id="跟-gitops-整合">Integrating with GitOps</h3>
<p>K8s + Argo CD / Flux is a <em>natural pair</em>; adopt GitOps from the start of the migration so that manual kubectl operations do not accumulate.</p>
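<p>A minimal sketch of what this looks like with Argo CD, assuming Argo CD runs in the <code>argocd</code> namespace of the same cluster. The repository URL, path, and target namespace are hypothetical placeholders.</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"># Argo CD Application: Git is the source of truth for the webapp manifests.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: webapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/k8s-manifests.git  # hypothetical repo
    targetRevision: main
    path: apps/webapp/overlays/production                        # hypothetical path
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD itself runs in
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual kubectl changes back to the Git state
    syncOptions:
      - CreateNamespace=true
</code></pre></div>
<p>With <code>selfHeal</code> enabled, ad hoc <code>kubectl edit</code> changes are reverted to what Git declares, which is what keeps manual operations from piling up during the migration.</p>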
<h3 id="跟-vault--aws-secrets-manager-對齊">Aligning with <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/migrate-to-aws-secrets-manager/" data-link-title="Vault → AWS Secrets Manager：「secret」不是「secret」、identity model 才是核心差異" data-link-desc="Vault → AWS Secrets Manager migration 表面是 secret store 替換、實際核心是 identity model 對位（Vault token &#43; policy vs AWS IAM &#43; resource policy）；驗證 [#128](/report/data-topology-as-audit-dimension/) self-aware limitation 提出的 identity axis 候選 — identity 是否獨立 audit 軸；5 個 production 踩雷（IAM principal 對位 / dynamic credential 對等失敗 / lease lifecycle 模型不同 / audit log 結構差 / 計費模型反轉）">Vault → AWS Secrets Manager</a></h3>
<p>Swarm secrets → K8s Secret → external secrets management is a <em>3-step evolution</em>, not a single step; use K8s Secrets during the migration and move to Vault / Secrets Manager afterwards.</p>
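<p>For the middle step, one option is to mount the K8s Secret at the path where Swarm exposed it (<code>/run/secrets/&lt;name&gt;</code> by default), so the application code stays unchanged. This is a sketch: the secret name <code>db_password</code>, the image, and the labels are hypothetical.</p>
<div class="highlight"><pre><code class="language-yaml" data-lang="yaml"># Step 2 of 3: a plain K8s Secret exposed at the Swarm-style file path.
apiVersion: v1
kind: Secret
metadata:
  name: db-password
type: Opaque
stringData:
  db_password: change-me   # placeholder; never commit real values to Git
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: webapp
          image: registry.example.com/webapp:1.4.2   # hypothetical image
          volumeMounts:
            - name: db-password
              # Mount one file via subPath instead of the whole /run/secrets
              # directory, leaving the rest of that directory untouched.
              mountPath: /run/secrets/db_password
              subPath: db_password
              readOnly: true
      volumes:
        - name: db-password
          secret:
            secretName: db-password
</code></pre></div>
<p>One trade-off to keep in mind: files mounted through <code>subPath</code> are not refreshed when the Secret changes, so a rotation needs a Pod restart until the third step (Vault / Secrets Manager) takes over.</p>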
<h2 id="相關連結">Related links</h2>
<ul>
<li>Target vendor: <a href="/blog/backend/05-deployment-platform/vendors/kubernetes/" data-link-title="Kubernetes" data-link-desc="Container orchestration 主流、GKE / EKS / AKS / 自管">Kubernetes</a></li>
<li>Parallel migration playbooks (Type E): <a href="/blog/backend/03-message-queue/vendors/kafka/migrate-from-to-nats/" data-link-title="Kafka ↔ NATS：不是 migration、是 messaging paradigm 重設計" data-link-desc="Kafka 跟 NATS 不是同類產品（log-based event streaming vs subject-based messaging）、&#39;migration&#39; 字面上不成立；本文釐清兩家 paradigm 邊界、什麼情境真的能換、application 模式重設計的 5 個踩雷（consumer offset 觀念差 / retention model / exactly-once 假設 / schema registry 缺位 / fan-out 模式差）、跟 JetStream 對位 &#43; 混合架構">Kafka ↔ NATS</a> / <a href="/blog/backend/02-cache-redis/vendors/redis/migrate-to-memcached/" data-link-title="Redis → Memcached：Memcached 不是 simpler Redis、是 cache paradigm" data-link-desc="Redis → Memcached 是 Type E paradigm reduction migration — 從 multi-paradigm（KV &#43; 資料結構 &#43; pub/sub &#43; Lua &#43; streams）退到 pure cache；不是「remove Redis features」、是「重新分配 Redis-specific feature 到對應 specialized 服務」；5 個 production 踩雷 &#43; paradigm reduction 路線">Redis → Memcached</a> / <a href="/blog/backend/05-deployment-platform/vendors/consul/migrate-from-etcd/" data-link-title="etcd → Consul：KV &#43; N 個 extras feature matrix" data-link-desc="etcd → Consul 是 Type E paradigm shift expansion — 從 pure KV store 升到 service mesh / discovery / health check / multi-DC；本文用對照表 &#43; paradigm expansion 路線、5 個 production 踩雷（API 對位 / lock semantics / watch event model / multi-DC topology / ACL system）">etcd → Consul</a> / <a href="/blog/backend/04-observability/vendors/honeycomb/migrate-from-sentry/" data-link-title="Sentry → Honeycomb：trace 不是 error、是不同 observability paradigm" data-link-desc="Sentry → Honeycomb 是 paradigm shift — Sentry 主軸是 error tracking &#43; transaction trace、Honeycomb 主軸是 high-cardinality wide-event observability；本文釐清 paradigm 邊界、5 個 production 踩雷（event schema 對位 / sampling 行為 / error grouping 失效 / cost 模型差 / alert paradigm shift）">Sentry → Honeycomb</a></li>
<li>Methodology: <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a></li>
</ul>
]]></content:encoded></item></channel></rss>