<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Service-Mesh on Tarragon</title><link>https://tarrragon.github.io/blog/tags/service-mesh/</link><description>Recent content in Service-Mesh on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Mon, 18 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/service-mesh/index.xml" rel="self" type="application/rss+xml"/><item><title>Service Mesh Mirroring</title><link>https://tarrragon.github.io/blog/backend/09-performance-capacity/vendors/service-mesh-mirroring/</link><pubDate>Fri, 15 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/09-performance-capacity/vendors/service-mesh-mirroring/</guid><description>&lt;p>The core job of service mesh mirroring is to copy production traffic at the proxy layer to a shadow service. The new version receives real request shapes, while user responses stay on the original path. It fits platforms that already run Istio, Linkerd, or a similar mesh, and the work centers on using routing policy to control mirror ratio, target, isolation, and observability.&lt;/p>
&lt;p>Compared with &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/vendors/goreplay/" data-link-title="GoReplay" data-link-desc="用 production HTTP traffic capture 與 replay 驗證真實請求形狀的效能工程工具">GoReplay&lt;/a>, Service Mesh Mirroring works at the &lt;em>proxy / sidecar&lt;/em> layer. It is a K8s mesh-native L7 HTTP request mirror and needs no capture binary on the application or host. GoReplay works at the &lt;em>application host&lt;/em> layer and suits environments without a mesh, or cases that need a captured artifact for offline replay. Compared with &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/vendors/aws-vpc-traffic-mirroring/" data-link-title="AWS VPC Traffic Mirroring" data-link-desc="用 VPC 網路層封包鏡像觀察 production traffic 的低侵入 production validation 方式">AWS VPC Traffic Mirroring&lt;/a>, Service Mesh Mirroring operates at L7, where HTTP route, header, and subset are all controllable. VPC Traffic Mirroring operates at the L3-L4 packet layer, which gives lower-level visibility but no application semantics. The three are often combined in hybrid K8s and multi-cloud environments.&lt;/p>
&lt;h2 id="最短判讀路徑">最短判讀路徑&lt;/h2>
&lt;p>判斷 Service Mesh Mirroring 部署是否健康、最少看四件事：&lt;/p>
&lt;ul>
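&lt;li>&lt;strong>Mesh implementation alignment&lt;/strong>: which mesh is in use (Istio / Linkerd / Envoy gateway / Consul Connect), the control plane version, sidecar injection coverage, and whether cross-namespace policy boundaries are clear&lt;/li>
&lt;li>&lt;strong>VirtualService mirror config&lt;/strong>: whether the mirror destination is restricted to the same namespace / same cluster, whether &lt;code>mirrorPercentage&lt;/code> (formerly &lt;code>mirror_percent&lt;/code>) ramps up gradually from 1%, and whether route / header filters exclude write-heavy or PII paths&lt;/li>
&lt;li>&lt;strong>Target service capacity&lt;/strong>: whether the shadow target deployment has its own HPA, and whether it shares a node pool with primary or is isolated. Also check that DB / cache / external API calls are routed to a mock or sandbox, and that it does not share a connection pool that could saturate primary&lt;/li>
&lt;li>&lt;strong>Response handling&lt;/strong>: whether mirrored responses are fire-and-forget (Istio's default) or logged. Check whether the shadow side can recognize mirrored requests: Envoy appends &lt;code>-shadow&lt;/code> to the Host / Authority header by default, or you can use a custom header. Also check whether side effects (payment / notification / webhook) go through dry-run&lt;/li>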
&lt;/ul>
&lt;p>If any of the four is missing, it is an open item for shadow traffic governance under &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/production-validation/" data-link-title="9.10 Production-Side 驗證" data-link-desc="shadow traffic、dark launch、canary、production-like load test">9.10 Production-Side Validation&lt;/a>.&lt;/p></description><content:encoded><![CDATA[<p>The core job of service mesh mirroring is to copy production traffic at the proxy layer to a shadow service. The new version receives real request shapes, while user responses stay on the original path. It fits platforms that already run Istio, Linkerd, or a similar mesh, and the work centers on using routing policy to control mirror ratio, target, isolation, and observability.</p>
<p>Compared with <a href="/blog/backend/09-performance-capacity/vendors/goreplay/" data-link-title="GoReplay" data-link-desc="用 production HTTP traffic capture 與 replay 驗證真實請求形狀的效能工程工具">GoReplay</a>, Service Mesh Mirroring works at the <em>proxy / sidecar</em> layer. It is a K8s mesh-native L7 HTTP request mirror and needs no capture binary on the application or host. GoReplay works at the <em>application host</em> layer and suits environments without a mesh, or cases that need a captured artifact for offline replay. Compared with <a href="/blog/backend/09-performance-capacity/vendors/aws-vpc-traffic-mirroring/" data-link-title="AWS VPC Traffic Mirroring" data-link-desc="用 VPC 網路層封包鏡像觀察 production traffic 的低侵入 production validation 方式">AWS VPC Traffic Mirroring</a>, Service Mesh Mirroring operates at L7, where HTTP route, header, and subset are all controllable. VPC Traffic Mirroring operates at the L3-L4 packet layer, which gives lower-level visibility but no application semantics. The three are often combined in hybrid K8s and multi-cloud environments.</p>
<h2 id="最短判讀路徑">Quickest Assessment Path</h2>
<p>To judge whether a Service Mesh Mirroring deployment is healthy, check at least these four things:</p>
<ul>
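<li><strong>Mesh implementation alignment</strong>: which mesh is in use (Istio / Linkerd / Envoy gateway / Consul Connect), the control plane version, sidecar injection coverage, and whether cross-namespace policy boundaries are clear</li>
<li><strong>VirtualService mirror config</strong>: whether the mirror destination is restricted to the same namespace / same cluster, whether <code>mirrorPercentage</code> (formerly <code>mirror_percent</code>) ramps up gradually from 1%, and whether route / header filters exclude write-heavy or PII paths</li>
<li><strong>Target service capacity</strong>: whether the shadow target deployment has its own HPA, and whether it shares a node pool with primary or is isolated. Also check that DB / cache / external API calls are routed to a mock or sandbox, and that it does not share a connection pool that could saturate primary</li>
<li><strong>Response handling</strong>: whether mirrored responses are fire-and-forget (Istio's default) or logged. Check whether the shadow side can recognize mirrored requests: Envoy appends <code>-shadow</code> to the Host / Authority header by default, or you can use a custom header. Also check whether side effects (payment / notification / webhook) go through dry-run</li>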
</ul>
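<p>In Istio, the route / header filtering from the second checklist item is normally written as VirtualService match rules. The decision it encodes can still be sketched in Go. This is a minimal sketch with two assumptions. First, <code>/payments</code>, <code>/users/me</code> and <code>/auth</code> form a hypothetical deny-list of write-heavy or PII paths. Second, only idempotent read methods are eligible. It is not an Istio API.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// excludedPrefixes is a hypothetical deny-list of write-heavy or PII-bearing
// paths that must never be copied to a shadow target.
var excludedPrefixes = []string{"/payments", "/users/me", "/auth"}

// eligibleForMirror reports whether a request may be mirrored. Only GET/HEAD
// qualify, and never a path equal to or nested under an excluded prefix.
func eligibleForMirror(method, path string) bool {
	if method != http.MethodGet && method != http.MethodHead {
		return false
	}
	for _, prefix := range excludedPrefixes {
		if path == prefix || strings.HasPrefix(path, prefix+"/") {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(eligibleForMirror("GET", "/catalog/items"))  // true
	fmt.Println(eligibleForMirror("POST", "/catalog/items")) // false: write method
	fmt.Println(eligibleForMirror("GET", "/payments/123"))   // false: excluded prefix
}
```

<p>The prefix check matches on a path boundary, so <code>/paymentsx</code> is not caught by the <code>/payments</code> rule. Keeping the list explicit makes it reviewable in the same change as the mirror policy.</p>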
<p>If any of the four is missing, it is an open item for shadow traffic governance under <a href="/blog/backend/09-performance-capacity/production-validation/" data-link-title="9.10 Production-Side 驗證" data-link-desc="shadow traffic、dark launch、canary、production-like load test">9.10 Production-Side Validation</a>.</p>
<h2 id="定位">Positioning</h2>
<p>Service mesh mirroring suits teams whose platform already has a proxy control plane. When all service-to-service traffic passes through sidecars or gateways, a mirror policy can copy part of the production requests to the new version. No capture / replay logic needs to be added to application code.</p>
<p>This positioning connects service mesh mirroring to the shadow traffic and canary perf checks in <a href="/blog/backend/09-performance-capacity/production-validation/" data-link-title="9.10 Production-Side 驗證" data-link-desc="shadow traffic、dark launch、canary、production-like load test">9.10 Production-Side Validation</a>. It sits closer to service routing than host capture does. It also depends on the mesh's observability, policy, resource isolation, and governance capabilities.</p>
<h2 id="適用場景">Use Cases</h2>
<p>Shadow validation of a new version fits service mesh mirroring. The platform can mirror 1%, 5%, or one route's traffic to a shadow deployment. It then observes the new version's CPU, memory, latency, DB reads, and errors.</p>
<p>Service-to-service migration fits service mesh mirroring. When a downstream service is about to switch runtime, framework, DB client, or cache client, mirroring lets the new path consume real production upstream patterns.</p>
<p>Multi-region / multi-version comparison fits service mesh mirroring. Mesh policy can choose the mirror target by namespace, host, route, header, or subset. This lets the platform collect production-shaped evidence with a small blast radius.</p>
<h2 id="選型判準">Selection Criteria</h2>
<table>
  <thead>
      <tr>
          <th>Criterion</th>
          <th>Value of service mesh mirroring</th>
          <th>Capabilities you still need</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Proxy-layer control</td>
          <td>Mirror policy does not touch application code</td>
          <td>Mesh control plane governance and change review</td>
      </tr>
      <tr>
          <td>Service routing</td>
          <td>Target selectable by host, route, subset</td>
          <td>Route naming, ownership, policy drift</td>
      </tr>
      <tr>
          <td>Mesh observability</td>
          <td>Request metrics, traces, service graph can be compared</td>
          <td>Dedicated dashboard for the shadow target</td>
      </tr>
      <tr>
          <td>Progressive ratio</td>
          <td>Mirror ratio can be increased step by step</td>
          <td>Downstream capacity and stop conditions</td>
      </tr>
  </tbody>
</table>
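<p>The value of proxy-layer control comes from consistency. When every service goes through the mesh, mirror policies can be managed from one control plane. No application has to implement its own replay.</p>
<p>The value of mesh observability comes from comparison. The shadow service's latency, errors, resource saturation, and dependency calls can be compared directly against the primary path. Dashboards must clearly label mirrored traffic so that it does not leak into production SLOs.</p>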
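<p>One way to keep mirrored traffic out of the primary SLO is to tag metrics and traces in the shadow service, based on how the request arrived. This is a minimal sketch that relies on Envoy's default behavior of appending <code>-shadow</code> to the Host / Authority of mirrored requests (verify this against your proxy version). It also accepts a hypothetical custom header, <code>X-Shadow-Request</code>. The tag names follow the <code>env:shadow</code> / <code>traffic.mirror:true</code> convention used elsewhere on this page.</p>

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// shadowHeader is a hypothetical platform convention, not an Envoy default.
const shadowHeader = "X-Shadow-Request"

// isMirrored reports whether a request arrived via mesh mirroring. Envoy
// appends "-shadow" to the Host/Authority of mirrored requests by default.
// This sketch accepts the suffix either at the very end or just before a
// ":port", rather than relying on one exact placement.
func isMirrored(host, shadowHeaderValue string) bool {
	if shadowHeaderValue == "true" {
		return true
	}
	return strings.HasSuffix(host, "-shadow") || strings.Contains(host, "-shadow:")
}

// tagsFor returns the metric/trace tags that let dashboards filter shadow traffic.
func tagsFor(mirrored bool) map[string]string {
	if mirrored {
		return map[string]string{"env": "shadow", "traffic.mirror": "true"}
	}
	return map[string]string{"env": "production", "traffic.mirror": "false"}
}

// requestTags wires detection and tagging together for an incoming request.
func requestTags(r *http.Request) map[string]string {
	return tagsFor(isMirrored(r.Host, r.Header.Get(shadowHeader)))
}

func main() {
	primary, _ := http.NewRequest(http.MethodGet, "http://reviews:9080/api", nil)
	mirrored, _ := http.NewRequest(http.MethodGet, "http://reviews:9080/api", nil)
	mirrored.Host = "reviews-shadow:9080" // Host as seen by the shadow service
	fmt.Println(requestTags(primary)["env"], requestTags(mirrored)["env"]) // production shadow
}
```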
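<h2 id="跟其他方式的取捨">Trade-offs Against Other Approaches</h2>
<p>The main difference between service mesh mirroring and GoReplay is the control plane. Service mesh mirroring depends on an existing proxy / mesh and suits service-to-service traffic. GoReplay suits HTTP capture artifacts, offline replay, and environments without a mesh.</p>
<p>The main difference between service mesh mirroring and AWS VPC Traffic Mirroring is the semantic layer. The mesh sits at the L7 routing layer and can filter by route, host, header, and subset. VPC mirroring sits at the network layer, with lower-level visibility but less control over application semantics.</p>
<p>The main difference between service mesh mirroring and canary is user impact. Mirrored responses are never returned to users, which suits capacity / correctness observation. A canary sends real users to the new version and suits the final rollout.</p>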
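<p>A tiny L7 mirror in Go can show the property that mirrored responses never reach the user. This is a minimal sketch of the idea only, not how Envoy implements it. The handler returns the primary's response and sends a copy to the shadow in a background goroutine. Whatever the shadow answers is drained and dropped. The <code>X-Shadow-Request</code> header is a hypothetical convention, and the demo servers come from <code>net/http/httptest</code>.</p>

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// mirrorHandler proxies each request to primary and returns primary's response.
// A copy goes to shadow in the background and the shadow response is discarded,
// so shadow latency or errors never reach the caller.
func mirrorHandler(primaryURL, shadowURL string, client *http.Client) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}
		// Build the shadow copy before the handler returns, then send it in the background.
		shadowReq, err := http.NewRequest(r.Method, shadowURL+r.URL.RequestURI(), bytes.NewReader(body))
		if err == nil {
			shadowReq.Header = r.Header.Clone()
			shadowReq.Header.Set("X-Shadow-Request", "true") // hypothetical marker
			go func() {
				resp, err := client.Do(shadowReq)
				if err != nil {
					return // shadow failures are visible only in shadow-side telemetry
				}
				io.Copy(io.Discard, resp.Body) // drain and drop: the user never sees this
				resp.Body.Close()
			}()
		}
		// Primary path: the only response returned to the caller.
		req, err := http.NewRequest(r.Method, primaryURL+r.URL.RequestURI(), bytes.NewReader(body))
		if err != nil {
			http.Error(w, "bad upstream request", http.StatusInternalServerError)
			return
		}
		req.Header = r.Header.Clone()
		resp, err := client.Do(req)
		if err != nil {
			http.Error(w, "primary unavailable", http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		w.WriteHeader(resp.StatusCode) // response headers omitted for brevity
		io.Copy(w, resp.Body)
	}
}

// runDemo puts the handler between two throwaway servers. It returns what the
// user received and which path the shadow saw.
func runDemo() (string, string) {
	primary := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "primary")
	}))
	defer primary.Close()

	seen := make(chan string, 1)
	shadow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.URL.Path
		http.Error(w, "shadow crashed", http.StatusInternalServerError) // invisible to users
	}))
	defer shadow.Close()

	proxy := httptest.NewServer(mirrorHandler(primary.URL, shadow.URL, &http.Client{Timeout: 2 * time.Second}))
	defer proxy.Close()

	resp, err := http.Get(proxy.URL + "/catalog")
	if err != nil {
		return "", ""
	}
	userBody, _ := io.ReadAll(resp.Body)
	resp.Body.Close()

	select {
	case p := <-seen:
		return string(userBody), p
	case <-time.After(2 * time.Second):
		return string(userBody), ""
	}
}

func main() {
	body, path := runDemo()
	fmt.Println(body, path) // primary /catalog
}
```

<p>The shadow returns HTTP 500, yet the caller still gets the primary body. This is why shadow-side errors have to be collected by the shadow service itself.</p>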
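<h2 id="操作成本">Operating Cost</h2>
<p>The main cost of service mesh mirroring is downstream capacity. Shadow traffic does not respond to users, but it still consumes resources in the shadow service, DB, cache, third-party mocks, queues, and the observability pipeline.</p>
<p>Policy cost comes from control plane governance. Mirror rules, routes, subsets, namespaces, owners, and rollout windows must all be reviewable. A wrong mirror policy can send too large a share of traffic to a target that is not ready.</p>
<p>Side-effect cost comes from application behavior. The shadow service must recognize mirrored requests. It must then route writes, external API calls, notifications, payments, and queue publishes to a sandbox, mock, or dry-run.</p>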
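<p>On the application side, this dry-run routing often comes down to choosing a dependency implementation per request. This is a minimal sketch with a hypothetical <code>PaymentGateway</code> interface. The <code>mirrored</code> flag is assumed to come from whatever detection the platform uses, such as a Host suffix or a custom header.</p>

```go
package main

import "fmt"

// PaymentGateway is a hypothetical dependency with an external side effect.
type PaymentGateway interface {
	Charge(orderID string, cents int64) (string, error)
}

// liveGateway stands in for the real provider client; here it only counts calls.
type liveGateway struct{ calls int }

func (g *liveGateway) Charge(orderID string, cents int64) (string, error) {
	g.calls++
	return "live-" + orderID, nil
}

// dryRunGateway records what would have been charged and never calls out.
type dryRunGateway struct{ recorded []string }

func (g *dryRunGateway) Charge(orderID string, cents int64) (string, error) {
	g.recorded = append(g.recorded, fmt.Sprintf("%s:%d", orderID, cents))
	return "dryrun-" + orderID, nil
}

// gatewayFor routes mirrored requests to the dry-run implementation. A shadow
// deployment can then exercise the code path without charging anyone twice.
func gatewayFor(mirrored bool, live, dry PaymentGateway) PaymentGateway {
	if mirrored {
		return dry
	}
	return live
}

func main() {
	live, dry := &liveGateway{}, &dryRunGateway{}
	id, _ := gatewayFor(true, live, dry).Charge("o-1", 1999)
	fmt.Println(id, live.calls, dry.recorded) // dryrun-o-1 0 [o-1:1999]
}
```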
<h2 id="evidence-package">Evidence Package</h2>
<p>Service mesh mirroring results should be written back to an evidence package. The minimum fields are mesh policy version, source service, route, mirror ratio, target subset, time range, and shadow target resources. Add data / side-effect isolation, p95 / p99, error rate, dependency saturation, known gaps, and owner.</p>
<table>
  <thead>
      <tr>
          <th>Field</th>
          <th>Evidence source for service mesh mirroring</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Source</td>
          <td>mesh policy, route config, deployment version</td>
      </tr>
      <tr>
          <td>Time range</td>
          <td>mirror start / end</td>
      </tr>
      <tr>
          <td>Query link</td>
          <td>service graph, metrics, traces, logs</td>
      </tr>
      <tr>
          <td>Data quality</td>
          <td>mirror ratio, route coverage, header filter</td>
      </tr>
      <tr>
          <td>Confidence</td>
          <td>target parity, dependency isolation</td>
      </tr>
      <tr>
          <td>Known gap</td>
          <td>routes not mirrored, side-effect mocks, mesh overhead</td>
      </tr>
  </tbody>
</table>
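<p>The evidence package exists so that an experiment can be closed, which means the reviewer check can be made mechanical. This is a minimal sketch with a hypothetical <code>EvidencePackage</code> type that covers a subset of the fields listed above. The review rules are assumptions, not a standard: required text fields must be set, the time range must be closed, and the ratio must lie in (0, 100].</p>

```go
package main

import (
	"fmt"
	"time"
)

// EvidencePackage is a hypothetical record of one mirror experiment.
type EvidencePackage struct {
	MeshPolicyVersion string
	SourceService     string
	Route             string
	MirrorPercent     float64
	TargetSubset      string
	Start, End        time.Time
	Isolation         string // data / side-effect isolation notes
	P95, P99          time.Duration
	ErrorRate         float64
	KnownGaps         []string
	Owner             string
}

// Problems lists the reasons a reviewer would send the package back.
func (e EvidencePackage) Problems() []string {
	var out []string
	required := []struct{ name, value string }{
		{"mesh policy version", e.MeshPolicyVersion},
		{"source service", e.SourceService},
		{"route", e.Route},
		{"target subset", e.TargetSubset},
		{"isolation", e.Isolation},
		{"owner", e.Owner},
	}
	for _, f := range required {
		if f.value == "" {
			out = append(out, "missing "+f.name)
		}
	}
	if e.MirrorPercent <= 0 || e.MirrorPercent > 100 {
		out = append(out, "mirror percent out of range")
	}
	if e.Start.IsZero() || !e.End.After(e.Start) {
		out = append(out, "time range not closed")
	}
	return out
}

// examplePackage returns a complete package; every value is made up for illustration.
func examplePackage() EvidencePackage {
	start := time.Date(2026, 5, 1, 9, 0, 0, 0, time.UTC)
	return EvidencePackage{
		MeshPolicyVersion: "vs-reviews@r42",
		SourceService:     "frontend",
		Route:             "/api/reviews",
		MirrorPercent:     5,
		TargetSubset:      "v2-shadow",
		Start:             start,
		End:               start.Add(time.Hour),
		Isolation:         "sandbox DB, payment client in dry-run",
		P95:               120 * time.Millisecond,
		P99:               310 * time.Millisecond,
		ErrorRate:         0.002,
		KnownGaps:         []string{"POST routes not mirrored"},
		Owner:             "team-reviews",
	}
}

func main() {
	ep := examplePackage()
	ep.Owner = ""
	ep.End = time.Time{} // experiment never formally stopped
	fmt.Println(ep.Problems()) // [missing owner time range not closed]
}
```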
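<p>The core purpose of the evidence package is to make a mirror experiment closable. A reviewer must be able to see when the mirror policy started and stopped, and which routes it covered. They must also see which downstream resources it consumed and whether the shadow target is close to production.</p>
<h2 id="進階主題">Advanced Topics</h2>
<p><strong>Istio VirtualService mirror / mirror_percent</strong>: Istio specifies the shadow destination with the <code>mirror</code> field of a <code>VirtualService</code>. The ratio is set by <code>mirrorPercentage</code> (v1.7+; older versions use <code>mirror_percent</code>). A common production practice is to start at 1% and observe the shadow target's latency / error / saturation for 30-60 minutes before each increase. After reaching 100%, hold for a week to collect evidence before promoting. Route-level config is safer than a mesh-wide policy because it limits the blast radius to the specified host / path.</p>
<p><strong>Linkerd traffic split</strong>: Linkerd splits traffic with the SMI <code>TrafficSplit</code> CRD or the native <code>HTTPRoute</code>. These are weighted splits: responses from the split backend do go back to users, so they behave more like a canary than a fire-and-forget mirror. Confirm whether your Linkerd version supports request mirroring before relying on it for shadow traffic. Linkerd's proxy overhead is generally lower than Istio's, which suits resource-sensitive K8s clusters. Its L7 policy is less expressive than Istio EnvoyFilter.</p>
<p><strong>Envoy MirrorPolicy</strong>: when you write Envoy config directly, without the Istio control plane, <code>route.RouteAction.request_mirror_policies</code> is the underlying primitive. Envoy-based edge gateways (Contour / Emissary-Ingress / Gloo) are abstractions over this layer. They suit cases that want mirroring without adopting full Istio.</p>
<p><strong>Integrating with Argo Rollouts canary: shadow deployment</strong>: an Argo Rollouts canary can place a mirror phase ahead of traffic shifting and gate it with an <code>analysis</code> step. Recent Argo Rollouts versions offer a <code>setMirrorRoute</code> step for supported traffic routers such as Istio; check your version. The <em>shadow stage</em> collects evidence via mirroring, and only the <em>canary stage</em> sends real traffic. This matches the "shadow before canary" principle in <a href="/blog/backend/09-performance-capacity/production-validation/" data-link-title="9.10 Production-Side 驗證" data-link-desc="shadow traffic、dark launch、canary、production-like load test">9.10 Production-Side Validation</a> and avoids using users as guinea pigs.</p>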
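<p>The ramp-and-stop rhythm can be written as a small decision function that a GitOps job or operator calls after each observation window. This is a minimal sketch in which the step list, the <code>ShadowMetrics</code> fields, and the thresholds are all hypothetical. The returned value is what would be written into <code>mirrorPercentage</code>, and 0 means "remove the mirror".</p>

```go
package main

import "fmt"

// rampSteps is a hypothetical progression for mirrorPercentage.
var rampSteps = []float64{1, 5, 10, 25, 50, 100}

// ShadowMetrics summarizes one observation window on the shadow target.
type ShadowMetrics struct {
	P99Millis float64
	ErrorRate float64 // fraction, 0..1
	CPU       float64 // saturation, 0..1
}

// StopCondition holds the abort thresholds agreed on before the experiment.
type StopCondition struct {
	MaxP99Millis, MaxErrorRate, MaxCPU float64
}

// nextMirrorPercent returns the mirror percentage for the next window. It
// returns 0 (turn mirroring off) if any threshold is breached. Otherwise it
// returns the next step, holding at the current value once the ramp tops out.
func nextMirrorPercent(current float64, m ShadowMetrics, s StopCondition) float64 {
	if m.P99Millis > s.MaxP99Millis || m.ErrorRate > s.MaxErrorRate || m.CPU > s.MaxCPU {
		return 0
	}
	for _, step := range rampSteps {
		if step > current {
			return step
		}
	}
	return current
}

func main() {
	stop := StopCondition{MaxP99Millis: 300, MaxErrorRate: 0.01, MaxCPU: 0.8}
	healthy := ShadowMetrics{P99Millis: 140, ErrorRate: 0.002, CPU: 0.45}
	saturated := ShadowMetrics{P99Millis: 140, ErrorRate: 0.002, CPU: 0.93}
	fmt.Println(nextMirrorPercent(1, healthy, stop))    // 5
	fmt.Println(nextMirrorPercent(25, saturated, stop)) // 0
	fmt.Println(nextMirrorPercent(100, healthy, stop))  // 100
}
```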
<p><strong>Trace correlation with <a href="/blog/backend/04-observability/vendors/datadog/" data-link-title="Datadog" data-link-desc="All-in-one SaaS 觀測平台、APM / Logs / Metrics / RUM / Security">Datadog</a> APM</strong>: mirrored requests should carry a dedicated trace tag (<code>env:shadow</code> or <code>traffic.mirror:true</code>). This lets Datadog APM and the rest of the <a href="/blog/backend/04-observability/" data-link-title="模組四：可觀測性平台" data-link-desc="整理 log、metric、trace、dashboard 與 alert 的後端操作實務">observability stack</a> filter out shadow-path p95 / error rate and keep it off primary SLO dashboards. Trace propagation headers must be preserved; otherwise distributed traces break at the mesh boundary.</p>
<h2 id="排錯與失敗快速判讀">Troubleshooting and Quick Failure Triage</h2>
<ul>
<li><strong>Mirror target capacity shortfall / shadow service OOM</strong>: the shadow deployment has no dedicated HPA and shares a node pool with primary. Fix: split the node pool, give shadow its own resource requests, and start mirroring at 1%.</li>
<li><strong>Mirrored responses go unhandled (a side effect of fire-and-forget)</strong>: Istio discards mirrored responses by default, so shadow-side errors are never collected. Fix: have the shadow service emit its own metrics / logs instead of relying on the mirror response. Add a custom header such as <code>X-Shadow-Request</code>, or check the <code>-shadow</code> Host suffix, so the shadow side can recognize the request and take the dry-run path.</li>
<li><strong>PII / sensitive data reaches staging</strong>: mirrored requests carry real user tokens / payment info into staging. Fix: use EnvoyFilter header / body filtering for PII redaction, or run a <a href="/blog/backend/07-security-data-protection/" data-link-title="模組七：資安與資料保護" data-link-desc="以問題驅動方式擴充資安知識網：先定義服務環節問題，再以案例作為觸發式參考">data masking proxy</a> at the mesh boundary before mirroring.</li>
<li><strong>Side effects actually happen (payment double charge / notifications really sent)</strong>: the shadow service does not recognize mirrored requests and runs production logic. Fix: force the shadow side to use sandbox credentials and put external API clients in mock / dry-run mode. Point the data layer at a read-only replica so writes cannot land.</li>
<li><strong>Mesh control plane saturation / mirror policy drift</strong>: mirror rules are scattered across namespaces without owners, and policy versions diverge. Fix: use GitOps (Argo CD / Flux) plus policy as code, and audit <code>kubectl get virtualservice -A</code> regularly.</li>
<li><strong>Cross-cluster mirror blast radius out of control</strong>: the mirror destination points to another cluster and cross-cluster traffic spikes. Fix: restrict mirror destinations to the same cluster; cross-cluster mirroring must go through a dedicated gateway with a quota.</li>
<li><strong>Shadow traces leak into SLO dashboards</strong>: APM does not separate primary / shadow tags, so p95 looks worse when shadow is actually the cause. Fix: enforce the <code>env:shadow</code> trace tag and filter observability dashboards on it.</li>
</ul>
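<p>The policy-drift and blast-radius items can be turned into a periodic audit. This is a minimal sketch built on several assumptions. The mirror rules are assumed to be already extracted into a hypothetical <code>MirrorRule</code> type, for example from <code>kubectl get virtualservice -A -o json</code> (parsing omitted). The cluster DNS domain is assumed to be the default <code>cluster.local</code>, and an owner is assumed to be recorded per rule. The 10% ceiling is an arbitrary example. In a multi-cluster mesh the same hostname can resolve to endpoints in other clusters, so the DNS check is only a heuristic.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// MirrorRule is a hypothetical flattened view of one VirtualService mirror entry.
type MirrorRule struct {
	Namespace       string
	Name            string
	Owner           string  // e.g. from an owner label; empty if missing
	DestinationHost string  // mirror.host
	Percent         float64 // mirrorPercentage.value
}

// destNamespace resolves which namespace a Kubernetes service host points to.
// ok is false when the host is not an in-cluster service name.
func destNamespace(host, ruleNamespace string) (ns string, ok bool) {
	parts := strings.Split(host, ".")
	switch {
	case len(parts) == 1: // short name: same namespace as the rule
		return ruleNamespace, true
	case len(parts) == 2: // service.namespace
		return parts[1], true
	case len(parts) == 5 && strings.HasSuffix(host, ".svc.cluster.local"):
		return parts[1], true
	default:
		return "", false
	}
}

// auditMirrorRules flags rules with no owner, rules that leave their own
// namespace or the cluster DNS, and rules mirroring more than maxPercent.
func auditMirrorRules(rules []MirrorRule, maxPercent float64) []string {
	var findings []string
	for _, r := range rules {
		id := r.Namespace + "/" + r.Name
		if r.Owner == "" {
			findings = append(findings, id+": no owner")
		}
		ns, inCluster := destNamespace(r.DestinationHost, r.Namespace)
		switch {
		case !inCluster:
			findings = append(findings, id+": destination outside cluster DNS")
		case ns != r.Namespace:
			findings = append(findings, id+": cross-namespace destination "+ns)
		}
		if r.Percent > maxPercent {
			findings = append(findings, fmt.Sprintf("%s: mirror %.0f%% exceeds %.0f%%", id, r.Percent, maxPercent))
		}
	}
	return findings
}

func main() {
	rules := []MirrorRule{
		{Namespace: "shop", Name: "reviews", Owner: "team-reviews", DestinationHost: "reviews-v2", Percent: 5},
		{Namespace: "shop", Name: "cart", DestinationHost: "cart.billing.svc.cluster.local", Percent: 50},
	}
	for _, f := range auditMirrorRules(rules, 10) {
		fmt.Println(f)
	}
}
```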
<h2 id="何時改走其他服務">When to Use Something Else</h2>
<table>
  <thead>
      <tr>
          <th>Requirement shape</th>
          <th>Use instead</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>No mesh / need captured artifacts for offline replay</td>
          <td><a href="/blog/backend/09-performance-capacity/vendors/goreplay/" data-link-title="GoReplay" data-link-desc="用 production HTTP traffic capture 與 replay 驗證真實請求形狀的效能工程工具">GoReplay</a></td>
      </tr>
      <tr>
          <td>L3-L4 packet-layer analysis (IDS / network forensics)</td>
          <td><a href="/blog/backend/09-performance-capacity/vendors/aws-vpc-traffic-mirroring/" data-link-title="AWS VPC Traffic Mirroring" data-link-desc="用 VPC 網路層封包鏡像觀察 production traffic 的低侵入 production validation 方式">AWS VPC Traffic Mirroring</a></td>
      </tr>
      <tr>
          <td>Synthetic load / load testing rather than production mirroring</td>
          <td><a href="/blog/backend/09-performance-capacity/vendors/k6/" data-link-title="k6" data-link-desc="用 scriptable scenario 建立 API、protocol 與 CI 友善壓測的效能工程工具">k6</a> / <a href="/blog/backend/09-performance-capacity/vendors/gatling/" data-link-title="Gatling" data-link-desc="用 JVM DSL、simulation 與 injection profile 表達複雜 scenario 的效能工程工具">Gatling</a></td>
      </tr>
      <tr>
          <td>Overall production-side governance</td>
          <td><a href="/blog/backend/09-performance-capacity/production-validation/" data-link-title="9.10 Production-Side 驗證" data-link-desc="shadow traffic、dark launch、canary、production-like load test">9.10 Production-Side Validation</a></td>
      </tr>
  </tbody>
</table>
<h2 id="不在本頁內的主題">Out of Scope for This Page</h2>
<ul>
<li>Full Istio / Linkerd / Envoy install, upgrade, and control plane HA details</li>
<li>Service mesh security model (mTLS / SPIFFE / authorization policy): belongs to the <a href="/blog/backend/07-security-data-protection/" data-link-title="模組七：資安與資料保護" data-link-desc="以問題驅動方式擴充資安知識網：先定義服務環節問題，再以案例作為觸發式參考">7 security</a> boundary</li>
<li>Mesh-level resilience patterns such as retry / timeout / circuit breaker</li>
<li>Multi-cluster mesh federation (Istio multi-primary, Linkerd multicluster)</li>
</ul>
<h2 id="案例回寫">Case Write-back</h2>
<p>Service mesh mirroring is a good fit for writing back platform-migration and new-version shadow validation cases. It connects to <a href="/blog/backend/05-deployment-platform/cases/miro-managed-eks-migration/" data-link-title="5.C5 Miro：Managed EKS 遷移" data-link-desc="從自維運平台轉向 managed EKS 的組織與技術協同案例。">Miro managed EKS migration</a> and <a href="/blog/backend/05-deployment-platform/cases/tradeshift-self-managed-k8s-to-eks/" data-link-title="5.C1 Tradeshift：self-managed Kubernetes 遷移到 EKS" data-link-desc="零停機平台遷移的分段策略案例。">Tradeshift self-managed K8s to EKS</a>. It also connects to the step-by-step validation needs of the <a href="/blog/backend/09-performance-capacity/cases/fanduel-dual-peak-betting-streaming/" data-link-title="9.C28 FanDuel：體育直播 &#43; 投注的雙重峰值" data-link-desc="FanDuel 3.5M MAU、Super Bowl 期間擴容 5-10 倍、用 AWS Local Zones &#43; Wavelength &#43; Outposts 處理 20&#43; 州的雙重峰值">9.C28 FanDuel dual-peak workload</a>, and to cross-cluster traffic shadowing under the single-tenant-per-game model of <a href="/blog/backend/09-performance-capacity/cases/riot-games-eks-multi-cluster/" data-link-title="9.C12 Riot Games：246 個 EKS cluster 的多遊戲多地區治理" data-link-desc="Riot Games 從 Mesos 遷移到 EKS、用 246 個 cluster 跨遊戲跨地區治理、年省 1000 萬美金">9.C12 Riot Games' 246 EKS clusters</a>. Finally, it relates to cross-service mirror scope governance for <a href="/blog/backend/09-performance-capacity/cases/lyft-microservice-eight-x-peak/" data-link-title="9.C7 Lyft：100&#43; 微服務在 8 倍峰值下的 Auto Scaling" data-link-desc="Lyft 用 AWS Auto Scaling 跨 100&#43; 個微服務承載 8 倍峰值流量、跨 200&#43; 城市">9.C7 Lyft's 100+ microservices</a>.</p>
<p>What these cases share is routing policy and blast radius. When this page cites a case, it should translate the case into route, mirror ratio, target subset, dependency isolation, and abort condition. For example, under Riot Games' single-tenant model, the mirror policy must stay within the <em>same game's</em> cluster; mirroring across games would let the blast radius get out of control.</p>
<h2 id="下一步路由">Next Steps</h2>
<ul>
<li>Upstream: <a href="/blog/backend/09-performance-capacity/production-validation/" data-link-title="9.10 Production-Side 驗證" data-link-desc="shadow traffic、dark launch、canary、production-like load test">9.10 Production-Side Validation</a></li>
<li>Upstream: <a href="/blog/backend/05-deployment-platform/traffic-config-control-plane-boundary/" data-link-title="5.7 Traffic、Config 與 Control Plane Boundary" data-link-desc="說明流量、設定、secret、service discovery 與管理面如何分責任與回退。">5.7 Traffic, Config and Control Plane Boundary</a></li>
<li>Parallel: <a href="/blog/backend/09-performance-capacity/vendors/goreplay/" data-link-title="GoReplay" data-link-desc="用 production HTTP traffic capture 與 replay 驗證真實請求形狀的效能工程工具">GoReplay</a></li>
<li>Parallel: <a href="/blog/backend/09-performance-capacity/vendors/aws-vpc-traffic-mirroring/" data-link-title="AWS VPC Traffic Mirroring" data-link-desc="用 VPC 網路層封包鏡像觀察 production traffic 的低侵入 production validation 方式">AWS VPC Traffic Mirroring</a></li>
<li>Knowledge card: <a href="/blog/backend/knowledge-cards/shadow-traffic/" data-link-title="Shadow Traffic" data-link-desc="把 production traffic 複製到新版本驗證、但不返回結果給用戶的測試模式">Shadow Traffic</a></li>
</ul>
]]></content:encoded></item><item><title>Setting Up and Operating mTLS in Practice: CA Hierarchy, Certificate Lifecycle, Revocation</title><link>https://tarrragon.github.io/blog/work-log/mtls-%E5%AF%A6%E9%9A%9B%E6%80%8E%E9%BA%BC%E8%A8%AD%E5%AE%9A%E8%88%87%E9%81%8B%E7%B6%ADca-%E9%9A%8E%E5%B1%A4%E6%86%91%E8%AD%89%E7%94%9F%E5%91%BD%E9%80%B1%E6%9C%9F%E6%92%A4%E9%8A%B7%E6%A9%9F%E5%88%B6/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/work-log/mtls-%E5%AF%A6%E9%9A%9B%E6%80%8E%E9%BA%BC%E8%A8%AD%E5%AE%9A%E8%88%87%E9%81%8B%E7%B6%ADca-%E9%9A%8E%E5%B1%A4%E6%86%91%E8%AD%89%E7%94%9F%E5%91%BD%E9%80%B1%E6%9C%9F%E6%92%A4%E9%8A%B7%E6%A9%9F%E5%88%B6/</guid><description>&lt;h2 id="mtls-這篇要解決什麼">What This mTLS Article Addresses&lt;/h2>
&lt;p>The core of mTLS is binding system identity to an X.509 certificate and private key instead of a reusable shared secret. Introductory articles often reduce it to "two-way TLS certificates, suitable for finance and healthcare". In real deployments, the design work immediately extends to CA hierarchy, certificate lifecycle, revocation, and infrastructure integration:&lt;/p>
&lt;ul>
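&lt;li>Self-signed private CA or commercial CA?&lt;/li>
&lt;li>Where do certificates live, and how are they rotated?&lt;/li>
&lt;li>How are they revoked: CRL, OCSP, or short-lived certs?&lt;/li>
&lt;li>How do you write the nginx config, and how do you integrate with a service mesh?&lt;/li>
&lt;li>Compared with API Keys and OAuth, when is mTLS worth its operating cost?&lt;/li>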
&lt;/ul>
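&lt;p>These are basic questions the first mTLS deployment already has to answer. If you only know "two-way certificates" and have no lifecycle design, the system becomes unpredictable when certificates expire, get revoked, or the mesh is upgraded.&lt;/p>
&lt;p>This article breaks down the engineering practice of mTLS:&lt;/p>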
&lt;ol>
&lt;li>&lt;strong>CA hierarchy&lt;/strong>: why to layer it; Root CA / Intermediate CA / Leaf cert&lt;/li>
&lt;li>&lt;strong>Certificate lifecycle&lt;/strong>: issuance, storage, rotation, revocation&lt;/li>
&lt;li>&lt;strong>Infrastructure integration&lt;/strong>: nginx / envoy / service mesh configuration patterns&lt;/li>
&lt;li>&lt;strong>Trade-offs against other Layer 2 options&lt;/strong>: when mTLS is the right choice&lt;/li>
&lt;/ol>
&lt;blockquote>
&lt;p>&lt;strong>Where this article fits&lt;/strong>: this is one of the deep dives on Layer 2 of &lt;a href="https://tarrragon.github.io/blog/work-log/api-%E8%AA%8D%E8%AD%89%E7%9A%84%E4%B8%89%E5%B1%A4%E4%BF%A1%E4%BB%BB%E9%82%8A%E7%95%8C%E4%BD%BF%E7%94%A8%E8%80%85%E7%B3%BB%E7%B5%B1%E8%B7%A8%E7%B3%BB%E7%B5%B1-provisioning/" data-link-title="API 認證的三層信任邊界：使用者、系統、跨系統 Provisioning" data-link-desc="API 認證的信任邊界分層（Bearer Token / Shared Secret / Provisioning）：各層的洩漏後果與撤銷方式，以及混用造成的設計失效模式。">The Three Trust Boundaries of API Authentication&lt;/a>. The main article covers "why systems need their own credentials"; this article covers "the concrete engineering details of implementing that layer with mTLS".&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="mtls-解什麼問題">mTLS 解什麼問題&lt;/h2>
&lt;h3 id="跟一般-tls-的差異">跟一般 TLS 的差異&lt;/h3>
&lt;p>一般 TLS（HTTPS）是&lt;strong>單向認證&lt;/strong>：client 驗證 server 身分，server 再透過 API Key、token 或 session 辨識 client。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">client ────&amp;#34;我要連 example.com&amp;#34;────▶ server
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ◀───server 出示憑證───────── server
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> 驗證:&amp;#34;這是真的 example.com 嗎&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> 建立加密通道&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>client 驗證 server、但 server 不驗證 client。Client 是匿名的、靠後續 API Key / token 認證。&lt;/p>
&lt;p>mTLS 加上&lt;strong>反向驗證&lt;/strong>：server 也在 TLS handshake 階段驗證 client 憑證，把系統身分提前到連線層建立。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">client ──&amp;#34;我要連 example.com、這是我的憑證&amp;#34;──▶ server
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ◀──server 出示憑證───────────────────── server
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> 
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> 雙方驗證對方憑證：
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> client: &amp;#34;這是真的 example.com 嗎&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> server: &amp;#34;這個 client 是被授權的嗎&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl"> ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl"> 建立加密通道、且雙方都已認證&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>每個 client 有自己的憑證、server 用 CA 信任鏈驗證 client 憑證是否合法。&lt;strong>Client 的身分綁定在 X.509 憑證上、不需要額外的 API Key&lt;/strong>。&lt;/p>
&lt;h3 id="mtls-解的具體威脅">mTLS 解的具體威脅&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
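 &lt;th>Threat&lt;/th>
 &lt;th>Regular TLS + API Key&lt;/th>
 &lt;th>mTLS&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Man-in-the-middle interception&lt;/td>
 &lt;td>Handled by TLS&lt;/td>
 &lt;td>Handled by TLS&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Attacker impersonates the client with a leaked API Key&lt;/td>
 &lt;td>Exposed&lt;/td>
 &lt;td>Requires the client private key, which network observation alone cannot obtain&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>API Key hard-coded in client code and recovered by decompilation&lt;/td>
 &lt;td>Exposed&lt;/td>
 &lt;td>Private key can live in hardware (HSM / TPM / Secure Enclave)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Server-side per-client credentials compromised&lt;/td>
 &lt;td>Exposed (API Key DB leak)&lt;/td>
 &lt;td>Server holds no per-client secret; only the CA trust configuration is exposed&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Client host compromised and used to move laterally with a legitimate identity&lt;/td>
 &lt;td>Partial (rate limit)&lt;/td>
 &lt;td>Same (relies on revocation)&lt;/td>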
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
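&lt;p>The core advantage of mTLS: &lt;strong>the client's private key is scope-bound and never shared across systems&lt;/strong>. In principle the private key never leaves the client, and verification relies on a CA signature rather than a reusable string. Compared with a shared API Key, a leaked client private key can usually be contained to that client's certificate and authorization scope.&lt;/p></description><content:encoded><![CDATA[<h2 id="mtls-這篇要解決什麼">What This mTLS Article Addresses</h2>
<p>The core of mTLS is binding system identity to an X.509 certificate and private key instead of a reusable shared secret. Introductory articles often reduce it to "two-way TLS certificates, suitable for finance and healthcare". In real deployments, the design work immediately extends to CA hierarchy, certificate lifecycle, revocation, and infrastructure integration:</p>
<ul>
<li>Self-signed private CA or commercial CA?</li>
<li>Where do certificates live, and how are they rotated?</li>
<li>How are they revoked: CRL, OCSP, or short-lived certs?</li>
<li>How do you write the nginx config, and how do you integrate with a service mesh?</li>
<li>Compared with API Keys and OAuth, when is mTLS worth its operating cost?</li>
</ul>
<p>These are basic questions the first mTLS deployment already has to answer. If you only know "two-way certificates" and have no lifecycle design, the system becomes unpredictable when certificates expire, get revoked, or the mesh is upgraded.</p>
<p>This article breaks down the engineering practice of mTLS:</p>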
<ol>
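<li><strong>CA hierarchy</strong>: why to layer it; Root CA / Intermediate CA / Leaf cert</li>
<li><strong>Certificate lifecycle</strong>: issuance, storage, rotation, revocation</li>
<li><strong>Infrastructure integration</strong>: nginx / envoy / service mesh configuration patterns</li>
<li><strong>Trade-offs against other Layer 2 options</strong>: when mTLS is the right choice</li>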
</ol>
<blockquote>
<p><strong>Where this article fits</strong>: this is one of the deep dives on Layer 2 of <a href="/blog/work-log/api-%E8%AA%8D%E8%AD%89%E7%9A%84%E4%B8%89%E5%B1%A4%E4%BF%A1%E4%BB%BB%E9%82%8A%E7%95%8C%E4%BD%BF%E7%94%A8%E8%80%85%E7%B3%BB%E7%B5%B1%E8%B7%A8%E7%B3%BB%E7%B5%B1-provisioning/" data-link-title="API 認證的三層信任邊界：使用者、系統、跨系統 Provisioning" data-link-desc="API 認證的信任邊界分層（Bearer Token / Shared Secret / Provisioning）：各層的洩漏後果與撤銷方式，以及混用造成的設計失效模式。">The Three Trust Boundaries of API Authentication</a>. The main article covers "why systems need their own credentials"; this article covers "the concrete engineering details of implementing that layer with mTLS".</p></blockquote>
<hr>
<h2 id="mtls-解什麼問題">What Problem mTLS Solves</h2>
<h3 id="跟一般-tls-的差異">Difference from Regular TLS</h3>
<p>Regular TLS (HTTPS) is <strong>one-way authentication</strong>: the client verifies the server's identity, and the server then identifies the client through an API Key, token, or session.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">client ────&#34;I want to connect to example.com&#34;────▶ server
</span></span><span class="line"><span class="cl">       ◀───server presents its certificate───── server
</span></span><span class="line"><span class="cl">       client verifies: &#34;is this really example.com?&#34;
</span></span><span class="line"><span class="cl">       ↓
</span></span><span class="line"><span class="cl">       encrypted channel established</span></span></code></pre></div><p>The client verifies the server, but the server does not verify the client. At this layer the client is anonymous; it is authenticated later through an API Key / token.</p>
<p>mTLS adds <strong>reverse verification</strong>: the server also verifies the client certificate during the TLS handshake. This establishes system identity at the connection layer, before any request is processed.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">client ──&#34;I want to connect to example.com&#34;──────────▶ server
</span></span><span class="line"><span class="cl">       ◀──server certificate + request for a client certificate── server
</span></span><span class="line"><span class="cl">client ──client certificate + signature proving key possession──▶ server
</span></span><span class="line"><span class="cl">       each side verifies the other's certificate:
</span></span><span class="line"><span class="cl">       client: &#34;is this really example.com?&#34;
</span></span><span class="line"><span class="cl">       server: &#34;was this client certificate issued by a CA I trust?&#34;
</span></span><span class="line"><span class="cl">       ↓
</span></span><span class="line"><span class="cl">       encrypted channel established, with both sides authenticated</span></span></code></pre></div><p>Each client has its own certificate, and the server validates it against the CA trust chain. <strong>The client's identity is bound to its X.509 certificate, so no separate API Key is needed</strong>.</p>
<h3 id="mtls-解的具體威脅">Concrete Threats mTLS Addresses</h3>
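<p>In Go's <code>crypto/tls</code>, the handshake difference comes down to one setting: <code>ClientAuth: tls.RequireAndVerifyClientCert</code>, together with a <code>ClientCAs</code> pool. With it, the server rejects any connection whose client certificate does not chain to that CA. This is a minimal, self-contained sketch. It creates a throwaway in-memory CA (a single self-signed root with no Intermediate, for brevity) and issues one server and one client certificate. The server then reads the client identity from the verified certificate's Common Name. The name <code>billing-service</code> is illustrative.</p>

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"io"
	"log"
	"math/big"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// issue creates a P-256 certificate for cn, signed by parent (self-signed if
// nil), and returns it in the tls.Certificate form that crypto/tls expects.
func issue(cn string, isCA bool, eku []x509.ExtKeyUsage, ips []net.IP, parent *tls.Certificate) tls.Certificate {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(time.Now().UnixNano()),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		BasicConstraintsValid: true,
		IsCA:                  isCA,
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           eku,
		IPAddresses:           ips,
	}
	if isCA {
		tmpl.KeyUsage |= x509.KeyUsageCertSign
	}
	issuer, issuerKey := tmpl, any(key)
	if parent != nil {
		issuer, issuerKey = parent.Leaf, parent.PrivateKey
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, issuer, &key.PublicKey, issuerKey)
	if err != nil {
		panic(err)
	}
	leaf, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key, Leaf: leaf}
}

// runMTLSDemo returns the identity the server read from a client certificate,
// and the error that an anonymous client (no certificate) receives.
func runMTLSDemo() (string, error) {
	ca := issue("demo-root", true, nil, nil, nil)
	server := issue("server", false, []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth}, []net.IP{net.ParseIP("127.0.0.1")}, &ca)
	client := issue("billing-service", false, []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}, nil, &ca)
	pool := x509.NewCertPool()
	pool.AddCert(ca.Leaf)

	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// The handshake already verified the chain; identity comes from the certificate.
		fmt.Fprint(w, r.TLS.PeerCertificates[0].Subject.CommonName)
	}))
	srv.Config.ErrorLog = log.New(io.Discard, "", 0) // hide the expected handshake failure
	srv.TLS = &tls.Config{
		Certificates: []tls.Certificate{server},
		ClientAuth:   tls.RequireAndVerifyClientCert, // the "m" in mTLS
		ClientCAs:    pool,
	}
	srv.StartTLS()
	defer srv.Close()

	newClient := func(certs []tls.Certificate) *http.Client {
		tr := &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool, Certificates: certs}}
		return &http.Client{Transport: tr, Timeout: 5 * time.Second}
	}
	resp, err := newClient([]tls.Certificate{client}).Get(srv.URL)
	if err != nil {
		return "", err
	}
	body, _ := io.ReadAll(resp.Body)
	resp.Body.Close()

	_, anonErr := newClient(nil).Get(srv.URL)
	return string(body), anonErr
}

func main() {
	name, anonErr := runMTLSDemo()
	fmt.Println("client identity:", name)
	fmt.Println("anonymous client rejected:", anonErr != nil)
}
```

<p>The handler never looks at an API Key: the only identity it trusts is the one that survived certificate verification in the handshake.</p>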
<table>
  <thead>
      <tr>
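          <th>Threat</th>
          <th>Regular TLS + API Key</th>
          <th>mTLS</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Man-in-the-middle interception</td>
          <td>Handled by TLS</td>
          <td>Handled by TLS</td>
      </tr>
      <tr>
          <td>Attacker impersonates the client with a leaked API Key</td>
          <td>Exposed</td>
          <td>Requires the client private key, which network observation alone cannot obtain</td>
      </tr>
      <tr>
          <td>API Key hard-coded in client code and recovered by decompilation</td>
          <td>Exposed</td>
          <td>Private key can live in hardware (HSM / TPM / Secure Enclave)</td>
      </tr>
      <tr>
          <td>Server-side per-client credentials compromised</td>
          <td>Exposed (API Key DB leak)</td>
          <td>Server holds no per-client secret; only the CA trust configuration is exposed</td>
      </tr>
      <tr>
          <td>Client host compromised and used to move laterally with a legitimate identity</td>
          <td>Partial (rate limit)</td>
          <td>Same (relies on revocation)</td>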
      </tr>
  </tbody>
</table>
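<p>The core advantage of mTLS: <strong>the client's private key is scope-bound and never shared across systems</strong>. In principle the private key never leaves the client, and verification relies on a CA signature rather than a reusable string. Compared with a shared API Key, a leaked client private key can usually be contained to that client's certificate and authorization scope.</p>
<p>The cost: <strong>complex PKI infrastructure</strong>, heavy certificate lifecycle management, and high operating overhead.</p>
<hr>
<h2 id="ca-階層設計">CA Hierarchy Design</h2>
<h3 id="為什麼要分層">Why Layer the CA</h3>
<p>The core purpose of a CA hierarchy is to minimize how often the highest trust root is exposed. The intuitive approach is to "sign client certificates directly with a single Root CA":</p>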





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Root CA ──signs──▶ client-A.crt
</span></span><span class="line"><span class="cl">        ──signs──▶ client-B.crt
</span></span><span class="line"><span class="cl">        ──signs──▶ client-C.crt
</span></span><span class="line"><span class="cl">        ...</span></span></code></pre></div><p>The Root CA private key is the highest trust root of the whole PKI and usually requires offline storage, an HSM, and multi-person sign-off. If it leaks, every system that trusts this Root has to rebuild trust. Because a Root CA typically lives 10-20 years, replacing it is extremely expensive.</p>
<p>If the Root CA private key has to come out frequently to sign client certs, its exposure risk rises sharply.</p>
<p>The solution is <strong>layering</strong>. The Root CA signs only Intermediate CAs, and the Intermediate CA handles day-to-day client cert issuance:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="cl">Root CA (offline, 20 years)
</span></span><span class="line"><span class="cl">    ↓ signs (only when an Intermediate is created or renewed)
</span></span><span class="line"><span class="cl">Intermediate CA (online, 1-5 years)
</span></span><span class="line"><span class="cl">    ↓ signs (daily; each leaf valid 90 days to 1 year)
</span></span><span class="line"><span class="cl">Leaf certificates (client / server)</span></span></code></pre></div><p>The Root CA is usually kept <strong>fully offline</strong> (air-gapped machine, hardware HSM). Its private key comes out only a few times a year, to sign Intermediates. Only the Intermediate CA is online and handles day-to-day issuance.</p>
<h3 id="階層帶來的好處">Benefits of the Hierarchy</h3>
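<p>Go's <code>crypto/x509</code> can exercise the three-tier chain end to end. This is a minimal sketch, not a production CA. Everything lives in memory, lifetimes are shortened to hours instead of the years in the diagram, and names such as <code>demo-root</code> are illustrative. It shows a verifier whose trust store holds only the Root. That verifier accepts a leaf when the Intermediate is supplied in the chain, and rejects the same leaf when the Intermediate is missing.</p>

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// node pairs a certificate with its private key.
type node struct {
	cert *x509.Certificate
	key  *ecdsa.PrivateKey
}

// sign issues a certificate for cn; parent == nil means self-signed (the Root).
// For CAs, maxPathLen 0 means "may sign leaves but no further CAs".
func sign(cn string, isCA bool, maxPathLen int, eku []x509.ExtKeyUsage, parent *node) node {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(time.Now().UnixNano()),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(2 * time.Hour),
		BasicConstraintsValid: true,
		IsCA:                  isCA,
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           eku,
	}
	if isCA {
		tmpl.KeyUsage |= x509.KeyUsageCertSign
		tmpl.MaxPathLen = maxPathLen
		tmpl.MaxPathLenZero = maxPathLen == 0
	}
	issuer, issuerKey := tmpl, key
	if parent != nil {
		issuer, issuerKey = parent.cert, parent.key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, issuer, &key.PublicKey, issuerKey)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return node{cert, key}
}

// buildChain creates Root -> Intermediate -> client leaf, mirroring the diagram.
func buildChain() (node, node, node) {
	root := sign("demo-root", true, 1, nil, nil)              // offline in practice
	inter := sign("demo-intermediate", true, 0, nil, &root)   // online, day-to-day issuer
	leaf := sign("billing-service", false, 0, []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}, &inter)
	return root, inter, leaf
}

// verifyLeaf checks leaf against a trust store that holds only root. When
// sendIntermediate is true, intermediate is supplied as the untrusted chain a
// peer would send during the handshake. It returns the verified chain length.
func verifyLeaf(root, intermediate, leaf *x509.Certificate, sendIntermediate bool) (int, error) {
	roots := x509.NewCertPool()
	roots.AddCert(root)
	inters := x509.NewCertPool()
	if sendIntermediate {
		inters.AddCert(intermediate)
	}
	chains, err := leaf.Verify(x509.VerifyOptions{
		Roots:         roots,
		Intermediates: inters,
		KeyUsages:     []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
	})
	if err != nil {
		return 0, err
	}
	return len(chains[0]), nil
}

func main() {
	root, inter, leaf := buildChain()
	n, err := verifyLeaf(root.cert, inter.cert, leaf.cert, true)
	fmt.Println("chain length with Intermediate sent:", n, "error:", err)
	_, err = verifyLeaf(root.cert, inter.cert, leaf.cert, false)
	fmt.Println("rejected without Intermediate:", err != nil)
}
```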
<table>
  <thead>
      <tr>
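          <th>Benefit</th>
          <th>Mechanism</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Root CA private key exposure kept to a minimum</td>
          <td>Used only to sign Intermediates; offline the rest of the time</td>
      </tr>
      <tr>
          <td>A compromised Intermediate can be replaced</td>
          <td>The Root CA revokes that Intermediate, and a new Intermediate takes over signing</td>
      </tr>
      <tr>
          <td>Intermediates can be split by purpose</td>
          <td>One for server certs, one for client certs, one for internal services</td>
      </tr>
      <tr>
          <td>Verifiers only need to trust the Root</td>
          <td>The relying party's trust store holds only the Root CA; the Intermediate is sent along in the certificate chain</td>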
      </tr>
  </tbody>
</table>
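<p>The certificates themselves can enforce the "split Intermediates by purpose" row. Suppose an Intermediate carries an Extended Key Usage limited to <code>clientAuth</code>. Go's verifier then requires every certificate in the chain to permit the requested usage, so it refuses any chain through that Intermediate when <code>serverAuth</code> is requested. Other X.509 validators may treat EKU nesting differently. This is a minimal sketch in which a hypothetical "client-cert Intermediate" is misused to sign a server certificate; everything is generated in memory.</p>

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// node pairs a certificate with its private key.
type node struct {
	cert *x509.Certificate
	key  *ecdsa.PrivateKey
}

// sign issues a P-256 certificate for cn restricted to eku (nil = no EKU
// extension); parent == nil means self-signed.
func sign(cn string, isCA bool, eku []x509.ExtKeyUsage, parent *node) node {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(time.Now().UnixNano()),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		BasicConstraintsValid: true,
		IsCA:                  isCA,
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           eku,
	}
	if isCA {
		tmpl.KeyUsage |= x509.KeyUsageCertSign
	}
	issuer, issuerKey := tmpl, key
	if parent != nil {
		issuer, issuerKey = parent.cert, parent.key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, issuer, &key.PublicKey, issuerKey)
	if err != nil {
		panic(err)
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		panic(err)
	}
	return node{cert, key}
}

// verifyFor checks leaf against a Root-only trust store for one requested
// usage, passing the Intermediate as the untrusted chain a peer would send.
func verifyFor(root, inter, leaf node, usage x509.ExtKeyUsage) error {
	roots, inters := x509.NewCertPool(), x509.NewCertPool()
	roots.AddCert(root.cert)
	inters.AddCert(inter.cert)
	_, err := leaf.cert.Verify(x509.VerifyOptions{
		Roots:         roots,
		Intermediates: inters,
		KeyUsages:     []x509.ExtKeyUsage{usage},
	})
	return err
}

// demo reports whether a proper client leaf verifies, and whether a server
// leaf mis-issued by the client-only Intermediate is rejected for its usage.
func demo() (clientOK, rogueRejected bool) {
	root := sign("demo-root", true, nil, nil)
	clientCA := sign("client-intermediate", true, []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}, &root)
	good := sign("billing-service", false, []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth}, &clientCA)
	rogue := sign("api.example.internal", false, []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth}, &clientCA)

	clientOK = verifyFor(root, clientCA, good, x509.ExtKeyUsageClientAuth) == nil
	var invalid x509.CertificateInvalidError
	err := verifyFor(root, clientCA, rogue, x509.ExtKeyUsageServerAuth)
	rogueRejected = errors.As(err, &invalid) && invalid.Reason == x509.IncompatibleUsage
	return clientOK, rogueRejected
}

func main() {
	clientOK, rogueRejected := demo()
	fmt.Println("client leaf accepted:", clientOK)
	fmt.Println("server leaf from client-only Intermediate rejected:", rogueRejected)
}
```

<p>The point of the design: even if the client-cert Intermediate's key is misused, certificates it signs cannot pass as server certificates in validators that enforce EKU across the chain.</p>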
<h3 id="三種典型部署模式">Three Typical Deployment Models</h3>
<h4 id="模式-a自管-ca">Model A: Self-managed CA</h4>
<p>Run the entire CA infrastructure yourself:</p>
<ul>
<li>Root CA: offline HSM, with an annual ceremony to sign Intermediates</li>
<li>Intermediate CA: online, using tools such as <code>step-ca</code> (from Smallstep), <code>cfssl</code>, or <code>Vault PKI</code></li>
<li>Leaf cert: automated issuance, short TTL</li>
</ul>
<p>Best for: purely internal systems that need no public trust and want full control of the CA infrastructure.</p>
<h4 id="模式-b商業-cadigicert--sectigo--entrust">Model B: Commercial CA (DigiCert / Sectigo / Entrust)</h4>
<p>Buy a commercial CA service; commercial CA roots are already embedded in every OS / browser trust store:</p>
<ul>
<li>Best for: cases that need public trust (HTTPS server certs, SSL/TLS for end users)</li>
<li>mTLS client certs are usually verified inside your own closed system, where public trust adds little value, so commercial CAs are rarely used for them</li>
</ul>
<h4 id="模式-ccloud-managed-pki">Model C: Cloud-managed PKI</h4>
<p>Cloud vendors offer managed PKI:</p>
<ul>
<li>AWS Private CA (ACM PCA): managed Root + Intermediate</li>
<li>GCP Certificate Authority Service</li>
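<li>Azure Key Vault Certificates (manages certificate lifecycle and can issue through integrated CAs, but is not a full managed private CA hierarchy like the two above)</li>
</ul>
<p>Best for: teams already on one cloud that do not want to run CA infrastructure and can accept vendor lock-in.</p>
<h3 id="自管-ca-的最小工具鏈">Minimal Toolchain for a Self-managed CA</h3>
<p>If you choose Model A, these are the recommended tools:</p>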
<table>
  <thead>
      <tr>
          <th>Tool</th>
          <th>Purpose</th>
          <th>Characteristics</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>step-ca</strong></td>
          <td>Lightweight CA server with ACME support</td>
          <td>Open source from Smallstep; simple to configure</td>
      </tr>
      <tr>
          <td><strong>HashiCorp Vault PKI</strong></td>
          <td>Vault's built-in PKI engine</td>
          <td>Integrates with existing Vault secret management</td>
      </tr>
      <tr>
          <td><strong>cfssl</strong></td>
          <td>Cloudflare's CA toolkit</td>
          <td>CLI-based; suits build pipelines</td>
      </tr>
      <tr>
          <td><strong>OpenSSL</strong></td>
          <td>Build a CA entirely by hand</td>
          <td>High maintenance cost; suits learning and small scale</td>
      </tr>
  </tbody>
</table>
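<p><code>step-ca</code> is the lowest-friction starting point: a single <code>step ca init</code> creates a Root and an Intermediate CA plus the server configuration, and adding an ACME provisioner lets clients obtain and renew certs automatically.</p>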
<hr>
<h2 id="憑證生命週期">Certificate lifecycle</h2>
<h3 id="簽發">Issuance</h3>
<p><strong>Server cert issuance flow</strong>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">1. Server generates a private key (RSA 2048+ or ECDSA P-256)
</span></span><span class="line"><span class="ln">2</span><span class="cl">2. Server uses the private key to create a CSR (Certificate Signing Request)
</span></span><span class="line"><span class="ln">3</span><span class="cl">3. Server sends the CSR to the CA
</span></span><span class="line"><span class="ln">4</span><span class="cl">4. CA checks the CSR contents (DN, SAN, usage) and that the requester may hold those names
</span></span><span class="line"><span class="ln">5</span><span class="cl">5. CA signs the cert with the Intermediate CA private key
</span></span><span class="line"><span class="ln">6</span><span class="cl">6. CA returns the signed cert to the server
</span></span><span class="line"><span class="ln">7</span><span class="cl">7. Server deploys the cert together with its own private key</span></span></code></pre></div><p><strong>Client cert issuance flow</strong>: same as for servers, except the SAN is usually a client identifier (service name, device ID) rather than a hostname.</p>
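<h3 id="私鑰留在產生端">The private key stays where it was generated</h3>
<p>The key security principle: <strong>a private key lives only where it was generated</strong>. The CA receives only the CSR (which carries just the public key) and sends back the signed cert; the client's private key stays inside the client's controlled environment the whole time.</p>
<p><strong>Failure modes</strong>:</p>
<ul>
<li>The CA generates the keypair for the client and sends the private key along with the cert (the key has passed through the CA's hands)</li>
<li>Packing the private key and cert into a PKCS12 file and sending it by email</li>
<li>Committing the keypair to a public git repo</li>
</ul>
<p><strong>The right procedure</strong>:</p>
<ul>
<li>The client generates the keypair and sends only the CSR (public key only) to the CA; the signed cert comes back, and the private key never leaves the client</li>
</ul>
<h3 id="儲存">Storage</h3>
<p>Private key storage options by protection level:</p>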
<table>
  <thead>
      <tr>
          <th>Method</th>
          <th>Protection level</th>
          <th>Fits</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Plain file (chmod 600)</td>
          <td>Low</td>
          <td>dev / staging, low-risk environments without an HSM</td>
      </tr>
      <tr>
          <td>OS keystore (Keychain / Windows Cert Store)</td>
          <td>Medium</td>
          <td>Desktop clients, laptops</td>
      </tr>
      <tr>
          <td>HSM (hardware security module)</td>
          <td>High</td>
          <td>Finance, government; the private key never leaves the hardware</td>
      </tr>
      <tr>
          <td>Cloud KMS (AWS KMS / GCP KMS)</td>
          <td>Medium-high</td>
          <td>Cloud-native; the private key lives in KMS and signing goes through its API</td>
      </tr>
      <tr>
          <td>TPM / Secure Enclave</td>
          <td>High</td>
          <td>Mobile / IoT; bound to the hardware</td>
      </tr>
  </tbody>
</table>
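<p>Private keys for production server certs should at least be protected at the OS level (file permissions + disk encryption); high-sensitivity environments should move them into an HSM.</p>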
<h3 id="rotation">Rotation</h3>
<p>Rotating mTLS certs is conceptually similar to <a href="/blog/work-log/shared-secret-%E5%AE%89%E5%85%A8%E8%BC%AA%E6%9B%BF%E8%A8%AD%E8%A8%88%E9%9B%99%E5%AF%86%E9%81%8E%E6%B8%A1%E6%9C%9F%E8%87%AA%E5%8B%95%E5%8C%96%E8%88%87%E7%B7%8A%E6%80%A5%E6%B5%81%E7%A8%8B/" data-link-title="Shared Secret 安全輪替設計：雙密過渡期、自動化與緊急流程" data-link-desc="系統間 Shared Secret 輪替的核心機制：dual-secret rollover、自動化工具比較（AWS Secrets Manager / Vault / GCP）、緊急 rotation 流程與多 client 環境的失敗模式。">shared secret rotation</a>, with some concrete differences:</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Shared Secret</th>
          <th>mTLS Cert</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Expiry</td>
          <td>None built in; must be rotated manually</td>
          <td>Built-in <code>notBefore</code> / <code>notAfter</code>; expires on its own</td>
      </tr>
      <tr>
          <td>Overlap period</td>
          <td>Both secrets valid at the same time</td>
          <td>The old cert (until its <code>notAfter</code>) and the newly issued cert are both accepted, because peers validate against the CA rather than a specific cert, so nothing has to be re-registered with them</td>
      </tr>
      <tr>
          <td>Rotation trigger</td>
          <td>Schedule</td>
          <td>Schedule + automatic renewal before expiry</td>
      </tr>
  </tbody>
</table>
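<p>Rotation patterns in practice:</p>
<p><strong>Short TTL + automatic renewal (recommended)</strong>:</p>
<ul>
<li>Keep leaf cert TTLs short (24 hours to 7 days)</li>
<li>Use the ACME protocol (the protocol Let&rsquo;s Encrypt uses) so clients renew automatically</li>
<li>Rotation is carried by the renewal process, which swaps in a new cert before the old one expires</li>
</ul>
<p>Tools: <code>cert-manager</code> (K8s), <code>step-ca</code> + <code>step</code>, <code>certbot</code>.</p>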
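<p><strong>Medium TTL + semi-automatic (traditional)</strong>:</p>
<ul>
<li>1-year TTL, rotated manually once a year</li>
<li>Track every cert's <code>notAfter</code> in an inventory tool, with automatic alerts 30 days before expiry</li>
<li>Fits legacy architectures and cases where short TTLs aren't feasible</li>
</ul>
<p><strong>Long TTL (not recommended)</strong>:</p>
<ul>
<li>Multi-year TTL, effectively never rotated</li>
<li>Very long key exposure window, and a long gap between a leak and its discovery</li>
<li>Only defensible case: IoT devices that cannot receive OTA updates</li>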
</ul>
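<h3 id="撤銷">Revocation</h3>
<p>When a cert has to stop working before its <code>notAfter</code> (key leak, employee departure, contract termination), you need a revocation mechanism. There are three mainstream approaches:</p>
<h4 id="crlcertificate-revocation-list">CRL (Certificate Revocation List)</h4>
<p>The CA maintains a <strong>list of revoked certificates</strong> and publishes it periodically (hourly to daily). The verifying side must:</p>
<ol>
<li>Download the latest CRL</li>
<li>On each connection, check whether the peer's cert is on the CRL</li>
</ol>
<p><strong>Pros</strong>: simple, lightweight infrastructure.</p>
<p><strong>Cons</strong>:</p>
<ul>
<li>CRLs get large and are costly to download</li>
<li>A revocation doesn't take effect until verifiers fetch the next CRL, which can lag by a full publication interval (hours to days)</li>
<li>If the verifier never downloads the CRL, revocation has no effect at all</li>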
</ul>
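<h4 id="ocsponline-certificate-status-protocol">OCSP (Online Certificate Status Protocol)</h4>
<p>Real-time lookup: on each connection the client queries an OCSP responder, asking <strong>"is this cert still valid?"</strong></p>
<p><strong>Pros</strong>: near real-time; a revocation takes effect as soon as the responder reflects it (subject to response caching).</p>
<p><strong>Cons</strong>:</p>
<ul>
<li>One extra OCSP query per connection, adding latency</li>
<li>The OCSP responder is a single point of failure</li>
<li>Privacy concern (every connection tells the CA which server you are connecting to)</li>
</ul>
<p>Advanced: <strong>OCSP Stapling</strong>. The server fetches the OCSP response in advance and staples the CA-signed response to the TLS handshake, so the client doesn't need to query. This solves the latency and privacy problems, but the server has to support it.</p>
<h4 id="short-lived-cert不撤銷讓它過期">Short-lived cert (no revocation; let it expire)</h4>
<p>The most modern approach: <strong>keep cert TTLs very short (hours, even minutes), implement no revocation mechanism, and let expiry do the work</strong>.</p>
<p><strong>Pros</strong>:</p>
<ul>
<li>No CRL / OCSP infrastructure needed</li>
<li>Revocation window = TTL (hours), which is predictable</li>
<li>Privacy-friendly</li>
</ul>
<p><strong>Cons</strong>:</p>
<ul>
<li>Requires a reliable automatic renewal mechanism</li>
<li>A client that can't renew is cut off immediately</li>
</ul>
<p>Tools: <code>SPIFFE</code>/<code>SPIRE</code> champion this model, with cert TTLs on the order of an hour.</p>
<h3 id="三種撤銷方案的選擇">Choosing a revocation approach</h3>
<table>
  <thead>
      <tr>
          <th>Scenario</th>
          <th>Recommended revocation approach</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Traditional enterprise where architectural change is expensive</td>
          <td>CRL (lowest barrier to entry)</td>
      </tr>
      <tr>
          <td>Public HTTPS needing real-time revocation</td>
          <td>OCSP Stapling</td>
      </tr>
      <tr>
          <td>Cloud-native with automatic renewal infrastructure</td>
          <td>Short-lived cert</td>
      </tr>
      <tr>
          <td>Internal service mesh</td>
          <td>Short-lived cert (handled automatically by the mesh)</td>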
      </tr>
  </tbody>
</table>
<hr>
<h2 id="基礎設施整合">Infrastructure integration</h2>
<h3 id="nginx-設定-mtls-server">nginx as an mTLS server</h3>
<p>The most common scenario: nginx acts as a reverse proxy and requires clients to present a certificate.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nginx" data-lang="nginx"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">server</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="kn">listen</span> <span class="mi">443</span> <span class="s">ssl</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="kn">server_name</span> <span class="s">api.example.com</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="c1"># Server cert (presented to clients)
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"></span>    <span class="kn">ssl_certificate</span>     <span class="s">/etc/ssl/certs/api.crt</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="kn">ssl_certificate_key</span> <span class="s">/etc/ssl/private/api.key</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="c1"># Require a client cert and verify it against this CA chain
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"></span>    <span class="kn">ssl_client_certificate</span> <span class="s">/etc/ssl/ca/client-ca-chain.pem</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="kn">ssl_verify_client</span> <span class="no">on</span><span class="p">;</span>            <span class="c1"># Reject clients without a valid cert
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"></span>    <span class="kn">ssl_verify_depth</span> <span class="mi">2</span><span class="p">;</span>              <span class="c1"># Max chain depth; set to match the PKI hierarchy (Root → Intermediate → Leaf)
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"></span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="kn">location</span> <span class="s">/</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="c1"># Pass client cert details to the backend application
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="c1"></span>        <span class="kn">proxy_set_header</span> <span class="s">X-Client-DN</span>  <span class="nv">$ssl_client_s_dn</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="kn">proxy_set_header</span> <span class="s">X-Client-Verify</span> <span class="nv">$ssl_client_verify</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="kn">proxy_pass</span> <span class="s">http://backend</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>Key directives:</p>
<table>
  <thead>
      <tr>
          <th>Directive</th>
          <th>Effect</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>ssl_client_certificate</code></td>
          <td>Trusted CA chain used to verify client certs</td>
      </tr>
      <tr>
          <td><code>ssl_verify_client on</code></td>
          <td>Requires a client cert; <code>optional</code> requests one and verifies it only if presented</td>
      </tr>
      <tr>
          <td><code>ssl_verify_depth</code></td>
          <td>Maximum chain verification depth; set to match the PKI hierarchy</td>
      </tr>
      <tr>
          <td><code>$ssl_client_s_dn</code></td>
          <td>Client cert's subject DN, passed to the backend</td>
      </tr>
  </tbody>
</table>
<h3 id="nginx-設定-mtls-client呼叫上游">nginx as an mTLS client (calling an upstream)</h3>
<p>When nginx is the client calling an upstream mTLS server:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nginx" data-lang="nginx"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">location</span> <span class="s">/upstream</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="kn">proxy_pass</span> <span class="s">https://upstream.example.com</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="kn">proxy_ssl_certificate</span>     <span class="s">/etc/ssl/certs/client.crt</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="kn">proxy_ssl_certificate_key</span> <span class="s">/etc/ssl/private/client.key</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="kn">proxy_ssl_trusted_certificate</span> <span class="s">/etc/ssl/ca/upstream-ca.pem</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="kn">proxy_ssl_verify</span> <span class="no">on</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><h3 id="envoy--api-gateway-整合">Envoy / API Gateway integration</h3>
<p>Envoy is a common service mesh data plane; its mTLS configuration pattern looks like this:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nt">listeners</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">api_listener</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">  </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">socket_address</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">port_value</span><span class="p">:</span><span class="w"> </span><span class="m">443</span><span class="w"> </span>}<span class="w"> </span>}<span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">  </span><span class="nt">filter_chains</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">  </span>- <span class="nt">transport_socket</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">envoy.transport_sockets.tls</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">      </span><span class="nt">typed_config</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">        </span><span class="nt">&#34;@type&#34;: </span><span class="l">type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">        </span><span class="nt">common_tls_context</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">          </span><span class="nt">tls_certificates</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">          </span>- <span class="nt">certificate_chain</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">filename</span><span class="p">:</span><span class="w"> </span><span class="l">/etc/ssl/api.crt }</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">            </span><span class="nt">private_key</span><span class="p">:</span><span class="w">      </span>{<span class="w"> </span><span class="nt">filename</span><span class="p">:</span><span class="w"> </span><span class="l">/etc/ssl/api.key }</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">          </span><span class="nt">validation_context</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">            </span><span class="nt">trusted_ca</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">filename</span><span class="p">:</span><span class="w"> </span><span class="l">/etc/ssl/client-ca.pem }</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">        </span><span class="nt">require_client_certificate</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span></span></span></code></pre></div><blockquote>
<p>The config above shows only the inbound listener's <code>DownstreamTlsContext</code>. When Envoy acts as a client calling an upstream mTLS server, you configure a <code>transport_socket</code> with an <code>UpstreamTlsContext</code> (client cert + private key + trusted CA) on the corresponding cluster, not in this listener config.</p></blockquote>
<p>Compared with nginx, Envoy's advantages:</p>
<ul>
<li>Dynamic configuration (xDS API, no reload required)</li>
<li>SDS (Secret Discovery Service) support for fetching certificates dynamically</li>
<li>Integration with meshes such as Istio / Linkerd</li>
</ul>
<h3 id="service-meshistio--linkerd">Service Mesh (Istio / Linkerd)</h3>
<p>Service meshes build mTLS in:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln">1</span><span class="cl"><span class="c"># Istio: require mTLS for every workload in the production namespace</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">security.istio.io/v1beta1</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">PeerAuthentication</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">default</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">production</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w">  </span><span class="nt">mtls</span><span class="p">:</span><span class="w">
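</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="w">    </span><span class="nt">mode</span><span class="p">:</span><span class="w"> </span><span class="l">STRICT</span></span></span></code></pre></div><p>This policy applies to the <code>production</code> namespace only; placing it in the mesh root namespace (<code>istio-system</code> by default) makes it mesh-wide.</p><p>How it works:</p>
<ul>
<li>The mesh control plane (Istio: Istiod / Linkerd: identity) runs a built-in CA and automatically issues each pod a cert</li>
<li>The sidecar proxy (Envoy / Linkerd proxy) handles TLS termination; application code is completely unaware of it</li>
<li>Certs have short TTLs (24 hours by default in Istio, depending on version) and are renewed automatically</li>
<li>The mTLS identity is bound to the K8s ServiceAccount</li>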
</ul>
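<p>Pros: <strong>applications need no code changes and no cert or rotation handling</strong>; the mesh takes care of all of it.</p>
<p>Cons: <strong>you are tied to the whole mesh architecture</strong>; running the mesh itself is a major undertaking with a steep learning curve.</p>
<h3 id="為-application-直接做-mtls">Doing mTLS directly in the application</h3>
<p>Some scenarios (no mesh, or a need for application-level control) require the application to do mTLS itself:</p>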





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Python requests example: mTLS client</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="kn">import</span> <span class="nn">requests</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="s1">&#39;https://api.example.com/data&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="n">cert</span><span class="o">=</span><span class="p">(</span><span class="s1">&#39;/path/to/client.crt&#39;</span><span class="p">,</span> <span class="s1">&#39;/path/to/client.key&#39;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="n">verify</span><span class="o">=</span><span class="s1">&#39;/path/to/server-ca.pem&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="p">)</span></span></span></code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl">// Go net/http example: mTLS client
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">import (
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    &#34;crypto/tls&#34;
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    &#34;crypto/x509&#34;
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    &#34;errors&#34;
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    &#34;net/http&#34;
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    &#34;os&#34;
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">)
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">// newMTLSClient presents client.crt and verifies the server against server-ca.pem.
</span></span><span class="line"><span class="ln">11</span><span class="cl">func newMTLSClient() (*http.Client, error) {
</span></span><span class="line"><span class="ln">12</span><span class="cl">    cert, err := tls.LoadX509KeyPair(&#34;client.crt&#34;, &#34;client.key&#34;)
</span></span><span class="line"><span class="ln">13</span><span class="cl">    if err != nil { return nil, err }
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl">    caCert, err := os.ReadFile(&#34;server-ca.pem&#34;)
</span></span><span class="line"><span class="ln">16</span><span class="cl">    if err != nil { return nil, err }
</span></span><span class="line"><span class="ln">17</span><span class="cl">    caCertPool := x509.NewCertPool()
</span></span><span class="line"><span class="ln">18</span><span class="cl">    if !caCertPool.AppendCertsFromPEM(caCert) { // false: no PEM certificate could be parsed
</span></span><span class="line"><span class="ln">19</span><span class="cl">        return nil, errors.New(&#34;server-ca.pem contains no valid certificate&#34;)
</span></span><span class="line"><span class="ln">20</span><span class="cl">    }
</span></span><span class="line"><span class="ln">21</span><span class="cl">
</span></span><span class="line"><span class="ln">22</span><span class="cl">    return &amp;http.Client{
</span></span><span class="line"><span class="ln">23</span><span class="cl">        Transport: &amp;http.Transport{
</span></span><span class="line"><span class="ln">24</span><span class="cl">            TLSClientConfig: &amp;tls.Config{
</span></span><span class="line"><span class="ln">25</span><span class="cl">                Certificates: []tls.Certificate{cert}, // sent when the server requests a client cert
</span></span><span class="line"><span class="ln">26</span><span class="cl">                RootCAs:      caCertPool,              // used instead of the system trust store
</span></span><span class="line"><span class="ln">27</span><span class="cl">            },
</span></span><span class="line"><span class="ln">28</span><span class="cl">        },
</span></span><span class="line"><span class="ln">29</span><span class="cl">    }, nil
</span></span><span class="line"><span class="ln">30</span><span class="cl">}
</span></span><span class="line"><span class="ln">31</span><span class="cl">
</span></span><span class="line"><span class="ln">32</span><span class="cl">// usage: client, err := newMTLSClient(); resp, err := client.Get(&#34;https://api.example.com/data&#34;); defer resp.Body.Close()</span></span></code></pre></div><p>Most languages offer an equivalent API in their standard library or a common HTTP client, and the code looks much the same. But the application has to handle cert reload, expiry, and rotation itself, which is considerably more work than with a service mesh.</p>
<hr>
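<h2 id="跟其他-layer-2-方案的成本比較">Cost comparison with other Layer 2 options</h2>
<p>Among the Layer 2 options in the three-layer trust boundary model, mTLS offers high security strength but carries a heavy operational burden. Whether to adopt it depends on your threat model, compliance requirements, ability to protect private keys, and automation maturity.</p>
<table>
  <thead>
      <tr>
          <th>Option</th>
          <th>Security level</th>
          <th>Operational cost</th>
          <th>Fits</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Shared Secret</strong></td>
          <td>Low-medium</td>
          <td>Low</td>
          <td>Purely internal, low risk</td>
      </tr>
      <tr>
          <td><strong>API Key + HTTPS</strong></td>
          <td>Medium</td>
          <td>Low</td>
          <td>General SaaS, public APIs</td>
      </tr>
      <tr>
          <td><strong>HMAC signature</strong></td>
          <td>Medium-high</td>
          <td>Medium</td>
          <td>Protection against replay / tampering needed</td>
      </tr>
      <tr>
          <td><strong>OAuth Client Credentials</strong></td>
          <td>Medium-high</td>
          <td>Medium</td>
          <td>Cross-organization, short-lived tokens needed</td>
      </tr>
      <tr>
          <td><strong>mTLS</strong></td>
          <td>High</td>
          <td>High</td>
          <td>Compliance, zero trust, private keys in hardware</td>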
      </tr>
  </tbody>
</table>
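<h3 id="mtls-適合的場景">Where mTLS fits</h3>
<table>
  <thead>
      <tr>
          <th>Scenario</th>
          <th>Why mTLS fits</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Finance, healthcare, government compliance</td>
          <td>Some regulations and standards call for mTLS directly</td>
      </tr>
      <tr>
          <td>Zero-trust networks</td>
          <td>The network is untrusted; every hop must authenticate</td>
      </tr>
      <tr>
          <td>Internal service mesh (K8s + Istio)</td>
          <td>The mesh handles it automatically, so the marginal cost is low</td>
      </tr>
      <tr>
          <td>Private keys can live in hardware (HSM / TPM / Secure Enclave)</td>
          <td>Much stronger than an API key: a hardware-bound key cannot be copied out, whereas an API key is a reusable string</td>
      </tr>
      <tr>
          <td>High-frequency service-to-service calls where API key rotation is painful</td>
          <td>Short-TTL certs renew automatically, with no human in the loop</td>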
      </tr>
  </tbody>
</table>
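<h3 id="mtls-成本偏高的場景">Where mTLS costs more than it returns</h3>
<table>
  <thead>
      <tr>
          <th>Scenario</th>
          <th>Why the cost is high</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Open to third-party SDKs</td>
          <td>Cert management is a high bar for third parties; API Key + HTTPS is easier to roll out</td>
      </tr>
      <tr>
          <td>Small scale, limited ops resources</td>
          <td>Maintaining PKI infrastructure costs more than the security it adds</td>
      </tr>
      <tr>
          <td>Purely internal, no need for strong identity isolation</td>
          <td>A shared secret is already sufficient</td>
      </tr>
      <tr>
          <td>Large numbers of short-lived client connections (mobile apps)</td>
          <td>Distributing and rotating certs is complex</td>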
      </tr>
  </tbody>
</table>
<hr>
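<h2 id="常見失敗模式">Common failure modes</h2>
<h3 id="失敗-1忘記-intermediate-cachain-不完整">Failure 1: forgetting the Intermediate CA, leaving the chain incomplete</h3>
<p><strong>Symptom</strong>: the server config looks correct, but clients report <code>certificate verify failed</code> when connecting.</p>
<p><strong>Root cause</strong>: the server is configured with only the leaf cert, without the Intermediate CA. The client trusts only the Root and cannot build a chain up to it.</p>
<p><strong>Mitigation</strong>: the server's <code>ssl_certificate</code> must contain the <strong>full chain</strong> (leaf + intermediate, without the root):</p>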





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">cat leaf.crt intermediate.crt &gt; chain.crt
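</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># leaf first, then intermediate; point ssl_certificate at chain.crt instead of leaf.crt</span></span></span></code></pre></div><h3 id="失敗-2cert-過期造成連線中斷">Failure 2: an expired cert breaks connections</h3>
<p><strong>Symptom</strong>: once the cert's <code>notAfter</code> passes, every client suddenly fails to connect.</p>
<p><strong>Mitigation</strong>:</p>
<ul>
<li>Monitor cert expiry: alert 30 days ahead, escalate to an urgent alert 7 days ahead</li>
<li>Use automatic renewal (cert-manager / step-ca / ACME)</li>
<li>Expiry protection should rest on monitoring and automatic renewal, not on someone remembering</li>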
</ul>
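<h3 id="失敗-3私鑰權限過寬被同機其他-user-讀走">Failure 3: overly broad private key permissions let other users on the host read it</h3>
<p><strong>Symptom</strong>: a security audit finds <code>/etc/ssl/private/server.key</code> at mode 644, readable by every user.</p>
<p><strong>Mitigation</strong>:</p>
<ul>
<li>Always <code>chmod 600</code> private keys, owned by <code>root</code> or the application user</li>
<li>For services run by systemd, pass the key with <code>LoadCredential=</code>: systemd exposes it only to that service under <code>$CREDENTIALS_DIRECTORY</code>, so the source file can stay readable by root alone</li>
<li>Audit permissions under <code>/etc/ssl/</code> regularly</li>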
</ul>
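<h3 id="失敗-4撤銷後-cert-仍能用">Failure 4: a revoked cert still works</h3>
<p><strong>Symptom</strong>: the cert has been revoked, but the client can still connect.</p>
<p><strong>Root cause</strong>:</p>
<ul>
<li>A CRL is published, but the server never enabled CRL checking (in nginx, <code>ssl_crl</code>)</li>
<li>OCSP is set up, but the client never queries it</li>
<li>Short-lived certs are in use, but the TTL is so long that the revocation window is unacceptable</li>
</ul>
<p><strong>Mitigation</strong>: test revocation <strong>end to end</strong>; don't stop at "it's configured", verify that it actually takes effect.</p>
<h3 id="失敗-5service-mesh-upgrade-後-mtls-中斷">Failure 5: mTLS breaks after a service mesh upgrade</h3>
<p><strong>Symptom</strong>: after an Istio upgrade, some services in the cluster can no longer reach each other.</p>
<p><strong>Root cause</strong>: the mesh control plane's CA changed, so the old cert chain no longer validates.</p>
<p><strong>Mitigation</strong>:</p>
<ul>
<li>Roll out mesh upgrades in stages, verifying the cert chain batch by batch</li>
<li>Carry out the mesh's CA migration procedure in full</li>
<li>Run the upgrade in a staging environment first</li>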
</ul>
<hr>
<h2 id="收尾">Wrap-up</h2>
<p>mTLS is a design that <strong>trades secret management for PKI</strong>: the private key never leaves the client, identity is bound to an X.509 cert, and nothing depends on a reusable string. The security level is high, but the price is building CA infrastructure, managing the cert lifecycle, and integrating with all kinds of infrastructure.</p>
<p>Key judgments:</p>
<ol>
<li><strong>A layered CA is the baseline</strong>: Root + Intermediate + Leaf, so the highest trust anchor stays minimally exposed</li>
<li><strong>The private key stays where it was generated</strong>: the CA signs CSRs and never touches private keys</li>
<li><strong>Revocation has to be proven to work</strong>: choose CRL, OCSP, or short-lived certs, and verify that it actually takes effect</li>
<li><strong>A service mesh is the low-cost entry point for cloud-native systems</strong>: Istio / Linkerd turn mTLS into infrastructure, with little change to applications</li>
<li><strong>mTLS is a high-responsibility option</strong>: for public APIs, small-scale setups, or environments without a mesh, OAuth or API keys are often easier to operate</li>
</ol>
<p>Further reading:</p>
<ul>
<li><a href="/blog/work-log/api-%E8%AA%8D%E8%AD%89%E7%9A%84%E4%B8%89%E5%B1%A4%E4%BF%A1%E4%BB%BB%E9%82%8A%E7%95%8C%E4%BD%BF%E7%94%A8%E8%80%85%E7%B3%BB%E7%B5%B1%E8%B7%A8%E7%B3%BB%E7%B5%B1-provisioning/" data-link-title="API 認證的三層信任邊界：使用者、系統、跨系統 Provisioning" data-link-desc="API 認證的信任邊界分層（Bearer Token / Shared Secret / Provisioning）：各層的洩漏後果與撤銷方式，以及混用造成的設計失效模式。">The three trust boundaries of API authentication</a>: the main article this post belongs to, and where mTLS sits at "Layer 2, the system layer"</li>
<li><a href="/blog/work-log/shared-secret-%E5%AE%89%E5%85%A8%E8%BC%AA%E6%9B%BF%E8%A8%AD%E8%A8%88%E9%9B%99%E5%AF%86%E9%81%8E%E6%B8%A1%E6%9C%9F%E8%87%AA%E5%8B%95%E5%8C%96%E8%88%87%E7%B7%8A%E6%80%A5%E6%B5%81%E7%A8%8B/" data-link-title="Shared Secret 安全輪替設計：雙密過渡期、自動化與緊急流程" data-link-desc="系統間 Shared Secret 輪替的核心機制：dual-secret rollover、自動化工具比較（AWS Secrets Manager / Vault / GCP）、緊急 rotation 流程與多 client 環境的失敗模式。">Shared Secret secure rotation design</a>: the matching lifecycle problems when you use secret-based auth instead of mTLS</li>
<li><a href="/blog/work-log/laravel-sanctum-%E7%9A%84-bearer-token-%E8%A8%AD%E8%A8%88%E5%89%96%E6%9E%90pksecret-%E7%82%BA%E4%BB%80%E9%BA%BC%E9%80%99%E6%A8%A3%E8%A8%AD%E8%A8%88/" data-link-title="Laravel Sanctum 的 Bearer Token 設計剖析：{PK}|{secret} 為什麼這樣設計" data-link-desc="Laravel Sanctum `{PK}|{secret}` 格式的設計理由、hash 儲存取捨、constant-time 比對位置，以及跟 GitHub PAT、Stripe API Key 的差異。">Laravel Sanctum Bearer Token design breakdown</a>: the token mechanism at Layer 1 (user layer), which solves a different problem from mTLS</li>
</ul>
]]></content:encoded></item></channel></rss>