<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Tls on Tarragon</title><link>https://tarrragon.github.io/blog/tags/tls/</link><description>Recent content in Tls on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Fri, 26 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/tls/index.xml" rel="self" type="application/rss+xml"/><item><title>入口上 IaC — ALB、TLS 與健康檢查</title><link>https://tarrragon.github.io/blog/infra/05-core-services/loadbalancer-alb/</link><pubDate>Fri, 26 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/infra/05-core-services/loadbalancer-alb/</guid><description>&lt;p>ALB（Application Load Balancer）描述流量進入系統的第一站。它在 IaC 裡的接線責任是把三個層次釘清楚：listener 決定監聽哪些 port 與協定、target group 決定流量導向哪些運算後端、health check 決定後端是否健康到可以接流量。ALB 本身是 stateless 的 — 重建不會遺失資料，但會換掉它的 DNS 名稱，所以對外服務通常在它前面再掛一層穩定的 DNS 記錄（Route 53 alias 或 CNAME），讓使用者看到的網域不隨 ALB 重建而改變。&lt;/p>
&lt;p>ALB 掛在 public subnet、引用專屬的 security group，security group 的入站通常只開 80 和 443 對 &lt;code>0.0.0.0/0&lt;/code>（這是少數合理出現全開的位置，因為 ALB 的工作本來就是接收公開流量）。後端運算節點住在 private subnet，它們的 security group 入站只允許來自 ALB security group 的流量 — 這個 group-to-group 引用讓規則跟著成員身分走，不跟著 IP 走（見&lt;a href="https://tarrragon.github.io/blog/infra/03-network-foundation/" data-link-title="模組三：網路地基 — VPC 與分層" data-link-desc="VPC、public / private subnet 切分、route table、NAT、security group 設計">模組三：網路地基&lt;/a>）。&lt;/p>
&lt;h2 id="alb-與-listener-設定">ALB 與 listener 設定&lt;/h2>
&lt;p>ALB 資源本身描述的是它掛在哪些 subnet、用哪個 security group、是對外（&lt;code>internal = false&lt;/code>）還是內部。Listener 則是掛在 ALB 上的監聽端點，每個 listener 綁定一個 port + protocol 的組合。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-hcl" data-lang="hcl">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_lb&amp;#34; &amp;#34;api&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="n"> name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;api-${var.env}&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="n"> internal&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="kt">false&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="n"> load_balancer_type&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;application&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="n"> security_groups&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="k">aws_security_group&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">alb&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">id&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="n"> subnets&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="k">for&lt;/span> &lt;span class="k">s&lt;/span> &lt;span class="k">in&lt;/span> &lt;span class="k">aws_subnet&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">public&lt;/span> &lt;span class="err">:&lt;/span> &lt;span class="k">s&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">id&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="http-到-https-的強制跳轉">HTTP 到 HTTPS 的強制跳轉&lt;/h3>
&lt;p>正式服務通常同時建兩個 listener：port 443 接受 HTTPS 流量並轉發到後端，port 80 接收 HTTP 流量後直接回一個 301 redirect 到 HTTPS — 確保使用者即使用 &lt;code>http://&lt;/code> 開頭訪問也會被導到加密連線。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-hcl" data-lang="hcl">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_lb_listener&amp;#34; &amp;#34;https&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="n"> load_balancer_arn&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">aws_lb&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">api&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">arn&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="n"> port&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="m">443&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="n"> protocol&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;HTTPS&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="n"> ssl_policy&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;ELBSecurityPolicy-TLS13-1-2-2021-06&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="n"> certificate_arn&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">aws_acm_certificate&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">api&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">arn&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> &lt;span class="k">default_action&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="n"> type&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;forward&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="n"> target_group_arn&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">aws_lb_target_group&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">api&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">arn&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">}
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_lb_listener&amp;#34; &amp;#34;http_redirect&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="n"> load_balancer_arn&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">aws_lb&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">api&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">arn&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="n"> port&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="m">80&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="n"> protocol&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;HTTP&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl"> &lt;span class="k">default_action&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl">&lt;span class="n"> type&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;redirect&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl"> &lt;span class="k">redirect&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl">&lt;span class="n"> port&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;443&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">23&lt;/span>&lt;span class="cl">&lt;span class="n"> protocol&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;HTTPS&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">24&lt;/span>&lt;span class="cl">&lt;span class="n"> status_code&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;HTTP_301&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">25&lt;/span>&lt;span class="cl"> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">26&lt;/span>&lt;span class="cl"> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">27&lt;/span>&lt;span class="cl">}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>ssl_policy&lt;/code> 決定 ALB 接受哪些 TLS 版本與密碼套件。選擇以安全與相容性為取捨 — &lt;code>ELBSecurityPolicy-TLS13-1-2-2021-06&lt;/code> 只接受 TLS 1.2 和 1.3，能阻擋過時協定的降級攻擊，但會拒絕仍在使用 TLS 1.0/1.1 的極舊用戶端。對面向公眾的 API 或網站，TLS 1.2 以上是合理的底線；如果有明確的舊用戶端需求（例如嵌入式設備），再往下調但要知道代價。&lt;/p></description><content:encoded><![CDATA[<p>ALB（Application Load Balancer）描述流量進入系統的第一站。它在 IaC 裡的接線責任是把三個層次釘清楚：listener 決定監聽哪些 port 與協定、target group 決定流量導向哪些運算後端、health check 決定後端是否健康到可以接流量。ALB 本身是 stateless 的 — 重建不會遺失資料，但會換掉它的 DNS 名稱，所以對外服務通常在它前面再掛一層穩定的 DNS 記錄（Route 53 alias 或 CNAME），讓使用者看到的網域不隨 ALB 重建而改變。</p>
<p>ALB 掛在 public subnet、引用專屬的 security group，security group 的入站通常只開 80 和 443 對 <code>0.0.0.0/0</code>（這是少數合理出現全開的位置，因為 ALB 的工作本來就是接收公開流量）。後端運算節點住在 private subnet，它們的 security group 入站只允許來自 ALB security group 的流量 — 這個 group-to-group 引用讓規則跟著成員身分走，不跟著 IP 走（見<a href="/blog/infra/03-network-foundation/" data-link-title="模組三：網路地基 — VPC 與分層" data-link-desc="VPC、public / private subnet 切分、route table、NAT、security group 設計">模組三：網路地基</a>）。</p>
<h2 id="alb-與-listener-設定">ALB 與 listener 設定</h2>
<p>ALB 資源本身描述的是它掛在哪些 subnet、用哪個 security group、是對外（<code>internal = false</code>）還是內部。Listener 則是掛在 ALB 上的監聽端點，每個 listener 綁定一個 port + protocol 的組合。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_lb&#34; &#34;api&#34;</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  name</span>               <span class="o">=</span> <span class="s2">&#34;api-${var.env}&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  internal</span>           <span class="o">=</span> <span class="kt">false</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">  load_balancer_type</span> <span class="o">=</span> <span class="s2">&#34;application&#34;</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="n">  security_groups</span>    <span class="o">=</span> <span class="p">[</span><span class="k">aws_security_group</span><span class="p">.</span><span class="k">alb</span><span class="p">.</span><span class="k">id</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="n">  subnets</span>            <span class="o">=</span> <span class="p">[</span><span class="k">for</span> <span class="k">s</span> <span class="k">in</span> <span class="k">aws_subnet</span><span class="p">.</span><span class="k">public</span> <span class="err">:</span> <span class="k">s</span><span class="p">.</span><span class="k">id</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">}</span></span></code></pre></div><h3 id="http-到-https-的強制跳轉">HTTP 到 HTTPS 的強制跳轉</h3>
<p>正式服務通常同時建兩個 listener：port 443 接受 HTTPS 流量並轉發到後端，port 80 接收 HTTP 流量後直接回一個 301 redirect 到 HTTPS — 確保使用者即使用 <code>http://</code> 開頭訪問也會被導到加密連線。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_lb_listener&#34; &#34;https&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  load_balancer_arn</span> <span class="o">=</span> <span class="k">aws_lb</span><span class="p">.</span><span class="k">api</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  port</span>              <span class="o">=</span> <span class="m">443</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  protocol</span>          <span class="o">=</span> <span class="s2">&#34;HTTPS&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">  ssl_policy</span>        <span class="o">=</span> <span class="s2">&#34;ELBSecurityPolicy-TLS13-1-2-2021-06&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">  certificate_arn</span>   <span class="o">=</span> <span class="k">aws_acm_certificate</span><span class="p">.</span><span class="k">api</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">  <span class="k">default_action</span> {
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">    type</span>             <span class="o">=</span> <span class="s2">&#34;forward&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">    target_group_arn</span> <span class="o">=</span> <span class="k">aws_lb_target_group</span><span class="p">.</span><span class="k">api</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">  }
</span></span><span class="line"><span class="ln">12</span><span class="cl">}
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_lb_listener&#34; &#34;http_redirect&#34;</span> {
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="n">  load_balancer_arn</span> <span class="o">=</span> <span class="k">aws_lb</span><span class="p">.</span><span class="k">api</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="n">  port</span>              <span class="o">=</span> <span class="m">80</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="n">  protocol</span>          <span class="o">=</span> <span class="s2">&#34;HTTP&#34;</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">
</span></span><span class="line"><span class="ln">19</span><span class="cl">  <span class="k">default_action</span> {
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="n">    type</span> <span class="o">=</span> <span class="s2">&#34;redirect&#34;</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="k">redirect</span> {
</span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="n">      port</span>        <span class="o">=</span> <span class="s2">&#34;443&#34;</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="n">      protocol</span>    <span class="o">=</span> <span class="s2">&#34;HTTPS&#34;</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="n">      status_code</span> <span class="o">=</span> <span class="s2">&#34;HTTP_301&#34;</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">    }
</span></span><span class="line"><span class="ln">26</span><span class="cl">  }
</span></span><span class="line"><span class="ln">27</span><span class="cl">}</span></span></code></pre></div><p><code>ssl_policy</code> 決定 ALB 接受哪些 TLS 版本與密碼套件。選擇以安全與相容性為取捨 — <code>ELBSecurityPolicy-TLS13-1-2-2021-06</code> 只接受 TLS 1.2 和 1.3，能阻擋過時協定的降級攻擊，但會拒絕仍在使用 TLS 1.0/1.1 的極舊用戶端。對面向公眾的 API 或網站，TLS 1.2 以上是合理的底線；如果有明確的舊用戶端需求（例如嵌入式設備），再往下調但要知道代價。</p>
<h3 id="多服務共用-alb">多服務共用 ALB</h3>
<p>一個 ALB 可以掛多個 listener rule，用 host header 或 path 把流量分到不同的 target group。這讓多個微服務共用一個 ALB（省成本），而不需要每個服務各開一個：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_lb_listener_rule&#34; &#34;auth&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  listener_arn</span> <span class="o">=</span> <span class="k">aws_lb_listener</span><span class="p">.</span><span class="k">https</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  priority</span>     <span class="o">=</span> <span class="m">10</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  <span class="k">condition</span> {
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">    path_pattern { values</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;/auth/*&#34;</span><span class="p">]</span> }
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">  }
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  <span class="k">action</span> {
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">    type</span>             <span class="o">=</span> <span class="s2">&#34;forward&#34;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">    target_group_arn</span> <span class="o">=</span> <span class="k">aws_lb_target_group</span><span class="p">.</span><span class="k">auth</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">  }
</span></span><span class="line"><span class="ln">13</span><span class="cl">}</span></span></code></pre></div><p>一個常見的收斂機會：如果每個服務都各自開了一個 ALB，但流量都從同一個入口進來、只是路徑不同，可以收斂成一個 ALB 加 listener rule。每個 ALB 有固定的小時費，少開幾個月費就少幾筆。反過來，當不同服務的安全等級或流量特性差異大到需要獨立的 security group 和 WAF 規則時，分開 ALB 才合理。</p>
<h2 id="target-group-與健康檢查">target group 與健康檢查</h2>
<p>Target group 定義一組接收流量的後端（ECS task、EC2 instance 或 IP），以及判斷這些後端是否健康的檢查邏輯。它是 ALB 和實際運算之間的橋樑。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_lb_target_group&#34; &#34;api&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  name</span>        <span class="o">=</span> <span class="s2">&#34;api-${var.env}-tg&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  port</span>        <span class="o">=</span> <span class="m">8080</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  protocol</span>    <span class="o">=</span> <span class="s2">&#34;HTTP&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">  vpc_id</span>      <span class="o">=</span> <span class="k">aws_vpc</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">id</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">  target_type</span> <span class="o">=</span> <span class="s2">&#34;ip&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">  <span class="k">health_check</span> {
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">    path</span>                <span class="o">=</span> <span class="s2">&#34;/healthz&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">    interval</span>            <span class="o">=</span> <span class="m">15</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">    healthy_threshold</span>   <span class="o">=</span> <span class="m">2</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">    unhealthy_threshold</span> <span class="o">=</span> <span class="m">3</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">    timeout</span>             <span class="o">=</span> <span class="m">5</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="n">    matcher</span>             <span class="o">=</span> <span class="s2">&#34;200&#34;</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">  }
</span></span><span class="line"><span class="ln">16</span><span class="cl">}</span></span></code></pre></div><h3 id="健康檢查的閾值設計">健康檢查的閾值設計</h3>
<p>健康檢查的路徑與閾值是最常被忽略的判讀點。各參數之間的交互作用決定了兩個時間窗口：新後端多久後開始接流量、壞後端多久後被移出。</p>
<p><code>healthy_threshold = 2</code> 配 <code>interval = 15</code> 代表一個新啟動的後端要等 30 秒（兩次通過）才開始接流量。<code>unhealthy_threshold = 3</code> 代表連續三次失敗（45 秒）才被移出。閾值太寬鬆會把壞掉的後端留在輪替裡，讓部分使用者持續收到錯誤；太嚴格會在部署瞬間 — 新容器啟動、應用還在初始化 — 就判定不健康，反覆移出移入，使用者看到間歇性失敗。</p>
<table>
  <thead>
      <tr>
          <th>參數</th>
          <th>過小的風險</th>
          <th>過大的風險</th>
          <th>起點建議</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>interval</code></td>
          <td>ALB 對後端造成額外負擔</td>
          <td>壞後端被偵測到的延遲增加</td>
          <td>15-30 秒</td>
      </tr>
      <tr>
          <td><code>healthy_threshold</code></td>
          <td>還沒完全就緒就接流量</td>
          <td>部署後等太久才開始分流</td>
          <td>2-3 次</td>
      </tr>
      <tr>
          <td><code>unhealthy_threshold</code></td>
          <td>暫時性波動導致健康的後端被移出</td>
          <td>壞後端繼續收流量太久</td>
          <td>2-3 次</td>
      </tr>
      <tr>
          <td><code>timeout</code></td>
          <td>正常但偏慢的回應被誤判為失敗</td>
          <td>確實掛了卻要等很久才確認</td>
          <td>5 秒</td>
      </tr>
  </tbody>
</table>
<h3 id="健康檢查路徑的選擇">健康檢查路徑的選擇</h3>
<p><code>path</code> 指向的端點應該能反映應用是否確實能服務請求，而不只是 process 還活著。一個只回 200 的空端點（所謂 liveness check）證明 HTTP server 在跑，但不代表它能連到資料庫、能讀到必要的 config。較合理的做法是讓 <code>/healthz</code> 至少檢查核心依賴的連線（例如 ping 一下 DB），失敗時回 503。代價是健康檢查會跟著核心依賴一起報不健康 — 如果 DB 暫時斷了，所有後端都會被判定不健康，ALB 會回 503 給使用者。這是正確的行為：如果應用確實無法服務請求，把它標成不健康比假裝健康好。</p>
<p>判讀方式：部署後觀察 target group 裡的 healthy / unhealthy 轉換次數。如果每次部署都看到新 target 在 healthy 與 unhealthy 之間跳動，代表初始等待不夠 — 應用的啟動時間超出 <code>healthy_threshold * interval</code>，考慮加大 <code>healthy_threshold</code> 或設定 ECS 的 <code>startPeriod</code>（啟動寬限期）讓健康檢查在應用初始化期間暫停。</p>
<h2 id="tls-憑證acm-簽發dns-驗證與自動續期">TLS 憑證：ACM 簽發、DNS 驗證與自動續期</h2>
<p>HTTPS listener 引用的 TLS 憑證也屬於 ALB 的接線。用 ACM（AWS Certificate Manager）簽發的憑證在 IaC 裡完整描述 — 涵蓋網域與 DNS 驗證方式 — 讓「憑證存在、驗證、掛載」整條鏈都進版本控制，而非在 Console 手動上傳一份會過期沒人盯的憑證。</p>
<p>ACM 簽發的憑證使用 DNS 驗證時，ACM 要求在指定的 DNS 記錄上放一段驗證值。Terraform 可以自動建立這段記錄並等待驗證通過：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_acm_certificate&#34; &#34;api&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  domain_name</span>       <span class="o">=</span> <span class="s2">&#34;api.${var.domain}&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  validation_method</span> <span class="o">=</span> <span class="s2">&#34;DNS&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">  lifecycle { create_before_destroy</span> <span class="o">=</span> <span class="kt">true</span> }
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_route53_record&#34; &#34;cert_validation&#34;</span> {
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">  for_each</span> <span class="o">=</span> {
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">    for dvo in aws_acm_certificate.api.domain_validation_options : dvo.domain_name</span> <span class="o">=</span><span class="err">&gt;</span> <span class="k">dvo</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">  }
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">  zone_id</span> <span class="o">=</span> <span class="k">data</span><span class="p">.</span><span class="k">aws_route53_zone</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">zone_id</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">  name</span>    <span class="o">=</span> <span class="k">each</span><span class="p">.</span><span class="k">value</span><span class="p">.</span><span class="k">resource_record_name</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="k">each</span><span class="p">.</span><span class="k">value</span><span class="p">.</span><span class="k">resource_record_type</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="n">  records</span> <span class="o">=</span> <span class="p">[</span><span class="k">each</span><span class="p">.</span><span class="k">value</span><span class="p">.</span><span class="k">resource_record_value</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="n">  ttl</span>     <span class="o">=</span> <span class="m">60</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">}
</span></span><span class="line"><span class="ln">18</span><span class="cl">
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_acm_certificate_validation&#34; &#34;api&#34;</span> {
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="n">  certificate_arn</span>         <span class="o">=</span> <span class="k">aws_acm_certificate</span><span class="p">.</span><span class="k">api</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="n">  validation_record_fqdns</span> <span class="o">=</span> <span class="p">[</span><span class="k">for</span> <span class="k">r</span> <span class="k">in</span> <span class="k">aws_route53_record</span><span class="p">.</span><span class="k">cert_validation</span> <span class="err">:</span> <span class="k">r</span><span class="p">.</span><span class="k">fqdn</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">}</span></span></code></pre></div><h3 id="create_before_destroy-的必要性">create_before_destroy 的必要性</h3>
<p><code>create_before_destroy = true</code> 確保憑證更新（例如加 SAN 或續期觸發重建）時先建新的再刪舊的，避免 listener 在交接期間沒有可用憑證。Terraform 預設行為是先刪後建，會造成一個短暫的 HTTPS 中斷窗口 — listener 找不到憑證、所有 HTTPS 連線失敗直到新憑證簽發並驗證完畢。</p>
<p>ACM 簽發的憑證自動續期：只要 DNS 驗證記錄還在（由 Terraform 管理，所以會一直在），ACM 在到期前 60 天自動續期。這是把憑證管理成本降到接近零的做法 — 不需要排程提醒、不需要手動下載上傳。判讀訊號：如果 CloudWatch 出現 <code>DaysToExpiry</code> 降到 30 以下的 alarm，代表自動續期失敗，通常是 DNS 驗證記錄被手動刪了或 Route 53 zone 變了。</p>
<h3 id="多網域憑證san">多網域憑證（SAN）</h3>
<p>一張 ACM 憑證可以涵蓋多個網域（Subject Alternative Names），例如 <code>api.example.com</code> 和 <code>admin.example.com</code> 共用一張。在 IaC 裡用 <code>subject_alternative_names</code> 列舉：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_acm_certificate&#34; &#34;multi&#34;</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  domain_name</span>               <span class="o">=</span> <span class="s2">&#34;api.${var.domain}&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  subject_alternative_names</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;admin.${var.domain}&#34;, &#34;*.internal.${var.domain}&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">  validation_method</span>         <span class="o">=</span> <span class="s2">&#34;DNS&#34;</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="n">  lifecycle { create_before_destroy</span> <span class="o">=</span> <span class="kt">true</span> }
</span></span><span class="line"><span class="ln">7</span><span class="cl">}</span></span></code></pre></div><p>共用一張還是分開簽取決於生命週期：如果這幾個網域總是一起上下線、一起變更，共用一張省維護；如果各自獨立演進，分開簽讓變更範圍更小。</p>
<h2 id="dns-zone-管理與-alb-的銜接">DNS zone 管理與 ALB 的銜接</h2>
<h3 id="hosted-zonedns-記錄的容器">Hosted zone：DNS 記錄的容器</h3>
<p>Route 53 的 hosted zone 是一個網域下所有 DNS 記錄的容器。public hosted zone 管理對外可見的網域（如 <code>example.com</code>），private hosted zone 管理只在 VPC 內可解析的內部網域（如 <code>internal.example.com</code>），讓服務之間用 DNS 名稱互連而不靠 IP。</p>
<p>多環境的 DNS 管理常用子網域 delegation：production 用 <code>example.com</code>（主 zone），dev 和 staging 各用 <code>dev.example.com</code> 和 <code>staging.example.com</code>（子 zone）。子 zone 可以放在不同帳號、由不同團隊管理，主 zone 只需要一組 NS 記錄指向子 zone。這讓環境之間的 DNS 邊界跟帳號邊界對齊。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_route53_zone&#34; &#34;main&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  name</span> <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">domain</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_route53_zone&#34; &#34;staging&#34;</span> {
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">  name</span> <span class="o">=</span> <span class="s2">&#34;staging.${var.domain}&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_route53_record&#34; &#34;staging_ns&#34;</span> {
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  zone_id</span> <span class="o">=</span> <span class="k">aws_route53_zone</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">zone_id</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">  name</span>    <span class="o">=</span> <span class="s2">&#34;staging.${var.domain}&#34;</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="s2">&#34;NS&#34;</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">  ttl</span>     <span class="o">=</span> <span class="m">300</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="n">  records</span> <span class="o">=</span> <span class="k">aws_route53_zone</span><span class="p">.</span><span class="k">staging</span><span class="p">.</span><span class="k">name_servers</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">}</span></span></code></pre></div><p>hosted zone 也是 ACM 憑證 DNS 驗證的依賴 — ACM 簽發憑證時需要在對應的 zone 寫入一條驗證記錄，zone 不存在或不在同帳號就接不上。把 zone 的建立排在 ACM 之前，讓依賴圖自然正確。</p>
<h3 id="alb-的穩定-dns-記錄">ALB 的穩定 DNS 記錄</h3>
<p>ALB 重建後 DNS 名稱會改變。穩定對外的方式是在 Route 53 建一條 alias 記錄指向 ALB，使用者連的是 <code>api.example.com</code>，DNS 自動解析到 ALB 目前的位址：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_route53_record&#34; &#34;api&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  zone_id</span> <span class="o">=</span> <span class="k">data</span><span class="p">.</span><span class="k">aws_route53_zone</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">zone_id</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  name</span>    <span class="o">=</span> <span class="s2">&#34;api.${var.domain}&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="s2">&#34;A&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  <span class="k">alias</span> {
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">    name</span>                   <span class="o">=</span> <span class="k">aws_lb</span><span class="p">.</span><span class="k">api</span><span class="p">.</span><span class="k">dns_name</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">    zone_id</span>                <span class="o">=</span> <span class="k">aws_lb</span><span class="p">.</span><span class="k">api</span><span class="p">.</span><span class="k">zone_id</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">    evaluate_target_health</span> <span class="o">=</span> <span class="kt">true</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">  }
</span></span><span class="line"><span class="ln">11</span><span class="cl">}</span></span></code></pre></div><p><code>evaluate_target_health = true</code> 讓 Route 53 在 ALB 所有 target 都不健康時把這條記錄標為不健康。如果有多個 region 的 ALB 做了 failover routing，這個設定能讓 DNS 層自動切換到健康的 region — 屬於跨區域容災的地基，在 devops 模組展開。</p>
<h2 id="waf-與下一步">WAF 與下一步</h2>
<p>ALB 支援掛載 AWS WAF（Web Application Firewall），在流量進到應用之前先過一層規則 — 擋已知惡意 IP、防 SQL injection / XSS 的常見模式、限制單一 IP 的請求速率。WAF 的規則也可以寫進 IaC，讓「哪些流量被擋」成為可審查的程式碼而非 Console 上的設定。WAF 的詳細設計屬於安全層的範圍（見 <a href="/blog/backend/07-security-data-protection/" data-link-title="模組七：資安與資料保護" data-link-desc="以問題驅動方式擴充資安知識網：先定義服務環節問題，再以案例作為觸發式參考">backend 模組七：資安與資料保護</a>），這裡只確認它的掛載點是 ALB。</p>
<p>四類核心服務的 IaC 描述到此完成。下一步是讓這些服務可被觀測——log、metric、alarm 跟資源同生命週期建立，見<a href="/blog/infra/06-observability-logging/" data-link-title="模組六：可觀測性與 log 一併寫進 code" data-link-desc="log group、metric、alarm 跟基礎設施同生命週期管理，出事時追得到查得到">模組六：可觀測性與 log</a>。</p>
<h2 id="跨分類引用">跨分類引用</h2>
<ul>
<li>→ <a href="/blog/infra/03-network-foundation/" data-link-title="模組三：網路地基 — VPC 與分層" data-link-desc="VPC、public / private subnet 切分、route table、NAT、security group 設計">模組三：網路地基</a>：ALB 的 security group 設計，group-to-group 引用</li>
<li>→ <a href="/blog/infra/05-core-services/stateful-protection-dependency/" data-link-title="Stateful 資源保護與跨服務依賴表達" data-link-desc="stateful 資源的保護策略（multi-AZ、備份、刪除保護）、stateful 與 stateless 的操作差異，以及用 output 與 data source 表達服務間依賴">模組五：stateful 資源的保護策略</a>：ALB 是 stateless，但它引用的 ACM 憑證和 DNS 記錄有自己的生命週期考量</li>
<li>→ <a href="/blog/devops/01-load-balancing/" data-link-title="模組一：負載平衡與反向代理" data-link-desc="流量進來怎麼分給多個服務實例 — nginx / HAProxy / DNS round-robin 的選型和健康檢查路由設計">devops 模組一：負載平衡</a>：ALB 的運行期調校 — 跨 AZ 流量分配、connection draining、sticky session</li>
<li>→ <a href="/blog/backend/07-security-data-protection/" data-link-title="模組七：資安與資料保護" data-link-desc="以問題驅動方式擴充資安知識網：先定義服務環節問題，再以案例作為觸發式參考">backend 模組七：資安與資料保護</a>：WAF 規則設計</li>
</ul>
]]></content:encoded></item><item><title>ACM 憑證、DNS 與 HTTPS 設定</title><link>https://tarrragon.github.io/blog/infra/05-core-services/acm-tls-dns-setup/</link><pubDate>Fri, 26 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/infra/05-core-services/acm-tls-dns-setup/</guid><description>&lt;p>HTTPS 的運作需要三個元件配合：一個管理網域記錄的 DNS zone、一張證明網域所有權的 TLS 憑證、以及一個用這張憑證終結 TLS 連線的入口（ALB listener）。這三者在 IaC 裡各自是獨立資源，但建立順序有依賴——zone 先存在、憑證才能用 DNS 驗證、驗證通過才能掛到 listener。把這條鏈路寫進 Terraform，讓憑證的申請、驗證與續期都在版本控制裡，是避免「憑證過期才發現沒人盯」的結構性做法。&lt;/p>
&lt;h2 id="route-53-hosted-zone">Route 53 Hosted Zone&lt;/h2>
&lt;p>Hosted zone 是 Route 53 用來管理某個網域的 DNS 記錄集合。建立 zone 後，Route 53 會分配一組 NS（Name Server）記錄，網域的 DNS 解析就由這組 NS 負責。&lt;/p>
&lt;h3 id="public-vs-private-zone">Public vs Private Zone&lt;/h3>
&lt;p>Public hosted zone 對應的是可從網際網路解析的網域（如 &lt;code>example.com&lt;/code>），用於對外服務的 A / CNAME / MX 記錄。Private hosted zone 只在指定的 VPC 內可解析，用於內部服務發現（如 &lt;code>db.internal.example.com&lt;/code> 解析到 RDS 的 private IP）。多數專案兩者都需要：public zone 給對外流量、private zone 給內部服務互連。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-hcl" data-lang="hcl">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_route53_zone&amp;#34; &amp;#34;public&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="n"> name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;example.com&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="n"> tags&lt;/span> &lt;span class="o">=&lt;/span>&lt;span class="n"> { Environment&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;production&amp;#34;&lt;/span> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">}
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_route53_zone&amp;#34; &amp;#34;private&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="n"> name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;internal.example.com&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> &lt;span class="k">vpc&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="n"> vpc_id&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">aws_vpc&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">main&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">id&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="n"> tags&lt;/span> &lt;span class="o">=&lt;/span>&lt;span class="n"> { Environment&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;production&amp;#34;&lt;/span> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="子網域-delegation">子網域 delegation&lt;/h3>
&lt;p>當 dev / staging / prod 各用獨立帳號時，每個帳號建自己的 hosted zone 管理子網域（如 &lt;code>dev.example.com&lt;/code>）。父網域的 zone 需要加一組 NS 記錄指向子網域的 zone，這個動作叫 delegation。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-hcl" data-lang="hcl">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_route53_record&amp;#34; &amp;#34;dev_ns&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="n"> zone_id&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">aws_route53_zone&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">public&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">zone_id&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="n"> name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;dev.example.com&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="n"> type&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;NS&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="n"> ttl&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="m">300&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="n"> records&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="k">aws_route53_zone&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">dev&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">name_servers&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>delegation 的 NS 記錄指向子帳號 zone 的 name server。子帳號內的所有 DNS 記錄（如 &lt;code>api.dev.example.com&lt;/code>）由子帳號的 zone 管理，父帳號不需要逐條設定。跨帳號 delegation 需要兩邊的 Terraform 各自管理自己的 zone，NS 記錄在父帳號的 state 裡。&lt;/p>
&lt;p>判讀設定是否正確：用 &lt;code>dig dev.example.com NS&lt;/code> 查回的 name server 應該是子帳號 zone 的 NS，不是父帳號的。如果查回父帳號的 NS，代表 delegation 沒生效，子網域的 DNS 記錄不會被解析。&lt;/p>
&lt;h2 id="acm-憑證申請與-dns-驗證">ACM 憑證申請與 DNS 驗證&lt;/h2>
&lt;p>AWS Certificate Manager（ACM）提供免費的 TLS 憑證，條件是透過 DNS 或 email 驗證網域所有權。DNS 驗證是 IaC 友善的方式——ACM 要求在指定網域下建一條 CNAME 記錄，記錄值由 ACM 提供，驗證通過後憑證自動簽發。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-hcl" data-lang="hcl">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_acm_certificate&amp;#34; &amp;#34;main&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="n"> domain_name&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;example.com&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="n"> subject_alternative_names&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;*.example.com&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="n"> validation_method&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;DNS&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl"> &lt;span class="k">lifecycle&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="n"> create_before_destroy&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="kt">true&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="n"> tags&lt;/span> &lt;span class="o">=&lt;/span>&lt;span class="n"> { Environment&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;production&amp;#34;&lt;/span> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>subject_alternative_names&lt;/code> 加 &lt;code>*.example.com&lt;/code> 讓同一張憑證涵蓋所有子網域（如 &lt;code>api.example.com&lt;/code>、&lt;code>admin.example.com&lt;/code>），省去為每個子網域各申請一張。&lt;/p></description><content:encoded><![CDATA[<p>HTTPS 的運作需要三個元件配合：一個管理網域記錄的 DNS zone、一張證明網域所有權的 TLS 憑證、以及一個用這張憑證終結 TLS 連線的入口（ALB listener）。這三者在 IaC 裡各自是獨立資源，但建立順序有依賴——zone 先存在、憑證才能用 DNS 驗證、驗證通過才能掛到 listener。把這條鏈路寫進 Terraform，讓憑證的申請、驗證與續期都在版本控制裡，是避免「憑證過期才發現沒人盯」的結構性做法。</p>
<h2 id="route-53-hosted-zone">Route 53 Hosted Zone</h2>
<p>Hosted zone 是 Route 53 用來管理某個網域的 DNS 記錄集合。建立 zone 後，Route 53 會分配一組 NS（Name Server）記錄，網域的 DNS 解析就由這組 NS 負責。</p>
<h3 id="public-vs-private-zone">Public vs Private Zone</h3>
<p>Public hosted zone 對應的是可從網際網路解析的網域（如 <code>example.com</code>），用於對外服務的 A / CNAME / MX 記錄。Private hosted zone 只在指定的 VPC 內可解析，用於內部服務發現（如 <code>db.internal.example.com</code> 解析到 RDS 的 private IP）。多數專案兩者都需要：public zone 給對外流量、private zone 給內部服務互連。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_route53_zone&#34; &#34;public&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  name</span> <span class="o">=</span> <span class="s2">&#34;example.com&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  tags</span> <span class="o">=</span><span class="n"> { Environment</span> <span class="o">=</span> <span class="s2">&#34;production&#34;</span> }
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_route53_zone&#34; &#34;private&#34;</span> {
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">  name</span> <span class="o">=</span> <span class="s2">&#34;internal.example.com&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  <span class="k">vpc</span> {
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">    vpc_id</span> <span class="o">=</span> <span class="k">aws_vpc</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">id</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">  }
</span></span><span class="line"><span class="ln">12</span><span class="cl">
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">  tags</span> <span class="o">=</span><span class="n"> { Environment</span> <span class="o">=</span> <span class="s2">&#34;production&#34;</span> }
</span></span><span class="line"><span class="ln">14</span><span class="cl">}</span></span></code></pre></div><h3 id="子網域-delegation">子網域 delegation</h3>
<p>當 dev / staging / prod 各用獨立帳號時，每個帳號建自己的 hosted zone 管理子網域（如 <code>dev.example.com</code>）。父網域的 zone 需要加一組 NS 記錄指向子網域的 zone，這個動作叫 delegation。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_route53_record&#34; &#34;dev_ns&#34;</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  zone_id</span> <span class="o">=</span> <span class="k">aws_route53_zone</span><span class="p">.</span><span class="k">public</span><span class="p">.</span><span class="k">zone_id</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  name</span>    <span class="o">=</span> <span class="s2">&#34;dev.example.com&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="s2">&#34;NS&#34;</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="n">  ttl</span>     <span class="o">=</span> <span class="m">300</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="n">  records</span> <span class="o">=</span> <span class="k">aws_route53_zone</span><span class="p">.</span><span class="k">dev</span><span class="p">.</span><span class="k">name_servers</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">}</span></span></code></pre></div><p>delegation 的 NS 記錄指向子帳號 zone 的 name server。子帳號內的所有 DNS 記錄（如 <code>api.dev.example.com</code>）由子帳號的 zone 管理，父帳號不需要逐條設定。跨帳號 delegation 需要兩邊的 Terraform 各自管理自己的 zone，NS 記錄在父帳號的 state 裡。</p>
<p>判讀設定是否正確：用 <code>dig dev.example.com NS</code> 查回的 name server 應該是子帳號 zone 的 NS，不是父帳號的。如果查回父帳號的 NS，代表 delegation 沒生效，子網域的 DNS 記錄不會被解析。</p>
<h2 id="acm-憑證申請與-dns-驗證">ACM 憑證申請與 DNS 驗證</h2>
<p>AWS Certificate Manager（ACM）提供免費的 TLS 憑證，條件是透過 DNS 或 email 驗證網域所有權。DNS 驗證是 IaC 友善的方式——ACM 要求在指定網域下建一條 CNAME 記錄，記錄值由 ACM 提供，驗證通過後憑證自動簽發。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_acm_certificate&#34; &#34;main&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  domain_name</span>               <span class="o">=</span> <span class="s2">&#34;example.com&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  subject_alternative_names</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;*.example.com&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  validation_method</span>         <span class="o">=</span> <span class="s2">&#34;DNS&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  <span class="k">lifecycle</span> {
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">    create_before_destroy</span> <span class="o">=</span> <span class="kt">true</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">  }
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  tags</span> <span class="o">=</span><span class="n"> { Environment</span> <span class="o">=</span> <span class="s2">&#34;production&#34;</span> }
</span></span><span class="line"><span class="ln">11</span><span class="cl">}</span></span></code></pre></div><p><code>subject_alternative_names</code> 加 <code>*.example.com</code> 讓同一張憑證涵蓋所有子網域（如 <code>api.example.com</code>、<code>admin.example.com</code>），省去為每個子網域各申請一張。</p>
<h3 id="dns-驗證記錄">DNS 驗證記錄</h3>
<p>ACM 簽發後會產出一組驗證用的 CNAME 記錄。用 Terraform 自動在 Route 53 建立這些記錄，讓驗證流程不需要手動操作：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_route53_record&#34; &#34;cert_validation&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  for_each</span> <span class="o">=</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">    for dvo in aws_acm_certificate.main.domain_validation_options : dvo.domain_name</span> <span class="o">=</span><span class="err">&gt;</span> {
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">      name</span>   <span class="o">=</span> <span class="k">dvo</span><span class="p">.</span><span class="k">resource_record_name</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">      record</span> <span class="o">=</span> <span class="k">dvo</span><span class="p">.</span><span class="k">resource_record_value</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">      type</span>   <span class="o">=</span> <span class="k">dvo</span><span class="p">.</span><span class="k">resource_record_type</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    }
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">  }
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  zone_id</span> <span class="o">=</span> <span class="k">aws_route53_zone</span><span class="p">.</span><span class="k">public</span><span class="p">.</span><span class="k">zone_id</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">  name</span>    <span class="o">=</span> <span class="k">each</span><span class="p">.</span><span class="k">value</span><span class="p">.</span><span class="k">name</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="k">each</span><span class="p">.</span><span class="k">value</span><span class="p">.</span><span class="k">type</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">  ttl</span>     <span class="o">=</span> <span class="m">300</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="n">  records</span> <span class="o">=</span> <span class="p">[</span><span class="k">each</span><span class="p">.</span><span class="k">value</span><span class="p">.</span><span class="k">record</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="n">  allow_overwrite</span> <span class="o">=</span> <span class="kt">true</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">}
</span></span><span class="line"><span class="ln">18</span><span class="cl">
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_acm_certificate_validation&#34; &#34;main&#34;</span> {
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="n">  certificate_arn</span>         <span class="o">=</span> <span class="k">aws_acm_certificate</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="n">  validation_record_fqdns</span> <span class="o">=</span> <span class="p">[</span><span class="k">for</span> <span class="k">record</span> <span class="k">in</span> <span class="k">aws_route53_record</span><span class="p">.</span><span class="k">cert_validation</span> <span class="err">:</span> <span class="k">record</span><span class="p">.</span><span class="k">fqdn</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">}</span></span></code></pre></div><p><code>aws_acm_certificate_validation</code> 資源會等到 ACM 確認驗證通過才算 apply 成功。如果 DNS 記錄設錯或 zone 的 NS delegation 有問題，這個資源會卡住直到 timeout——排查方向是先確認驗證 CNAME 記錄能被公網 DNS 解析。</p>
<h3 id="create_before_destroy">create_before_destroy</h3>
<p><code>lifecycle { create_before_destroy = true }</code> 在憑證需要替換時（如增加 SAN、更換網域），讓 Terraform 先建新憑證、再刪舊憑證。沒有這個設定，預設行為是先刪後建——刪除的瞬間 ALB listener 失去憑證，HTTPS 連線全部中斷直到新憑證驗證通過（可能要幾分鐘到幾十分鐘）。</p>
<h2 id="alb-https-listener">ALB HTTPS Listener</h2>
<p>憑證驗證通過後，把它掛到 ALB 的 HTTPS listener：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_lb_listener&#34; &#34;https&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  load_balancer_arn</span> <span class="o">=</span> <span class="k">aws_lb</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  port</span>              <span class="o">=</span> <span class="m">443</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  protocol</span>          <span class="o">=</span> <span class="s2">&#34;HTTPS&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">  ssl_policy</span>        <span class="o">=</span> <span class="s2">&#34;ELBSecurityPolicy-TLS13-1-2-2021-06&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">  certificate_arn</span>   <span class="o">=</span> <span class="k">aws_acm_certificate_validation</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">certificate_arn</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">  <span class="k">default_action</span> {
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">    type</span>             <span class="o">=</span> <span class="s2">&#34;forward&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">    target_group_arn</span> <span class="o">=</span> <span class="k">aws_lb_target_group</span><span class="p">.</span><span class="k">app</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">  }
</span></span><span class="line"><span class="ln">12</span><span class="cl">}</span></span></code></pre></div><p><code>ssl_policy</code> 決定 TLS 版本與加密套件。<code>ELBSecurityPolicy-TLS13-1-2-2021-06</code> 支援 TLS 1.2 和 1.3、停用已知不安全的舊版協定。選型判準是相容性與安全性的平衡——TLS 1.3-only policy 最安全但可能排除舊版客戶端，多數場景用 1.2+1.3 的組合。</p>
<p><code>certificate_arn</code> 引用的是 <code>aws_acm_certificate_validation</code> 而非直接引用 <code>aws_acm_certificate</code>，確保 listener 只在憑證驗證通過後才建立。</p>
<h3 id="http--https-重導">HTTP → HTTPS 重導</h3>
<p>同時建立一個 HTTP listener，把所有 80 埠流量重導到 443：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_lb_listener&#34; &#34;http_redirect&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  load_balancer_arn</span> <span class="o">=</span> <span class="k">aws_lb</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  port</span>              <span class="o">=</span> <span class="m">80</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  protocol</span>          <span class="o">=</span> <span class="s2">&#34;HTTP&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  <span class="k">default_action</span> {
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">    type</span> <span class="o">=</span> <span class="s2">&#34;redirect&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    <span class="k">redirect</span> {
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">      port</span>        <span class="o">=</span> <span class="s2">&#34;443&#34;</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">      protocol</span>    <span class="o">=</span> <span class="s2">&#34;HTTPS&#34;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">      status_code</span> <span class="o">=</span> <span class="s2">&#34;HTTP_301&#34;</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    }
</span></span><span class="line"><span class="ln">13</span><span class="cl">  }
</span></span><span class="line"><span class="ln">14</span><span class="cl">}</span></span></code></pre></div><p>301 永久重導讓瀏覽器記住後續直接走 HTTPS。security group 仍然需要開放 80 埠入站，否則重導不會發生——client 連 80 埠被擋、收到的是連線失敗而非重導回應。</p>
<h2 id="多網域與-san-憑證">多網域與 SAN 憑證</h2>
<p>一張 ACM 憑證最多支援 10 個 SAN（Subject Alternative Name）。多數場景用主網域 + wildcard（<code>example.com</code> + <code>*.example.com</code>）就夠用。如果有多個不同根網域（如 <code>example.com</code> 和 <code>example-app.com</code>），可以加進同一張憑證：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_acm_certificate&#34; &#34;multi_domain&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  domain_name</span>               <span class="o">=</span> <span class="s2">&#34;example.com&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  subject_alternative_names</span> <span class="o">=</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="s2">&#34;*.example.com&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="s2">&#34;example-app.com&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">    <span class="s2">&#34;*.example-app.com&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">  <span class="p">]</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  validation_method</span> <span class="o">=</span> <span class="s2">&#34;DNS&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">  <span class="k">lifecycle</span> {
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">    create_before_destroy</span> <span class="o">=</span> <span class="kt">true</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">  }
</span></span><span class="line"><span class="ln">13</span><span class="cl">}</span></span></code></pre></div><p>每個 SAN 網域都需要獨立的 DNS 驗證記錄。如果不同網域在不同的 hosted zone 裡，驗證記錄的建立要分別指向各自的 zone。</p>
<p>當 SAN 數量超過 10、或不同網域的憑證需要獨立管理（不同 team 負責不同網域），改用 <code>aws_lb_listener_certificate</code> 額外掛載：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_lb_listener_certificate&#34; &#34;additional&#34;</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  listener_arn</span>    <span class="o">=</span> <span class="k">aws_lb_listener</span><span class="p">.</span><span class="k">https</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  certificate_arn</span> <span class="o">=</span> <span class="k">aws_acm_certificate</span><span class="p">.</span><span class="k">other_domain</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">}</span></span></code></pre></div><p>ALB 會根據 SNI（Server Name Indication）自動選擇匹配的憑證。</p>
<h2 id="穩定的-dns-別名記錄">穩定的 DNS 別名記錄</h2>
<p>ALB 重建後 DNS 名稱會改變，對外服務不應該直接用 ALB 的 DNS 名稱。用 Route 53 的 alias record 把穩定的網域名指向 ALB：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_route53_record&#34; &#34;app&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  zone_id</span> <span class="o">=</span> <span class="k">aws_route53_zone</span><span class="p">.</span><span class="k">public</span><span class="p">.</span><span class="k">zone_id</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  name</span>    <span class="o">=</span> <span class="s2">&#34;api.example.com&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="s2">&#34;A&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  <span class="k">alias</span> {
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">    name</span>                   <span class="o">=</span> <span class="k">aws_lb</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">dns_name</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">    zone_id</span>                <span class="o">=</span> <span class="k">aws_lb</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">zone_id</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">    evaluate_target_health</span> <span class="o">=</span> <span class="kt">true</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">  }
</span></span><span class="line"><span class="ln">11</span><span class="cl">}</span></span></code></pre></div><p>alias record 不收費（一般的 A/CNAME 記錄每百萬次查詢 $0.40，alias 到 AWS 資源免費），且支援 zone apex（如 <code>example.com</code>，一般 CNAME 不支援 zone apex）。<code>evaluate_target_health = true</code> 讓 Route 53 在 ALB 不健康時停止回應該記錄，配合 failover routing 使用。</p>
<h2 id="憑證續期監控">憑證續期監控</h2>
<p>ACM 的 DNS 驗證憑證會自動續期——條件是驗證用的 CNAME 記錄仍然存在且可解析。只要那條記錄沒被刪掉，憑證到期前 60 天 ACM 會自動續期。</p>
<p>自動續期失敗的常見原因：驗證 CNAME 記錄被手動刪除、hosted zone 的 NS delegation 失效、或 zone 本身被刪除重建導致 NS 改變。用 CloudWatch alarm 監控憑證到期日，在自動續期失敗時提前收到通知：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_cloudwatch_metric_alarm&#34; &#34;cert_expiry&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  alarm_name</span>          <span class="o">=</span> <span class="s2">&#34;acm-cert-expiry-${aws_acm_certificate.main.domain_name}&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  comparison_operator</span> <span class="o">=</span> <span class="s2">&#34;LessThanThreshold&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  evaluation_periods</span>  <span class="o">=</span> <span class="m">1</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">  metric_name</span>         <span class="o">=</span> <span class="s2">&#34;DaysToExpiry&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">  namespace</span>           <span class="o">=</span> <span class="s2">&#34;AWS/CertificateManager&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">  period</span>              <span class="o">=</span> <span class="m">86400</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  statistic</span>           <span class="o">=</span> <span class="s2">&#34;Minimum&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">  threshold</span>           <span class="o">=</span> <span class="m">30</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  alarm_actions</span>       <span class="o">=</span> <span class="p">[</span><span class="k">aws_sns_topic</span><span class="p">.</span><span class="k">oncall</span><span class="p">.</span><span class="k">arn</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">  dimensions</span> <span class="o">=</span> {
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">    CertificateArn</span> <span class="o">=</span> <span class="k">aws_acm_certificate</span><span class="p">.</span><span class="k">main</span><span class="p">.</span><span class="k">arn</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">  }
</span></span><span class="line"><span class="ln">15</span><span class="cl">}</span></span></code></pre></div><p>這個 alarm 在憑證距離到期不足 30 天時觸發。正常情況下 ACM 在到期前 60 天就會完成續期，收到 30 天警報代表自動續期失敗了、需要人工介入確認驗證記錄。</p>
<h2 id="跨分類引用">跨分類引用</h2>
<ul>
<li>→ <a href="/blog/infra/05-core-services/loadbalancer-alb/" data-link-title="入口上 IaC — ALB、TLS 與健康檢查" data-link-desc="Application Load Balancer 的 listener、target group、健康檢查閾值設計，以及用 ACM 把 TLS 憑證的簽發、驗證與掛載整條鏈寫進版本控制">入口上 IaC — ALB</a>：ALB listener、target group、健康檢查的完整設定</li>
<li>→ <a href="/blog/infra/03-network-foundation/" data-link-title="模組三：網路地基 — VPC 與分層" data-link-desc="VPC、public / private subnet 切分、route table、NAT、security group 設計">模組三：網路地基</a>：ALB 所在的 public subnet 與 security group 設計</li>
<li>→ <a href="/blog/infra/07-infra-as-pr/" data-link-title="模組七：infra 走 PR 流程與自動化護欄" data-link-desc="infra 變更走 PR → plan → review diff → 合併 → apply，配 fmt / validate / tflint / checkov / tfsec 與 Atlantis 自動化，讓基礎設施可審查、可回溯、可交接">模組七：infra 走 PR 流程</a>：憑證與 DNS 變更走 PR review</li>
</ul>
]]></content:encoded></item><item><title>cert-manager</title><link>https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/cert-manager/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/cert-manager/</guid><description>&lt;p>cert-manager 是 K8s 原生的 &lt;em>certificate lifecycle automation&lt;/em> — 把「拿 cert、放 cert、定期 renew」這條從以前需要 cron + certbot + 手動 reload 的鏈、轉成 &lt;em>declarative + controller pattern&lt;/em>。使用者在 cluster 內 apply 一個 &lt;code>Certificate&lt;/code> resource、cert-manager controller 自動跟 issuer 對話、把 cert 存進 Secret、在 lifetime 2/3 點觸發 renew。它把 cert 這件事接進 K8s 控制循環、跟 Pod / Service / Ingress 同等地位的 first-class resource、層級高於 certbot 的 K8s 移植。&lt;/p>
&lt;h2 id="服務定位">服務定位&lt;/h2>
&lt;p>cert-manager 的核心責任是 &lt;em>K8s cluster 內所有 cert 的生命週期治理&lt;/em>。從 Ingress / Gateway 對外 TLS、internal service mTLS、到 workload-level 短期 cert、都用同一套 declarative model 表達。Issuer 抽象讓底層 cert 來源可換 — 公開 cert 走 &lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/letsencrypt/" data-link-title="Let&amp;#39;s Encrypt" data-link-desc="免費 &amp;#43; 自動化的公共 ACME CA、90 天 TTL 強制自動化、跨雲跨平台 public TLS cert 的事實基礎">Let&amp;rsquo;s Encrypt&lt;/a> ACME、內部 cert 走 &lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">Vault PKI engine&lt;/a> 或 self-signed CA、企業環境走 Venafi 或 AWS PCA — 上層 &lt;code>Certificate&lt;/code> spec 不變。&lt;/p>
&lt;p>跟 &lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/aws-acm/" data-link-title="AWS ACM" data-link-desc="AWS-managed certificate provisioning、DNS validation &amp;#43; auto-renewal、整合 ELB / CloudFront / API Gateway、Private CA 後端">AWS ACM&lt;/a> 的差異是 &lt;em>cert 的部署面&lt;/em>：ACM 是 AWS-managed cert、只能掛在 AWS service（ELB / CloudFront / API Gateway）、私鑰永不離 AWS；cert-manager 是 K8s-native client、cert 放在 cluster 內的 Secret、可以掛任何 ingress controller 或 workload mTLS。跟 Let&amp;rsquo;s Encrypt 的關係是 &lt;em>client vs issuer&lt;/em> — cert-manager 是 ACME client、Let&amp;rsquo;s Encrypt 是 ACME server、不是替代關係。跟 &lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/spire/" data-link-title="SPIRE" data-link-desc="SPIFFE Runtime Environment、attested workload identity、short-lived SVID &amp;#43; Trust Bundle、跨組織 federation">SPIRE&lt;/a> 的差異是 &lt;em>身份模型&lt;/em> — cert-manager 給 &lt;em>DNS-named cert&lt;/em>（CN / SAN 是 hostname）、SPIRE 給 &lt;em>SPIFFE ID-based workload identity&lt;/em>（&lt;code>spiffe://trust-domain/workload&lt;/code>）、兩者互補不衝突。&lt;/p></description><content:encoded><![CDATA[<p>cert-manager 是 K8s 原生的 <em>certificate lifecycle automation</em> — 把「拿 cert、放 cert、定期 renew」這條從以前需要 cron + certbot + 手動 reload 的鏈、轉成 <em>declarative + controller pattern</em>。使用者在 cluster 內 apply 一個 <code>Certificate</code> resource、cert-manager controller 自動跟 issuer 對話、把 cert 存進 Secret、在 lifetime 2/3 點觸發 renew。它把 cert 這件事接進 K8s 控制循環、跟 Pod / Service / Ingress 同等地位的 first-class resource、層級高於 certbot 的 K8s 移植。</p>
<h2 id="服務定位">服務定位</h2>
<p>cert-manager 的核心責任是 <em>K8s cluster 內所有 cert 的生命週期治理</em>。從 Ingress / Gateway 對外 TLS、internal service mTLS、到 workload-level 短期 cert、都用同一套 declarative model 表達。Issuer 抽象讓底層 cert 來源可換 — 公開 cert 走 <a href="/blog/backend/07-security-data-protection/vendors/letsencrypt/" data-link-title="Let&#39;s Encrypt" data-link-desc="免費 &#43; 自動化的公共 ACME CA、90 天 TTL 強制自動化、跨雲跨平台 public TLS cert 的事實基礎">Let&rsquo;s Encrypt</a> ACME、內部 cert 走 <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">Vault PKI engine</a> 或 self-signed CA、企業環境走 Venafi 或 AWS PCA — 上層 <code>Certificate</code> spec 不變。</p>
<p>跟 <a href="/blog/backend/07-security-data-protection/vendors/aws-acm/" data-link-title="AWS ACM" data-link-desc="AWS-managed certificate provisioning、DNS validation &#43; auto-renewal、整合 ELB / CloudFront / API Gateway、Private CA 後端">AWS ACM</a> 的差異是 <em>cert 的部署面</em>：ACM 是 AWS-managed cert、只能掛在 AWS service（ELB / CloudFront / API Gateway）、私鑰永不離 AWS；cert-manager 是 K8s-native client、cert 放在 cluster 內的 Secret、可以掛任何 ingress controller 或 workload mTLS。跟 Let&rsquo;s Encrypt 的關係是 <em>client vs issuer</em> — cert-manager 是 ACME client、Let&rsquo;s Encrypt 是 ACME server、不是替代關係。跟 <a href="/blog/backend/07-security-data-protection/vendors/spire/" data-link-title="SPIRE" data-link-desc="SPIFFE Runtime Environment、attested workload identity、short-lived SVID &#43; Trust Bundle、跨組織 federation">SPIRE</a> 的差異是 <em>身份模型</em> — cert-manager 給 <em>DNS-named cert</em>（CN / SAN 是 hostname）、SPIRE 給 <em>SPIFFE ID-based workload identity</em>（<code>spiffe://trust-domain/workload</code>）、兩者互補不衝突。</p>
<h2 id="本章目標">本章目標</h2>
<p>讀完本頁、讀者能判斷：</p>
<ol>
<li>cert-manager 用 Issuer / ClusterIssuer 哪個、配什麼 issuer backend（Let&rsquo;s Encrypt / Vault PKI / self-signed / 公司 CA）</li>
<li>Challenge solver 選 HTTP01 還是 DNS01、為什麼 wildcard cert 必須用 DNF01</li>
<li>Auto-renewal 觸發點、renew 失敗的 alert 時機、跟 Ingress / Gateway API 整合的 annotation</li>
<li>何時用 cert-manager、何時改走 <a href="/blog/backend/07-security-data-protection/vendors/aws-acm/" data-link-title="AWS ACM" data-link-desc="AWS-managed certificate provisioning、DNS validation &#43; auto-renewal、整合 ELB / CloudFront / API Gateway、Private CA 後端">ACM</a>（雲端原生 service）或 <a href="/blog/backend/07-security-data-protection/vendors/spire/" data-link-title="SPIRE" data-link-desc="SPIFFE Runtime Environment、attested workload identity、short-lived SVID &#43; Trust Bundle、跨組織 federation">SPIRE</a>（workload identity）</li>
</ol>
<h2 id="最短判讀路徑">最短判讀路徑</h2>
<p>判斷 cert-manager 部署是否健康、最少看四件事：</p>
<ul>
<li><strong>Issuer 配置</strong>：是 <code>ClusterIssuer</code>（cluster-wide）還是 <code>Issuer</code>（namespace-scoped）、backend 是哪一種（acme / vault / ca / venafi）、credential（ACME private key、Vault token、CA cert）放哪、RBAC 限制誰能參考這個 issuer</li>
<li><strong>Certificate spec</strong>：<code>dnsNames</code> / <code>ipAddresses</code> 跟實際 service 一致、<code>duration</code> 跟 <code>renewBefore</code> 比例合理（renewBefore &gt;= duration / 3）、<code>secretName</code> 指向的 Secret 是不是 ingress 真的會讀的那個</li>
<li><strong>Renewal 觸發</strong>：controller log 有沒有按時觸發 renew、<code>kubectl describe certificate</code> 的 <code>Renewal Time</code> 接近沒、Challenge resource 沒有卡在 pending</li>
<li><strong>Challenge solver</strong>：HTTP01 的 ingress / Gateway 80 port 真的能被 Let&rsquo;s Encrypt 從 Internet 打到、DNS01 用的 cloud provider credential 還有效、wildcard cert 沒誤用 HTTP01</li>
</ul>
<p>四件事任一缺失、cert 就會在不知不覺中過期、production 看到 <code>x509: certificate has expired</code> 才驚覺、是 <a href="/blog/backend/07-security-data-protection/transport-trust-and-certificate-lifecycle/" data-link-title="7.5 傳輸信任與憑證生命週期" data-link-desc="以問題驅動方式整理傳輸信任鏈、會話完整性與憑證節奏">Transport Trust and Certificate Lifecycle</a> 的典型缺口。</p>
<h2 id="日常操作與決策形狀">日常操作與決策形狀</h2>
<p><strong>Issuer vs ClusterIssuer 的選擇</strong>：<code>Issuer</code> 是 namespace-scoped、只能 issue 該 namespace 的 cert、適合 <em>單 team 自管 issuer credential</em> 的場景；<code>ClusterIssuer</code> 是 cluster-wide、所有 namespace 都可以參考、適合 <em>平台 team 統一管理 issuer</em>。production 通常用 ClusterIssuer 配特定 issuer backend + RBAC 收 <code>Certificate</code> 建立權（讓 application team 只能在自己 namespace 建 Certificate、不能改 ClusterIssuer）。</p>
<p><strong>Certificate spec 設計</strong>：<code>dnsNames</code> 列出該 cert 涵蓋的 hostname（支援 wildcard <code>*.example.com</code>）、<code>ipAddresses</code> 加 IP SAN（mTLS 跨 service 常用）、<code>duration</code> 是 cert 有效期、<code>renewBefore</code> 是提前多久 renew（預設 duration 的 1/3）。短期 cert（hours-level、Vault PKI 常用）配 <code>renewBefore</code> 短、長期 cert（90 天、Let&rsquo;s Encrypt）配 <code>renewBefore</code> 30 天。<code>secretName</code> 指向 cert-manager 會寫入的 Secret、Ingress 跟 workload 從這個 Secret 讀。</p>
<p><strong>Challenge solver 的選擇</strong>：ACME issuer（Let&rsquo;s Encrypt）需要證明 <em>你控制這個 domain</em>、有兩個方法：HTTP01（在 <code>http://yourdomain/.well-known/acme-challenge/&lt;token&gt;</code> 放檔案、Let&rsquo;s Encrypt 從 Internet 來抓）跟 DNS01（在 DNS zone 加 <code>_acme-challenge.yourdomain TXT &lt;token&gt;</code> record、Let&rsquo;s Encrypt 查 DNS）。<strong>wildcard cert（<code>*.example.com</code>）必須用 DNS01</strong>、HTTP01 不支援 wildcard 因為 Let&rsquo;s Encrypt 不知道要打哪個 subdomain。HTTP01 要求 ingress controller 80 port 對 Internet 開放、DNS01 要求 cluster 有 cloud DNS API credential。</p>
<p><strong>Auto-renewal 機制</strong>：cert-manager 在 cert lifetime 達到 <code>(duration - renewBefore)</code> 時間時觸發 renew、預設約 lifetime 2/3 點。Let&rsquo;s Encrypt cert 90 天 = 60 天時開始嘗試 renew、留 30 天緩衝給 renew 失敗的重試。renew 失敗會持續重試（exponential backoff、最長 8 小時間隔）、剩下 ~7 天時 controller log 開始 ERROR 級別 alert — 監控要 hook 進這個 log 訊號、否則 cert 真的過期才知道就太晚。</p>
<p><strong>跟 Ingress 整合</strong>：Ingress resource 加 annotation <code>cert-manager.io/cluster-issuer: letsencrypt-prod</code>（或 <code>cert-manager.io/issuer:</code>）、cert-manager 看到 Ingress 的 <code>tls.hosts</code> 自動建立對應 Certificate、issue 完寫進 <code>tls.secretName</code> 指定的 Secret、ingress controller 自動 reload 用新 cert。Gateway API 的整合機制類似、用 <code>cert-manager.io/issuer</code> annotation 在 <code>Gateway</code> resource。</p>
<p><strong>CertificateRequest Approval Policy（v1.4+）</strong>：每個 Certificate 建立會產生 CertificateRequest、由 Approver 決定要不要送給 issuer。預設 cert-manager 內建 approver 自動 approve、但可以加 admission policy（Kyverno / OPA / 自寫 webhook）限制「誰能在哪個 namespace 建什麼 SAN 的 cert」— 防 internal compromise 任意 issue cert 對外冒名。production 環境通常會在 platform-level 鎖 wildcard cert、防 application team 誤建涵蓋整個 zone 的 cert。</p>
<h2 id="核心取捨表">核心取捨表</h2>
<table>
  <thead>
      <tr>
          <th>取捨維度</th>
          <th>cert-manager</th>
          <th>AWS ACM</th>
          <th>手動 certbot / OpenSSL</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>部署模型</td>
          <td>K8s controller、declarative <code>Certificate</code> resource</td>
          <td>AWS managed、Console / API request</td>
          <td>手動跑 CLI、cron 跑 renew</td>
      </tr>
      <tr>
          <td>Cert 部署面</td>
          <td>K8s Secret、任何 ingress controller / workload</td>
          <td>只能掛 ELB / CloudFront / API Gateway</td>
          <td>任何地方、但 deploy 要自己做</td>
      </tr>
      <tr>
          <td>Issuer 彈性</td>
          <td>多 issuer（ACME / Vault / Venafi / CA / AWS PCA）</td>
          <td>只能 Amazon CA</td>
          <td>任何 ACME provider、但要手寫 hook</td>
      </tr>
      <tr>
          <td>Auto-renewal</td>
          <td>內建 controller、預設 2/3 lifetime 點 renew</td>
          <td>AWS 自動 renew（DNS-validated only）</td>
          <td>自己寫 cron + reload script</td>
      </tr>
      <tr>
          <td>Wildcard 支援</td>
          <td>走 DNS01 challenge</td>
          <td>支援、需 DNS 驗證</td>
          <td>走 DNS01 hook</td>
      </tr>
      <tr>
          <td>私鑰位置</td>
          <td>K8s Secret（cluster 內、需 RBAC + etcd encryption）</td>
          <td>AWS 內、不可 export</td>
          <td>Local filesystem、要自己管</td>
      </tr>
      <tr>
          <td>適合場景</td>
          <td>K8s cluster 內所有 cert、跨 issuer、internal mTLS</td>
          <td>AWS-only serving cert（ELB / CDN）</td>
          <td>非 K8s 的 server、舊系統</td>
      </tr>
      <tr>
          <td>退場成本</td>
          <td>中 — 改其他 ACME client 或回手動</td>
          <td>高 — 私鑰拿不出來、要重新 issue</td>
          <td>低 — 完全自管</td>
      </tr>
  </tbody>
</table>
<p>選 cert-manager 的核心訴求：<em>cluster 內 cert 跨 issuer 統一管理 + 自動 renew + 跟 Ingress / Gateway declarative 整合</em>。如果 cert 完全給 AWS service 用、不進 K8s workload、ACM 更簡單（不用裝 controller、AWS 自動處理）。如果是非 K8s 環境（VM、bare-metal Nginx）、certbot + cron 仍是合理選擇、不需要為了 cert 跑 K8s controller。</p>
<h2 id="進階主題">進階主題</h2>
<p><strong>DNS01 challenge 跟 cloud DNS 整合</strong>：cert-manager 支援多家 cloud DNS provider 作為 DNS01 solver — Route53、Cloud DNS（GCP）、Azure DNS、Cloudflare、ACMEDNS（自管 DNS proxy）。每個 provider 需要 <em>DNS zone 寫入 credential</em>（IAM role、service account key、API token）— 這份 credential 等於 <em>任意改該 zone DNS record 的權力</em>、blast radius 大、要走 <a href="/blog/backend/07-security-data-protection/identity-access-boundary/" data-link-title="7.2 身分與授權邊界" data-link-desc="以問題驅動方式整理身分、授權、會話與供應商身分鏈">least privilege</a> 限定到 specific zone + 只給 TXT record write、不要全 zone 全 record type。</p>
<p><strong>跟 Vault PKI engine 整合</strong>：cert-manager 可用 <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">Vault PKI engine</a> 作為 issuer backend — 在 cluster 內建 <code>Issuer</code> / <code>ClusterIssuer</code> type 為 <code>vault</code>、指向 Vault address + PKI mount path + auth method（Kubernetes auth / AppRole）。每張 cert 的 issue / revoke 都進 Vault audit log、跟 secret rotation 用同一套 evidence chain（呼應 <a href="/blog/backend/07-security-data-protection/credential-rotation-scoped-evidence/" data-link-title="7.27 Credential Rotation with Scoped Evidence 實作示範" data-link-desc="以 webhook/API credential 輪替示範 scope map、證據欄位與回退窗口如何一起設計。">Credential Rotation Scoped Evidence</a>）。typical 用法：short-lived workload mTLS cert（hours-level duration、minutes-level renewBefore）、靠 Vault PKI 短期 cert + cert-manager 自動換。</p>
<p><strong>跟 SPIRE 的互補</strong>：cert-manager 自動更新 cert、但 <em>cert 是給人讀的 DNS name</em>；SPIRE 自動建立 workload identity、<em>identity 是 SPIFFE ID</em>。兩者解不同問題 — cert-manager 解「Ingress / external API 的 TLS」、SPIRE 解「service A 要怎麼證明自己是 A 給 service B 看」。production 環境常 <em>並存</em>：edge cert 跟 user-facing TLS 用 cert-manager + Let&rsquo;s Encrypt、internal service mesh 用 SPIRE + SPIFFE。</p>
<p><strong>Trust bundle 管理（trust-manager）</strong>：trust-manager 是 cert-manager 姐妹專案、解決 <em>trust anchor（root CA bundle）跨 namespace 同步</em> 問題。傳統做法是每個 pod ConfigMap 各自塞 CA bundle、更新時要逐個改；trust-manager 提供 <code>Bundle</code> resource 一處定義、自動 distribute 到指定 namespace 的 ConfigMap。對應 <em>cert rotation</em> 跟 <em>CA rotation</em> 是兩條獨立 chain、後者是 trust-manager 的領域。</p>
<h2 id="排錯與失敗快速判讀">排錯與失敗快速判讀</h2>
<ul>
<li><strong>Challenge 卡在 pending</strong>：HTTP01 卡 = ingress 80 port 沒對 Internet、firewall / NLB 沒開、redirect 80→443 把 challenge 也轉了；DNS01 卡 = DNS provider credential 過期、IAM 沒 zone write 權、<code>_acme-challenge</code> record 沒寫進去 — <code>kubectl describe challenge</code> 看 reason</li>
<li><strong>Wildcard cert 用 HTTP01</strong>：申請失敗 + log 寫 &ldquo;wildcard not supported with HTTP-01&rdquo; — 改 DNS01 solver</li>
<li><strong>renewBefore 太短</strong>：renew 失敗只剩幾天才 alert、實際過期前來不及處理 — <code>renewBefore</code> 至少 duration / 3、production cert 給 30 天</li>
<li><strong>Secret 沒被 ingress 讀到</strong>：Certificate 已 Ready 但 ingress 還用舊 cert — ingress <code>tls.secretName</code> 拼錯、ingress controller 沒 reload、TLS handshake 用的 SNI 沒匹配</li>
<li><strong>ACME rate limit 撞牆</strong>：<a href="/blog/backend/07-security-data-protection/vendors/letsencrypt/" data-link-title="Let&#39;s Encrypt" data-link-desc="免費 &#43; 自動化的公共 ACME CA、90 天 TTL 強制自動化、跨雲跨平台 public TLS cert 的事實基礎">Let&rsquo;s Encrypt rate limit</a> 每週同 domain 50 cert / 同 account 300 pending — 反覆建錯 Certificate 重 issue 會撞、staging environment 用 <code>letsencrypt-staging</code> issuer 測過再上 prod</li>
<li><strong>ClusterIssuer 被 application team 誤改</strong>：沒設 RBAC、任何 namespace 都能 patch ClusterIssuer — 用 admission policy 鎖 ClusterIssuer 變更權給 platform team</li>
<li><strong>Approval Policy 缺失</strong>：任何 namespace 能建 wildcard cert、internal compromise 拿到 K8s API token 就能 issue 假冒 cert — 上 CertificateRequest Approval Policy + Kyverno / OPA rule</li>
</ul>
<h2 id="何時改走其他服務">何時改走其他服務</h2>
<table>
  <thead>
      <tr>
          <th>需求形狀</th>
          <th>改走</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>AWS-only serving cert（ELB / CloudFront）</td>
          <td><a href="/blog/backend/07-security-data-protection/vendors/aws-acm/" data-link-title="AWS ACM" data-link-desc="AWS-managed certificate provisioning、DNS validation &#43; auto-renewal、整合 ELB / CloudFront / API Gateway、Private CA 後端">AWS ACM</a></td>
      </tr>
      <tr>
          <td>非 K8s 環境（VM、bare-metal）的 ACME cert</td>
          <td>certbot / acme.sh / <a href="/blog/backend/07-security-data-protection/vendors/letsencrypt/" data-link-title="Let&#39;s Encrypt" data-link-desc="免費 &#43; 自動化的公共 ACME CA、90 天 TTL 強制自動化、跨雲跨平台 public TLS cert 的事實基礎">Let&rsquo;s Encrypt</a> 直接用</td>
      </tr>
      <tr>
          <td>Workload identity（不是 DNS-named cert）</td>
          <td><a href="/blog/backend/07-security-data-protection/vendors/spire/" data-link-title="SPIRE" data-link-desc="SPIFFE Runtime Environment、attested workload identity、short-lived SVID &#43; Trust Bundle、跨組織 federation">SPIRE</a>（SPIFFE-based）</td>
      </tr>
      <tr>
          <td>大量短期 internal cert + 完整 PKI 治理</td>
          <td><a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">Vault PKI engine</a>（可配 cert-manager 為 client）</td>
      </tr>
      <tr>
          <td>公司既有 enterprise CA（Venafi / DigiCert）</td>
          <td>cert-manager + Venafi issuer / 商用 issuer plugin</td>
      </tr>
      <tr>
          <td>全公司 cert rotation 證據鏈</td>
          <td><a href="/blog/backend/07-security-data-protection/credential-rotation-scoped-evidence/" data-link-title="7.27 Credential Rotation with Scoped Evidence 實作示範" data-link-desc="以 webhook/API credential 輪替示範 scope map、證據欄位與回退窗口如何一起設計。">7.5 Credential Rotation Scoped Evidence</a></td>
      </tr>
  </tbody>
</table>
<h2 id="不在本頁內的主題">不在本頁內的主題</h2>
<ul>
<li>cert-manager Helm chart 的所有 value 細節跟版本相容性矩陣</li>
<li>每個 issuer backend 的完整 schema（acme / vault / venafi / ca / selfSigned）</li>
<li>Gateway API 跟 Ingress API 的 cert-manager annotation 完整對照</li>
<li>ACME RFC 8555 protocol 細節（HTTP01 / DNS01 / TLS-ALPN-01 challenge mechanism）</li>
<li>trust-manager 的 Bundle source 種類（inMemory / secret / configMap / defaultPackage）</li>
</ul>
<h2 id="案例回寫">案例回寫</h2>
<p>cert-manager 在 07 案例庫沒有直接 vendor-level 事件、以下案例採對照引用：</p>
<table>
  <thead>
      <tr>
          <th>案例</th>
          <th>跟 cert-manager 的關係（對照）</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/backend/07-security-data-protection/transport-trust-and-certificate-lifecycle/" data-link-title="7.5 傳輸信任與憑證生命週期" data-link-desc="以問題驅動方式整理傳輸信任鏈、會話完整性與憑證節奏">Transport Trust and Certificate Lifecycle (section)</a></td>
          <td>cert-manager 是 cert lifecycle automation 的具體實作 — auto-renewal + Challenge solver + Approval Policy 是 lifecycle 治理三層機制</td>
      </tr>
      <tr>
          <td><a href="/blog/backend/07-security-data-protection/credential-rotation-scoped-evidence/" data-link-title="7.27 Credential Rotation with Scoped Evidence 實作示範" data-link-desc="以 webhook/API credential 輪替示範 scope map、證據欄位與回退窗口如何一起設計。">Credential Rotation Scoped Evidence (section)</a></td>
          <td>cert-manager 的 renewal 自動但 <em>revocation 流程不自動</em> — 舊 cert 失效後 fleet 層級 trust bundle update 是另一條 chain、走 trust-manager</td>
      </tr>
      <tr>
          <td><a href="/blog/backend/07-security-data-protection/red-team/cases/edge-exposure/citrix-bleed-2023-session-hijack/" data-link-title="7.R7.3.3 Citrix Bleed 2023：會話被劫持與重放風險" data-link-desc="邊界設備會話資料外洩後，如何演變成帳號與服務風險">Citrix Bleed 2023 Session Hijack</a></td>
          <td>對照啟示 — cert 更新後 session 仍可能延續、cert-manager 只管 cert lifecycle、session invalidation 是另一層責任、不要把 cert rotation 當 session 失效手段</td>
      </tr>
  </tbody>
</table>
<h2 id="下一步路由">下一步路由</h2>
<ul>
<li>上游：<a href="/blog/backend/07-security-data-protection/secrets-and-machine-credential-governance/" data-link-title="7.6 秘密管理與機器憑證治理" data-link-desc="以問題驅動方式整理 secret、token、key 與機器身份治理">7.6 秘密管理與機器憑證治理</a>、<a href="/blog/backend/07-security-data-protection/transport-trust-and-certificate-lifecycle/" data-link-title="7.5 傳輸信任與憑證生命週期" data-link-desc="以問題驅動方式整理傳輸信任鏈、會話完整性與憑證節奏">Transport Trust and Certificate Lifecycle</a></li>
<li>平行：<a href="/blog/backend/07-security-data-protection/vendors/letsencrypt/" data-link-title="Let&#39;s Encrypt" data-link-desc="免費 &#43; 自動化的公共 ACME CA、90 天 TTL 強制自動化、跨雲跨平台 public TLS cert 的事實基礎">Let&rsquo;s Encrypt</a>（ACME issuer）、<a href="/blog/backend/07-security-data-protection/vendors/aws-acm/" data-link-title="AWS ACM" data-link-desc="AWS-managed certificate provisioning、DNS validation &#43; auto-renewal、整合 ELB / CloudFront / API Gateway、Private CA 後端">AWS ACM</a>（AWS-managed cert）、<a href="/blog/backend/07-security-data-protection/vendors/spire/" data-link-title="SPIRE" data-link-desc="SPIFFE Runtime Environment、attested workload identity、short-lived SVID &#43; Trust Bundle、跨組織 federation">SPIRE</a>（workload identity）</li>
<li>下游：<a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault</a>（PKI engine 作為 issuer backend）</li>
<li>跨模組：<a href="/blog/backend/08-incident-response/vendors/" data-link-title="事故處理 Vendor 清單" data-link-desc="規劃 on-call、incident response、status page 與 postmortem 工具的服務頁撰寫順序與判準">8 事故處理 vendor 清單</a>（cert 過期 / mis-issue 事件如何 routing）</li>
<li>官方：<a href="https://cert-manager.io/docs/">cert-manager Documentation</a></li>
</ul>
]]></content:encoded></item><item><title>AWS ACM</title><link>https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/aws-acm/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/aws-acm/</guid><description>&lt;p>AWS Certificate Manager (ACM) 是 AWS-managed 的 &lt;em>certificate provisioning 服務&lt;/em>、解決兩件事：&lt;em>public TLS cert 全自動化&lt;/em>（Amazon Trust Services 簽發、DNS validation 通過後 60 天前自動 renew）跟 &lt;em>AWS-managed service 的 cert 整合&lt;/em>（&lt;a href="https://docs.aws.amazon.com/acm/latest/userguide/acm-services.html">ELB / CloudFront / API Gateway / App Runner&lt;/a> 直接 attach、不需要客戶持有私鑰）。內部 mTLS / 自管 endpoint 的 private cert 走另一個產品 ACM Private CA（PCA）— ACM 是 &lt;em>frontend&lt;/em>、PCA 是 &lt;em>自管 CA hierarchy backend&lt;/em>。&lt;/p>
&lt;h2 id="服務定位">服務定位&lt;/h2>
&lt;p>ACM 的核心定位是 &lt;em>AWS 平台內 cert 的全託管 lifecycle&lt;/em>。客戶不持私鑰、不跑 ACME client、不手動 renew — 但代價是 ACM public cert &lt;em>只能 attach 到 AWS-managed service&lt;/em>（ELB / CloudFront / API Gateway / App Runner / Nitro Enclaves）、不能 export 給自管 Nginx / EC2 應用。Private cert 必須有 ACM Private CA (PCA) 後端、ACM 自己不是 CA。&lt;/p>
&lt;p>跟其他 cert 工具的場景重疊度低、定位是分工互補：&lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/cert-manager/" data-link-title="cert-manager" data-link-desc="K8s 原生 certificate lifecycle automation、支援 Let&amp;#39;s Encrypt / Vault PKI / Venafi 等多 issuer、auto-renewal &amp;#43; Challenge solver">cert-manager&lt;/a> 走 cluster 內 K8s workload cert（Ingress / service mesh）、&lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/letsencrypt/" data-link-title="Let&amp;#39;s Encrypt" data-link-desc="免費 &amp;#43; 自動化的公共 ACME CA、90 天 TTL 強制自動化、跨雲跨平台 public TLS cert 的事實基礎">Let&amp;rsquo;s Encrypt&lt;/a> 走跨平台公共 ACME cert（可 export 任何地方使用）、ACM Private CA 走自管 CA hierarchy（root + intermediate、客戶控制 policy）。常見組合：AWS-native endpoint 用 ACM、K8s workload + 自管伺服器走 cert-manager + Let&amp;rsquo;s Encrypt、內部 mTLS root 走 PCA。詳細差異見「核心取捨表」。&lt;/p>
&lt;h2 id="本章目標">本章目標&lt;/h2>
&lt;p>讀完本頁、讀者能判斷：&lt;/p>
&lt;ol>
&lt;li>ACM public cert vs private cert vs imported cert 各自的使用邊界（能 attach 哪些 service、能不能 export）&lt;/li>
&lt;li>DNS validation vs Email validation 的差異、跟 auto-renewal 條件的關聯&lt;/li>
&lt;li>跨 region 跟 CloudFront 的 us-east-1 限制如何處理&lt;/li>
&lt;li>何時 ACM 不夠用、要改走 cert-manager / Let&amp;rsquo;s Encrypt / ACM Private CA&lt;/li>
&lt;/ol>
&lt;h2 id="最短判讀路徑">最短判讀路徑&lt;/h2>
&lt;p>判斷 ACM cert 部署是否健康、最少看四件事：&lt;/p></description><content:encoded><![CDATA[<p>AWS Certificate Manager (ACM) 是 AWS-managed 的 <em>certificate provisioning 服務</em>、解決兩件事：<em>public TLS cert 全自動化</em>（Amazon Trust Services 簽發、DNS validation 通過後 60 天前自動 renew）跟 <em>AWS-managed service 的 cert 整合</em>（<a href="https://docs.aws.amazon.com/acm/latest/userguide/acm-services.html">ELB / CloudFront / API Gateway / App Runner</a> 直接 attach、不需要客戶持有私鑰）。內部 mTLS / 自管 endpoint 的 private cert 走另一個產品 ACM Private CA（PCA）— ACM 是 <em>frontend</em>、PCA 是 <em>自管 CA hierarchy backend</em>。</p>
<h2 id="服務定位">服務定位</h2>
<p>ACM 的核心定位是 <em>AWS 平台內 cert 的全託管 lifecycle</em>。客戶不持私鑰、不跑 ACME client、不手動 renew — 但代價是 ACM public cert <em>只能 attach 到 AWS-managed service</em>（ELB / CloudFront / API Gateway / App Runner / Nitro Enclaves）、不能 export 給自管 Nginx / EC2 應用。Private cert 必須有 ACM Private CA (PCA) 後端、ACM 自己不是 CA。</p>
<p>跟其他 cert 工具的場景重疊度低、定位是分工互補：<a href="/blog/backend/07-security-data-protection/vendors/cert-manager/" data-link-title="cert-manager" data-link-desc="K8s 原生 certificate lifecycle automation、支援 Let&#39;s Encrypt / Vault PKI / Venafi 等多 issuer、auto-renewal &#43; Challenge solver">cert-manager</a> 走 cluster 內 K8s workload cert（Ingress / service mesh）、<a href="/blog/backend/07-security-data-protection/vendors/letsencrypt/" data-link-title="Let&#39;s Encrypt" data-link-desc="免費 &#43; 自動化的公共 ACME CA、90 天 TTL 強制自動化、跨雲跨平台 public TLS cert 的事實基礎">Let&rsquo;s Encrypt</a> 走跨平台公共 ACME cert（可 export 任何地方使用）、ACM Private CA 走自管 CA hierarchy（root + intermediate、客戶控制 policy）。常見組合：AWS-native endpoint 用 ACM、K8s workload + 自管伺服器走 cert-manager + Let&rsquo;s Encrypt、內部 mTLS root 走 PCA。詳細差異見「核心取捨表」。</p>
<h2 id="本章目標">本章目標</h2>
<p>讀完本頁、讀者能判斷：</p>
<ol>
<li>ACM public cert vs private cert vs imported cert 各自的使用邊界（能 attach 哪些 service、能不能 export）</li>
<li>DNS validation vs Email validation 的差異、跟 auto-renewal 條件的關聯</li>
<li>跨 region 跟 CloudFront 的 us-east-1 限制如何處理</li>
<li>何時 ACM 不夠用、要改走 cert-manager / Let&rsquo;s Encrypt / ACM Private CA</li>
</ol>
<h2 id="最短判讀路徑">最短判讀路徑</h2>
<p>判斷 ACM cert 部署是否健康、最少看四件事：</p>
<ul>
<li><strong>Cert 跟 service 整合</strong>：cert ARN 是否真的 attach 到 ELB / CloudFront / API Gateway listener、<code>DescribeCertificate</code> 的 <code>InUseBy</code> 有沒有資源、有 cert 但沒 attach 等於 issue 失敗</li>
<li><strong>DNS validation 設定</strong>：cert 是 DNS 還是 Email validation、DNS 的 CNAME record 是否還留在 DNS（auto-renewal 需要這條 record 持續存在）、Route53 vs 外部 DNS 的責任分界</li>
<li><strong>Renewal status</strong>：<code>DescribeCertificate</code> 的 <code>RenewalSummary.RenewalStatus</code> 是 <code>SUCCESS</code> / <code>PENDING_AUTO_RENEWAL</code> / <code>FAILED</code>、失敗時 <code>RenewalStatusReason</code> 是什麼（多半是 DNS record 被刪、CNAME 不再回應）</li>
<li><strong>CloudTrail 證據</strong>：<code>RequestCertificate</code> / <code>ImportCertificate</code> / <code>DeleteCertificate</code> 的 caller identity、是否有非預期的 cert 建立或刪除（防誤刪 / 惡意刪）</li>
</ul>
<p>四件事任一缺失、就是 <a href="/blog/backend/07-security-data-protection/transport-trust-and-certificate-lifecycle/" data-link-title="7.5 傳輸信任與憑證生命週期" data-link-desc="以問題驅動方式整理傳輸信任鏈、會話完整性與憑證節奏">Transport Trust and Certificate Lifecycle</a> 的覆蓋缺口。</p>
<h2 id="日常操作與決策形狀">日常操作與決策形狀</h2>
<p><strong>Request public cert</strong>：對 internet-facing endpoint（網站、API）issue public cert、走 <code>RequestCertificate</code> API、選 DNS validation。ACM 給一組 CNAME record、放進 DNS（Route53 可一鍵 create）、ACM 自動驗證 + issue。Cert 生效後 attach 到 ELB / CloudFront / API Gateway listener。Issuer 是 Amazon Trust Services、所有主流瀏覽器 / OS trust store 都認。</p>
<p><strong>Request private cert（需 PCA 後端）</strong>：內部 service mTLS root、走 <code>RequestCertificate</code> 但指定 PCA ARN。ACM 透過 PCA 簽 cert、cert chain 是組織內部 CA hierarchy。Trust store 必須在各 workload 手動建立（不像 public cert 自動 trust）。</p>
<p><strong>DNS validation vs Email validation</strong>：DNS validation 是預設 + 推薦 — CNAME record 放進 DNS 後、ACM 持續驗證 domain ownership、auto-renewal 全自動。Email validation 是 legacy、ACM 寄信到 domain 的 WHOIS / 預設 admin email、人工點連結驗證；auto-renewal 不會自動完成、cert 到期前必須手動 re-validate。Production 一律用 DNS validation。</p>
<p><strong>Auto-renewal 條件</strong>：ACM 在 cert lifetime 60 天前嘗試 renew、條件嚴格：(1) cert 是 ACM-issued（不是 imported）(2) DNS validation 走 CNAME record 仍存在且可回應 (3) cert 至少 attach 到一個 AWS service。三個條件任一不滿足、renewal 不自動觸發、cert 會 expire。Imported cert <em>完全不自動 renew</em>、必須在 expiry 前手動 re-import。</p>
<p><strong>跟 ELB / CloudFront / API Gateway 整合</strong>：ELB / API Gateway 用所在 region 的 ACM cert、CloudFront 例外 — <em>只認 us-east-1 region 的 ACM cert</em>（CloudFront edge 是 global、cert metadata 統一從 us-east-1 拉）。Multi-region app 要在每個 region 各 request 一份 cert、CloudFront 那份固定放 us-east-1。</p>
<p><strong>Imported certificate</strong>：自管 cert（外部 CA 簽的、舊系統遷移過來的）可以 import 進 ACM、拿到 ARN 後一樣 attach 到 AWS service。代價是 <em>ACM 不會 renew</em>、expiry 前必須手動 re-import 新版。常見事故源：imported cert 過期、AWS service 突然 serve expired cert、Browser 顯示警告。建議 imported cert 都設 CloudWatch alarm 監 <code>DaysToExpiry</code>。</p>
<p><strong>跟 <a href="/blog/backend/07-security-data-protection/vendors/aws-iam/" data-link-title="AWS IAM" data-link-desc="AWS cloud resource permission engine、Role / Policy / STS、跨帳號信任邊界與 OIDC federation 的核心">AWS IAM</a> 整合</strong>：誰能 issue / delete cert 走 IAM policy 控制 — <code>acm:RequestCertificate</code> / <code>acm:DeleteCertificate</code> / <code>acm:ImportCertificate</code>。Tag-based access control 可以限定「只有帶 <code>team=platform</code> tag 的 cert 才能被 platform team IAM role 改」、防誤刪 production cert。Cert 是 region-scoped resource、IAM policy 可指定 <code>Resource</code> ARN 限定 region / cert ID。</p>
<h2 id="核心取捨表">核心取捨表</h2>
<table>
  <thead>
      <tr>
          <th>取捨維度</th>
          <th>ACM (public)</th>
          <th>ACM Private CA (PCA)</th>
          <th>cert-manager + Let&rsquo;s Encrypt</th>
          <th>手動 OpenSSL CA</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>部署模型</td>
          <td>AWS managed</td>
          <td>AWS managed CA hierarchy</td>
          <td>K8s cluster 內 self-hosted controller</td>
          <td>手動腳本</td>
      </tr>
      <tr>
          <td>私鑰持有</td>
          <td>AWS 持有、客戶不能 export</td>
          <td>AWS 持有 CA key、subordinate 可 export</td>
          <td>cluster 內 Secret、可 export</td>
          <td>自己持有</td>
      </tr>
      <tr>
          <td>Issuer</td>
          <td>Amazon Trust Services（public trust store）</td>
          <td>客戶自管 CA（內部 trust）</td>
          <td>Let&rsquo;s Encrypt / 任何 ACME CA</td>
          <td>自簽</td>
      </tr>
      <tr>
          <td>適用 endpoint</td>
          <td>AWS-managed service（ELB / CloudFront / API GW）</td>
          <td>內部 mTLS、AWS service 也可用</td>
          <td>K8s workload、Ingress、任何持有 PEM 的服務</td>
          <td>實驗 / 內部小規模</td>
      </tr>
      <tr>
          <td>Auto-renewal</td>
          <td>DNS validation 全自動</td>
          <td>透過 ACM 自動</td>
          <td>cert-manager 自動</td>
          <td>自己寫 cron</td>
      </tr>
      <tr>
          <td>跨雲 / 跨平台</td>
          <td>弱 — AWS 內</td>
          <td>弱 — AWS 內</td>
          <td>強 — K8s 在哪都可</td>
          <td>強</td>
      </tr>
      <tr>
          <td>計費</td>
          <td>public cert 免費</td>
          <td>per CA + per cert（PCA 較貴）</td>
          <td>免費（Let&rsquo;s Encrypt）</td>
          <td>免費</td>
      </tr>
      <tr>
          <td>適合場景</td>
          <td>AWS-heavy + edge endpoint</td>
          <td>內部 mTLS root + AWS 整合</td>
          <td>K8s workload + 跨雲</td>
          <td>實驗、極小規模</td>
      </tr>
      <tr>
          <td>退場成本</td>
          <td>中 — cert 重 issue 但 service 配置要改</td>
          <td>高 — CA hierarchy 遷移痛苦</td>
          <td>低 — PEM 在手、換 issuer 容易</td>
          <td>低</td>
      </tr>
  </tbody>
</table>
<p>選 ACM 的核心訴求：cert 主要 attach 到 AWS-managed service、希望 cert 完全 hands-off、不需要 export 私鑰、能接受 AWS lock-in。需要 export PEM 或跨雲 / 自管 endpoint、改走 cert-manager + Let&rsquo;s Encrypt。需要內部 mTLS root + CA hierarchy 控制、走 ACM Private CA。</p>
<h2 id="進階主題">進階主題</h2>
<p><strong>ACM Private CA hierarchy</strong>：PCA 支援 root CA + 多層 intermediate CA、生產建議 root CA 離線（CA 簽完 intermediate 後 disable）、日常簽發走 subordinate CA。Subordinate CA compromise 時 revoke 該層、root 不受影響。Cert policy（path length、key usage、name constraint）在 CA 建立時設定、之後無法改、設計時要算對。</p>
<p><strong>Cross-region cert（CloudFront 的 us-east-1 限制）</strong>：CloudFront 是 global service、但 attach 的 ACM cert <em>必須在 us-east-1</em>。Multi-region 部署：每個 region 各 issue 一份 cert 給該 region 的 ELB / API Gateway、CloudFront 的那份單獨在 us-east-1 issue。Terraform / CloudFormation 要顯式宣告 provider region。</p>
<p><strong>Imported cert 跟 auto-renewal 邊界</strong>：imported cert（外部 CA 簽的）ACM 知道存在、可以 attach、但 <em>不 renew</em>。常見事故：團隊 import cert 後忘了；幾個月後 cert 到期；CloudFront / ELB serve expired cert；客戶看到 browser 警告。對策：所有 imported cert 設 CloudWatch alarm <code>DaysToExpiry &lt; 30</code>、<code>AlmostExpired</code> event 推 EventBridge → PagerDuty。長期策略是把 imported cert 都遷移成 ACM-issued cert（如果 domain ownership 可驗證）。</p>
<p><strong>Tag-based access control</strong>：cert 加 tag（<code>team=platform</code>、<code>env=prod</code>）後、IAM policy 用 <code>Condition</code> 限定：只有同 tag 的 role 才能 update / delete。防誤刪 production cert（dev IAM role 跑 cleanup script 不會誤刪 prod）。配合 <a href="/blog/backend/07-security-data-protection/vendors/aws-iam/" data-link-title="AWS IAM" data-link-desc="AWS cloud resource permission engine、Role / Policy / STS、跨帳號信任邊界與 OIDC federation 的核心">AWS IAM</a> 的 ABAC 模型運作。</p>
<p><strong>Wildcard cert 跟 SAN cert</strong>：ACM 支援 wildcard（<code>*.example.com</code> 涵蓋一層 subdomain）跟 SAN（一張 cert 多個 domain，最多 100 個）。Wildcard 簡化部署但 blast radius 大 — 一張 cert compromise 等於整個 subdomain tree 出事；SAN cert 細粒度但管理成本高。Production 建議按服務邊界拆 — 每個 service 一張 cert、不共用 wildcard，除非確實有大量短 lifecycle subdomain。</p>
<h2 id="排錯與失敗快速判讀">排錯與失敗快速判讀</h2>
<ul>
<li><strong>Cert PENDING_VALIDATION 一直卡住</strong>：DNS validation CNAME record 沒放對、或 DNS provider 緩存太久 — 用 <code>dig</code> 直接查 CNAME 是否生效、Route53 + ACM 整合通常幾分鐘、外部 DNS 可能 30 分鐘以上</li>
<li><strong>Cert renewal FAILED</strong>：<code>RenewalStatusReason</code> 多半是 <code>DOMAIN_VALIDATION_DENIED</code>（CNAME record 被刪了）或 cert 沒 attach 到任何 service — 補回 CNAME record、或把 cert attach 到至少一個 resource</li>
<li><strong>CloudFront 找不到 cert</strong>：cert 在 us-east-1 以外的 region issue — 在 us-east-1 重 issue、或用 Terraform 顯式跨 provider 設定</li>
<li><strong>Imported cert expired</strong>：忘了 manual renewal、AWS service serve expired cert — CloudWatch alarm + EventBridge 推 alert、長期遷成 ACM-issued</li>
<li><strong>ACM cert 無法用在 EC2 自管 Nginx</strong>：public cert 私鑰不能 export 是設計限制 — 改用 ACM Private CA 或 Let&rsquo;s Encrypt + cert-manager</li>
<li><strong>誤刪 production cert</strong>：沒設 tag-based protection、admin script bug — 開 deletion protection（暫時無內建、用 IAM Condition 限定 delete operation + 24h cooldown via Lambda）+ CloudTrail alert 上 <code>acm:DeleteCertificate</code></li>
<li><strong>Cross-account cert 共用</strong>：ACM cert 不支援 RAM 共用 — 跨 account 要在每個 account 各 issue（或用 PCA + RAM 共用 PCA、各 account 從 PCA issue）</li>
</ul>
<h2 id="何時改走其他服務">何時改走其他服務</h2>
<table>
  <thead>
      <tr>
          <th>需求形狀</th>
          <th>改走</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>K8s workload mTLS / Ingress TLS</td>
          <td><a href="/blog/backend/07-security-data-protection/vendors/cert-manager/" data-link-title="cert-manager" data-link-desc="K8s 原生 certificate lifecycle automation、支援 Let&#39;s Encrypt / Vault PKI / Venafi 等多 issuer、auto-renewal &#43; Challenge solver">cert-manager</a> + Let&rsquo;s Encrypt / 內部 issuer</td>
      </tr>
      <tr>
          <td>自管 Nginx / EC2 / 跨雲 endpoint</td>
          <td><a href="/blog/backend/07-security-data-protection/vendors/letsencrypt/" data-link-title="Let&#39;s Encrypt" data-link-desc="免費 &#43; 自動化的公共 ACME CA、90 天 TTL 強制自動化、跨雲跨平台 public TLS cert 的事實基礎">Let&rsquo;s Encrypt</a> + 自管 ACME client</td>
      </tr>
      <tr>
          <td>內部 mTLS root + CA hierarchy 控制</td>
          <td>ACM Private CA（PCA）或 <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault</a> PKI engine</td>
      </tr>
      <tr>
          <td>Workload identity（SPIFFE）跨平台</td>
          <td><a href="/blog/backend/07-security-data-protection/vendors/spire/" data-link-title="SPIRE" data-link-desc="SPIFFE Runtime Environment、attested workload identity、short-lived SVID &#43; Trust Bundle、跨組織 federation">SPIRE</a></td>
      </tr>
      <tr>
          <td>Cert renewal 證據鏈（rotation evidence）</td>
          <td><a href="/blog/backend/07-security-data-protection/credential-rotation-scoped-evidence/" data-link-title="7.27 Credential Rotation with Scoped Evidence 實作示範" data-link-desc="以 webhook/API credential 輪替示範 scope map、證據欄位與回退窗口如何一起設計。">7.5 Credential Rotation Scoped Evidence</a></td>
      </tr>
      <tr>
          <td>Cert + session invalidation 邊界</td>
          <td><a href="/blog/backend/07-security-data-protection/entrypoint-and-server-protection/" data-link-title="7.3 入口治理與伺服器防護" data-link-desc="以問題驅動方式整理對外入口、管理平面與伺服器邊界">7.3 入口治理</a>、cert renew 跟 session token 是兩條獨立 lifecycle</td>
      </tr>
  </tbody>
</table>
<h2 id="不在本頁內的主題">不在本頁內的主題</h2>
<ul>
<li>ACM Private CA 完整 hierarchy 設計（root CA 離線儲存、HSM-backed CA key、CRL / OCSP responder 部署）</li>
<li>ACM API 完整 CLI reference 跟 Terraform resource 詳盡欄位</li>
<li>TLS protocol 本身（TLS 1.2 vs 1.3、cipher suite、handshake 流程）</li>
<li>Certificate Transparency log 跟 SCT embedding 內部機制</li>
<li>各 browser / OS trust store 的更新週期</li>
</ul>
<h2 id="案例回寫">案例回寫</h2>
<p>ACM 在 07 案例庫沒有直接 vendor-level 事件、以下採對照引用：</p>
<table>
  <thead>
      <tr>
          <th>案例</th>
          <th>跟 ACM 的關係（對照）</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/backend/07-security-data-protection/transport-trust-and-certificate-lifecycle/" data-link-title="7.5 傳輸信任與憑證生命週期" data-link-desc="以問題驅動方式整理傳輸信任鏈、會話完整性與憑證節奏">Transport Trust and Certificate Lifecycle (section)</a></td>
          <td>ACM 是 AWS 平台 cert lifecycle 自動化的具體落地 — DNS validation + auto-renewal 是 <em>自動化覆蓋率</em> 的指標、imported cert 是覆蓋缺口、要單獨設 alarm 兜底</td>
      </tr>
      <tr>
          <td><a href="/blog/backend/07-security-data-protection/red-team/cases/edge-exposure/citrix-bleed-2023-session-hijack/" data-link-title="7.R7.3.3 Citrix Bleed 2023：會話被劫持與重放風險" data-link-desc="邊界設備會話資料外洩後，如何演變成帳號與服務風險">Citrix Bleed 2023 Session Hijack</a></td>
          <td>對照啟示 — cert 自動 renew 不等於 session 自動 invalidate、舊 session token 在新 cert 下仍可重放、session lifecycle 是另一層責任、不在 ACM 範圍</td>
      </tr>
      <tr>
          <td><a href="/blog/backend/07-security-data-protection/credential-rotation-scoped-evidence/" data-link-title="7.27 Credential Rotation with Scoped Evidence 實作示範" data-link-desc="以 webhook/API credential 輪替示範 scope map、證據欄位與回退窗口如何一起設計。">Credential Rotation Scoped Evidence (section)</a></td>
          <td>ACM renewal 自動、但 <em>Certificate Transparency log 比對</em> + <em>fleet-wide trust bundle update</em> 是另一條 evidence chain、要跟 SBOM / CMDB 對齊</td>
      </tr>
  </tbody>
</table>
<h2 id="下一步路由">下一步路由</h2>
<ul>
<li>上游：<a href="/blog/backend/07-security-data-protection/transport-trust-and-certificate-lifecycle/" data-link-title="7.5 傳輸信任與憑證生命週期" data-link-desc="以問題驅動方式整理傳輸信任鏈、會話完整性與憑證節奏">7.4 傳輸信任與憑證生命週期</a>、<a href="/blog/backend/07-security-data-protection/entrypoint-and-server-protection/" data-link-title="7.3 入口治理與伺服器防護" data-link-desc="以問題驅動方式整理對外入口、管理平面與伺服器邊界">7.3 入口治理與伺服器防護</a></li>
<li>平行：<a href="/blog/backend/07-security-data-protection/vendors/cert-manager/" data-link-title="cert-manager" data-link-desc="K8s 原生 certificate lifecycle automation、支援 Let&#39;s Encrypt / Vault PKI / Venafi 等多 issuer、auto-renewal &#43; Challenge solver">cert-manager</a>、<a href="/blog/backend/07-security-data-protection/vendors/letsencrypt/" data-link-title="Let&#39;s Encrypt" data-link-desc="免費 &#43; 自動化的公共 ACME CA、90 天 TTL 強制自動化、跨雲跨平台 public TLS cert 的事實基礎">Let&rsquo;s Encrypt</a>、<a href="/blog/backend/07-security-data-protection/vendors/spire/" data-link-title="SPIRE" data-link-desc="SPIFFE Runtime Environment、attested workload identity、short-lived SVID &#43; Trust Bundle、跨組織 federation">SPIRE</a></li>
<li>下游：<a href="/blog/backend/07-security-data-protection/vendors/aws-iam/" data-link-title="AWS IAM" data-link-desc="AWS cloud resource permission engine、Role / Policy / STS、跨帳號信任邊界與 OIDC federation 的核心">AWS IAM</a>（誰能 issue / delete cert）、<a href="/blog/backend/07-security-data-protection/vendors/aws-kms/" data-link-title="AWS KMS" data-link-desc="AWS 原生 key management service、envelope encryption / digital signing / Multi-Region Key、Key Policy &#43; Grant 雙軌授權">AWS KMS</a>（PCA CA key 後端）</li>
<li>跨模組：<a href="/blog/backend/08-incident-response/vendors/" data-link-title="事故處理 Vendor 清單" data-link-desc="規劃 on-call、incident response、status page 與 postmortem 工具的服務頁撰寫順序與判準">8 事故處理 vendor 清單</a>（cert expiry / mis-issuance 進 IR 流程）</li>
<li>官方：<a href="https://docs.aws.amazon.com/acm/">AWS Certificate Manager Documentation</a></li>
</ul>
]]></content:encoded></item><item><title>Let's Encrypt</title><link>https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/letsencrypt/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/letsencrypt/</guid><description>&lt;p>Let&amp;rsquo;s Encrypt 是免費 + 自動化的公共 ACME CA（Certificate Authority）、由 Internet Security Research Group (ISRG) 營運、簽發 DV（Domain Validation）等級的 public TLS cert。它的核心設計選擇是 &lt;em>只發 90 天 TTL 的 cert + 完全自動化的 ACME protocol&lt;/em>、把人工管理選項從工程實務中拿掉、強迫 cert lifecycle 走機器化路線。今天大多數 public-facing web service 的 TLS cert 都直接或間接從 Let&amp;rsquo;s Encrypt 來、是現代 Web 的事實基礎設施之一。&lt;/p>
&lt;h2 id="服務定位">服務定位&lt;/h2>
&lt;p>Let&amp;rsquo;s Encrypt 的角色是 &lt;em>跨雲、跨平台、跨組織規模&lt;/em> 的公共 DV cert 來源。對於需要 public TLS cert 又不被特定雲廠綁定的場景（on-prem、edge node、跨雲 service、自架 CDN origin、開源專案）、Let&amp;rsquo;s Encrypt 是預設選項。它解決的問題不是「能不能拿到 cert」、而是「能不能 &lt;em>無人值守&lt;/em> 持續拿到 cert」— ACME protocol 把申請、驗證、issue、renew、revoke 全部標準化、ACME client（&lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/cert-manager/" data-link-title="cert-manager" data-link-desc="K8s 原生 certificate lifecycle automation、支援 Let&amp;#39;s Encrypt / Vault PKI / Venafi 等多 issuer、auto-renewal &amp;#43; Challenge solver">cert-manager&lt;/a> / certbot / acme.sh / Caddy / Traefik）負責 client 端執行。&lt;/p>
&lt;p>跟 &lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/aws-acm/" data-link-title="AWS ACM" data-link-desc="AWS-managed certificate provisioning、DNS validation &amp;#43; auto-renewal、整合 ELB / CloudFront / API Gateway、Private CA 後端">AWS ACM&lt;/a> 比、Let&amp;rsquo;s Encrypt 跨雲跨平台、ACM 限 AWS-managed service（ALB / CloudFront / API Gateway）內使用、export 出去要另談；ACM Private CA 又是另一個產品。跟商業 CA（DigiCert / Sectigo / Entrust）比、商業 CA 提供 OV（Organization Validation）/ EV（Extended Validation）cert、cert 內含經過驗證的組織資訊、金融網站或法遵需求會用；Let&amp;rsquo;s Encrypt 只發 DV cert、不驗證組織身份。跟 &lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault PKI&lt;/a> 比、Vault PKI 是 &lt;em>internal CA&lt;/em>（不被公共瀏覽器信任、適合 internal mTLS / workload identity）、Let&amp;rsquo;s Encrypt 是 &lt;em>public CA&lt;/em>（瀏覽器信任、適合 public-facing service）— 兩個是互補關係、不是替代。&lt;/p>
&lt;h2 id="本章目標">本章目標&lt;/h2>
&lt;p>讀完本頁、讀者能判斷：&lt;/p>
&lt;ol>
&lt;li>哪些 cert 需求適合 Let&amp;rsquo;s Encrypt（public-facing、DV、跨平台）、哪些該走 ACM / 商業 CA / Vault PKI&lt;/li>
&lt;li>ACME protocol 的四個 first-class concept（Account / Order / Authorization / Challenge）跟自己選的 ACME client 怎麼對應&lt;/li>
&lt;li>Rate limit 是 &lt;em>硬限制&lt;/em>、SaaS 多 tenant 場景如何規劃（wildcard / SAN / rate limit exemption）&lt;/li>
&lt;li>90 天 TTL + CT log 公開 + revocation 弱化 在 production 設計上的影響&lt;/li>
&lt;/ol>
&lt;h2 id="最短判讀路徑">最短判讀路徑&lt;/h2>
&lt;p>判斷 Let&amp;rsquo;s Encrypt 使用是否健康、最少看四件事：&lt;/p></description><content:encoded><![CDATA[<p>Let&rsquo;s Encrypt 是免費 + 自動化的公共 ACME CA（Certificate Authority）、由 Internet Security Research Group (ISRG) 營運、簽發 DV（Domain Validation）等級的 public TLS cert。它的核心設計選擇是 <em>只發 90 天 TTL 的 cert + 完全自動化的 ACME protocol</em>、把人工管理選項從工程實務中拿掉、強迫 cert lifecycle 走機器化路線。今天大多數 public-facing web service 的 TLS cert 都直接或間接從 Let&rsquo;s Encrypt 來、是現代 Web 的事實基礎設施之一。</p>
<h2 id="服務定位">服務定位</h2>
<p>Let&rsquo;s Encrypt 的角色是 <em>跨雲、跨平台、跨組織規模</em> 的公共 DV cert 來源。對於需要 public TLS cert 又不被特定雲廠綁定的場景（on-prem、edge node、跨雲 service、自架 CDN origin、開源專案）、Let&rsquo;s Encrypt 是預設選項。它解決的問題不是「能不能拿到 cert」、而是「能不能 <em>無人值守</em> 持續拿到 cert」— ACME protocol 把申請、驗證、issue、renew、revoke 全部標準化、ACME client（<a href="/blog/backend/07-security-data-protection/vendors/cert-manager/" data-link-title="cert-manager" data-link-desc="K8s 原生 certificate lifecycle automation、支援 Let&#39;s Encrypt / Vault PKI / Venafi 等多 issuer、auto-renewal &#43; Challenge solver">cert-manager</a> / certbot / acme.sh / Caddy / Traefik）負責 client 端執行。</p>
<p>跟 <a href="/blog/backend/07-security-data-protection/vendors/aws-acm/" data-link-title="AWS ACM" data-link-desc="AWS-managed certificate provisioning、DNS validation &#43; auto-renewal、整合 ELB / CloudFront / API Gateway、Private CA 後端">AWS ACM</a> 比、Let&rsquo;s Encrypt 跨雲跨平台、ACM 限 AWS-managed service（ALB / CloudFront / API Gateway）內使用、export 出去要另談；ACM Private CA 又是另一個產品。跟商業 CA（DigiCert / Sectigo / Entrust）比、商業 CA 提供 OV（Organization Validation）/ EV（Extended Validation）cert、cert 內含經過驗證的組織資訊、金融網站或法遵需求會用；Let&rsquo;s Encrypt 只發 DV cert、不驗證組織身份。跟 <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault PKI</a> 比、Vault PKI 是 <em>internal CA</em>（不被公共瀏覽器信任、適合 internal mTLS / workload identity）、Let&rsquo;s Encrypt 是 <em>public CA</em>（瀏覽器信任、適合 public-facing service）— 兩個是互補關係、不是替代。</p>
<h2 id="本章目標">本章目標</h2>
<p>讀完本頁、讀者能判斷：</p>
<ol>
<li>哪些 cert 需求適合 Let&rsquo;s Encrypt（public-facing、DV、跨平台）、哪些該走 ACM / 商業 CA / Vault PKI</li>
<li>ACME protocol 的四個 first-class concept（Account / Order / Authorization / Challenge）跟自己選的 ACME client 怎麼對應</li>
<li>Rate limit 是 <em>硬限制</em>、SaaS 多 tenant 場景如何規劃（wildcard / SAN / rate limit exemption）</li>
<li>90 天 TTL + CT log 公開 + revocation 弱化 在 production 設計上的影響</li>
</ol>
<h2 id="最短判讀路徑">最短判讀路徑</h2>
<p>判斷 Let&rsquo;s Encrypt 使用是否健康、最少看四件事：</p>
<ul>
<li><strong>Account 管理</strong>：ACME account 是 <em>cross-domain</em> 的身份、同一個 account 可以申請組織所有 domain 的 cert — account key 外洩等於 attacker 可以對所有 domain 發 cert；account key 是否離線備份、是否跟 ACME client 用獨立 key（不重用 server key）</li>
<li><strong>Challenge 選擇</strong>：HTTP-01 需要 port 80 reachable、適合單機 + 直接 internet 暴露；DNS-01 需要 DNS API access、適合 wildcard + 私有環境；TLS-ALPN-01 走 443、適合 port 80 不可用的場景 — Challenge 選錯會卡在 validation 階段</li>
<li><strong>Rate limit 規劃</strong>：50 cert/week per registered domain、5 duplicate cert/week — 大型 SaaS 服務多 customer subdomain 容易撞牆、要先估 cert 量、再決定 wildcard / SAN / rate limit 申請</li>
<li><strong>Revocation 流程</strong>：cert 被洩漏怎麼辦 — revoke 不是 fleet-wide invalidation、real-world 失效靠 <em>rotate + 短 TTL</em>；revocation 程序是否寫入 runbook、舊 cert 是否在所有 endpoint 確實 retire</li>
</ul>
<p>四件事任一缺失、就是 <a href="/blog/backend/07-security-data-protection/transport-trust-and-certificate-lifecycle/" data-link-title="7.5 傳輸信任與憑證生命週期" data-link-desc="以問題驅動方式整理傳輸信任鏈、會話完整性與憑證節奏">Transport Trust and Certificate Lifecycle</a> 跟 <a href="/blog/backend/07-security-data-protection/credential-rotation-scoped-evidence/" data-link-title="7.27 Credential Rotation with Scoped Evidence 實作示範" data-link-desc="以 webhook/API credential 輪替示範 scope map、證據欄位與回退窗口如何一起設計。">Credential Rotation Scoped Evidence</a> 邊界的待補項目。</p>
<h2 id="日常操作與決策形狀">日常操作與決策形狀</h2>
<p><strong>ACME client 選擇</strong>：<a href="/blog/backend/07-security-data-protection/vendors/cert-manager/" data-link-title="cert-manager" data-link-desc="K8s 原生 certificate lifecycle automation、支援 Let&#39;s Encrypt / Vault PKI / Venafi 等多 issuer、auto-renewal &#43; Challenge solver">cert-manager</a> 適合 K8s 環境、Ingress / Gateway / Certificate CRD 自動化；certbot 適合單機 / VM、官方參考實作；acme.sh 是 pure shell、嵌入既有 deployment script 容易；Caddy / Traefik 把 ACME 內建進 reverse proxy、零設定拿 cert。client 端的選擇決定 <em>cert 怎麼存、怎麼 deploy 到 termination point</em>、Let&rsquo;s Encrypt 自己不管這層。</p>
<p><strong>ACME Account（cross-domain identity）</strong>：Account 是 ACME server 認可的身份、用一把 account key（不同於 cert private key）簽 ACME request。同一個 account 可以申請 <em>組織所有 domain</em> 的 cert — 安全意義是 account key 外洩 = attacker 對所有 domain 都能 issue cert。Production 場景把 account key 視為跟 root signing key 同等級的 secret、離線備份、跟日常 ACME client 用獨立 key。</p>
<p><strong>Challenge 選擇 — HTTP-01 / DNS-01 / TLS-ALPN-01</strong>：HTTP-01 在 <code>/.well-known/acme-challenge/&lt;token&gt;</code> 放 response、Let&rsquo;s Encrypt 從 port 80 拉、適合單機 + 直接 internet 暴露；DNS-01 在 <code>_acme-challenge.&lt;domain&gt;</code> 放 TXT record、適合 wildcard cert（<code>*.example.com</code> 必須 DNS-01、HTTP-01 不行）跟私有環境（不需要 port 80 開放）；TLS-ALPN-01 走 port 443、用 special ALPN extension 回 challenge、適合 port 80 被擋的場景。Wildcard cert 強制 DNS-01 是 Let&rsquo;s Encrypt 政策、不能用 HTTP-01 繞過。</p>
<p><strong>Rate limit 是硬限制</strong>：50 cert/week per registered domain（包含 SAN 在內）、5 duplicate cert/week（同樣 SAN 組合）、300 new orders/3 hours per account、5 failed validation/hour。大型 SaaS 對 N 個 customer subdomain 發 cert 容易撞牆 — 解法有三：用 wildcard cert 把多 subdomain 合一張（單張 cert 服務無限 subdomain）、用 SAN cert 把多個 subdomain 寫進同一張 cert、申請 rate limit 上限提高（<a href="https://isrg.formstack.com/forms/rate_limit_adjustment_request">官方表單</a>）。撞 rate limit 後該 domain 整個 week 不能發新 cert、是 production outage 等級。</p>
<p><strong>Staging environment 必用於測試</strong>：<code>acme-staging-v02.api.letsencrypt.org</code> 是 Let&rsquo;s Encrypt 的測試 endpoint、cert 不被瀏覽器信任、但 <em>rate limit 寬鬆很多</em>（30000 cert/week / 60 duplicate cert/week）。debug ACME client 設定、新 deploy pipeline、CI 跑 cert renewal test 都應該先指 staging、確認 OK 再切 production endpoint。直接在 production 試錯撞 rate limit 是常見事故。</p>
<p><strong>90 天 TTL + 60 天 renew cadence</strong>：Let&rsquo;s Encrypt cert 固定 90 天 TTL、ACME client convention 是 <em>過 60 天就開始 renew</em>、留 30 天 buffer 給 retry。90 天是 <em>設計選擇</em>、不是技術限制 — 短 TTL 強迫自動化、把「過期前手動處理」這個失敗模式從設計中拿掉。如果你的 cert renewal 還需要人介入、表示 ACME client / deployment pipeline / monitoring 哪邊沒做好、要在 60 天 buffer 內修。</p>
<p><strong>CT log 公開可查</strong>：Let&rsquo;s Encrypt cert 都會進 Certificate Transparency log（CT log）、可以用 <a href="https://crt.sh">crt.sh</a> 查任何 domain 的歷史 cert。對 production 意義有兩面：blue team 可以監控自家 domain 的 unexpected cert（attacker 用相似 domain 釣魚會留痕）；red team 可以查 target 公司新出現的 internal hostname（cert 上的 SAN 等於公開的 service inventory）。對 <em>internal-only</em> hostname、不要用 Let&rsquo;s Encrypt cert、否則 SAN 變成 recon 資料源 — 內部服務走 Vault PKI / 私有 CA。</p>
<h2 id="核心取捨表">核心取捨表</h2>
<table>
  <thead>
      <tr>
          <th>取捨維度</th>
          <th>Let&rsquo;s Encrypt</th>
          <th>AWS ACM</th>
          <th>商業 CA（DigiCert / Sectigo）</th>
          <th>Vault PKI（internal CA）</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>信任範圍</td>
          <td>Public（公共瀏覽器信任）</td>
          <td>Public（公共瀏覽器信任）</td>
          <td>Public（公共瀏覽器信任）</td>
          <td>Internal（需要客戶端裝 CA cert）</td>
      </tr>
      <tr>
          <td>部署範圍</td>
          <td>跨雲、跨平台、on-prem</td>
          <td>限 AWS-managed service（ALB / CF / APIGW）</td>
          <td>跨雲、跨平台</td>
          <td>自管、跨雲皆可</td>
      </tr>
      <tr>
          <td>Cert 等級</td>
          <td>DV（Domain Validation）</td>
          <td>DV（ACM）/ Private CA 任意</td>
          <td>DV / OV / EV</td>
          <td>自定義（內部信任）</td>
      </tr>
      <tr>
          <td>費用</td>
          <td>免費</td>
          <td>免費（ACM public）/ Private CA 收費</td>
          <td>收費（DV / OV / EV 各價位）</td>
          <td>自管成本</td>
      </tr>
      <tr>
          <td>自動化</td>
          <td>ACME protocol 標準化</td>
          <td>ACM 自動 renew（限 AWS-managed service）</td>
          <td>多數需手動 / API 申請、自動化弱</td>
          <td>自管 + ACME server 可選</td>
      </tr>
      <tr>
          <td>TTL</td>
          <td>90 天（硬性）</td>
          <td>13 個月（AWS rotate）</td>
          <td>1-2 年</td>
          <td>自訂</td>
      </tr>
      <tr>
          <td>適合場景</td>
          <td>public-facing、跨雲、open source、SaaS</td>
          <td>AWS-only + ALB/CloudFront 內</td>
          <td>金融、政府、需要 EV 顯示組織</td>
          <td>internal mTLS、workload identity、企業內部 service</td>
      </tr>
      <tr>
          <td>不適合場景</td>
          <td>internal mTLS、EV cert、cert 內需含組織</td>
          <td>跨雲、export 出 AWS</td>
          <td>需要快速自動化、預算敏感</td>
          <td>public-facing、不能要求客戶端裝 CA</td>
      </tr>
  </tbody>
</table>
<p>選 Let&rsquo;s Encrypt 的核心訴求：<em>public-facing + DV 等級夠用 + 跨平台 + 需要自動化</em>。需要 EV cert 走商業 CA、需要 internal mTLS 走 Vault PKI、AWS-only + 留在 ALB / CloudFront 內走 ACM 更省事。</p>
<h2 id="進階主題">進階主題</h2>
<p><strong>Rate limit 規劃跟 SaaS 多 tenant</strong>：N 個 customer subdomain 場景下、單 domain 50 cert/week 很容易撞牆。設計選項：(1) wildcard cert（<code>*.app.example.com</code>）一張覆蓋無限 subdomain、但 wildcard cert 不能保護 nested subdomain（<code>*.app.example.com</code> 不蓋 <code>foo.bar.app.example.com</code>）；(2) SAN cert 把多個 subdomain 寫進同一張 cert（單張最多 100 個 SAN）、適合 customer 數固定、新增不頻繁的場景；(3) 申請 rate limit 上限提高、production scale SaaS 走這條；(4) cert reuse — 同樣 SAN 組合在 5 duplicate cert/week 內可 reuse、不重發。</p>
<p><strong>跟 cert-manager + DNS-01 整合</strong>：production K8s 環境最常見組合是 cert-manager + Let&rsquo;s Encrypt + DNS-01、DNS provider 走 Route53 / Cloud DNS / Cloudflare。cert-manager 用 ClusterIssuer 設定 Let&rsquo;s Encrypt account + DNS solver、Certificate CRD 宣告需要的 cert、cert-manager 自動完成 ACME flow。優勢是 <em>wildcard cert 可用</em>（DNS-01 不受 HTTP-01 的 port 80 限制）、跨 cluster 可標準化、cert renewal 進 K8s event stream 容易監控。</p>
<p><strong>ACME profiles（client-specific behavior）</strong>：Let&rsquo;s Encrypt 2024 開始提供 ACME profile 機制、允許 client 選擇 cert 屬性（如 short-lived 6 天 cert vs standard 90 天）。short-lived cert 適合機器 workload、進一步壓縮 revocation 缺陷的影響窗口；普通 web service 用 standard profile 即可。Profile 是 opt-in、ACME client 要支援。</p>
<p><strong>跨 ACME CA fallback</strong>：Let&rsquo;s Encrypt 不是唯一 ACME CA — ZeroSSL、Buypass、Google Trust Services 都提供 ACME endpoint。production 建議 ACME client 設兩個 issuer（Let&rsquo;s Encrypt primary + ZeroSSL / Buypass secondary）、Let&rsquo;s Encrypt 出事（rate limit 撞牆、AWS outage 影響 challenge 驗證、ISRG 服務中斷）時可以 fallback、不會 cert 全停。cert-manager 用兩個 ClusterIssuer 即可、application 端零感知。</p>
<p><strong>Revocation 的弱化現實</strong>：cert 可以 revoke、但實際失效路徑薄弱 — CRL（Certificate Revocation List）跟 OCSP（Online Certificate Status Protocol）更新有延遲、且大多數 client（瀏覽器、API client）不會主動檢查 revocation 狀態（soft-fail：查不到就放行）。real-world 的 cert 失效機制其實是 <em>短 TTL + rotate</em>、不是 revocation API。設計時不要寄望 revoke 後 attacker 拿到的 cert 就無效 — rotate 出新 cert + 在所有 endpoint deploy 新 cert + 觀察舊 cert traffic 歸零、才算真正失效。</p>
<h2 id="排錯與失敗快速判讀">排錯與失敗快速判讀</h2>
<ul>
<li><strong>ACME challenge 失敗</strong>：HTTP-01 拉不到 <code>/.well-known/acme-challenge/&lt;token&gt;</code>、檢查 port 80 reachability、firewall、CDN 是否擋；DNS-01 TXT record 沒生效、檢查 DNS provider API permission、TXT TTL 是否設太長</li>
<li><strong>撞 rate limit</strong>：50 cert/week per registered domain 撞牆、整個 week 不能發新 cert — production 必須先 <em>staging 測完</em> 再切 production、cert reuse 機制要開（同 SAN 組合不重發）、長期解走 wildcard / SAN consolidation / rate limit exemption</li>
<li><strong>Renewal 沒在 60 天前開始</strong>：cert 過期前才 renew、撞到 ACME server 暫時不可用會直接過期 — ACME client 設 60 天 renew threshold、cert expiry 30 天前 alert 給 oncall</li>
<li><strong>Account key 沒備份</strong>：account key 弄丟、可以重新註冊但 <em>舊 cert 的 revocation 權限沒了</em>（除非用 cert 私鑰 revoke）— account key 跟 root signing key 同等級保護、離線備份</li>
<li><strong>CT log 暴露 internal hostname</strong>：Let&rsquo;s Encrypt cert 進 CT log、internal-only hostname 的 SAN 變 recon 資料源 — internal service 不用 Let&rsquo;s Encrypt、改 Vault PKI / 私有 CA</li>
<li><strong>Wildcard cert 用 HTTP-01</strong>：<code>*.example.com</code> 申請失敗、Let&rsquo;s Encrypt 政策強制 wildcard 走 DNS-01 — 切到 DNS-01 solver、設定 DNS provider API access</li>
<li><strong>Cert 出事 revoke 後 attacker 還能用</strong>：revocation 不是 fleet-wide invalidation、CRL/OCSP 多數 client 不檢查 — 真正失效靠 rotate + 觀察舊 cert traffic 歸零、不是 revoke API</li>
</ul>
<h2 id="何時改走其他服務">何時改走其他服務</h2>
<table>
  <thead>
      <tr>
          <th>需求形狀</th>
          <th>改走</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>AWS-only + 留在 ALB / CloudFront 內</td>
          <td><a href="/blog/backend/07-security-data-protection/vendors/aws-acm/" data-link-title="AWS ACM" data-link-desc="AWS-managed certificate provisioning、DNS validation &#43; auto-renewal、整合 ELB / CloudFront / API Gateway、Private CA 後端">AWS ACM</a></td>
      </tr>
      <tr>
          <td>需要 OV / EV cert（cert 含組織資訊）</td>
          <td>商業 CA（DigiCert / Sectigo / Entrust）</td>
      </tr>
      <tr>
          <td>Internal mTLS / workload identity</td>
          <td><a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault PKI</a> / <a href="/blog/backend/07-security-data-protection/vendors/spire/" data-link-title="SPIRE" data-link-desc="SPIFFE Runtime Environment、attested workload identity、short-lived SVID &#43; Trust Bundle、跨組織 federation">SPIRE</a></td>
      </tr>
      <tr>
          <td>K8s workload cert 自動化（用 LE 當源）</td>
          <td><a href="/blog/backend/07-security-data-protection/vendors/cert-manager/" data-link-title="cert-manager" data-link-desc="K8s 原生 certificate lifecycle automation、支援 Let&#39;s Encrypt / Vault PKI / Venafi 等多 issuer、auto-renewal &#43; Challenge solver">cert-manager</a></td>
      </tr>
      <tr>
          <td>Cert lifecycle 治理（跨 vendor 通則）</td>
          <td><a href="/blog/backend/07-security-data-protection/transport-trust-and-certificate-lifecycle/" data-link-title="7.5 傳輸信任與憑證生命週期" data-link-desc="以問題驅動方式整理傳輸信任鏈、會話完整性與憑證節奏">7.4 Transport Trust and Certificate Lifecycle</a></td>
      </tr>
      <tr>
          <td>Cert rotation 證據鏈</td>
          <td><a href="/blog/backend/07-security-data-protection/credential-rotation-scoped-evidence/" data-link-title="7.27 Credential Rotation with Scoped Evidence 實作示範" data-link-desc="以 webhook/API credential 輪替示範 scope map、證據欄位與回退窗口如何一起設計。">7.5 Credential Rotation Scoped Evidence</a></td>
      </tr>
  </tbody>
</table>
<h2 id="不在本頁內的主題">不在本頁內的主題</h2>
<ul>
<li>ACME protocol RFC 8555 完整規格逐條解讀</li>
<li>每個 ACME client（certbot / cert-manager / acme.sh / Caddy / Traefik）的完整設定教學</li>
<li>Let&rsquo;s Encrypt 內部 CA infrastructure 跟 ISRG governance 細節</li>
<li>CT log 內部結構跟 SCT（Signed Certificate Timestamp）驗證流程</li>
<li>DNS provider 的 API 認證設定（Route53 IAM / Cloud DNS service account / Cloudflare API token）</li>
</ul>
<h2 id="案例回寫">案例回寫</h2>
<p>Let&rsquo;s Encrypt 在 07 案例庫沒有直接 vendor-level 事件、以下案例採對照引用：</p>
<table>
  <thead>
      <tr>
          <th>案例</th>
          <th>跟 Let&rsquo;s Encrypt 的關係（對照）</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/backend/07-security-data-protection/transport-trust-and-certificate-lifecycle/" data-link-title="7.5 傳輸信任與憑證生命週期" data-link-desc="以問題驅動方式整理傳輸信任鏈、會話完整性與憑證節奏">Transport Trust and Certificate Lifecycle (section)</a></td>
          <td>Let&rsquo;s Encrypt 90 天 TTL + 強制 ACME 自動化、把人工依賴從 cert lifecycle 設計中拿掉、是 <em>forcing function 級別</em> 的治理選擇</td>
      </tr>
      <tr>
          <td><a href="/blog/backend/07-security-data-protection/credential-rotation-scoped-evidence/" data-link-title="7.27 Credential Rotation with Scoped Evidence 實作示範" data-link-desc="以 webhook/API credential 輪替示範 scope map、證據欄位與回退窗口如何一起設計。">Credential Rotation Scoped Evidence (section)</a></td>
          <td>Let&rsquo;s Encrypt 沒提供 fleet-wide revocation API、cert 出事後客戶側自己負責 fleet update + session invalidation、是 scope map 必要的典型情境</td>
      </tr>
      <tr>
          <td><a href="/blog/backend/07-security-data-protection/red-team/cases/edge-exposure/citrix-bleed-2023-session-hijack/" data-link-title="7.R7.3.3 Citrix Bleed 2023：會話被劫持與重放風險" data-link-desc="邊界設備會話資料外洩後，如何演變成帳號與服務風險">Citrix Bleed 2023 Session Hijack</a></td>
          <td>對照啟示 — cert rotation 跟 session invalidation 是兩件事、Let&rsquo;s Encrypt cert renew 不會 invalidate 既有 TLS session 跟 application-layer session、要分別處理</td>
      </tr>
      <tr>
          <td><a href="/blog/backend/07-security-data-protection/cases/failure-credential-rotation-without-scope/" data-link-title="7.C9 反例：憑證輪替未分 Scope" data-link-desc="憑證輪替若未分域分批，容易造成跨系統連鎖中斷。">Failure: Credential Rotation Without Scope</a></td>
          <td>Let&rsquo;s Encrypt rate limit（50 cert/week per domain）是 scope-driven 設計的硬約束、單一 domain 不能無限 rotation、wildcard / SAN consolidation 必須納入 rotation 策略</td>
      </tr>
  </tbody>
</table>
<h2 id="下一步路由">下一步路由</h2>
<ul>
<li>上游：<a href="/blog/backend/07-security-data-protection/transport-trust-and-certificate-lifecycle/" data-link-title="7.5 傳輸信任與憑證生命週期" data-link-desc="以問題驅動方式整理傳輸信任鏈、會話完整性與憑證節奏">7.4 Transport Trust and Certificate Lifecycle</a>、<a href="/blog/backend/07-security-data-protection/credential-rotation-scoped-evidence/" data-link-title="7.27 Credential Rotation with Scoped Evidence 實作示範" data-link-desc="以 webhook/API credential 輪替示範 scope map、證據欄位與回退窗口如何一起設計。">7.5 Credential Rotation Scoped Evidence</a></li>
<li>平行：<a href="/blog/backend/07-security-data-protection/vendors/aws-acm/" data-link-title="AWS ACM" data-link-desc="AWS-managed certificate provisioning、DNS validation &#43; auto-renewal、整合 ELB / CloudFront / API Gateway、Private CA 後端">AWS ACM</a>、<a href="/blog/backend/07-security-data-protection/vendors/cert-manager/" data-link-title="cert-manager" data-link-desc="K8s 原生 certificate lifecycle automation、支援 Let&#39;s Encrypt / Vault PKI / Venafi 等多 issuer、auto-renewal &#43; Challenge solver">cert-manager</a>、<a href="/blog/backend/07-security-data-protection/vendors/spire/" data-link-title="SPIRE" data-link-desc="SPIFFE Runtime Environment、attested workload identity、short-lived SVID &#43; Trust Bundle、跨組織 federation">SPIRE</a></li>
<li>下游：<a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault</a>（Vault PKI 處理 internal CA、跟 Let&rsquo;s Encrypt public CA 互補）</li>
<li>跨模組：<a href="/blog/backend/08-incident-response/vendors/" data-link-title="事故處理 Vendor 清單" data-link-desc="規劃 on-call、incident response、status page 與 postmortem 工具的服務頁撰寫順序與判準">8 事故處理 vendor 清單</a>（cert 出事 / private key 外洩如何 routing 進 IR 流程）</li>
<li>官方：<a href="https://letsencrypt.org/docs/">Let&rsquo;s Encrypt Documentation</a>、<a href="https://datatracker.ietf.org/doc/html/rfc8555">ACME RFC 8555</a>、<a href="https://crt.sh">crt.sh CT log search</a></li>
</ul>
]]></content:encoded></item><item><title>MySQL Encryption / TLS / Key Management</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/encryption-tls-key-management/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/encryption-tls-key-management/</guid><description>&lt;p>MySQL encryption / TLS / key management 的核心責任是把資料庫保護拆成儲存加密、傳輸加密、&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/key-management/" data-link-title="Key Management" data-link-desc="說明加密金鑰如何產生、保存、輪替，以及還原時如何依賴金鑰">金鑰生命週期&lt;/a>與連線憑證治理。Encryption 是多層保護設計；它涵蓋 InnoDB tablespace、redo / undo、binary log、backup artifact、client connection 與 keyring。&lt;/p>
&lt;p>本文的判讀錨點是：加密要服務於 threat model。若風險是磁碟遺失，&lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/at-rest-encryption/" data-link-title="At-Rest Encryption" data-link-desc="說明資料落到儲存媒介前的加密層，以及它對應的威脅模型">at-rest encryption&lt;/a> 是重點；若風險是網路攔截，TLS 是重點；若風險是內部濫用，還需要 role、audit、masking 與 SIEM。&lt;/p>
&lt;p>官方文件路由的核心責任是固定 MySQL 8.4 security claim。實作前先查 &lt;a href="https://dev.mysql.com/doc/refman/8.2/en/innodb-data-encryption.html">InnoDB data-at-rest encryption&lt;/a>、&lt;a href="https://dev.mysql.com/doc/refman/8.0/en/keyring.html">MySQL keyring&lt;/a> 與 &lt;a href="https://dev.mysql.com/doc/refman/8.4/en/show-binary-log-status.html">SHOW BINARY LOG STATUS&lt;/a>；本文最後檢查日是 2026-05-22。&lt;/p>
&lt;h2 id="protection-layers">Protection Layers&lt;/h2>
&lt;p>Protection layers 的核心責任是把保護面分層。&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>層級&lt;/th>
 &lt;th>主要責任&lt;/th>
 &lt;th>Evidence&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>At-rest encryption&lt;/td>
 &lt;td>data file、redo、undo、temp&lt;/td>
 &lt;td>encryption setting、keyring status&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>In-transit TLS&lt;/td>
 &lt;td>client / replica / admin connection&lt;/td>
 &lt;td>TLS mode、certificate、cipher&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Backup encryption&lt;/td>
 &lt;td>dump、snapshot、physical backup&lt;/td>
 &lt;td>encrypted artifact、restore drill&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Key management&lt;/td>
 &lt;td>key generation、rotation、access&lt;/td>
 &lt;td>KMS / keyring log、rotation record&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Credential governance&lt;/td>
 &lt;td>user password、secret、rotation&lt;/td>
 &lt;td>grant review、secret age&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>這些層級要一起設計。資料檔加密後，backup 若以明文落到 object storage，保護鏈仍然破洞；TLS 開啟後，client 若允許 insecure fallback，也會失去網路保護。&lt;/p>
&lt;h2 id="keyring-boundary">Keyring Boundary&lt;/h2>
&lt;p>Keyring boundary 的核心責任是定義 MySQL 如何取得與保護 encryption key。MySQL 支援 keyring component / plugin 與外部 KMS 整合；managed MySQL 可能由 provider 接管 key storage。&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>部署型態&lt;/th>
 &lt;th>key 責任&lt;/th>
 &lt;th>審查問題&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Self-managed&lt;/td>
 &lt;td>自行部署 keyring / KMS&lt;/td>
 &lt;td>key file permission、backup、rotation&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Managed MySQL&lt;/td>
 &lt;td>provider KMS / customer-managed key&lt;/td>
 &lt;td>region、rotation、audit、restore&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Container lab&lt;/td>
 &lt;td>dev-only keyring&lt;/td>
 &lt;td>避免和 production policy 混用&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Keyring 要進入 backup / restore drill。還原 database 時，只有 data file 而沒有對應 key，restore 會失敗；runbook 要保存 key dependency 與 emergency access。&lt;/p>
&lt;h2 id="tls-policy">TLS Policy&lt;/h2>
&lt;p>TLS policy 的核心責任是讓 client connection、replication connection 與 admin connection 都有明確安全等級。&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>連線類型&lt;/th>
 &lt;th>建議檢查&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Application&lt;/td>
 &lt;td>require SSL、verify CA / identity&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Replication&lt;/td>
 &lt;td>source / replica TLS、cert expiry&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Admin&lt;/td>
 &lt;td>bastion / VPN / TLS、least privilege&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Backup tool&lt;/td>
 &lt;td>encrypted transport、secret scope&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>TLS 驗證要包含 certificate rotation。過期憑證造成的 downtime 很常見；runbook 要記錄 CA、server cert、client cert、rotation window 與 reload / restart 條件。&lt;/p></description><content:encoded><![CDATA[<p>MySQL encryption / TLS / key management 的核心責任是把資料庫保護拆成儲存加密、傳輸加密、<a href="/blog/backend/knowledge-cards/key-management/" data-link-title="Key Management" data-link-desc="說明加密金鑰如何產生、保存、輪替，以及還原時如何依賴金鑰">金鑰生命週期</a>與連線憑證治理。Encryption 是多層保護設計；它涵蓋 InnoDB tablespace、redo / undo、binary log、backup artifact、client connection 與 keyring。</p>
<p>本文的判讀錨點是：加密要服務於 threat model。若風險是磁碟遺失，<a href="/blog/backend/knowledge-cards/at-rest-encryption/" data-link-title="At-Rest Encryption" data-link-desc="說明資料落到儲存媒介前的加密層，以及它對應的威脅模型">at-rest encryption</a> 是重點；若風險是網路攔截，TLS 是重點；若風險是內部濫用，還需要 role、audit、masking 與 SIEM。</p>
<p>官方文件路由的核心責任是固定 MySQL 8.4 security claim。實作前先查 <a href="https://dev.mysql.com/doc/refman/8.2/en/innodb-data-encryption.html">InnoDB data-at-rest encryption</a>、<a href="https://dev.mysql.com/doc/refman/8.0/en/keyring.html">MySQL keyring</a> 與 <a href="https://dev.mysql.com/doc/refman/8.4/en/show-binary-log-status.html">SHOW BINARY LOG STATUS</a>；本文最後檢查日是 2026-05-22。</p>
<h2 id="protection-layers">Protection Layers</h2>
<p>Protection layers 的核心責任是把保護面分層。</p>
<table>
  <thead>
      <tr>
          <th>層級</th>
          <th>主要責任</th>
          <th>Evidence</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>At-rest encryption</td>
          <td>data file、redo、undo、temp</td>
          <td>encryption setting、keyring status</td>
      </tr>
      <tr>
          <td>In-transit TLS</td>
          <td>client / replica / admin connection</td>
          <td>TLS mode、certificate、cipher</td>
      </tr>
      <tr>
          <td>Backup encryption</td>
          <td>dump、snapshot、physical backup</td>
          <td>encrypted artifact、restore drill</td>
      </tr>
      <tr>
          <td>Key management</td>
          <td>key generation、rotation、access</td>
          <td>KMS / keyring log、rotation record</td>
      </tr>
      <tr>
          <td>Credential governance</td>
          <td>user password、secret、rotation</td>
          <td>grant review、secret age</td>
      </tr>
  </tbody>
</table>
<p>這些層級要一起設計。資料檔加密後，backup 若以明文落到 object storage，保護鏈仍然破洞；TLS 開啟後，client 若允許 insecure fallback，也會失去網路保護。</p>
<h2 id="keyring-boundary">Keyring Boundary</h2>
<p>Keyring boundary 的核心責任是定義 MySQL 如何取得與保護 encryption key。MySQL 支援 keyring component / plugin 與外部 KMS 整合；managed MySQL 可能由 provider 接管 key storage。</p>
<table>
  <thead>
      <tr>
          <th>部署型態</th>
          <th>key 責任</th>
          <th>審查問題</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Self-managed</td>
          <td>自行部署 keyring / KMS</td>
          <td>key file permission、backup、rotation</td>
      </tr>
      <tr>
          <td>Managed MySQL</td>
          <td>provider KMS / customer-managed key</td>
          <td>region、rotation、audit、restore</td>
      </tr>
      <tr>
          <td>Container lab</td>
          <td>dev-only keyring</td>
          <td>避免和 production policy 混用</td>
      </tr>
  </tbody>
</table>
<p>Keyring 要進入 backup / restore drill。還原 database 時，只有 data file 而沒有對應 key，restore 會失敗；runbook 要保存 key dependency 與 emergency access。</p>
<h2 id="tls-policy">TLS Policy</h2>
<p>TLS policy 的核心責任是讓 client connection、replication connection 與 admin connection 都有明確安全等級。</p>
<table>
  <thead>
      <tr>
          <th>連線類型</th>
          <th>建議檢查</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Application</td>
          <td>require SSL、verify CA / identity</td>
      </tr>
      <tr>
          <td>Replication</td>
          <td>source / replica TLS、cert expiry</td>
      </tr>
      <tr>
          <td>Admin</td>
          <td>bastion / VPN / TLS、least privilege</td>
      </tr>
      <tr>
          <td>Backup tool</td>
          <td>encrypted transport、secret scope</td>
      </tr>
  </tbody>
</table>
<p>TLS 驗證要包含 certificate rotation。過期憑證造成的 downtime 很常見；runbook 要記錄 CA、server cert、client cert、rotation window 與 reload / restart 條件。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">SHOW</span><span class="w"> </span><span class="n">VARIABLES</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">&#39;require_secure_transport&#39;</span><span class="p">;</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="k">SHOW</span><span class="w"> </span><span class="n">STATUS</span><span class="w"> </span><span class="k">LIKE</span><span class="w"> </span><span class="s1">&#39;Ssl_cipher&#39;</span><span class="p">;</span></span></span></code></pre></div><p>這些查詢只能提供 connection 層 evidence。正式驗證還要從 client 設定確認 <code>ssl-mode</code> 是否驗證 CA / identity。</p>
<h2 id="backup-and-binlog-encryption">Backup and Binlog Encryption</h2>
<p>Backup and binlog encryption 的核心責任是保護資料離開 primary 後的生命週期。MySQL backup、binlog、logical dump、object storage、replica seed 都可能含敏感資料。</p>
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>保護方式</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Logical dump</td>
          <td>client-side encryption、storage policy</td>
      </tr>
      <tr>
          <td>Physical backup</td>
          <td>backup tool encryption、KMS</td>
      </tr>
      <tr>
          <td>Binlog</td>
          <td>encrypted storage、restricted access</td>
      </tr>
      <tr>
          <td>Snapshot</td>
          <td>volume encryption、snapshot policy</td>
      </tr>
      <tr>
          <td>Restore copy</td>
          <td>isolated environment、secret scoping</td>
      </tr>
  </tbody>
</table>
<p>Restore drill 要確認加密 artifact 可被解密並啟動。只有成功產出 encrypted backup，還不足以證明災難時能恢復。</p>
<h2 id="rotation-runbook">Rotation Runbook</h2>
<p>Rotation runbook 的核心責任是讓 key、certificate、password 都可定期更換。</p>
<ol>
<li>Inventory：列出 DB user、TLS cert、KMS key、backup key。</li>
<li>Impact：確認哪些 client / replica / backup job 使用它。</li>
<li>Staging：先在 staging 旋轉並跑 smoke test。</li>
<li>Rollout：使用雙憑證 / 雙 secret window。</li>
<li>Validation：查連線、replication、backup、restore。</li>
<li>Cleanup：移除舊 key / cert / secret。</li>
</ol>
<p>Rotation 要設 calendar 與 owner。安全設定長期無人輪替時，incident 後會難以判斷 exposure window。</p>
<h2 id="failure-modes">Failure Modes</h2>
<p>Failure modes 的核心責任是提前列出加密常見事故。</p>
<table>
  <thead>
      <tr>
          <th>Failure mode</th>
          <th>判讀訊號</th>
          <th>修正方向</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>TLS fallback</td>
          <td>client 仍可明文連線</td>
          <td>require secure transport、client verify</td>
      </tr>
      <tr>
          <td>Cert expiry</td>
          <td>application connection failure</td>
          <td>rotation alert、dual cert window</td>
      </tr>
      <tr>
          <td>Missing keyring</td>
          <td>restore / startup failure</td>
          <td>key backup、KMS access drill</td>
      </tr>
      <tr>
          <td>Plain backup</td>
          <td>storage artifact 未加密</td>
          <td>backup pipeline policy</td>
      </tr>
      <tr>
          <td>Overbroad secret</td>
          <td>admin / app 共用 credential</td>
          <td>role split、secret rotation</td>
      </tr>
  </tbody>
</table>
<p>安全 runbook 要和 audit log 串接。Key rotation、failed TLS、privilege change、restore access 都應留下可追溯紀錄。</p>
<h2 id="下一步路由">下一步路由</h2>
<p>Encryption / TLS / key management 完成後，操作證據讀 <a href="../audit-log-siem/">Audit Log + SIEM</a>；備份恢復讀 <a href="../pitr-backup/">PITR / Backup</a>；資料保護治理讀 <a href="/blog/backend/07-security-data-protection/data-protection-and-masking-governance/" data-link-title="7.4 資料保護與遮罩治理" data-link-desc="以問題驅動方式整理資料分級、遮罩、匯出與備份治理">Data Protection</a>。</p>
]]></content:encoded></item><item><title>mTLS 實際怎麼設定與運維：CA 階層、憑證生命週期、撤銷機制</title><link>https://tarrragon.github.io/blog/work-log/mtls-%E5%AF%A6%E9%9A%9B%E6%80%8E%E9%BA%BC%E8%A8%AD%E5%AE%9A%E8%88%87%E9%81%8B%E7%B6%ADca-%E9%9A%8E%E5%B1%A4%E6%86%91%E8%AD%89%E7%94%9F%E5%91%BD%E9%80%B1%E6%9C%9F%E6%92%A4%E9%8A%B7%E6%A9%9F%E5%88%B6/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/work-log/mtls-%E5%AF%A6%E9%9A%9B%E6%80%8E%E9%BA%BC%E8%A8%AD%E5%AE%9A%E8%88%87%E9%81%8B%E7%B6%ADca-%E9%9A%8E%E5%B1%A4%E6%86%91%E8%AD%89%E7%94%9F%E5%91%BD%E9%80%B1%E6%9C%9F%E6%92%A4%E9%8A%B7%E6%A9%9F%E5%88%B6/</guid><description>&lt;h2 id="mtls-這篇要解決什麼">mTLS 這篇要解決什麼&lt;/h2>
&lt;p>mTLS 的核心是把系統身分綁到 X.509 憑證與私鑰，而不是可重用的 shared secret。介紹文章常把它簡化成「雙向 TLS 憑證、適合金融醫療」，但實際落地時，設計責任會立刻延伸到 CA 階層、憑證生命週期、撤銷與基礎設施整合：&lt;/p>
&lt;ul>
&lt;li>自簽 CA 還是商業 CA？&lt;/li>
&lt;li>憑證放哪、怎麼 rotate？&lt;/li>
&lt;li>怎麼撤銷？CRL 還是 OCSP 還是 short-lived cert？&lt;/li>
&lt;li>nginx 設定怎麼寫、service mesh 怎麼整合？&lt;/li>
&lt;li>跟 API Key、OAuth 比，什麼情境適合承擔 mTLS 的運維成本？&lt;/li>
&lt;/ul>
&lt;p>這些是 mTLS 第一次部署就要處理的基本問題。若只知道「雙向憑證」而沒有 lifecycle 設計，系統會在過期、撤銷或 mesh 升級時失去可預測性。&lt;/p>
&lt;p>本文拆解 mTLS 的工程實務：&lt;/p>
&lt;ol>
&lt;li>&lt;strong>CA 階層&lt;/strong>：為什麼要分層、Root CA / Intermediate CA / Leaf cert&lt;/li>
&lt;li>&lt;strong>憑證生命週期&lt;/strong>：簽發、儲存、rotation、撤銷&lt;/li>
&lt;li>&lt;strong>基礎設施整合&lt;/strong>：nginx / envoy / service mesh 設定模式&lt;/li>
&lt;li>&lt;strong>跟其他 Layer 2 方案的取捨&lt;/strong>：何時 mTLS 才是對的選擇&lt;/li>
&lt;/ol>
&lt;blockquote>
&lt;p>&lt;strong>本文位置&lt;/strong>：本文是 &lt;a href="https://tarrragon.github.io/blog/work-log/api-%E8%AA%8D%E8%AD%89%E7%9A%84%E4%B8%89%E5%B1%A4%E4%BF%A1%E4%BB%BB%E9%82%8A%E7%95%8C%E4%BD%BF%E7%94%A8%E8%80%85%E7%B3%BB%E7%B5%B1%E8%B7%A8%E7%B3%BB%E7%B5%B1-provisioning/" data-link-title="API 認證的三層信任邊界：使用者、系統、跨系統 Provisioning" data-link-desc="API 認證的信任邊界分層（Bearer Token / Shared Secret / Provisioning）：各層的洩漏後果與撤銷方式，以及混用造成的設計失效模式。">API 認證的三層信任邊界&lt;/a> Layer 2 的深入篇之一。主文聚焦「為什麼系統間要獨立 credential」、本文聚焦「用 mTLS 實作這層的具體工程細節」。&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="mtls-解什麼問題">mTLS 解什麼問題&lt;/h2>
&lt;h3 id="跟一般-tls-的差異">跟一般 TLS 的差異&lt;/h3>
&lt;p>一般 TLS（HTTPS）是&lt;strong>單向認證&lt;/strong>：client 驗證 server 身分，server 再透過 API Key、token 或 session 辨識 client。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">client ────&amp;#34;我要連 example.com&amp;#34;────▶ server
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ◀───server 出示憑證───────── server
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> 驗證:&amp;#34;這是真的 example.com 嗎&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> 建立加密通道&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>client 驗證 server、但 server 不驗證 client。Client 是匿名的、靠後續 API Key / token 認證。&lt;/p>
&lt;p>mTLS 加上&lt;strong>反向驗證&lt;/strong>：server 也在 TLS handshake 階段驗證 client 憑證，把系統身分提前到連線層建立。&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">client ──&amp;#34;我要連 example.com、這是我的憑證&amp;#34;──▶ server
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> ◀──server 出示憑證───────────────────── server
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> 
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl"> 雙方驗證對方憑證：
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl"> client: &amp;#34;這是真的 example.com 嗎&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl"> server: &amp;#34;這個 client 是被授權的嗎&amp;#34;
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl"> ↓
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl"> 建立加密通道、且雙方都已認證&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>每個 client 有自己的憑證、server 用 CA 信任鏈驗證 client 憑證是否合法。&lt;strong>Client 的身分綁定在 X.509 憑證上、不需要額外的 API Key&lt;/strong>。&lt;/p>
&lt;h3 id="mtls-解的具體威脅">mTLS 解的具體威脅&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>威脅&lt;/th>
 &lt;th>一般 TLS + API Key&lt;/th>
 &lt;th>mTLS&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>中間人攔截&lt;/td>
 &lt;td>TLS 已解&lt;/td>
 &lt;td>TLS 已解&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>攻擊者用洩漏的 API Key 假冒 client&lt;/td>
 &lt;td>漏&lt;/td>
 &lt;td>需 client 私鑰、無法只憑網路觀察取得&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>API Key 寫在 client code、被反編譯&lt;/td>
 &lt;td>漏&lt;/td>
 &lt;td>私鑰可放硬體（HSM / TPM / Secure Enclave）&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Server 端 per-client credential 被攻陷&lt;/td>
 &lt;td>漏（API Key DB 外流）&lt;/td>
 &lt;td>server 無 per-client secret、僅 CA trust chain 暴露&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Client 端被植入、用合法身分滲透&lt;/td>
 &lt;td>部分（rate limit）&lt;/td>
 &lt;td>同樣（需依靠撤銷機制）&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>mTLS 的核心優勢是：&lt;strong>client 端的 private key 是 scope-bound、不跨系統共用&lt;/strong>。私鑰理論上不離開 client，且驗證憑藉的是 CA 簽章而非可重用字串；相較 shared API Key，一個 client 的私鑰外流通常可被限制在該 client 的憑證與授權範圍內。&lt;/p></description><content:encoded><![CDATA[<h2 id="mtls-這篇要解決什麼">mTLS 這篇要解決什麼</h2>
<p>mTLS 的核心是把系統身分綁到 X.509 憑證與私鑰，而不是可重用的 shared secret。介紹文章常把它簡化成「雙向 TLS 憑證、適合金融醫療」，但實際落地時，設計責任會立刻延伸到 CA 階層、憑證生命週期、撤銷與基礎設施整合：</p>
<ul>
<li>自簽 CA 還是商業 CA？</li>
<li>憑證放哪、怎麼 rotate？</li>
<li>怎麼撤銷？CRL 還是 OCSP 還是 short-lived cert？</li>
<li>nginx 設定怎麼寫、service mesh 怎麼整合？</li>
<li>跟 API Key、OAuth 比，什麼情境適合承擔 mTLS 的運維成本？</li>
</ul>
<p>這些是 mTLS 第一次部署就要處理的基本問題。若只知道「雙向憑證」而沒有 lifecycle 設計，系統會在過期、撤銷或 mesh 升級時失去可預測性。</p>
<p>本文拆解 mTLS 的工程實務：</p>
<ol>
<li><strong>CA 階層</strong>：為什麼要分層、Root CA / Intermediate CA / Leaf cert</li>
<li><strong>憑證生命週期</strong>：簽發、儲存、rotation、撤銷</li>
<li><strong>基礎設施整合</strong>：nginx / envoy / service mesh 設定模式</li>
<li><strong>跟其他 Layer 2 方案的取捨</strong>：何時 mTLS 才是對的選擇</li>
</ol>
<blockquote>
<p><strong>本文位置</strong>：本文是 <a href="/blog/work-log/api-%E8%AA%8D%E8%AD%89%E7%9A%84%E4%B8%89%E5%B1%A4%E4%BF%A1%E4%BB%BB%E9%82%8A%E7%95%8C%E4%BD%BF%E7%94%A8%E8%80%85%E7%B3%BB%E7%B5%B1%E8%B7%A8%E7%B3%BB%E7%B5%B1-provisioning/" data-link-title="API 認證的三層信任邊界：使用者、系統、跨系統 Provisioning" data-link-desc="API 認證的信任邊界分層（Bearer Token / Shared Secret / Provisioning）：各層的洩漏後果與撤銷方式，以及混用造成的設計失效模式。">API 認證的三層信任邊界</a> Layer 2 的深入篇之一。主文聚焦「為什麼系統間要獨立 credential」、本文聚焦「用 mTLS 實作這層的具體工程細節」。</p></blockquote>
<hr>
<h2 id="mtls-解什麼問題">mTLS 解什麼問題</h2>
<h3 id="跟一般-tls-的差異">跟一般 TLS 的差異</h3>
<p>一般 TLS（HTTPS）是<strong>單向認證</strong>：client 驗證 server 身分，server 再透過 API Key、token 或 session 辨識 client。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">client ────&#34;我要連 example.com&#34;────▶ server
</span></span><span class="line"><span class="ln">2</span><span class="cl">       ◀───server 出示憑證───────── server
</span></span><span class="line"><span class="ln">3</span><span class="cl">       驗證:&#34;這是真的 example.com 嗎&#34;
</span></span><span class="line"><span class="ln">4</span><span class="cl">       ↓
</span></span><span class="line"><span class="ln">5</span><span class="cl">       建立加密通道</span></span></code></pre></div><p>client 驗證 server、但 server 不驗證 client。Client 是匿名的、靠後續 API Key / token 認證。</p>
<p>mTLS 加上<strong>反向驗證</strong>：server 也在 TLS handshake 階段驗證 client 憑證，把系統身分提前到連線層建立。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">client ──&#34;我要連 example.com、這是我的憑證&#34;──▶ server
</span></span><span class="line"><span class="ln">2</span><span class="cl">       ◀──server 出示憑證───────────────────── server
</span></span><span class="line"><span class="ln">3</span><span class="cl">       
</span></span><span class="line"><span class="ln">4</span><span class="cl">       雙方驗證對方憑證：
</span></span><span class="line"><span class="ln">5</span><span class="cl">       client: &#34;這是真的 example.com 嗎&#34;
</span></span><span class="line"><span class="ln">6</span><span class="cl">       server: &#34;這個 client 是被授權的嗎&#34;
</span></span><span class="line"><span class="ln">7</span><span class="cl">       ↓
</span></span><span class="line"><span class="ln">8</span><span class="cl">       建立加密通道、且雙方都已認證</span></span></code></pre></div><p>每個 client 有自己的憑證、server 用 CA 信任鏈驗證 client 憑證是否合法。<strong>Client 的身分綁定在 X.509 憑證上、不需要額外的 API Key</strong>。</p>
<h3 id="mtls-解的具體威脅">mTLS 解的具體威脅</h3>
<table>
  <thead>
      <tr>
          <th>威脅</th>
          <th>一般 TLS + API Key</th>
          <th>mTLS</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>中間人攔截</td>
          <td>TLS 已解</td>
          <td>TLS 已解</td>
      </tr>
      <tr>
          <td>攻擊者用洩漏的 API Key 假冒 client</td>
          <td>漏</td>
          <td>需 client 私鑰、無法只憑網路觀察取得</td>
      </tr>
      <tr>
          <td>API Key 寫在 client code、被反編譯</td>
          <td>漏</td>
          <td>私鑰可放硬體（HSM / TPM / Secure Enclave）</td>
      </tr>
      <tr>
          <td>Server 端 per-client credential 被攻陷</td>
          <td>漏（API Key DB 外流）</td>
          <td>server 無 per-client secret、僅 CA trust chain 暴露</td>
      </tr>
      <tr>
          <td>Client 端被植入、用合法身分滲透</td>
          <td>部分（rate limit）</td>
          <td>同樣（需依靠撤銷機制）</td>
      </tr>
  </tbody>
</table>
<p>mTLS 的核心優勢是：<strong>client 端的 private key 是 scope-bound、不跨系統共用</strong>。私鑰理論上不離開 client，且驗證憑藉的是 CA 簽章而非可重用字串；相較 shared API Key，一個 client 的私鑰外流通常可被限制在該 client 的憑證與授權範圍內。</p>
<p>代價是：<strong>PKI 基礎建設複雜</strong>、憑證生命週期管理重、運維成本高。</p>
<hr>
<h2 id="ca-階層設計">CA 階層設計</h2>
<h3 id="為什麼要分層">為什麼要分層</h3>
<p>CA 分層的核心責任是降低最高信任根的暴露頻率。直覺做法是「用一張 Root CA 直接簽 client 憑證」：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Root CA ──signs──▶ client-A.crt
</span></span><span class="line"><span class="ln">2</span><span class="cl">        ──signs──▶ client-B.crt
</span></span><span class="line"><span class="ln">3</span><span class="cl">        ──signs──▶ client-C.crt
</span></span><span class="line"><span class="ln">4</span><span class="cl">        ...</span></span></code></pre></div><p>Root CA 私鑰是整個 PKI 的最高信任根，通常需要離線、HSM 與多人簽核。它一旦洩漏，所有信任這個 Root 的系統都要重新建立信任；Root CA 又通常活 10-20 年，撤換成本極高。</p>
<p>如果 Root CA 私鑰要常常拿出來簽 client cert、暴露風險就大幅提高。</p>
<p>解法：<strong>分層</strong>。Root CA 只簽 Intermediate CA、Intermediate CA 負責日常簽發 client cert：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Root CA (offline, 20 年)
</span></span><span class="line"><span class="ln">2</span><span class="cl">    ↓ signs (一次性 / 5-10 年)
</span></span><span class="line"><span class="ln">3</span><span class="cl">Intermediate CA (online, 1-5 年)
</span></span><span class="line"><span class="ln">4</span><span class="cl">    ↓ signs (日常、每張 90 天-1 年)
</span></span><span class="line"><span class="ln">5</span><span class="cl">Leaf certificates (client / server)</span></span></code></pre></div><p>Root CA 通常<strong>完全離線</strong>（air-gapped 機器、硬體 HSM）、私鑰一年只拿出來簽幾次（簽 Intermediate）。Intermediate CA 才是 online、處理日常簽發。</p>
<h3 id="階層帶來的好處">階層帶來的好處</h3>
<table>
  <thead>
      <tr>
          <th>好處</th>
          <th>機制</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Root CA 私鑰暴露次數降到最低</td>
          <td>只在簽 Intermediate 時用、其他時間離線</td>
      </tr>
      <tr>
          <td>Intermediate 被攻陷可撤換</td>
          <td>Root CA 撤掉該 Intermediate、用新 Intermediate 簽</td>
      </tr>
      <tr>
          <td>可按用途分 Intermediate</td>
          <td>一個給 server cert、一個給 client cert、一個給 internal services</td>
      </tr>
      <tr>
          <td>短 chain 仍可驗證</td>
          <td>client 只信任 Root CA、Intermediate 在 chain 中傳遞</td>
      </tr>
  </tbody>
</table>
<h3 id="三種典型部署模式">三種典型部署模式</h3>
<h4 id="模式-a自管-ca">模式 A：自管 CA</h4>
<p>完全自己跑 CA infra：</p>
<ul>
<li>Root CA：離線 HSM、年度作業簽 Intermediate</li>
<li>Intermediate CA：online、用工具如 <code>step-ca</code>、<code>cfssl</code>、<code>Vault PKI</code>、<code>Smallstep</code></li>
<li>Leaf cert：自動化簽發、短 TTL</li>
</ul>
<p>適合：純內部系統、不需 public trust、要完全控制 CA infrastructure。</p>
<h4 id="模式-b商業-cadigicert--sectigo--entrust">模式 B：商業 CA（DigiCert / Sectigo / Entrust）</h4>
<p>買商業 CA 服務、商業 CA 已預埋進所有 OS / browser trust store：</p>
<ul>
<li>適合：需要 public trust（HTTPS server cert、SSL/TLS for end users）</li>
<li>mTLS client cert 通常在自己的封閉系統內驗證，public trust 的價值較低，因此較少使用商業 CA</li>
</ul>
<h4 id="模式-ccloud-managed-pki">模式 C：Cloud-managed PKI</h4>
<p>雲廠商提供 managed PKI：</p>
<ul>
<li>AWS Private CA（ACM PCA）— managed Root + Intermediate</li>
<li>GCP Certificate Authority Service</li>
<li>Azure Key Vault Certificates</li>
</ul>
<p>適合：已在某朵雲、不想自管 CA infra、可接受 vendor lock。</p>
<h3 id="自管-ca-的最小工具鏈">自管 CA 的最小工具鏈</h3>
<p>如果走模式 A、推薦工具：</p>
<table>
  <thead>
      <tr>
          <th>工具</th>
          <th>用途</th>
          <th>特性</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>step-ca</strong></td>
          <td>Lightweight CA server、支援 ACME</td>
          <td>Smallstep 開源、設定簡單</td>
      </tr>
      <tr>
          <td><strong>HashiCorp Vault PKI</strong></td>
          <td>Vault 內建 PKI engine</td>
          <td>整合 Vault 既有 secret 管理</td>
      </tr>
      <tr>
          <td><strong>cfssl</strong></td>
          <td>Cloudflare 的 CA toolkit</td>
          <td>CLI-based、適合 build pipeline</td>
      </tr>
      <tr>
          <td><strong>OpenSSL</strong></td>
          <td>純手工建 CA</td>
          <td>維運成本高、適合學習與小規模</td>
      </tr>
  </tbody>
</table>
<p><code>step-ca</code> 是最低門檻的起手選擇 — 一行 <code>step ca init</code> 建好整套 CA、自動發 ACME 給 client。</p>
<hr>
<h2 id="憑證生命週期">憑證生命週期</h2>
<h3 id="簽發">簽發</h3>
<p><strong>Server cert 簽發流程</strong>：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">1. Server 產生 private key (RSA 2048+ 或 ECDSA P-256)
</span></span><span class="line"><span class="ln">2</span><span class="cl">2. Server 用 private key 產生 CSR (Certificate Signing Request)
</span></span><span class="line"><span class="ln">3</span><span class="cl">3. CSR 送給 CA
</span></span><span class="line"><span class="ln">4</span><span class="cl">4. CA 驗證 CSR 內容（DN、SAN、用途）
</span></span><span class="line"><span class="ln">5</span><span class="cl">5. CA 用 Intermediate CA 私鑰簽 cert
</span></span><span class="line"><span class="ln">6</span><span class="cl">6. 把簽好的 cert 回給 server
</span></span><span class="line"><span class="ln">7</span><span class="cl">7. Server 部署 cert + 自己的 private key</span></span></code></pre></div><p><strong>Client cert 簽發流程</strong>：跟 server 一樣，但 SAN 通常是 client identifier（service name、device ID），而非 hostname。</p>
<h3 id="私鑰留在產生端">私鑰留在產生端</h3>
<p>關鍵安全原則是：<strong>private key 在哪產生、就只在那裡存活</strong>。CA 只收 CSR（裡面只有 public key），簽完 cert 回去；client private key 全程留在 client 的受控環境。</p>
<p><strong>失效模式</strong>：</p>
<ul>
<li>CA 幫 client 產生 keypair、把 private key 跟 cert 一起寄給 client（密鑰在 CA 經手了）</li>
<li>把 private key 跟 cert 打包成 PKCS12 用 email 寄</li>
<li>把 keypair 放進公共 git repo</li>
</ul>
<p><strong>操作路由</strong>：</p>
<ul>
<li>Client 端產生 keypair、只送 CSR 給 CA（CSR 只含 public key）、簽完 cert 回來、private key 全程不離開 client</li>
</ul>
<h3 id="儲存">儲存</h3>
<p>Private key 的儲存等級：</p>
<table>
  <thead>
      <tr>
          <th>方式</th>
          <th>安全等級</th>
          <th>適合</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Plain file（chmod 600）</td>
          <td>低</td>
          <td>dev / staging、無 HSM 的低風險環境</td>
      </tr>
      <tr>
          <td>OS keystore（Keychain / Windows Cert Store）</td>
          <td>中</td>
          <td>desktop client、laptop</td>
      </tr>
      <tr>
          <td>HSM（hardware security module）</td>
          <td>高</td>
          <td>金融、政府、私鑰永不離開硬體</td>
      </tr>
      <tr>
          <td>Cloud KMS（AWS KMS / GCP KMS）</td>
          <td>中-高</td>
          <td>cloud-native、private key 進 KMS、簽章用 API</td>
      </tr>
      <tr>
          <td>TPM / Secure Enclave</td>
          <td>高</td>
          <td>mobile / IoT、跟硬體綁定</td>
      </tr>
  </tbody>
</table>
<p>Production server cert 私鑰至少應該 OS 層保護（檔案權限 + 加密磁碟）、高敏感場景上 HSM。</p>
<h3 id="rotation">Rotation</h3>
<p>mTLS 憑證的 rotation 跟 <a href="/blog/work-log/shared-secret-%E5%AE%89%E5%85%A8%E8%BC%AA%E6%9B%BF%E8%A8%AD%E8%A8%88%E9%9B%99%E5%AF%86%E9%81%8E%E6%B8%A1%E6%9C%9F%E8%87%AA%E5%8B%95%E5%8C%96%E8%88%87%E7%B7%8A%E6%80%A5%E6%B5%81%E7%A8%8B/" data-link-title="Shared Secret 安全輪替設計：雙密過渡期、自動化與緊急流程" data-link-desc="系統間 Shared Secret 輪替的核心機制：dual-secret rollover、自動化工具比較（AWS Secrets Manager / Vault / GCP）、緊急 rotation 流程與多 client 環境的失敗模式。">shared secret rotation</a> 概念類似、但有具體差異：</p>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>Shared Secret</th>
          <th>mTLS Cert</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>過期機制</td>
          <td>沒有、要手動 rotate</td>
          <td>內建 <code>notBefore</code> / <code>notAfter</code>、自動過期</td>
      </tr>
      <tr>
          <td>雙密期</td>
          <td>兩把同時 valid</td>
          <td>過渡期 server 同時持有舊 cert（未過期）+ 新 cert（已簽發）、自動有效</td>
      </tr>
      <tr>
          <td>Rotation 觸發</td>
          <td>排程</td>
          <td>排程 + 過期前自動</td>
      </tr>
  </tbody>
</table>
<p>實務上的 rotation 模式：</p>
<p><strong>短 TTL + 自動續發（推薦）</strong>：</p>
<ul>
<li>Leaf cert TTL 設短（24 小時 ~ 7 天）</li>
<li>用 ACME protocol（如 Let&rsquo;s Encrypt 的協定）讓 client 自動續發</li>
<li>rotation 由續發流程承擔，過期前自動換新</li>
</ul>
<p>工具：<code>cert-manager</code>（K8s）、<code>step-ca</code> + <code>step</code>、<code>certbot</code>。</p>
<p><strong>中 TTL + 半自動（傳統）</strong>：</p>
<ul>
<li>TTL 1 年、年度手動 rotation</li>
<li>用工具列管所有 cert 的 <code>notAfter</code>、過期前 30 天自動告警</li>
<li>適合舊架構、無法跑短 TTL 的場景</li>
</ul>
<p><strong>長 TTL（不建議）</strong>：</p>
<ul>
<li>TTL 多年、近乎不 rotate</li>
<li>私鑰暴露窗極長、被洩漏到察覺的時間差大</li>
<li>唯一情境：IoT 設備、無法 OTA 更新</li>
</ul>
<h3 id="撤銷">撤銷</h3>
<p>當 cert 在 <code>notAfter</code> 前需要失效（私鑰洩漏、員工離職、合約終止）、需要撤銷機制。有三種主流方案：</p>
<h4 id="crlcertificate-revocation-list">CRL（Certificate Revocation List）</h4>
<p>CA 維護一份「<strong>已撤銷憑證 list</strong>」、定期發佈（小時級到天級）。Client 端要：</p>
<ol>
<li>下載最新 CRL</li>
<li>連線時檢查對方 cert 是否在 CRL 內</li>
</ol>
<p><strong>優點</strong>：簡單、infrastructure 輕。</p>
<p><strong>缺點</strong>：</p>
<ul>
<li>CRL 大、下載成本高</li>
<li>Cache 期內撤銷不生效（最差幾小時）</li>
<li>Client 沒下載 CRL、撤銷完全沒效</li>
</ul>
<h4 id="ocsponline-certificate-status-protocol">OCSP（Online Certificate Status Protocol）</h4>
<p>Real-time 查詢、client 每次連線時即時 query OCSP responder：「<strong>這張 cert 還有效嗎？</strong>」</p>
<p><strong>優點</strong>：Real-time、撤銷即時生效。</p>
<p><strong>缺點</strong>：</p>
<ul>
<li>每次連線增加一次 OCSP query、延遲</li>
<li>OCSP responder 是 single point of failure</li>
<li>Privacy 顧慮（每次連線都告訴 CA 你在連誰）</li>
</ul>
<p>進階：<strong>OCSP Stapling</strong> — server 預先 query OCSP、把結果 staple 在自己的 cert chain 裡、client 不需自己 query。解決延遲跟 privacy、但 server 端要實作。</p>
<h4 id="short-lived-cert不撤銷讓它過期">Short-lived cert（不撤銷、讓它過期）</h4>
<p>最現代的做法：<strong>cert TTL 極短（小時、甚至分鐘）、不實作撤銷機制、靠過期自然失效</strong>。</p>
<p><strong>優點</strong>：</p>
<ul>
<li>可省略 CRL / OCSP infrastructure</li>
<li>撤銷窗 = TTL（小時級）、可預期</li>
<li>Privacy 友善</li>
</ul>
<p><strong>缺點</strong>：</p>
<ul>
<li>需要可靠的自動續發機制</li>
<li>Client 無法續發時直接斷線</li>
</ul>
<p>工具：<code>SPIFFE</code>/<code>SPIRE</code> 主推這個模式、cert TTL 設小時級。</p>
<h3 id="三種撤銷方案的選擇">三種撤銷方案的選擇</h3>
<table>
  <thead>
      <tr>
          <th>場景</th>
          <th>推薦撤銷方案</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>傳統 enterprise、架構變動成本高</td>
          <td>CRL（最低門檻）</td>
      </tr>
      <tr>
          <td>公開 HTTPS、需要 real-time 撤銷</td>
          <td>OCSP Stapling</td>
      </tr>
      <tr>
          <td>Cloud-native、有自動續發 infra</td>
          <td>Short-lived cert</td>
      </tr>
      <tr>
          <td>內部 service mesh</td>
          <td>Short-lived cert（mesh 自動）</td>
      </tr>
  </tbody>
</table>
<hr>
<h2 id="基礎設施整合">基礎設施整合</h2>
<h3 id="nginx-設定-mtls-server">nginx 設定 mTLS server</h3>
<p>最常見的場景：nginx 當 reverse proxy、要求 client 出示憑證。</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nginx" data-lang="nginx"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">server</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="kn">listen</span> <span class="mi">443</span> <span class="s">ssl</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="kn">server_name</span> <span class="s">api.example.com</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="c1"># Server cert (出示給 client)
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"></span>    <span class="kn">ssl_certificate</span>     <span class="s">/etc/ssl/certs/api.crt</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="kn">ssl_certificate_key</span> <span class="s">/etc/ssl/private/api.key</span><span class="p">;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="c1"># 要求 client 出示憑證、用這個 CA 驗證
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"></span>    <span class="kn">ssl_client_certificate</span> <span class="s">/etc/ssl/ca/client-ca-chain.pem</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="kn">ssl_verify_client</span> <span class="no">on</span><span class="p">;</span>            <span class="c1"># 強制 client 出示憑證、否則拒絕
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"></span>    <span class="kn">ssl_verify_depth</span> <span class="mi">2</span><span class="p">;</span>              <span class="c1"># 驗證 chain 深度、視 PKI 階層調 (Root → Intermediate → Leaf)
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"></span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="kn">location</span> <span class="s">/</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="c1"># 把 client cert 資訊傳給後端 application
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="c1"></span>        <span class="kn">proxy_set_header</span> <span class="s">X-Client-DN</span>  <span class="nv">$ssl_client_s_dn</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">        <span class="kn">proxy_set_header</span> <span class="s">X-Client-Verify</span> <span class="nv">$ssl_client_verify</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="kn">proxy_pass</span> <span class="s">http://backend</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>關鍵 directive：</p>
<table>
  <thead>
      <tr>
          <th>Directive</th>
          <th>作用</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>ssl_client_certificate</code></td>
          <td>信任的 CA chain</td>
      </tr>
      <tr>
          <td><code>ssl_verify_client on</code></td>
          <td>強制 client 出示憑證、<code>optional</code> 則彈性接受</td>
      </tr>
      <tr>
          <td><code>ssl_verify_depth</code></td>
          <td>chain 驗證深度、根據 PKI 階層調</td>
      </tr>
      <tr>
          <td><code>$ssl_client_s_dn</code></td>
          <td>傳 client cert 的 subject DN 給 backend</td>
      </tr>
  </tbody>
</table>
<h3 id="nginx-設定-mtls-client呼叫上游">nginx 設定 mTLS client（呼叫上游）</h3>
<p>當 nginx 是 client、要呼叫上游 mTLS server：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-nginx" data-lang="nginx"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">location</span> <span class="s">/upstream</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">    <span class="kn">proxy_pass</span> <span class="s">https://upstream.example.com</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="kn">proxy_ssl_certificate</span>     <span class="s">/etc/ssl/certs/client.crt</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="kn">proxy_ssl_certificate_key</span> <span class="s">/etc/ssl/private/client.key</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="kn">proxy_ssl_trusted_certificate</span> <span class="s">/etc/ssl/ca/upstream-ca.pem</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="kn">proxy_ssl_verify</span> <span class="no">on</span><span class="p">;</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><h3 id="envoy--api-gateway-整合">Envoy / API Gateway 整合</h3>
<p>Envoy 是 service mesh 的常見 data plane、mTLS 設定模式：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nt">listeners</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">api_listener</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">  </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">socket_address</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">port_value</span><span class="p">:</span><span class="w"> </span><span class="m">443</span><span class="w"> </span>}<span class="w"> </span>}<span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">  </span><span class="nt">filter_chains</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">  </span>- <span class="nt">transport_socket</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">envoy.transport_sockets.tls</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">      </span><span class="nt">typed_config</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">        </span><span class="nt">&#34;@type&#34;: </span><span class="l">type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">        </span><span class="nt">common_tls_context</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">          </span><span class="nt">tls_certificates</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">          </span>- <span class="nt">certificate_chain</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">filename</span><span class="p">:</span><span class="w"> </span><span class="l">/etc/ssl/api.crt }</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">            </span><span class="nt">private_key</span><span class="p">:</span><span class="w">      </span>{<span class="w"> </span><span class="nt">filename</span><span class="p">:</span><span class="w"> </span><span class="l">/etc/ssl/api.key }</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">          </span><span class="nt">validation_context</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">            </span><span class="nt">trusted_ca</span><span class="p">:</span><span class="w"> </span>{<span class="w"> </span><span class="nt">filename</span><span class="p">:</span><span class="w"> </span><span class="l">/etc/ssl/client-ca.pem }</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">        </span><span class="nt">require_client_certificate</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span></span></span></code></pre></div><blockquote>
<p>上方只展 inbound listener 的 <code>DownstreamTlsContext</code>。Envoy 作為 client 呼叫上游 mTLS server 時、要在對應的 cluster 配 <code>transport_socket</code> + <code>UpstreamTlsContext</code>（含 client cert + private key + trusted CA）、不在這份 listener 設定裡。</p></blockquote>
<p>跟 nginx 比、Envoy 的優勢：</p>
<ul>
<li>動態設定（xDS API、不需 reload）</li>
<li>支援 SDS（Secret Discovery Service）動態取憑證</li>
<li>跟 Istio / Linkerd 等 mesh 整合</li>
</ul>
<h3 id="service-meshistio--linkerd">Service Mesh（Istio / Linkerd）</h3>
<p>Service mesh 內建 mTLS：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln">1</span><span class="cl"><span class="c"># Istio: 強制 mesh 內所有 service 走 mTLS</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">security.istio.io/v1beta1</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w"></span><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">PeerAuthentication</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w"></span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">default</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">production</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w"></span><span class="nt">spec</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w">  </span><span class="nt">mtls</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="w">    </span><span class="nt">mode</span><span class="p">:</span><span class="w"> </span><span class="l">STRICT</span></span></span></code></pre></div><p>機制：</p>
<ul>
<li>Mesh control plane（Istio: Istiod / Linkerd: identity）內建 CA、自動發每個 pod 一張 cert</li>
<li>Sidecar proxy（Envoy / Linkerd proxy）handle TLS termination、application code 完全不感</li>
<li>Cert TTL 短（Istio 預設 24 小時、視版本而定）、自動續發</li>
<li>mTLS identity 綁定 K8s ServiceAccount</li>
</ul>
<p>優點：<strong>application 完全不用改 code、不用管 cert、不用管 rotation</strong> — mesh 全包。</p>
<p>缺點：<strong>綁定整套 mesh 架構</strong>、運維 mesh 本身是大事、學習曲線陡。</p>
<h3 id="為-application-直接做-mtls">為 application 直接做 mTLS</h3>
<p>某些場景（沒 mesh、需要 application 級控制）需要 application 直接做 mTLS：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Python requests 範例 - mTLS client</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="kn">import</span> <span class="nn">requests</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="s1">&#39;https://api.example.com/data&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">    <span class="n">cert</span><span class="o">=</span><span class="p">(</span><span class="s1">&#39;/path/to/client.crt&#39;</span><span class="p">,</span> <span class="s1">&#39;/path/to/client.key&#39;</span><span class="p">),</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="n">verify</span><span class="o">=</span><span class="s1">&#39;/path/to/server-ca.pem&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="p">)</span></span></span></code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-go" data-lang="go"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">// Go net/http 範例 - mTLS client</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="nx">cert</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">tls</span><span class="p">.</span><span class="nf">LoadX509KeyPair</span><span class="p">(</span><span class="s">&#34;client.crt&#34;</span><span class="p">,</span> <span class="s">&#34;client.key&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="k">if</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span> <span class="k">return</span> <span class="nx">err</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="nx">caCert</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">os</span><span class="p">.</span><span class="nf">ReadFile</span><span class="p">(</span><span class="s">&#34;server-ca.pem&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">if</span> <span class="nx">err</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span> <span class="k">return</span> <span class="nx">err</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="nx">caCertPool</span> <span class="o">:=</span> <span class="nx">x509</span><span class="p">.</span><span class="nf">NewCertPool</span><span class="p">()</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="nx">caCertPool</span><span class="p">.</span><span class="nf">AppendCertsFromPEM</span><span class="p">(</span><span class="nx">caCert</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="nx">client</span> <span class="o">:=</span> <span class="o">&amp;</span><span class="nx">http</span><span class="p">.</span><span class="nx">Client</span><span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="nx">Transport</span><span class="p">:</span> <span class="o">&amp;</span><span class="nx">http</span><span class="p">.</span><span class="nx">Transport</span><span class="p">{</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="nx">TLSClientConfig</span><span class="p">:</span> <span class="o">&amp;</span><span class="nx">tls</span><span class="p">.</span><span class="nx">Config</span><span class="p">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">            <span class="nx">Certificates</span><span class="p">:</span> <span class="p">[]</span><span class="nx">tls</span><span class="p">.</span><span class="nx">Certificate</span><span class="p">{</span><span class="nx">cert</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">            <span class="nx">RootCAs</span><span class="p">:</span>      <span class="nx">caCertPool</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">        <span class="p">},</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="nx">resp</span><span class="p">,</span> <span class="nx">err</span> <span class="o">:=</span> <span class="nx">client</span><span class="p">.</span><span class="nf">Get</span><span class="p">(</span><span class="s">&#34;https://api.example.com/data&#34;</span><span class="p">)</span></span></span></code></pre></div><p>每個語言的 stdlib 都有對應 API、寫法大同小異。但 application 要自己處理 cert reload、過期、rotation — 比 service mesh 麻煩很多。</p>
<hr>
<h2 id="跟其他-layer-2-方案的成本比較">跟其他 Layer 2 方案的成本比較</h2>
<p>mTLS 在三層信任邊界的 Layer 2 是安全強度高、運維責任也重的選項。是否採用，要看威脅模型、合規要求、私鑰保護能力與自動化成熟度。</p>
<table>
  <thead>
      <tr>
          <th>方案</th>
          <th>安全等級</th>
          <th>運維成本</th>
          <th>適合</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Shared Secret</strong></td>
          <td>低-中</td>
          <td>低</td>
          <td>純內部、低風險</td>
      </tr>
      <tr>
          <td><strong>API Key + HTTPS</strong></td>
          <td>中</td>
          <td>低</td>
          <td>一般 SaaS、對外 API</td>
      </tr>
      <tr>
          <td><strong>HMAC 簽章</strong></td>
          <td>中-高</td>
          <td>中</td>
          <td>需防 replay / tampering</td>
      </tr>
      <tr>
          <td><strong>OAuth Client Credentials</strong></td>
          <td>中-高</td>
          <td>中</td>
          <td>跨組織、需 short-lived token</td>
      </tr>
      <tr>
          <td><strong>mTLS</strong></td>
          <td>高</td>
          <td>高</td>
          <td>合規、零信任、私鑰可硬體保護</td>
      </tr>
  </tbody>
</table>
<h3 id="mtls-適合的場景">mTLS 適合的場景</h3>
<table>
  <thead>
      <tr>
          <th>場景</th>
          <th>為什麼 mTLS 適合</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>金融、醫療、政府合規要求</td>
          <td>合規條款直接要求 mTLS</td>
      </tr>
      <tr>
          <td>零信任網路（zero-trust）</td>
          <td>網路不可信、每個 hop 都要驗身分</td>
      </tr>
      <tr>
          <td>內部 service mesh（K8s + Istio）</td>
          <td>Mesh 自動處理、邊際成本低</td>
      </tr>
      <tr>
          <td>私鑰能放硬體（HSM / TPM / Secure Enclave）</td>
          <td>比 API Key 強得多</td>
      </tr>
      <tr>
          <td>高頻 service-to-service、API Key rotation 痛苦</td>
          <td>短 TTL cert 自動續發、不用人介入</td>
      </tr>
  </tbody>
</table>
<h3 id="mtls-成本偏高的場景">mTLS 成本偏高的場景</h3>
<table>
  <thead>
      <tr>
          <th>場景</th>
          <th>成本偏高的原因</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>對外開放給第三方 SDK</td>
          <td>第三方管理 cert 的門檻高、API Key + HTTPS 較易落地</td>
      </tr>
      <tr>
          <td>小規模、運維資源少</td>
          <td>PKI infra 維護成本超過安全增益</td>
      </tr>
      <tr>
          <td>純內部、不需強身分隔離</td>
          <td>Shared secret 已經夠用</td>
      </tr>
      <tr>
          <td>大量短連線 client（mobile app）</td>
          <td>Cert 散佈跟 rotation 複雜度高</td>
      </tr>
  </tbody>
</table>
<hr>
<h2 id="常見失敗模式">常見失敗模式</h2>
<h3 id="失敗-1忘記-intermediate-cachain-不完整">失敗 1：忘記 Intermediate CA、chain 不完整</h3>
<p><strong>症狀</strong>：server 設定看似正確、但 client 連線時報 <code>certificate verify failed</code>。</p>
<p><strong>根因</strong>：server 端只放了 leaf cert、沒附 Intermediate CA。Client 端只信任 Root、無法 chain 到 Root。</p>
<p><strong>緩解</strong>：server 端 <code>ssl_certificate</code> 要放<strong>完整 chain</strong>（leaf + intermediate、不含 root）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">cat leaf.crt intermediate.crt &gt; chain.crt
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># nginx 用 chain.crt 而非單獨 leaf.crt</span></span></span></code></pre></div><h3 id="失敗-2cert-過期造成連線中斷">失敗 2：Cert 過期造成連線中斷</h3>
<p><strong>症狀</strong>：cert <code>notAfter</code> 過了、所有 client 突然連不上。</p>
<p><strong>緩解</strong>：</p>
<ul>
<li>監控 cert 過期時間、提前 30 天告警、提前 7 天緊急告警</li>
<li>用自動續發機制（cert-manager / step-ca / ACME）</li>
<li>過期防護應由系統監控與自動續發承擔，而不是依賴人工記憶</li>
</ul>
<h3 id="失敗-3私鑰權限過寬被同機其他-user-讀走">失敗 3：私鑰權限過寬、被同機其他 user 讀走</h3>
<p><strong>症狀</strong>：security audit 發現 <code>/etc/ssl/private/server.key</code> 是 644、所有 user 可讀。</p>
<p><strong>緩解</strong>：</p>
<ul>
<li>Private key 一律 <code>chmod 600</code>、owner <code>root</code> 或 application user</li>
<li>用 systemd 跑的 service、private key 放 <code>LoadCredential=</code> 而非 file path</li>
<li>定期 audit <code>/etc/ssl/</code> 權限</li>
</ul>
<h3 id="失敗-4撤銷後-cert-仍能用">失敗 4：撤銷後 cert 仍能用</h3>
<p><strong>症狀</strong>：cert 撤銷了、但 client 還能連上。</p>
<p><strong>根因</strong>：</p>
<ul>
<li>CRL 設定但 server 沒 enable CRL check</li>
<li>OCSP 設定但 client 沒 query</li>
<li>用 short-lived cert 但 TTL 太長、撤銷窗不可接受</li>
</ul>
<p><strong>緩解</strong>：撤銷機制要<strong>端到端測試</strong>、不只「設定上有」、要驗證「實際生效」。</p>
<h3 id="失敗-5service-mesh-upgrade-後-mtls-中斷">失敗 5：Service mesh upgrade 後 mTLS 中斷</h3>
<p><strong>症狀</strong>：Istio 升級後、cluster 內部分 service 互相連不上。</p>
<p><strong>根因</strong>：mesh control plane 的 CA 換了、舊 cert chain 不通。</p>
<p><strong>緩解</strong>：</p>
<ul>
<li>Mesh upgrade 走 staged rollout，分批驗證 cert chain</li>
<li>Mesh 提供的 CA migration 流程要完整執行</li>
<li>Staging 環境先跑升級流程</li>
</ul>
<hr>
<h2 id="收尾">收尾</h2>
<p>mTLS 是「<strong>用 PKI 換掉 secret 管理</strong>」的設計 — 私鑰不離 client、身分綁在 X.509 cert 上、不依賴可重用的字串。安全等級高、但代價是要建立 CA infrastructure、處理 cert 生命週期、整合到各種基礎設施。</p>
<p>幾個核心判斷：</p>
<ol>
<li><strong>CA 分層是基本盤</strong> — Root + Intermediate + Leaf，讓最高信任根維持低暴露</li>
<li><strong>私鑰留在產生端</strong> — CA 只簽 CSR、不碰 private key</li>
<li><strong>撤銷方案要實證可用</strong> — CRL / OCSP / Short-lived 三選一，並驗證實際生效</li>
<li><strong>Service mesh 是 cloud-native 的低成本入口</strong> — Istio / Linkerd 把 mTLS 變成基礎設施，application 改動較小</li>
<li><strong>mTLS 是高責任方案</strong> — 對外開放 API、小規模、無 mesh 場景，OAuth / API Key 往往更容易維運</li>
</ol>
<p>延伸閱讀：</p>
<ul>
<li><a href="/blog/work-log/api-%E8%AA%8D%E8%AD%89%E7%9A%84%E4%B8%89%E5%B1%A4%E4%BF%A1%E4%BB%BB%E9%82%8A%E7%95%8C%E4%BD%BF%E7%94%A8%E8%80%85%E7%B3%BB%E7%B5%B1%E8%B7%A8%E7%B3%BB%E7%B5%B1-provisioning/" data-link-title="API 認證的三層信任邊界：使用者、系統、跨系統 Provisioning" data-link-desc="API 認證的信任邊界分層（Bearer Token / Shared Secret / Provisioning）：各層的洩漏後果與撤銷方式，以及混用造成的設計失效模式。">API 認證的三層信任邊界</a> — 本文的主篇、mTLS 在「Layer 2 系統層」的位置</li>
<li><a href="/blog/work-log/shared-secret-%E5%AE%89%E5%85%A8%E8%BC%AA%E6%9B%BF%E8%A8%AD%E8%A8%88%E9%9B%99%E5%AF%86%E9%81%8E%E6%B8%A1%E6%9C%9F%E8%87%AA%E5%8B%95%E5%8C%96%E8%88%87%E7%B7%8A%E6%80%A5%E6%B5%81%E7%A8%8B/" data-link-title="Shared Secret 安全輪替設計：雙密過渡期、自動化與緊急流程" data-link-desc="系統間 Shared Secret 輪替的核心機制：dual-secret rollover、自動化工具比較（AWS Secrets Manager / Vault / GCP）、緊急 rotation 流程與多 client 環境的失敗模式。">Shared Secret 安全輪替設計</a> — 不用 mTLS 走 secret-based 認證的對應 lifecycle 問題</li>
<li><a href="/blog/work-log/laravel-sanctum-%E7%9A%84-bearer-token-%E8%A8%AD%E8%A8%88%E5%89%96%E6%9E%90pksecret-%E7%82%BA%E4%BB%80%E9%BA%BC%E9%80%99%E6%A8%A3%E8%A8%AD%E8%A8%88/" data-link-title="Laravel Sanctum 的 Bearer Token 設計剖析：{PK}|{secret} 為什麼這樣設計" data-link-desc="Laravel Sanctum `{PK}|{secret}` 格式的設計理由、hash 儲存取捨、constant-time 比對位置，以及跟 GitHub PAT、Stripe API Key 的差異。">Laravel Sanctum 的 Bearer Token 設計剖析</a> — Layer 1 使用者層的 token 機制、跟 mTLS 解的問題不同</li>
</ul>
]]></content:encoded></item></channel></rss>