<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Vault on Tarragon</title><link>https://tarrragon.github.io/blog/tags/vault/</link><description>Recent content in Vault on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Fri, 26 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/vault/index.xml" rel="self" type="application/rss+xml"/><item><title>Foundational Services for Air-Gapped Environments: DNS, NTP, CA, and Secret Management</title><link>https://tarrragon.github.io/blog/infra/air-gapped/air-gapped-infrastructure-services/</link><pubDate>Fri, 26 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/infra/air-gapped/air-gapped-infrastructure-services/</guid><description>&lt;p>GitLab, &lt;a href="https://tarrragon.github.io/blog/infra/knowledge-cards/harbor/" data-link-title="Harbor" data-link-desc="開源的 container image registry，支援映像掃描、RBAC、複製，斷網環境取代 Docker Hub 的方案">Harbor&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/infra/knowledge-cards/prometheus/" data-link-title="Prometheus" data-link-desc="開源的 metrics 收集與告警系統，用 pull 模式從 target 拉取指標，斷網環境的預設監控方案">Prometheus&lt;/a>, and Nexus in an air-gapped environment share one prerequisite: they need name resolution (&lt;a href="https://tarrragon.github.io/blog/infra/knowledge-cards/dns/" data-link-title="DNS" data-link-desc="Domain Name System — 把域名轉成 IP 位址的系統，以及 A record、CNAME、NS、TTL 的角色">DNS&lt;/a>) to find each other, time synchronization (NTP) so that log timestamps and certificate validity checks are correct, &lt;a href="https://tarrragon.github.io/blog/infra/knowledge-cards/ssl-tls/" data-link-title="SSL / TLS" data-link-desc="加密 client 與 server 之間通訊的協定，讓 HTTPS 成為可能。TLS 是 SSL 的後繼者，但 SSL 憑證的稱呼仍廣泛使用">TLS&lt;/a> certificates (a CA) to serve HTTPS, and secret storage (&lt;a href="https://tarrragon.github.io/blog/infra/knowledge-cards/vault/" data-link-title="HashiCorp Vault" data-link-desc="機密管理系統，集中存放密碼、API key、TLS 私鑰，提供存取控制、稽核和自動輪替">Vault&lt;/a>) to manage passwords and tokens safely. These four are the "services behind the services": without them, other self-hosted services either fail to start or fall back to insecure plaintext HTTP.&lt;/p>
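&lt;h2 id="internal-dns內部名稱解析">Internal DNS: internal name resolution&lt;/h2>
&lt;p>An air-gapped environment has no public DNS to rely on. If internal services reference each other by IP address (GitLab connecting to PostgreSQL, Harbor connecting to its storage backend), every IP change forces a round of configuration edits. Internal DNS lets services reference each other by hostname (&lt;code>gitlab.internal&lt;/code>, &lt;code>harbor.internal&lt;/code>), so an IP change only touches the DNS zone.&lt;/p>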
&lt;h3 id="coredns-vs-bind">CoreDNS vs BIND&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Aspect&lt;/th>
 &lt;th>CoreDNS&lt;/th>
 &lt;th>BIND&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Configuration&lt;/td>
 &lt;td>Corefile (declarative, short)&lt;/td>
 &lt;td>named.conf (traditional, long)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Deployment&lt;/td>
 &lt;td>Single binary / container&lt;/td>
 &lt;td>System package&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Best suited for&lt;/td>
 &lt;td>Kubernetes-native integration, lightweight&lt;/td>
 &lt;td>Complex DNS requirements (split-horizon, DNSSEC)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Learning curve&lt;/td>
 &lt;td>Low&lt;/td>
 &lt;td>Medium to high&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
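&lt;p>CoreDNS is enough for most air-gapped environments: zone files live on disk, and a Corefile of a few lines is all it takes to start it.&lt;/p>
&lt;h3 id="最小設定">Minimal configuration&lt;/h3>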





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl"># Corefile
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">internal:53 {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl"> file /etc/coredns/zones/internal.zone
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> log
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> errors
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">}
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">.:53 {
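&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> template ANY ANY {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> rcode NXDOMAIN
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl"> log
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The first block handles queries for the &lt;code>internal&lt;/code> domain and answers them from the zone file. The second block catches every other query. An air-gapped environment cannot forward to an upstream DNS, so the &lt;code>template&lt;/code> plugin answers every non-internal name with NXDOMAIN immediately instead of leaving clients to wait for a timeout.&lt;/p>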





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">; /etc/coredns/zones/internal.zone
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">$ORIGIN internal.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">@ IN SOA ns1.internal. admin.internal. (
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl"> 2026062601 ; serial
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> 3600 ; refresh
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl"> 600 ; retry
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> 86400 ; expire
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> 60 ; minimum
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> IN NS ns1.internal.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">ns1 IN A 10.0.1.10
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">gitlab IN A 10.0.1.20
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">harbor IN A 10.0.1.21
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">vault IN A 10.0.1.22
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">nexus IN A 10.0.1.23
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">prom IN A 10.0.1.24
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">grafana IN A 10.0.1.25
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">ntp IN A 10.0.1.11&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When adding a service, add one A record and reload CoreDNS (&lt;code>kill -SIGUSR1 $(pidof coredns)&lt;/code> or restart the container). Increment the serial with every change so that changes stay traceable.&lt;/p></description><content:encoded><![CDATA[<p>GitLab, <a href="/blog/infra/knowledge-cards/harbor/" data-link-title="Harbor" data-link-desc="開源的 container image registry，支援映像掃描、RBAC、複製，斷網環境取代 Docker Hub 的方案">Harbor</a>, <a href="/blog/infra/knowledge-cards/prometheus/" data-link-title="Prometheus" data-link-desc="開源的 metrics 收集與告警系統，用 pull 模式從 target 拉取指標，斷網環境的預設監控方案">Prometheus</a>, and Nexus in an air-gapped environment share one prerequisite: they need name resolution (<a href="/blog/infra/knowledge-cards/dns/" data-link-title="DNS" data-link-desc="Domain Name System — 把域名轉成 IP 位址的系統，以及 A record、CNAME、NS、TTL 的角色">DNS</a>) to find each other, time synchronization (NTP) so that log timestamps and certificate validity checks are correct, <a href="/blog/infra/knowledge-cards/ssl-tls/" data-link-title="SSL / TLS" data-link-desc="加密 client 與 server 之間通訊的協定，讓 HTTPS 成為可能。TLS 是 SSL 的後繼者，但 SSL 憑證的稱呼仍廣泛使用">TLS</a> certificates (a CA) to serve HTTPS, and secret storage (<a href="/blog/infra/knowledge-cards/vault/" data-link-title="HashiCorp Vault" data-link-desc="機密管理系統，集中存放密碼、API key、TLS 私鑰，提供存取控制、稽核和自動輪替">Vault</a>) to manage passwords and tokens safely. These four are the "services behind the services": without them, other self-hosted services either fail to start or fall back to insecure plaintext HTTP.</p>
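<h2 id="internal-dns內部名稱解析">Internal DNS: internal name resolution</h2>
<p>An air-gapped environment has no public DNS to rely on. If internal services reference each other by IP address (GitLab connecting to PostgreSQL, Harbor connecting to its storage backend), every IP change forces a round of configuration edits. Internal DNS lets services reference each other by hostname (<code>gitlab.internal</code>, <code>harbor.internal</code>), so an IP change only touches the DNS zone.</p>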
<h3 id="coredns-vs-bind">CoreDNS vs BIND</h3>
<table>
  <thead>
      <tr>
          <th>Aspect</th>
          <th>CoreDNS</th>
          <th>BIND</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Configuration</td>
          <td>Corefile (declarative, short)</td>
          <td>named.conf (traditional, long)</td>
      </tr>
      <tr>
          <td>Deployment</td>
          <td>Single binary / container</td>
          <td>System package</td>
      </tr>
      <tr>
          <td>Best suited for</td>
          <td>Kubernetes-native integration, lightweight</td>
          <td>Complex DNS requirements (split-horizon, DNSSEC)</td>
      </tr>
      <tr>
          <td>Learning curve</td>
          <td>Low</td>
          <td>Medium to high</td>
      </tr>
  </tbody>
</table>
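<p>CoreDNS is enough for most air-gapped environments: zone files live on disk, and a Corefile of a few lines is all it takes to start it.</p>
<h3 id="最小設定">Minimal configuration</h3>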





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl"># Corefile
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">internal:53 {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    file /etc/coredns/zones/internal.zone
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    log
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    errors
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">.:53 {
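</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    template ANY ANY {
</span></span><span class="line"><span class="ln">10</span><span class="cl">        rcode NXDOMAIN
</span></span><span class="line"><span class="ln">11</span><span class="cl">    }
</span></span><span class="line"><span class="ln">12</span><span class="cl">    log
</span></span><span class="line"><span class="ln">13</span><span class="cl">}</span></span></code></pre></div><p>The first block handles queries for the <code>internal</code> domain and answers them from the zone file. The second block catches every other query. An air-gapped environment cannot forward to an upstream DNS, so the <code>template</code> plugin answers every non-internal name with NXDOMAIN immediately instead of leaving clients to wait for a timeout.</p>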





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">; /etc/coredns/zones/internal.zone
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">$ORIGIN internal.
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">@       IN SOA  ns1.internal. admin.internal. (
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">        2026062601 ; serial
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">        3600       ; refresh
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">        600        ; retry
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        86400      ; expire
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        60         ; minimum
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">)
</span></span><span class="line"><span class="ln">10</span><span class="cl">        IN NS   ns1.internal.
</span></span><span class="line"><span class="ln">11</span><span class="cl">ns1     IN A    10.0.1.10
</span></span><span class="line"><span class="ln">12</span><span class="cl">gitlab  IN A    10.0.1.20
</span></span><span class="line"><span class="ln">13</span><span class="cl">harbor  IN A    10.0.1.21
</span></span><span class="line"><span class="ln">14</span><span class="cl">vault   IN A    10.0.1.22
</span></span><span class="line"><span class="ln">15</span><span class="cl">nexus   IN A    10.0.1.23
</span></span><span class="line"><span class="ln">16</span><span class="cl">prom    IN A    10.0.1.24
</span></span><span class="line"><span class="ln">17</span><span class="cl">grafana IN A    10.0.1.25
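</span></span><span class="line"><span class="ln">18</span><span class="cl">ntp     IN A    10.0.1.11</span></span></code></pre></div><p>When adding a service, add one A record and reload CoreDNS (<code>kill -SIGUSR1 $(pidof coredns)</code> or restart the container). Increment the serial with every change: it keeps changes traceable, and the <code>file</code> plugin's periodic reload check re-reads a zone only when its SOA serial has changed.</p>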
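<p>Bumping the serial by hand is easy to get wrong. The following is a minimal sketch of a helper for the <code>YYYYMMDDNN</code> convention used in the zone file above; the function name <code>next_serial</code> and its two-argument interface are illustrative assumptions, not part of CoreDNS.</p>

```shell
# next_serial CURRENT TODAY
# CURRENT is the existing 10-digit serial (YYYYMMDDNN); TODAY is YYYYMMDD.
# Hypothetical helper for illustration only.
next_serial() {
  current="$1"
  today="$2"
  current_date="${current%??}"      # strip the two-digit change counter
  if [ "$current_date" = "$today" ]; then
    echo $((current + 1))           # same day: bump the counter
  else
    echo "${today}01"               # new day: restart the counter at 01
  fi
}

next_serial 2026062601 20260626   # prints 2026062602
next_serial 2026062601 20260627   # prints 2026062701
```

<p>In practice the second argument would come from <code>date +%Y%m%d</code>. The sketch does not handle more than 99 changes in a single day.</p>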
<h3 id="客戶端設定">Client configuration</h3>
<p>On every machine, <code>/etc/resolv.conf</code> points at the CoreDNS IP:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">nameserver 10.0.1.10
</span></span><span class="line"><span class="ln">2</span><span class="cl">search internal</span></span></code></pre></div><p>If the environment has a DHCP server, set the DNS server address in a DHCP option so that newly joined machines pick it up automatically. Without DHCP, push the setting with a provisioning script or an Ansible playbook.</p>
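<h2 id="ntp內部時間同步">NTP: internal time synchronization</h2>
<p>Unsynchronized clocks cause three classes of problems in an air-gapped environment: log timestamps drift apart so incidents cannot be correlated across machines, TLS validity checks go wrong so legitimate certificates are rejected, and time-sensitive authentication protocols such as Kerberos fail outright. A connected environment takes its time from <code>pool.ntp.org</code>; an air-gapped one needs its own time source.</p>
<h3 id="chrony-作為-ntp-server">chrony as the NTP server</h3>
<p>chrony suits unstable or isolated networks better than the traditional ntpd: its clock-discipline algorithm keeps compensating for drift fairly accurately even after long periods without an external time source.</p>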





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># /etc/chrony.conf (NTP server side)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># Air-gapped: no upstream NTP, so fall back to the local clock as the last resort</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">local</span> stratum <span class="m">10</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">allow 10.0.0.0/8
</span></span><span class="line"><span class="ln">5</span><span class="cl">driftfile /var/lib/chrony/drift</span></span></code></pre></div><p><code>local stratum 10</code> declares "I am a time source myself, but at a high stratum number (far from any reference clock, so low priority and no accuracy guarantee)". chrony on every other machine points at this server:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># /etc/chrony.conf (client)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">server ntp.internal iburst
</span></span><span class="line"><span class="ln">3</span><span class="cl">makestep 1.0 <span class="m">3</span></span></span></code></pre></div><p><code>iburst</code> speeds up the initial synchronization at boot, and <code>makestep 1.0 3</code> lets chrony step the clock, rather than slew it gradually, during the first three updates whenever the offset exceeds 1 second (this corrects a large offset at startup).</p>
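<p>To confirm that a client is actually tracking <code>ntp.internal</code>, <code>chronyc tracking</code> reports the current offset on its <code>System time</code> line. The sketch below checks that offset against a limit; <code>offset_within_limit</code> is a hypothetical helper name, and it assumes the usual <code>System time : N seconds fast|slow of NTP time</code> output format.</p>

```shell
# offset_within_limit LIMIT_SECONDS
# Reads `chronyc tracking` output on stdin and succeeds when the
# "System time" offset is at most LIMIT_SECONDS. chronyc prints the offset
# as a non-negative number followed by "fast" or "slow".
# Hypothetical helper for illustration only.
offset_within_limit() {
  awk -v limit="$1" '
    /^System time/ { offset = $4; found = 1 }
    END { exit !(found && offset + 0 <= limit + 0) }
  '
}
```

<p>A monitoring job could run <code>chronyc tracking | offset_within_limit 0.01</code> and alert on a non-zero exit status; the 10 ms limit is only an example value.</p>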
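<h3 id="高精度需求">High-precision requirements</h3>
<p>If the environment has strict timing requirements (financial trading, industrial control systems), the NTP server needs a hardware time source: a GPS receiver or an atomic clock module. A GPS antenna needs no network connection, only a position with a view of the satellites (a rooftop or a window). chrony supports PPS (Pulse Per Second) input and can reach microsecond-level precision.</p>
<p>Most air-gapped environments do not need that precision: millisecond-level agreement (chrony's default behavior) is enough for log alignment and TLS validation.</p>
<h2 id="internal-ca內部憑證簽發">Internal CA: issuing certificates internally</h2>
<p>Every internal HTTPS service in an air-gapped environment needs a TLS certificate. Let&rsquo;s Encrypt has to reach the service over the internet to validate an ACME challenge, so it cannot be used here. The alternative is to run an internal CA (Certificate Authority) and issue certificates yourself.</p>
<h3 id="step-casmallstep">step-ca (Smallstep)</h3>
<p>step-ca is a lightweight CA server that supports the ACME protocol: internal services can request and renew certificates automatically with the same flow they would use with Let&rsquo;s Encrypt, except that the ACME server is step-ca on the internal network.</p>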





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Initialize the CA</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">step ca init --name<span class="o">=</span><span class="s2">&#34;Internal CA&#34;</span> --dns<span class="o">=</span><span class="s2">&#34;ca.internal&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --address<span class="o">=</span><span class="s2">&#34;:443&#34;</span> --provisioner<span class="o">=</span><span class="s2">&#34;admin&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># Start the CA server</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">step-ca <span class="k">$(</span>step path<span class="k">)</span>/config/ca.json</span></span></code></pre></div><p>Initialization generates key pairs for a root CA and an intermediate CA. The root CA's private key anchors the whole chain of trust, so it needs the highest level of protection (offline storage, access logging).</p>
<h3 id="憑證簽發流程">Certificate issuance flow</h3>
<p>A service requests a certificate from step-ca with the step CLI or an ACME client:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Request a certificate with the step CLI (manual)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">step ca certificate <span class="s2">&#34;gitlab.internal&#34;</span> gitlab.crt gitlab.key
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Renew automatically with step's renewal daemon (or use certbot over ACME)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">step ca renew --daemon gitlab.crt gitlab.key</span></span></code></pre></div><p>certbot also works with step-ca: change the ACME server URL from Let&rsquo;s Encrypt to <code>https://ca.internal/acme/acme/directory</code> (the second <code>acme</code> in the path is the name of the ACME provisioner configured in step-ca). A service that already has a certbot renewal script only needs a one-line configuration change.</p>
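<p>The certbot route can be sketched as follows. It assumes that an ACME provisioner named <code>acme</code> has already been added to step-ca (for example with <code>step ca provisioner add acme --type ACME</code>) and that the root certificate sits at the Debian/Ubuntu path from the Root CA distribution section; both function names are made up for illustration.</p>

```shell
# acme_directory_url CA_HOST PROVISIONER
# Builds the step-ca ACME directory URL for a given provisioner name.
# Hypothetical helper for illustration only.
acme_directory_url() {
  echo "https://$1/acme/$2/directory"
}

# request_cert_with_certbot DOMAIN
# Asks certbot for a certificate from the internal CA. REQUESTS_CA_BUNDLE
# tells certbot's HTTP client to trust the internal root certificate
# (path assumed; adjust to where the root certificate is installed).
request_cert_with_certbot() {
  REQUESTS_CA_BUNDLE=/usr/local/share/ca-certificates/internal-ca.crt \
    certbot certonly --standalone \
    --server "$(acme_directory_url ca.internal acme)" \
    -d "$1"
}
```

<p>Calling <code>request_cert_with_certbot gitlab.internal</code> on the GitLab host would then run the usual certbot flow against the internal CA. The <code>--standalone</code> mode needs port 80 on that host to be free and reachable from step-ca.</p>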
<h3 id="root-ca-分發">Root CA distribution</h3>
<p>Every machine and every service must trust the internal CA's root certificate:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Debian/Ubuntu</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">cp root_ca.crt /usr/local/share/ca-certificates/internal-ca.crt
</span></span><span class="line"><span class="ln">3</span><span class="cl">update-ca-certificates
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># RHEL/CentOS</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">cp root_ca.crt /etc/pki/ca-trust/source/anchors/internal-ca.crt
</span></span><span class="line"><span class="ln">7</span><span class="cl">update-ca-trust</span></span></code></pre></div><p>The Docker daemon must trust the internal CA as well (otherwise <code>docker pull harbor.internal/image</code> fails with a TLS error):</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p /etc/docker/certs.d/harbor.internal
</span></span><span class="line"><span class="ln">2</span><span class="cl">cp root_ca.crt /etc/docker/certs.d/harbor.internal/ca.crt
</span></span><span class="line"><span class="ln">3</span><span class="cl">systemctl restart docker</span></span></code></pre></div><p>Pushing the root CA to all machines in bulk with an Ansible playbook is the standard approach for initial deployment.</p>
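<h3 id="cfssl-作為替代">cfssl as an alternative</h3>
<p>cfssl (Cloudflare's PKI toolkit) is simpler than step-ca but has no ACME automation: every certificate is issued by hand. It suits small environments with only 5-10 services that do not need automatic renewal.</p>
<h2 id="secret-managementhashicorp-vault">Secret Management: HashiCorp Vault</h2>
<p>Secrets such as database passwords, API tokens, and TLS private keys need centralized, secure storage. An air-gapped environment cannot use AWS Secrets Manager or GCP Secret Manager; HashiCorp Vault is the most common self-hosted option.</p>
<h3 id="斷網環境的-vault-初始化">Vault initialization in an air-gapped environment</h3>
<p>In the cloud, Vault is usually unsealed automatically through AWS KMS or GCP Cloud KMS. An air-gapped environment has no cloud KMS, so it falls back to Shamir&rsquo;s Secret Sharing: initialization generates N unseal keys, and M of them are needed to unseal Vault each time it starts (a typical setup is 5 keys, any 3 of which can unseal).</p>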





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Initialize Vault (5 key shares, threshold 3)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">vault operator init -key-shares<span class="o">=</span><span class="m">5</span> -key-threshold<span class="o">=</span><span class="m">3</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Unseal (run 3 times, each time with a different key)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">vault operator unseal &lt;key-1&gt;
</span></span><span class="line"><span class="ln">6</span><span class="cl">vault operator unseal &lt;key-2&gt;
</span></span><span class="line"><span class="ln">7</span><span class="cl">vault operator unseal &lt;key-3&gt;</span></span></code></pre></div><p>Hand the 5 unseal keys to different people. No single person can unseal Vault alone, which is a deliberate security design. Vault must be unsealed again after every restart, so rehearse the process for storing and retrieving unseal keys in advance.</p>
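<p>Because a restarted Vault stays sealed until the key holders act, it helps to detect the sealed state early. The sketch below relies on the exit status of <code>vault status</code> (0 when unsealed, 2 when sealed, 1 on error); the function name <code>vault_seal_state</code> is an assumption for illustration.</p>

```shell
# vault_seal_state
# Maps the exit status of `vault status` to a word that a monitoring
# job can act on. Hypothetical helper for illustration only.
vault_seal_state() {
  vault status >/dev/null 2>&1 && rc=0 || rc=$?
  case "$rc" in
    0) echo "unsealed" ;;
    2) echo "sealed" ;;
    *) echo "error" ;;
  esac
}
```

<p>A cron job or a Prometheus exporter script could call <code>vault_seal_state</code> and page the unseal key holders whenever it prints <code>sealed</code>.</p>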
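<h3 id="機器身分認證">Machine identity authentication</h3>
<p>A service must authenticate itself before it reads a secret from Vault. Cloud environments use IAM roles; air-gapped environments use AppRole: each service receives a role_id and secret_id pair and exchanges it for a short-lived token.</p>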





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Create the AppRole</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">vault auth <span class="nb">enable</span> approle
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">vault write auth/approle/role/gitlab <span class="se">\
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="se"></span>  <span class="nv">token_ttl</span><span class="o">=</span>1h <span class="se">\
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="se"></span>  <span class="nv">token_max_ttl</span><span class="o">=</span>4h <span class="se">\
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="se"></span>  <span class="nv">policies</span><span class="o">=</span>gitlab-secrets
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"># The service obtains a token</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">vault write auth/approle/login <span class="se">\
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="se"></span>  <span class="nv">role_id</span><span class="o">=</span><span class="s2">&#34;</span><span class="nv">$ROLE_ID</span><span class="s2">&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="se"></span>  <span class="nv">secret_id</span><span class="o">=</span><span class="s2">&#34;</span><span class="nv">$SECRET_ID</span><span class="s2">&#34;</span></span></span></code></pre></div><p>The secret_id is itself a secret: on first deployment, the Vault admin hands it to the service manually or pushes it through an Ansible encrypted variable.</p>
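<p>The commands above use <code>$ROLE_ID</code> and <code>$SECRET_ID</code> without showing where they come from. A minimal sketch of the admin-side step, assuming the role is named <code>gitlab</code> as above; the two function names are hypothetical wrappers around the Vault CLI.</p>

```shell
# fetch_approle_credentials ROLE
# Run by a Vault admin: reads the role's fixed role_id and generates a
# fresh secret_id, storing both in shell variables.
# Hypothetical wrapper for illustration only.
fetch_approle_credentials() {
  ROLE_ID="$(vault read -field=role_id "auth/approle/role/$1/role-id")"
  SECRET_ID="$(vault write -f -field=secret_id "auth/approle/role/$1/secret-id")"
}

# approle_login ROLE_ID SECRET_ID
# Run by the service: exchanges the pair for a token and prints only the token.
approle_login() {
  vault write -field=token auth/approle/login role_id="$1" secret_id="$2"
}
```

<p>After <code>fetch_approle_credentials gitlab</code>, the two values are delivered to the service over separate channels where possible, so that intercepting one of them is not enough to log in.</p>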
<h3 id="儲存後端">Storage backend</h3>
<p>Vault needs a persistent storage backend. Cloud deployments use DynamoDB or Consul; air-gapped environments choose from:</p>
<table>
  <thead>
      <tr>
          <th>Backend</th>
          <th>Use case</th>
          <th>Characteristics</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Filesystem</td>
          <td>Single node, small scale</td>
          <td>Simplest, but no HA</td>
      </tr>
      <tr>
          <td>PostgreSQL</td>
          <td>Environments that already run PostgreSQL</td>
          <td>Reuses existing infrastructure</td>
      </tr>
      <tr>
          <td>Consul</td>
          <td>Environments that need HA</td>
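          <td>Vault + Consul is a long-established, officially supported HA pairing (newer Vault releases also offer Integrated Storage based on Raft, which needs no external service)</td>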
      </tr>
  </tbody>
</table>
<h2 id="部署順序的相互依賴">Deployment order and interdependencies</h2>
<p>The four services form a dependency chain:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">DNS → NTP → CA → Vault
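</span></span><span class="line"><span class="ln">2</span><span class="cl"> ↑_________________↓ (Vault's FQDN needs DNS resolution)</span></span></code></pre></div><p>DNS starts first (every other service resolves hostnames through it) → NTP follows (the CA needs accurate time when it issues certificates, otherwise the notBefore/notAfter values and their validation go wrong) → the CA comes next (Vault's HTTPS listener needs a TLS certificate) → Vault comes last (it depends on both DNS and TLS).</p>
<p>DNS and the CA have a circular dependency: the CA needs DNS resolution when it issues certificates (for ACME challenges or the SANs in a CSR), but should the DNS server itself use TLS? The way out is to start DNS in plaintext the first time, let the CA sign a certificate for DNS once it is up, and then switch to DNS-over-TLS. Most internal networks can simply keep DNS in plaintext: unencrypted DNS queries on an internal network are common practice, and the risk is manageable.</p>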
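<p>The startup order can be turned into a preflight script that stops at the first broken link in the chain. This is a sketch under assumptions: the function names are invented, <code>ca.internal</code> is taken from the <code>step ca init</code> example, and the CA check assumes step-ca's <code>/health</code> endpoint plus a system trust store that already contains the internal root certificate.</p>

```shell
# run_in_order CHECK...
# Runs each check function in the given order and stops at the first failure.
# Hypothetical helper for illustration only.
run_in_order() {
  for check in "$@"; do
    if "$check"; then
      echo "ok: $check"
    else
      echo "failed: $check"
      return 1
    fi
  done
}

# One check per service, in dependency order.
check_dns()   { getent hosts vault.internal >/dev/null; }
check_ntp()   { chronyc tracking >/dev/null 2>&1; }
check_ca()    { curl -fsS https://ca.internal/health >/dev/null; }
check_vault() { vault status >/dev/null 2>&1; }
```

<p>Running <code>run_in_order check_dns check_ntp check_ca check_vault</code> after a site-wide restart shows which layer to fix first. A sealed Vault counts as a failure here, which is usually the desired signal.</p>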
<h2 id="時程與維護">Timeline and maintenance</h2>
<table>
  <thead>
      <tr>
          <th>Service</th>
          <th>Initial deployment</th>
          <th>Ongoing maintenance</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>CoreDNS</td>
          <td>2-4 hours</td>
          <td>Add a zone record when a service is added (minutes)</td>
      </tr>
      <tr>
          <td>chrony</td>
          <td>1-2 hours</td>
          <td>Almost none (drift compensation runs automatically)</td>
      </tr>
      <tr>
          <td>step-ca</td>
          <td>3-4 hours</td>
          <td>Expiry monitoring and renewal (close to zero once automated)</td>
      </tr>
      <tr>
          <td>Vault</td>
          <td>4-8 hours</td>
          <td>Unseal key management, policy updates, backups</td>
      </tr>
  </tbody>
</table>
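<p>Initial deployment of all four services takes about 1.5-2 working days in total. Afterwards, day-to-day maintenance concentrates on Vault (unseal key management and policy upkeep) and DNS zone updates. Certificate renewal is close to zero-maintenance if it is automated through ACME.</p>
<p>A framing for communicating with management: "These four services are the foundation for everything else. Without them, other services cannot find each other (DNS), disagree about the time (NTP), communicate unencrypted (CA), or keep passwords in config files (Vault). Deploy them once, and they run almost automatically afterwards."</p>
<h2 id="跨分類引用">Cross-category references</h2>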
<ul>
<li>→ <a href="/blog/infra/air-gapped/air-gapped-principles/" data-link-title="斷網環境的通用原則" data-link-desc="離線套件管理、內容搬運、變更追蹤的共通操作模式 — 所有斷網情境都要先建立的基礎能力">General principles for air-gapped environments</a>: common operating patterns for content ferry and offline package management</li>
<li>→ <a href="/blog/infra/air-gapped/air-gapped-iac/" data-link-title="斷網環境的 IaC" data-link-desc="Terraform provider mirror、離線 plugin cache、本地 state backend、沒有雲端時的 plan/apply 流程與內網 CI">IaC in air-gapped environments</a>: Vault as the secret backend for Terraform</li>
<li>→ <a href="/blog/infra/air-gapped/air-gapped-container/" data-link-title="斷網環境的容器與映像管理" data-link-desc="Private registry 架設、映像搬運（docker save/load、skopeo）、base image 更新週期、離線漏洞掃描">Containers and image management in air-gapped environments</a>: Harbor depends on DNS and TLS, and image pulls require trusting the internal CA</li>
<li>→ <a href="/blog/infra/02-identity-credentials/" data-link-title="模組二：身分與憑證地基 — IAM 與 OIDC" data-link-desc="IAM role / policy 設計、最小權限，以及用 OIDC 短期憑證取代長期 access key">Module 2: Identity and credential foundations</a>: Vault plays the role that Secrets Manager plays in the cloud</li>
<li>→ <a href="/blog/infra/08-governance-habits/" data-link-title="模組八：治理好習慣 — 規模長大後不失控的最小節奏" data-link-desc="tagging 規範、secrets 不進 code、成本可見性、最小可行節奏，規模長大後不失控">Module 8: Good governance habits</a>: the "no secrets in code" principle is put into practice with Vault in air-gapped environments</li>
</ul>
]]></content:encoded></item><item><title>HashiCorp Vault Dynamic Credentials: The Implementation Layer of Lease Governance and Application Integration</title><link>https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/hashicorp-vault/dynamic-credential/</link><pubDate>Mon, 18 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/hashicorp-vault/dynamic-credential/</guid><description>&lt;blockquote>
&lt;p>This article is an implementation-layer deep dive that accompanies the &lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault&lt;/a> overview. The overview covers where Vault sits in the secrets / credentials governance spectrum (the trade-offs against cloud-native secrets managers and cert-manager). This article focuses on the implementation layer of the &lt;em>dynamic credential engine&lt;/em>: how to configure the database engine, how an application renews its lease, which pitfalls show up in production, and how to integrate with cloud-native vaults and the vault-agent injector.&lt;/p>&lt;/blockquote>
&lt;h2 id="問題情境">Problem context&lt;/h2>
&lt;p>Writing a long-lived database credential into application config is the most common secret hygiene failure in production: once the credential leaks, rotating it costs &lt;em>cross-team coordination plus synchronized restarts of multiple services&lt;/em>, so in practice it is rotated about once every six months, and it leaves traces in git history, logs, and dump files. The core promise of dynamic credentials is that &lt;em>the credential lifecycle is aligned with the application session&lt;/em>: revoke it when done, and the exposure window shrinks from months to minutes.&lt;/p>
&lt;p>But dynamic credentials are not a matter of "just swap the SDK": they turn &lt;em>credential governance&lt;/em> from a secret rotation problem into a &lt;em>lease lifecycle&lt;/em> problem. How long the lease TTL should be, how renewal runs, whether user creation on the DB side will hit max_connections, how the application degrades when Vault is sealed: each is a production-grade concern, and none of them can go live on the defaults from the vendor docs.&lt;/p>
&lt;h2 id="核心概念lease-lifecycle-跟-secrets-engine-模型">Core concepts: the lease lifecycle and the secrets engine model&lt;/h2>
&lt;p>Vault dynamic credentials are produced by three cooperating components:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Component&lt;/th>
 &lt;th>Responsibility&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;strong>Secrets engine&lt;/strong>&lt;/td>
 &lt;td>Creates and revokes credentials on the backend; each engine maps to one kind of datastore (database / aws / ssh)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>Role&lt;/strong>&lt;/td>
 &lt;td>Template for creating a credential: DB connection + creation SQL + default / max TTL (the connection's allowed_roles list decides which roles may use it)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>Lease&lt;/strong>&lt;/td>
 &lt;td>Every issued credential maps to a lease ID; Vault manages its TTL, renewal, and revocation&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Compared with static secrets (the K/V store), the key difference is that a dynamic &lt;em>credential is generated only when it is read&lt;/em>, and Vault tracks every outstanding lease; the application must &lt;em>renew actively&lt;/em> or accept that the credential expires.&lt;/p>
&lt;p>A lease has two TTLs:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>default_ttl&lt;/strong>: the credential's initial lifetime; it expires if the application does not renew&lt;/li>
&lt;li>&lt;strong>max_ttl&lt;/strong>: the credential's maximum lifetime; no number of renewals can extend it beyond this&lt;/li>
&lt;/ul>
&lt;p>A common practical configuration: &lt;code>default_ttl: 1h&lt;/code> + &lt;code>max_ttl: 24h&lt;/code>. The application renews every 30-45 minutes, and after at most 24 hours the credential must be replaced with a new one.&lt;/p></description><content:encoded><![CDATA[<blockquote>
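<p>This article is an implementation-layer deep dive that accompanies the <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault</a> overview. The overview covers where Vault sits in the secrets / credentials governance spectrum (the trade-offs against cloud-native secrets managers and cert-manager). This article focuses on the implementation layer of the <em>dynamic credential engine</em>: how to configure the database engine, how an application renews its lease, which pitfalls show up in production, and how to integrate with cloud-native vaults and the vault-agent injector.</p></blockquote>
<h2 id="問題情境">Problem context</h2>
<p>Writing a long-lived database credential into application config is the most common secret hygiene failure in production: once the credential leaks, rotating it costs <em>cross-team coordination plus synchronized restarts of multiple services</em>, so in practice it is rotated about once every six months, and it leaves traces in git history, logs, and dump files. The core promise of dynamic credentials is that <em>the credential lifecycle is aligned with the application session</em>: revoke it when done, and the exposure window shrinks from months to minutes.</p>
<p>But dynamic credentials are not a matter of "just swap the SDK": they turn <em>credential governance</em> from a secret rotation problem into a <em>lease lifecycle</em> problem. How long the lease TTL should be, how renewal runs, whether user creation on the DB side will hit max_connections, how the application degrades when Vault is sealed: each is a production-grade concern, and none of them can go live on the defaults from the vendor docs.</p>
<h2 id="核心概念lease-lifecycle-跟-secrets-engine-模型">Core concepts: the lease lifecycle and the secrets engine model</h2>
<p>Vault dynamic credentials are produced by three cooperating components:</p>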
<table>
  <thead>
      <tr>
          <th>Component</th>
          <th>Responsibility</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>Secrets engine</strong></td>
          <td>Creates and revokes credentials on the backend; each engine maps to one kind of datastore (database / aws / ssh)</td>
      </tr>
      <tr>
          <td><strong>Role</strong></td>
          <td>Template for creating a credential: DB connection + creation SQL + default / max TTL (the connection's allowed_roles list decides which roles may use it)</td>
      </tr>
      <tr>
          <td><strong>Lease</strong></td>
          <td>Every issued credential maps to a lease ID; Vault manages its TTL, renewal, and revocation</td>
      </tr>
  </tbody>
</table>
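<p>Compared with static secrets (the K/V store), the key difference is that a dynamic <em>credential is generated only when it is read</em>, and Vault tracks every outstanding lease; the application must <em>renew actively</em> or accept that the credential expires.</p>
<p>A lease has two TTLs:</p>
<ul>
<li><strong>default_ttl</strong>: the credential's initial lifetime; it expires if the application does not renew</li>
<li><strong>max_ttl</strong>: the credential's maximum lifetime; no number of renewals can extend it beyond this</li>
</ul>
<p>A common practical configuration: <code>default_ttl: 1h</code> + <code>max_ttl: 24h</code>. The application renews every 30-45 minutes, and after at most 24 hours the credential must be replaced with a new one.</p>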
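<p>The interaction between the two TTLs can be worked through with a little arithmetic. The sketch below is a simplified model, not Vault's implementation: renewing at two thirds of the TTL is an assumed rule of thumb that lands inside the 30-45 minute window, and both function names are hypothetical.</p>

```shell
# renew_after TTL_SECONDS
# Seconds to wait before renewing: two thirds of the current TTL
# (an assumed rule of thumb, not a Vault setting).
renew_after() {
  echo $(( $1 * 2 / 3 ))
}

# capped_increment AGE REQUESTED MAX_TTL   (all in seconds)
# Models how a renewal is capped: a lease can never be extended past
# max_ttl, measured from the moment the credential was issued.
capped_increment() {
  left=$(( $3 - $1 ))
  if [ "$left" -le 0 ]; then
    echo 0                  # past max_ttl: request a new credential instead
  elif [ "$2" -lt "$left" ]; then
    echo "$2"               # the full increment fits
  else
    echo "$left"            # only the remainder up to max_ttl is granted
  fi
}

renew_after 3600                    # prints 2400 (renew after 40 minutes)
capped_increment 84000 3600 86400   # prints 2400 (only 40 minutes remain)
```

<p>Once the granted increment is shorter than the requested one, the application should treat that as the signal to fetch a fresh credential and drain connections that use the old one. The renewal itself would be issued with <code>vault lease renew</code> or the equivalent API call.</p>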
<h2 id="step-by-step-配置">Step-by-step configuration</h2>
<h3 id="vault-server-啟用-database-secrets-engine">Enable the database secrets engine on the Vault server</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 1. enable secrets engine</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">vault secrets <span class="nb">enable</span> -path<span class="o">=</span>database database
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># 2. Configure the PostgreSQL connection</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">vault write database/config/myapp-prod <span class="se">\
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="se"></span>  <span class="nv">plugin_name</span><span class="o">=</span>postgresql-database-plugin <span class="se">\
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="se"></span>  <span class="nv">allowed_roles</span><span class="o">=</span><span class="s2">&#34;myapp-reader,myapp-writer&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="se"></span>  <span class="nv">connection_url</span><span class="o">=</span><span class="s2">&#34;postgresql://{{username}}:{{password}}@db.internal:5432/myapp?sslmode=require&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="se"></span>  <span class="nv">username</span><span class="o">=</span><span class="s2">&#34;vault_root&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="se"></span>  <span class="nv">password</span><span class="o">=</span><span class="s2">&#34;&lt;vault_root_pw&gt;&#34;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># 3. Create the role</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">vault write database/roles/myapp-reader <span class="se">\
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="se"></span>  <span class="nv">db_name</span><span class="o">=</span>myapp-prod <span class="se">\
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="se"></span>  <span class="nv">creation_statements</span><span class="o">=</span><span class="s2">&#34;CREATE ROLE \&#34;{{name}}\&#34; WITH LOGIN PASSWORD &#39;{{password}}&#39; VALID UNTIL &#39;{{expiration}}&#39;; \
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="s2">                       GRANT SELECT ON ALL TABLES IN SCHEMA public TO \&#34;{{name}}\&#34;;&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="se"></span>  <span class="nv">default_ttl</span><span class="o">=</span><span class="s2">&#34;1h&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="se"></span>  <span class="nv">max_ttl</span><span class="o">=</span><span class="s2">&#34;24h&#34;</span></span></span></code></pre></div><p>Key points: <code>vault_root</code> is the <em>bootstrapping account</em> Vault uses to create other users; it needs <code>CREATEROLE</code> but not SUPERUSER. The creation_statements must include <code>VALID UNTIL '{{expiration}}'</code>; otherwise the DB user never expires on its own, and a failed Vault revoke leaves a zombie account behind.</p>
<h3 id="application-取得-credential">Application obtains a credential</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Read a dynamic credential (every read creates a new user)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">vault <span class="nb">read</span> database/creds/myapp-reader
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># Key                Value</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># lease_id           database/creds/myapp-reader/abc123</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># lease_duration     1h</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"># username           v-myapp-reader-x7y8z9-1747512345</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"># password           A1b2C3d4E5f6...</span></span></span></code></pre></div><p>The application takes three things from the response: <code>lease_id</code> (for renew / revoke), <code>username</code> + <code>password</code> (for the DB connection), and <code>lease_duration</code> (to decide when to renew).</p>
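<p>As a minimal sketch of that extraction step: the function below pulls the fields out of a dictionary shaped like the JSON body Vault returns for <code>database/creds/&lt;role&gt;</code>. The <code>DbLease</code> type, the <code>parse_db_creds</code> name, and the sample values are illustrative assumptions, not part of any SDK.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbLease:
    """Hypothetical container for the fields an application keeps."""
    lease_id: str        # used later for renew / revoke
    lease_duration: int  # seconds, as returned by Vault
    username: str
    password: str


def parse_db_creds(response):
    """Extract lease metadata and DB credentials from a creds response dict."""
    data = response["data"]
    return DbLease(
        lease_id=response["lease_id"],
        lease_duration=int(response["lease_duration"]),
        username=data["username"],
        password=data["password"],
    )


# Sample payload; every value here is made up for illustration.
sample = {
    "lease_id": "database/creds/myapp-reader/abc123",
    "lease_duration": 3600,
    "renewable": True,
    "data": {"username": "v-myapp-reader-x7y8z9-1747512345", "password": "example-only"},
}
lease = parse_db_creds(sample)
print(lease.username, lease.lease_duration)
```

<p>Keeping <code>lease_id</code> next to the credentials makes the later renew and revoke steps possible without another lookup.</p>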
<h3 id="renew-lease">Renew lease</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Renew before the lease expires (recommended at 50-70% of the TTL)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">vault lease renew database/creds/myapp-reader/abc123
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># Key                Value</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># lease_id           database/creds/myapp-reader/abc123</span>
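</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># lease_duration     1h    # reset to default_ttl after renew</span></span></span></code></pre></div><p>After a renew, <code>lease_duration</code> is <em>reset to default_ttl</em>, counted from the time of the renewal, but the lease as a whole <em>cannot outlive max_ttl</em>. Example: with default 1h / max 24h, renewals made during the first 23 hours each return a full hour; a renewal after that returns only the time remaining until the 24-hour mark, and once max_ttl is reached the lease can no longer be renewed, so the application must fetch a new credential.</p>
<h3 id="revoke-leaseapplication-shutdown-時">Revoke lease (at application shutdown)</h3>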
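<p>The interaction between default_ttl and max_ttl can be sketched as a small model. This is a simplification that assumes each renewal asks for default_ttl and that the grant is capped by the time left before max_ttl; <code>renewed_ttl</code> is a hypothetical helper name, not a Vault API.</p>

```python
def renewed_ttl(elapsed_s, default_ttl_s, max_ttl_s):
    """TTL granted by a renewal made elapsed_s seconds after issuance.

    Simplified model: the renewal asks for default_ttl_s and the grant is
    capped so the lease never outlives max_ttl_s in total.
    """
    remaining = max_ttl_s - elapsed_s
    return max(0, min(default_ttl_s, remaining))


HOUR = 3600
# Renewals at various points in the lease's life, default 1h / max 24h.
for hours in (1, 12, 23, 23.5, 24):
    print(hours, renewed_ttl(int(hours * HOUR), 1 * HOUR, 24 * HOUR))
```

<p>A shrinking grant is the signal that the lease is close to max_ttl and that a fresh credential should be requested soon.</p>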





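<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Revoke proactively during graceful shutdown</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">vault lease revoke database/creds/myapp-reader/abc123</span></span></code></pre></div><p>Revoking at application exit is the <em>last gate of credential hygiene</em>: even if the lease has time left, an explicit revoke removes the DB user immediately, so a credential later dug out of a crash dump or log is already useless.</p>
<h2 id="故障演練--邊界-case">Failure drills / edge cases</h2>
<h3 id="case-1lease-renewal-racecredential-中途失效">Case 1: Lease renewal race, credential fails mid-flight</h3>
<p><strong>Symptom</strong>: the application log suddenly shows <code>FATAL: role &quot;v-myapp-reader-x7y8z9-...&quot; does not exist</code>, at a time close to the top or the half of an hour.</p>
<p><strong>Root cause</strong>: the application derives its renew time from lease_duration but anchors it to <em>its own system time</em> rather than <em>the moment the lease was issued</em>. The application starts counting 30 seconds after the lease was issued, the renew fires 5 seconds after the lease expired, and by then Vault has revoked the credential and the DB user has been dropped.</p>
<p><strong>Fix</strong>: derive the renew time from <em>the lease_duration returned by the server</em>, counted from the moment the response arrives, and leave <em>a buffer of about 30%</em>. Example: for a lease_duration of 3600 seconds, start renewing at 2400-2520 seconds (67-70%) rather than stretching to 3500 seconds. The Go SDK ships a LifetimeWatcher helper for this; where an SDK provides such a helper, prefer it over a hand-rolled ticker. Python's hvac exposes the renewal call itself (<code>sys.renew_lease</code>) but, as far as I know, no equivalent scheduling helper, so the timing logic has to live in the application.</p>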
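<p>For applications that do have to manage the timing themselves, a minimal sketch of the idea follows. It assumes the timer is created as soon as the credential response arrives and uses a monotonic clock so wall-clock adjustments cannot shift the deadline; <code>LeaseTimer</code> and <code>renew_fraction</code> are hypothetical names.</p>

```python
import time


class LeaseTimer:
    """Tracks when a lease should be renewed, relative to when it was received."""

    def __init__(self, lease_duration_s, renew_fraction=0.67, clock=time.monotonic):
        self._clock = clock
        # Capture the reference point immediately, not at process start.
        self._received_at = clock()
        self.renew_after_s = lease_duration_s * renew_fraction

    def seconds_until_renew(self):
        """Seconds left before the renew should start; 0.0 once it is due."""
        elapsed = self._clock() - self._received_at
        return max(0.0, self.renew_after_s - elapsed)

    def should_renew(self):
        return self.seconds_until_renew() == 0.0
```

<p>The injectable <code>clock</code> parameter exists only to make the behaviour testable without waiting; production code would keep the default.</p>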
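<h3 id="case-2db-max_connections-撞牆">Case 2: hitting the DB max_connections wall</h3>
<p><strong>Symptom</strong>: at peak traffic the application starts throwing large numbers of <code>FATAL: sorry, too many clients already</code> (the error PostgreSQL raises when <code>max_connections</code> is exhausted), the Vault audit log shows new credentials still being issued, and PostgreSQL's <code>pg_stat_activity</code> shows hundreds of <code>v-myapp-...</code> users connected at once.</p>
<p><strong>Root cause</strong>: every application instance / pod reads one credential at startup with a 1h lease, but <em>the application restarts after only 30 minutes</em> (K8s rolling update / OOM). The old user's connections are still open on the PostgreSQL side (the connection pool never released them) while a new user is created, and the total climbs to max_connections.</p>
<p><strong>Fix</strong>: two layers</p>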
<ol>
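<li>On graceful shutdown, drain the connection pool and run <code>vault lease revoke</code></li>
<li>Set a maximum connection lifetime on the connection pool (for example HikariCP's <code>maxLifetime</code> or PgBouncer's <code>server_lifetime</code>) aligned with the application instance lifetime, so a leaked connection is not still held after the lease expires</li>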
</ol>
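<p>The shutdown ordering in the first layer can be sketched as below. The <code>pool</code> object and the <code>revoke_lease</code> callable are assumptions standing in for whatever pool library and Vault client the application uses (with hvac, the callable could wrap <code>client.sys.revoke_lease</code>); only the ordering and the error handling are the point.</p>

```python
def shutdown(pool, revoke_lease, lease_id, log=print):
    """Drain DB connections first, then revoke the lease that created the DB user."""
    try:
        pool.close()  # hypothetical pool API: closes idle and in-use connections
    finally:
        # Revoke even if draining failed, so the DB user does not linger until TTL expiry.
        try:
            revoke_lease(lease_id)
        except Exception as exc:  # Vault may be unreachable during shutdown
            log(f"lease revoke failed, relying on TTL expiry: {exc}")
```

<p>In a long-running service this function would typically be wired to the SIGTERM handler so that K8s rolling updates trigger it before the pod exits.</p>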
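<h3 id="case-3vault-sealed-中existing-lease-仍可用但新-lease-拿不到">Case 3: Vault is sealed; existing leases still work but new leases cannot be issued</h3>
<p><strong>Symptom</strong>: when deploying a new version, new pods fail to start and <code>vault read database/creds/...</code> hangs or returns <code>Vault is sealed</code>, yet <em>old pods keep running normally</em> (they already hold a lease).</p>
<p><strong>Root cause</strong>: while Vault is sealed (the master key is wrapped, and unseal keys are needed to unlock it), an existing lease keeps working because <em>the credential already exists on the DB side</em> and the application's connection does not involve Vault. <em>Creating a new lease</em> and <em>renewing an existing one</em> both need Vault, though. During the sealed period the application keeps running but cannot scale out or renew.</p>
<p><strong>Fix</strong>:</p>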
<ol>
<li>Vault HA cluster + auto-unseal (KMS / HSM) to avoid a manual unseal chain</li>
<li>Add retry-with-backoff in the application so a briefly unavailable Vault does not crash it immediately</li>
<li>Use longer leases (default 4h, max 48h) to leave time for the unseal procedure</li>
</ol>
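<p>A minimal sketch of the retry-with-backoff item, under stated assumptions: <code>fetch</code> is any callable that requests a credential, and <code>ConnectionError</code> stands in for whatever exception the chosen Vault client raises when Vault is sealed or unreachable. The function name and parameters are hypothetical.</p>

```python
import random
import time


def fetch_with_backoff(fetch, attempts=6, base_s=1.0, cap_s=60.0,
                       sleep=time.sleep, rng=random.random):
    """Call fetch() until it succeeds, using capped exponential backoff with jitter."""
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide how to degrade
            delay = min(cap_s, base_s * (2 ** attempt))
            # Jitter spreads retries from many pods: sleep 50-100% of the delay.
            sleep(delay * (0.5 + rng() / 2))
```

<p>The jitter matters during a rollout: without it, every new pod would retry against a freshly unsealed Vault at the same instant.</p>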
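<h3 id="case-4application-vault-token-expirelease-orphan">Case 4: application Vault token expires, lease orphaned</h3>
<p><strong>Symptom</strong>: after running continuously for 1-2 weeks, the application suddenly gets <code>Permission denied</code> on <code>vault lease renew</code>; the credential then stops working once its lease runs out, and the application does not notice.</p>
<p><strong>Root cause</strong>: the application's Vault token (not the DB credential's lease) has a TTL of its own. Once the token expires the application can no longer renew the lease, and it may not yet have reached the point in its loop where it <em>fetches a new token</em>. The lease is effectively orphaned: nobody can renew it, and it is revoked when its TTL runs out or, since Vault ties leases to the token that created them, as soon as the token itself expires.</p>
<p><strong>Fix</strong>:</p>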
<ol>
<li>Use the vault-agent injector / sidecar pattern: the sidecar maintains the token + lease, and the application only reads a file</li>
<li>Without a sidecar, give the application a <em>renewable token</em> and manage it in the same lifecycle as the lease</li>
<li>Include both the AppRole auth method's secret_id and the token TTL in the application's reload flow</li>
</ol>
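<h3 id="case-5circleci-2023-incident-對照--secret_id-scope-過寬">Case 5: <a href="/blog/backend/07-security-data-protection/cases/" data-link-title="模組七案例正文" data-link-desc="資安控制面與控制平面轉換案例入口。">CircleCI 2023 incident</a> comparison: secret_id scope too broad</h3>
<p><strong>Symptom</strong>: in the January 2023 CircleCI incident, an attacker obtained a session token from an engineer's endpoint and used it to reach production systems and secrets. Mapped onto a Vault deployment, the equivalent scenario is an attacker obtaining an AppRole secret_id whose policy allows <em>cross-environment, cross-database read</em>, and using it to pull large numbers of dynamic credentials.</p>
<p><strong>Root cause</strong>: the AppRole's policy scope is set up as <em>a single AppRole serving every environment</em> rather than <em>per-environment AppRoles</em>; a leaked secret_id then amounts to company-wide authority to issue dynamic credentials.</p>
<p><strong>Fix</strong>:</p>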
<ol>
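<li>Per-environment AppRole: dev / staging / prod each get their own AppRole + secret_id, with a policy that allows only that environment's database engine path</li>
<li>Shorten the secret_id TTL (&lt; 24h), deliver it with <em>response wrapping</em>, and unwrap it immediately on receipt, which reduces the traces a secret_id leaves in build pipeline logs</li>
<li>Ship the Vault audit log to a SIEM and alert immediately on <code>approle/login</code> from an unusual location / IP</li>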
</ol>
<h2 id="容量規劃">Capacity planning</h2>
<p>Capacity design for dynamic credentials revolves around the <em>lease churn rate</em>: how many leases are created, renewed, and revoked per second.</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>How to estimate</th>
          <th>Warning threshold</th>
      </tr>
  </thead>
  <tbody>
      <tr>
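          <td>New leases / s</td>
          <td><code>application instances × (1 / lease_duration)</code></td>
          <td>Rough rule of thumb: ~50/s on a single Vault node, ~200/s on an HA cluster; benchmark your own storage backend before relying on either number</td>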
      </tr>
      <tr>
          <td>Renew / s</td>
          <td><code>outstanding lease × renew_freq</code></td>
          <td>Count a renew at roughly the same cost as a read</td>
      </tr>
      <tr>
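          <td>DB-side user count</td>
          <td><code>peak outstanding lease</code></td>
          <td>Must stay below any role limit the DB imposes (PostgreSQL has no <code>max_roles</code> setting, so there watch revocation lag instead)</td>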
      </tr>
      <tr>
          <td>DB connection count</td>
          <td><code>peak outstanding lease × avg connection per credential</code></td>
          <td>Must stay below DB max_connections</td>
      </tr>
      <tr>
          <td>Vault audit log size</td>
          <td>~500 bytes per lease operation: <code>(new + renew + revoke) × 500B</code></td>
          <td>100 lease ops/s → ~50 KB/s of audit log (about 4.3 GB/day); size the SIEM side accordingly</td>
      </tr>
  </tbody>
</table>
<p>Worked sizing example: 100 application pods, lease_duration 1h, renew at 50% of the TTL:</p>
<ul>
<li>New leases: 100 / 3600 ≈ 0.03/s (an upper bound; in practice new leases appear only when pods restart)</li>
<li>Renew: 100 / 1800 ≈ 0.06/s</li>
<li>Outstanding leases: ~100 (one per pod)</li>
<li>DB users: ~100 (peak ~150 including the grace period)</li>
<li>DB connections: 100 × 5 (pool size) = 500, which calls for PostgreSQL <code>max_connections &gt;= 600</code></li>
</ul>
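<p>The arithmetic above can be reproduced with a small calculator. This is a back-of-envelope sketch that assumes one lease per pod and the ~500 bytes per audited operation used in the table; the function and parameter names are hypothetical.</p>

```python
def size_dynamic_creds(pods, lease_duration_s, renew_fraction, pool_size,
                       audit_bytes_per_op=500):
    """Steady-state estimates for one lease per pod."""
    # Upper bound: assumes every lease is reissued once per lease_duration.
    new_leases_per_s = pods / lease_duration_s
    # Each pod renews once every lease_duration * renew_fraction seconds.
    renews_per_s = pods / (lease_duration_s * renew_fraction)
    ops_per_s = new_leases_per_s + renews_per_s
    return {
        "new_leases_per_s": new_leases_per_s,
        "renews_per_s": renews_per_s,
        "db_connections": pods * pool_size,
        "audit_bytes_per_s": ops_per_s * audit_bytes_per_op,
    }


estimate = size_dynamic_creds(pods=100, lease_duration_s=3600,
                              renew_fraction=0.5, pool_size=5)
for key, value in estimate.items():
    print(key, round(value, 3))
```

<p>Rerunning it with the real pod count and pool size is a quick way to check <code>max_connections</code> headroom before a rollout.</p>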
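<p>Beyond the capacity of a single Vault node (~50 ops/s), move to a Vault HA cluster + auto-unseal, or split by namespace. Keep in mind that in open-source Vault the standby nodes of an HA cluster forward requests to the active node, so HA mainly adds availability; confirm any throughput gain with a load test.</p>
<h2 id="整合--下一步">Integration / next steps</h2>
<h3 id="vault-agent-injectork8s-環境推薦">vault-agent injector (recommended on K8s)</h3>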





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c"># pod annotation</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">  </span><span class="nt">annotations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">    </span><span class="nt">vault.hashicorp.com/agent-inject</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;true&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="nt">vault.hashicorp.com/role</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;myapp-reader&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">    </span><span class="nt">vault.hashicorp.com/agent-inject-secret-db-creds</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;database/creds/myapp-reader&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">    </span><span class="nt">vault.hashicorp.com/agent-inject-template-db-creds</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="sd">      {{- with secret &#34;database/creds/myapp-reader&#34; -}}
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="sd">      DB_USER={{ .Data.username }}
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="sd">      DB_PASSWORD={{ .Data.password }}
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="sd">      {{- end }}</span></span></span></code></pre></div><p>The sidecar renews the lease automatically and writes the credential into a volume shared within the pod; the application reads a file. Application code needs no Vault SDK, which reduces dependencies. Note that the <code>vault.hashicorp.com/role</code> annotation names the Vault Kubernetes auth role, not the database role; they are separate objects even though this example uses the same name for both.</p>
<h3 id="sdk-pattern非-k8s-環境">SDK pattern (non-K8s environments)</h3>
<p>Go: <code>hashicorp/vault/api</code> + <code>LifetimeWatcher</code>; Java: spring-cloud-vault; Python: hvac (which exposes the renew calls but, as far as I know, leaves scheduling to the caller). Where the SDK already handles renew timing / retry / token rotation, do not write your own ticker.</p>
<h3 id="跟-cloud-native-secret-manager-的混搭">Mixing with cloud-native secret managers</h3>
<p><a href="/blog/backend/07-security-data-protection/vendors/aws-secrets-manager/" data-link-title="AWS Secrets Manager" data-link-desc="AWS 原生 secret store &#43; 內建 RDS / Redshift rotation Lambda、Resource Policy 跨帳號共享、KMS 加密">AWS Secrets Manager</a> / <a href="/blog/backend/07-security-data-protection/vendors/google-secret-manager/" data-link-title="Google Secret Manager" data-link-desc="GCP 原生 secret store、CMEK &#43; Workload Identity Federation 整合、rotation 走自寫 Cloud Function 而非 built-in Lambda">Google Secret Manager</a> also offer credential rotation (for example every 30 days), but <em>the cadence is time-based</em>, not <em>per application session</em>. A mixed pattern:</p>
<ul>
<li>Cloud-native: infrastructure-level credentials (RDS master / k8s service account), long TTL (30-90 days)</li>
<li>Vault dynamic: application-level credentials, short TTL (1-24 hours)</li>
<li>Store the Vault root credential in the cloud-native secret manager, and use cloud KMS for Vault auto-unseal as well</li>
</ul>
<h3 id="下一步議題">Next-step topics</h3>
<ul>
<li><strong>Database snapshots conflict with dynamic credentials</strong>: PostgreSQL <code>pg_dump</code> runs with a long-lived credential, so dynamic credentials do not fit; give the snapshot user a static credential + scoped policy, separate from application users</li>
<li><strong>Dynamic credential support on the connection pool side</strong>: <a href="/blog/backend/01-database/vendors/postgresql/pgbouncer-config/" data-link-title="PostgreSQL pgBouncer 配置 &#43; 連線池治理" data-link-desc="pgBouncer transaction pooling 配置、跟 application connection pool 的分層、production 故障演練（pool exhaustion / stale connection / DNS failover）跟容量規劃">PgBouncer</a> does not support per-connection credential rotation, so the whole connection lifecycle has to be aligned with the lease</li>
<li><strong>Multi-region Vault replication</strong>: performance replication and disaster recovery replication handle leases differently, and a cross-region application should stay sticky to the Vault primary in one region</li>
</ul>
<h2 id="相關連結">Related links</h2>
<ul>
<li>Upstream vendor page: <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault</a></li>
<li>Comparison case: <a href="/blog/backend/07-security-data-protection/cases/failure-credential-rotation-without-scope/" data-link-title="7.C9 反例：憑證輪替未分 Scope" data-link-desc="憑證輪替若未分域分批，容易造成跨系統連鎖中斷。">Failure: Credential Rotation Without Scope</a></li>
<li>Comparison case: <a href="/blog/backend/07-security-data-protection/cases/" data-link-title="模組七案例正文" data-link-desc="資安控制面與控制平面轉換案例入口。">CircleCI 2023 AppRole incident</a> (cross-vendor mapping)</li>
<li>Upstream chapter: <a href="/blog/backend/07-security-data-protection/secrets-and-machine-credential-governance/" data-link-title="7.6 秘密管理與機器憑證治理" data-link-desc="以問題驅動方式整理 secret、token、key 與機器身份治理">7.6 Secret management and machine credential governance</a></li>
<li>Parallel deep article: <a href="/blog/backend/01-database/vendors/postgresql/pgbouncer-config/" data-link-title="PostgreSQL pgBouncer 配置 &#43; 連線池治理" data-link-desc="pgBouncer transaction pooling 配置、跟 application connection pool 的分層、production 故障演練（pool exhaustion / stale connection / DNS failover）跟容量規劃">pgBouncer configuration</a></li>
<li>Methodology: <a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">Writing methodology for vendor deep-dive technical articles</a></li>
</ul>
]]></content:encoded></item><item><title>Vault → AWS Secrets Manager: a "secret" is not a "secret"; the identity model is the core difference</title><link>https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/hashicorp-vault/migrate-to-aws-secrets-manager/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/hashicorp-vault/migrate-to-aws-secrets-manager/</guid><description>&lt;blockquote>
&lt;p>This article is a cross-vendor migration playbook that cross-links &lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault&lt;/a> and &lt;a href="https://tarrragon.github.io/blog/backend/07-security-data-protection/vendors/aws-secrets-manager/" data-link-title="AWS Secrets Manager" data-link-desc="AWS 原生 secret store &amp;#43; 內建 RDS / Redshift rotation Lambda、Resource Policy 跨帳號共享、KMS 加密">AWS Secrets Manager&lt;/a>. It also serves as the &lt;em>identity-axis validation&lt;/em> for point 1 of &lt;a href="https://tarrragon.github.io/blog/report/data-topology-as-audit-dimension/" data-link-title="Data topology 是 process content 的第 6 audit 維度" data-link-desc="Process content 的 diff dimension audit 原本 5 維（schema / operational / paradigm / components / application change）漏了 *data topology* — 資料在 cluster / partition / region 之間的分佈拓樸；topology 不在既有 5 維任一個、但決定 re-sharding / partition redesign / multi-region rollout 的結構；本卡擴 audit 到 6 維、新增 Type F「Topology re-layout」結構">#128 self-aware limitation&lt;/a>: "the 6 dimensions may still miss categories (identity / consistency / residency are three candidate axes)".&lt;/p>&lt;/blockquote>
&lt;h2 id="secret不是secret兩家對secret的定義不同">A "secret" is not a "secret": the two vendors define "secret" differently&lt;/h2>
&lt;p>Treating Vault → AWS Secrets Manager as a "secret store replacement" is the most common misjudgment: the two products' notions of a "secret" sit on entirely different identity models:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Concept&lt;/th>
 &lt;th>HashiCorp Vault&lt;/th>
 &lt;th>AWS Secrets Manager&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>The secret itself&lt;/td>
 &lt;td>A secret path (&lt;code>secret/data/myapp/db&lt;/code>)&lt;/td>
 &lt;td>An ARN (&lt;code>arn:aws:secretsmanager:us-east-1:...&lt;/code>)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Accessor identity&lt;/td>
 &lt;td>Vault token (self-managed token TTL)&lt;/td>
 &lt;td>AWS principal (IAM user / role / federation)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Authorization model&lt;/td>
 &lt;td>Vault policy (capabilities: read/create/&amp;hellip;)&lt;/td>
 &lt;td>IAM policy + Resource policy (two layers)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Authentication&lt;/td>
 &lt;td>AppRole / Kubernetes / LDAP / OIDC / self-managed auth methods&lt;/td>
 &lt;td>AWS Sigv4 + STS token / Identity Federation&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Dynamic credential&lt;/td>
 &lt;td>Vault database secrets engine (lease + renew)&lt;/td>
 &lt;td>Lambda rotation (no lease concept)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Audit log&lt;/td>
 &lt;td>Vault audit log (self-managed endpoint)&lt;/td>
 &lt;td>CloudTrail event (unified across AWS)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Multi-tenant isolation&lt;/td>
 &lt;td>Namespace + path-level policy&lt;/td>
 &lt;td>Account boundary + resource policy&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Tooling integration&lt;/td>
 &lt;td>Application-side Vault SDK / agent injector&lt;/td>
 &lt;td>AWS SDK + Lambda&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>&lt;strong>The core difference is not "where secrets are stored" but "where identity comes from, how it is enforced, and how it is audited."&lt;/strong> The real migration effort lies in &lt;em>redesigning the identity model&lt;/em>, not in moving secrets.&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is a cross-vendor migration playbook that cross-links <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault</a> and <a href="/blog/backend/07-security-data-protection/vendors/aws-secrets-manager/" data-link-title="AWS Secrets Manager" data-link-desc="AWS 原生 secret store &#43; 內建 RDS / Redshift rotation Lambda、Resource Policy 跨帳號共享、KMS 加密">AWS Secrets Manager</a>. It also serves as the <em>identity-axis validation</em> for point 1 of <a href="/blog/report/data-topology-as-audit-dimension/" data-link-title="Data topology 是 process content 的第 6 audit 維度" data-link-desc="Process content 的 diff dimension audit 原本 5 維（schema / operational / paradigm / components / application change）漏了 *data topology* — 資料在 cluster / partition / region 之間的分佈拓樸；topology 不在既有 5 維任一個、但決定 re-sharding / partition redesign / multi-region rollout 的結構；本卡擴 audit 到 6 維、新增 Type F「Topology re-layout」結構">#128 self-aware limitation</a>: "the 6 dimensions may still miss categories (identity / consistency / residency are three candidate axes)".</p></blockquote>
<h2 id="secret不是secret兩家對secret的定義不同">A "secret" is not a "secret": the two vendors define "secret" differently</h2>
<p>Treating Vault → AWS Secrets Manager as a "secret store replacement" is the most common misjudgment: the two products' notions of a "secret" sit on entirely different identity models:</p>
<table>
  <thead>
      <tr>
          <th>Concept</th>
          <th>HashiCorp Vault</th>
          <th>AWS Secrets Manager</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>The secret itself</td>
          <td>A secret path (<code>secret/data/myapp/db</code>)</td>
          <td>An ARN (<code>arn:aws:secretsmanager:us-east-1:...</code>)</td>
      </tr>
      <tr>
          <td>Accessor identity</td>
          <td>Vault token (self-managed token TTL)</td>
          <td>AWS principal (IAM user / role / federation)</td>
      </tr>
      <tr>
          <td>Authorization model</td>
          <td>Vault policy (capabilities: read/create/&hellip;)</td>
          <td>IAM policy + Resource policy (two layers)</td>
      </tr>
      <tr>
          <td>Authentication</td>
          <td>AppRole / Kubernetes / LDAP / OIDC / self-managed auth methods</td>
          <td>AWS Sigv4 + STS token / Identity Federation</td>
      </tr>
      <tr>
          <td>Dynamic credential</td>
          <td>Vault database secrets engine (lease + renew)</td>
          <td>Lambda rotation (no lease concept)</td>
      </tr>
      <tr>
          <td>Audit log</td>
          <td>Vault audit log (self-managed endpoint)</td>
          <td>CloudTrail event (unified across AWS)</td>
      </tr>
      <tr>
          <td>Multi-tenant isolation</td>
          <td>Namespace + path-level policy</td>
          <td>Account boundary + resource policy</td>
      </tr>
      <tr>
          <td>Tooling integration</td>
          <td>Application-side Vault SDK / agent injector</td>
          <td>AWS SDK + Lambda</td>
      </tr>
  </tbody>
</table>
<p><strong>The core difference is not "where secrets are stored" but "where identity comes from, how it is enforced, and how it is audited."</strong> The real migration effort lies in <em>redesigning the identity model</em>, not in moving secrets.</p>
<p>Running the <a href="/blog/report/content-structure-by-max-diff-dimension/" data-link-title="Process content 結構由最大差異維度決定、不是 universal phased" data-link-desc="跨 X process content（migration / upgrade / rollout / playbook）的結構由 source / target 之間 *差異維度組合* 決定、不存在 universal phased 模板；6 種 migration / process type 實證（schema 差 / drop-in / operational / multi-tool / paradigm / topology re-layout）跑出 6 種不同結構；寫作前必須做 *6 維 diff dimension audit* 才能決定結構、跳過會套錯模板">6-dimension diff dimension audit</a>:</p>
<table>
  <thead>
      <tr>
          <th>維度</th>
          <th>評估</th>
          <th>等級</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Schema / API</td>
          <td>Completely different APIs (Vault HTTP API vs AWS SDK)</td>
          <td>Medium</td>
      </tr>
      <tr>
          <td>Operational model</td>
          <td>Self-managed Vault cluster → AWS managed</td>
          <td><strong>High</strong></td>
      </tr>
      <tr>
          <td>Paradigm</td>
          <td>Both follow the secret store paradigm</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Components</td>
          <td>Vault binary + storage backend → AWS SaaS</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Application change</td>
          <td>Changes required (new SDK, new auth method, new retry pattern)</td>
          <td><strong>High</strong></td>
      </tr>
      <tr>
          <td>Data topology</td>
          <td>Same on both sides: single instance, no sharding</td>
          <td>Low</td>
      </tr>
      <tr>
          <td><strong>Identity model</strong></td>
          <td><strong>Completely different (Vault token vs IAM principal)</strong></td>
          <td><strong>High</strong></td>
      </tr>
  </tbody>
</table>
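<p>The 6-dimension audit cannot capture the "Identity model = High" axis. Classified with the existing 6 dimensions, this migration would be structured as a Type C operational redesign plus a standalone section for the high-rated Application change dimension; the actual distribution of effort, however, is:</p>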
<ul>
<li>Operational redesign (decommission the Vault cluster / configure Lambda / replace monitoring): ~25%</li>
<li>Application change (SDK / retry / swap tokens for IAM credentials): ~30%</li>
<li><strong>Identity model redesign (the principal / policy / cross-service auth chain for every secret): ~45%</strong></li>
</ul>
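<p>The largest block of effort is the <em>identity model redesign</em>, which falls under none of the existing 6 dimensions. Identity is <em>a candidate 7th dimension</em>.</p>
<h2 id="identity-axis-是否獨立4-個論據">Is the identity axis independent? 4 arguments</h2>
<p><strong>Yes, identity is an independent axis</strong>:</p>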
<ol>
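<li><strong>Identity unchanged → operational can still change</strong>: Vault on-prem → Vault on-EKS rates high on the operational dimension while the identity model stays the same (still Vault tokens); the two can be audited separately</li>
<li><strong>Operational unchanged → identity can still change</strong>: a Vault namespace reorganization (from managing 50 namespaces to 5 namespaces + namespace-level policy) leaves operations unchanged but redraws the identity boundaries; the two can be audited separately</li>
<li><strong>Application unchanged → identity can still change</strong>: pure infrastructure-level rotation (manual → automated) leaves application code untouched but changes the identity issuance flow; the two can be audited separately</li>
<li><strong>Paradigm unchanged → identity can still change</strong>: both products follow the secret store paradigm; Vault token vs IAM principal is a difference in identity model, not in paradigm</li>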
</ol>
<p><strong>No, identity can be folded into application change</strong>:</p>
<ul>
<li>Counterargument: changing the SDK and the IAM signer in application code both count as application change</li>
<li>Rebuttal: application change is a <em>consequence</em>, not a <em>root cause</em>; the change in identity model is what drives the application change</li>
</ul>
<p>Empirically, 45% of this migration's effort goes into identity mapping, which confirms identity as <em>an independent main axis of effort</em> that should not be squeezed into the application change axis.</p>
<h2 id="結構type-c--identity-model-對位獨立段">Structure: Type C + a standalone identity model mapping section</h2>
<p>Compared with the existing Type C article <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">PostgreSQL → Aurora</a>, this article adds a standalone <em>identity model mapping</em> section:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">1. A &#34;secret&#34; is not a &#34;secret&#34; (identity axis paradox opening)
</span></span><span class="line"><span class="ln">2</span><span class="cl">2. Arguments on whether the identity axis is independent
</span></span><span class="line"><span class="ln">3</span><span class="cl">3. Structure differentiator (Type C + standalone identity section)
</span></span><span class="line"><span class="ln">4</span><span class="cl">4. Identity model mapping (Vault → AWS principal mapping)
</span></span><span class="line"><span class="ln">5</span><span class="cl">5. Operational migration（4 phase）
</span></span><span class="line"><span class="ln">6</span><span class="cl">6. Application change（SDK + retry pattern）
</span></span><span class="line"><span class="ln">7</span><span class="cl">7. Production failure drills
</span></span><span class="line"><span class="ln">8</span><span class="cl">8. Capacity / cost
</span></span><span class="line"><span class="ln">9</span><span class="cl">9. Integration / next steps</span></span></code></pre></div><p>9 sections, 260-280 lines: compared with a standard Type C article, one extra section for the identity model mapping plus one for the axis-independence arguments.</p>
<h2 id="identity-model-對位">Identity model mapping</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Vault concept                     →  AWS Secrets Manager counterpart
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">─────────────────────────────────   ────────────────────────────
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">Vault token (auth result)         →  AWS STS temporary credential
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">AppRole (auth method)             →  IAM role + AssumeRoleWithWebIdentity
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">Kubernetes auth method            →  IAM Role for Service Account (IRSA)
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">LDAP auth method                  →  IAM Identity Center (formerly SSO)
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">Vault policy (capabilities)       →  IAM policy + Resource policy
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">Path-level ACL (secret/db/*)      →  Resource ARN pattern (arn:aws:secretsmanager:...:secret:db/*)
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">Namespace                         →  AWS account + resource-based isolation
</span></span><span class="line"><span class="ln">10</span><span class="cl">Audit device                      →  CloudTrail event
</span></span><span class="line"><span class="ln">11</span><span class="cl">Database secrets engine           →  Lambda rotation function</span></span></code></pre></div><p>Every row of the mapping carries a <em>semantic gap</em>; none is a 1:1 mapping:</p>
<ul>
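<li><strong>Vault token TTL vs AWS STS credential expiration</strong>: a Vault token's TTL can be renewed by the application; STS credentials cannot be renewed and must be obtained again by re-assuming the role</li>
<li><strong>Vault policy capabilities vs IAM action</strong>: Vault's <code>read</code> capability corresponds to AWS <code>secretsmanager:GetSecretValue</code>, but AWS evaluates two layers: within one account either an identity policy or a resource policy can grant access, while cross-account access needs both to allow it</li>
<li><strong>Vault Kubernetes auth vs IRSA</strong>: both map a K8s service account → secret access, but IRSA requires EKS + an IAM OIDC provider, which Vault's Kubernetes auth does not</li>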
</ul>
<p>The migration scope covers the <em>application-level adaptation</em> behind every row, not just moving secrets.</p>
<h2 id="operational-migration-4-phase">Operational migration (4 phases)</h2>
<h3 id="phase-0audit--design">Phase 0: Audit + design</h3>
<ul>
<li>List every Vault secret, its path, and the applications that use it</li>
<li>Map each secret to an AWS principal (IAM role / IRSA / federation)</li>
<li>Design the ARN naming convention (by namespace / application / environment)</li>
<li>Plan AWS account boundaries (separate accounts for dev / staging / prod)</li>
</ul>
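<p>The naming-convention step can be sketched in a few lines. This is a minimal illustration, not a prescribed scheme: the <code>&lt;environment&gt;/&lt;application&gt;/&lt;name&gt;</code> layout, both function names, and the account ID are hypothetical. The only AWS-specific fact it relies on is that Secrets Manager appends a hyphen and six random characters to the secret name in the ARN.</p>

```python
def vault_path_to_secret_name(vault_path: str, environment: str) -> str:
    """Map a KV v2 API path like 'secret/data/myapp/db' to '<environment>/myapp/db'.

    The '<environment>/<application>/<name>' layout is a hypothetical convention.
    """
    parts = vault_path.strip("/").split("/")
    # KV v2 API paths have the form '<mount>/data/<key path>'.
    if len(parts) < 3 or parts[1] != "data":
        raise ValueError(f"not a KV v2 data path: {vault_path!r}")
    return "/".join([environment, *parts[2:]])


def secret_arn_pattern(region: str, account_id: str, secret_name: str) -> str:
    """IAM Resource pattern for one secret.

    Secrets Manager appends a hyphen and six random characters to the secret
    name in its ARN, which the '-??????' suffix matches.
    """
    return (
        f"arn:aws:secretsmanager:{region}:{account_id}:secret:{secret_name}-??????"
    )


if __name__ == "__main__":
    name = vault_path_to_secret_name("secret/data/myapp/db", "prod")
    print(name)
    # 123456789012 is a placeholder account ID.
    print(secret_arn_pattern("us-east-1", "123456789012", name))
```

<p>If each environment lives in its own AWS account, the environment prefix may be redundant; the function is only meant to show that the mapping should be mechanical and reviewable.</p>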
<h3 id="phase-1aws-secrets-manager--iam-設置">Phase 1: AWS Secrets Manager + IAM setup</h3>
<ul>
<li>Create secrets, IAM roles, and resource policies with Terraform / CloudFormation</li>
<li>Set up the IRSA / WebIdentity provider</li>
<li>Create staging secrets in advance and run application tests against them</li>
</ul>
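<p>As a rough illustration of the IAM side of this phase, the sketch below builds a read-only identity policy document as plain JSON. The function name, the <code>Sid</code>, and the example ARN are assumptions made for illustration; the resulting document could be fed into whichever IaC tool is in use.</p>

```python
import json


def read_only_secret_policy(secret_arn_patterns):
    """Return an identity-based policy allowing reads on the given secret ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadAppSecrets",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": [
                    "secretsmanager:GetSecretValue",
                    "secretsmanager:DescribeSecret",
                ],
                "Resource": sorted(secret_arn_patterns),
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and secret name.
    policy = read_only_secret_policy(
        ["arn:aws:secretsmanager:us-east-1:123456789012:secret:myapp/db-??????"]
    )
    print(json.dumps(policy, indent=2))
```

<p>Scoping <code>Resource</code> to explicit ARN patterns rather than <code>*</code> is the closest equivalent of a Vault path-level ACL.</p>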
<h3 id="phase-2application-dual-read">Phase 2: Application dual-read</h3>





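<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Dual-read: fetch the secret from both Vault and AWS Secrets Manager</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kn">import</span> <span class="nn">logging</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="kn">import</span> <span class="nn">os</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="kn">import</span> <span class="nn">boto3</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="kn">import</span> <span class="nn">hvac</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">sm</span> <span class="o">=</span> <span class="n">boto3</span><span class="o">.</span><span class="n">client</span><span class="p">(</span><span class="s1">&#39;secretsmanager&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">vault_client</span> <span class="o">=</span> <span class="n">hvac</span><span class="o">.</span><span class="n">Client</span><span class="p">(</span><span class="n">url</span><span class="o">=</span><span class="s1">&#39;https://vault.internal&#39;</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="s1">&#39;VAULT_TOKEN&#39;</span><span class="p">])</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">def</span> <span class="nf">get_db_password</span><span class="p">():</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">    <span class="c1"># Assumes the AWS secret holds the bare password string in SecretString</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="n">aws_value</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">get_secret_value</span><span class="p">(</span><span class="n">SecretId</span><span class="o">=</span><span class="s1">&#39;myapp/db&#39;</span><span class="p">)[</span><span class="s1">&#39;SecretString&#39;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="n">vault_value</span> <span class="o">=</span> <span class="n">vault_client</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="s1">&#39;secret/data/myapp/db&#39;</span><span class="p">)[</span><span class="s1">&#39;data&#39;</span><span class="p">][</span><span class="s1">&#39;data&#39;</span><span class="p">][</span><span class="s1">&#39;password&#39;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="k">if</span> <span class="n">aws_value</span> <span class="o">!=</span> <span class="n">vault_value</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="n">logger</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="s2">&#34;Secret diff between Vault and AWS for myapp/db&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl">    <span class="k">return</span> <span class="n">aws_value</span>  <span class="c1"># Use AWS as source of truth</span></span></span></code></pre></div><p>Run this for 1-2 weeks to confirm that both sides stay consistent and that AWS API latency and error rate are acceptable.</p>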
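<p>During the dual-read window it helps to record <em>which</em> side differs without ever writing a secret value to the log. One possible approach, sketched here with hypothetical helper names and a hypothetical logging key, is to compare in constant time and log only a keyed fingerprint:</p>

```python
import hashlib
import hmac


def fingerprint(value: str, key: bytes) -> str:
    """Keyed SHA-256 digest, truncated, so logs never contain the secret itself."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def secrets_match(aws_value: str, vault_value: str) -> bool:
    """Constant-time comparison of the two secret values."""
    return hmac.compare_digest(aws_value.encode("utf-8"), vault_value.encode("utf-8"))


if __name__ == "__main__":
    key = b"per-deployment-logging-key"  # hypothetical; keep it out of source control
    aws_value, vault_value = "s3cret-new", "s3cret-old"
    if not secrets_match(aws_value, vault_value):
        print("diff:", fingerprint(aws_value, key), "vs", fingerprint(vault_value, key))
```

<p>The fingerprints let an operator tell whether two mismatches involve the same stale value while keeping the plaintext out of the logging pipeline.</p>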
<h3 id="phase-3cutover--cleanup">Phase 3: Cutover + cleanup</h3>
<ul>
<li>Switch the application to AWS Secrets Manager only</li>
<li>Keep Vault read-only on standby for 1-2 weeks</li>
<li>Then decommission the Vault cluster</li>
</ul>
<h2 id="application-change">Application change</h2>
<p>Four patterns must change on the application side:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Before: Vault SDK</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="kn">import</span> <span class="nn">hvac</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">vault_client</span> <span class="o">=</span> <span class="n">hvac</span><span class="o">.</span><span class="n">Client</span><span class="p">(</span><span class="n">url</span><span class="o">=</span><span class="s1">&#39;https://vault.internal&#39;</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="n">vault_token</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">secret</span> <span class="o">=</span> <span class="n">vault_client</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="s1">&#39;secret/data/myapp/db&#39;</span><span class="p">)[</span><span class="s1">&#39;data&#39;</span><span class="p">][</span><span class="s1">&#39;data&#39;</span><span class="p">][</span><span class="s1">&#39;password&#39;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="c1"># After: AWS SDK + IAM</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="kn">import</span> <span class="nn">boto3</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="n">sm</span> <span class="o">=</span> <span class="n">boto3</span><span class="o">.</span><span class="n">client</span><span class="p">(</span><span class="s1">&#39;secretsmanager&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="n">secret</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">get_secret_value</span><span class="p">(</span><span class="n">SecretId</span><span class="o">=</span><span class="s1">&#39;myapp/db&#39;</span><span class="p">)[</span><span class="s1">&#39;SecretString&#39;</span><span class="p">]</span></span></span></code></pre></div><p>Key differences:</p>
<ul>
<li><strong>Authentication</strong>: the application manages and refreshes the Vault token itself; the AWS SDK handles STS credentials automatically (through an IAM role / instance profile / IRSA)</li>
<li><strong>Caching</strong>: Vault secret reads are typically cached for 5-15 minutes; AWS Secrets Manager offers a caching library (aws-secretsmanager-caching-python) that must be enabled explicitly</li>
<li><strong>Retry pattern</strong>: Vault clients typically retry with exponential backoff; the AWS SDK ships its own retry logic, but the boto3 defaults do not necessarily match the application's requirements</li>
<li><strong>Rotation hook</strong>: Vault relies on SDK-side lease renewal; AWS uses a Lambda rotation function, and the application only needs to re-read</li>
</ul>
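<p>The caching point above can be made concrete with a small TTL cache. This is a stand-alone sketch with a hypothetical class name and an injected <code>fetch</code> callable; it is not the API of aws-secretsmanager-caching-python, only an illustration of the behavior the application has to own after leaving Vault leases behind.</p>

```python
import time
from typing import Callable, Dict, Tuple


class TTLSecretCache:
    """Cache secret values for a fixed TTL; `fetch` does the real lookup."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, secret_id: str) -> str:
        now = self._clock()
        entry = self._entries.get(secret_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no API call
        value = self._fetch(secret_id)
        self._entries[secret_id] = (now, value)
        return value

    def invalidate(self, secret_id: str) -> None:
        """Drop one entry, e.g. after a rotation notice or an auth failure."""
        self._entries.pop(secret_id, None)
```

<p>In a real service <code>fetch</code> would wrap <code>get_secret_value</code>; injecting it (and the clock) keeps the cache testable without AWS access.</p>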
<h2 id="production-故障演練">Production failure drills</h2>
<h3 id="case-1iam-principal-對位錯production-application-拿不到-secret">Case 1: Wrong IAM principal mapping, production application cannot read the secret</h3>
<p><strong>Symptom</strong>: after cutover the application fails to start, and the log shows <code>AccessDeniedException: User: arn:aws:sts::...:assumed-role/EKS-NodeRole/i-xxx is not authorized to perform: secretsmanager:GetSecretValue</code>.</p>
<p><strong>Root cause</strong>: the EKS pod is using the <em>node role</em> instead of the <em>pod IRSA role</em>; the Phase 0 audit never set up the OIDC trust for the service account.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Set up IRSA in advance</strong>: create the IAM OIDC provider for EKS and add the service account annotation</li>
<li><strong>Verify the principal</strong>: run <code>aws sts get-caller-identity</code> inside the pod and confirm the returned role is the expected one</li>
<li><strong>Check both resource policy and IAM policy</strong>: confirm the role's IAM policy allows the call and the secret's resource policy does not block it (cross-account access needs an allow in both)</li>
</ol>
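<p>The principal check can be automated as a startup guard. The sketch below parses the <code>Arn</code> field that <code>aws sts get-caller-identity</code> returns and fails fast when the pod is running as an unexpected role. Function names and the role name <code>myapp-irsa-role</code> are hypothetical.</p>

```python
def assumed_role_name(caller_arn: str) -> str:
    """Extract the role name from an STS assumed-role ARN.

    Expected form: arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    """
    parts = caller_arn.split(":", 5)
    if len(parts) != 6:
        raise ValueError(f"not an ARN: {caller_arn!r}")
    kind, _, rest = parts[5].partition("/")
    if kind != "assumed-role" or not rest:
        raise ValueError(f"not an assumed-role ARN: {caller_arn!r}")
    return rest.split("/", 1)[0]


def check_principal(caller_arn: str, expected_role: str) -> None:
    """Raise if the process is not running as the expected role."""
    actual = assumed_role_name(caller_arn)
    if actual != expected_role:
        raise RuntimeError(f"running as {actual!r}, expected {expected_role!r}")


if __name__ == "__main__":
    # Placeholder ARN shaped like the one in the error message above.
    arn = "arn:aws:sts::123456789012:assumed-role/EKS-NodeRole/i-0abc"
    try:
        check_principal(arn, "myapp-irsa-role")
    except RuntimeError as exc:
        print("principal mismatch:", exc)
```

<p>Failing at startup with an explicit mismatch message is easier to diagnose than an <code>AccessDeniedException</code> on the first secret read.</p>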
<h3 id="case-2dynamic-credential-對等失敗application-連-db-失敗">Case 2: Dynamic credential parity fails, application cannot connect to the DB</h3>
<p><strong>Symptom</strong>: on Vault, the database secrets engine rotated the DB password automatically and the application obtained leases through the Vault SDK. After switching to AWS Secrets Manager + Lambda rotation, the rotation completes, but the application keeps using the cached old password and the DB rejects the connection.</p>
<p><strong>Root cause</strong>: the Vault SDK comes with lease renewal logic, so the application knows when a password is about to expire and re-reads it proactively; the AWS SDK has no lease concept, so the application decides for itself how often to re-read.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Set the cache TTL shorter than the rotation interval</strong>: with 24-hour rotation and a 1-hour cache TTL, the worst case is 1 hour of staleness</li>
<li><strong>Explicit cache invalidation</strong>: the rotation Lambda publishes to SNS when it finishes, and the application subscribes and refreshes proactively</li>
<li><strong>Connection-level retry</strong>: when DB authentication fails, the application re-fetches the secret and reconnects</li>
<li><strong>Re-evaluate the rotation cadence</strong>: AWS Lambda rotation is <em>scheduled rotation</em>, not <em>Vault dynamic</em> credentials; do not assume the two share the same semantics</li>
</ol>
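<p>The connection-level retry can be sketched as a small wrapper. Everything here is illustrative: <code>AuthenticationFailed</code> stands in for whatever error the real DB driver raises on bad credentials, and <code>connect</code>, <code>get_password</code>, and <code>invalidate</code> are callables the application would supply.</p>

```python
class AuthenticationFailed(Exception):
    """Hypothetical stand-in for the DB driver's authentication error."""


def connect_with_refresh(connect, get_password, invalidate, max_attempts=2):
    """Connect to the DB; on auth failure drop the cached secret and retry.

    connect(password) -> connection, get_password() -> str, invalidate() -> None.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        password = get_password()
        try:
            return connect(password)
        except AuthenticationFailed as exc:
            # The cached password is probably stale after a rotation.
            last_error = exc
            invalidate()
    raise last_error
```

<p>Keeping <code>max_attempts</code> small matters: if the failure is a genuinely wrong credential rather than a stale cache, unbounded retries would only add load on both the DB and the secret store.</p>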
<h3 id="case-3audit-log-結構差soc-dashboard-失效">Case 3: Audit log structure differs, SOC dashboard breaks</h3>
<p><strong>Symptom</strong>: after cutover the SOC dashboard shows every secret access metric at 0. The old Vault audit log structure had parsers in Splunk; the AWS CloudTrail structure is completely different, so every search query stops working.</p>
<p><strong>Root cause</strong>: the Vault audit log is a <em>Vault-specific</em> JSON structure (with lease_id / policy / token); a CloudTrail event is <em>AWS-specific</em> (with eventName / requestParameters / userIdentity); the SOC parse rules cannot be carried over.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Rewrite SOC rules before cutover</strong>: detection coverage on CloudTrail events must map 1:1 to what the Vault audit log provided</li>
<li><strong>GuardDuty integration</strong>: AWS GuardDuty can surface secret access anomalies automatically, reducing the amount of hand-written rules</li>
<li><strong>CloudTrail → S3 → Athena</strong>: long-term audit queries run on Athena; the tooling is entirely different from Vault, so the SOC team needs re-training</li>
</ol>
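<p>One way to keep detection rules portable across the cutover is to normalize both log formats into a shared record before they reach the SIEM. The sketch below is an assumption-laden illustration: the target schema and function names are invented here, and the selection of source fields should be checked against real log samples from both systems.</p>

```python
def normalize_vault_audit(entry: dict) -> dict:
    """Project a Vault audit log entry onto a shared, hypothetical schema."""
    request = entry.get("request") or {}
    auth = entry.get("auth") or {}
    return {
        "source": "vault",
        "time": entry.get("time"),
        "actor": auth.get("display_name"),
        "action": request.get("operation"),
        "secret": request.get("path"),
        "source_ip": request.get("remote_address"),
    }


def normalize_cloudtrail(event: dict) -> dict:
    """Project a CloudTrail event onto the same schema."""
    params = event.get("requestParameters") or {}
    identity = event.get("userIdentity") or {}
    return {
        "source": "cloudtrail",
        "time": event.get("eventTime"),
        "actor": identity.get("arn"),
        "action": event.get("eventName"),
        "secret": params.get("secretId"),
        "source_ip": event.get("sourceIPAddress"),
    }
```

<p>With both sides emitting the same keys, a dashboard query such as "reads per secret per actor" can be written once and validated against Vault data before the cutover.</p>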
<h3 id="case-4calling-cost-反轉aws-比-vault-自管貴">Case 4: API call cost flips, AWS ends up more expensive than self-managed Vault</h3>
<p><strong>Symptom</strong>: self-managed Vault cost $200 / month (EC2 + ops); after switching to AWS Secrets Manager the bill is $1500 / month, and the breakdown shows <code>GetSecretValue</code> API calls as the largest item.</p>
<p><strong>Root cause</strong>: AWS Secrets Manager charges <code>$0.05 per 10K API call</code>. An application that reads at high frequency (a secret read on every request, with no cache) blows up the cost; on Vault the application managed its own cache and made no API calls within the token TTL.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Enforce application-side caching</strong>: use the aws-secretsmanager-caching library with a cache TTL of 5-15 minutes, bringing API calls down from 100M/month to 10K/month</li>
<li><strong>Re-architect the application</strong>: move high-frequency secret reads to connection level (read once when the DB connection is created and reuse it for the connection's lifetime)</li>
<li><strong>Cost monitoring</strong>: set a CloudWatch alarm on secret access and alert as soon as it crosses the threshold</li>
</ol>
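<p>The cost arithmetic is simple enough to check before cutover. The sketch below uses the two prices quoted in this article ($0.40 per secret per month, $0.05 per 10K API calls) as defaults; the function name is hypothetical, and current AWS pricing should be confirmed before relying on the numbers.</p>

```python
def monthly_cost_usd(
    secret_count: int,
    api_calls_per_month: int,
    per_secret: float = 0.40,
    per_10k_calls: float = 0.05,
) -> float:
    """Estimate the monthly Secrets Manager bill (storage + API calls only)."""
    storage = secret_count * per_secret
    calls = api_calls_per_month / 10_000 * per_10k_calls
    return storage + calls


if __name__ == "__main__":
    scenarios = {
        "100 secrets, 50K reads/day": (100, 50_000 * 30),
        "100 secrets, 100M reads/month (no cache)": (100, 100_000_000),
        "100 secrets, 10K reads/month (cached)": (100, 10_000),
    }
    for label, (secrets, calls) in scenarios.items():
        print(f"{label}: ${monthly_cost_usd(secrets, calls):.2f}")
```

<p>At 100M uncached reads the call charge alone is about $500 per month, while the same workload behind a cache costs a few cents in API calls; the per-secret storage charge stays at $40 either way.</p>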
<h3 id="case-5跨-region-replication-對位失敗dr-演練失效">Case 5: Cross-region replication mapping fails, DR drill breaks</h3>
<p><strong>Symptom</strong>: after the DR drill switches regions, the application cannot reach its secrets; Secrets Manager in us-west-2 turns out not to contain the secrets from us-east-1.</p>
<p><strong>Root cause</strong>: AWS Secrets Manager is <em>region-scoped</em>, not a <em>global resource</em>; Vault ran its own multi-DC replication; the cutover missed setting up <em>cross-region replication</em>.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Set up secret replication</strong>: AWS Secrets Manager has built-in replication to other regions (<code>ReplicaRegions</code>)</li>
<li><strong>Always run DR drills</strong>: drill once before and once after cutover, verifying that region failover works smoothly</li>
<li><strong>Architecture</strong>: consider <em>AWS Backup</em> as a supplementary cross-region backup for Secrets Manager, after verifying that it covers this service in your setup</li>
</ol>
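<p>A pre-drill check can catch missing replicas before the failover does. The sketch below assumes each <code>client</code> is a Secrets Manager client for one region (for example <code>boto3.client("secretsmanager", region_name=...)</code>) and relies only on the <code>list_secrets</code> paginator; the function names are hypothetical.</p>

```python
def list_secret_names(client):
    """Collect secret names through the ListSecrets paginator of one region."""
    names = []
    for page in client.get_paginator("list_secrets").paginate():
        names.extend(item["Name"] for item in page.get("SecretList", []))
    return names


def missing_in_replica(primary_names, replica_names):
    """Return the secrets present in the primary region but absent in the replica."""
    return sorted(set(primary_names) - set(replica_names))


def assert_replicated(primary_client, replica_client):
    """Raise with the list of unreplicated secrets, if any."""
    missing = missing_in_replica(
        list_secret_names(primary_client), list_secret_names(replica_client)
    )
    if missing:
        raise RuntimeError(f"secrets missing in replica region: {missing}")
```

<p>Running this check in CI or as a scheduled job turns the DR drill finding into a continuously verified invariant.</p>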
<h2 id="capacity--cost">Capacity / cost</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Vault self-managed</th>
          <th>AWS Secrets Manager</th>
          <th>Trade-off</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Setup cost</td>
          <td>Mid (self-managed cluster + storage + HA)</td>
          <td>Low (a secret is created in one step)</td>
          <td>AWS is significantly lower</td>
      </tr>
      <tr>
          <td>Operational FTE</td>
          <td>0.3-1 FTE</td>
          <td>0.05-0.1 FTE</td>
          <td>AWS saves SRE time</td>
      </tr>
      <tr>
          <td>Per-secret cost</td>
          <td>~$0 (included in the cluster)</td>
          <td>$0.40 / month</td>
          <td>AWS bills per secret</td>
      </tr>
      <tr>
          <td>API call cost</td>
          <td>~$0 (included in the cluster)</td>
          <td>$0.05 / 10K calls</td>
          <td>Significantly more expensive for high-frequency apps</td>
      </tr>
      <tr>
          <td>Cross-region</td>
          <td>Self-managed replication</td>
          <td>Built-in <code>ReplicaRegions</code></td>
          <td>AWS is simpler</td>
      </tr>
      <tr>
          <td>Audit</td>
          <td>Vault audit device</td>
          <td>CloudTrail (built in)</td>
          <td>AWS unifies with the SOC pipeline</td>
      </tr>
      <tr>
          <td>Identity integration</td>
          <td>Multiple auth methods</td>
          <td>IAM + IRSA + Identity Center</td>
          <td>AWS integrates well with cloud-native stacks</td>
      </tr>
      <tr>
          <td>Total cost (100 secrets, 50K reads/day)</td>
          <td>$200 / mo (including ops)</td>
          <td>$40 + $7.50 + replication ≈ $50 / mo, plus ops savings</td>
          <td>AWS costs about 1/4, provided read volume does not spike</td>
      </tr>
  </tbody>
</table>
<p><strong>Takeaway</strong>: few secrets with moderate read frequency favor AWS Secrets Manager; high-frequency reads or multi-cloud / on-prem constraints favor Vault.</p>
<h2 id="整合--下一步">Integration / next steps</h2>
<h3 id="跟-vault-dynamic-credential-對比">Comparison with <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/dynamic-credential/" data-link-title="HashiCorp Vault Dynamic Credential：lease 治理跟 application 整合的實作層" data-link-desc="Vault database secrets engine 怎麼配、application 怎麼 renew lease、production 五大踩雷（lease 過期 race、DB max_connections 撞牆、Vault sealed、token expire、scope 過寬）、容量規劃跟 vault-agent injector 整合">Vault Dynamic Credential</a></h3>
<p>Dynamic credentials are a Vault-specific feature. AWS Secrets Manager offers <em>Lambda rotation</em> as the counterpart, but the semantics differ:</p>
<ul>
<li>Vault: per-application lease, application-aware lifecycle</li>
<li>AWS: scheduled rotation; the application does not know when a rotation happens</li>
</ul>
<p>The migration scope should <em>downgrade</em> dynamic credential scenarios: replace them with Lambda rotation and change the application logic to a cache + retry pattern.</p>
<h3 id="跟-iam-identity-center-整合">Integration with IAM Identity Center</h3>
<p>Human access to secrets (emergency break-glass) should go through IAM Identity Center + temporary role assumption; do not hand IAM access keys directly to users.</p>
<h3 id="下一步議題">Topics for next steps</h3>
<ul>
<li><strong>Reverse migration (AWS → Vault)</strong>: usually driven by multi-cloud / on-prem constraints, or by cost flipping at large scale</li>
<li><strong>Hybrid pattern</strong>: cloud-native secrets go to AWS, cross-cloud / on-prem secrets go to Vault; the application routes by secret source</li>
<li><strong>Identity axis validation</strong>: this article treats identity as an independent axis; future migrations such as LDAP → OIDC or self-managed RBAC → IAM should add further validation</li>
</ul>
<h2 id="相關連結">Related links</h2>
<ul>
<li>Source vendor: <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/" data-link-title="HashiCorp Vault" data-link-desc="Self-hosted secret management 與 dynamic credential / encryption-as-a-service / PKI engine、跨雲跨環境的 secret 控制面">HashiCorp Vault</a></li>
<li>Target vendor: <a href="/blog/backend/07-security-data-protection/vendors/aws-secrets-manager/" data-link-title="AWS Secrets Manager" data-link-desc="AWS 原生 secret store &#43; 內建 RDS / Redshift rotation Lambda、Resource Policy 跨帳號共享、KMS 加密">AWS Secrets Manager</a></li>
<li>Parallel deep article: <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/dynamic-credential/" data-link-title="HashiCorp Vault Dynamic Credential：lease 治理跟 application 整合的實作層" data-link-desc="Vault database secrets engine 怎麼配、application 怎麼 renew lease、production 五大踩雷（lease 過期 race、DB max_connections 撞牆、Vault sealed、token expire、scope 過寬）、容量規劃跟 vault-agent injector 整合">Vault Dynamic Credential</a></li>
<li>Parallel migration playbooks (Type C): <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">PostgreSQL → Aurora</a> (standard Type C) / <a href="/blog/backend/01-database/vendors/mongodb/migrate-to-atlas/" data-link-title="MongoDB → Atlas：Atlas 不是 MongoDB &#43; managed、是另一個 product" data-link-desc="Atlas 號稱「MongoDB managed」但 operational model 完全不同（auto-scaling / VPC peering / IAM-driven access / 內建 backup / billing 模型）；本文採用 Type C operational redesign hybrid 結構、4-phase operational migration &#43; drop-in cutover、5 個 production 踩雷（連線數限制 / IP whitelist / backup retention / IAM token 過期 / billing 暴漲）">MongoDB → Atlas</a></li>
<li>Parallel axis candidate validation (sibling): <a href="/blog/backend/01-database/vendors/dynamodb/consistency-model-optimization/" data-link-title="DynamoDB Strongly Consistent → Eventually Consistent：same protocol, different contract" data-link-desc="DynamoDB consistency model 從 strongly consistent read 改 eventually consistent read 是 50% cost 優化但風險集中在 application contract — 同 vendor / 同 protocol / 同 table / 不同 read consistency；驗證 [#128](/report/data-topology-as-audit-dimension/) self-aware limitation 提出的 consistency axis 候選；涵蓋 read pattern audit / 5 個 production 踩雷">DynamoDB Consistency Model</a> (consistency candidate) / <a href="/blog/backend/01-database/vendors/postgresql/multi-region-gdpr-rollout/" data-link-title="PostgreSQL Multi-Region GDPR Rollout：政策驅動的 migration 屬本 methodology 嗎" data-link-desc="PostgreSQL 單 region → multi-region 同時滿足 GDPR EU residency 是 *政策驅動* 兼 *topology 變動* 兼 *operational redesign* 的多軸 migration；驗證 [#128](/report/data-topology-as-audit-dimension/) self-aware limitation 提出的 residency axis 候選 — residency 是 driver 還是獨立 audit 軸；涵蓋 logical replication 配 GDPR / 5 個 production 踩雷 / cross-region cost">PostgreSQL Multi-Region GDPR Rollout</a> (residency candidate)</li>
<li>Methodology: <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a> / <a href="/blog/report/data-topology-as-audit-dimension/" data-link-title="Data topology 是 process content 的第 6 audit 維度" data-link-desc="Process content 的 diff dimension audit 原本 5 維（schema / operational / paradigm / components / application change）漏了 *data topology* — 資料在 cluster / partition / region 之間的分佈拓樸；topology 不在既有 5 維任一個、但決定 re-sharding / partition redesign / multi-region rollout 的結構；本卡擴 audit 到 6 維、新增 Type F「Topology re-layout」結構">#128 self-aware limitation, point 1</a> (identity axis candidate validation; this article is the dogfood for that validation)</li>
</ul>
]]></content:encoded></item></channel></rss>