<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Github-Actions on Tarragon</title><link>https://tarrragon.github.io/blog/tags/github-actions/</link><description>Recent content in Github-Actions on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Fri, 26 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/github-actions/index.xml" rel="self" type="application/rss+xml"/><item><title>CI/CD: From Failure to Fix to Release</title><link>https://tarrragon.github.io/blog/ci/github-actions-failure-flow/</link><pubDate>Wed, 06 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/ci/github-actions-failure-flow/</guid><description>&lt;p>The core responsibility of CI/CD failure handling is to turn a red light into a clear next-step route. The red light itself is a signal from the verification or delivery layer; the engineering workflow has to find the failing layer, reproduce the same condition, fix it, and then let the &lt;a href="https://tarrragon.github.io/blog/ci/knowledge-cards/ci-pipeline/" data-link-title="CI Pipeline" data-link-desc="說明持續整合如何在合併前自動驗證變更品質與相容性">CI Pipeline&lt;/a> prove again that the change is releasable.&lt;/p>
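&lt;h2 id="失敗後先看什麼">What to check first after a failure&lt;/h2>
&lt;p>The first step after a failure is to locate the workflow and the job. A CI/CD system splits one push, pull request, tag, or release into several workflows, and each workflow contains several jobs; the real next step depends on which layer failed.&lt;/p>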
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Failure location&lt;/th>
 &lt;th>Common causes&lt;/th>
 &lt;th>Next-step route&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Lint / format&lt;/td>
 &lt;td>Code, document, or config formatting does not conform&lt;/td>
 &lt;td>Run the same lint / format command locally&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Test&lt;/td>
 &lt;td>Regression in unit, integration, browser, or device tests&lt;/td>
 &lt;td>Download the report and reproduce locally under the same conditions&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Build&lt;/td>
 &lt;td>Compilation, bundling, packaging, or static output fails&lt;/td>
 &lt;td>Run the production build entry point locally&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Package&lt;/td>
 &lt;td>Image, app bundle, or &lt;a href="https://tarrragon.github.io/blog/ci/knowledge-cards/artifact/" data-link-title="Artifact" data-link-desc="說明 CI/CD 中可被驗證、保存與發布的交付產物">artifact&lt;/a> generation fails&lt;/td>
 &lt;td>Check version, signing, registry, or paths&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Deploy&lt;/td>
 &lt;td>Hosting, runtime, store, or permission settings&lt;/td>
 &lt;td>First confirm the build &lt;a href="https://tarrragon.github.io/blog/ci/knowledge-cards/artifact/" data-link-title="Artifact" data-link-desc="說明 CI/CD 中可被驗證、保存與發布的交付產物">artifact&lt;/a> was produced successfully&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
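&lt;p>A lint / format failure means a static contract did not pass. Typical cases are code formatting, document formatting, type checks, schemas, or config rules that do not meet the standard. The fix path for this kind of failure is usually short: read the error message, fix the source, run the formatter if needed, and commit the fix.&lt;/p>
&lt;p>A test failure means some behavior or contract did not meet expectations. Start with the report, screenshot, trace, device log, or error context, and determine whether the feature really regressed, the test's assumptions are out of date, or the test environment lacks a production-like artifact. Before changing the test itself, confirm which user or system behavior the test was originally guarding.&lt;/p>
&lt;p>A build failure means the pipeline has not yet produced a deployable artifact. These failures usually come from compile errors, bundle configuration, dependency versions, environment variables, templates, or resource paths. When fixing, use the project's defined production build command as the minimal reproduction entry point.&lt;/p>
&lt;p>A deploy failure means the release action did not complete. First distinguish whether the artifact exists, whether the release channel's permissions are correct, and whether environment protection has approved the run. If tests and build have succeeded, a deploy failure is most likely a release-channel problem; if no artifact was produced, go back to the build or package stage.&lt;/p>
&lt;h2 id="本機重現流程">Local reproduction flow&lt;/h2>
&lt;p>The job of local reproduction is to base the fix on the same verification conditions. CI is a set of commands executed in a clean environment; once the same failure can be produced locally, the fix can be verified quickly.&lt;/p>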





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">make build
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">make &lt;span class="nb">test&lt;/span>
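&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">make deploy-dry-run&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The build command verifies that the production artifact can be produced. This step should stay close to the build entry point CI uses, so that development mode does not mask production problems.&lt;/p>
&lt;p>The test command verifies the artifact or the program's behavior. For a frontend this may be browser tests, for a backend integration / contract tests, for an app device tests, and for Docker an image scan or smoke test.&lt;/p>
&lt;p>The deploy dry-run command verifies release preconditions. A high-risk deployment should at least be able to check the artifact, permissions, environment, and version information; a project without a dry-run should still keep an equivalent preflight check.&lt;/p>
&lt;h2 id="修復與重新觸發">Fix and re-trigger&lt;/h2>
&lt;p>The core of the fix flow is to let CI re-verify with a new commit. The normal flow does not require deleting the failed commit or force pushing; the failed commit stays in history, and the follow-up fix commit forms a clear repair trail.&lt;/p>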
&lt;ol>
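&lt;li>Read the failed job's log or artifact.&lt;/li>
&lt;li>Run the corresponding command locally to reproduce.&lt;/li>
&lt;li>Change the smallest necessary scope.&lt;/li>
&lt;li>Run the same local command to confirm the fix.&lt;/li>
&lt;li>Commit and push.&lt;/li>
&lt;li>Wait for GitHub Actions to run again.&lt;/li>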
&lt;/ol>
&lt;p>The benefit of this flow is that it preserves traceability. When a similar failure shows up later, the commit history and CI logs show how it was diagnosed at the time.&lt;/p>
&lt;h2 id="發布-gate-路由">Release gate routing&lt;/h2>
&lt;p>The job of a release gate is to turn "whether to enter the next stage" into explicit conditions. This page only covers operational routing after a failure; the design principles behind &lt;a href="https://tarrragon.github.io/blog/ci/knowledge-cards/required-checks/" data-link-title="Required Checks" data-link-desc="說明 pull request 的必要檢查如何作為合併 gate">required checks&lt;/a>, job &lt;code>needs&lt;/code>, &lt;a href="https://tarrragon.github.io/blog/ci/knowledge-cards/environment-protection/" data-link-title="Environment Protection" data-link-desc="說明目標環境的審核、權限與放行條件如何保護發布">environment protection&lt;/a>, and &lt;a href="https://tarrragon.github.io/blog/ci/knowledge-cards/artifact-handoff/" data-link-title="Artifact Handoff" data-link-desc="說明測試與部署如何共用同一份可追溯產物">artifact handoff&lt;/a> are covered separately in &lt;a href="../ci-gate-workflow-boundary/">CI gate and workflow boundaries&lt;/a>.&lt;/p></description><content:encoded><![CDATA[<p>The core responsibility of CI/CD failure handling is to turn a red light into a clear next-step route. The red light itself is a signal from the verification or delivery layer; the engineering workflow has to find the failing layer, reproduce the same condition, fix it, and then let the <a href="/blog/ci/knowledge-cards/ci-pipeline/" data-link-title="CI Pipeline" data-link-desc="說明持續整合如何在合併前自動驗證變更品質與相容性">CI Pipeline</a> prove again that the change is releasable.</p>
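<h2 id="失敗後先看什麼">What to check first after a failure</h2>
<p>The first step after a failure is to locate the workflow and the job. A CI/CD system splits one push, pull request, tag, or release into several workflows, and each workflow contains several jobs; the real next step depends on which layer failed.</p>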
<table>
  <thead>
      <tr>
          <th>Failure location</th>
          <th>Common causes</th>
          <th>Next-step route</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Lint / format</td>
          <td>Code, document, or config formatting does not conform</td>
          <td>Run the same lint / format command locally</td>
      </tr>
      <tr>
          <td>Test</td>
          <td>Regression in unit, integration, browser, or device tests</td>
          <td>Download the report and reproduce locally under the same conditions</td>
      </tr>
      <tr>
          <td>Build</td>
          <td>Compilation, bundling, packaging, or static output fails</td>
          <td>Run the production build entry point locally</td>
      </tr>
      <tr>
          <td>Package</td>
          <td>Image, app bundle, or <a href="/blog/ci/knowledge-cards/artifact/" data-link-title="Artifact" data-link-desc="說明 CI/CD 中可被驗證、保存與發布的交付產物">artifact</a> generation fails</td>
          <td>Check version, signing, registry, or paths</td>
      </tr>
      <tr>
          <td>Deploy</td>
          <td>Hosting, runtime, store, or permission settings</td>
          <td>First confirm the build <a href="/blog/ci/knowledge-cards/artifact/" data-link-title="Artifact" data-link-desc="說明 CI/CD 中可被驗證、保存與發布的交付產物">artifact</a> was produced successfully</td>
      </tr>
  </tbody>
</table>
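<p>A lint / format failure means a static contract did not pass. Typical cases are code formatting, document formatting, type checks, schemas, or config rules that do not meet the standard. The fix path for this kind of failure is usually short: read the error message, fix the source, run the formatter if needed, and commit the fix.</p>
<p>A test failure means some behavior or contract did not meet expectations. Start with the report, screenshot, trace, device log, or error context, and determine whether the feature really regressed, the test's assumptions are out of date, or the test environment lacks a production-like artifact. Before changing the test itself, confirm which user or system behavior the test was originally guarding.</p>
<p>A build failure means the pipeline has not yet produced a deployable artifact. These failures usually come from compile errors, bundle configuration, dependency versions, environment variables, templates, or resource paths. When fixing, use the project's defined production build command as the minimal reproduction entry point.</p>
<p>A deploy failure means the release action did not complete. First distinguish whether the artifact exists, whether the release channel's permissions are correct, and whether environment protection has approved the run. If tests and build have succeeded, a deploy failure is most likely a release-channel problem; if no artifact was produced, go back to the build or package stage.</p>
<h2 id="本機重現流程">Local reproduction flow</h2>
<p>The job of local reproduction is to base the fix on the same verification conditions. CI is a set of commands executed in a clean environment; once the same failure can be produced locally, the fix can be verified quickly.</p>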





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">make build
</span></span><span class="line"><span class="ln">2</span><span class="cl">make <span class="nb">test</span>
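</span></span><span class="line"><span class="ln">3</span><span class="cl">make deploy-dry-run</span></span></code></pre></div><p>The build command verifies that the production artifact can be produced. This step should stay close to the build entry point CI uses, so that development mode does not mask production problems.</p>
<p>The test command verifies the artifact or the program's behavior. For a frontend this may be browser tests, for a backend integration / contract tests, for an app device tests, and for Docker an image scan or smoke test.</p>
<p>The deploy dry-run command verifies release preconditions. A high-risk deployment should at least be able to check the artifact, permissions, environment, and version information; a project without a dry-run should still keep an equivalent preflight check.</p>
<h2 id="修復與重新觸發">Fix and re-trigger</h2>
<p>The core of the fix flow is to let CI re-verify with a new commit. The normal flow does not require deleting the failed commit or force pushing; the failed commit stays in history, and the follow-up fix commit forms a clear repair trail.</p>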
<ol>
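<li>Read the failed job's log or artifact.</li>
<li>Run the corresponding command locally to reproduce.</li>
<li>Change the smallest necessary scope.</li>
<li>Run the same local command to confirm the fix.</li>
<li>Commit and push.</li>
<li>Wait for GitHub Actions to run again.</li>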
</ol>
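<p>The benefit of this flow is that it preserves traceability. When a similar failure shows up later, the commit history and CI logs show how it was diagnosed at the time.</p>
<h2 id="發布-gate-路由">Release gate routing</h2>
<p>The job of a release gate is to turn "whether to enter the next stage" into explicit conditions. This page only covers operational routing after a failure; the design principles behind <a href="/blog/ci/knowledge-cards/required-checks/" data-link-title="Required Checks" data-link-desc="說明 pull request 的必要檢查如何作為合併 gate">required checks</a>, job <code>needs</code>, <a href="/blog/ci/knowledge-cards/environment-protection/" data-link-title="Environment Protection" data-link-desc="說明目標環境的審核、權限與放行條件如何保護發布">environment protection</a>, and <a href="/blog/ci/knowledge-cards/artifact-handoff/" data-link-title="Artifact Handoff" data-link-desc="說明測試與部署如何共用同一份可追溯產物">artifact handoff</a> are covered separately in <a href="../ci-gate-workflow-boundary/">CI gate and workflow boundaries</a>.</p>
<h2 id="常見處理情境">Common scenarios</h2>
<p>When CI fails but the local run passes, check environment differences first. Common differences include language version, package manager version, missing submodules, missing build artifacts, uninstalled test dependencies, time zone, or file-name case sensitivity. For this kind of problem, write the versions and build prerequisites into the workflow, Makefile, or script so the reproduction conditions become part of the project.</p>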
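<p>As a hedged sketch of pinning such reproduction conditions in a workflow: the fragment below assumes a Node-based project that records its Node version in a hypothetical <code>.nvmrc</code> file and exposes a hypothetical <code>make test</code> target. Neither is confirmed by this page, so treat the names as placeholders.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Illustrative steps only; file names and make targets are assumptions.
steps:
  - uses: actions/checkout@v4
    with:
      submodules: recursive       # fetch submodules so CI sees the same tree as a full local clone
  - uses: actions/setup-node@v4
    with:
      node-version-file: .nvmrc   # one version source shared by CI and local shells
  - run: npm ci                   # install exactly what the lockfile records
  - run: make test
    env:
      TZ: UTC                     # make the time-zone assumption explicit
</code></pre></div>
<p>Keeping the version in a file that both CI and local tooling read means a version drift shows up as a diff in the repository rather than as an unexplained red light.</p>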
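<p>When tests are unstable, first label the <a href="/blog/ci/knowledge-cards/flaky-test/" data-link-title="Flaky Test" data-link-desc="說明非決定性測試如何降低 CI gate 信任度與治理方式">Flaky Test</a> status and assign an owner. In the short term the test can be quarantined or re-run; in the long term the source of instability has to be found, for example wrong wait conditions, external network dependencies, timing assumptions, unstable test data, or animation transitions that have not finished. Unstable tests lower trust in the gate, so they are themselves a CI problem that needs governance.</p>
<p>When deploy fails but tests pass, look at the artifact and permissions first. If the build output exists and can be downloaded, the problem is usually in the deployment channel, token permissions, or <a href="/blog/ci/knowledge-cards/environment-protection/" data-link-title="Environment Protection" data-link-desc="說明目標環境的審核、權限與放行條件如何保護發布">environment protection</a>; if the artifact is missing, go back to the build job.</p>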
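<h2 id="反模式與替代做法">Anti-patterns and alternatives</h2>
<table>
  <thead>
      <tr>
          <th>Anti-pattern</th>
          <th>Risk</th>
          <th>Alternative</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Re-running as soon as the light turns red</td>
          <td>Hides flaky tests or environment problems</td>
          <td>Read the failure log first, then decide whether to re-run</td>
      </tr>
      <tr>
          <td>Using <code>--no-verify</code> or skipping CI</td>
          <td>Carries a local problem into the main branch</td>
          <td>Fix the gate or explicitly record the exception</td>
      </tr>
      <tr>
          <td>CI and local commands differ</td>
          <td>Passes locally but fails in CI</td>
          <td>Converge the commands into a Makefile / npm script</td>
      </tr>
      <tr>
          <td>Tests call external services directly</td>
          <td>Network and third-party state contaminate the verdict</td>
          <td>Use fixtures, mocks, or a controlled environment</td>
      </tr>
  </tbody>
</table>
<p>What these anti-patterns share is that they strip CI of its diagnostic value. The goal of CI is for a green light to mean "this change is releasable under the defined conditions."</p>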
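<h2 id="最小可用流程">Minimum viable flow</h2>
<p>The minimum viable flow gives every change the same path. For a small static site or personal blog, getting the following four things in place is enough to establish a stable release rhythm.</p>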
<ol>
<li><code>push</code> or a PR triggers lint / test / build.</li>
<li>The production build has a single entry point.</li>
<li>Artifacts or reports are kept when tests fail.</li>
<li>Deploy only accepts artifacts that have passed test and build.</li>
</ol>
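<p>The four items above could be sketched as one workflow. This is a minimal illustration, not this site's actual configuration: the <code>make lint</code>, <code>make test</code>, <code>make build</code>, and <code>make deploy</code> targets, the <code>report/</code> and <code>public/</code> paths, and the job names are all assumptions.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Hypothetical minimal workflow; targets, paths, and job names are assumptions.
name: ci
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint
      - run: make test
      - run: make build               # single production build entry point
      - name: Keep the report when something fails
        if: failure()                 # runs only after an earlier step failed
        uses: actions/upload-artifact@v4
        with:
          name: test-report
          path: report/
      - name: Hand the build output to later jobs
        uses: actions/upload-artifact@v4
        with:
          name: site
          path: public/
  deploy:
    needs: verify                     # skipped unless verify succeeded
    if: github.event_name == 'push'   # never deploy from a pull request
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: site
          path: public/
      - run: make deploy
</code></pre></div>
<p>Putting both jobs in one workflow is a deliberate simplification here: <code>needs</code> can then express the gate directly, and deploy consumes the same artifact that the verify job produced.</p>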
<p>Once this flow is in place, a red CI light becomes a clear routing signal: which layer broke, which command reproduces it, and which gate lets the fix through.</p>
<p>If the change involves backend services, the backend knowledge cards <a href="/blog/backend/knowledge-cards/runbook/" data-link-title="Runbook" data-link-desc="說明 runbook 如何把事故判斷與操作步驟標準化">Runbook</a>, <a href="/blog/backend/knowledge-cards/rollback-strategy/" data-link-title="Rollback Strategy" data-link-desc="說明事故期間如何判斷回滾、回切與暫停變更">Rollback Strategy</a>, and <a href="/blog/backend/knowledge-cards/release-gate/" data-link-title="Release Gate" data-link-desc="說明變更在正式釋出前如何通過或阻擋">Release Gate</a> can be consulted to further refine the incident handling order and release conditions.</p>
<h2 id="下一步路由">Where to go next</h2>
<ul>
<li>To understand where CI sits in the reliability module: read <a href="/blog/backend/06-reliability/ci-pipeline/" data-link-title="6.1 CI pipeline" data-link-desc="CI pipeline 的分層策略、artifact 管理、flaky 治理與 release gate 輸入">6.1 CI pipeline</a>.</li>
<li>To see a static site deployment case: read <a href="../blog-project-deploy/">This blog's project deployment</a>.</li>
<li>To understand CI gate design: read <a href="../ci-gate-workflow-boundary/">CI gate and workflow boundaries</a>.</li>
<li>To understand release blocking strategy: read <a href="/blog/backend/06-reliability/release-gate/" data-link-title="6.8 Release Gate 與變更節奏" data-link-desc="把驗證、migration、相容性納入放行判準">6.8 Release Gate and change cadence</a>.</li>
</ul>
]]></content:encoded></item><item><title>GitHub Actions workflows in this blog project</title><link>https://tarrragon.github.io/blog/ci/blog-project-deploy/github-actions-workflows/</link><pubDate>Wed, 06 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/ci/blog-project-deploy/github-actions-workflows/</guid><description>&lt;p>This blog's GitHub Actions workflows split content checks, browser regression tests, Hugo publishing, and Claude collaboration into separate automated flows. Each workflow is an independent entry point; when maintaining one, first be clear whether it protects content quality, user behavior, the release artifact, or the collaboration process.&lt;/p>
&lt;h2 id="workflow-總覽">Workflow overview&lt;/h2>
&lt;p>The project currently has five workflows. Three belong to the main CI / CD flow, and two are Claude collaboration helpers.&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Workflow&lt;/th>
 &lt;th>File&lt;/th>
 &lt;th>Trigger&lt;/th>
 &lt;th>Core responsibility&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>md-check&lt;/code>&lt;/td>
 &lt;td>&lt;code>.github/workflows/md-check.yml&lt;/code>&lt;/td>
 &lt;td>push / pull request to &lt;code>main&lt;/code>&lt;/td>
 &lt;td>Checks the content Markdown contract&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>Playwright tests&lt;/code>&lt;/td>
 &lt;td>&lt;code>.github/workflows/playwright.yml&lt;/code>&lt;/td>
 &lt;td>push / pull request to &lt;code>main&lt;/code>&lt;/td>
 &lt;td>Verifies browser-level behavior and layout regressions&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>Deploy Hugo site to Pages&lt;/code>&lt;/td>
 &lt;td>&lt;code>.github/workflows/deploy.yml&lt;/code>&lt;/td>
 &lt;td>push to &lt;code>main&lt;/code>&lt;/td>
 &lt;td>Builds Hugo, generates the search index, and deploys&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>Claude Code&lt;/code>&lt;/td>
 &lt;td>&lt;code>.github/workflows/claude.yml&lt;/code>&lt;/td>
 &lt;td>issue / comment / review that calls Claude&lt;/td>
 &lt;td>Lets Claude read issues, PRs, and CI results&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>Claude Code Review&lt;/code>&lt;/td>
 &lt;td>&lt;code>.github/workflows/claude-code-review.yml&lt;/code>&lt;/td>
 &lt;td>PR opened / synchronize and similar events&lt;/td>
 &lt;td>Runs a Claude code review on the PR&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
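&lt;p>This table serves as the entry point. When GitHub Actions goes red, match the workflow name first and classify the failure as content check, browser test, deployment, or collaboration flow.&lt;/p>
&lt;h2 id="md-check">&lt;code>md-check&lt;/code>&lt;/h2>
&lt;p>&lt;code>md-check&lt;/code> keeps the Markdown under &lt;code>content/&lt;/code> on a single structural contract. It first builds &lt;code>scripts/mdtools&lt;/code> with Go, then runs the formatter check, lint, and card link check in order.&lt;/p>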





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">md-check&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">on&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">push&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">branches&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="l">main]&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">pull_request&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">branches&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">[&lt;/span>&lt;span class="l">main]&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>The core steps of this workflow are:&lt;/p>
&lt;ol>
&lt;li>&lt;code>actions/checkout@v6&lt;/code>&lt;/li>
&lt;li>&lt;code>actions/setup-go@v6&lt;/code>&lt;/li>
&lt;li>&lt;code>go build -o ../../bin/mdtools&lt;/code>&lt;/li>
&lt;li>&lt;code>./bin/mdtools fmt --check content/&lt;/code>&lt;/li>
&lt;li>&lt;code>./bin/mdtools lint content/&lt;/code>&lt;/li>
&lt;li>&lt;code>./bin/mdtools cards content/&lt;/code>&lt;/li>
&lt;/ol>
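&lt;p>When &lt;code>md-check&lt;/code> fails, the next step is to run the same set of commands locally. A &lt;code>fmt --check&lt;/code> failure means the formatting can be fixed with &lt;code>fmt --fix&lt;/code>; a &lt;code>lint&lt;/code> failure means a structural contract such as headings, front matter, URLs, or code blocks is violated; a &lt;code>cards&lt;/code> failure means card links, orphans, or the K4 rule need fixing.&lt;/p>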





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">./bin/mdtools fmt --check content/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">./bin/mdtools lint content/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">./bin/mdtools cards content/&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>When maintaining this workflow, the rule source must stay aligned with &lt;a href="https://tarrragon.github.io/blog/posts/blog-markdown-%E5%AF%AB%E4%BD%9C%E8%A6%8F%E7%AF%84%E8%88%87-mdtools-%E6%AA%A2%E6%9F%A5/" data-link-title="Blog Markdown 寫作規範與 mdtools 檢查" data-link-desc="本 blog 的 Markdown 排版規範權威契約。涵蓋 H1 禁用、MD024 siblings_only、反釣魚 TLD 校驗、卡片雙向完整性、front matter schema；改規則時要與 scripts/mdtools 實作同步。">Blog Markdown Writing Conventions and mdtools Checks&lt;/a>. When changing &lt;code>scripts/mdtools/internal/rules/&lt;/code>, update the conventions article as well so that CI behavior and the documentation do not diverge.&lt;/p>
&lt;h2 id="playwright-tests">&lt;code>Playwright tests&lt;/code>&lt;/h2>
&lt;p>&lt;code>Playwright tests&lt;/code> verifies user-visible behavior. It first builds the full Hugo site and the Pagefind index, then uses Chromium to verify search, layout, and interaction.&lt;/p></description><content:encoded><![CDATA[<p>This blog's GitHub Actions workflows split content checks, browser regression tests, Hugo publishing, and Claude collaboration into separate automated flows. Each workflow is an independent entry point; when maintaining one, first be clear whether it protects content quality, user behavior, the release artifact, or the collaboration process.</p>
<h2 id="workflow-總覽">Workflow overview</h2>
<p>The project currently has five workflows. Three belong to the main CI / CD flow, and two are Claude collaboration helpers.</p>
<table>
  <thead>
      <tr>
          <th>Workflow</th>
          <th>File</th>
          <th>Trigger</th>
          <th>Core responsibility</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>md-check</code></td>
          <td><code>.github/workflows/md-check.yml</code></td>
          <td>push / pull request to <code>main</code></td>
          <td>Checks the content Markdown contract</td>
      </tr>
      <tr>
          <td><code>Playwright tests</code></td>
          <td><code>.github/workflows/playwright.yml</code></td>
          <td>push / pull request to <code>main</code></td>
          <td>Verifies browser-level behavior and layout regressions</td>
      </tr>
      <tr>
          <td><code>Deploy Hugo site to Pages</code></td>
          <td><code>.github/workflows/deploy.yml</code></td>
          <td>push to <code>main</code></td>
          <td>Builds Hugo, generates the search index, and deploys</td>
      </tr>
      <tr>
          <td><code>Claude Code</code></td>
          <td><code>.github/workflows/claude.yml</code></td>
          <td>issue / comment / review that calls Claude</td>
          <td>Lets Claude read issues, PRs, and CI results</td>
      </tr>
      <tr>
          <td><code>Claude Code Review</code></td>
          <td><code>.github/workflows/claude-code-review.yml</code></td>
          <td>PR opened / synchronize and similar events</td>
          <td>Runs a Claude code review on the PR</td>
      </tr>
  </tbody>
</table>
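<p>This table serves as the entry point. When GitHub Actions goes red, match the workflow name first and classify the failure as content check, browser test, deployment, or collaboration flow.</p>
<h2 id="md-check"><code>md-check</code></h2>
<p><code>md-check</code> keeps the Markdown under <code>content/</code> on a single structural contract. It first builds <code>scripts/mdtools</code> with Go, then runs the formatter check, lint, and card link check in order.</p>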





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln">1</span><span class="cl"><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">md-check</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="nt">on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">  </span><span class="nt">push</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="nt">branches</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">main]</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">  </span><span class="nt">pull_request</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">    </span><span class="nt">branches</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">main]</span></span></span></code></pre></div><p>The core steps of this workflow are:</p>
<ol>
<li><code>actions/checkout@v6</code></li>
<li><code>actions/setup-go@v6</code></li>
<li><code>go build -o ../../bin/mdtools</code></li>
<li><code>./bin/mdtools fmt --check content/</code></li>
<li><code>./bin/mdtools lint content/</code></li>
<li><code>./bin/mdtools cards content/</code></li>
</ol>
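<p>When <code>md-check</code> fails, the next step is to run the same set of commands locally. A <code>fmt --check</code> failure means the formatting can be fixed with <code>fmt --fix</code>; a <code>lint</code> failure means a structural contract such as headings, front matter, URLs, or code blocks is violated; a <code>cards</code> failure means card links, orphans, or the K4 rule need fixing.</p>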





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">./bin/mdtools fmt --check content/
</span></span><span class="line"><span class="ln">2</span><span class="cl">./bin/mdtools lint content/
</span></span><span class="line"><span class="ln">3</span><span class="cl">./bin/mdtools cards content/</span></span></code></pre></div><p>When maintaining this workflow, the rule source must stay aligned with <a href="/blog/posts/blog-markdown-%E5%AF%AB%E4%BD%9C%E8%A6%8F%E7%AF%84%E8%88%87-mdtools-%E6%AA%A2%E6%9F%A5/" data-link-title="Blog Markdown 寫作規範與 mdtools 檢查" data-link-desc="本 blog 的 Markdown 排版規範權威契約。涵蓋 H1 禁用、MD024 siblings_only、反釣魚 TLD 校驗、卡片雙向完整性、front matter schema；改規則時要與 scripts/mdtools 實作同步。">Blog Markdown Writing Conventions and mdtools Checks</a>. When changing <code>scripts/mdtools/internal/rules/</code>, update the conventions article as well so that CI behavior and the documentation do not diverge.</p>
<h2 id="playwright-tests"><code>Playwright tests</code></h2>
<p><code>Playwright tests</code> verifies user-visible behavior. It first builds the full Hugo site and the Pagefind index, then uses Chromium to verify search, layout, and interaction.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln">1</span><span class="cl"><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Playwright tests</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="nt">on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">  </span><span class="nt">push</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="nt">branches</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">main]</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">  </span><span class="nt">pull_request</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">    </span><span class="nt">branches</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">main]</span></span></span></code></pre></div><p>The core steps of this workflow are:</p>
<ol>
<li>checkout, including submodules</li>
<li>Install Hugo <code>0.148.2</code> extended</li>
<li>Install Node <code>24</code></li>
<li><code>npm ci</code></li>
<li><code>npx playwright install --with-deps chromium</code></li>
<li><code>make site</code></li>
<li><code>npx playwright test</code></li>
<li>On failure, upload <code>playwright-report/</code></li>
</ol>
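<p>Steps 7 and 8 might look like the following fragment. It is a sketch rather than the project's actual YAML: the step names and the <code>retention-days</code> value are assumptions, while <code>if: failure()</code> is the standard expression for running a step only after an earlier step has failed.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Illustrative steps; names and retention period are assumptions.
- name: Run Playwright tests
  run: npx playwright test
- name: Upload Playwright report
  if: failure()                    # skipped on green runs, kept for diagnosis on red runs
  uses: actions/upload-artifact@v4
  with:
    name: playwright-report
    path: playwright-report/
    retention-days: 7              # assumed value
</code></pre></div>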
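<p><code>make site</code> is the key prerequisite of this workflow. It produces the Hugo static files and three Pagefind indexes: <code>pagefind</code>, <code>pagefind-title</code>, and <code>pagefind-content</code>. Running Playwright after only <code>hugo --minify</code> makes the search tests fail because the indexes are missing.</p>
<p>When Playwright fails, the next step is to download <code>playwright-report</code> or read the error context. If the failure is on the search page, first confirm that <code>make site</code> completed successfully; if it is a layout failure, look at the screenshot, bounding box, or computed style first; if it is an interaction failure, first check whether the selector still targets the real DOM.</p>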





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">make site
</span></span><span class="line"><span class="ln">2</span><span class="cl">npm test</span></span></code></pre></div><p>When maintaining this workflow, tests should guard user behavior, not just implementation details. Layout behavior such as the TOC's RWD can be pinned with viewport tests for three states: desktop, laptop, and phone.</p>
<h2 id="deploy-hugo-site-to-pages"><code>Deploy Hugo site to Pages</code></h2>
<p><code>Deploy Hugo site to Pages</code> builds the content on <code>main</code> into a GitHub Pages artifact and deploys it. It triggers only on push to <code>main</code> and does not deploy on pull requests.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln">1</span><span class="cl"><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Deploy Hugo site to Pages</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w"></span><span class="nt">on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">  </span><span class="nt">push</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="nt">branches</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">      </span>- <span class="l">main</span></span></span></code></pre></div><p>This workflow has two jobs:</p>
<table>
  <thead>
      <tr>
          <th>Job</th>
          <th>Responsibility</th>
          <th>Key setting</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>build</code></td>
          <td>checkout, Hugo build, Pagefind, artifact</td>
          <td><code>runs-on: ubuntu-latest</code></td>
      </tr>
      <tr>
          <td><code>deploy</code></td>
          <td>Publishes to GitHub Pages</td>
          <td><code>needs: build</code></td>
      </tr>
  </tbody>
</table>
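<p>The <code>build</code> job first runs <code>hugo --minify</code> and writes the output to <code>hugo-build-output.txt</code>. It currently sets <code>continue-on-error: true</code>, so when the Hugo build fails the workflow moves on to the Claude Debug step, which tries to have Claude analyze the error and commit a fix.</p>
<p><code>Fail if build was not fixed</code> is the second line of protection. If the original Hugo build failed, the workflow runs <code>hugo --minify</code> again; if Claude did not fix it, this step stops the workflow.</p>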
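<p>The two-stage protection can be sketched roughly as follows. The step id <code>hugo</code>, the <code>tee</code> redirection, and the placeholder comment for the Claude Debug step are assumptions made for illustration; the real <code>deploy.yml</code> may be written differently.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Illustrative steps; the step id and the output redirection are assumptions.
- name: Build with Hugo
  id: hugo                         # assumed id, referenced by the conditions below
  continue-on-error: true          # a failed build does not stop the job at this point
  shell: bash                      # explicit bash runs with pipefail, so a hugo failure survives the pipe
  run: hugo --minify 2&gt;&amp;1 | tee hugo-build-output.txt
# ... Claude Debug step would go here, guarded by: if: steps.hugo.outcome == 'failure'
- name: Fail if build was not fixed
  if: steps.hugo.outcome == 'failure'   # outcome is the result before continue-on-error is applied
  run: hugo --minify               # a non-zero exit here fails the job
</code></pre></div>
<p>The design point is that <code>continue-on-error</code> only defers the verdict: the second build restores a hard failure, so a broken site cannot reach the artifact upload.</p>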
<p>The Pagefind indexes are generated after the Hugo build:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">npx -y pagefind --site public --root-selector main
</span></span><span class="line"><span class="ln">2</span><span class="cl">npx -y pagefind --site public --root-selector <span class="s2">&#34;article.article-content &gt; h1&#34;</span> --output-subdir pagefind-title
</span></span><span class="line"><span class="ln">3</span><span class="cl">npx -y pagefind --site public --root-selector <span class="s2">&#34;.article-body&#34;</span> --output-subdir pagefind-content</span></span></code></pre></div><p>When deploy fails, the next step is to diagnose by layer. If the <code>build</code> job failed, go back to Hugo or Pagefind; if <code>Upload artifact</code> succeeded but the <code>deploy</code> job failed, check the Pages environment, permissions, artifact, and GitHub Pages settings.</p>
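<p>For the <code>deploy</code> layer, the settings worth checking are the ones a Pages deployment job typically declares. The fragment below follows the common GitHub Pages pattern and is an assumption about the general shape, not a copy of this project's <code>deploy.yml</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Typical Pages deploy job, shown as a sketch; details may differ in the real workflow.
deploy:
  needs: build                     # runs only after the build job succeeds
  runs-on: ubuntu-latest
  permissions:
    pages: write                   # allows publishing to GitHub Pages
    id-token: write                # lets the deployment verify its origin via OIDC
  environment:
    name: github-pages             # protection rules on this environment can hold the job
    url: ${{ steps.deployment.outputs.page_url }}
  steps:
    - name: Deploy to GitHub Pages
      id: deployment
      uses: actions/deploy-pages@v4
</code></pre></div>
<p>Each key maps to one check from the paragraph above: <code>needs</code> to the artifact, <code>permissions</code> to the token, and <code>environment</code> to the Pages environment and its protection rules.</p>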
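<p>One caveat about this workflow today: the deploy workflow does not itself <code>needs</code> <code>md-check</code> or <code>Playwright tests</code>, because <code>needs</code> only links jobs inside one workflow and these are independent workflows. This is the project's actual boundary at present; for gate design principles see <a href="../../ci-gate-workflow-boundary/">CI gate and workflow boundaries</a>.</p>
<h2 id="claude-code"><code>Claude Code</code></h2>
<p><code>Claude Code</code> provides an interactive entry point for collaborating with Claude. It does not automatically fix code on every push; it triggers when an issue, comment, or review contains <code>@claude</code>.</p>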





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln">1</span><span class="cl"><span class="nt">on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">  </span><span class="nt">issue_comment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="nt">types</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">created]</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">  </span><span class="nt">pull_request_review_comment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">    </span><span class="nt">types</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">created]</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">  </span><span class="nt">issues</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="w">    </span><span class="nt">types</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">opened, assigned]</span><span class="w">
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="w">  </span><span class="nt">pull_request_review</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="w">    </span><span class="nt">types</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">submitted]</span></span></span></code></pre></div><p>The gate of this workflow is written in the job-level <code>if</code>. Only the following situations actually run it:</p>
<ul>
<li>An issue comment contains <code>@claude</code></li>
<li>A pull request review comment contains <code>@claude</code></li>
<li>A pull request review body contains <code>@claude</code></li>
<li>An issue title or body contains <code>@claude</code></li>
</ul>
<p>This workflow grants Claude the <code>actions: read</code> permission so it can read CI results on the PR. This matters for requests like "ask Claude why CI failed," because Claude needs to read the workflow run, job log, or check results to make a judgment.</p>
<p>When maintaining this workflow, the priority is least privilege. It currently grants <code>contents: read</code>, <code>pull-requests: read</code>, <code>issues: read</code>, <code>id-token: write</code>, and <code>actions: read</code>, which suits interactive analysis; only if Claude is later allowed to commit directly do write permissions and protection conditions need to be re-evaluated.</p>
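<p>The gate and permissions described above can be sketched as a job header. The permission list is taken from this page; the job name <code>claude</code> and the exact <code>if</code> expression are assumptions modeled on the four listed conditions.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Sketch of the job-level gate; job name and expression layout are assumptions.
jobs:
  claude:
    if: |
      (github.event_name == 'issue_comment' &amp;&amp; contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review_comment' &amp;&amp; contains(github.event.comment.body, '@claude')) ||
      (github.event_name == 'pull_request_review' &amp;&amp; contains(github.event.review.body, '@claude')) ||
      (github.event_name == 'issues' &amp;&amp; (contains(github.event.issue.body, '@claude') || contains(github.event.issue.title, '@claude')))
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: read
      issues: read
      id-token: write
      actions: read                # needed to read workflow runs and job logs
</code></pre></div>
<p>Because the events in <code>on</code> fire for every comment or review, the <code>if</code> is what keeps the job from running on unrelated activity.</p>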
<h2 id="claude-code-review"><code>Claude Code Review</code></h2>
<p><code>Claude Code Review</code> runs a Claude code review when PR events occur. It differs from <code>Claude Code</code>: the former is PR review automation, the latter is an interactive entry point woken up by <code>@claude</code>.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln">1</span><span class="cl"><span class="nt">on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">  </span><span class="nt">pull_request</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="nt">types</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">opened, synchronize, ready_for_review, reopened]</span></span></span></code></pre></div><p>This workflow uses <code>code-review@claude-code-plugins</code>, and the prompt is:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">/code-review:code-review ${{ github.repository }}/pull/${{ github.event.pull_request.number }}</span></span></code></pre></div><p>它的責任是提供 review 視角。Claude review 可以指出風險、邏輯問題或測試缺口；真正阻擋合併與發布的責任仍在 <a href="/blog/ci/knowledge-cards/required-checks/" data-link-title="Required Checks" data-link-desc="說明 pull request 的必要檢查如何作為合併 gate">Required Checks</a>、測試 workflow 與 deploy gate。</p>
<p>維護這條 workflow 時，可以依 PR 類型決定是否加 path filter。若未來只想在程式碼或 workflow 變更時觸發，可打開 <code>paths</code> 設定；若希望文章內容也被 review，就維持目前全 PR 觸發。</p>
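<p>A sketch of what enabling such a filter could look like. The two path patterns are hypothetical examples and should be replaced with the directories that actually hold code in this repository:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml">on:
  pull_request:
    types: [opened, synchronize, ready_for_review, reopened]
    # Hypothetical filter: trigger only on workflow or script changes.
    paths:
      - '.github/workflows/**'
      - 'scripts/**'</code></pre></div>
<p>With this filter in place, a PR that touches only article content would not start the review workflow at all.</p>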
<h2 id="本專案的發布阻擋邊界">本專案的發布阻擋邊界</h2>
<p>本 blog 的發布阻擋邊界需要同時看 YAML 與 GitHub repository 設定。這一節只記錄本專案目前能從 YAML 判讀出的事實；required checks、environment protection 與 artifact handoff 的原理不在本頁展開。</p>
<p>目前從 YAML 可直接確認的阻擋關係是：</p>
<table>
  <thead>
      <tr>
          <th>Relationship</th>
          <th>Explicit in the YAML?</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>deploy</code> waits for <code>build</code></td>
          <td>Yes</td>
          <td>The <code>deploy</code> job has <code>needs: build</code></td>
      </tr>
      <tr>
          <td><code>deploy</code> waits for <code>md-check</code></td>
          <td>No</td>
          <td><code>md-check</code> is a separate workflow</td>
      </tr>
      <tr>
          <td><code>deploy</code> waits for Playwright</td>
          <td>No</td>
          <td><code>Playwright tests</code> is a separate workflow</td>
      </tr>
      <tr>
          <td>PRs must pass tests before merging</td>
          <td>Check the repository settings</td>
          <td>Depends on the GitHub branch protection settings</td>
      </tr>
      <tr>
          <td>Pages deploy requires manual approval</td>
          <td>Check the environment settings</td>
          <td>Depends on the GitHub Pages environment protection settings</td>
      </tr>
  </tbody>
</table>
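<p>The only relationship in the table that the YAML expresses directly is <code>needs</code>. A minimal sketch with placeholder steps, not the project's actual <code>deploy.yml</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml">jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "build the site"    # placeholder step
  deploy:
    needs: build                      # deploy starts only after build succeeds
    runs-on: ubuntu-latest
    steps:
      - run: echo "publish to Pages"  # placeholder step</code></pre></div>
<p><code>needs</code> can only name jobs defined in the same workflow file, which is why jobs that live in other workflows, such as <code>md-check</code>, cannot gate <code>deploy</code> this way.</p>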
<p>If tests are later found to be red while Pages still publishes, this page only points out the current workflow boundaries. For the concrete fix, go back to <a href="../../ci-gate-workflow-boundary/">CI gate and workflow boundaries</a>, and cross-check <a href="/blog/ci/knowledge-cards/required-checks/" data-link-title="Required Checks" data-link-desc="說明 pull request 的必要檢查如何作為合併 gate">Required Checks</a> and <a href="/blog/ci/knowledge-cards/environment-protection/" data-link-title="Environment Protection" data-link-desc="說明目標環境的審核、權限與放行條件如何保護發布">Environment Protection</a>.</p>
<h2 id="失敗時的維護路由">Maintenance routing on failure</h2>
<p>On failure, locate the workflow first, then the job, then reproduce the problem locally. This avoids fixing the wrong problem at the wrong layer.</p>
<table>
  <thead>
      <tr>
          <th>Where the red light is</th>
          <th>What to check first</th>
          <th>Local reproduction command</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>md-check</code></td>
          <td>mdtools messages</td>
          <td><code>./bin/mdtools lint content/</code></td>
      </tr>
      <tr>
          <td><code>Playwright tests</code></td>
          <td><code>playwright-report</code> / error context</td>
          <td><code>make site</code>, then <code>npm test</code></td>
      </tr>
      <tr>
          <td>Hugo build in <code>Deploy</code></td>
          <td><code>hugo-build-output.txt</code></td>
          <td><code>hugo --minify</code></td>
      </tr>
      <tr>
          <td>Pagefind in <code>Deploy</code></td>
          <td>Pagefind command output</td>
          <td><code>make site</code></td>
      </tr>
      <tr>
          <td>Pages step in <code>Deploy</code></td>
          <td>artifact / permission / environment</td>
          <td>GitHub Actions UI + Pages settings</td>
      </tr>
      <tr>
          <td><code>Claude Code</code></td>
          <td>secret / permission / trigger <code>if</code></td>
          <td>Check the <code>@claude</code> trigger text and the secrets</td>
      </tr>
      <tr>
          <td><code>Claude Code Review</code></td>
          <td>plugin marketplace / token</td>
          <td>Check the PR event, the secret, and the action log</td>
      </tr>
  </tbody>
</table>
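<p>This routing table also works as a maintenance checklist. When adding a workflow, document at least three things: the trigger conditions, which artifact or log to check on failure, and which command reproduces the failure locally.</p>
<h2 id="本專案維護注意事項">Maintenance notes for this project</h2>
<p>These notes record operational reminders tied directly to the current YAML. They are updated as the workflow implementation changes and do not cover general CI design principles.</p>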
<ul>
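<li><code>Playwright tests</code> depends on <code>make site</code> to generate the Pagefind index; when search tests fail, first confirm that the production build is complete.</li>
<li>The Hugo build in <code>deploy.yml</code> uses <code>continue-on-error: true</code>; the later Claude Debug and retry build steps catch the failure.</li>
<li><code>Claude Code</code> is currently a read-oriented interactive entry point; if it ever needs to write to the repo, the permissions must be reviewed again.</li>
<li>When the implementation of <code>.github/workflows/*.yml</code> changes, update this page as well so that the maintenance entry point stays trustworthy.</li>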
</ul>
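<p>The <code>continue-on-error</code> pattern in the second note can be sketched as follows. The step names and the <code>hugo_build</code> id are hypothetical, and the Claude Debug step is left out, so treat this as an illustration of the mechanism rather than a copy of <code>deploy.yml</code>:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml">steps:
  - name: Hugo build
    id: hugo_build            # hypothetical step id
    continue-on-error: true   # a failure here does not stop the job
    run: hugo --minify &gt; hugo-build-output.txt 2&gt;&amp;1

  - name: Retry build
    # outcome is the step result before continue-on-error is applied
    if: steps.hugo_build.outcome == 'failure'
    run: hugo --minify</code></pre></div>
<p>The retry condition reads <code>outcome</code> rather than <code>conclusion</code> because <code>conclusion</code> reports <code>success</code> for a failed step that has <code>continue-on-error: true</code>.</p>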
<h2 id="下一步路由">下一步路由</h2>
<ul>
<li>Handling a CI red light: read <a href="../../github-actions-failure-flow/">From CI failure to fix and release</a>.</li>
<li>CI gate design principles: read <a href="../../ci-gate-workflow-boundary/">CI gate and workflow boundaries</a>.</li>
<li>Where CI sits in the reliability module: read <a href="/blog/backend/06-reliability/ci-pipeline/" data-link-title="6.1 CI pipeline" data-link-desc="CI pipeline 的分層策略、artifact 管理、flaky 治理與 release gate 輸入">6.1 CI pipeline</a>.</li>
<li>Release gate design: read <a href="/blog/backend/06-reliability/release-gate/" data-link-title="6.8 Release Gate 與變更節奏" data-link-desc="把驗證、migration、相容性納入放行判準">6.8 Release Gate and change cadence</a>.</li>
<li>Markdown check rules: read <a href="/blog/posts/blog-markdown-%E5%AF%AB%E4%BD%9C%E8%A6%8F%E7%AF%84%E8%88%87-mdtools-%E6%AA%A2%E6%9F%A5/" data-link-title="Blog Markdown 寫作規範與 mdtools 檢查" data-link-desc="本 blog 的 Markdown 排版規範權威契約。涵蓋 H1 禁用、MD024 siblings_only、反釣魚 TLD 校驗、卡片雙向完整性、front matter schema；改規則時要與 scripts/mdtools 實作同步。">Blog Markdown writing conventions and mdtools checks</a>.</li>
</ul>
]]></content:encoded></item><item><title>Terraform CI Pipeline Setup Guide</title><link>https://tarrragon.github.io/blog/infra/07-infra-as-pr/terraform-ci-pipeline-setup/</link><pubDate>Fri, 26 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/infra/07-infra-as-pr/terraform-ci-pipeline-setup/</guid><description>&lt;p>For a Terraform PR workflow to deliver value, plan and apply need to run automatically in CI rather than on an engineer's own machine. This article uses GitHub Actions to build a complete pipeline: checks and plan run when a PR is opened, the plan result is posted back as a PR comment for reviewers, and apply runs only after the merge to the main branch. The whole pipeline obtains short-lived tokens through OIDC (see &lt;a href="https://tarrragon.github.io/blog/infra/02-identity-credentials/oidc-trust-policy-setup/" data-link-title="OIDC Trust Policy 設定指南" data-link-desc="GitHub Actions 與 AWS 之間的 OIDC 聯合設定：建立 provider、設計 trust policy 的 claim 收斂、plan 與 apply role 分離、常見錯誤排查">OIDC Trust Policy setup&lt;/a>) and stores no long-lived keys.&lt;/p>
&lt;h2 id="pipeline-的兩個階段">Pipeline 的兩個階段&lt;/h2>
&lt;p>整條 pipeline 分成兩個觸發時機，各自承擔不同責任：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>階段&lt;/th>
 &lt;th>觸發條件&lt;/th>
 &lt;th>責任&lt;/th>
 &lt;th>失敗時&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Plan&lt;/td>
 &lt;td>PR 開啟或更新&lt;/td>
 &lt;td>檢查格式、驗證語法、靜態掃描、產出 plan diff&lt;/td>
 &lt;td>PR 無法合併&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Apply&lt;/td>
 &lt;td>合併到 main&lt;/td>
 &lt;td>把 plan 過的變更套用到雲端&lt;/td>
 &lt;td>需要人工介入&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>兩個階段用不同的 IAM role：plan role 只有唯讀權限（能跑 &lt;code>terraform plan&lt;/code> 但不能改任何資源），apply role 有寫入權限。這個分離確保 PR 階段的任何 code 都沒辦法偷偷改動雲端資源。&lt;/p>
&lt;h2 id="plan-階段的完整-workflow">Plan 階段的完整 workflow&lt;/h2>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-yaml" data-lang="yaml">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Terraform Plan&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">on&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">pull_request&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">paths&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="s1">&amp;#39;infra/**&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">permissions&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">id-token&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">write&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">contents&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">read&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">pull-requests&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">write&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="nt">jobs&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">plan&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">runs-on&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ubuntu-latest&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">defaults&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">run&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">working-directory&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">infra/environments/prod&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">steps&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">uses&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">actions/checkout@v4&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">uses&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">aws-actions/configure-aws-credentials@v4&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">23&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">with&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">24&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">role-to-assume&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">arn:aws:iam::123456789012:role/infra-plan&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">25&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">aws-region&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">ap-northeast-1&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">26&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">27&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">uses&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">hashicorp/setup-terraform@v3&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">28&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">with&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">29&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">terraform_version&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="m">1.9.0&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">30&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">31&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Format check&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">32&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">run&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">terraform fmt -check -recursive -diff&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">33&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">34&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Init&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">35&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">run&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">terraform init -input=false&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">36&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">37&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Validate&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">38&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">run&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">terraform validate&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">39&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">40&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">TFLint&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">41&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">uses&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">terraform-linters/setup-tflint@v4&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">42&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">with&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">43&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">tflint_version&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">latest&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">44&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">run&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">tflint --recursive --format compact&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">45&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">46&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Plan&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">47&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">id&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">plan&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">48&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">run&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">|&lt;/span>&lt;span class="sd">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">49&lt;/span>&lt;span class="cl">&lt;span class="sd"> terraform plan -no-color -input=false -out=tfplan \
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">50&lt;/span>&lt;span class="cl">&lt;span class="sd"> -detailed-exitcode 2&amp;gt;&amp;amp;1 | tee plan-output.txt&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">51&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">continue-on-error&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="kc">true&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">52&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">53&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Comment plan on PR&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">54&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">uses&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">actions/github-script@v7&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">55&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">with&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">56&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">script&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="p">|&lt;/span>&lt;span class="sd">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">57&lt;/span>&lt;span class="cl">&lt;span class="sd"> const fs = require(&amp;#39;fs&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">58&lt;/span>&lt;span class="cl">&lt;span class="sd"> const plan = fs.readFileSync(&amp;#39;infra/environments/prod/plan-output.txt&amp;#39;, &amp;#39;utf8&amp;#39;);
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">59&lt;/span>&lt;span class="cl">&lt;span class="sd"> const truncated = plan.length &amp;gt; 60000
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">60&lt;/span>&lt;span class="cl">&lt;span class="sd"> ? plan.substring(0, 60000) + &amp;#39;\n\n... (truncated)&amp;#39;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">61&lt;/span>&lt;span class="cl">&lt;span class="sd"> : plan;
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">62&lt;/span>&lt;span class="cl">&lt;span class="sd"> await github.rest.issues.createComment({
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">63&lt;/span>&lt;span class="cl">&lt;span class="sd"> owner: context.repo.owner,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">64&lt;/span>&lt;span class="cl">&lt;span class="sd"> repo: context.repo.repo,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">65&lt;/span>&lt;span class="cl">&lt;span class="sd"> issue_number: context.issue.number,
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">66&lt;/span>&lt;span class="cl">&lt;span class="sd"> body: `### Terraform Plan\n\`\`\`\n${truncated}\n\`\`\``
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">67&lt;/span>&lt;span class="cl">&lt;span class="sd"> });&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">68&lt;/span>&lt;span class="cl">&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">69&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>- &lt;span class="nt">name&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">Fail if plan errored&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">70&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">if&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">steps.plan.outcome == &amp;#39;failure&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">71&lt;/span>&lt;span class="cl">&lt;span class="w"> &lt;/span>&lt;span class="nt">run&lt;/span>&lt;span class="p">:&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="l">exit 1&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="各步驟的職責">各步驟的職責&lt;/h3>
&lt;p>&lt;strong>Format check&lt;/strong> 驗證 HCL 是否符合標準排版。它不影響功能，但消除 diff 噪音——排版不一致時 PR diff 會混入純格式變更，reviewer 分不清哪些是邏輯改動。&lt;code>-diff&lt;/code> flag 讓 CI 輸出具體哪幾行不符合，作者在本地跑 &lt;code>terraform fmt&lt;/code> 就能修。&lt;/p></description><content:encoded><![CDATA[<p>Terraform 的 PR 流程要發揮價值，plan 和 apply 需要在 CI 裡自動執行，而非在工程師的本機跑。本篇用 GitHub Actions 建立一條完整的 pipeline：PR 開啟時跑檢查和 plan、plan 結果貼回 PR comment 讓 reviewer 看、合併到主幹後才 apply。整條管線的 credential 用 OIDC 取得短期 token（見 <a href="/blog/infra/02-identity-credentials/oidc-trust-policy-setup/" data-link-title="OIDC Trust Policy 設定指南" data-link-desc="GitHub Actions 與 AWS 之間的 OIDC 聯合設定：建立 provider、設計 trust policy 的 claim 收斂、plan 與 apply role 分離、常見錯誤排查">OIDC Trust Policy 設定</a>），不存任何長期 key。</p>
<h2 id="pipeline-的兩個階段">Pipeline 的兩個階段</h2>
<p>整條 pipeline 分成兩個觸發時機，各自承擔不同責任：</p>
<table>
  <thead>
      <tr>
          <th>階段</th>
          <th>觸發條件</th>
          <th>責任</th>
          <th>失敗時</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Plan</td>
          <td>PR 開啟或更新</td>
          <td>檢查格式、驗證語法、靜態掃描、產出 plan diff</td>
          <td>PR 無法合併</td>
      </tr>
      <tr>
          <td>Apply</td>
          <td>合併到 main</td>
          <td>把 plan 過的變更套用到雲端</td>
          <td>需要人工介入</td>
      </tr>
  </tbody>
</table>
<p>兩個階段用不同的 IAM role：plan role 只有唯讀權限（能跑 <code>terraform plan</code> 但不能改任何資源），apply role 有寫入權限。這個分離確保 PR 階段的任何 code 都沒辦法偷偷改動雲端資源。</p>
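<p>To make the split concrete, the trigger and credential part of an apply-side workflow could look roughly like the sketch below. The role name <code>infra-apply</code> is a hypothetical counterpart to the read-only plan role, the account ID and region are placeholders, and the Terraform steps themselves are omitted:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml">name: Terraform Apply
on:
  push:
    branches: [main]   # runs only after the PR has been merged
    paths:
      - 'infra/**'

permissions:
  id-token: write      # needed to request the OIDC token
  contents: read

jobs:
  apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # Hypothetical write-capable role, assumed only on main.
          role-to-assume: arn:aws:iam::123456789012:role/infra-apply
          aws-region: ap-northeast-1
      # terraform init / apply steps omitted in this sketch</code></pre></div>
<p>Keeping the write-capable role out of any workflow triggered by <code>pull_request</code> is what prevents PR code from reaching it.</p>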
<h2 id="plan-階段的完整-workflow">Plan 階段的完整 workflow</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Terraform Plan</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span><span class="nt">on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">  </span><span class="nt">pull_request</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">    </span><span class="nt">paths</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">      </span>- <span class="s1">&#39;infra/**&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w"></span><span class="nt">permissions</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">  </span><span class="nt">id-token</span><span class="p">:</span><span class="w"> </span><span class="l">write</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">  </span><span class="nt">contents</span><span class="p">:</span><span class="w"> </span><span class="l">read</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">  </span><span class="nt">pull-requests</span><span class="p">:</span><span class="w"> </span><span class="l">write</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w"></span><span class="nt">jobs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">  </span><span class="nt">plan</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">    </span><span class="nt">runs-on</span><span class="p">:</span><span class="w"> </span><span class="l">ubuntu-latest</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">    </span><span class="nt">defaults</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="w">      </span><span class="nt">run</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="w">        </span><span class="nt">working-directory</span><span class="p">:</span><span class="w"> </span><span class="l">infra/environments/prod</span><span class="w">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">actions/checkout@v4</span><span class="w">
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">aws-actions/configure-aws-credentials@v4</span><span class="w">
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="w">          </span><span class="nt">role-to-assume</span><span class="p">:</span><span class="w"> </span><span class="l">arn:aws:iam::123456789012:role/infra-plan</span><span class="w">
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="w">          </span><span class="nt">aws-region</span><span class="p">:</span><span class="w"> </span><span class="l">ap-northeast-1</span><span class="w">
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">hashicorp/setup-terraform@v3</span><span class="w">
</span></span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="w">          </span><span class="nt">terraform_version</span><span class="p">:</span><span class="w"> </span><span class="m">1.9.0</span><span class="w">
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">31</span><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Format check</span><span class="w">
</span></span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">terraform fmt -check -recursive -diff</span><span class="w">
</span></span></span><span class="line"><span class="ln">33</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">34</span><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Init</span><span class="w">
</span></span></span><span class="line"><span class="ln">35</span><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">terraform init -input=false</span><span class="w">
</span></span></span><span class="line"><span class="ln">36</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">37</span><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Validate</span><span class="w">
</span></span></span><span class="line"><span class="ln">38</span><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">terraform validate</span><span class="w">
</span></span></span><span class="line"><span class="ln">39</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">40</span><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">TFLint</span><span class="w">
</span></span></span><span class="line"><span class="ln">41</span><span class="cl"><span class="w">        </span><span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">terraform-linters/setup-tflint@v4</span><span class="w">
</span></span></span><span class="line"><span class="ln">42</span><span class="cl"><span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">43</span><span class="cl"><span class="w">          </span><span class="nt">tflint_version</span><span class="p">:</span><span class="w"> </span><span class="l">latest</span><span class="w">
</span></span></span><span class="line"><span class="ln">44</span><span class="cl"><span class="w">      </span>- <span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">tflint --recursive --format compact</span><span class="w">
</span></span></span><span class="line"><span class="ln">45</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">46</span><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Plan</span><span class="w">
</span></span></span><span class="line"><span class="ln">47</span><span class="cl"><span class="w">        </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l">plan</span><span class="w">
</span></span></span><span class="line"><span class="ln">48</span><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="ln">49</span><span class="cl"><span class="sd">          terraform plan -no-color -input=false -out=tfplan \
</span></span></span><span class="line"><span class="ln">50</span><span class="cl"><span class="sd">            -detailed-exitcode 2&gt;&amp;1 | tee plan-output.txt; case &#34;${PIPESTATUS[0]}&#34; in 0|2) ;; *) exit 1 ;; esac</span><span class="w">
</span></span></span><span class="line"><span class="ln">51</span><span class="cl"><span class="w">        </span><span class="nt">continue-on-error</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="ln">52</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">53</span><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Comment plan on PR</span><span class="w">
</span></span></span><span class="line"><span class="ln">54</span><span class="cl"><span class="w">        </span><span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">actions/github-script@v7</span><span class="w">
</span></span></span><span class="line"><span class="ln">55</span><span class="cl"><span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">56</span><span class="cl"><span class="w">          </span><span class="nt">script</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="ln">57</span><span class="cl"><span class="sd">            const fs = require(&#39;fs&#39;);
</span></span></span><span class="line"><span class="ln">58</span><span class="cl"><span class="sd">            const plan = fs.readFileSync(&#39;infra/environments/prod/plan-output.txt&#39;, &#39;utf8&#39;);
</span></span></span><span class="line"><span class="ln">59</span><span class="cl"><span class="sd">            const truncated = plan.length &gt; 60000
</span></span></span><span class="line"><span class="ln">60</span><span class="cl"><span class="sd">              ? plan.substring(0, 60000) + &#39;\n\n... (truncated)&#39;
</span></span></span><span class="line"><span class="ln">61</span><span class="cl"><span class="sd">              : plan;
</span></span></span><span class="line"><span class="ln">62</span><span class="cl"><span class="sd">            await github.rest.issues.createComment({
</span></span></span><span class="line"><span class="ln">63</span><span class="cl"><span class="sd">              owner: context.repo.owner,
</span></span></span><span class="line"><span class="ln">64</span><span class="cl"><span class="sd">              repo: context.repo.repo,
</span></span></span><span class="line"><span class="ln">65</span><span class="cl"><span class="sd">              issue_number: context.issue.number,
</span></span></span><span class="line"><span class="ln">66</span><span class="cl"><span class="sd">              body: `### Terraform Plan\n\`\`\`\n${truncated}\n\`\`\``
</span></span></span><span class="line"><span class="ln">67</span><span class="cl"><span class="sd">            });</span><span class="w">
</span></span></span><span class="line"><span class="ln">68</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">69</span><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Fail if plan errored</span><span class="w">
</span></span></span><span class="line"><span class="ln">70</span><span class="cl"><span class="w">        </span><span class="nt">if</span><span class="p">:</span><span class="w"> </span><span class="l">steps.plan.outcome == &#39;failure&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">71</span><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">exit 1</span></span></span></code></pre></div><h3 id="各步驟的職責">What each step does</h3>
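<p><strong>Format check</strong> verifies that the HCL follows canonical formatting. It has no effect on behavior, but it removes diff noise: when formatting is inconsistent, pure formatting changes get mixed into the PR diff and reviewers cannot tell which lines are logic changes. The <code>-diff</code> flag makes CI print exactly which lines are off, and the author fixes them by running <code>terraform fmt -recursive</code> locally.</p>
<p><strong>Init</strong> initializes the providers and the backend. <code>-input=false</code> keeps CI from hanging on an interactive prompt. If the backend is misconfigured (the bucket does not exist, permissions are missing), the job fails here instead of wasting time on later steps.</p>
<p><strong>Validate</strong> checks HCL syntax and internal consistency: undeclared variables, type mismatches, missing required arguments. It never calls cloud APIs and only reads the code, so it runs without AWS credentials. It sits after init because validate needs the provider schemas that init downloads.</p>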
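<p><strong>TFLint</strong> checks provider-level correctness: instance types that do not exist, deprecated arguments, names that break convention. It covers what validate cannot catch, the cases where the syntax is right but the value is wrong. The provider-specific rules come from a ruleset plugin that has to be enabled in <code>.tflint.hcl</code>.</p>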
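<p>The TFLint steps could be expanded as in the sketch below. It assumes a <code>.tflint.hcl</code> in the working directory that declares a ruleset plugin; the step names are illustrative, and the <code>GITHUB_TOKEN</code> variable is only there to reduce the chance of API rate limiting while plugins download.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Sketch only: assumes .tflint.hcl declares the ruleset plugin to use.
- uses: terraform-linters/setup-tflint@v4
  with:
    tflint_version: latest  # pinning an exact version keeps runs reproducible

- name: Init TFLint plugins
  run: tflint --init  # downloads the plugins declared in .tflint.hcl
  env:
    GITHUB_TOKEN: ${{ github.token }}

- name: Run TFLint
  run: tflint --recursive --format compact
</code></pre></div>
<p>Without the init step, a configuration that references a plugin that has not been installed is likely to fail before any rule runs.</p>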
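<p><strong>Plan</strong> is the core output of the pipeline. <code>-detailed-exitcode</code> makes the exit code distinguish three states: 0 = no changes, 1 = error, 2 = changes present. Because the output is piped through <code>tee</code>, the step's result has to come from Terraform's own exit status rather than <code>tee</code>'s, and only an error should fail the job; exit code 2 is the normal case for a PR that changes infrastructure. <code>-out=tfplan</code> saves the plan as a binary file that the apply stage can execute directly, so the time gap between plan and apply cannot introduce a mismatch.</p>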
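<p>One way to act on the three exit codes is sketched below. It is not the only option: <code>plan_exitcode</code> is an illustrative output name, and <code>terraform_wrapper: false</code> is an assumption made here so that the shell sees Terraform's raw exit code instead of whatever the setup action's wrapper script returns.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Sketch only: plan_exitcode is an illustrative output name.
- uses: hashicorp/setup-terraform@v3
  with:
    terraform_version: 1.9.0
    terraform_wrapper: false  # assumption: expose Terraform's raw exit code

- name: Plan
  id: plan
  run: |
    set +e  # handle the exit code by hand instead of aborting on non-zero
    terraform plan -no-color -input=false -out=tfplan \
      -detailed-exitcode 2&gt;&amp;1 | tee plan-output.txt
    code="${PIPESTATUS[0]}"  # Terraform's status, not tee's
    echo "plan_exitcode=${code}" &gt;&gt; "$GITHUB_OUTPUT"
    # 0 = no changes, 2 = changes present; anything else is an error.
    if [ "${code}" -ne 0 ] &amp;&amp; [ "${code}" -ne 2 ]; then exit 1; fi

- name: Report a no-op plan
  if: steps.plan.outputs.plan_exitcode == '0'
  run: echo "No infrastructure changes in this PR."
</code></pre></div>
<p>Later steps can branch on <code>steps.plan.outputs.plan_exitcode</code>, for example to skip the PR comment when nothing changes.</p>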
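<p><strong>Comment</strong> posts the plan output back to the PR so reviewers see the actual changes next to the code diff. Plan output can run to hundreds of lines, so the script truncates it before it reaches GitHub's comment size limit (65,536 characters) and keeps the beginning. Note that Terraform prints the <code>Plan: N to add, N to change, N to destroy</code> summary at the end of the output, so a head-only truncation drops that line on very large plans.</p>
<h2 id="apply-階段">Apply stage</h2>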
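<p>A possible variant of the comment step keeps both ends of the output so the summary line survives truncation. This is a sketch under assumptions: the <code>limit</code> and <code>tail</code> budgets are illustrative values, and the file path matches the working directory used elsewhere in this article.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Sketch only: limit and tail are illustrative budgets.
- name: Comment plan on PR
  uses: actions/github-script@v7
  with:
    script: |
      const fs = require('fs');
      const plan = fs.readFileSync('infra/environments/prod/plan-output.txt', 'utf8');
      const limit = 60000;  // stays below GitHub's comment size limit
      const tail = 2000;    // room for the end of the output, where the summary is
      // Keep the head and the tail; drop the middle when the output is too long.
      const body = plan.length &gt; limit
        ? plan.slice(0, limit - tail) + '\n\n... (truncated) ...\n\n' + plan.slice(-tail)
        : plan;
      const fence = '`'.repeat(3);  // Markdown code fence for the comment body
      await github.rest.issues.createComment({
        owner: context.repo.owner,
        repo: context.repo.repo,
        issue_number: context.issue.number,
        body: '### Terraform Plan\n' + fence + '\n' + body + '\n' + fence
      });
</code></pre></div>
<p>The dropped middle section is usually per-resource detail, which reviewers can still read in the job log.</p>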





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Terraform Apply</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span><span class="nt">on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">  </span><span class="nt">push</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">    </span><span class="nt">branches</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">main]</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">    </span><span class="nt">paths</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span>- <span class="s1">&#39;infra/**&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w"></span><span class="nt">permissions</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">  </span><span class="nt">id-token</span><span class="p">:</span><span class="w"> </span><span class="l">write</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">  </span><span class="nt">contents</span><span class="p">:</span><span class="w"> </span><span class="l">read</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w"></span><span class="nt">jobs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">  </span><span class="nt">apply</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">    </span><span class="nt">runs-on</span><span class="p">:</span><span class="w"> </span><span class="l">ubuntu-latest</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w"> </span><span class="l">production</span><span class="w">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="w">    </span><span class="nt">defaults</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="w">      </span><span class="nt">run</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="w">        </span><span class="nt">working-directory</span><span class="p">:</span><span class="w"> </span><span class="l">infra/environments/prod</span><span class="w">
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">actions/checkout@v4</span><span class="w">
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">aws-actions/configure-aws-credentials@v4</span><span class="w">
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="w">          </span><span class="nt">role-to-assume</span><span class="p">:</span><span class="w"> </span><span class="l">arn:aws:iam::123456789012:role/infra-apply</span><span class="w">
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="w">          </span><span class="nt">aws-region</span><span class="p">:</span><span class="w"> </span><span class="l">ap-northeast-1</span><span class="w">
</span></span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">hashicorp/setup-terraform@v3</span><span class="w">
</span></span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">30</span><span class="cl"><span class="w">          </span><span class="nt">terraform_version</span><span class="p">:</span><span class="w"> </span><span class="m">1.9.0</span><span class="w">
</span></span></span><span class="line"><span class="ln">31</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Init</span><span class="w">
</span></span></span><span class="line"><span class="ln">33</span><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">terraform init -input=false</span><span class="w">
</span></span></span><span class="line"><span class="ln">34</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">35</span><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Plan (verify)</span><span class="w">
</span></span></span><span class="line"><span class="ln">36</span><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">terraform plan -no-color -input=false</span><span class="w">
</span></span></span><span class="line"><span class="ln">37</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">38</span><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Apply</span><span class="w">
</span></span></span><span class="line"><span class="ln">39</span><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">terraform apply -auto-approve -input=false</span></span></span></code></pre></div><h3 id="environment-protection-rule">environment protection rule</h3>
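<p>The <code>environment: production</code> line turns on GitHub's environment protection. Configure it in the repo under Settings → Environments → production:</p>
<ul>
<li><strong>Required reviewers</strong>: at least one designated person must approve before the apply job runs</li>
<li><strong>Wait timer</strong>: wait N minutes after the merge before apply starts (gives people time to react)</li>
<li><strong>Deployment branches</strong>: only the main branch may deploy to this environment</li>
</ul>
<p>This protection adds a manual confirmation before apply, which matters most for high-risk changes (a plan that shows destroy or replace). The rules apply to every run of the job, so routine low-risk changes (adding a tag, tuning a parameter) wait for the same approval, although a reviewer can approve them quickly. The trade-off: requiring a click for every apply slows down frequent small changes, so one option is to attach protection rules only to the production environment and let lower environments apply automatically.</p>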
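<h3 id="apply-階段重跑-plan-的理由">Why the apply stage reruns plan</h3>
<p>Plan is rerun before apply to confirm that reality after the merge still matches what was reviewed in the PR. Hours or days can pass between opening and merging a PR; in that time someone may have changed cloud resources by hand (drift), or another PR may have been applied first. The rerun shows the current diff, and if it no longer matches expectations the right move is to stop rather than apply blindly. A plain plan step only fails on errors, so spotting an unexpected diff still takes a person or an additional check.</p>
<p>If the plan stage saved a plan file with <code>-out=tfplan</code>, apply can instead run <code>terraform apply tfplan</code> and execute exactly the plan that was reviewed. The cost is that the plan file has to travel between jobs (as a GitHub Actions artifact), and it goes stale: if state is modified after the plan was created, apply refuses to run.</p>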
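<p>Passing a saved plan between jobs might look like the sketch below. It departs from the two-workflow layout above by putting plan and apply in one workflow, so the approver of the <code>production</code> environment can read the plan job's log before releasing apply. The artifact name <code>tfplan</code> and the <code>infra-plan</code> role are hypothetical, and the sketch assumes the provider lock file is committed so both jobs install the same provider versions.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Sketch only: artifact name and the infra-plan role are hypothetical.
name: Terraform Apply (saved plan)
on:
  push:
    branches: [main]
    paths:
      - 'infra/**'

permissions:
  id-token: write
  contents: read

jobs:
  plan:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra/environments/prod
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/infra-plan
          aws-region: ap-northeast-1
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.9.0
      - run: terraform init -input=false
      - run: terraform plan -no-color -input=false -out=tfplan
      # "uses" steps ignore defaults.run, so the path is relative to the repo root.
      - uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: infra/environments/prod/tfplan
          retention-days: 1

  apply:
    needs: plan
    runs-on: ubuntu-latest
    environment: production  # approval happens here, after the plan is visible
    defaults:
      run:
        working-directory: infra/environments/prod
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/infra-apply
          aws-region: ap-northeast-1
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.9.0
      - run: terraform init -input=false
      - uses: actions/download-artifact@v4
        with:
          name: tfplan
          path: infra/environments/prod
      # Applies exactly the saved plan; Terraform rejects it if state has moved on.
      - run: terraform apply -input=false tfplan
</code></pre></div>
<p>A plan file can contain sensitive values in clear text, so a short retention period and restricted repository access are worth considering before adopting this pattern.</p>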
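<h2 id="多環境的-pipeline-設計">Pipeline design for multiple environments</h2>
<p>When managing dev / staging / prod, two pipeline structures are common:</p>
<p><strong>Single workflow with a matrix</strong>: one YAML file uses <code>strategy.matrix</code> to run all three environments, each with its own working directory and IAM role. The upside is a single file to maintain; the cost is that every PR run produces plans for all three environments, so reviewers have three plan outputs to read.</p>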
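<p>A matrix layout could be sketched as follows. The account IDs and the <code>infra-plan</code> role name are placeholders, and the directory layout is assumed to be <code>infra/environments/{env}</code> as in the workflows above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Sketch only: account IDs and role names are placeholders.
permissions:
  id-token: write
  contents: read

jobs:
  plan:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false  # one failing environment should not cancel the others
      matrix:
        include:
          - env: dev
            role: arn:aws:iam::111111111111:role/infra-plan
          - env: staging
            role: arn:aws:iam::222222222222:role/infra-plan
          - env: prod
            role: arn:aws:iam::123456789012:role/infra-plan
    defaults:
      run:
        working-directory: infra/environments/${{ matrix.env }}
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ matrix.role }}
          aws-region: ap-northeast-1
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.9.0
      - run: terraform init -input=false
      - run: terraform plan -no-color -input=false
</code></pre></div>
<p>Each matrix entry runs as its own job, so the three plans appear as separate checks on the PR.</p>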
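<p><strong>One workflow per environment</strong>: three YAML files, each triggered by changes under its own environment directory (<code>paths: ['infra/environments/dev/**']</code>). The upside is that only the touched environment runs and the PR comments stay clean; the cost is duplication across the three files.</p>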
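<p>The trigger block of a per-environment workflow might look like this sketch. The <code>infra/modules/**</code> entry is a hypothetical path for shared modules; it is included because a change to a shared module can alter the plan of an environment whose own directory was not touched.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Sketch only: infra/modules is a hypothetical shared-module path.
name: Terraform Plan (dev)
on:
  pull_request:
    paths:
      - 'infra/environments/dev/**'
      - 'infra/modules/**'  # shared code that the dev environment consumes
</code></pre></div>
<p>The staging and prod workflows would repeat the same block with their own directory.</p>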
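<p>A common path is to start with a single workflow plus matrix and move to separate workflows once there are more than three environments or the apply policy differs per environment (dev automatic, prod behind approval).</p>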
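<h2 id="安全邊界">Security boundaries</h2>
<p>The CI pipeline is the automated executor of infra changes, so it is exactly as safe as the permissions on the apply role. A few boundaries need to hold:</p>
<p><strong>Narrow the OIDC claims</strong>: the apply role's trust policy should let only the main branch of one specific repo assume it (see <a href="/blog/infra/02-identity-credentials/oidc-trust-policy-setup/" data-link-title="OIDC Trust Policy 設定指南" data-link-desc="GitHub Actions 與 AWS 之間的 OIDC 聯合設定：建立 provider、設計 trust policy 的 claim 收斂、plan 與 apply role 分離、常見錯誤排查">OIDC Trust Policy setup</a>). If the condition checks the repo but not the branch, anyone with push access can push a modified workflow to a feature branch and trigger apply. Keep in mind that a job declaring <code>environment:</code> receives a <code>sub</code> of the form <code>repo:{owner}/{repo}:environment:{name}</code>, so for such a job the branch restriction comes from the environment's deployment-branch rule.</p>
<p><strong>Review workflow changes</strong>: YAML changes under <code>.github/workflows/</code> should go through PR review just like infra code. Changing a workflow changes what the pipeline does; a single added <code>terraform destroy</code> step can wipe an environment on merge. GitHub's CODEOWNERS file, combined with a branch protection rule that requires code-owner review, can force specific people to review workflow changes.</p>
<p><strong>Secrets and environment variables</strong>: OIDC replaces the access keys stored in repo secrets, but the workflow may still use other secrets (a Terraform Cloud token, a Slack webhook URL). Scope those secrets to a specific environment rather than exposing them to every branch.</p>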
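<p>Scoping a secret to an environment could look like the sketch below. The job name and the <code>SLACK_WEBHOOK_URL</code> secret name are hypothetical; the secret is assumed to be defined on the <code>production</code> environment rather than at the repository level.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># Sketch only: job name and secret name are hypothetical.
jobs:
  notify:
    runs-on: ubuntu-latest
    environment: production  # environment secrets resolve only for jobs that declare it
    steps:
      - name: Notify Slack
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: |
          curl -sS -X POST -H 'Content-Type: application/json' \
            --data '{"text":"terraform apply finished"}' "$SLACK_WEBHOOK_URL"
</code></pre></div>
<p>Because the environment also restricts which branches may deploy, a workflow pushed to a feature branch cannot read the secret.</p>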
<p>This article focuses on GitHub Actions. If the team chooses Atlantis (a long-running service with built-in state locking and apply semantics), see the selection discussion in <a href="/blog/infra/07-infra-as-pr/plan-review-apply-guardrails/" data-link-title="infra 走 PR 流程與自動化護欄" data-link-desc="infra 變更走 PR → plan → review diff → 合併 → apply，配 fmt / validate / tflint / checkov / tfsec 與 Atlantis 自動化，讓基礎設施可審查、可回溯、可交接">the Atlantis section of the main article</a>.</p>
<h2 id="跨分類引用">Cross-category references</h2>
<ul>
<li>→ <a href="/blog/infra/02-identity-credentials/oidc-trust-policy-setup/" data-link-title="OIDC Trust Policy 設定指南" data-link-desc="GitHub Actions 與 AWS 之間的 OIDC 聯合設定：建立 provider、設計 trust policy 的 claim 收斂、plan 與 apply role 分離、常見錯誤排查">OIDC Trust Policy setup</a>: where the pipeline's credentials come from</li>
<li>→ <a href="/blog/infra/07-infra-as-pr/checkov-tfsec-rule-customization/" data-link-title="checkov 與 tfsec 規則配置" data-link-desc="靜態掃描工具的規則選擇策略、自訂規則、豁免管理、false positive 處理與 CI 整合，讓掃描從噪音來源變成可信的品質關卡">checkov / tfsec rule configuration</a>: how to configure static security scanning in the pipeline</li>
<li>→ <a href="/blog/infra/07-infra-as-pr/plan-review-apply-guardrails/" data-link-title="infra 走 PR 流程與自動化護欄" data-link-desc="infra 變更走 PR → plan → review diff → 合併 → apply，配 fmt / validate / tflint / checkov / tfsec 與 Atlantis 自動化，讓基礎設施可審查、可回溯、可交接">Infra through PRs and automated guardrails</a>: the review principles behind the pipeline</li>
<li>→ <a href="/blog/infra/04-environment-separation/" data-link-title="模組四：環境分離與模組化" data-link-desc="dev / staging / prod 切分、目錄結構 vs workspace、用可重用 module 避免環境漂移">Module 4: environment separation and modularization</a>: the multi-environment directory layout determines the pipeline's working directory</li>
</ul>
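]]></content:encoded></item><item><title>OIDC Trust Policy Setup Guide</title><link>https://tarrragon.github.io/blog/infra/02-identity-credentials/oidc-trust-policy-setup/</link><pubDate>Fri, 26 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/infra/02-identity-credentials/oidc-trust-policy-setup/</guid><description>&lt;p>OIDC federation lets a CI/CD pipeline reach cloud resources with short-lived tokens instead of long-lived access keys. The setup itself is not complicated, but one wrong word in the trust policy's claim conditions turns it into either "any repo can assume this role" or "nothing can assume it at all". This article walks through the complete OIDC federation setup between GitHub Actions and AWS, from creating the provider to designing the trust policy to testing and verification. Other CI platforms (GitLab CI, CircleCI) work on the same principle; they differ only in issuer URL and claim structure:&lt;/p>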
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Platform&lt;/th>
 &lt;th>Issuer URL&lt;/th>
 &lt;th>Example sub claim format&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>GitHub Actions&lt;/td>
 &lt;td>&lt;code>token.actions.githubusercontent.com&lt;/code>&lt;/td>
 &lt;td>&lt;code>repo:{org}/{repo}:ref:refs/heads/{branch}&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>GitLab CI&lt;/td>
 &lt;td>&lt;code>gitlab.com&lt;/code>&lt;/td>
 &lt;td>&lt;code>project_path:{group}/{project}:ref_type:branch:ref:main&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>CircleCI&lt;/td>
 &lt;td>&lt;code>oidc.circleci.com/org/{org-id}&lt;/code>&lt;/td>
 &lt;td>&lt;code>org/{org-id}/project/{project-id}/user/{user-id}&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
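&lt;p>This article focuses on GitHub Actions; for other platforms, swap in the corresponding issuer URL and sub condition.&lt;/p>
&lt;h2 id="建立-oidc-provider">Creating the OIDC Provider&lt;/h2>
&lt;p>An OIDC provider is a resource in the AWS account that declares "I trust tokens issued by this external identity provider". The GitHub Actions OIDC issuer URL is fixed, so each AWS account needs only one provider.&lt;/p>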





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-hcl" data-lang="hcl">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">&lt;span class="k">resource&lt;/span> &lt;span class="s2">&amp;#34;aws_iam_openid_connect_provider&amp;#34; &amp;#34;github&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">&lt;span class="n"> url&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;https://token.actions.githubusercontent.com&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">&lt;span class="n"> client_id_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;sts.amazonaws.com&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">&lt;span class="n"> thumbprint_list&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;ffffffffffffffffffffffffffffffffffffffff&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
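&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;code>client_id_list&lt;/code> is set to &lt;code>sts.amazonaws.com&lt;/code>, the audience value GitHub's documentation recommends for AWS. Since 2023 AWS no longer uses &lt;code>thumbprint_list&lt;/code> to validate GitHub's certificate chain (it relies on its own list of trusted root CAs instead). Depending on the Terraform AWS provider version the argument may still be required, in which case 40 &lt;code>f&lt;/code> characters work as a placeholder.&lt;/p>
&lt;p>The provider only needs to be created once. Multiple roles can share it; they differ in how each role's trust policy is written.&lt;/p>
&lt;h2 id="trust-policy-設計claim-收斂">Trust Policy design: narrowing the claims&lt;/h2>
&lt;p>A trust policy decides who can assume the role. The OIDC token carries several claims (describing which repo, which branch, and which workflow is running); the trust policy uses conditions to match those claims, and the role can be assumed only when every condition matches.&lt;/p>
&lt;h3 id="最小可行的-trust-policy">A minimal viable trust policy&lt;/h3>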





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-hcl" data-lang="hcl">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">&lt;span class="k">data&lt;/span> &lt;span class="s2">&amp;#34;aws_iam_policy_document&amp;#34; &amp;#34;ci_trust&amp;#34;&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl"> &lt;span class="k">statement&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">&lt;span class="n"> actions&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;sts:AssumeRoleWithWebIdentity&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl"> &lt;span class="k">principals&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">&lt;span class="n"> type&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;Federated&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">&lt;span class="n"> identifiers&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="k">aws_iam_openid_connect_provider&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">github&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="k">arn&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> &lt;span class="k">condition&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">&lt;span class="n"> test&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;StringEquals&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">&lt;span class="n"> variable&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;token.actions.githubusercontent.com:aud&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">&lt;span class="n"> values&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;sts.amazonaws.com&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl"> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl"> &lt;span class="k">condition&lt;/span> {
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">17&lt;/span>&lt;span class="cl">&lt;span class="n"> test&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;StringLike&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">18&lt;/span>&lt;span class="cl">&lt;span class="n"> variable&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="s2">&amp;#34;token.actions.githubusercontent.com:sub&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">19&lt;/span>&lt;span class="cl">&lt;span class="n"> values&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="p">[&lt;/span>&lt;span class="s2">&amp;#34;repo:my-org/my-app:ref:refs/heads/main&amp;#34;&lt;/span>&lt;span class="p">]&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">20&lt;/span>&lt;span class="cl"> }
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">21&lt;/span>&lt;span class="cl"> }
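&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">22&lt;/span>&lt;span class="cl">}&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Each condition guards one boundary. &lt;code>aud&lt;/code> checks that the audience is right (so a token minted for another purpose cannot be used to assume the role). &lt;code>sub&lt;/code> checks which repo and branch the request comes from; this is the most important narrowing point.&lt;/p></description><content:encoded><![CDATA[<p>OIDC federation lets a CI/CD pipeline reach cloud resources with short-lived tokens instead of long-lived access keys. The setup itself is not complicated, but one wrong word in the trust policy's claim conditions turns it into either "any repo can assume this role" or "nothing can assume it at all". This article walks through the complete OIDC federation setup between GitHub Actions and AWS, from creating the provider to designing the trust policy to testing and verification. Other CI platforms (GitLab CI, CircleCI) work on the same principle; they differ only in issuer URL and claim structure:</p>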
<table>
  <thead>
      <tr>
          <th>Platform</th>
          <th>Issuer URL</th>
          <th>Example sub claim format</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>GitHub Actions</td>
          <td><code>token.actions.githubusercontent.com</code></td>
          <td><code>repo:{org}/{repo}:ref:refs/heads/{branch}</code></td>
      </tr>
      <tr>
          <td>GitLab CI</td>
          <td><code>gitlab.com</code></td>
          <td><code>project_path:{group}/{project}:ref_type:branch:ref:main</code></td>
      </tr>
      <tr>
          <td>CircleCI</td>
          <td><code>oidc.circleci.com/org/{org-id}</code></td>
          <td><code>org/{org-id}/project/{project-id}/user/{user-id}</code></td>
      </tr>
  </tbody>
</table>
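<p>This article focuses on GitHub Actions; for other platforms, swap in the corresponding issuer URL and sub condition.</p>
<h2 id="建立-oidc-provider">Creating the OIDC Provider</h2>
<p>An OIDC provider is a resource in the AWS account that declares "I trust tokens issued by this external identity provider". The GitHub Actions OIDC issuer URL is fixed, so each AWS account needs only one provider.</p>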





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_iam_openid_connect_provider&#34; &#34;github&#34;</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  url</span>             <span class="o">=</span> <span class="s2">&#34;https://token.actions.githubusercontent.com&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  client_id_list</span>  <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;sts.amazonaws.com&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">  thumbprint_list</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;ffffffffffffffffffffffffffffffffffffffff&#34;</span><span class="p">]</span>
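</span></span><span class="line"><span class="ln">5</span><span class="cl">}</span></span></code></pre></div><p><code>client_id_list</code> is set to <code>sts.amazonaws.com</code>, the audience value GitHub's documentation recommends for AWS. Since 2023 AWS no longer uses <code>thumbprint_list</code> to validate GitHub's certificate chain (it relies on its own list of trusted root CAs instead). Depending on the Terraform AWS provider version the argument may still be required, in which case 40 <code>f</code> characters work as a placeholder.</p>
<p>The provider only needs to be created once. Multiple roles can share it; they differ in how each role's trust policy is written.</p>
<h2 id="trust-policy-設計claim-收斂">Trust Policy design: narrowing the claims</h2>
<p>A trust policy decides who can assume the role. The OIDC token carries several claims (describing which repo, which branch, and which workflow is running); the trust policy uses conditions to match those claims, and the role can be assumed only when every condition matches.</p>
<h3 id="最小可行的-trust-policy">A minimal viable trust policy</h3>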





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">data</span> <span class="s2">&#34;aws_iam_policy_document&#34; &#34;ci_trust&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  <span class="k">statement</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">    actions</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;sts:AssumeRoleWithWebIdentity&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="k">principals</span> {
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">      type</span>        <span class="o">=</span> <span class="s2">&#34;Federated&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">      identifiers</span> <span class="o">=</span> <span class="p">[</span><span class="k">aws_iam_openid_connect_provider</span><span class="p">.</span><span class="k">github</span><span class="p">.</span><span class="k">arn</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    }
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="k">condition</span> {
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">      test</span>     <span class="o">=</span> <span class="s2">&#34;StringEquals&#34;</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">      variable</span> <span class="o">=</span> <span class="s2">&#34;token.actions.githubusercontent.com:aud&#34;</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">      values</span>   <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;sts.amazonaws.com&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    }
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="k">condition</span> {
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="n">      test</span>     <span class="o">=</span> <span class="s2">&#34;StringLike&#34;</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="n">      variable</span> <span class="o">=</span> <span class="s2">&#34;token.actions.githubusercontent.com:sub&#34;</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="n">      values</span>   <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;repo:my-org/my-app:ref:refs/heads/main&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    }
</span></span><span class="line"><span class="ln">21</span><span class="cl">  }
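</span></span><span class="line"><span class="ln">22</span><span class="cl">}</span></span></code></pre></div><p>Each condition guards one boundary. <code>aud</code> checks the audience, which stops a token issued for another purpose from being used to assume the role. <code>sub</code> checks which repo and branch the request comes from, and it is the most important place to narrow access.</p>
<h3 id="sub-claim-的結構">Structure of the sub claim</h3>
<p>The GitHub Actions <code>sub</code> claim has the form <code>repo:{owner}/{repo}:{context}</code>, where the context depends on how the workflow was triggered:</p>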
<table>
  <thead>
      <tr>
          <th>Trigger</th>
          <th>sub claim value</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>push to branch</td>
          <td><code>repo:my-org/my-app:ref:refs/heads/main</code></td>
      </tr>
      <tr>
          <td>pull request</td>
          <td><code>repo:my-org/my-app:pull_request</code></td>
      </tr>
      <tr>
          <td>environment deploy</td>
          <td><code>repo:my-org/my-app:environment:production</code></td>
      </tr>
      <tr>
          <td>tag push</td>
          <td><code>repo:my-org/my-app:ref:refs/tags/v1.0.0</code></td>
      </tr>
      <tr>
          <td>manual dispatch</td>
          <td><code>repo:my-org/my-app:ref:refs/heads/main</code></td>
      </tr>
  </tbody>
</table>
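<p>Choose how far the <code>sub</code> condition narrows access based on what the role actually needs. To allow only runs on the main branch (a push or a manual dispatch), use <code>repo:my-org/my-app:ref:refs/heads/main</code>; to allow only deploys to the production environment, use <code>repo:my-org/my-app:environment:production</code>. A job that declares an <code>environment</code> receives the environment form regardless of the event that triggered it.</p>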
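<p>As a rough sketch of how several levels can be combined, one condition may list more than one pattern, and IAM treats multiple values in a single condition as alternatives. The <code>v*</code> tag pattern is an assumption made for illustration; the policy above does not require it.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Sketch: allow runs on main and on version tags for a single repo.
# The "v*" tag pattern is a hypothetical naming scheme.
condition {
  test     = "StringLike" # StringLike is needed because of the wildcard
  variable = "token.actions.githubusercontent.com:sub"
  values = [
    "repo:my-org/my-app:ref:refs/heads/main",
    "repo:my-org/my-app:ref:refs/tags/v*",
  ]
}</code></pre></div>
<p>Keeping each value as specific as possible, and using a wildcard only in the last segment, keeps the set of accepted tokens easy to reason about.</p>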
<h3 id="environment-based-收斂推薦">Environment-based narrowing (recommended)</h3>
<p>The GitHub Actions environment feature puts the environment name into the <code>sub</code> claim. Combined with environment protection rules (required reviewers, wait timer), this gives you one gate at the trust policy layer and another at the GitHub layer:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">condition</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  test</span>     <span class="o">=</span> <span class="s2">&#34;StringEquals&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  variable</span> <span class="o">=</span> <span class="s2">&#34;token.actions.githubusercontent.com:sub&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">  values</span>   <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;repo:my-org/my-app:environment:production&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">}</span></span></code></pre></div><p>The matching workflow configuration:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln">1</span><span class="cl"><span class="nt">jobs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">  </span><span class="nt">apply</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w"> </span><span class="l">production</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="nt">permissions</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">      </span><span class="nt">id-token</span><span class="p">:</span><span class="w"> </span><span class="l">write</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">      </span><span class="nt">contents</span><span class="p">:</span><span class="w"> </span><span class="l">read</span></span></span></code></pre></div><p>The runner's token carries the <code>environment:production</code> sub claim, and can therefore assume this role, only when the job declares <code>environment: production</code> and has passed that environment's protection rules.</p>
<h2 id="plan-role-與-apply-role-分離">Separating the Plan Role and the Apply Role</h2>
<p>Split plan and apply into two roles, each with the least privilege it needs. Plan needs only read access (reading state and the current cloud configuration); apply needs write access (creating, modifying, and deleting resources). The benefit of the split is that if the plan step running at PR time is compromised, the attacker can read but cannot change anything.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_iam_role&#34; &#34;infra_plan&#34;</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  name</span>               <span class="o">=</span> <span class="s2">&#34;infra-plan&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  assume_role_policy</span> <span class="o">=</span> <span class="k">data</span><span class="p">.</span><span class="k">aws_iam_policy_document</span><span class="p">.</span><span class="k">plan_trust</span><span class="p">.</span><span class="k">json</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_iam_role&#34; &#34;infra_apply&#34;</span> {
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">  name</span>               <span class="o">=</span> <span class="s2">&#34;infra-apply&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  assume_role_policy</span> <span class="o">=</span> <span class="k">data</span><span class="p">.</span><span class="k">aws_iam_policy_document</span><span class="p">.</span><span class="k">apply_trust</span><span class="p">.</span><span class="k">json</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">}
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_iam_role_policy_attachment&#34; &#34;plan_readonly&#34;</span> {
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">  role</span>       <span class="o">=</span> <span class="k">aws_iam_role</span><span class="p">.</span><span class="k">infra_plan</span><span class="p">.</span><span class="k">name</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">  policy_arn</span> <span class="o">=</span> <span class="s2">&#34;arn:aws:iam::aws:policy/ReadOnlyAccess&#34;</span>
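</span></span><span class="line"><span class="ln">14</span><span class="cl">}</span></span></code></pre></div><p>The trust policies differ: the plan role accepts PR-triggered runs from any branch (<code>repo:my-org/my-app:pull_request</code>); the apply role accepts only the production environment (<code>repo:my-org/my-app:environment:production</code>). Because the apply job declares an environment, its token carries the environment form of <code>sub</code> rather than the branch form, so the main-branch restriction comes from the job's <code>if</code> condition.</p>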
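<p>The roles above reference <code>plan_trust</code> and <code>apply_trust</code> without showing them. A minimal sketch of the two documents might look like the following, assuming the same <code>aws_iam_openid_connect_provider.github</code> resource as in the minimal trust policy; the exact <code>sub</code> values should be adjusted to your own repo.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Sketch of the two trust policies referenced by the roles above.
# Assumes aws_iam_openid_connect_provider.github is defined elsewhere.
data "aws_iam_policy_document" "plan_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # PR-triggered runs from any branch of this repo.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:my-org/my-app:pull_request"]
    }
  }
}

data "aws_iam_policy_document" "apply_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only jobs that declare the production environment.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:my-org/my-app:environment:production"]
    }
  }
}</code></pre></div>
<p>Both documents use <code>StringEquals</code> because neither value contains a wildcard, which leaves no room for an accidental broad match.</p>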





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nt">jobs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w">  </span><span class="nt">plan</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">    </span><span class="nt">if</span><span class="p">:</span><span class="w"> </span><span class="l">github.event_name == &#39;pull_request&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">    </span><span class="nt">permissions</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">      </span><span class="nt">id-token</span><span class="p">:</span><span class="w"> </span><span class="l">write</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">      </span><span class="nt">contents</span><span class="p">:</span><span class="w"> </span><span class="l">read</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">      </span><span class="nt">pull-requests</span><span class="p">:</span><span class="w"> </span><span class="l">write</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">aws-actions/configure-aws-credentials@v4</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">          </span><span class="nt">role-to-assume</span><span class="p">:</span><span class="w"> </span><span class="l">arn:aws:iam::123456789012:role/infra-plan</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">          </span><span class="nt">aws-region</span><span class="p">:</span><span class="w"> </span><span class="l">ap-northeast-1</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">      </span>- <span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">terraform plan -out=plan.tfplan</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">  </span><span class="nt">apply</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="w">    </span><span class="nt">if</span><span class="p">:</span><span class="w"> </span><span class="l">github.ref == &#39;refs/heads/main&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w"> </span><span class="l">production</span><span class="w">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="w">    </span><span class="nt">permissions</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="w">      </span><span class="nt">id-token</span><span class="p">:</span><span class="w"> </span><span class="l">write</span><span class="w">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="w">      </span><span class="nt">contents</span><span class="p">:</span><span class="w"> </span><span class="l">read</span><span class="w">
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">aws-actions/configure-aws-credentials@v4</span><span class="w">
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="w">          </span><span class="nt">role-to-assume</span><span class="p">:</span><span class="w"> </span><span class="l">arn:aws:iam::123456789012:role/infra-apply</span><span class="w">
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="w">          </span><span class="nt">aws-region</span><span class="p">:</span><span class="w"> </span><span class="l">ap-northeast-1</span><span class="w">
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="w">      </span>- <span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">terraform apply -auto-approve</span></span></span></code></pre></div><h2 id="常見設定錯誤">Common configuration mistakes</h2>
<h3 id="audience-不匹配">Audience mismatch</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Error: Not authorized to perform sts:AssumeRoleWithWebIdentity</span></span></code></pre></div><p>The most common cause is that the <code>aud</code> condition value in the trust policy does not match the OIDC provider's <code>client_id_list</code>. Both should be <code>sts.amazonaws.com</code>. Older setups, such as an early version of the <code>configure-aws-credentials</code> action (v1) or early OIDC examples, may request a different audience such as <code>sigstore</code>, which does not match <code>sts.amazonaws.com</code>. Confirm that the action version is v4 or later.</p>
<h3 id="sub-condition-太寬">sub condition too broad</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">condition</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  test</span>     <span class="o">=</span> <span class="s2">&#34;StringLike&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  variable</span> <span class="o">=</span> <span class="s2">&#34;token.actions.githubusercontent.com:sub&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">  values</span>   <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;repo:my-org/*&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">}</span></span></code></pre></div><p>This lets any branch of any repo under <code>my-org</code> assume the role. If the organization has public repos, or repos with loose fork permissions, an attacker can trigger a workflow in one of them and assume the production role. Narrow the condition at least to the repo level (<code>repo:my-org/my-app:*</code>), and for production narrow it further to a branch or an environment.</p>
<h3 id="sub-condition-太緊">sub condition too tight</h3>
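<p>When a role really is shared by several repos, one way to avoid the organization-wide wildcard is to enumerate the repos explicitly. The sketch below assumes a hypothetical <code>allowed_repos</code> list and a hypothetical document name <code>shared_ci_trust</code>; neither comes from the setup above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Sketch: list the repos that may assume the role instead of
# trusting the whole organization. Repo names are placeholders.
locals {
  allowed_repos = ["my-app", "my-infra"]
}

data "aws_iam_policy_document" "shared_ci_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # One repo-level pattern per allowed repo.
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = [for repo in local.allowed_repos : "repo:my-org/${repo}:*"]
    }
  }
}</code></pre></div>
<p>This is still repo-level trust, so it suits non-production roles; a production role would normally go one step further and pin a branch or an environment.</p>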





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">condition</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  test</span>     <span class="o">=</span> <span class="s2">&#34;StringEquals&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  variable</span> <span class="o">=</span> <span class="s2">&#34;token.actions.githubusercontent.com:sub&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">  values</span>   <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;repo:my-org/my-app:ref:refs/heads/main&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">}</span></span></code></pre></div><p>This allows only workflows that run on main, such as a push to main. A PR-triggered workflow gets the sub <code>repo:my-org/my-app:pull_request</code>, which does not match this condition, so the plan stage fails. If plan has to run at PR time, add the PR sub pattern to the plan role's trust policy.</p>
<h3 id="忘記設-permissions">Forgetting to set permissions</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln">1</span><span class="cl"><span class="nt">jobs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">  </span><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">    </span><span class="c"># missing permissions block</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w">
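</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">aws-actions/configure-aws-credentials@v4</span></span></span></code></pre></div><p>GitHub Actions issues an OIDC token only when the workflow or job declares <code>permissions: { id-token: write }</code>. Without that line, <code>configure-aws-credentials</code> cannot obtain a token and reports an error along the lines of "OIDC token not available". The message is not intuitive: it says the token does not exist, not that a permission is missing.</p>
<h3 id="多帳號時忘記指定-provider">Forgetting to specify the provider in multi-account setups</h3>
<p>If the organization has several AWS accounts, each account needs its own OIDC provider. The <code>Federated</code> principal in a trust policy must point to the provider ARN in the same account; it cannot reference a provider in another account. For cross-account deployments, the workflow switches accounts by using a different <code>role-to-assume</code>. Each account's role trusts the same GitHub OIDC issuer, but through that account's own separate provider resource.</p>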
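<p>One way to express the per-account providers in Terraform is with aliased AWS providers, roughly as sketched below. The aliases, profile names, and resource names are all hypothetical, and how you authenticate to each account will differ in practice.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Sketch: each account gets its own OIDC provider resource.
# Provider aliases and profile names are hypothetical.
provider "aws" {
  alias   = "staging"
  region  = "ap-northeast-1"
  profile = "staging"
}

provider "aws" {
  alias   = "production"
  region  = "ap-northeast-1"
  profile = "production"
}

# Same GitHub issuer URL, but a separate provider resource per account.
# Depending on the AWS provider version, a thumbprint_list argument
# may also be required here.
resource "aws_iam_openid_connect_provider" "github_staging" {
  provider       = aws.staging
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]
}

resource "aws_iam_openid_connect_provider" "github_production" {
  provider       = aws.production
  url            = "https://token.actions.githubusercontent.com"
  client_id_list = ["sts.amazonaws.com"]
}</code></pre></div>
<p>A role in the production account would then put <code>aws_iam_openid_connect_provider.github_production.arn</code> in its <code>Federated</code> principal, and the staging role would use the staging provider's ARN.</p>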
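<h2 id="測試與驗證">Testing and verification</h2>
<p>Verification steps once the setup is complete:</p>
<ol>
<li><strong>Trigger the workflows by hand</strong>: push a harmless commit to main and open a test PR, then check whether the <code>configure-aws-credentials</code> step succeeds</li>
<li><strong>Check CloudTrail</strong>: search for <code>AssumeRoleWithWebIdentity</code> events and confirm that the source identity and the assumed role are correct</li>
<li><strong>Negative test</strong>: trigger a workflow from a repo or branch outside the range the trust policy allows and confirm that the assume call is rejected</li>
<li><strong>Permission scope check</strong>: attempt a write operation in the plan job (such as <code>aws s3 rm</code>) and confirm that it is rejected, which verifies that the plan role's read-only restriction is actually in effect</li>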
</ol>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Search CloudTrail for OIDC assume events</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">aws cloudtrail lookup-events <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  --lookup-attributes <span class="nv">AttributeKey</span><span class="o">=</span>EventName,AttributeValue<span class="o">=</span>AssumeRoleWithWebIdentity <span class="se">\
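</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  --max-items <span class="m">5</span></span></span></code></pre></div><p>Once verification passes, this OIDC setup replaces every access key stored in CI environment variables. The old keys can be scheduled for deactivation and deletion; for the cadence, see <a href="/blog/infra/02-identity-credentials/access-key-rotation-playbook/" data-link-title="Access Key 輪替手冊" data-link-desc="從 credential report 盤點散落的長期 access key，到逐把輪替、自動化輪替與 key age 監控的完整操作步驟">access key rotation</a>. Ongoing trust policy maintenance comes down to two things: update the sub condition whenever a repo is added, and fix the repo path in every sub condition when the organization is renamed.</p>
<p>Time estimate: creating the OIDC provider, designing the trust policy, and verifying the workflow takes roughly 1-2 hours. The OIDC provider and the IAM roles themselves incur no additional cost.</p>
<h2 id="跨分類引用">Cross-category references</h2>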
<ul>
<li>→ <a href="/blog/infra/02-identity-credentials/iam-oidc-privilege-boundary/" data-link-title="身分與憑證地基 — IAM 模型、OIDC 短期憑證與權限邊界設計" data-link-desc="IAM 的 identity / policy / role 三元件、最小權限的持續收斂、用 OIDC 取代長期 access key，以及 SCP 與 Permissions Boundary 的環境隔離">Identity and credential foundations</a>: the conceptual basis of OIDC and permission boundary design</li>
<li>→ <a href="/blog/infra/07-infra-as-pr/plan-review-apply-guardrails/" data-link-title="infra 走 PR 流程與自動化護欄" data-link-desc="infra 變更走 PR → plan → review diff → 合併 → apply，配 fmt / validate / tflint / checkov / tfsec 與 Atlantis 自動化，讓基礎設施可審查、可回溯、可交接">Infra through the PR flow</a>: how the plan/apply CI pipeline uses the roles configured here</li>
<li>→ <a href="/blog/infra/02-identity-credentials/multi-account-strategy/" data-link-title="跨帳號策略 — Organizations、SCP 與帳號工廠" data-link-desc="用 AWS Organizations 把環境拆成獨立帳號、用 SCP 設定連管理員都越不過的護欄、用帳號工廠讓每個新帳號自帶安全基線">Cross-account strategy</a>: OIDC provider configuration in multi-account environments</li>
</ul>
]]></content:encoded></item><item><title>Jenkins → GitHub Actions: Mapping and Translating the 5-Stage Pipeline Lifecycle</title><link>https://tarrragon.github.io/blog/backend/06-reliability/vendors/github-actions/migrate-from-jenkins/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/06-reliability/vendors/github-actions/migrate-from-jenkins/</guid><description>&lt;blockquote>
&lt;p>This article is a cross-vendor migration playbook that cross-links &lt;a href="https://www.jenkins.io/">Jenkins&lt;/a> and &lt;a href="https://tarrragon.github.io/blog/backend/06-reliability/vendors/github-actions/" data-link-title="GitHub Actions" data-link-desc="GitHub 原生 CI/CD、PR check、deploy gate">GitHub Actions&lt;/a>. Running the &lt;a href="https://tarrragon.github.io/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">migration-playbook-methodology 6-dimension audit&lt;/a> maps this migration to &lt;em>Schema = High (Groovy DSL ↔ YAML workflow) → Type A phased translation&lt;/em>.&lt;/p>&lt;/blockquote>
&lt;h2 id="pipeline-5-段-lifecycle-的對位--翻譯">Mapping and translating the 5-stage pipeline lifecycle&lt;/h2>
&lt;p>This article is organized around the &lt;em>5 stages of the pipeline lifecycle&lt;/em> (variant E). Rather than opening with the "why migrate" drivers, it starts from &lt;em>how Jenkins and GHA each handle the 5 stages&lt;/em>:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Lifecycle stage&lt;/th>
 &lt;th>Jenkins mechanism&lt;/th>
 &lt;th>GHA mechanism&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>1. Source / SCM&lt;/td>
 &lt;td>SCM polling / webhook trigger&lt;/td>
 &lt;td>&lt;code>on: [push, pull_request]&lt;/code> event&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>2. Build / Package&lt;/td>
 &lt;td>&lt;code>stage('Build') { sh 'mvn package' }&lt;/code>&lt;/td>
 &lt;td>&lt;code>jobs.build.steps[].run: mvn package&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>3. Test / parallel matrix&lt;/td>
 &lt;td>&lt;code>parallel { ... }&lt;/code> + agents&lt;/td>
 &lt;td>&lt;code>jobs.test.strategy.matrix: ...&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>4. Security scan&lt;/td>
 &lt;td>Plugin (Snyk / SonarQube / Aqua)&lt;/td>
 &lt;td>Action (snyk/actions / sonarsource-actions)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>5. Deploy / promote&lt;/td>
 &lt;td>Deploy plugin + approval gate&lt;/td>
 &lt;td>&lt;code>environment: production&lt;/code> + reviewer approval&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Running the &lt;a href="https://tarrragon.github.io/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">6-dimension diff audit&lt;/a>:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Dimension&lt;/th>
 &lt;th>Assessment&lt;/th>
 &lt;th>Level&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Schema / API&lt;/td>
 &lt;td>Groovy DSL ↔ YAML; completely different syntax&lt;/td>
 &lt;td>&lt;strong>High&lt;/strong>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Operational model&lt;/td>
 &lt;td>Self-hosted Jenkins → GHA SaaS / self-hosted runners&lt;/td>
 &lt;td>Medium&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Paradigm&lt;/td>
 &lt;td>Imperative pipeline → declarative workflow + events&lt;/td>
 &lt;td>Medium&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Components&lt;/td>
 &lt;td>Jenkins + plugins → GHA + actions marketplace&lt;/td>
 &lt;td>Low&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Application change&lt;/td>
 &lt;td>Most build scripts stay unchanged; the CI integration side has to change&lt;/td>
 &lt;td>Low&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Data topology&lt;/td>
 &lt;td>Same single build state&lt;/td>
 &lt;td>Low&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Schema = High (the others Medium to Low) → primarily &lt;strong>Type A phased translation&lt;/strong>, plus separate sections for paradigm and operational model.&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is a cross-vendor migration playbook that cross-links <a href="https://www.jenkins.io/">Jenkins</a> and <a href="/blog/backend/06-reliability/vendors/github-actions/" data-link-title="GitHub Actions" data-link-desc="GitHub 原生 CI/CD、PR check、deploy gate">GitHub Actions</a>. Running the <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">migration-playbook-methodology 6-dimension audit</a> maps this migration to <em>Schema = High (Groovy DSL ↔ YAML workflow) → Type A phased translation</em>.</p></blockquote>
<h2 id="pipeline-5-段-lifecycle-的對位--翻譯">Mapping and translating the 5-stage pipeline lifecycle</h2>
<p>This article is organized around the <em>5 stages of the pipeline lifecycle</em> (variant E). Rather than opening with the "why migrate" drivers, it starts from <em>how Jenkins and GHA each handle the 5 stages</em>:</p>
<table>
  <thead>
      <tr>
          <th>Lifecycle stage</th>
          <th>Jenkins mechanism</th>
          <th>GHA mechanism</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1. Source / SCM</td>
          <td>SCM polling / webhook trigger</td>
          <td><code>on: [push, pull_request]</code> event</td>
      </tr>
      <tr>
          <td>2. Build / Package</td>
          <td><code>stage('Build') { sh 'mvn package' }</code></td>
          <td><code>jobs.build.steps[].run: mvn package</code></td>
      </tr>
      <tr>
          <td>3. Test / parallel matrix</td>
          <td><code>parallel { ... }</code> + agents</td>
          <td><code>jobs.test.strategy.matrix: ...</code></td>
      </tr>
      <tr>
          <td>4. Security scan</td>
          <td>Plugin (Snyk / SonarQube / Aqua)</td>
          <td>Action (snyk/actions / sonarsource-actions)</td>
      </tr>
      <tr>
          <td>5. Deploy / promote</td>
          <td>Deploy plugin + approval gate</td>
          <td><code>environment: production</code> + reviewer approval</td>
      </tr>
  </tbody>
</table>
<p>Running the <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">6-dimension diff audit</a>:</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Assessment</th>
          <th>Level</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Schema / API</td>
          <td>Groovy DSL ↔ YAML; completely different syntax</td>
          <td><strong>High</strong></td>
      </tr>
      <tr>
          <td>Operational model</td>
          <td>Self-hosted Jenkins → GHA SaaS / self-hosted runners</td>
          <td>Medium</td>
      </tr>
      <tr>
          <td>Paradigm</td>
          <td>Imperative pipeline → declarative workflow + events</td>
          <td>Medium</td>
      </tr>
      <tr>
          <td>Components</td>
          <td>Jenkins + plugins → GHA + actions marketplace</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Application change</td>
          <td>Most build scripts stay unchanged; the CI integration side has to change</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Data topology</td>
          <td>Same single build state</td>
          <td>Low</td>
      </tr>
  </tbody>
</table>
<p>Schema = High (the others Medium to Low) → primarily <strong>Type A phased translation</strong>, plus separate sections for paradigm and operational model.</p>
<h2 id="為什麼遷cost--vendor--cloud-native-三條-driver">Why migrate: three drivers (cost, vendor, cloud-native)</h2>
<ul>
<li><strong>Cost</strong>: self-hosted Jenkins is "free software plus high ops cost"; GHA's per-minute billing is often cheaper for small and mid-sized teams</li>
<li><strong>Vendor consolidation</strong>: the repositories already live on GitHub, so moving CI into GHA removes one external system</li>
<li><strong>Cloud-native</strong>: GHA offers matrix builds and reusable workflows, and first-class actions exist for cloud-native deploys (K8s / serverless)</li>
</ul>
<h2 id="phase-0audit--classify">Phase 0: Audit + classify</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Inventory the Jenkins workspace</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">find . -name <span class="s2">&#34;Jenkinsfile&#34;</span> -o -name <span class="s2">&#34;*.groovy&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"># (lists every pipeline file)</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># Count plugin usage:</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># import / @Library / sh &#34;tool plugin...&#34; inside each Jenkinsfile</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">grep -rE <span class="s2">&#34;@Library|import|tools\s*\{&#34;</span> Jenkinsfile*
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># Assess complexity per pipeline</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># - Simple linear pipeline: 1-3 stages, no shared library</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"># - Medium: parallel stages + 2-5 shared libraries</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># - Complex: conditional branches + dynamic stages + 10+ plugins / 5+ shared libraries</span></span></span></code></pre></div><p>Audit output:</p>
<ul>
<li>A tally such as "100 pipelines: 35 simple / 50 medium / 15 complex"</li>
<li>An estimated translation time for each complexity level (simple 0.5 day / medium 2 days / complex 5-10 days)</li>
<li>A list of plugin dependencies mapped to their replacement GHA actions</li>
</ul>
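<p>The tally and the per-level estimates can be combined into a rough effort range. The sketch below is only an illustration: the function name <code>estimate_days</code> is hypothetical, and the day figures are the ones quoted in the audit output, not measured values.</p>

```shell
# Person-day range for an audit tally. The per-level estimates are the ones
# quoted in the audit output (simple 0.5, medium 2, complex 5-10 days).
estimate_days() {
  awk -v simple="$1" -v medium="$2" -v complex="$3" 'BEGIN {
    base = simple * 0.5 + medium * 2
    printf "%.1f-%.1f\n", base + complex * 5, base + complex * 10
  }'
}

# 35 simple / 50 medium / 15 complex pipelines
estimate_days 35 50 15   # prints 192.5-267.5
```

<p>For the example tally this gives roughly 190 to 270 person-days of translation work, before any parallel-run or cutover effort.</p>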
<h2 id="phase-1schema-對位groovy-dsl--yaml">Phase 1: Schema mapping (Groovy DSL ↔ YAML)</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-groovy" data-lang="groovy"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">// Jenkins Declarative Pipeline
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="n">pipeline</span> <span class="o">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="n">agent</span> <span class="o">{</span> <span class="n">label</span> <span class="s1">&#39;docker-build&#39;</span> <span class="o">}</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  <span class="n">stages</span> <span class="o">{</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="n">stage</span><span class="o">(</span><span class="s1">&#39;Test&#39;</span><span class="o">)</span> <span class="o">{</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">      <span class="n">parallel</span> <span class="o">{</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">        <span class="n">stage</span><span class="o">(</span><span class="s1">&#39;Unit&#39;</span><span class="o">)</span> <span class="o">{</span> <span class="n">steps</span> <span class="o">{</span> <span class="n">sh</span> <span class="s1">&#39;mvn test&#39;</span> <span class="o">}</span> <span class="o">}</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="n">stage</span><span class="o">(</span><span class="s1">&#39;Integration&#39;</span><span class="o">)</span> <span class="o">{</span> <span class="n">steps</span> <span class="o">{</span> <span class="n">sh</span> <span class="s1">&#39;mvn verify&#39;</span> <span class="o">}</span> <span class="o">}</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">      <span class="o">}</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="o">}</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">  <span class="o">}</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">  <span class="n">post</span> <span class="o">{</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">    <span class="n">failure</span> <span class="o">{</span> <span class="n">mail</span> <span class="nl">to:</span> <span class="s1">&#39;devops@&#39;</span><span class="o">,</span> <span class="nl">subject:</span> <span class="s1">&#39;Build failed&#39;</span> <span class="o">}</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">  <span class="o">}</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="o">}</span></span></span></code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c"># GHA workflow equivalent</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">CI</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w"></span><span class="nt">on</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">push]</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w"></span><span class="nt">jobs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">  </span><span class="nt">test</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">    </span><span class="nt">runs-on</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">self-hosted, docker-build]</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w">    </span><span class="nt">strategy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w">      </span><span class="nt">matrix</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">        </span><span class="nt">suite</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">unit, integration]</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">actions/checkout@v4</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Run ${{ matrix.suite }}</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="sd">          case &#34;${{ matrix.suite }}&#34; in
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="sd">            unit) mvn test ;;
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="sd">            integration) mvn verify ;;
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="sd">          esac</span><span class="w">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="w">  </span><span class="nt">notify-failure</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="w">    </span><span class="nt">needs</span><span class="p">:</span><span class="w"> </span><span class="l">test</span><span class="w">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="w">    </span><span class="nt">if</span><span class="p">:</span><span class="w"> </span><span class="l">failure()</span><span class="w">
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="w">    </span><span class="nt">runs-on</span><span class="p">:</span><span class="w"> </span><span class="l">ubuntu-latest</span><span class="w">
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">dawidd6/action-send-mail@v3</span><span class="w">
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="w">          </span><span class="nt">to</span><span class="p">:</span><span class="w"> </span><span class="l">devops@</span><span class="w">
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="w">          </span><span class="nt">subject</span><span class="p">:</span><span class="w"> </span><span class="l">Build failed</span></span></span></code></pre></div><p>Mapping differences:</p>
<ul>
<li><code>parallel { ... }</code> → <code>strategy.matrix</code> (different granularity: a matrix runs the same steps with different parameters, while parallel runs different steps)</li>
<li><code>post.failure</code> → a separate job + <code>if: failure()</code></li>
<li><code>@Library</code> shared library → reusable workflow (<code>uses: ./.github/workflows/reusable.yml</code>)</li>
<li>Jenkins <code>tools { jdk 'java17' }</code> → the setup-java action (the toolchain is configured by hand)</li>
</ul>
<h2 id="phase-2translation-pipeline3-tier-hybrid">Phase 2: Translation pipeline (3-tier hybrid)</h2>
<p>This follows the same 3 tiers as the <a href="/blog/backend/07-security-data-protection/vendors/splunk/migrate-to-elastic-security/" data-link-title="Splunk → Elastic Security Detection Rule Migration：6 段 phased playbook 跟 5 大踩雷" data-link-desc="從 Splunk Enterprise Security 遷到 Elastic Security 的 detection rule translation playbook：SPL ↔ KQL/ES|QL schema 對位、AI-assisted translation pipeline、parallel run 比對、cutover routing、5 個 production 踩雷（macro 沒對應 / time zone 差異 / summary index 不對位 / alert dedup key 衝突 / 過早 decommission）、capacity / cost 對照">Splunk → Elastic translation</a>:</p>
<ul>
<li><strong>Tier 1</strong>: community tools (a jenkins-to-actions converter; covers simple pipelines, 30-50%)</li>
<li><strong>Tier 2</strong>: LLM-assisted (Claude / GPT translates medium-complexity pipelines; a human verifies)</li>
<li><strong>Tier 3</strong>: manual (shared libraries converted to reusable workflows; conditional and dynamic stages rewritten)</li>
</ul>
<h2 id="phase-3parallel-run雙-ci-跑-4-8-週">Phase 3: Parallel run (both CI systems for 4-8 weeks)</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Repository ──┬─→ Jenkins webhook ──→ Jenkinsfile pipeline
</span></span><span class="line"><span class="ln">2</span><span class="cl">             └─→ GitHub Action ────→ .github/workflows/ci.yml
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl">Compare:
</span></span><span class="line"><span class="ln">5</span><span class="cl">- The same commit yields the same result on both sides
</span></span><span class="line"><span class="ln">6</span><span class="cl">- Latency / cost / artifact location line up</span></span></code></pre></div><p>A diff dashboard tracks three metrics (test pass rate / build time / failure mode); cutover starts only once the two sides agree on 95%+.</p>
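<p>One way to get the agreement figure is to export one line per commit with the result from each side and count the matches. This is a minimal sketch under assumptions: the three-column input format and the function name <code>agreement_rate</code> are hypothetical, and it covers only the pass/fail result, not build time or failure mode.</p>

```shell
# Reads "commit jenkins_result gha_result" lines on stdin and prints the
# percentage of commits where both CI systems reported the same result.
agreement_rate() {
  awk 'NF == 3 { total++; if ($2 == $3) same++ }
       END { if (total > 0) printf "%.1f\n", 100 * same / total; else print "n/a" }'
}

# Four sample commits, one of which disagrees between the two systems.
printf '%s\n' 'a1f3 pass pass' 'b2e4 fail fail' 'c3d5 pass fail' 'd4c6 pass pass' | agreement_rate   # prints 75.0
```

<p>With the 95% threshold above, this sample would not yet qualify for cutover; the disagreeing commits are the ones worth inspecting first.</p>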
<h2 id="phase-4cutover--cleanup">Phase 4: Cutover + cleanup</h2>
<ul>
<li>Disable Jenkins webhook</li>
<li>GHA becomes the primary CI</li>
<li>Keep Jenkins on standby for 2 weeks as a fallback</li>
<li>Decommission Jenkins controller + agents</li>
</ul>
<h2 id="production-故障演練">Production failure drills</h2>
<h3 id="case-1shared-library-equivalencereusable-workflow-表達不足">Case 1: Shared library equivalence, reusable workflows are not expressive enough</h3>
<p><strong>Symptom</strong>: a complex Jenkins shared library (Groovy classes / closures / dynamic variables) loses fidelity once translated into a reusable workflow, and some dynamic logic cannot be expressed at all.</p>
<p><strong>Root cause</strong>: Jenkins Groovy is imperative and a full programming language; a GHA reusable workflow is declarative YAML with limited expressiveness.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Move complex logic into scripts</strong>: the reusable workflow acts only as an <em>orchestrator</em>; complex logic lives in <code>.github/scripts/*.sh</code> or in a JavaScript action (<code>actions/javascript-action</code>)</li>
<li><strong>Custom composite actions</strong>: wrap multi-step logic in a composite action, which is more reusable than a reusable workflow</li>
<li><strong>Retire over-engineered shared libraries</strong>: the translation process reveals that about 90% of the library code goes unused and only 10% is actually called</li>
</ol>
<h3 id="case-2ephemeral-workspacebuild-cache-失敗">Case 2: Ephemeral workspace, build cache fails</h3>
<p><strong>Symptom</strong>: after cutover, build time rises from 5 minutes to 20 minutes; Maven / Gradle / node_modules / Docker layers are fetched again on every run.</p>
<p><strong>Root cause</strong>: a Jenkins agent workspace is persistent, so the build cache survives across builds; a GHA ephemeral runner is a fresh VM every time, and no cache is carried over by default.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong><code>actions/cache@v4</code></strong>: key the cache on the hash of a dependency manifest or lock file, such as <code>hashFiles('**/pom.xml')</code>, so it is reused across builds</li>
<li><strong>Self-hosted runner with cache</strong>: run critical pipelines on self-hosted runners with a persistent volume</li>
<li><strong>Docker layer cache</strong>: use <code>docker/build-push-action</code> with the BuildKit cache to avoid rebuilding the full image</li>
</ol>
<h3 id="case-3plugin-不對等ci-feature-退化">Case 3: No plugin equivalent, CI features regress</h3>
<p><strong>Symptom</strong>: Jenkins uses 50+ plugins and the GHA action marketplace has no match for some of them; the team loses first-class support for SonarQube quality gates / Jira integration / custom reports.</p>
<p><strong>Root cause</strong>: the Jenkins plugin ecosystem has accumulated over 20+ years, the GHA marketplace over 5; some niche plugins have no equivalent action in GHA.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>API-based integration</strong>: call the vendor API directly with <code>curl</code>, with no dependency on a plugin or action</li>
<li><strong>Write your own action</strong>: implement critical features as composite / JavaScript actions and publish them to the marketplace</li>
<li><strong>Retire old plugins</strong>: audit real plugin usage during the translation; 80% can be retired</li>
</ol>
<h3 id="case-4self-hosted-runner-setup--scaling">Case 4: Self-hosted runner setup + scaling</h3>
<p><strong>Symptom</strong>: production workloads need GPU / large-memory runners; GHA hosted runner specs fall short, so the team turns to self-hosted runners and finds scaling / security / monitoring more complex than with Jenkins agents.</p>
<p><strong>Root cause</strong>: autoscaled GHA self-hosted runners run as ephemeral instances, and scaling them needs a <em>runner controller</em> (actions-runner-controller on K8s); this corresponds to the Jenkins agent / Kubernetes plugin, but the setup differs.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>actions-runner-controller (ARC)</strong>: K8s-native runner scaling, the counterpart of the Jenkins K8s plugin</li>
<li><strong>Runner labels</strong>: route jobs by label (<code>runs-on: [self-hosted, gpu, linux]</code>)</li>
<li><strong>Security</strong>: ephemeral runners use short-lived tokens and do not persist secrets across jobs</li>
</ol>
<h3 id="case-5matrix-build-vs-parallel-stage-表達差">Case 5: Matrix build vs parallel stage, a gap in expressiveness</h3>
<p><strong>Symptom</strong>: Jenkins has <em>dynamic parallel</em> (the stages to run are decided at runtime and vary with input); a GHA matrix is <em>static at workflow load time</em> and cannot express this.</p>
<p><strong>Root cause</strong>: a GHA matrix is declarative and is expanded when the workflow is parsed; deciding stages at runtime requires <code>if:</code> conditions plus multiple jobs.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Dynamic matrix</strong>: run a <code>jobs.set-matrix</code> job first to compute the matrix and output it as JSON; later jobs use <code>strategy.matrix: ${{ fromJSON(needs.set-matrix.outputs.matrix) }}</code> (job outputs are strings, so <code>fromJSON</code> is required)</li>
<li><strong>Conditional jobs</strong>: write each dynamic stage as a separate job and gate it with <code>if:</code></li>
<li><strong>Redesign</strong>: 90% of dynamic logic can be turned into a static matrix + conditions; purely runtime-dynamic behavior is usually over-engineering</li>
</ol>
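<p>The script side of the dynamic matrix fix can be sketched as follows. This is an illustration under assumptions: <code>build_matrix</code> and the <code>suite</code> key are hypothetical names, and suite names are assumed to need no JSON escaping. The consuming job would read the value through <code>fromJSON(needs.set-matrix.outputs.matrix)</code>.</p>

```shell
# Turns a list of suite names into the JSON that a later job can feed to
# strategy.matrix via fromJSON(). Names are assumed to need no JSON escaping.
build_matrix() {
  local json='{"suite":[' sep='' suite
  for suite in "$@"; do
    json="${json}${sep}\"${suite}\""
    sep=','
  done
  printf '%s]}\n' "$json"
}

# Inside a workflow step, GITHUB_OUTPUT points at the step's output file;
# outside Actions this falls back to stdout so the line can be inspected.
echo "matrix=$(build_matrix unit integration)" >> "${GITHUB_OUTPUT:-/dev/stdout}"
```

<p>In a real pipeline the argument list would come from whatever decides the stages at runtime (changed paths, a label, an input), which is the part that a static matrix cannot express.</p>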
<h2 id="capacity--cost">Capacity / cost</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Self-managed Jenkins</th>
          <th>GitHub Actions</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Compute cost</td>
          <td>EC2 + agent licenses</td>
          <td>per-minute billing (free tier + overage)</td>
      </tr>
      <tr>
          <td>Operational FTE</td>
          <td>0.5-1.5 FTE</td>
          <td>0.1-0.3 FTE</td>
      </tr>
      <tr>
          <td>Plugin / action ecosystem</td>
          <td>20+ years, mature</td>
          <td>5 years, growing fast</td>
      </tr>
      <tr>
          <td>Cold start</td>
          <td>Agent ready &lt; 1 min</td>
          <td>Hosted runner 30-60s spin-up</td>
      </tr>
      <tr>
          <td>Self-hosted scaling</td>
          <td>Jenkins K8s plugin</td>
          <td>ARC (actions-runner-controller)</td>
      </tr>
      <tr>
          <td>Security</td>
          <td>Self-managed VPC + secret</td>
          <td>OIDC + repository secret + environment</td>
      </tr>
      <tr>
          <td>Migration cost</td>
          <td>-</td>
          <td>1-3 FTE × 1-3 months</td>
      </tr>
  </tbody>
</table>
<p><strong>Reading</strong>: an organization with 100+ pipelines that switches to GHA typically breaks even on ROI in 6-12 months and saves ops cost afterwards; with &lt; 30 pipelines the switch is already overdue.</p>
<h2 id="整合--下一步">Integration / next steps</h2>
<h3 id="跟-gitlab-ci-對位">Mapping to <a href="https://docs.gitlab.com/ee/ci/">GitLab CI</a></h3>
<p>GitLab CI YAML syntax is close to GHA's; shared libraries map to <code>include:</code>, and self-hosted runners have an equivalent. A Jenkins → GitLab CI migration mirrors the flow in this article, and the 3-tier translation pipeline applies as is.</p>
<h3 id="跟-circle-ci-對位">Mapping to <a href="/blog/backend/06-reliability/vendors/circleci/" data-link-title="CircleCI" data-link-desc="CI/CD 平台、強 cache 與 parallelism">CircleCI</a></h3>
<p>A CircleCI orb is the counterpart of a GHA composite action; switching between SaaS CI systems is simpler than Jenkins → GHA (both are YAML-based).</p>
<h3 id="反向-migrationgha--jenkins">Reverse migration (GHA → Jenkins)</h3>
<p>A few enterprises (finance / government) have compliance requirements for self-hosted or on-prem CI. GHA → Jenkins mirrors this playbook; note that Jenkins shared libraries are more expressive, so dynamic logic inside a reusable workflow does not have to be split apart.</p>
<h3 id="下一步議題">Next topics</h3>
<ul>
<li><strong>Mixing reusable workflows and composite actions</strong>: reusable workflows suit <em>cross-repo orchestration</em>; composite actions suit <em>single-repo logic encapsulation</em></li>
<li><strong>OIDC + cloud deploy</strong>: replacing long-lived cloud credentials with OIDC tokens is an upgrade worth making alongside the GHA migration</li>
<li><strong>Cost optimization</strong>: with per-minute billing, high-volume CI needs monitoring + budget alerts</li>
</ul>
<h2 id="相關連結">Related links</h2>
<ul>
<li>Target vendor: <a href="/blog/backend/06-reliability/vendors/github-actions/" data-link-title="GitHub Actions" data-link-desc="GitHub 原生 CI/CD、PR check、deploy gate">GitHub Actions</a></li>
<li>Parallel vendor: <a href="/blog/backend/06-reliability/vendors/circleci/" data-link-title="CircleCI" data-link-desc="CI/CD 平台、強 cache 與 parallelism">CircleCI</a></li>
<li>Parallel migration playbooks (Type A): <a href="/blog/backend/07-security-data-protection/vendors/splunk/migrate-to-elastic-security/" data-link-title="Splunk → Elastic Security Detection Rule Migration：6 段 phased playbook 跟 5 大踩雷" data-link-desc="從 Splunk Enterprise Security 遷到 Elastic Security 的 detection rule translation playbook：SPL ↔ KQL/ES|QL schema 對位、AI-assisted translation pipeline、parallel run 比對、cutover routing、5 個 production 踩雷（macro 沒對應 / time zone 差異 / summary index 不對位 / alert dedup key 衝突 / 過早 decommission）、capacity / cost 對照">Splunk → Elastic Security</a> / <a href="/blog/backend/01-database/vendors/mysql/migrate-to-postgresql/" data-link-title="MySQL → PostgreSQL：從 SQL dialect diff 跑出來的 Type A 6-phase migration" data-link-desc="MySQL → PostgreSQL 是 Type A 高 schema 差 migration 的標準形態 — SQL dialect / collation / case sensitivity / replication 模型差異主導；用 pgloader / AWS DMS / 自管 dual-write 三條 path、5 個 production 踩雷（auto_increment vs SERIAL / charset 跟 collation / case sensitivity / index syntax / triggers）">MySQL → PostgreSQL</a></li>
<li>Methodology: <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a></li>
</ul>
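]]></content:encoded></item><item><title>Deploying this blog project</title><link>https://tarrragon.github.io/blog/ci/blog-project-deploy/</link><pubDate>Wed, 06 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/ci/blog-project-deploy/</guid><description>&lt;p>Deploying this blog project is one concrete case of front-end static site deployment. This folder records only the Hugo, Pagefind, Playwright, GitHub Pages, and Claude workflow that this project actually uses; it does not treat these details as universal rules for every CI/CD setting.&lt;/p>
&lt;h2 id="專案定位">Project positioning&lt;/h2>
&lt;p>This project's deployment artifact is a static website. Hugo generates the HTML, Pagefind generates the search index, GitHub Pages does the hosting, and Playwright verifies search and layout behavior.&lt;/p>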
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Document&lt;/th>
 &lt;th>Responsibility&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="github-actions-workflows/">GitHub Actions workflow&lt;/a>&lt;/td>
 &lt;td>Records the actual configuration in this project's &lt;code>.github/workflows/&lt;/code>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="與通用-cicd-的關係">與通用 CI/CD 的關係&lt;/h2>
&lt;p>本資料夾是實例層。通用 gate 原理、不同部署場域差異與失敗處理流程放在上層文章；本資料夾只回答「這個 blog 專案現在怎麼部署、失敗時要看哪裡」。術語定義統一回連 &lt;a href="https://tarrragon.github.io/blog/ci/knowledge-cards/" data-link-title="Knowledge Cards" data-link-desc="用原子化卡片整理 CI/CD 章節的核心術語，讓流程文章專注在判讀與決策">CI 知識卡片&lt;/a>。&lt;/p>
&lt;h2 id="下一步路由">下一步路由&lt;/h2>
&lt;ul>
&lt;li>本專案 workflow：讀 &lt;a href="github-actions-workflows/">GitHub Actions workflow&lt;/a>。&lt;/li>
&lt;li>前端部署通用注意事項：讀 &lt;a href="../frontend-deploy/">前端部署 CI/CD&lt;/a>。&lt;/li>
&lt;li>CI gate 原理：讀 &lt;a href="../ci-gate-workflow-boundary/">CI gate 與 workflow 邊界&lt;/a>。&lt;/li>
&lt;li>Markdown CI 規則：讀 &lt;a href="https://tarrragon.github.io/blog/posts/blog-markdown-%E5%AF%AB%E4%BD%9C%E8%A6%8F%E7%AF%84%E8%88%87-mdtools-%E6%AA%A2%E6%9F%A5/" data-link-title="Blog Markdown 寫作規範與 mdtools 檢查" data-link-desc="本 blog 的 Markdown 排版規範權威契約。涵蓋 H1 禁用、MD024 siblings_only、反釣魚 TLD 校驗、卡片雙向完整性、front matter schema；改規則時要與 scripts/mdtools 實作同步。">Blog Markdown 寫作規範與 mdtools 檢查&lt;/a>。&lt;/li>
&lt;/ul></description><content:encoded><![CDATA[<p>本 blog 專案部署是前端靜態站部署的一個具體案例。這個資料夾只記錄本專案實際使用的 Hugo、Pagefind、Playwright、GitHub Pages 與 Claude workflow，不把這些細節當成所有 CI/CD 場域的通用規則。</p>
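<h2 id="專案定位">Project positioning</h2>
<p>This project's deployment artifact is a static website. Hugo generates the HTML, Pagefind generates the search index, GitHub Pages does the hosting, and Playwright verifies search and layout behavior.</p>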
<table>
  <thead>
      <tr>
          <th>Document</th>
          <th>Responsibility</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="github-actions-workflows/">GitHub Actions workflow</a></td>
          <td>Records the actual configuration in this project's <code>.github/workflows/</code></td>
      </tr>
  </tbody>
</table>
<h2 id="與通用-cicd-的關係">Relationship to general CI/CD</h2>
<p>This folder is the instance layer. General gate principles, differences between deployment settings, and the failure-handling flow live in the parent articles; this folder only answers "how is this blog project deployed today, and where do you look when it fails". Term definitions all link back to the <a href="/blog/ci/knowledge-cards/" data-link-title="Knowledge Cards" data-link-desc="用原子化卡片整理 CI/CD 章節的核心術語，讓流程文章專注在判讀與決策">CI knowledge cards</a>.</p>
<h2 id="下一步路由">Where to go next</h2>
<ul>
<li>This project's workflows: read <a href="github-actions-workflows/">GitHub Actions workflow</a>.</li>
<li>General notes on front-end deployment: read <a href="../frontend-deploy/">Front-end deployment CI/CD</a>.</li>
<li>CI gate principles: read <a href="../ci-gate-workflow-boundary/">CI gates and workflow boundaries</a>.</li>
<li>Markdown CI rules: read <a href="/blog/posts/blog-markdown-%E5%AF%AB%E4%BD%9C%E8%A6%8F%E7%AF%84%E8%88%87-mdtools-%E6%AA%A2%E6%9F%A5/" data-link-title="Blog Markdown 寫作規範與 mdtools 檢查" data-link-desc="本 blog 的 Markdown 排版規範權威契約。涵蓋 H1 禁用、MD024 siblings_only、反釣魚 TLD 校驗、卡片雙向完整性、front matter schema；改規則時要與 scripts/mdtools 實作同步。">Blog Markdown writing conventions and mdtools checks</a>.</li>
</ul>
]]></content:encoded></item><item><title>CI step silent hang: the time vacuum is the signal, and the happy log is an anti-signal</title><link>https://tarrragon.github.io/blog/work-log/ci-step-silent-hang%E6%99%82%E9%96%93%E7%9C%9F%E7%A9%BA%E6%89%8D%E6%98%AF%E8%A8%8A%E8%99%9Fhappy-log-%E5%8F%8D%E8%80%8C%E6%98%AF-anti-signal/</link><pubDate>Thu, 28 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/work-log/ci-step-silent-hang%E6%99%82%E9%96%93%E7%9C%9F%E7%A9%BA%E6%89%8D%E6%98%AF%E8%A8%8A%E8%99%9Fhappy-log-%E5%8F%8D%E8%80%8C%E6%98%AF-anti-signal/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Core issue&lt;/strong>: when a CI step appears to "run for a long time and then time out", distinguish "there really was not enough time" from "a silent hang filled the time"; the two call for completely different fixes. The signal of a silent hang is "a long time vacuum between the last happy log line and the cancel", not "a final error message". After a first wrong attribution, a second failure should not be met with more timeout; stop and reread the detailed log.
&lt;strong>Case outline&lt;/strong>: this blog's Playwright CI kept timing out. The initial diagnosis, "missing cache + timeout too tight", added a cache and bumped the timeout, and it still timed out. Rereading the detailed log showed that the chromium download finished in 2 seconds, followed by 24 min 31 s with &lt;strong>no log output at all&lt;/strong> before the cancel: an extract-zip regression in Playwright 1.59 on Node.js 24.16.0 (&lt;a href="https://github.com/microsoft/playwright/issues/41000">microsoft/playwright#41000&lt;/a>, upstream &lt;a href="https://github.com/nodejs/node/issues/63487">nodejs/node#63487&lt;/a>). After upgrading to Playwright 1.60.0 the step went from a 25-minute hang to 22 seconds.&lt;/p>&lt;/blockquote>
&lt;hr>
&lt;h2 id="1-silent-hang-是-happy-log-的-anti-signal">1. Silent hang 是 happy log 的 anti-signal&lt;/h2>
&lt;p>CI step timeout 時、第一個本能是看「step 跑了多久」。15 分鐘 timeout 然後被砍、直覺判斷是「時間不夠、bump timeout」。這個直覺對應的失敗模式是「step 真的需要 16 分鐘才能跑完」。&lt;/p>
&lt;p>但有另一種失敗模式長得很像、修法完全不同：&lt;strong>silent hang&lt;/strong> — step 在某個點之後就不再輸出任何 log、process 仍在執行（沒有 crash）、直到外部 timeout 才被砍。表面看跟「時間不夠」一樣（step 跑很久才被 cancel）、但根因是 process 本身卡死、給多少時間都跑不完。&lt;/p>
&lt;p>辨識 silent hang 的關鍵訊號是「最後一行 happy log 到 cancel 訊息之間有大段時間真空」。&lt;strong>「Happy log」指的是看起來成功的訊息&lt;/strong>（例：下載 100% 完成、build succeeded、X tests passed）— 這類訊息特別會誤導判斷、因為它讓人以為任務在進展。Silent hang 開始之前的最後一行通常正是這種 happy log、是正常結束訊號的反面。&lt;/p>
&lt;h3 id="三類-timeout-模式的對照">三類 timeout 模式的對照&lt;/h3>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Signal&lt;/th>
 &lt;th>Likely root cause&lt;/th>
 &lt;th>Fix&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>The step makes steady progress throughout and runs into the timeout in its final phase&lt;/td>
 &lt;td>Genuinely not enough time&lt;/td>
 &lt;td>Bump the timeout&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>A failure message (exception / non-zero exit), then a timeout&lt;/td>
 &lt;td>Logic error in the code&lt;/td>
 &lt;td>Fix according to the message&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;strong>A long time vacuum after the last log line, then a cancel&lt;/strong>&lt;/td>
 &lt;td>&lt;strong>Silent hang&lt;/strong>, possibly an upstream bug&lt;/td>
 &lt;td>&lt;strong>Check the upstream issue tracker instead of adding timeout&lt;/strong>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
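&lt;p>The third is the easiest to misjudge because "no output between log lines" is not treated as a signal, yet &lt;strong>the absence of messages is itself a signal&lt;/strong>. People who write debug logs remember to add error messages, but a silent hang usually happens at a wait point inside a tool that emits no log, so there is no error message to read.&lt;/p>
&lt;hr>
&lt;h2 id="2-為什麼cache-缺失--bump-timeout的初診是-false-positive">2. Why the initial diagnosis "missing cache + bump timeout" was a false positive&lt;/h2>
&lt;p>On the first read of the failing CI log, three things are easy to spot:&lt;/p>
&lt;ol>
&lt;li>&lt;code>timeout-minutes: 15&lt;/code> in the workflow YAML&lt;/li>
&lt;li>The step ran for &lt;code>15m 6s&lt;/code> (almost exactly the timeout limit)&lt;/li>
&lt;li>The step is named &lt;code>Install Playwright browsers&lt;/code> (it downloads 170 MiB)&lt;/li>
&lt;/ol>
&lt;p>The conclusion intuition assembles: "missing cache + timeout too tight". It looks like it "should be right" because both are well-known optimization points for "Install Playwright browsers". The fix: add &lt;code>actions/cache&lt;/code> and bump the timeout to 25 min.&lt;/p></description><content:encoded><![CDATA[<blockquote>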
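<p><strong>Core issue</strong>: when a CI step appears to "run for a long time and then time out", distinguish "there really was not enough time" from "a silent hang filled the time"; the two call for completely different fixes. The signal of a silent hang is "a long time vacuum between the last happy log line and the cancel", not "a final error message". After a first wrong attribution, a second failure should not be met with more timeout; stop and reread the detailed log.
<strong>Case outline</strong>: this blog's Playwright CI kept timing out. The initial diagnosis, "missing cache + timeout too tight", added a cache and bumped the timeout, and it still timed out. Rereading the detailed log showed that the chromium download finished in 2 seconds, followed by 24 min 31 s with <strong>no log output at all</strong> before the cancel: an extract-zip regression in Playwright 1.59 on Node.js 24.16.0 (<a href="https://github.com/microsoft/playwright/issues/41000">microsoft/playwright#41000</a>, upstream <a href="https://github.com/nodejs/node/issues/63487">nodejs/node#63487</a>). After upgrading to Playwright 1.60.0 the step went from a 25-minute hang to 22 seconds.</p></blockquote>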
<hr>
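<h2 id="1-silent-hang-是-happy-log-的-anti-signal">1. In a silent hang, the happy log is an anti-signal</h2>
<p>When a CI step times out, the first instinct is to look at "how long did the step run". A 15-minute timeout followed by a kill leads to the intuitive call "not enough time, bump the timeout". The failure mode that intuition matches is "the step really needs 16 minutes to finish".</p>
<p>But another failure mode looks very similar and needs a completely different fix: <strong>silent hang</strong>. The step stops emitting any log after some point, the process is still running (no crash), and it is killed only when the external timeout fires. On the surface it looks like "not enough time" (the step runs for a long time before being canceled), but the root cause is that the process itself is stuck and will not finish no matter how much time it gets.</p>
<p>The key signal for recognizing a silent hang is "a long time vacuum between the last happy log line and the cancel message". <strong>"Happy log" means a message that looks like success</strong> (e.g. download 100% complete, build succeeded, X tests passed). Such messages are especially misleading because they make it look as if the task is progressing. The last line before a silent hang begins is usually exactly this kind of happy log, the opposite of a normal completion signal.</p>
<h3 id="三類-timeout-模式的對照">Three timeout patterns compared</h3>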
<table>
  <thead>
      <tr>
          <th>Signal</th>
          <th>Likely root cause</th>
          <th>Fix</th>
      </tr>
  </thead>
  <tbody>
      <tr>
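          <td>The step makes steady progress throughout and runs into the timeout in its final phase</td>
          <td>Genuinely not enough time</td>
          <td>Bump the timeout</td>
      </tr>
      <tr>
          <td>A failure message (exception / non-zero exit), then a timeout</td>
          <td>Logic error in the code</td>
          <td>Fix according to the message</td>
      </tr>
      <tr>
          <td><strong>A long time vacuum after the last log line, then a cancel</strong></td>
          <td><strong>Silent hang</strong>, possibly an upstream bug</td>
          <td><strong>Check the upstream issue tracker instead of adding timeout</strong></td>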
      </tr>
  </tbody>
</table>
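<p>The third is the easiest to misjudge because "no output between log lines" is not treated as a signal, yet <strong>the absence of messages is itself a signal</strong>. People who write debug logs remember to add error messages, but a silent hang usually happens at a wait point inside a tool that emits no log, so there is no error message to read.</p>
<hr>
<h2 id="2-為什麼cache-缺失--bump-timeout的初診是-false-positive">2. Why the initial diagnosis "missing cache + bump timeout" was a false positive</h2>
<p>On the first read of the failing CI log, three things are easy to spot:</p>
<ol>
<li><code>timeout-minutes: 15</code> in the workflow YAML</li>
<li>The step ran for <code>15m 6s</code> (almost exactly the timeout limit)</li>
<li>The step is named <code>Install Playwright browsers</code> (it downloads 170 MiB)</li>
</ol>
<p>The conclusion intuition assembles: "missing cache + timeout too tight". It looks like it "should be right" because both are well-known optimization points for "Install Playwright browsers". The fix: add <code>actions/cache</code> and bump the timeout to 25 min.</p>
<p>After the fix it still timed out, but this time it ran for <code>25m 6s</code> (again right at the limit).</p>
<p><strong>The signal at this point should be "the same step still hits the limit under 1.67× the timeout"</strong>: if time were the problem, the run should land somewhere in the middle after the bump (say, finishing at 18-20 min); if it keeps hitting the limit, the step will not end on its own, which means it is a hang.</p>
<p>But it is easy to skip this signal during the initial diagnosis and keep wondering "is the cache step misconfigured?". That direction is wrong because the underlying assumption, "the cache is the bottleneck", was never verified.</p>
<h3 id="一輪-false-positive-的-anatomy">Anatomy of one round of false positive</h3>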
<table>
  <thead>
      <tr>
          <th>Step</th>
          <th>The easy move</th>
          <th>The right move</th>
      </tr>
  </thead>
  <tbody>
      <tr>
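          <td>See the timeout</td>
          <td>Assume "not enough time"</td>
          <td>First distinguish "not enough time" from "silent hang"</td>
      </tr>
      <tr>
          <td>Read the high-level log</td>
          <td>Assume "slow download"</td>
          <td>Compare the timestamps before and after the download</td>
      </tr>
      <tr>
          <td>Propose a fix</td>
          <td>Add cache + bump timeout</td>
          <td>First confirm the bottleneck really is the download</td>
      </tr>
      <tr>
          <td>The fix still fails</td>
          <td>Assume "the cache did not hit"</td>
          <td>Recognize that "the same step hit the limit again" is a hang signal</td>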
      </tr>
  </tbody>
</table>
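<p>Each step is reasonable on its own; together they polish the false positive into something ever more refined. This anatomy applies to any "change things before verifying the initial diagnosis" situation, not only CI.</p>
<hr>
<h2 id="3-wrap-的-r-在第二次-fail-時是-stop-訊號">3. The R in WRAP is a stop signal at the second failure</h2>
<p>The R (Reality Test) principle of the WRAP decision framework asks "what evidence is needed to show this approach works?". It is not only a pre-decision check; it is also <strong>a stop signal after consecutive failures</strong>.</p>
<p>At the second failure, adding more timeout in the same direction is autopilot. What WRAP should prompt at this point:</p>
<ul>
<li>"Two fixes of the same kind have failed; is the underlying assumption wrong?"</li>
<li>"Do I have the data to judge where it is actually stuck?" (data sufficiency gate)</li>
<li>"What is the base rate for this kind of problem?" (base-rate thinking)</li>
</ul>
<p><strong>The stop signal triggers when "fixes in the same direction fail 2 times in a row", not "3 times"</strong>. Go back to the data at the second failure; a third attempt already wastes a cycle and reinforces the wrong assumption.</p>
<p>What actually went right after the second failure was stopping, grepping the timestamp sequence in the detailed log, and finding a 24-minute blank between "download complete" and "cancel"; only then was the silent hang confirmed. Without that turn, the third attempt would most likely have been "an even larger timeout" or "a different cache key", and it would still have failed.</p>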
<hr>
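<h2 id="4-detailed-log-的關鍵讀法找沒輸出的時間段">4. The key way to read a detailed log: look for the stretch with no output</h2>
<p>A CI platform's step log is usually long, and a visual scan skips things easily. When a silent hang is suspected, the method is not to read in order but to pull out three timestamps and compute one gap:</p>
<ol>
<li><strong>The timestamp at which the step starts</strong> (usually in the log header)</li>
<li><strong>The timestamp at which the step ends (cancel / fail)</strong></li>
<li><strong>The timestamp of the last meaningful output</strong></li>
<li>Compute the time vacuum between #3 and #2</li>
</ol>
<p>A large enough vacuum (&gt; 1 minute) + #3 being a happy log = strong suspicion of a silent hang.</p>
<p>With GitHub Actions, the concrete steps using the <code>gh</code> CLI:</p>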





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Fetch all log lines for one step (filter by step name)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">gh run view &lt;run-id&gt; --log --job &lt;job-id&gt; <span class="p">|</span> rg <span class="s2">&#34;Install Playwright browsers&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Grab the last few lines to see the tail of the vacuum</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">gh run view &lt;run-id&gt; --log --job &lt;job-id&gt; <span class="p">|</span> rg <span class="s2">&#34;Install Playwright browsers&#34;</span> <span class="p">|</span> tail -3</span></span></code></pre></div><p>The tail of the log in this case (simplified):</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">2026-05-27T09:59:44.110Z  | 100% of 170.4 MiB
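</span></span><span class="line"><span class="ln">2</span><span class="cl">2026-05-27T10:24:15.201Z  ##[error]The operation was canceled.</span></span></code></pre></div><p>A gap of 24 min 31 s, and the last happy log line is "download 100% complete": silent hang confirmed.</p>
<p>The core of this reading is that <strong>the time gap takes priority over message content</strong>. Engineers habitually read message content looking for an error keyword, but a silent hang has no error keyword to find, only a gap in time. It becomes visible only once you switch to a different kind of signal.</p>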
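<p>The gap calculation can also be scripted as a rough check. The following is a minimal sketch, assuming GNU <code>date</code> (as found on Ubuntu runners) and ISO 8601 timestamps like the two in the log tail above. The function name <code>gap_seconds</code> is hypothetical, and fractional seconds are dropped for simplicity:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# gap_seconds LAST_OUTPUT_TS END_TS   (hypothetical helper)
# Prints the whole seconds between the last meaningful log line (#3)
# and the step's cancel/fail line (#2). Assumes GNU date.
gap_seconds() {
  # Drop fractional seconds (and the trailing Z), then swap "T" for a space
  local last="${1%%.*}" end="${2%%.*}"
  local t0 t1
  t0=$(date -u -d "${last/T/ }" +%s)
  t1=$(date -u -d "${end/T/ }" +%s)
  echo $(( t1 - t0 ))
}

# Timestamps taken from the log tail in this case
gap=$(gap_seconds "2026-05-27T09:59:44.110Z" "2026-05-27T10:24:15.201Z")
echo "silent gap: $(( gap / 60 ))m $(( gap % 60 ))s"
</code></pre></div>
<p>For these two timestamps the script prints <code>silent gap: 24m 31s</code>, which matches the manual reading. On systems without GNU <code>date</code> (macOS, for example) the <code>-d</code> flag behaves differently, so the parsing would need to be adapted.</p>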
<hr>
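<h2 id="5-upstream-issue-搜尋的優先序">5. Search priority for upstream issues</h2>
<p>Once a silent hang is confirmed, the next step is usually <strong>not to keep reasoning about the root cause</strong> but to search the upstream issue tracker. A silent hang is more often a bug in a tool or dependency than a mistake in your own config, because a config error usually produces an error message rather than silence.</p>
<p>Search strategy:</p>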





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">gh api <span class="s1">&#39;search/issues?q=repo:&lt;upstream&gt;/&lt;repo&gt;+&lt;symptom keywords&gt;+is:issue&amp;per_page=10&amp;sort=updated&#39;</span></span></span></code></pre></div><p>The key is to <strong>choose keywords that describe symptoms, not guesses</strong>. Symptom words describe what you actually observed (<code>hangs after download</code>, <code>stuck during extract</code>); guess words describe a presumed root cause (<code>slow</code>, <code>timeout</code>, <code>network issue</code>). Guess words return a large number of unrelated issues; symptom words usually hit directly.</p>
<p>In this case, the query <code>playwright install hangs chromium</code> returned issue #41000 as its second result, with a title that matched exactly: "<code>playwright install chromium</code> hangs after download completes on Node.js 24.16.0 (extract-zip)". The issue points upstream to <a href="https://github.com/nodejs/node/issues/63487">nodejs/node#63487</a> and gives two workarounds (upgrade to Playwright 1.60.0 or pin Node 24.15.0). From query to confirmed root cause took under 5 minutes.</p>
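<p>One way to keep symptom words separate from the rest of the query is a small helper that assembles the search string. This is a minimal sketch: the function name <code>symptom_query</code> is hypothetical, and the commented-out call relies on <code>gh api -X GET</code> sending <code>-f</code> fields as query-string parameters:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# symptom_query OWNER/REPO WORD...   (hypothetical helper)
# Builds a GitHub issue-search query from observed symptom words.
symptom_query() {
  local repo="$1"
  shift
  # "$*" joins the remaining words with single spaces
  echo "repo:${repo} is:issue $*"
}

query=$(symptom_query microsoft/playwright playwright install hangs chromium)
echo "$query"

# Needs network access and gh authentication, so it is left commented out:
# gh api -X GET search/issues -f q="$query" -f per_page=10 -f sort=updated
</code></pre></div>
<p>Passing the symptom words as separate arguments makes it easy to swap in a different observation (<code>stuck during extract</code>, for instance) without touching the repo filter.</p>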
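<h3 id="為什麼-issue-tracker-該優先於-self-reasoning">Why the issue tracker should come before self-reasoning</h3>
<p>An engineer's instinct is to work out the root cause alone. But for problems like a CI silent hang, the root cause usually lies in a subtle interaction between tool version, runtime version, OS, and container image, not in your own codebase. <strong>What reasoning cannot find has often already been reported on the community issue tracker</strong>.</p>
<p>The trade-off between "reason first, then search" and "search first, then reason":</p>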
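<table>
  <thead>
      <tr>
          <th>Problem scope</th>
          <th>Which comes first</th>
          <th>Why</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Logic bug inside your own codebase</td>
          <td>reason</td>
          <td>You know it best; reasoning is usually faster</td>
      </tr>
      <tr>
          <td>Upstream tool / runtime / OS / container</td>
          <td>search issues</td>
          <td>You lack upstream knowledge; reasoning tends to get stuck on a wrong assumption</td>
      </tr>
      <tr>
          <td>The boundary between the two (your config triggers an upstream bug)</td>
          <td>in parallel</td>
          <td>Search for known issues while reasoning about your own config</td>
      </tr>
  </tbody>
</table>
<p>A silent hang falls into the second category by default, so search the issue tracker first.</p>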
<hr>
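<h2 id="6-整合訊號--行動-mapping">6. Putting it together: signal → action mapping</h2>
<p>The lessons from this case, organized into a reusable signal table:</p>
<table>
  <thead>
      <tr>
          <th>Signal</th>
          <th>Action</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Step timed out and the last line is a happy log</td>
          <td>Compute the timestamp gap and confirm whether it is a silent hang</td>
      </tr>
      <tr>
          <td>Fixes in the same direction have failed twice</td>
          <td>Stop, go back to the data, and add no more timeout / retry</td>
      </tr>
      <tr>
          <td>Silent hang confirmed</td>
          <td>Search the upstream issue tracker with symptom words</td>
      </tr>
      <tr>
          <td>Issue found, with a workaround</td>
          <td>Apply the workaround; do not reason first</td>
      </tr>
      <tr>
          <td>No issue found</td>
          <td>Only then return to self-debugging and add verbose logging (<code>DEBUG=</code> env)</td>
      </tr>
  </tbody>
</table>
<p>The order of this table matters: the action in each row is the precondition for the next row. Skip any step and the later judgments rest on a wrong assumption.</p>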
<hr>
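<h2 id="適用範圍">Scope of applicability</h2>
<p>The principle "a silent log is the anti-signal of a happy log" applies to every non-interactive process (CI, cron jobs, background workers, container init):</p>
<ul>
<li><strong>Docker build hangs</strong> (especially RUN apt-get / npm install / pip install): the same silent hang pattern</li>
<li><strong>CI cache restore hangs</strong>: cache operations over many small files can hang silently</li>
<li><strong>Database migration hangs</strong>: a schema change combined with a long transaction can hang silently</li>
<li><strong>Any process canceled close to its timeout limit</strong>: check for a silent hang before proposing a fix</li>
</ul>
<p>The principle "the R in WRAP is a stop signal on the second failure" is not limited to CI. It applies wherever fixes in the same direction keep failing: debugging, configuration tuning, performance optimization.</p>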
<hr>
<h2 id="參考資料">References</h2>
<ul>
<li><a href="https://github.com/microsoft/playwright/issues/41000">microsoft/playwright issue #41000</a>: the upstream issue for this case (Playwright 1.57-1.59 extract-zip hang on Node 24.16.0)</li>
<li><a href="https://github.com/nodejs/node/issues/63487">nodejs/node issue #63487</a>: the upstream Node 24.16 extract-zip / yauzl regression</li>
<li>On the same blog: <a href="/blog/skills/wrap-decision/" data-link-title="WRAP 決策框架 — 認知偏誤防護與決策品質" data-link-desc="WRAP 決策框架的 blog 好讀版：用錨點確認、資料充足度、選項擴增、現實檢驗、機會成本、行前預想與絆腳索防止自動駕駛式決策。">The R stage of the WRAP decision framework in practice</a>, with detailed usage of the Reality Test</li>
</ul>
]]></content:encoded></item><item><title>Auto-debugging CI build failures with Claude Code GitHub Actions</title><link>https://tarrragon.github.io/blog/posts/%E7%94%A8-claude-code-github-actions-%E8%87%AA%E5%8B%95%E9%99%A4%E9%8C%AF-ci-%E5%BB%BA%E7%BD%AE%E5%A4%B1%E6%95%97/</link><pubDate>Wed, 04 Mar 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/posts/%E7%94%A8-claude-code-github-actions-%E8%87%AA%E5%8B%95%E9%99%A4%E9%8C%AF-ci-%E5%BB%BA%E7%BD%AE%E5%A4%B1%E6%95%97/</guid><description>&lt;h2 id="這是什麼">What this is&lt;/h2>
&lt;p>&lt;a href="https://github.com/anthropics/claude-code-action">Claude Code GitHub Actions&lt;/a> lets Claude take part directly in your GitHub workflow. Main features:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Interactive assistant&lt;/strong>: mention &lt;code>@claude&lt;/code> in a PR/Issue comment and Claude analyzes the code and replies&lt;/li>
&lt;li>&lt;strong>Automatic code review&lt;/strong>: reviews the changes automatically when a PR is opened&lt;/li>
&lt;li>&lt;strong>CI debugging and repair&lt;/strong>: analyzes the error and fixes it automatically when a build fails&lt;/li>
&lt;/ul></description><content:encoded><![CDATA[<h2 id="這是什麼">What this is</h2>
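<p><a href="https://github.com/anthropics/claude-code-action">Claude Code GitHub Actions</a> lets Claude take part directly in your GitHub workflow. Main features:</p>
<ul>
<li><strong>Interactive assistant</strong>: mention <code>@claude</code> in a PR/Issue comment and Claude analyzes the code and replies</li>
<li><strong>Automatic code review</strong>: reviews the changes automatically when a PR is opened</li>
<li><strong>CI debugging and repair</strong>: analyzes the error and fixes it automatically when a build fails</li>
</ul>
<p>For a full description of the features, see the <a href="https://code.claude.com/docs/en/github-actions">official documentation</a>.</p>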
<h2 id="設定方式">Setup</h2>
<h3 id="install-github-app推薦"><code>/install-github-app</code> (recommended)</h3>
<p>Run <code>/install-github-app</code> in the Claude Code terminal and it walks you through the whole setup.</p>
<p>Key steps in the flow:</p>
<ol>
<li><strong>Choose a repo</strong>: specify the GitHub repository to install into</li>
<li><strong>Install the Claude GitHub App</strong>: it is installed into that repo automatically, with Read &amp; Write permission on Contents, Issues, and Pull requests</li>
<li><strong>Choose an authentication method</strong>: choosing <strong>long-life token</strong> generates an OAuth token and writes it to GitHub Secrets as <code>CLAUDE_CODE_OAUTH_TOKEN</code></li>
<li><strong>Create the workflow files</strong>: two workflows are created and pushed automatically:
<ul>
<li><code>claude.yml</code>: replies to <code>@claude</code> mentions</li>
<li><code>claude-code-review.yml</code>: automatic PR code review</li>
</ul>
</li>
</ol>
<p>No further setup is needed afterwards.</p>
<h3 id="手動設定使用-anthropic-api-key">Manual setup (with an Anthropic API key)</h3>
<p>If you prefer not to use <code>/install-github-app</code>, you can do it by hand:</p>
<ol>
<li>Go to <a href="https://github.com/apps/claude">github.com/apps/claude</a> and install the App into your repo</li>
<li>In the repo's <strong>Settings → Secrets and variables → Actions</strong>, add <code>ANTHROPIC_API_KEY</code></li>
<li>Create the workflow files by hand under <code>.github/workflows/</code></li>
</ol>
<p>Differences between the two authentication methods:</p>
<table>
  <thead>
      <tr>
          <th>Authentication method</th>
          <th>Secret name</th>
          <th>Intended for</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>OAuth Token</td>
          <td><code>CLAUDE_CODE_OAUTH_TOKEN</code></td>
          <td>Pro/Max users; configured automatically by <code>/install-github-app</code></td>
      </tr>
      <tr>
          <td>API Key</td>
          <td><code>ANTHROPIC_API_KEY</code></td>
          <td>Direct use of the Anthropic API; obtain the key manually from <a href="https://console.anthropic.com">console.anthropic.com</a></td>
      </tr>
  </tbody>
</table>
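<h2 id="加入-ci-自動除錯">Adding automatic CI debugging</h2>
<p>The workflows created by <code>/install-github-app</code> only handle <code>@claude</code> interaction and code review. To <strong>trigger a Claude fix automatically when a build fails</strong>, you need to modify your existing deploy workflow.</p>
<p>First, add the permissions Claude needs (the workflow may originally have only <code>contents: read</code>):</p>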





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln">1</span><span class="cl"><span class="nt">permissions</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="w">  </span><span class="nt">contents</span><span class="p">:</span><span class="w"> </span><span class="l">write       </span><span class="w"> </span><span class="c"># Claude needs to write the fixed files</span><span class="w">
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="w">  </span><span class="nt">pull-requests</span><span class="p">:</span><span class="w"> </span><span class="l">write  </span><span class="w"> </span><span class="c"># Claude may need to open a PR</span><span class="w">
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="w">  </span><span class="nt">issues</span><span class="p">:</span><span class="w"> </span><span class="l">write         </span><span class="w"> </span><span class="c"># Claude reports its results</span><span class="w">
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="w">  </span><span class="nt">pages</span><span class="p">:</span><span class="w"> </span><span class="l">write          </span><span class="w"> </span><span class="c"># existing deploy permission</span><span class="w">
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="w">  </span><span class="nt">id-token</span><span class="p">:</span><span class="w"> </span><span class="l">write       </span><span class="w"> </span><span class="c"># existing deploy permission</span></span></span></code></pre></div><p>Then add the Claude debugging logic around the build step:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c"># Add continue-on-error and an id to the existing build step</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="w"></span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Build</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="w">  </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l">hugo-build</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="w">  </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">set -o pipefail; hugo --minify 2&gt;&amp;1 | tee hugo-build-output.txt</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="w">  </span><span class="nt">continue-on-error</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="w"></span><span class="c"># Trigger Claude debugging when the build fails</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="w"></span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Claude Debug on Build Failure</span><span class="w">
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="w">  </span><span class="nt">if</span><span class="p">:</span><span class="w"> </span><span class="l">steps.hugo-build.outcome == &#39;failure&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="w">  </span><span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">anthropics/claude-code-action@v1</span><span class="w">
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="w">  </span><span class="nt">with</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="w">    </span><span class="c"># Pick one, depending on your authentication method</span><span class="w">
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="w">    </span><span class="nt">claude_code_oauth_token</span><span class="p">:</span><span class="w"> </span><span class="l">${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}</span><span class="w">
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="w">    </span><span class="c"># anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}</span><span class="w">
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="w">    </span><span class="nt">prompt</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="sd">      Hugo build failed. The full build output is saved in the workspace:
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="sd">
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="sd">      hugo-build-output.txt (read this file; it is not inlined into the prompt)
</span></span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="sd">
</span></span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="sd">      Please analyze the error, find the problematic file(s),
</span></span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="sd">      fix the YAML front matter or content issue, and commit the fix.</span><span class="w">
</span></span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="w">    </span><span class="nt">claude_args</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;--max-turns 10&#34;</span><span class="w">
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="w"></span><span class="c"># Rebuild after the fix to verify</span><span class="w">
</span></span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="w"></span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Retry build after fix</span><span class="w">
</span></span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="w">  </span><span class="nt">if</span><span class="p">:</span><span class="w"> </span><span class="l">steps.hugo-build.outcome == &#39;failure&#39;</span><span class="w">
</span></span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="w">  </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">hugo --minify</span></span></span></code></pre></div><p>Core design:</p>
<ol>
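<li><code>continue-on-error: true</code> keeps a failed build from aborting the job, so the Claude step that follows gets a chance to run</li>
<li><code>if: steps.hugo-build.outcome == 'failure'</code> triggers Claude only on failure, so a normal build uses no API quota</li>
<li>After the fix, <code>hugo --minify</code> runs again to verify that the build now succeeds</li>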
</ol>
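<p>The Build step pipes <code>hugo</code> through <code>tee</code>, so its outcome depends on how bash reports a pipeline's status: the last command's status is used unless <code>pipefail</code> is set. GitHub's workflow documentation lists the Linux default for a <code>run</code> step without an explicit <code>shell:</code> as <code>bash -e {0}</code>, which does not enable pipefail, so it is worth making sure pipefail is on for that step. A minimal sketch of the difference, with <code>false</code> standing in for a failing build (the function names are hypothetical):</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# "false" stands in for a failing build; tee always succeeds.

# Exit status of the pipeline with default bash settings
status_default() {
  local rc=0
  bash -c 'false | tee /dev/null' || rc=$?
  echo "$rc"
}

# Exit status of the same pipeline with pipefail enabled
status_pipefail() {
  local rc=0
  bash -c 'set -o pipefail; false | tee /dev/null' || rc=$?
  echo "$rc"
}

echo "default:  $(status_default)"   # failure hidden by tee
echo "pipefail: $(status_pipefail)"  # failure propagated
</code></pre></div>
<p>The first line prints <code>0</code> and the second prints <code>1</code>. In a workflow, setting <code>shell: bash</code> on the step or starting the script with <code>set -o pipefail</code> are two ways to get the second behavior, which is what <code>steps.hugo-build.outcome == 'failure'</code> relies on.</p>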
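<h2 id="計費方式">Billing</h2>
<p>Billing depends on which authentication method you use:</p>
<table>
  <thead>
      <tr>
          <th>Authentication method</th>
          <th>Billed against</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>OAuth Token</td>
          <td><strong>Subscription quota</strong> (Pro/Max)</td>
          <td><strong>Shares one quota pool</strong> with the claude.ai web app, Claude Code CLI, and Claude Desktop</td>
      </tr>
      <tr>
          <td>API Key</td>
          <td><strong>Separate API billing</strong></td>
          <td>Pay per token used, entirely separate from the subscription quota</td>
      </tr>
  </tbody>
</table>
<p>The OAuth token quota is shared, so heavy GitHub Actions usage eats into your everyday quota on claude.ai and the CLI. If CI triggers frequently, consider switching to an API key so the two do not affect each other.</p>
<p>For detailed rates, see the <a href="https://www.anthropic.com/pricing">Claude pricing page</a>.</p>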
<h3 id="降低成本的設定">Settings that lower cost</h3>
<table>
  <thead>
      <tr>
          <th>Setting</th>
          <th>Description</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>--max-turns 10</code></td>
          <td>Caps the number of iterations to avoid an endless loop</td>
      </tr>
      <tr>
          <td>Trigger only on <code>failure</code></td>
          <td>A normal build uses no API quota</td>
      </tr>
      <tr>
          <td><code>@claude</code> trigger phrase</td>
          <td>Interactive mode starts only when explicitly invoked</td>
      </tr>
  </tbody>
</table>
<h2 id="搭配-claudemd">Pairing with CLAUDE.md</h2>
<p>Create a <code>CLAUDE.md</code> at the repo root; Claude reads it automatically as context, which improves fix accuracy.</p>
<h2 id="參考資料">References</h2>
<ul>
<li><a href="https://code.claude.com/docs/en/github-actions">Claude Code GitHub Actions 官方文件</a></li>
<li><a href="https://github.com/anthropics/claude-code-action">claude-code-action GitHub Repo</a></li>
<li><a href="https://github.com/anthropics/claude-code-action/blob/main/docs/setup.md">Setup Guide</a></li>
</ul>]]></content:encoded></item></channel></rss>