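<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Module 4: Environment Separation and Modularization on Tarragon</title><link>https://tarrragon.github.io/blog/infra/04-environment-separation/</link><description>Recent content in Module 4: Environment Separation and Modularization on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Fri, 26 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/infra/04-environment-separation/index.xml" rel="self" type="application/rss+xml"/><item><title>Environment Separation and Modularization: Directory Structure, Module Parameterization, and the Retrofit Path</title><link>https://tarrragon.github.io/blog/infra/04-environment-separation/directory-module-parameterization/</link><pubDate>Fri, 26 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/infra/04-environment-separation/directory-module-parameterization/</guid><description>&lt;p>The core job of environment separation is to keep dev's experiments, staging's verification, and prod's real traffic invisible and unreachable from one another. A project can set environment boundaries in its directory structure from the start. There, dev and prod are two independent state trees, and a mistake on one side cannot spill over to the other. Another project waits until resources have grown and traffic is live before splitting them. There, every retrofit is live-wire work on the networks and identities currently serving customers. The modules and the engineers are the same in both cases. The only difference is whether the environment boundary was designed in or patched in afterward. That difference costs almost nothing on day one, but by day one hundred it can become a quarter-long migration project.&lt;/p>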
&lt;h2 id="環境分離從第一天的目錄結構就定好">環境分離從第一天的目錄結構就定好&lt;/h2>
&lt;p>環境分離的本質是把「同一套基礎設施定義」複製成多份隔離的執行實例，每份有自己的 state、自己的雲端資源、自己的故障半徑。它承擔的責任是讓 dev 的實驗、staging 的驗證、prod 的真實流量彼此不可見也不可達 — 在 dev 跑壞一個資料庫、套錯一條 security group 規則，prod 完全無感。&lt;/p>
&lt;p>這個邊界要在第一天就用目錄結構表達出來，原因是 state 一旦混在一起就難以無痛拆開。Terraform 這類工具用 state 檔記錄「哪個資源由哪段 code 管理」，如果 dev 跟 prod 的資源都登記在同一份 state，後續想把 prod 移出去，等於要對正在服務的資源做 &lt;code>state mv&lt;/code> 或 import/remove 操作 — 任何一步算錯，工具可能判定資源該銷毀重建，而那是 prod 的資料庫。第一天就分目錄，dev 與 prod 從來不曾共用 state，這個風險根本不存在。&lt;/p>
&lt;p>檢查自己的 repo：如果現在只有一份 &lt;code>main.tf&lt;/code>、裡面同時宣告了 &lt;code>dev-db&lt;/code> 跟 &lt;code>prod-db&lt;/code>，或者 &lt;code>terraform.tfstate&lt;/code> 裡同時記錄了兩個環境的資源，這個專案已經欠下環境分離的債，債齡每天都在增加。下一步路由是先確立目錄骨架，再決定差異怎麼參數化。&lt;/p>
&lt;h2 id="目錄分離-vs-terraform-workspace-的取捨">目錄分離 vs Terraform workspace 的取捨&lt;/h2>
&lt;p>切分環境有兩條主流路徑：每個環境一個獨立目錄（各自持有 backend 與 state），或共用一份 code 用 Terraform workspace 切換不同 state。兩者都能讓 state 隔離，差別在「環境差異藏在哪裡」以及「誤操作的故障半徑多大」。&lt;/p>
&lt;h3 id="隔離強度光譜">隔離強度光譜&lt;/h3>
&lt;p>在挑這兩條路之前，先把它們放回完整的分離強度光譜。環境分離橫跨一條從帳號到 workspace、隔離由粗到細的階梯：&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
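 &lt;th>Isolation level&lt;/th>
 &lt;th>Boundary mechanism&lt;/th>
 &lt;th>When it fits&lt;/th>
 &lt;th>Initial cost&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Account-level isolation&lt;/td>
 &lt;td>A separate cloud account per environment&lt;/td>
 &lt;td>prod needs regulatory-grade separation of permissions and billing&lt;/td>
 &lt;td>High&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Separate repo&lt;/td>
 &lt;td>A separate codebase and CI pipeline per environment&lt;/td>
 &lt;td>Environments are maintained by different teams or fall under different compliance rules&lt;/td>
 &lt;td>Medium-high&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Directory separation&lt;/td>
 &lt;td>A separate directory and state per environment within one repo&lt;/td>
 &lt;td>The balance point for most early-stage teams&lt;/td>
 &lt;td>Low&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Workspace&lt;/td>
 &lt;td>One codebase; state switched at run time&lt;/td>
 &lt;td>Many environments that are highly homogeneous&lt;/td>
 &lt;td>Lowest&lt;/td>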
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
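&lt;p>Toward the coarse end of the spectrum, isolation is stronger, less is shared across environments, and initial and ongoing costs are higher. Toward the fine end, there is less duplication but the boundary is more implicit. Most early-stage teams settle on directory separation because it balances explicit boundaries against operating cost. When isolation requirements rise, move along the spectrum toward account-level isolation or separate repos. One example is prod needing regulatory-grade separation of billing and permissions. For designing permission boundaries under account-level isolation, see &lt;a href="https://tarrragon.github.io/blog/infra/02-identity-credentials/" data-link-title="模組二：身分與憑證地基 — IAM 與 OIDC" data-link-desc="IAM role / policy 設計、最小權限，以及用 OIDC 短期憑證取代長期 access key">Module 2: Identity and Credentials Foundation&lt;/a>.&lt;/p>
&lt;h3 id="目錄分離的結構">The directory-separation layout&lt;/h3>
&lt;p>Directory separation makes each environment a working directory you can enter on its own. Differences are expressed through each environment's &lt;code>terraform.tfvars&lt;/code>. Prod's backend configuration, variable values, and even provider versions are each pinned independently.&lt;/p>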





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">infra/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">├── modules/ # 可重用模組、不含任何環境專屬值
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">│ ├── network/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">│ ├── database/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">│ └── service/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">└── environments/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl"> ├── dev/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl"> │ ├── main.tf # 呼叫 modules、傳 dev 參數
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl"> │ ├── backend.tf # state 指向 dev 專屬位址
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl"> │ └── terraform.tfvars # dev 的差異值
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl"> ├── staging/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl"> │ └── ...
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl"> └── prod/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">14&lt;/span>&lt;span class="cl"> ├── main.tf
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">15&lt;/span>&lt;span class="cl"> ├── backend.tf # state 指向 prod 專屬位址
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">16&lt;/span>&lt;span class="cl"> └── terraform.tfvars # prod 的差異值&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>它的代價是目錄之間有重複的 boilerplate（&lt;code>main.tf&lt;/code> 裡的 module 呼叫在每個環境幾乎一樣），好處是邊界顯式 — &lt;code>cd&lt;/code> 進哪個目錄、apply 就只會動那個環境，prod 的 state 位址寫死在 prod 目錄的 backend 設定裡，不會因為忘記切換而打錯環境。&lt;/p></description><content:encoded><![CDATA[<p>環境分離的核心責任是讓 dev 的實驗、staging 的驗證、prod 的真實流量彼此不可見也不可達。從目錄結構就定好環境邊界的專案，dev 跟 prod 是兩棵獨立的 state 樹、改錯一邊不會波及另一邊；等資源都長出來、流量都上線了才回頭切的專案，每一次 retrofit 都在帶電作業，動到的是正在服務客戶的網路與身分。同樣一套 module、同樣的工程師，差別只在「環境邊界是設計出來的、還是事後補的」，而這個差別在第一天幾乎零成本、在第一百天可能是一個季度的遷移專案。</p>
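<h2 id="環境分離從第一天的目錄結構就定好">Set environment separation in the directory structure on day one</h2>
<p>At its core, environment separation takes one infrastructure definition and instantiates it as several isolated copies. Each copy has its own state, its own cloud resources, and its own blast radius. Its job is to keep dev's experiments, staging's verification, and prod's real traffic invisible and unreachable from one another: break a database or apply the wrong security group rule in dev, and prod never notices.</p>
<p>This boundary belongs in the directory structure on day one, because once state is mixed it is hard to split cleanly. Tools like Terraform use a state file to record which resource is managed by which piece of code. Suppose dev and prod resources are registered in the same state. Moving prod out later then means running <code>state mv</code> or import/remove operations against resources that are actively serving traffic. Get any step wrong and the tool may decide a resource must be destroyed and recreated, and that resource is the prod database. If you split directories on day one, dev and prod never share state, so this risk never arises.</p>
<p>Check your own repo. The project already carries environment-separation debt if either of these is true:</p>
<ul>
<li>There is a single <code>main.tf</code> that declares both <code>dev-db</code> and <code>prod-db</code>.</li>
<li><code>terraform.tfstate</code> records resources from both environments.</li>
</ul>
<p>That debt grows older every day. The next step is to establish the directory skeleton first, then decide how to parameterize the differences.</p>
<h2 id="目錄分離-vs-terraform-workspace-的取捨">Directory separation vs. Terraform workspaces: the trade-off</h2>
<p>There are two mainstream ways to split environments:</p>
<ul>
<li>one independent directory per environment, each with its own backend and state;</li>
<li>one shared codebase that switches between states using Terraform workspaces.</li>
</ul>
<p>Both isolate state. They differ in where environment differences live and in how large the blast radius of a mistake is.</p>
<h3 id="隔離強度光譜">The isolation-strength spectrum</h3>
<p>Before choosing between them, place both on the full spectrum of separation strength. Environment separation spans a ladder that runs from coarse-grained isolation at the account level down to fine-grained isolation with workspaces:</p>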
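<p>For illustration, the mixed-state pattern in the checklist above might look like the hypothetical single-directory <code>main.tf</code> below. Both databases live in one root module, so they share one state file and one blast radius. Identifiers and instance sizes are made up for the example. Required arguments such as storage and credentials are omitted.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># main.tf in a single directory (hypothetical anti-pattern): one root module, one state, two environments
resource "aws_db_instance" "dev_db" {
  identifier     = "dev-db"
  engine         = "postgres"
  instance_class = "db.t3.micro"
  # allocated_storage, credentials, and other required arguments omitted
}

resource "aws_db_instance" "prod_db" {
  identifier          = "prod-db"
  engine              = "postgres"
  instance_class      = "db.r6g.xlarge"
  deletion_protection = true
  # allocated_storage, credentials, and other required arguments omitted
}</code></pre></div>
<p>Every <code>terraform apply</code> in this directory plans both databases together. <code>terraform state list</code> would show both <code>aws_db_instance.dev_db</code> and <code>aws_db_instance.prod_db</code>, which is exactly the mixed state the checklist describes.</p>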
<table>
  <thead>
      <tr>
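          <th>Isolation level</th>
          <th>Boundary mechanism</th>
          <th>When it fits</th>
          <th>Initial cost</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Account-level isolation</td>
          <td>A separate cloud account per environment</td>
          <td>prod needs regulatory-grade separation of permissions and billing</td>
          <td>High</td>
      </tr>
      <tr>
          <td>Separate repo</td>
          <td>A separate codebase and CI pipeline per environment</td>
          <td>Environments are maintained by different teams or fall under different compliance rules</td>
          <td>Medium-high</td>
      </tr>
      <tr>
          <td>Directory separation</td>
          <td>A separate directory and state per environment within one repo</td>
          <td>The balance point for most early-stage teams</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Workspace</td>
          <td>One codebase; state switched at run time</td>
          <td>Many environments that are highly homogeneous</td>
          <td>Lowest</td>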
      </tr>
  </tbody>
</table>
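<p>Toward the coarse end of the spectrum, isolation is stronger, less is shared across environments, and initial and ongoing costs are higher. Toward the fine end, there is less duplication but the boundary is more implicit. Most early-stage teams settle on directory separation because it balances explicit boundaries against operating cost. When isolation requirements rise, move along the spectrum toward account-level isolation or separate repos. One example is prod needing regulatory-grade separation of billing and permissions. For designing permission boundaries under account-level isolation, see <a href="/blog/infra/02-identity-credentials/" data-link-title="模組二：身分與憑證地基 — IAM 與 OIDC" data-link-desc="IAM role / policy 設計、最小權限，以及用 OIDC 短期憑證取代長期 access key">Module 2: Identity and Credentials Foundation</a>.</p>
<h3 id="目錄分離的結構">The directory-separation layout</h3>
<p>Directory separation makes each environment a working directory you can enter on its own. Differences are expressed through each environment's <code>terraform.tfvars</code>. Prod's backend configuration, variable values, and even provider versions are each pinned independently.</p>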
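<p>Under account-level isolation, each environment directory can also pin itself to its own account. Below is a minimal sketch for the AWS provider. The prod account ID <code>111111111111</code> and the role name <code>terraform-prod</code> are hypothetical. <code>allowed_account_ids</code> and the <code>assume_role</code> block are AWS provider settings. With them in place, if the active credentials (after any role assumption) resolve to a different account, the provider fails during configuration instead of planning changes in the wrong account.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># environments/prod/providers.tf (hypothetical account ID and role name)
provider "aws" {
  region = "ap-northeast-1"

  # Refuse to run if the credentials resolve to any other account
  allowed_account_ids = ["111111111111"]

  # Operate inside the prod account through a dedicated role
  assume_role {
    role_arn     = "arn:aws:iam::111111111111:role/terraform-prod"
    session_name = "terraform-prod"
  }
}</code></pre></div>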





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">infra/
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">├── modules/                  # reusable modules; no environment-specific values
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">│   ├── network/
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">│   ├── database/
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">│   └── service/
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">└── environments/
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    ├── dev/
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">    │   ├── main.tf           # calls modules, passes dev parameters
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    │   ├── backend.tf        # state points to a dev-only location
</span></span><span class="line"><span class="ln">10</span><span class="cl">    │   └── terraform.tfvars  # dev's differing values
</span></span><span class="line"><span class="ln">11</span><span class="cl">    ├── staging/
</span></span><span class="line"><span class="ln">12</span><span class="cl">    │   └── ...
</span></span><span class="line"><span class="ln">13</span><span class="cl">    └── prod/
</span></span><span class="line"><span class="ln">14</span><span class="cl">        ├── main.tf
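</span></span><span class="line"><span class="ln">15</span><span class="cl">        ├── backend.tf        # state points to a prod-only location
</span></span><span class="line"><span class="ln">16</span><span class="cl">        └── terraform.tfvars  # prod's differing values</span></span></code></pre></div><p>The cost is duplicated boilerplate across directories: the module calls in <code>main.tf</code> are nearly identical in every environment. The benefit is an explicit boundary. Whichever directory you <code>cd</code> into is the only environment apply will touch. Prod's state location is hard-coded in the prod directory's backend configuration, so forgetting to switch cannot send a change to the wrong environment.</p>
<p>Each environment directory's <code>backend.tf</code> points to its own state path. This is the physical guarantee behind the boundary:</p>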





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># environments/prod/backend.tf
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">terraform</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="k">backend</span> <span class="s2">&#34;s3&#34;</span> {
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">    bucket</span>         <span class="o">=</span> <span class="s2">&#34;acme-tf-state&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">    key</span>            <span class="o">=</span> <span class="s2">&#34;prod/terraform.tfstate&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">    region</span>         <span class="o">=</span> <span class="s2">&#34;ap-northeast-1&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">    encrypt</span>        <span class="o">=</span> <span class="kt">true</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">    dynamodb_table</span> <span class="o">=</span> <span class="s2">&#34;acme-tf-lock&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  }
</span></span><span class="line"><span class="ln">10</span><span class="cl">}</span></span></code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># environments/dev/backend.tf
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">terraform</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="k">backend</span> <span class="s2">&#34;s3&#34;</span> {
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">    bucket</span>         <span class="o">=</span> <span class="s2">&#34;acme-tf-state&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">    key</span>            <span class="o">=</span> <span class="s2">&#34;dev/terraform.tfstate&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">    region</span>         <span class="o">=</span> <span class="s2">&#34;ap-northeast-1&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">    encrypt</span>        <span class="o">=</span> <span class="kt">true</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">    dynamodb_table</span> <span class="o">=</span> <span class="s2">&#34;acme-tf-lock&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  }
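</span></span><span class="line"><span class="ln">10</span><span class="cl">}</span></span></code></pre></div><h3 id="terragrunt-收斂-boilerplate">Consolidating boilerplate with Terragrunt</h3>
<p>Terragrunt can consolidate the boilerplate that directory separation duplicates. That is why it exists. It extracts the backend, provider, and module-call boilerplate shared across environment directories into a single template, and each environment directory keeps only its differing values. In effect, it adds a DRY layer while preserving the explicit directory boundary. It is worth adopting when you have many environments (more than three) and the shared boilerplate starts to slow maintenance down. With only two or three environments, maintaining a few directories directly usually costs less than adopting another tool and its learning curve.</p>
<h3 id="workspace-的邊界">Where workspaces draw the boundary</h3>
<p>Workspaces share one codebase and switch state at run time with <code>terraform workspace select prod</code>. The benefit is zero duplication: every environment is guaranteed to run the same code. The cost is that environment differences have to be written into the code as lookups or conditionals on <code>terraform.workspace</code>. Which workspace is currently selected is implicit local state that does not appear anywhere in the code.</p>
<p>That implicit state is exactly the source of mistakes to avoid most in the early days. You believe you are changing dev, but an earlier command switched to prod, and only after apply do you discover the blast radius was prod. No file-level signal guards against this. The currently selected workspace is recorded in the local <code>.terraform/</code> directory, where neither git diff nor code review can see it.</p>
<p>Directory separation is the early-stage recommendation. In a small team, the trade-off between blast radius and cognitive load clearly favors explicit boundaries. The team does not yet have mature CI gates to catch a mistaken apply, and explicit directories are the cheapest safeguard. Workspaces pay off when environments are numerous and highly homogeneous. An example is one isolated environment per customer, differing only in name and quota. Switch to workspaces once maintaining duplicated directories starts to cost more than the risk of the workspace's implicit state. For how to isolate each environment's state and configure its backend, see <a href="/blog/infra/01-minimal-iac/" data-link-title="模組一：最小可行 IaC — state 地基與 Console 唯讀鐵律" data-link-desc="Terraform / OpenTofu 選型、remote state 與 lock，以及「Console 只能看不能改」鐵律">Module 1: Minimum Viable IaC</a>.</p>
<h2 id="module-化同一套-code不同參數">Modularization: same code, different parameters</h2>
<p>A module packages a set of resources reused across environments into a unit with input parameters. Its job is to let dev and prod share one logical definition and diverge only in parameters. Without modules, dev and prod each maintain a copy-pasted set of resource declarations, and the two drift apart over time. Someone adds a security group rule to prod only and forgets dev. The result is "works in dev, breaks in prod", or worse, "tested in dev, behaves differently in prod".</p>
<p>The key to avoiding drift is that the only legitimate source of difference between environments is the parameters passed into the module, not code branches inside it. The module contains no checks like <code>if env == &quot;prod&quot;</code>; every environment-dependent value comes in through a <code>variable</code>:</p>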
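<p>Here is a minimal sketch of that idea. It assumes a hypothetical layout with a shared <code>environments/root.hcl</code> and one <code>terragrunt.hcl</code> per environment directory. <code>remote_state</code>, <code>include</code>, <code>find_in_parent_folders()</code>, and <code>path_relative_to_include()</code> are Terragrunt features. The bucket, lock table, and parameter values are reused from the examples in this article, and the file layout is illustrative. The backend is written once, and each state key is derived from the environment's directory name. That yields the same <code>dev/terraform.tfstate</code> and <code>prod/terraform.tfstate</code> keys as the hand-written <code>backend.tf</code> files above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># environments/root.hcl (hypothetical shared template): the backend is written once
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "acme-tf-state"
    # From environments/dev this resolves to "dev/terraform.tfstate"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "ap-northeast-1"
    encrypt        = true
    dynamodb_table = "acme-tf-lock"
  }
}

# environments/dev/terragrunt.hcl (hypothetical): only dev's differing values remain
include "root" {
  path = find_in_parent_folders("root.hcl")
}

terraform {
  source = "../../modules//database"
}

inputs = {
  service_name          = "payments"
  env                   = "dev"
  instance_class        = "db.t3.micro"
  engine_version        = "16.3"
  multi_az              = false
  backup_retention_days = 1
  deletion_protection   = false
  # subnet_group_name / security_group_ids would come from a dependency on the network unit (not shown)
}</code></pre></div>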
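<p>For comparison, here is a minimal sketch of the same database under the workspace approach. Values are illustrative and required arguments are omitted. Nothing in the file says which environment a run will hit; that depends entirely on the selected workspace.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Hypothetical workspace-based variant: one codebase, differences keyed on terraform.workspace
locals {
  # Per-workspace sizes; indexing with a workspace that has no entry fails at plan time
  instance_class_by_env = {
    dev  = "db.t3.micro"
    prod = "db.r6g.xlarge"
  }
  is_prod = terraform.workspace == "prod"
}

resource "aws_db_instance" "primary" {
  identifier          = "payments-${terraform.workspace}"
  engine              = "postgres"
  instance_class      = local.instance_class_by_env[terraform.workspace]
  multi_az            = local.is_prod
  deletion_protection = local.is_prod
  # allocated_storage, credentials, and other required arguments omitted
}</code></pre></div>
<p>The map is indexed with <code>terraform.workspace</code>. Selecting a workspace that has no entry, including the built-in <code>default</code>, therefore makes plan fail rather than silently pick a size. That is a small guard, not a substitute for an explicit directory boundary.</p>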





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># modules/database/variables.tf: the module declares only the parameters it needs
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">variable</span> <span class="s2">&#34;instance_class&#34;</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  type</span>        <span class="o">=</span> <span class="k">string</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  description</span> <span class="o">=</span> <span class="s2">&#34;RDS instance type&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="k">variable</span> <span class="s2">&#34;multi_az&#34;</span> {
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="k">bool</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">  default</span> <span class="o">=</span> <span class="kt">false</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">}
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="k">variable</span> <span class="s2">&#34;backup_retention_days&#34;</span> {
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="k">number</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="n">  default</span> <span class="o">=</span> <span class="m">7</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">}
</span></span><span class="line"><span class="ln">16</span><span class="cl">
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="k">variable</span> <span class="s2">&#34;deletion_protection&#34;</span> {
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="k">bool</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="n">  default</span> <span class="o">=</span> <span class="kt">true</span>
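</span></span><span class="line"><span class="ln">20</span><span class="cl">}
</span></span><span class="line"><span class="ln">21</span><span class="cl">
</span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="c1"># Other inputs that main.tf references; no defaults, so every environment must pass them
</span></span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="c1"></span><span class="k">variable</span> <span class="s2">&#34;service_name&#34;</span>       { <span class="n">type</span> <span class="o">=</span> <span class="k">string</span> }
</span></span><span class="line"><span class="ln">24</span><span class="cl"><span class="k">variable</span> <span class="s2">&#34;env&#34;</span>                { <span class="n">type</span> <span class="o">=</span> <span class="k">string</span> }
</span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="k">variable</span> <span class="s2">&#34;engine_version&#34;</span>     { <span class="n">type</span> <span class="o">=</span> <span class="k">string</span> }
</span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="k">variable</span> <span class="s2">&#34;subnet_group_name&#34;</span>  { <span class="n">type</span> <span class="o">=</span> <span class="k">string</span> }
</span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="k">variable</span> <span class="s2">&#34;security_group_ids&#34;</span> { <span class="n">type</span> <span class="o">=</span> <span class="k">list</span><span class="p">(</span><span class="k">string</span><span class="p">)</span> }</span></span></code></pre></div>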




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># modules/database/main.tf: the module builds resources from parameters, with no environment checks
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">resource</span> <span class="s2">&#34;aws_db_instance&#34; &#34;primary&#34;</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  identifier</span>              <span class="o">=</span> <span class="s2">&#34;${var.service_name}-${var.env}&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  engine</span>                  <span class="o">=</span> <span class="s2">&#34;postgres&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">  engine_version</span>          <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">engine_version</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">  instance_class</span>          <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">instance_class</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">  multi_az</span>                <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">multi_az</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  backup_retention_period</span> <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">backup_retention_days</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">  deletion_protection</span>     <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">deletion_protection</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  db_subnet_group_name</span>    <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">subnet_group_name</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">  vpc_security_group_ids</span>  <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">security_group_ids</span>
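</span></span><span class="line"><span class="ln">12</span><span class="cl">  <span class="c1"># allocated_storage, master credentials, and other required arguments omitted for brevity
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"></span>}</span></span></code></pre></div>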




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># environments/prod/main.tf: prod passes its own values
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">module</span> <span class="s2">&#34;database&#34;</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  source</span>                <span class="o">=</span> <span class="s2">&#34;../../modules/database&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  service_name</span>          <span class="o">=</span> <span class="s2">&#34;payments&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">  env</span>                   <span class="o">=</span> <span class="s2">&#34;prod&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">  instance_class</span>        <span class="o">=</span> <span class="s2">&#34;db.r6g.xlarge&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">  engine_version</span>        <span class="o">=</span> <span class="s2">&#34;16.3&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  multi_az</span>              <span class="o">=</span> <span class="kt">true</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">  backup_retention_days</span> <span class="o">=</span> <span class="m">30</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  deletion_protection</span>   <span class="o">=</span> <span class="kt">true</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">  subnet_group_name</span>     <span class="o">=</span> <span class="k">module</span><span class="p">.</span><span class="k">network</span><span class="p">.</span><span class="k">private_subnet_group</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">  security_group_ids</span>    <span class="o">=</span> <span class="p">[</span><span class="k">module</span><span class="p">.</span><span class="k">network</span><span class="p">.</span><span class="k">db_security_group_id</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">}</span></span></code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># environments/dev/main.tf: dev passes scaled-down values
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">module</span> <span class="s2">&#34;database&#34;</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  source</span>                <span class="o">=</span> <span class="s2">&#34;../../modules/database&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="n">  service_name</span>          <span class="o">=</span> <span class="s2">&#34;payments&#34;</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="n">  env</span>                   <span class="o">=</span> <span class="s2">&#34;dev&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="n">  instance_class</span>        <span class="o">=</span> <span class="s2">&#34;db.t3.micro&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">  engine_version</span>        <span class="o">=</span> <span class="s2">&#34;16.3&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  multi_az</span>              <span class="o">=</span> <span class="kt">false</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="n">  backup_retention_days</span> <span class="o">=</span> <span class="m">1</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="n">  deletion_protection</span>   <span class="o">=</span> <span class="kt">false</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="n">  subnet_group_name</span>     <span class="o">=</span> <span class="k">module</span><span class="p">.</span><span class="k">network</span><span class="p">.</span><span class="k">private_subnet_group</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">  security_group_ids</span>    <span class="o">=</span> <span class="p">[</span><span class="k">module</span><span class="p">.</span><span class="k">network</span><span class="p">.</span><span class="k">db_security_group_id</span><span class="p">]</span>
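</span></span><span class="line"><span class="ln">13</span><span class="cl">}</span></span></code></pre></div><p>This way, dev and prod run byte-for-byte identical module code. Every difference is concentrated in the call parameters in <code>main.tf</code>, reviewable at a glance. In review, diffing each environment's parameter block shows every environment difference. Someone changing module internals to handle one environment's special case is a sign that drift is underway. Turn the special case into a new parameter instead of adding an <code>if env == &quot;prod&quot;</code> branch inside the module. For how core services reuse modules across environments, see <a href="/blog/infra/05-core-services/" data-link-title="模組五：核心服務上 IaC" data-link-desc="資料庫、運算、儲存、load balancer 怎麼寫進基礎設施程式碼，以及上線順序">Module 5: Core Services in IaC</a>.</p>
<h2 id="環境差異參數化prod-放大dev-縮小">Parameterizing environment differences: scale prod up, scale dev down</h2>
<p>What should genuinely differ between environments is scale and level of redundancy. All of those differences are expressed as parameter values rather than as different code. Prod carries real traffic and availability commitments. It therefore spans multiple availability zones (multi-AZ), uses larger instance types, keeps backups longer, and has deletion protection on. Dev exists for iteration speed and cost control. It therefore runs in a single AZ on the smallest instance type, with short backup retention or none at all; if it breaks, rebuild it.</p>
<p>The benefit of parameterizing these differences is that every environment's topology has the same shape and only the scale differs. Dev and prod both pass through the same module logic, so prod never runs a code path that dev has never exercised. The configuration that actually ships has already had its logic validated, at reduced scale, in dev.</p>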
<table>
  <thead>
      <tr>
          <th>Parameter</th>
          <th>prod</th>
          <th>staging</th>
          <th>dev</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>instance_class</td>
          <td><code>db.r6g.xlarge</code></td>
          <td><code>db.r6g.large</code></td>
          <td><code>db.t3.micro</code></td>
      </tr>
      <tr>
          <td>multi_az</td>
          <td><code>true</code></td>
          <td><code>true</code></td>
          <td><code>false</code></td>
      </tr>
      <tr>
          <td>backup_retention_days</td>
          <td><code>30</code></td>
          <td><code>14</code></td>
          <td><code>1</code></td>
      </tr>
      <tr>
          <td>deletion_protection</td>
          <td><code>true</code></td>
          <td><code>true</code></td>
          <td><code>false</code></td>
      </tr>
      <tr>
          <td>desired_count</td>
          <td><code>6</code></td>
          <td><code>2</code></td>
          <td><code>1</code></td>
      </tr>
  </tbody>
</table>
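<p>The directory layout earlier keeps these values in each environment's <code>terraform.tfvars</code>, while the <code>main.tf</code> examples above pass them inline; both approaches work. Below is a sketch of the tfvars style for staging, filled in from the staging column of the table. The root-level variable names (<code>db_instance_class</code>, <code>db_multi_az</code>, <code>service_desired_count</code>, and so on) are hypothetical. The <code>network</code> module call and the service module that would consume <code>service_desired_count</code> are not shown.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># environments/staging/variables.tf (hypothetical root-level names)
variable "db_instance_class"     { type = string }
variable "db_multi_az"           { type = bool }
variable "backup_retention_days" { type = number }
variable "deletion_protection"   { type = bool }
variable "service_desired_count" { type = number } # consumed by a service module call, not shown

# environments/staging/terraform.tfvars: the staging column of the table
db_instance_class     = "db.r6g.large"
db_multi_az           = true
backup_retention_days = 14
deletion_protection   = true
service_desired_count = 2

# environments/staging/main.tf (excerpt): the module call only forwards values
module "database" {
  source                = "../../modules/database"
  service_name          = "payments"
  env                   = "staging"
  instance_class        = var.db_instance_class
  engine_version        = "16.3"
  multi_az              = var.db_multi_az
  backup_retention_days = var.backup_retention_days
  deletion_protection   = var.deletion_protection
  subnet_group_name     = module.network.private_subnet_group
  security_group_ids    = [module.network.db_security_group_id]
}</code></pre></div>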
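<p>A common trap is implementing cost differences as "dev simply drops a component". For example, dev skips the load balancer to save money and only prod builds one, so prod's LB configuration has never been exercised in dev. The better approach is for dev to build the same kinds of components with size and count shrunk to the minimum. That keeps the topology homogeneous and changes only the scale.</p>
<p>The limit is that a few differences cannot be expressed through scale alone. Prod needs audit logs for compliance and dev does not; prod enables a WAF and dev does not. Handle these with <code>count</code> or <code>for_each</code> driven by a boolean parameter:</p>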





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">resource</span> <span class="s2">&#34;aws_cloudwatch_log_group&#34; &#34;audit&#34;</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  count</span>             <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">enable_audit_log</span> <span class="err">?</span> <span class="m">1</span> <span class="err">:</span> <span class="m">0</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  name</span>              <span class="o">=</span> <span class="s2">&#34;/app/${var.env}/audit&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">  retention_in_days</span> <span class="o">=</span> <span class="m">365</span>
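</span></span><span class="line"><span class="ln">5</span><span class="cl">}</span></span></code></pre></div><p>This is still parameterization, not forked code; forked code is where drift begins. For how to lay out the network side of multi-AZ redundancy, see <a href="/blog/infra/03-network-foundation/" data-link-title="模組三：網路地基 — VPC 與分層" data-link-desc="VPC、public / private subnet 切分、route table、NAT、security group 設計">Module 3: Network Foundation</a>.</p>
<h2 id="retrofit-路徑把單環境拆成-per-env-module">Retrofit path: splitting a single environment into per-env modules</h2>
<p>Many projects build IAM, the VPC, and core resources in a single environment and get everything working. Only then do they realize they need environment separation. That is a common and reasonable order of evolution, especially for teams that ship in firefighting mode first and bring things under IaC management later. The retrofit restructures this single environment, implicitly prod, into a "modules + per-env calls" layout. It must do so without disrupting resources that are serving traffic, and it carries the existing resources over as the prod environment.</p>
<p>The safe order is to restructure the code first and change resource ownership second. At every step, confirm with <code>terraform plan</code> that there are zero changes.</p>
<h3 id="第一步抽-module宣告搬遷">Step 1: extract modules, declare the moves</h3>
<p>Extract the existing resource declarations into modules. Move the resources from <code>main.tf</code> into <code>modules/</code>, replace them in place with module calls, and for now hard-code every value to match what is currently deployed. Each resource's address in state changes, for example from <code>aws_db_instance.primary</code> to <code>module.database.aws_db_instance.primary</code>. Declare each move with a <code>moved {}</code> block so the tool does not read it as "destroy the old one, create a new one":</p>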





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">moved</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="n">  from</span> <span class="o">=</span> <span class="k">aws_db_instance</span><span class="p">.</span><span class="k">primary</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  to</span>   <span class="o">=</span> <span class="k">module</span><span class="p">.</span><span class="k">database</span><span class="p">.</span><span class="k">aws_db_instance</span><span class="p">.</span><span class="k">primary</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">}
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="k">moved</span> {
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="n">  from</span> <span class="o">=</span> <span class="k">aws_security_group</span><span class="p">.</span><span class="k">db</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="n">  to</span>   <span class="o">=</span> <span class="k">module</span><span class="p">.</span><span class="k">database</span><span class="p">.</span><span class="k">aws_security_group</span><span class="p">.</span><span class="k">db</span>
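</span></span><span class="line"><span class="ln">9</span><span class="cl">}</span></span></code></pre></div><p>At this point <code>plan</code> must show nothing added or destroyed: the change only reorganizes code. If the plan shows any <code>destroy</code> or <code>forces replacement</code>, stop on the prod path and find what is wrong in the <code>moved</code> blocks.</p>
<h3 id="第二步參數化">Step 2: Parameterize</h3>
<p>Replace the hard-coded values with prod parameters: move the current values into <code>environments/prod/terraform.tfvars</code> and have the module take them as inputs. <code>plan</code> must still show zero changes, because every parameter value equals the current value. Verifying this step is mechanical: every field changed from a literal to a variable reference must have an identical value in tfvars.</p>
<h3 id="第三步新增其他環境">Step 3: Add the other environments</h3>
<p>Copy prod's calling structure into <code>environments/dev/</code> and give it its own backend (separate state) and scaled-down parameter values. This step only adds files and never touches prod. First <code>apply</code> a complete scaled-down environment in dev and confirm that the module plans and applies cleanly in a new environment. Then go back to prod and confirm with a zero-change plan that the refactor had no side effects.</p>
<h3 id="風險控制">Risk control</h3>
<p>Most of the risk sits in the first two steps. The existing resources are live, and any change that makes the tool decide a resource "needs replacement" has real consequences. For an IAM role it can mean a brief gap in permissions. For a VPC it can mean subnets being rebuilt and a service outage. There are three layers of protection:</p>
<p>First, make a zero-change <code>plan</code> the acceptance criterion for every step. If it is not zero, stop and find where the <code>moved</code> blocks or parameter values differ from what is deployed. The danger signal is any <code>destroy</code> or <code>forces replacement</code> in the <code>plan</code>; on the prod path, that should almost always halt the work.</p>
<p>Second, back up the state file before the retrofit starts. If versioning is enabled on the S3 state bucket, you can roll back to an earlier version, but a local copy adds one more layer of insurance:</p>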
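<p>The <code>prevent_destroy</code> lifecycle argument lets Terraform itself enforce the "stop on destroy" rule for the most valuable resources. The sketch below is minimal and assumes the RDS instance declared in the database module, as in the <code>moved</code> example above. With the flag set, Terraform rejects with an error any plan that would destroy or replace the instance. The flag is a Terraform-side setting and does not touch the cloud resource, so adding it keeps the plan at zero changes.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># modules/database/main.tf (excerpt)
resource "aws_db_instance" "primary" {
  # ... existing attributes, unchanged ...

  lifecycle {
    # Any plan that would destroy or replace this instance fails instead.
    prevent_destroy = true
  }
}</code></pre></div>
<p>Lifecycle arguments accept only literal values, so the flag cannot be switched per environment through a variable. It applies to every environment that uses the module.</p>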





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># back up state before the retrofit</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">aws s3 cp s3://acme-tf-state/prod/terraform.tfstate <span class="se">\
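</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  ./state-backup-<span class="k">$(</span>date +%Y%m%d<span class="k">)</span>.tfstate</span></span></code></pre></div><p>Third, prefer declarative <code>moved</code> blocks, which are reviewable and reversible. Reserve manual <code>state mv</code> for moves that <code>moved</code> cannot express, such as moving a resource into a different state file. Run the whole retrofit through the PR workflow so the plan output is visible during review; see <a href="/blog/infra/07-infra-as-pr/" data-link-title="模組七：infra 走 PR 流程與自動化護欄" data-link-desc="infra 變更走 PR → plan → review diff → 合併 → apply，配 fmt / validate / tflint / checkov / tfsec 與 Atlantis 自動化，讓基礎設施可審查、可回溯、可交接">Module 7: infra through the PR workflow</a>.</p>
<p>Timeline for reference: for a typical environment of 10-20 resources, splitting a single environment into a module + per-env structure takes about 1-2 weeks. That includes plan verification at every step and rolling the change out across environments. An environment of 50 or more resources needs 3-4 weeks of staged work, with each stage covering one group of functionally related resources. The main variable in these estimates is the number of stateful resources, because each stateful resource's moved/import work needs its own plan verification and backup.</p>
<h2 id="跨分類引用">Cross-references</h2>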
<ul>
<li>→ <a href="/blog/infra/01-minimal-iac/" data-link-title="模組一：最小可行 IaC — state 地基與 Console 唯讀鐵律" data-link-desc="Terraform / OpenTofu 選型、remote state 與 lock，以及「Console 只能看不能改」鐵律">Module 1: Minimal viable IaC</a>: how each environment's state is kept separate</li>
<li>→ <a href="/blog/infra/02-identity-credentials/" data-link-title="模組二：身分與憑證地基 — IAM 與 OIDC" data-link-desc="IAM role / policy 設計、最小權限，以及用 OIDC 短期憑證取代長期 access key">Module 2: Identity and credential foundations</a>: permission boundaries for account-level isolation</li>
<li>→ <a href="/blog/infra/03-network-foundation/" data-link-title="模組三：網路地基 — VPC 與分層" data-link-desc="VPC、public / private subnet 切分、route table、NAT、security group 設計">Module 3: Network foundations</a>: the network side of cross-AZ redundancy</li>
<li>→ <a href="/blog/infra/05-core-services/" data-link-title="模組五：核心服務上 IaC" data-link-desc="資料庫、運算、儲存、load balancer 怎麼寫進基礎設施程式碼，以及上線順序">Module 5: Core services in IaC</a>: how core services reuse modules across environments</li>
<li>→ <a href="/blog/infra/07-infra-as-pr/" data-link-title="模組七：infra 走 PR 流程與自動化護欄" data-link-desc="infra 變更走 PR → plan → review diff → 合併 → apply，配 fmt / validate / tflint / checkov / tfsec 與 Atlantis 自動化，讓基礎設施可審查、可回溯、可交接">Module 7: infra through the PR workflow</a>: how the retrofit's plan output gets into review</li>
<li>→ <a href="/blog/infra/02-identity-credentials/multi-account-strategy/" data-link-title="跨帳號策略 — Organizations、SCP 與帳號工廠" data-link-desc="用 AWS Organizations 把環境拆成獨立帳號、用 SCP 設定連管理員都越不過的護欄、用帳號工廠讓每個新帳號自帶安全基線">Cross-account strategy</a>: account-level isolation is the coarsest end of the environment-separation spectrum</li>
</ul>
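]]></content:encoded></item><item><title>Single-Environment to Multi-Environment Retrofit Runbook</title><link>https://tarrragon.github.io/blog/infra/04-environment-separation/single-to-multi-env-retrofit/</link><pubDate>Fri, 26 Jun 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/infra/04-environment-separation/single-to-multi-env-retrofit/</guid><description>&lt;p>A single-environment Terraform setup runs smoothly while it has few resources and one person operating it. Its limits show once a second environment (dev or staging) is needed, or a second person starts changing infra: there is nowhere to test changes safely, and every apply acts directly on production. The goal of the retrofit is to split this setup into a "module + per-env directory" structure. Dev and prod then each hold their own state while sharing the same logic, and production resources are not affected at any point in the process.&lt;/p>
&lt;h2 id="retrofit-前的準備">Preparing for the retrofit&lt;/h2>
&lt;p>A retrofit operates on production resources that are serving traffic, so every step must confirm that plan shows zero changes before moving on. The preparation work exists to lower the risk of the operation itself.&lt;/p>
&lt;h3 id="state-備份">Back up state&lt;/h3>
&lt;p>Before starting, pull a complete backup of the state to your local machine:&lt;/p>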





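&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">terraform state pull &amp;gt; state-backup-&lt;span class="k">$(&lt;/span>date +%Y%m%d&lt;span class="k">)&lt;/span>.json&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This backup is the last-resort rollback. If the state gets corrupted during the retrofit (for example, a moved block pointed at the wrong address), run &lt;code>terraform state push state-backup-YYYYMMDD.json&lt;/code> with the file written above to return to the starting point and start over. &lt;code>state push&lt;/code> overwrites the remote state and is a dangerous operation, so use it only for rollback. By default it also refuses to push a state whose serial is lower than the remote one, which will be the case after any apply; rolling back then requires &lt;code>-force&lt;/code>.&lt;/p>
&lt;h3 id="識別-stateful-資源">Identify stateful resources&lt;/h3>
&lt;p>List every resource in state and mark which ones are stateful (RDS, S3 buckets holding data, EBS volumes):&lt;/p>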





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">terraform state list &lt;span class="p">|&lt;/span> sort&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Stateful resources carry the highest risk during a retrofit. A wrong moved block can make Terraform decide a resource must be replaced (destroyed, then recreated), and replacing a stateful resource means losing its data. At every later step, check the plan output for &lt;code>must be replaced&lt;/code> or &lt;code>forces replacement&lt;/code> on stateful resources.&lt;/p>
&lt;h3 id="確認-plan-baseline">Confirm the plan baseline&lt;/h3>
&lt;p>Before changing any code, run a plan to confirm the starting point is clean:&lt;/p>





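&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">terraform plan -detailed-exitcode&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Exit code 0 means there is nothing to change: configuration, state, and real infrastructure agree, so there is no drift. A diff at this point shows up as exit code 2 (exit code 1 means the plan itself failed). Resolve any such drift before the retrofit. Restructuring on top of existing drift buries the plan's diff signal under the drift, and you cannot tell differences caused by drift from differences caused by the retrofit.&lt;/p>
&lt;h2 id="步驟一把資源宣告抽成-module">Step 1: Extract resource declarations into modules&lt;/h2>
&lt;p>Step 1 is purely a code reorganization. Move the resource declarations in &lt;code>main.tf</code> into the &lt;code>modules/&lt;/code> directory and replace them in place with module calls. This step changes no resource attributes, no backend, and no provider; every value stays hard-coded at its current value.&lt;/p>
&lt;h3 id="目標目錄結構">Target directory structure&lt;/h3>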





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln"> 1&lt;/span>&lt;span class="cl">infra/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 2&lt;/span>&lt;span class="cl">├── modules/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 3&lt;/span>&lt;span class="cl">│ ├── network/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 4&lt;/span>&lt;span class="cl">│ │ ├── main.tf # VPC, subnets, SGs moved from the root
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 5&lt;/span>&lt;span class="cl">│ │ ├── variables.tf # all values hard-coded as defaults for now
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 6&lt;/span>&lt;span class="cl">│ │ └── outputs.tf # exposes the VPC ID, subnet IDs, etc.
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 7&lt;/span>&lt;span class="cl">│ └── database/
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 8&lt;/span>&lt;span class="cl">│ ├── main.tf # RDS moved from the root
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln"> 9&lt;/span>&lt;span class="cl">│ ├── variables.tf
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">10&lt;/span>&lt;span class="cl">│ └── outputs.tf
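&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">11&lt;/span>&lt;span class="cl">├── main.tf # now contains module calls
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">12&lt;/span>&lt;span class="cl">├── backend.tf # unchanged
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">13&lt;/span>&lt;span class="cl">└── terraform.tfvars # does not exist yet at this step&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="用-moved-block-告訴-terraform-搬家">Use moved blocks to tell Terraform about the move&lt;/h3>
&lt;p>After a resource moves from the root into a module, its internal Terraform address changes from &lt;code>aws_vpc.main&lt;/code> to &lt;code>module.network.aws_vpc.main&lt;/code>. If Terraform is not told about this mapping, it concludes that the resource at the old address should be destroyed and a new one should be created at the new address. For a VPC or an RDS instance, that means a service outage.&lt;/p></description><content:encoded><![CDATA[<p>A single-environment Terraform setup runs smoothly while it has few resources and one person operating it. Its limits show once a second environment (dev or staging) is needed, or a second person starts changing infra: there is nowhere to test changes safely, and every apply acts directly on production. The goal of the retrofit is to split this setup into a "module + per-env directory" structure. Dev and prod then each hold their own state while sharing the same logic, and production resources are not affected at any point in the process.</p>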
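<h2 id="retrofit-前的準備">Preparing for the retrofit</h2>
<p>A retrofit operates on production resources that are serving traffic, so every step must confirm that plan shows zero changes before moving on. The preparation work exists to lower the risk of the operation itself.</p>
<h3 id="state-備份">Back up state</h3>
<p>Before starting, pull a complete backup of the state to your local machine:</p>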





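<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">terraform state pull &gt; state-backup-<span class="k">$(</span>date +%Y%m%d<span class="k">)</span>.json</span></span></code></pre></div><p>This backup is the last-resort rollback. If the state gets corrupted during the retrofit (for example, a moved block pointed at the wrong address), run <code>terraform state push state-backup-YYYYMMDD.json</code> with the file written above to return to the starting point and start over. <code>state push</code> overwrites the remote state and is a dangerous operation, so use it only for rollback. By default it also refuses to push a state whose serial is lower than the remote one, which will be the case after any apply; rolling back then requires <code>-force</code>.</p>
<h3 id="識別-stateful-資源">Identify stateful resources</h3>
<p>List every resource in state and mark which ones are stateful (RDS, S3 buckets holding data, EBS volumes):</p>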





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">terraform state list <span class="p">|</span> sort</span></span></code></pre></div><p>Stateful resources carry the highest risk during a retrofit. A wrong moved block can make Terraform decide a resource must be replaced (destroyed, then recreated), and replacing a stateful resource means losing its data. At every later step, check the plan output for <code>must be replaced</code> or <code>forces replacement</code> on stateful resources.</p>
<h3 id="確認-plan-baseline">Confirm the plan baseline</h3>
<p>Before changing any code, run a plan to confirm the starting point is clean:</p>





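<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">terraform plan -detailed-exitcode</span></span></code></pre></div><p>Exit code 0 means there is nothing to change: configuration, state, and real infrastructure agree, so there is no drift. A diff at this point shows up as exit code 2 (exit code 1 means the plan itself failed). Resolve any such drift before the retrofit. Restructuring on top of existing drift buries the plan's diff signal under the drift, and you cannot tell differences caused by drift from differences caused by the retrofit.</p>
<h2 id="步驟一把資源宣告抽成-module">Step 1: Extract resource declarations into modules</h2>
<p>Step 1 is purely a code reorganization. Move the resource declarations in <code>main.tf</code> into the <code>modules/</code> directory and replace them in place with module calls. This step changes no resource attributes, no backend, and no provider; every value stays hard-coded at its current value.</p>
<h3 id="目標目錄結構">Target directory structure</h3>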





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">infra/
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">├── modules/
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">│   ├── network/
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">│   │   ├── main.tf        # VPC, subnets, SGs moved from the root
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">│   │   ├── variables.tf   # all values hard-coded as defaults for now
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">│   │   └── outputs.tf     # exposes the VPC ID, subnet IDs, etc.
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">│   └── database/
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">│       ├── main.tf        # RDS moved from the root
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">│       ├── variables.tf
</span></span><span class="line"><span class="ln">10</span><span class="cl">│       └── outputs.tf
</span></span><span class="line"><span class="ln">11</span><span class="cl">├── main.tf                # now contains module calls
</span></span><span class="line"><span class="ln">12</span><span class="cl">├── backend.tf             # unchanged
</span></span><span class="line"><span class="ln">13</span><span class="cl">└── terraform.tfvars       # does not exist yet at this step</span></span></code></pre></div><h3 id="用-moved-block-告訴-terraform-搬家">Use moved blocks to tell Terraform about the move</h3>
<p>After a resource moves from the root into a module, its internal Terraform address changes from <code>aws_vpc.main</code> to <code>module.network.aws_vpc.main</code>. If Terraform is not told about this mapping, it concludes that the resource at the old address should be destroyed and a new one should be created at the new address. For a VPC or an RDS instance, that means a service outage.</p>
<p>A <code>moved</code> block describes the move declaratively:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">moved</span> {
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="n">  from</span> <span class="o">=</span> <span class="k">aws_vpc</span><span class="p">.</span><span class="k">main</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  to</span>   <span class="o">=</span> <span class="k">module</span><span class="p">.</span><span class="k">network</span><span class="p">.</span><span class="k">aws_vpc</span><span class="p">.</span><span class="k">main</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">moved</span> {
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">  from</span> <span class="o">=</span> <span class="k">aws_subnet</span><span class="p">.</span><span class="k">public</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  to</span>   <span class="o">=</span> <span class="k">module</span><span class="p">.</span><span class="k">network</span><span class="p">.</span><span class="k">aws_subnet</span><span class="p">.</span><span class="k">public</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">}
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">moved</span> {
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">  from</span> <span class="o">=</span> <span class="k">aws_subnet</span><span class="p">.</span><span class="k">private</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">  to</span>   <span class="o">=</span> <span class="k">module</span><span class="p">.</span><span class="k">network</span><span class="p">.</span><span class="k">aws_subnet</span><span class="p">.</span><span class="k">private</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">}
</span></span><span class="line"><span class="ln">15</span><span class="cl">
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="k">moved</span> {
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="n">  from</span> <span class="o">=</span> <span class="k">aws_db_instance</span><span class="p">.</span><span class="k">primary</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="n">  to</span>   <span class="o">=</span> <span class="k">module</span><span class="p">.</span><span class="k">database</span><span class="p">.</span><span class="k">aws_db_instance</span><span class="p">.</span><span class="k">primary</span>
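</span></span><span class="line"><span class="ln">19</span><span class="cl">}</span></span></code></pre></div><p>Every resource moved into a module needs its own moved block. The blocks live in the root module, alongside the module calls, because they refer to addresses on both sides of the move. Miss one, and the plan shows that resource as destroy + create.</p>
<h3 id="zero-change-plan-驗證">Zero-change plan verification</h3>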
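<p>Moving resources into separate modules also cuts any direct reference between them. For example, a database that used to read the subnet IDs straight from <code>aws_subnet.private</code> can no longer see that resource from inside <code>modules/database</code>. The sketch below shows the rewiring. It assumes the subnets use <code>count</code>, and the output <code>private_subnet_ids</code> and input <code>subnet_ids</code> are hypothetical names chosen for illustration:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># modules/network/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  # Assumes aws_subnet.private uses count; the splat yields a list of IDs.
  value = aws_subnet.private[*].id
}

# modules/database/variables.tf (excerpt)
variable "subnet_ids" {
  type = list(string)
}

# main.tf (root): the reference now flows through a module output
module "network" {
  source = "./modules/network"
}

module "database" {
  source     = "./modules/database"
  subnet_ids = module.network.private_subnet_ids
}</code></pre></div>
<p>Inside the database module, whatever consumed the subnet IDs (typically a DB subnet group) now reads <code>var.subnet_ids</code>. That resource needs its own moved block like every other one. The IDs passed through are the same values as before, so the rewiring itself should not add anything to the plan.</p>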





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">terraform plan</span></span></code></pre></div><p>The plan output for this step must be:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Plan: 0 to add, 0 to change, 0 to destroy.</span></span></code></pre></div><p>If the plan shows any add, change, or destroy, stop and check:</p>
<ul>
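<li><code>destroy + create</code>: a moved block is missing or has a wrong address</li>
<li><code>change</code>: a resource's attributes inside the module differ from what they were before the move (an attribute was dropped, or a default value differs)</li>
<li><code>add</code> without a matching destroy: the module declares a resource that did not exist before, for example something added during the move. Outputs and data sources do not count toward the add/change/destroy totals.</li>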
</ul>
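<p>A frequent source of <code>destroy + create</code> is a resource whose instance keys change during the move, for example when a <code>count</code>-based subnet becomes <code>for_each</code> inside the module. A <code>moved</code> block for the whole resource carries each instance key across unchanged, so keys that change need one mapping per instance. The sketch below assumes two subnets and uses availability-zone names as the hypothetical new keys:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># The old root used count; the module now uses for_each keyed by AZ name.
# These replace the whole-resource moved block for aws_subnet.private;
# do not keep both.
moved {
  from = aws_subnet.private[0]
  to   = module.network.aws_subnet.private["ap-northeast-1a"]
}

moved {
  from = aws_subnet.private[1]
  to   = module.network.aws_subnet.private["ap-northeast-1c"]
}</code></pre></div>
<p>Keeping the key scheme unchanged during the retrofit avoids this case entirely. A key change is easier to review as a separate step after the zero-change plan has passed.</p>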
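<p>Apply only once the plan shows zero changes. Besides the summary line, the plan lists each relocated resource as <code>has moved to</code> its new address; those entries are expected. After apply, the resource addresses in state change from <code>aws_vpc.main</code> to <code>module.network.aws_vpc.main</code>, and the cloud resources themselves are untouched.</p>
<p>Safe pause point: after this step the code is reorganized, the state addresses are updated, and the cloud resources are unchanged. The environment is self-consistent, so you can pick up again the next day.</p>
<h2 id="步驟二把寫死的值換成參數">Step 2: Replace hard-coded values with parameters</h2>
<p>The hard-coded values inside each module become variables in <code>variables.tf</code>, and the module call reads them from <code>terraform.tfvars</code>. The plan for this step must still show zero changes, because each parameter's value is the value that was hard-coded before.</p>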





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># modules/database/variables.tf
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="k">variable</span> <span class="s2">&#34;instance_class&#34;</span> {
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="n">  type</span> <span class="o">=</span> <span class="k">string</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">}
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="k">variable</span> <span class="s2">&#34;multi_az&#34;</span> {
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="k">bool</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">  default</span> <span class="o">=</span> <span class="kt">false</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">}
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="k">variable</span> <span class="s2">&#34;backup_retention_days&#34;</span> {
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="n">  type</span>    <span class="o">=</span> <span class="k">number</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">  default</span> <span class="o">=</span> <span class="m">7</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">}</span></span></code></pre></div>




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># main.tf (root): calls the module
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="k">module</span> <span class="s2">&#34;database&#34;</span> {
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">  source</span>                <span class="o">=</span> <span class="s2">&#34;./modules/database&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">  instance_class</span>        <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">db_instance_class</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="n">  multi_az</span>              <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">db_multi_az</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="n">  backup_retention_days</span> <span class="o">=</span> <span class="k">var</span><span class="p">.</span><span class="k">db_backup_retention_days</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">}</span></span></code></pre></div>
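<p>For the <code>var.db_*</code> references in the root <code>main.tf</code> to resolve, the root module must also declare those variables. <code>terraform.tfvars</code> only supplies values; it does not declare anything. A minimal sketch of the root <code>variables.tf</code> follows. It leaves out defaults so that a missing tfvars entry cannot quietly change prod:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># variables.tf (root): values come from terraform.tfvars
variable "db_instance_class" {
  type = string
}

variable "db_multi_az" {
  type = bool
}

variable "db_backup_retention_days" {
  type = number
}</code></pre></div>
<p>With no default, a run that lacks one of these values stops instead of proceeding. It prompts interactively, or fails when run with <code>-input=false</code> as in CI. Either way, it never falls back to the module's <code>multi_az = false</code> default.</p>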




<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># terraform.tfvars: prod values
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="n">db_instance_class</span>        <span class="o">=</span> <span class="s2">&#34;db.r6g.large&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">db_multi_az</span>              <span class="o">=</span> <span class="kt">true</span>
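</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">db_backup_retention_days</span> <span class="o">=</span> <span class="m">30</span></span></span></code></pre></div><p>Run plan again and confirm zero changes. The values are now passed in as parameters instead of being hard-coded, but they are the same values, so Terraform computes no difference.</p>
<p>Safe pause point: after this step the module is parameterized and prod behaves exactly as before, so you can pick up again the next day.</p>
<h2 id="步驟三建立新環境目錄">Step 3: Create the new environment directory</h2>
<p>Once prod is confirmed stable, create a separate directory for the dev environment. This step only adds files; it changes neither prod's state nor its resources.</p>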





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">infra/
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">├── modules/           # shared (unchanged)
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">├── environments/
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">│   ├── prod/
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">│   │   ├── main.tf          # module calls moved from the old root
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">│   │   ├── backend.tf       # prod state location
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">│   │   └── terraform.tfvars # prod values
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">│   └── dev/
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">│       ├── main.tf          # copy of prod's module calls
</span></span><span class="line"><span class="ln">10</span><span class="cl">│       ├── backend.tf       # dev's separate state location
</span></span><span class="line"><span class="ln">11</span><span class="cl">│       └── terraform.tfvars # scaled-down dev values</span></span></code></pre></div><p>The <code>terraform.tfvars</code> for dev uses scaled-down sizing:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># environments/dev/terraform.tfvars
</span></span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"></span><span class="n">db_instance_class</span>        <span class="o">=</span> <span class="s2">&#34;db.t3.micro&#34;</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">db_multi_az</span>              <span class="o">=</span> <span class="kt">false</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">db_backup_retention_days</span> <span class="o">=</span> <span class="m">1</span></span></span></code></pre></div><p>The <code>backend.tf</code> for dev points at its own state path. Dev and prod state are separate from the start, so there is never a need to split them later:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"><span class="line"><span class="ln">1</span><span class="cl"><span class="k">terraform</span> {
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="k">backend</span> <span class="s2">&#34;s3&#34;</span> {
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="n">    bucket</span>         <span class="o">=</span> <span class="s2">&#34;acme-tf-state&#34;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="n">    key</span>            <span class="o">=</span> <span class="s2">&#34;dev/terraform.tfstate&#34;</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="n">    region</span>         <span class="o">=</span> <span class="s2">&#34;ap-northeast-1&#34;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="n">    encrypt</span>        <span class="o">=</span> <span class="kt">true</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="n">    dynamodb_table</span> <span class="o">=</span> <span class="s2">&#34;acme-tf-lock&#34;</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">  }
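</span></span><span class="line"><span class="ln">9</span><span class="cl">}</span></span></code></pre></div><p>If prod was previously operated from the repository root rather than from <code>environments/prod/</code>, this step also moves prod's configuration into <code>environments/prod/</code>. Moving the directory does not change any resource address, so no new moved blocks are needed. The module call names stay the same, and only <code>source</code> becomes <code>../../modules/...</code>. What matters is that <code>environments/prod/backend.tf</code> points at the same state key prod already uses. Then run <code>terraform init</code> in the new directory and confirm a zero-change plan before removing the old root files.</p>
<p>Safe pause point: this step only creates directories and files and does not affect prod's state or resources, so you can pick up again the next day.</p>
<h2 id="步驟四先在-dev-apply-驗證">Step 4: Verify with an apply in dev first</h2>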
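<p>Here is a minimal sketch of the relocated prod root. It reuses the bucket, region, and lock table from the dev example above and assumes the existing prod state lives under the key <code>prod/terraform.tfstate</code>; substitute whatever key your current <code>backend.tf</code> actually uses:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># environments/prod/backend.tf
terraform {
  backend "s3" {
    bucket         = "acme-tf-state"
    key            = "prod/terraform.tfstate" # must be the key prod already uses
    region         = "ap-northeast-1"
    encrypt        = true
    dynamodb_table = "acme-tf-lock"
  }
}

# environments/prod/main.tf (excerpt)
module "database" {
  source                = "../../modules/database" # was "./modules/database"
  instance_class        = var.db_instance_class
  multi_az              = var.db_multi_az
  backup_retention_days = var.db_backup_retention_days
}</code></pre></div>
<p>Only <code>source</code> changes. The call is still named <code>database</code>, so addresses such as <code>module.database.aws_db_instance.primary</code> stay exactly as they are in state.</p>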





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> environments/dev
</span></span><span class="line"><span class="ln">2</span><span class="cl">terraform init
</span></span><span class="line"><span class="ln">3</span><span class="cl">terraform plan
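</span></span><span class="line"><span class="ln">4</span><span class="cl">terraform apply</span></span></code></pre></div><p>Dev is a brand-new environment with brand-new state, so apply creates a full set of resources. This step verifies that the module works when building from scratch. If the dev apply succeeds and the environment is usable, the module logic is sound.</p>
<p>After the dev apply, run plan once more to confirm zero drift:</p>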





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">terraform plan -detailed-exitcode
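</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># expect exit code 0</span></span></span></code></pre></div><p>Safe pause point: dev is verified and prod is untouched, so you can pick up the final prod verification the next day.</p>
<h2 id="步驟五驗證-prod-未受影響">Step 5: Verify prod is unaffected</h2>
<p>Go back to the prod directory and run plan to confirm that none of prod's resources have changed:</p>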





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> environments/prod
</span></span><span class="line"><span class="ln">2</span><span class="cl">terraform plan -detailed-exitcode
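</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># expect exit code 0</span></span></span></code></pre></div><p>If prod's plan shows differences at this point, likely causes are:</p>
<ul>
<li><code>environments/prod/backend.tf</code> points at a different state key than the old root did, so Terraform reads an empty or wrong state and plans to create everything</li>
<li>a value in <code>terraform.tfvars</code> differs from what was hard-coded before</li>
<li>a newer provider version was installed at init, for example because <code>.terraform.lock.hcl</code> was not copied into the new directory</li>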
</ul>
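<p>The provider-upgrade case can be closed off by pinning versions in each environment's root. Also commit the <code>.terraform.lock.hcl</code> that <code>terraform init</code> writes. In the sketch below, the provider constraint is a placeholder; replace it with the version prod runs today:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># environments/prod/versions.tf (file name is a convention only)
terraform {
  # moved blocks require Terraform 1.1 or later
  required_version = "&gt;= 1.1"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~&gt; 5.0" # placeholder: match the major version prod uses now
    }
  }
}</code></pre></div>
<p>With the lock file committed, <code>terraform init</code> installs the provider version recorded there instead of the newest one that satisfies the constraint. Upgrades happen only when someone runs <code>terraform init -upgrade</code> deliberately.</p>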
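<p>Fix until the plan shows zero changes. Once this step is done, the retrofit is complete: prod and dev each hold separate state and share the same modules, and every difference between environments lives in tfvars.</p>
<h2 id="常見陷阱">Common pitfalls</h2>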
<h3 id="moved-block-vs-terraform-state-mv">moved block vs terraform state mv</h3>
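<p>Both tell Terraform that a resource has moved. A <code>moved</code> block is declarative: it lives in HCL, goes through review, and can be reverted in code. Before apply, deleting it cancels the move; after apply, a reverse moved block moves the resource back. <code>terraform state mv</code> is imperative: it edits state directly, with no review step and no built-in undo.</p>
<p>Prefer moved blocks. Reserve <code>state mv</code> for cases moved blocks cannot express: moves across state files (from one state into another), or Terraform versions older than 1.1, which introduced moved blocks.</p>
<h3 id="forces-replacement-觸發">What triggers forces replacement</h3>
<p>Some attributes of some resources cannot be changed in place (immutable attributes); changing them means rebuilding the resource. Common triggers:</p>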
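<p>Once a move has been applied, undoing it is itself a move. The sketch below reverts the VPC move from step 1. It assumes the <code>aws_vpc.main</code> resource block has been put back in the root <code>main.tf</code>, and it replaces the forward block rather than sitting beside it:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Replaces the earlier block that moved aws_vpc.main into module.network.
# Keeping both would form a cycle, which Terraform rejects.
moved {
  from = module.network.aws_vpc.main
  to   = aws_vpc.main
}</code></pre></div>
<p>As with the forward move, the plan should list the move and show no add, change, or destroy.</p>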
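<p>On Terraform 1.7 or later, a cross-state move can also be written declaratively instead of with <code>state mv</code>. A <code>removed</code> block (1.7+) drops the resource from the source state without destroying it. An <code>import</code> block (1.5+) adopts the live resource into the target state. The sketch below makes three assumptions: the source configuration declares the instance at its root as <code>aws_db_instance.primary</code>, the target configuration already declares it inside the database module, and <code>acme-prod-db</code> is a hypothetical DB identifier:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Source configuration: the resource block is deleted, and this block
# tells Terraform to forget the instance instead of destroying it.
removed {
  from = aws_db_instance.primary

  lifecycle {
    destroy = false
  }
}

# Target configuration (separate directory, separate state): the
# resource block exists here, and this adopts the live instance.
import {
  to = module.database.aws_db_instance.primary
  id = "acme-prod-db" # hypothetical DB identifier
}</code></pre></div>
<p>Applying the import side first and the removed side second means there is never a moment when neither state tracks the instance. Both plans should show only the import or the removal.</p>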
<table>
  <thead>
      <tr>
          <th>Resource</th>
          <th>Attribute</th>
          <th>Effect of changing it</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>aws_db_instance</code></td>
          <td><code>identifier</code></td>
          <td>forces replacement (data loss)</td>
      </tr>
      <tr>
          <td><code>aws_db_instance</code></td>
          <td><code>engine</code></td>
          <td>forces replacement</td>
      </tr>
      <tr>
          <td><code>aws_instance</code></td>
          <td><code>ami</code></td>
          <td>forces replacement</td>
      </tr>
      <tr>
          <td><code>aws_s3_bucket</code></td>
          <td><code>bucket</code></td>
          <td>forces replacement (bucket names cannot be changed)</td>
      </tr>
      <tr>
          <td><code>aws_vpc</code></td>
          <td><code>cidr_block</code></td>
          <td>forces replacement</td>
      </tr>
  </tbody>
</table>
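<p>If the retrofit accidentally changes one of these attributes, the plan shows <code>must be replaced</code>. A typo while parameterizing <code>identifier = &quot;mydb&quot;</code> is enough. Replacing a stateful resource means destroying it first and creating it again, which for RDS means data loss. That is why every plan needs a specific check for <code>forces replacement</code>.</p>
<h3 id="state-locking-與並行操作">State locking and concurrent operations</h3>
<p>If someone else applies during the retrofit (a CI pipeline gets triggered, or a teammate is making changes), the two sets of state operations collide. The DynamoDB lock table serializes individual commands, but each lock lasts only as long as one command. Between your plan and your apply, another apply can slip in and change state, and any run with <code>-lock=false</code> bypasses the lock entirely.</p>
<p>Operational advice: before the retrofit starts, announce an infra change freeze in the team channel, and lift it when the retrofit is done. If you use Atlantis, you can temporarily restrict apply. Timeline for reference: for an environment of 10-20 resources, steps one through five take about half a day to a day of hands-on work.</p>
<h2 id="跨分類引用">Cross-references</h2>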
<ul>
<li>→ <a href="/blog/infra/04-environment-separation/directory-module-parameterization/" data-link-title="環境分離與模組化 — 目錄結構、module 參數化與 retrofit 路徑" data-link-desc="用目錄結構在第一天就隔開 dev 與 prod 的 state，用 module 讓環境共用同一套邏輯只差參數，以及已經單環境跑起來後怎麼安全拆分">Environment separation and modularization</a>: the target structure and design principles of the retrofit</li>
<li>→ <a href="/blog/infra/01-minimal-iac/iac-tool-state-backend/" data-link-title="IaC 工具選型與 state 地基" data-link-desc="Terraform / OpenTofu / CDK / Pulumi 的選型判準，state 作為 IaC 工具對現實的唯一記憶，以及 remote state backend 的自管與託管路線">IaC tool selection and state foundations</a>: state backend configuration and the lock mechanism</li>
<li>→ <a href="/blog/infra/05-core-services/stateful-protection-dependency/" data-link-title="Stateful 資源保護與跨服務依賴表達" data-link-desc="stateful 資源的保護策略（multi-AZ、備份、刪除保護）、stateful 與 stateless 的操作差異，以及用 output 與 data source 表達服務間依賴">Module 5: Stateful resource protection</a>: the replacement risk for stateful resources</li>
<li>→ <a href="/blog/infra/07-infra-as-pr/plan-review-apply-guardrails/" data-link-title="infra 走 PR 流程與自動化護欄" data-link-desc="infra 變更走 PR → plan → review diff → 合併 → apply，配 fmt / validate / tflint / checkov / tfsec 與 Atlantis 自動化，讓基礎設施可審查、可回溯、可交接">infra through the PR workflow</a>: running every retrofit step through a PR so its plan can be reviewed</li>
</ul>
]]></content:encoded></item></channel></rss>