<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Vertical-Slice on Tarragon</title><link>https://tarrragon.github.io/blog/tags/vertical-slice/</link><description>Recent content in Vertical-Slice on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Thu, 07 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/vertical-slice/index.xml" rel="self" type="application/rss+xml"/><item><title>0.13 Operations control vertical slice: implementation entry point</title><link>https://tarrragon.github.io/blog/backend/00-service-selection/operations-control-vertical-slice/</link><pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/00-service-selection/operations-control-vertical-slice/</guid><description>&lt;p>The core responsibility of the operations control vertical slice is to land four capabilities in a single service flow: the system can be observed, verified, handled during an incident, and written back to afterward. This chapter connects &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/evidence-package/" data-link-title="Evidence Package" data-link-desc="說明觀測、驗證與事故流程如何把證據包成可交接、可回放的 artifact">evidence package&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/steady-state/" data-link-title="Steady State" data-link-desc="說明可靠性實驗與事故恢復如何定義系統應維持的可接受狀態">steady state&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/incident-decision-log/" data-link-title="Incident Decision Log" data-link-desc="說明事故期間如何保留決策、證據、owner 與回退條件">incident decision log&lt;/a>, and action item closure into the first implementable slice.&lt;/p>
&lt;h2 id="大綱">大綱&lt;/h2>
&lt;ul>
&lt;li>實作目標：選一個核心 user journey，建立最小操作控制閉環&lt;/li>
&lt;li>輸入：服務入口、核心依賴、SLO / SLI、告警、驗證場景、事故流程&lt;/li>
&lt;li>產出：evidence package、verification evidence handoff、incident decision log、write-back item&lt;/li>
&lt;li>邊界：先做 artifact 與路由，工具與語言實作留給 04 / 06 / 08 與語言教材&lt;/li>
&lt;li>驗收：能從一次異常走完 triage、verification、decision、write-back&lt;/li>
&lt;/ul>
&lt;h2 id="實作目標">實作目標&lt;/h2>
&lt;p>Vertical slice 的目標是先做一條可回放的操作控制路徑。選一個核心 user journey，例如 checkout、message delivery、document publish、login 或 invoice generation，讓這條路徑同時具備觀測證據、驗證門檻、事故決策與回寫機制。&lt;/p>
&lt;p>這一輪的交付是 artifact 與流程責任。工具可以是現有 log search、dashboard、ticket、runbook repository 與 chat；重點是資料欄位與流程責任先成立，後續才判斷是否需要 Prometheus、OpenTelemetry backend、PagerDuty、incident.io 或 chaos tooling。&lt;/p>
&lt;h2 id="選擇服務切片">選擇服務切片&lt;/h2>
&lt;p>服務切片的選擇責任是找到最能暴露 04 / 06 / 08 交接問題的路徑。第一條 slice 應該具備使用者影響、依賴邊界、可量測訊號與可驗證失敗模式。&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Candidate slice&lt;/th>
 &lt;th>Why it fits&lt;/th>
 &lt;th>Common failure modes&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Checkout&lt;/td>
 &lt;td>Ties directly to revenue and customer pain&lt;/td>
 &lt;td>payment timeout, inventory lag&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Message delivery&lt;/td>
 &lt;td>Combines a synchronous entry point with asynchronous processing&lt;/td>
 &lt;td>queue lag, redelivery loop&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Login&lt;/td>
 &lt;td>Affects every downstream feature&lt;/td>
 &lt;td>identity provider outage&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Document publish&lt;/td>
 &lt;td>Covers writes, background work, and notifications&lt;/td>
 &lt;td>stale read, worker backlog&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Invoice&lt;/td>
 &lt;td>Involves correctness and customer trust&lt;/td>
 &lt;td>duplicate charge, missing file&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
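&lt;p>Checkout suits the first round because it exposes latency, dependency failure, customer impact, and rollback decisions at the same time. If the team has no transaction path, choose message delivery or login. The criterion is that when this path fails, on-call must make a clear decision within 15 minutes.&lt;/p>
&lt;p>Message delivery suits validating async observability. It exposes the handoff quality of request id, correlation id, queue lag, DLQ, retry policy, and the replay runbook.&lt;/p>
&lt;p>Login suits validating incidents caused by external dependencies. It exposes the boundaries of the identity provider, fallback, status page, security split, and customer communication.&lt;/p>
&lt;h2 id="artifact-契約">Artifact contract&lt;/h2>
&lt;p>The responsibility of the artifact contract is to give every stage a handoff-ready output. These artifacts can start as Markdown, ticket fields, or an incident template; introduce tool automation once the process runs end to end.&lt;/p>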
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Artifact&lt;/th>
 &lt;th>Minimum fields&lt;/th>
 &lt;th>Source chapter&lt;/th>
 &lt;th>Downstream use&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Observability evidence package&lt;/td>
 &lt;td>source, time range, query link, owner, data quality, confidence, known gap&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">4.20&lt;/a>&lt;/td>
 &lt;td>triage, release gate, PIR&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Verification evidence handoff&lt;/td>
 &lt;td>hypothesis, scope, steady state, workload / fault, result, decision, owner&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/backend/06-reliability/verification-evidence-handoff/" data-link-title="6.23 Verification Evidence Handoff" data-link-desc="把 SLO、load、chaos、DR 與 readiness 結果包成 release / incident 可用證據">6.23&lt;/a>&lt;/td>
 &lt;td>release gate, runbook, drill&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Incident decision log&lt;/td>
 &lt;td>timestamp, decision, context, evidence, owner, expected effect, rollback condition&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/backend/08-incident-response/incident-decision-log/" data-link-title="8.19 Incident Decision Log" data-link-desc="把事中假設、決策、證據、回退條件與責任人留下可復盤紀錄">8.19&lt;/a>&lt;/td>
 &lt;td>handoff, stakeholder update, PIR&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Incident evidence write-back&lt;/td>
 &lt;td>finding, evidence, target artifact, owner, closure signal, review date&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/backend/08-incident-response/incident-evidence-write-back/" data-link-title="8.22 Incident Evidence Write-back" data-link-desc="把事故證據、決策與復盤結論回寫到 observability、reliability 與 runbook">8.22&lt;/a>&lt;/td>
 &lt;td>dashboard, experiment, runbook&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>The observability evidence package is the first artifact. It preserves the queries, time window, data quality, and owner so that later verification and incident processes work from the same set of facts.&lt;/p></description><content:encoded><![CDATA[<p>The core responsibility of the operations control vertical slice is to land four capabilities in a single service flow: the system can be observed, verified, handled during an incident, and written back to afterward. This chapter connects <a href="/blog/backend/knowledge-cards/evidence-package/" data-link-title="Evidence Package" data-link-desc="說明觀測、驗證與事故流程如何把證據包成可交接、可回放的 artifact">evidence package</a>, <a href="/blog/backend/knowledge-cards/steady-state/" data-link-title="Steady State" data-link-desc="說明可靠性實驗與事故恢復如何定義系統應維持的可接受狀態">steady state</a>, <a href="/blog/backend/knowledge-cards/incident-decision-log/" data-link-title="Incident Decision Log" data-link-desc="說明事故期間如何保留決策、證據、owner 與回退條件">incident decision log</a>, and action item closure into the first implementable slice.</p>
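<h2 id="大綱">Outline</h2>
<ul>
<li>Implementation goal: pick one core user journey and build a minimal operations control loop</li>
<li>Inputs: service entry point, core dependencies, SLO / SLI, alerts, verification scenarios, incident process</li>
<li>Outputs: evidence package, verification evidence handoff, incident decision log, write-back item</li>
<li>Boundary: build the artifacts and routing first; tooling and language-specific implementation are left to chapters 04 / 06 / 08 and the language materials</li>
<li>Acceptance: a single anomaly can be walked through triage, verification, decision, and write-back</li>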
</ul>
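<h2 id="實作目標">Implementation goal</h2>
<p>The goal of the vertical slice is to build one replayable operations control path first. Pick one core user journey, such as checkout, message delivery, document publish, login, or invoice generation, and give that path observability evidence, verification thresholds, incident decisions, and a write-back mechanism all at once.</p>
<p>The deliverable for this round is the artifacts and the process responsibilities. The tools can be your existing log search, dashboards, tickets, runbook repository, and chat. What matters is that the data fields and process responsibilities are established first; only afterward do you decide whether you need Prometheus, an OpenTelemetry backend, PagerDuty, incident.io, or chaos tooling.</p>
<h2 id="選擇服務切片">Choosing the service slice</h2>
<p>The responsibility of slice selection is to find the path that best exposes handoff problems between chapters 04 / 06 / 08. The first slice should have user impact, dependency boundaries, measurable signals, and verifiable failure modes.</p>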
<table>
  <thead>
      <tr>
          <th>Candidate slice</th>
          <th>Why it fits</th>
          <th>Common failure modes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Checkout</td>
          <td>Ties directly to revenue and customer pain</td>
          <td>payment timeout, inventory lag</td>
      </tr>
      <tr>
          <td>Message delivery</td>
          <td>Combines a synchronous entry point with asynchronous processing</td>
          <td>queue lag, redelivery loop</td>
      </tr>
      <tr>
          <td>Login</td>
          <td>Affects every downstream feature</td>
          <td>identity provider outage</td>
      </tr>
      <tr>
          <td>Document publish</td>
          <td>Covers writes, background work, and notifications</td>
          <td>stale read, worker backlog</td>
      </tr>
      <tr>
          <td>Invoice</td>
          <td>Involves correctness and customer trust</td>
          <td>duplicate charge, missing file</td>
      </tr>
  </tbody>
</table>
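<p>Checkout suits the first round because it exposes latency, dependency failure, customer impact, and rollback decisions at the same time. If the team has no transaction path, choose message delivery or login. The criterion is that when this path fails, on-call must make a clear decision within 15 minutes.</p>
<p>Message delivery suits validating async observability. It exposes the handoff quality of request id, correlation id, queue lag, DLQ, retry policy, and the replay runbook.</p>
<p>Login suits validating incidents caused by external dependencies. It exposes the boundaries of the identity provider, fallback, status page, security split, and customer communication.</p>
<h2 id="artifact-契約">Artifact contract</h2>
<p>The responsibility of the artifact contract is to give every stage a handoff-ready output. These artifacts can start as Markdown, ticket fields, or an incident template; introduce tool automation once the process runs end to end.</p>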
<table>
  <thead>
      <tr>
          <th>Artifact</th>
          <th>Minimum fields</th>
          <th>Source chapter</th>
          <th>Downstream use</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Observability evidence package</td>
          <td>source, time range, query link, owner, data quality, confidence, known gap</td>
          <td><a href="/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">4.20</a></td>
          <td>triage, release gate, PIR</td>
      </tr>
      <tr>
          <td>Verification evidence handoff</td>
          <td>hypothesis, scope, steady state, workload / fault, result, decision, owner</td>
          <td><a href="/blog/backend/06-reliability/verification-evidence-handoff/" data-link-title="6.23 Verification Evidence Handoff" data-link-desc="把 SLO、load、chaos、DR 與 readiness 結果包成 release / incident 可用證據">6.23</a></td>
          <td>release gate, runbook, drill</td>
      </tr>
      <tr>
          <td>Incident decision log</td>
          <td>timestamp, decision, context, evidence, owner, expected effect, rollback condition</td>
          <td><a href="/blog/backend/08-incident-response/incident-decision-log/" data-link-title="8.19 Incident Decision Log" data-link-desc="把事中假設、決策、證據、回退條件與責任人留下可復盤紀錄">8.19</a></td>
          <td>handoff, stakeholder update, PIR</td>
      </tr>
      <tr>
          <td>Incident evidence write-back</td>
          <td>finding, evidence, target artifact, owner, closure signal, review date</td>
          <td><a href="/blog/backend/08-incident-response/incident-evidence-write-back/" data-link-title="8.22 Incident Evidence Write-back" data-link-desc="把事故證據、決策與復盤結論回寫到 observability、reliability 與 runbook">8.22</a></td>
          <td>dashboard, experiment, runbook</td>
      </tr>
  </tbody>
</table>
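<p>The observability evidence package is the first artifact. It preserves the queries, time window, data quality, and owner so that later verification and incident processes work from the same set of facts.</p>
<p>The verification evidence handoff is the second artifact. It turns the result of a load test, chaos drill, DR rehearsal, or readiness review into evidence usable by the release gate and incident drills.</p>
<p>The incident decision log is the third artifact. It preserves in-incident decisions, evidence, expected effects, and rollback conditions so that shift handoffs and reviews can cite them directly.</p>
<p>The incident evidence write-back is the fourth artifact. It turns incident learnings into change items for a dashboard, alert, SLO, experiment, runbook, or automation boundary.</p>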
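<p>As a rough illustration of the contract table, the minimum fields can be written down as data and checked before a handoff. The sketch below is an assumption, not part of the referenced chapters: the snake_case keys, the <code>ARTIFACT_MIN_FIELDS</code> name, and the <code>missing_fields</code> helper are hypothetical, and only the field lists themselves come from the table.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"># Minimum fields per artifact, copied from the contract table.
# The dictionary keys and helper name are illustrative assumptions.
ARTIFACT_MIN_FIELDS = {
    "observability_evidence_package": [
        "source", "time_range", "query_link", "owner",
        "data_quality", "confidence", "known_gap",
    ],
    "verification_evidence_handoff": [
        "hypothesis", "scope", "steady_state", "workload_or_fault",
        "result", "decision", "owner",
    ],
    "incident_decision_log": [
        "timestamp", "decision", "context", "evidence", "owner",
        "expected_effect", "rollback_condition",
    ],
    "incident_evidence_write_back": [
        "finding", "evidence", "target_artifact", "owner",
        "closure_signal", "review_date",
    ],
}


def missing_fields(artifact_type: str, record: dict) -> list[str]:
    """Return the minimum fields that are absent or blank in a record."""
    missing = []
    for name in ARTIFACT_MIN_FIELDS[artifact_type]:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# A half-filled decision log entry, as it might look mid-incident.
draft = {
    "timestamp": "2026-05-07T10:15:00Z",
    "decision": "enable checkout fallback",
    "owner": "incident commander",
}
print(missing_fields("incident_decision_log", draft))
# ['context', 'evidence', 'expected_effect', 'rollback_condition']
</code></pre></div>
<p>A check like this could run by hand during a tabletop; whether it is worth automating is a decision for after the process has run end to end.</p>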
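<h2 id="實作步驟">Implementation steps</h2>
<p>The responsibility of the implementation steps is to make the slice completable in a single drill. Every step produces a checkable output, so the process does not stall at verbal agreement.</p>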
<ol>
<li>Select the service slice and core user journey.</li>
<li>Define the steady state: success rate, latency, queue lag, data correctness, customer impact.</li>
<li>Fill in the observability evidence package: dashboard, query, trace, log, audit, data quality.</li>
<li>Fill in the verification evidence handoff: the hypothesis and result of a load, chaos, DR, or rollback rehearsal.</li>
<li>Create the incident intake template: source, confidence, impact scope, evidence link, severity candidate.</li>
<li>Create the incident decision log template: decision, owner, expected effect, rollback condition.</li>
<li>Create the write-back template: finding, target artifact, closure signal, review date.</li>
<li>Run one tabletop or game day to confirm the artifacts can actually be filled in.</li>
<li>Write the gaps back to 04 readiness, 06 experiment, or 08 runbook.</li>
</ol>
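<p>Step one should avoid choosing too large a system. Choosing "checkout" is better than choosing "the entire payment platform", because the slice has to be completed within one drill.</p>
<p>Step two defines the steady state first. Without a steady state, load tests, chaos experiments, and incident recovery all lack a shared end point.</p>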
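<p>One way to make a steady state checkable is to express each threshold as data and compare observed values against it. This is a minimal sketch under assumptions: the numbers mirror the example values in this chapter's minimal template (99.9% success rate, p95 of 800 ms, 5 minutes of queue lag), while the <code>Threshold</code> class, the metric names, and the <code>violations</code> helper are hypothetical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """One steady-state condition; names and units are illustrative."""
    name: str
    limit: float
    higher_is_better: bool

    def holds(self, observed: float) -> bool:
        if self.higher_is_better:
            return observed >= self.limit
        return self.limit >= observed


# Example values taken from the checkout template in this chapter.
CHECKOUT_STEADY_STATE = [
    Threshold("success_rate_pct", 99.9, higher_is_better=True),
    Threshold("latency_p95_ms", 800.0, higher_is_better=False),
    Threshold("queue_lag_s", 300.0, higher_is_better=False),
]


def violations(observed: dict[str, float]) -> list[str]:
    """Names of thresholds that are violated or have no measurement."""
    failed = []
    for threshold in CHECKOUT_STEADY_STATE:
        value = observed.get(threshold.name)
        # A missing measurement counts as a violation, not a pass.
        if value is None or not threshold.holds(value):
            failed.append(threshold.name)
    return failed


sample = {"success_rate_pct": 99.95, "latency_p95_ms": 910.0, "queue_lag_s": 42.0}
print(violations(sample))  # ['latency_p95_ms']
</code></pre></div>
<p>Treating a missing measurement as a violation is a design choice in this sketch: it keeps a gap in the signals from being read as a healthy system.</p>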
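<p>Step three must keep the data quality limitations. If trace sampling, log drops, or metric ingest delay affect interpretation, the limitations must be handed off together with the evidence.</p>
<p>Step four turns verification results into language that downstream consumers can use. Pass, conditional, and fail must each come with scope, hypothesis, and the next routing step.</p>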
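<p>The rule for step four can be sketched as a small check that rejects a bare result. Everything named here is an assumption for illustration: the <code>next_route</code> key stands in for the next routing step, and <code>handoff_problems</code> is a hypothetical helper rather than part of any tool.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">VALID_RESULTS = {"pass", "conditional", "fail"}
# Context that must travel with every result; "next_route" is an assumed key name.
REQUIRED_CONTEXT = ("scope", "hypothesis", "next_route")


def handoff_problems(handoff: dict) -> list[str]:
    """List the reasons a verification handoff is not yet usable downstream."""
    problems = []
    if handoff.get("result") not in VALID_RESULTS:
        problems.append("result must be pass, conditional, or fail")
    for name in REQUIRED_CONTEXT:
        if not str(handoff.get(name) or "").strip():
            problems.append(f"missing {name}")
    return problems


bare = {"result": "pass"}
complete = {
    "result": "conditional",
    "scope": "staging or 10% production traffic",
    "hypothesis": "payment provider timeout triggers fallback within 2m",
    "next_route": "runbook update",
}
print(handoff_problems(bare))
# ['missing scope', 'missing hypothesis', 'missing next_route']
print(handoff_problems(complete))  # []
</code></pre></div>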
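<p>Steps five through seven should start with lightweight templates. Once the templates work end to end, move the fields into an incident tool, ticket system, or runbook platform.</p>
<p>Step eight requires an actual drill. A tabletop can validate fields and roles first; a game day then validates tools and signals.</p>
<h2 id="最小-template">Minimal template</h2>
<p>The responsibility of the minimal template is to let the first round proceed without waiting for tool adoption. The fields below can go straight into Markdown, a ticket, an incident doc, or a runbook.</p>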





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">service_slice</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">journey</span><span class="p">:</span><span class="w"> </span><span class="l">checkout</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">owner</span><span class="p">:</span><span class="w"> </span><span class="l">payments-team</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">steady_state</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">success_rate</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;&gt;= 99.9% over 30m&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">latency</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;p95 &lt;= 800ms&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">queue_lag</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;&lt;= 5m&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">customer_impact</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;failed checkout count &lt;= threshold&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">evidence_package</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">source</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;dashboard / log query / trace / audit&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">time_range</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;incident window plus baseline&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">query_link</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;stable query URL or saved query name&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">owner</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;service or platform owner&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">data_quality</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;sampling, freshness, missing fields&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">confidence</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;confirmed / suspected / weak&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">known_gap</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;missing signal or schema drift&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">verification_handoff</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">hypothesis</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;payment provider timeout triggers fallback within 2m&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">scope</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;staging or 10% production traffic&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">workload_or_fault</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;timeout injection against provider adapter&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">result</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;pass / conditional / fail&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">decision</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;release / block / follow-up / runbook update&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">owner</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;closure owner&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">incident_decision</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">timestamp</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;2026-05-07T10:15:00Z&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">decision</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;enable checkout fallback&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">context</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;provider timeout and rising failed checkout&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">evidence</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;evidence_package link&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">owner</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;incident commander or service owner&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">expected_effect</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;failed checkout drops within 10m&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">rollback_condition</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;fallback stale data exceeds threshold&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">write_back</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">finding</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;provider timeout alert lacks tenant dimension&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">target_artifact</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;dashboard / alert / experiment / runbook&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">closure_signal</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;game day triggers tenant-scoped alert within 5m&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">review_date</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;next readiness review&#34;</span></span></span></code></pre></div><p>The value of this template is that it puts all four artifacts in one document. Fill it in by hand in the first round, then split it across different tools in the second.</p>
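<h2 id="驗收門檻">Acceptance criteria</h2>
<p>The responsibility of the acceptance criteria is to judge whether the slice can already support a real incident. Completion is confirmed by whether the team can reach the same set of judgments by following the artifacts.</p>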
<table>
  <thead>
      <tr>
          <th>Acceptance item</th>
          <th>Passing signal</th>
          <th>Write-back location</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Triage</td>
          <td>On-call can use the evidence to decide whether to declare an incident</td>
          <td>8.18 intake</td>
      </tr>
      <tr>
          <td>Verification</td>
          <td>The release owner can read the handoff and make a release decision</td>
          <td>6.8 release gate</td>
      </tr>
      <tr>
          <td>Decision</td>
          <td>The IC can use the decision log for shift handoff and rollback</td>
          <td>8.19 decision log</td>
      </tr>
      <tr>
          <td>Communication</td>
          <td>Stakeholder updates can cite the same impact</td>
          <td>8.10 comms</td>
      </tr>
      <tr>
          <td>Write-back</td>
          <td>PIR action items have a target and closure</td>
          <td>8.22 write-back</td>
      </tr>
  </tbody>
</table>
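<p>Passing triage means the evidence can support declaring an incident. If on-call still has to hunt for data on the spot, go back to 4.16 readiness and the 4.20 evidence package.</p>
<p>Passing verification means the verification result can support a release decision. If the release owner only sees pass / fail, go back to the 6.23 handoff and add hypothesis, scope, and data quality.</p>
<p>Passing decision means the incident scene has shared memory. If background has to be re-asked after a shift handoff, go back to the 8.19 decision log and add context, evidence, and rollback condition.</p>
<p>Passing write-back means incident learnings have a landing place. If an action item only says "add monitoring" or "update the docs", go back to 8.22 write-back and add target artifact and closure signal.</p>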
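<p>The write-back criterion lends itself to a quick filter over PIR action items. The following is a sketch under assumptions: <code>is_closable</code> and the item dictionaries are hypothetical, and the only rule taken from the text is that an item needs both a target artifact and a closure signal.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def is_closable(item: dict) -> bool:
    """True when the item names both a target artifact and a closure signal."""
    required = ("target_artifact", "closure_signal")
    return all(str(item.get(key) or "").strip() for key in required)


# Two illustrative action items: one vague, one with a landing place.
items = [
    {"finding": "add monitoring"},
    {
        "finding": "provider timeout alert lacks tenant dimension",
        "target_artifact": "alert",
        "closure_signal": "game day triggers tenant-scoped alert within 5m",
    },
]

needs_rework = [item["finding"] for item in items if not is_closable(item)]
print(needs_rework)  # ['add monitoring']
</code></pre></div>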
<h2 id="tripwire">Tripwire</h2>
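<p>The responsibility of a tripwire is to remind the team when to return to the concept layer and close a gap. The purpose of the vertical slice is to quickly expose where the routing chain breaks, then patch the artifact and owner with the smallest fix.</p>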
<table>
  <thead>
      <tr>
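          <th>Signal</th>
          <th>Interpretation</th>
          <th>Next step</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Evidence has no identifiable owner</td>
          <td>Gap in the observability operating model</td>
          <td>Go back to 4.18 owner and review cadence</td>
      </tr>
      <tr>
          <td>Pass / fail lacks decision power</td>
          <td>Gap in the verification handoff</td>
          <td>Go back to 6.23 and add scope, hypothesis, decision</td>
      </tr>
      <tr>
          <td>IC handoff lacks shared memory</td>
          <td>Gap in the decision log</td>
          <td>Go back to 8.19 and add recent decisions, unfinished actions, and rollback conditions</td>
      </tr>
      <tr>
          <td>PIR actions cannot be closed</td>
          <td>Gap in write-back</td>
          <td>Go back to 8.22 and add closure signal and review date</td>
      </tr>
      <tr>
          <td>Template is too costly to fill in</td>
          <td>Too many fields or tool friction</td>
          <td>Cut down to the minimum fields, then run another tabletop</td>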
      </tr>
  </tbody>
</table>
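<p>When these tripwires appear, fix the artifact and process first, then consider introducing new tools. Tools can lower the cost of filling things in, but field responsibilities and owners need to be clear first.</p>
<h2 id="交接路由">Handoff routing</h2>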
<ul>
<li><a href="/blog/backend/00-service-selection/operations-control-service-selection/" data-link-title="0.12 觀測、可靠性與事故服務選型" data-link-desc="從訊號、驗證與響應三層能力判斷操作控制服務的選型順序">0.12 operations control service selection</a>: determine whether the current gap is in signals, verification, response, or loop closure.</li>
<li><a href="/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">4.20 observability evidence package</a>: build handoff-ready observability evidence.</li>
<li><a href="/blog/backend/06-reliability/steady-state-definition/" data-link-title="6.22 Steady State Definition" data-link-desc="在 chaos 與 failover 前先定義系統應維持的穩定狀態與可接受退化">6.22 steady state definition</a>: define success conditions shared by experiments and incidents.</li>
<li><a href="/blog/backend/06-reliability/verification-evidence-handoff/" data-link-title="6.23 Verification Evidence Handoff" data-link-desc="把 SLO、load、chaos、DR 與 readiness 結果包成 release / incident 可用證據">6.23 verification evidence handoff</a>: hand verification results to release and incident.</li>
<li><a href="/blog/backend/08-incident-response/incident-decision-log/" data-link-title="8.19 Incident Decision Log" data-link-desc="把事中假設、決策、證據、回退條件與責任人留下可復盤紀錄">8.19 incident decision log</a>: preserve in-incident decisions and rollback conditions.</li>
<li><a href="/blog/backend/08-incident-response/incident-evidence-write-back/" data-link-title="8.22 Incident Evidence Write-back" data-link-desc="把事故證據、決策與復盤結論回寫到 observability、reliability 與 runbook">8.22 incident evidence write-back</a>: write incident learnings back as closable improvements.</li>
</ul>
]]></content:encoded></item></channel></rss>