<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Grafana-Cloud on Tarragon</title><link>https://tarrragon.github.io/blog/tags/grafana-cloud/</link><description>Recent content in Grafana-Cloud on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/grafana-cloud/index.xml" rel="self" type="application/rss+xml"/><item><title>Self-managed Prometheus → Grafana Cloud Metrics: feature × ops × cost comparison</title><link>https://tarrragon.github.io/blog/backend/04-observability/vendors/grafana-stack/migrate-prometheus-to-cloud-metrics/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/04-observability/vendors/grafana-stack/migrate-prometheus-to-cloud-metrics/</guid><description>&lt;blockquote>
&lt;p>This article is a cross-vendor migration playbook that cross-links &lt;a href="https://tarrragon.github.io/blog/backend/04-observability/vendors/prometheus/" data-link-title="Prometheus" data-link-desc="Pull-based metrics 主流 OSS、PromQL 與 alerting">Prometheus&lt;/a> and &lt;a href="https://tarrragon.github.io/blog/backend/04-observability/vendors/grafana-stack/" data-link-title="Grafana Stack" data-link-desc="Grafana / Loki / Tempo / Mimir / Pyroscope 全棧">Grafana Stack&lt;/a> (Grafana Cloud Metrics, Mimir-backed). Running the &lt;a href="https://tarrragon.github.io/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">migration-playbook-methodology six-dimension audit&lt;/a> maps it to &lt;em>Operational = High → Type C operational redesign hybrid&lt;/em>.&lt;/p>&lt;/blockquote>
&lt;h2 id="feature--ops--cost-三維對照">Feature / ops / cost: three-dimension comparison&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Dimension&lt;/th>
 &lt;th>Self-managed Prometheus&lt;/th>
 &lt;th>Grafana Cloud Metrics&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Storage backend&lt;/td>
 &lt;td>Local disk + remote_write (optional)&lt;/td>
 &lt;td>Mimir + S3 (auto cold tier)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Retention&lt;/td>
 &lt;td>Local TSDB, 15 days by default&lt;/td>
 &lt;td>13 months by default, extendable&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>HA&lt;/td>
 &lt;td>Two Prometheus + sidecar&lt;/td>
 &lt;td>Built-in multi-AZ&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Cardinality limit&lt;/td>
 &lt;td>Self-managed limits + recording rules&lt;/td>
 &lt;td>1.5M active series per tier; quota can be scaled up&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Query API&lt;/td>
 &lt;td>PromQL + Prometheus HTTP API&lt;/td>
 &lt;td>Compatible (rare edge-case differences)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Alert&lt;/td>
 &lt;td>Alertmanager self-managed&lt;/td>
 &lt;td>Grafana Cloud Alerting&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Dashboard&lt;/td>
 &lt;td>Grafana self-managed&lt;/td>
 &lt;td>Grafana Cloud (included)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Long-term storage&lt;/td>
 &lt;td>Self-managed Thanos / Cortex / Mimir&lt;/td>
 &lt;td>Built-in (Mimir)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Cost (mid-tier)&lt;/td>
 &lt;td>$500-2000 / mo + ops FTE&lt;/td>
 &lt;td>$300-1500 / mo (billed by series)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Operational FTE&lt;/td>
 &lt;td>0.3-0.8&lt;/td>
 &lt;td>0.05-0.15&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Running the &lt;a href="https://tarrragon.github.io/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">six-dimension diff audit&lt;/a>:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Dimension&lt;/th>
 &lt;th>Level&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Schema / API&lt;/td>
 &lt;td>Low (PromQL + API compatible)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Operational&lt;/td>
 &lt;td>&lt;strong>High&lt;/strong> (HA / retention / scaling fully managed)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Paradigm&lt;/td>
 &lt;td>Low (same Prometheus metric paradigm)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Components&lt;/td>
 &lt;td>Low&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Application change&lt;/td>
 &lt;td>Low (only the remote_write endpoint changes)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Data topology&lt;/td>
 &lt;td>Low&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>Operational = High → Type C standard.&lt;/p></description><content:encoded><![CDATA[<blockquote>
<p>This article is a cross-vendor migration playbook that cross-links <a href="/blog/backend/04-observability/vendors/prometheus/" data-link-title="Prometheus" data-link-desc="Pull-based metrics 主流 OSS、PromQL 與 alerting">Prometheus</a> and <a href="/blog/backend/04-observability/vendors/grafana-stack/" data-link-title="Grafana Stack" data-link-desc="Grafana / Loki / Tempo / Mimir / Pyroscope 全棧">Grafana Stack</a> (Grafana Cloud Metrics, Mimir-backed). Running the <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">migration-playbook-methodology six-dimension audit</a> maps it to <em>Operational = High → Type C operational redesign hybrid</em>.</p></blockquote>
<h2 id="feature--ops--cost-三維對照">Feature / ops / cost: three-dimension comparison</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Self-managed Prometheus</th>
          <th>Grafana Cloud Metrics</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Storage backend</td>
          <td>Local disk + remote_write (optional)</td>
          <td>Mimir + S3 (auto cold tier)</td>
      </tr>
      <tr>
          <td>Retention</td>
          <td>Local TSDB, 15 days by default</td>
          <td>13 months by default, extendable</td>
      </tr>
      <tr>
          <td>HA</td>
          <td>Two Prometheus + sidecar</td>
          <td>Built-in multi-AZ</td>
      </tr>
      <tr>
          <td>Cardinality limit</td>
          <td>Self-managed limits + recording rules</td>
          <td>1.5M active series per tier; quota can be scaled up</td>
      </tr>
      <tr>
          <td>Query API</td>
          <td>PromQL + Prometheus HTTP API</td>
          <td>Compatible (rare edge-case differences, see Case 3)</td>
      </tr>
      <tr>
          <td>Alert</td>
          <td>Alertmanager self-managed</td>
          <td>Grafana Cloud Alerting</td>
      </tr>
      <tr>
          <td>Dashboard</td>
          <td>Grafana self-managed</td>
          <td>Grafana Cloud (included)</td>
      </tr>
      <tr>
          <td>Long-term storage</td>
          <td>Self-managed Thanos / Cortex / Mimir</td>
          <td>Built-in (Mimir)</td>
      </tr>
      <tr>
          <td>Cost (mid-tier)</td>
          <td>$500-2000 / mo + ops FTE</td>
          <td>$300-1500 / mo (billed by series)</td>
      </tr>
      <tr>
          <td>Operational FTE</td>
          <td>0.3-0.8</td>
          <td>0.05-0.15</td>
      </tr>
  </tbody>
</table>
<p>Running the <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">six-dimension diff audit</a>:</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Level</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Schema / API</td>
          <td>Low (PromQL + API compatible)</td>
      </tr>
      <tr>
          <td>Operational</td>
          <td><strong>High</strong> (HA / retention / scaling fully managed)</td>
      </tr>
      <tr>
          <td>Paradigm</td>
          <td>Low (same Prometheus metric paradigm)</td>
      </tr>
      <tr>
          <td>Components</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Application change</td>
          <td>Low (only the remote_write endpoint changes)</td>
      </tr>
      <tr>
          <td>Data topology</td>
          <td>Low</td>
      </tr>
  </tbody>
</table>
<p>Operational = High → Type C standard.</p>
<h2 id="為什麼遷retention--ops--vendor-consolidation-三條-driver">Why migrate: three drivers (retention / ops / vendor consolidation)</h2>
<table>
  <thead>
      <tr>
          <th>Driver</th>
          <th>Trigger</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Retention</td>
          <td>Local Prometheus TSDB retention defaults to 15 days; long-term retention requires self-managing Thanos / Cortex / Mimir</td>
      </tr>
      <tr>
          <td>Ops FTE</td>
          <td>Self-managing Prometheus + Alertmanager + Grafana adds up to 0.5-1 FTE</td>
      </tr>
      <tr>
          <td>Vendor consolidation</td>
          <td>Already on Grafana Cloud for logs / traces; adding metrics unifies the stack</td>
      </tr>
  </tbody>
</table>
<h2 id="operational-redesign">Operational redesign</h2>
<table>
  <thead>
      <tr>
          <th>Concept</th>
          <th>Self-managed</th>
          <th>Grafana Cloud Metrics</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Cluster bootstrap</td>
          <td>Helm chart + manual config</td>
          <td>One-click setup in the UI</td>
      </tr>
      <tr>
          <td>HA</td>
          <td>Two Prometheus replicas</td>
          <td>Built-in multi-AZ Mimir</td>
      </tr>
      <tr>
          <td>Long-term retention</td>
          <td>Self-managed Thanos / Cortex / Mimir</td>
          <td>Built-in (S3-backed)</td>
      </tr>
      <tr>
          <td>Cardinality control</td>
          <td>Manual recording rule + relabel</td>
          <td>Adaptive Metrics + cardinality limit</td>
      </tr>
      <tr>
          <td>Alerting</td>
          <td>Self-managed Alertmanager</td>
          <td>Grafana Cloud Alerting (integrated)</td>
      </tr>
      <tr>
          <td>Dashboard</td>
          <td>Grafana self-host</td>
          <td>Grafana Cloud (included in the free tier)</td>
      </tr>
  </tbody>
</table>
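<p>On the HA row: when both replicas of a self-managed pair keep <code>remote_write</code> enabled, the managed side receives every sample twice unless the replicas identify themselves. A minimal sketch, assuming the Mimir HA-tracker default label names (<code>cluster</code> and <code>__replica__</code>) apply to your Grafana Cloud stack; the label values are placeholders, not taken from the original setup.</p>

```yaml
# prometheus.yml on replica 1 (replica 2 differs only in __replica__)
global:
  external_labels:
    cluster: prod-us          # placeholder: identical on both replicas
    __replica__: replica-1    # placeholder: unique per replica, e.g. replica-2
```

<p>With these labels the backend can accept samples from one elected replica per cluster and discard the other copy. It is worth confirming in the active series count that the pair is billed once, not twice.</p>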
<h2 id="migration-4-phase">Migration 4-phase</h2>
<h3 id="phase-0audit">Phase 0: Audit</h3>
<ul>
<li>List every Prometheus job / scrape config</li>
<li>Count active series (the billing basis for the Mimir tier)</li>
<li>Estimate retention requirements</li>
</ul>
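<p>The series count in the audit can be approximated from the existing Prometheus itself. A minimal sketch as a rule file; the group and rule names are hypothetical, and head series only approximates what Grafana Cloud bills as active series.</p>

```yaml
# rules/migration_audit.yml (hypothetical file name)
groups:
  - name: migration_audit
    interval: 5m
    rules:
      # Series currently held in the TSDB head block of this Prometheus
      - record: audit:head_series:sum
        expr: sum(prometheus_tsdb_head_series)
      # Series per scrape job, to see which jobs dominate the total.
      # This selector touches every series, so keep the interval coarse.
      - record: audit:series:count_by_job
        expr: count by (job) ({__name__=~".+"})
```

<p>Graphing <code>audit:series:count_by_job</code> over a week or two gives a per-job baseline to compare against the tier quota before any data leaves the cluster.</p>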
<h3 id="phase-1grafana-cloud-setup">Phase 1: Grafana Cloud setup</h3>
<ul>
<li>Set up the account and organization</li>
<li>Create an API key for <code>remote_write</code></li>
<li>Enable the Grafana Cloud Mimir endpoint</li>
</ul>
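<p>One way to keep the API key out of <code>prometheus.yml</code> is the <code>password_file</code> option of <code>basic_auth</code>. A minimal sketch; the file path is an assumption, and the URL and instance ID are the same placeholders used in Phase 2.</p>

```yaml
# prometheus.yml (fragment)
remote_write:
  - url: https://prometheus-prod-XX-prod-us-central-0.grafana.net/api/prom/push
    basic_auth:
      username: <INSTANCE_ID>
      # Assumed path: the file contains only the key, e.g. mounted from a secret store
      password_file: /etc/prometheus/secrets/grafana-cloud-api-key
```

<p>This keeps the config file safe to commit and lets the key be rotated by replacing the mounted file.</p>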
<h3 id="phase-2dual-write">Phase 2: Dual-write</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># prometheus.yml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">remote_write</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">url</span><span class="p">:</span><span class="w"> </span><span class="l">https://prometheus-prod-XX-prod-us-central-0.grafana.net/api/prom/push</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">basic_auth</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">username</span><span class="p">:</span><span class="w"> </span><span class="l">&lt;INSTANCE_ID&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">password</span><span class="p">:</span><span class="w"> </span><span class="l">&lt;API_KEY&gt;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">write_relabel_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="c"># Optional: drop high-cardinality metrics before sending</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">source_labels</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">__name__]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">regex</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;high_card_metric_.*&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">action</span><span class="p">:</span><span class="w"> </span><span class="l">drop</span></span></span></code></pre></div><p>Run for 4-8 weeks and confirm that query results match and cost stays within expectations.</p>
<h3 id="phase-3cutover">Phase 3: Cutover</h3>
<ul>
<li>Switch dashboards / alerts to the Grafana Cloud endpoint</li>
<li>Stop the application layer and the self-managed Grafana instance from querying the self-managed Prometheus</li>
</ul>
<h3 id="phase-4cleanup">Phase 4: Cleanup</h3>
<ul>
<li>Stop scraping on the self-managed Prometheus</li>
<li>Keep 1-2 months of historical query capability (via an archived snapshot)</li>
<li>Decommission</li>
</ul>
<h2 id="production-故障演練">Production failure drills</h2>
<h3 id="case-1cardinality-爆cost-暴漲">Case 1: Cardinality explosion, cost spike</h3>
<p><strong>Symptom</strong>: In week 2 of dual-write, Grafana Cloud series grow from the estimated 100K to 800K, and cost rises 8x.</p>
<p><strong>Root cause</strong>: Application-level high-cardinality labels (user_id / request_id) were not dropped and were scraped in.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>Drop unbounded labels with <code>write_relabel_configs</code></li>
<li>Redesign application metrics as fixed-bucket histograms instead of using unbounded labels</li>
<li>Set a Mimir cardinality limit as a guard, plus an alert</li>
</ol>
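<p>The first fix can be sketched with the <code>labeldrop</code> relabel action, which matches label names rather than metric names. This is a minimal sketch that assumes the offending labels are exactly <code>user_id</code> and <code>request_id</code>, as named in the root cause above.</p>

```yaml
# prometheus.yml (fragment)
remote_write:
  - url: https://prometheus-prod-XX-prod-us-central-0.grafana.net/api/prom/push
    write_relabel_configs:
      # Remove the unbounded labels from every series before it is sent.
      # The regex is anchored and is matched against label NAMES.
      - action: labeldrop
        regex: user_id|request_id
```

<p>Dropping a label is only safe when the remaining labels still identify each series uniquely. If several series collapse into the same label set, the remote side may reject them as duplicate samples; in that case dropping the whole metric, or aggregating it in a recording rule first, is the more predictable option.</p>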
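<h3 id="case-2recording-rule-對應失效">Case 2: Recording rules missing on the target</h3>
<p><strong>Symptom</strong>: After cutover, some Grafana dashboard panels are empty; they depend on a Prometheus-side recording rule (<code>job:request_count:rate5m</code>) that has no counterpart in Grafana Cloud.</p>
<p><strong>Root cause</strong>: Recording rules are evaluated <em>server-side</em> on the self-managed Prometheus. remote_write forwards only the series they produce, and only while that instance keeps running; the rule definitions are not transferred, so Grafana Cloud needs its own recording rules.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>Export all recording rules and import them into Grafana Cloud Mimir</li>
<li>Or switch to <em>raw queries</em> + Grafana query templates, with no dependency on recording rules</li>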
</ol>
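<p>For the first fix, the unit to export is a standard Prometheus rule file. A minimal sketch for the rule named in the symptom; the group name, the underlying metric <code>request_count</code>, and the expression are assumptions, so copy the real definition from the existing rule files.</p>

```yaml
# rules/request_rates.yml (hypothetical file name)
groups:
  - name: request_rates
    interval: 1m
    rules:
      - record: job:request_count:rate5m
        # Assumed definition: per-job request rate over a 5-minute window
        expr: sum by (job) (rate(request_count[5m]))
```

<p>Keeping rule files like this in version control means the same definitions can be loaded into the managed ruler and reviewed next to the dashboards that depend on them.</p>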
<h3 id="case-3promql-微差行為">Case 3: Subtle PromQL behavior differences</h3>
<p><strong>Symptom</strong>: Some queries that work fine on self-managed Prometheus return slightly different results after switching to Grafana Cloud Mimir.</p>
<p><strong>Root cause</strong>: Mimir behaves slightly differently from Prometheus in some edge cases (empty result handling / staleness marker timing); most queries match, and &lt; 1% of queries are affected.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>Validate with dual queries before cutover, comparing critical dashboards</li>
<li>Rewrite affected queries with more robust PromQL patterns</li>
<li>Document a known-incompatibility list</li>
</ol>
<h3 id="case-4alert-routing-改變">Case 4: Alert routing changes</h3>
<p><strong>Symptom</strong>: After cutover, PagerDuty / Slack receive no alerts; the Alertmanager-side webhooks were never switched over.</p>
<p><strong>Root cause</strong>: Alert logic moves from self-managed Alertmanager to Grafana Cloud Alerting, so routing / contact configuration must be rebuilt from scratch.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>Rebuild alerts + routing on the Grafana Cloud side before cutover</li>
<li>Run both alert pipelines for 1-2 weeks and confirm that alerts fire through Grafana Cloud</li>
<li>Switch routing at cutover and run one SOC drill</li>
</ol>
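<p>Rebuilding routing starts from an inventory of what the self-managed Alertmanager does today. A minimal sketch of such a routing tree, using standard Alertmanager configuration keys; the receiver names, channel, and credentials are placeholders rather than values from the original setup.</p>

```yaml
# alertmanager.yml (self-managed side: the routing to re-create)
route:
  receiver: slack-default            # fallback for anything not matched below
  group_by: [alertname, job]
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall
receivers:
  - name: slack-default
    slack_configs:
      - api_url: <SLACK_WEBHOOK_URL>
        channel: '#alerts'
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: <PAGERDUTY_ROUTING_KEY>
```

<p>Each <code>routes</code> entry and each receiver needs a counterpart on the Grafana Cloud side. Walking this file top to bottom during the dual-pipeline period is a simple way to check that no destination was missed.</p>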
<h3 id="case-5歷史資料查不到">Case 5: Historical data unavailable</h3>
<p><strong>Symptom</strong>: After cutover, the SOC wants to query an event from 6 months ago, but Grafana Cloud holds only 2 months of data (since dual-write began).</p>
<p><strong>Root cause</strong>: Grafana Cloud only has data from the start of dual-write; earlier historical data in the self-managed Prometheus was never backfilled.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li>During Phase 2, use <code>promtool tsdb dump</code> + <code>mimirtool</code> to load the self-managed historical data into Mimir</li>
<li>Or keep the self-managed Prometheus read-only for 6 months (for historical queries)</li>
<li>Long-term: retention counts from cutover; historical data is a <em>one-time backfill</em></li>
</ol>
<h2 id="capacity--cost">Capacity / cost</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Self-managed</th>
          <th>Grafana Cloud Metrics</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Compute (100 host, 100K series)</td>
          <td>$500-1000 / mo + ops</td>
          <td>$300-800 / mo</td>
      </tr>
      <tr>
          <td>Operational FTE</td>
          <td>0.3-0.8 = $3K-8K</td>
          <td>0.05-0.15 = $500-1500</td>
      </tr>
      <tr>
          <td>Long-term retention</td>
          <td>Self-managed Thanos / Cortex / Mimir</td>
          <td>Built-in, 13 months</td>
      </tr>
      <tr>
          <td>Total (mid-tier)</td>
          <td>$4K-9K / mo (incl. FTE)</td>
          <td>$1K-2.5K / mo</td>
      </tr>
      <tr>
          <td>Migration cost</td>
          <td>-</td>
          <td>1-2 FTE × 1-2 months</td>
      </tr>
  </tbody>
</table>
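<p>The migration-cost row can be turned into a rough payback estimate. The worked arithmetic below is a sketch that uses only the ranges in the table, plus the FTE rate implied by the "0.3-0.8 = $3K-8K" row (about $10K per FTE-month), which is an inference rather than a stated figure.</p>

```latex
\begin{align*}
\text{FTE rate} &= \frac{\$3\text{K}}{0.3\ \text{FTE}} = \$10\text{K per FTE-month} \\
\text{migration cost} &= (1\text{ to }2\ \text{FTE}) \times (1\text{ to }2\ \text{months}) \times \$10\text{K}
  = \$10\text{K to }\$40\text{K} \\
\text{monthly saving} &= (\$4\text{K to }\$9\text{K}) - (\$1\text{K to }\$2.5\text{K})
  \;\Rightarrow\; \$1.5\text{K (worst) to }\$8\text{K (best)} \\
\text{payback} &= \frac{\$10\text{K}}{\$8\text{K/mo}} \approx 1.3\ \text{months (best)},
  \qquad \frac{\$40\text{K}}{\$1.5\text{K/mo}} \approx 27\ \text{months (worst)}
\end{align*}
```

<p>The spread is wide because both the saving and the migration effort are ranges; plugging in the Phase 0 audit numbers narrows it considerably.</p>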
<h2 id="整合--下一步">Integration / next steps</h2>
<h3 id="跟-datadog--grafana-stack-migration-對位">Comparison with <a href="/blog/backend/04-observability/vendors/datadog/migrate-to-grafana-stack/" data-link-title="Datadog → Grafana Stack：把 $50K/month bill 拆解到 self-hosted observability" data-link-desc="Datadog 五層計費（host APM / metric / log ingest / log retention / RUM）拆解、對位 Grafana Stack（Mimir / Loki / Tempo / Grafana / Alloy）的 5 層責任；OTel-based agent migration、5 個 production 踩雷（cardinality 爆 / log volume cost / dashboard 不直接轉 / alert routing 換邏輯 / SLO definition 差異）、cost reality check">Datadog → Grafana Stack migration</a></h3>
<p>Two Grafana Stack routes:</p>
<ul>
<li>Self-host (Mimir + Loki + Tempo) on K8s: open source, self-managed</li>
<li>Grafana Cloud: SaaS, operational simplification</li>
</ul>
<p>This article covers "self-managed Prometheus → Grafana Cloud" and is complementary; running it in two stages (self-host → Cloud) ends up much like "Datadog → Grafana Cloud".</p>
<h3 id="跟-opentelemetry-整合">Integration with OpenTelemetry</h3>
<p>The OTel Collector can ship to Mimir (metrics) + Loki (logs) + Tempo (traces) at the same time; adopting OTel during this migration avoids repeating the work at the next vendor switch.</p>
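<p>The metrics leg of such a Collector can be sketched as follows. This assumes the contrib distribution of the OTel Collector (for the <code>prometheusremotewrite</code> exporter and the <code>basicauth</code> extension) and reuses the placeholder endpoint and credentials from Phase 2; logs and traces would get their own pipelines, which are omitted here.</p>

```yaml
# otel-collector.yaml (metrics pipeline only)
extensions:
  basicauth/grafana_cloud:
    client_auth:
      username: <INSTANCE_ID>
      password: <API_KEY>
receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  prometheusremotewrite:
    endpoint: https://prometheus-prod-XX-prod-us-central-0.grafana.net/api/prom/push
    auth:
      authenticator: basicauth/grafana_cloud
service:
  extensions: [basicauth/grafana_cloud]
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
```

<p>Because applications only speak OTLP to the Collector, a later backend change becomes an exporter swap in this file instead of a change to every service.</p>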
<h2 id="相關連結">Related links</h2>
<ul>
<li>Source vendor: <a href="/blog/backend/04-observability/vendors/prometheus/" data-link-title="Prometheus" data-link-desc="Pull-based metrics 主流 OSS、PromQL 與 alerting">Prometheus</a></li>
<li>Target vendor: <a href="/blog/backend/04-observability/vendors/grafana-stack/" data-link-title="Grafana Stack" data-link-desc="Grafana / Loki / Tempo / Mimir / Pyroscope 全棧">Grafana Stack</a></li>
<li>Parallel migration playbooks (Type C): <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">PostgreSQL → Aurora</a> / <a href="/blog/backend/03-message-queue/vendors/kafka/migrate-to-msk/" data-link-title="Self-managed Kafka → AWS MSK：把 $15K/month operational cost 拆解到 managed" data-link-desc="Kafka self-managed → MSK 是 Type C operational redesign — protocol 完全相容、operational stack（ZooKeeper / brokers / monitoring / patching）全託管；本文用 cost 拆解開頭、5 個 production 踩雷（client connection pattern / version pinning / metric pipeline / IAM auth / cross-cluster mirror）">Kafka → MSK</a> / <a href="/blog/backend/04-observability/vendors/elastic-stack/migrate-to-elastic-cloud/" data-link-title="Self-managed ELK → Elastic Cloud：5 年 ELK 集群的 lifecycle 收尾" data-link-desc="Self-managed ELK Stack → Elastic Cloud 是 Type C operational redesign — protocol drop-in、operational stack（cluster sizing / shard 治理 / upgrade / backup）全託管；本文按 5 年 ELK lifecycle (build → scale → degrade → save → migrate) 組織、5 個 production 踩雷">ELK → Elastic Cloud</a></li>
<li>Parallel Type D comparison: <a href="/blog/backend/04-observability/vendors/datadog/migrate-to-grafana-stack/" data-link-title="Datadog → Grafana Stack：把 $50K/month bill 拆解到 self-hosted observability" data-link-desc="Datadog 五層計費（host APM / metric / log ingest / log retention / RUM）拆解、對位 Grafana Stack（Mimir / Loki / Tempo / Grafana / Alloy）的 5 層責任；OTel-based agent migration、5 個 production 踩雷（cardinality 爆 / log volume cost / dashboard 不直接轉 / alert routing 換邏輯 / SLO definition 差異）、cost reality check">Datadog → Grafana Stack</a></li>
<li>Methodology: <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a></li>
</ul>
]]></content:encoded></item></channel></rss>