<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Atlas on Tarragon</title><link>https://tarrragon.github.io/blog/tags/atlas/</link><description>Recent content in Atlas on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 19 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/atlas/index.xml" rel="self" type="application/rss+xml"/><item><title>MongoDB → Atlas：Atlas 不是 MongoDB + managed、是另一個 product</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/migrate-to-atlas/</link><pubDate>Tue, 19 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/migrate-to-atlas/</guid><description>&lt;blockquote>
&lt;p>This article is a cross-vendor &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/migration/" data-link-title="Migration" data-link-desc="說明系統如何把資料、流量或結構從舊狀態移到新狀態">migration&lt;/a> playbook, cross-linked to &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB&lt;/a> and MongoDB Atlas. It is a worked example of the standard form of Type C (operational redesign hybrid) from the &lt;a href="https://tarrragon.github.io/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology&lt;/a>. Each stage transition is guarded by a &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/migration-gate/" data-link-title="Migration Gate" data-link-desc="說明遷移流程何時可以進入下一階段或正式切換">migration gate&lt;/a>: the validation conditions between phases are the gates.&lt;/p>&lt;/blockquote>
&lt;h2 id="atlas-不是-mongodb--managed是另一個-product">Atlas is not MongoDB + managed; it is a different product&lt;/h2>
&lt;p>The framing "MongoDB Atlas is the managed version of MongoDB" sounds reasonable but is misleading. The data plane really is compatible:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Protocol compatible&lt;/strong>: the MongoDB wire protocol is the same, drivers need no changes, and &lt;code>mongosh&lt;/code> connects exactly as it does to a self-managed deployment&lt;/li>
&lt;li>&lt;strong>Same storage&lt;/strong>: the WiredTiger storage engine and the document model are the same&lt;/li>
&lt;li>&lt;strong>Same API&lt;/strong>: the aggregation framework, indexing, and change streams all behave the same&lt;/li>
&lt;/ul>
&lt;p>But the &lt;em>operational surface is completely different&lt;/em>:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Operational concept&lt;/th>
 &lt;th>Self-managed MongoDB&lt;/th>
 &lt;th>Atlas&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Cluster bootstrap&lt;/td>
 &lt;td>mongod + replica set config + config servers + shards, all set up by hand&lt;/td>
 &lt;td>One-click cluster creation via UI / API, fully automated&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>HA&lt;/td>
 &lt;td>Self-managed replica set + arbiter + priority&lt;/td>
 &lt;td>Automatic cross-AZ replicas + automatic failover&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Backup&lt;/td>
 &lt;td>Self-managed mongodump + S3 archive&lt;/td>
 &lt;td>Built-in cloud backup + PITR (configured per region)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Network access&lt;/td>
 &lt;td>Self-managed VPC + security group + IP allowlist&lt;/td>
 &lt;td>Atlas private endpoint / VPC peering / IP access list&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Authentication&lt;/td>
 &lt;td>Self-managed mongod internal users / x.509&lt;/td>
 &lt;td>Atlas Database User + LDAP / SSO / AWS IAM integration&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Monitoring&lt;/td>
 &lt;td>Self-deployed Prometheus + Grafana&lt;/td>
 &lt;td>Built-in Atlas Performance Advisor + APM&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Sizing&lt;/td>
 &lt;td>Manual instance class + scale&lt;/td>
 &lt;td>Auto-tier scaling + tier-based pricing&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Patching&lt;/td>
 &lt;td>Manual + outage window&lt;/td>
 &lt;td>Automatic (configurable maintenance window)&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
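&lt;p>The bulk of the migration work is not in the &lt;em>data layer&lt;/em>, which the protocol-level drop-in already covers; it is in &lt;em>replacing the entire operational stack&lt;/em>: SRE runbooks, monitoring dashboards, access control, IAM integration, and cost estimates all have to be redone. The "Atlas is managed MongoDB" framing underestimates that operational workload.&lt;/p></description><content:encoded><![CDATA[<blockquote>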
<p>This article is a cross-vendor <a href="/blog/backend/knowledge-cards/migration/" data-link-title="Migration" data-link-desc="說明系統如何把資料、流量或結構從舊狀態移到新狀態">migration</a> playbook, cross-linked to <a href="/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB</a> and MongoDB Atlas. It is a worked example of the standard form of Type C (operational redesign hybrid) from the <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a>. Each stage transition is guarded by a <a href="/blog/backend/knowledge-cards/migration-gate/" data-link-title="Migration Gate" data-link-desc="說明遷移流程何時可以進入下一階段或正式切換">migration gate</a>: the validation conditions between phases are the gates.</p></blockquote>
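<h2 id="atlas-不是-mongodb--managed是另一個-product">Atlas is not MongoDB + managed; it is a different product</h2>
<p>The framing "MongoDB Atlas is the managed version of MongoDB" sounds reasonable but is misleading. The data plane really is compatible:</p>
<ul>
<li><strong>Protocol compatible</strong>: the MongoDB wire protocol is the same, drivers need no changes, and <code>mongosh</code> connects exactly as it does to a self-managed deployment</li>
<li><strong>Same storage</strong>: the WiredTiger storage engine and the document model are the same</li>
<li><strong>Same API</strong>: the aggregation framework, indexing, and change streams all behave the same</li>
</ul>
<p>But the <em>operational surface is completely different</em>:</p>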
<table>
  <thead>
      <tr>
          <th>Operational concept</th>
          <th>Self-managed MongoDB</th>
          <th>Atlas</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Cluster bootstrap</td>
          <td>mongod + replica set config + config servers + shards, all set up by hand</td>
          <td>One-click cluster creation via UI / API, fully automated</td>
      </tr>
      <tr>
          <td>HA</td>
          <td>Self-managed replica set + arbiter + priority</td>
          <td>Automatic cross-AZ replicas + automatic failover</td>
      </tr>
      <tr>
          <td>Backup</td>
          <td>Self-managed mongodump + S3 archive</td>
          <td>Built-in cloud backup + PITR (configured per region)</td>
      </tr>
      <tr>
          <td>Network access</td>
          <td>Self-managed VPC + security group + IP allowlist</td>
          <td>Atlas private endpoint / VPC peering / IP access list</td>
      </tr>
      <tr>
          <td>Authentication</td>
          <td>Self-managed mongod internal users / x.509</td>
          <td>Atlas Database User + LDAP / SSO / AWS IAM integration</td>
      </tr>
      <tr>
          <td>Monitoring</td>
          <td>Self-deployed Prometheus + Grafana</td>
          <td>Built-in Atlas Performance Advisor + APM</td>
      </tr>
      <tr>
          <td>Sizing</td>
          <td>Manual instance class + scale</td>
          <td>Auto-tier scaling + tier-based pricing</td>
      </tr>
      <tr>
          <td>Patching</td>
          <td>Manual + outage window</td>
          <td>Automatic (configurable maintenance window)</td>
      </tr>
  </tbody>
</table>
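<p>The bulk of the migration work is not in the <em>data layer</em>, which the protocol-level drop-in already covers; it is in <em>replacing the entire operational stack</em>: SRE runbooks, monitoring dashboards, access control, IAM integration, and cost estimates all have to be redone. The "Atlas is managed MongoDB" framing underestimates that operational workload.</p>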
<p>Running the <a href="/blog/report/content-structure-by-max-diff-dimension/" data-link-title="Process content 結構由最大差異維度決定、不是 universal phased" data-link-desc="跨 X process content（migration / upgrade / rollout / playbook）的結構由 source / target 之間 *差異維度組合* 決定、不存在 universal phased 模板；6 種 migration / process type 實證（schema 差 / drop-in / operational / multi-tool / paradigm / topology re-layout）跑出 6 種不同結構；寫作前必須做 *6 維 diff dimension audit* 才能決定結構、跳過會套錯模板">diff dimension audit</a> gives:</p>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Assessment</th>
          <th>Level</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Schema / API</td>
          <td>MongoDB protocol / API fully compatible</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Operational model</td>
          <td>HA / backup / monitoring / IAM / network all replaced</td>
          <td><strong>High</strong></td>
      </tr>
      <tr>
          <td>Abstraction / paradigm</td>
          <td>Both are document databases</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Number of components</td>
          <td>Still a single cluster</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Application change</td>
          <td>Connection string / IAM integration change; application logic unchanged</td>
          <td>Low/Medium</td>
      </tr>
  </tbody>
</table>
<p>The dominant dimension is Operational = High, while Schema and Paradigm are both Low, which maps to <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Type C operational redesign hybrid</a>.</p>
<h2 id="結構4-phase-operational--drop-in-cutover">Structure: 4-phase operational + drop-in cutover</h2>
<p>The structure mirrors <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">PostgreSQL → Aurora</a> (also Type C):</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text">Phase 0: Pre-migration audit (1-2 weeks)
  - Workload sizing (IOPS / connections / storage)
  - Application connection pattern audit
  - Compliance requirement audit

Phase 1: Operational infrastructure preparation (2-3 weeks)
  - Create the Atlas cluster
  - VPC peering / private endpoint
  - IAM role + Atlas Database User
  - Monitoring + alerts
  - Backup retention settings

Phase 2: Data migration (duration depends on dataset size)
  - mongomirror / Atlas Live Migration tool
  - or mongodump → mongorestore (small databases)

Phase 3: Cutover and verification

Phase 4: Cleanup (decommission self-managed)</code></pre></div><p>End to end this takes 4-12 weeks, depending on dataset size and on how complex the organization's processes are.</p>
<h2 id="phase-0pre-migration-audit">Phase 0: Pre-migration audit</h2>
<h3 id="workload-sizing--atlas-tier">Workload sizing → Atlas tier</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text">Self-managed observations:
- Peak IOPS: 8000
- P99 read latency: 5ms
- Connection count peak: 1500
- Storage: 800GB
- Cross-region replication needed: yes

Atlas tier mapping:
- M40 (8 vCPU, 16GB RAM): IOPS 3000, not enough
- M60 (16 vCPU, 64GB RAM): IOPS 6000, borderline
- M80 (32 vCPU, 128GB RAM): IOPS 9000, safe (chosen)
- Storage: 1TB tier (covers 800GB + 25% buffer)
- Cross-region replication add-on</code></pre></div><p>Atlas does not offer <em>free-form instance classes</em>; it offers <em>fixed tiers</em>. When a workload straddles a tier boundary, pick the <em>next tier up</em> rather than pushing the lower tier to its limit.</p>
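<p>The "next tier up" rule can be sketched as a small lookup. This is a minimal sketch: the tier table reuses this article's working IOPS figures rather than official Atlas specifications, and <code>pickTier</code> and its 10% default headroom are hypothetical choices made for illustration.</p>

```javascript
// Illustrative tier table copied from the sizing notes above; the IOPS
// figures are this article's working numbers, not official Atlas specs.
const TIERS = [
  { name: 'M40', iops: 3000 },
  { name: 'M60', iops: 6000 },
  { name: 'M80', iops: 9000 },
];

// Hypothetical helper: return the smallest tier whose IOPS covers the
// observed peak plus a headroom fraction, or null if no listed tier does.
function pickTier(peakIops, headroom = 0.1) {
  const required = peakIops * (1 + headroom);
  const match = TIERS.find((tier) => tier.iops >= required);
  return match ? match.name : null;
}

// Peak IOPS taken from the self-managed observations above.
console.log(pickTier(8000));
```

<p>A peak that sits just under a tier's ceiling is pushed to the next tier by the headroom term, which is the behaviour the rule asks for.</p>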
<h3 id="connection-pattern-audit">Connection pattern audit</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">// Application connection pool config
const { MongoClient } = require(&#39;mongodb&#39;);

// Connection string supplied by the environment (assumed variable name)
const uri = process.env.MONGODB_URI;

const client = new MongoClient(uri, {
  maxPoolSize: 100,     // per-process cap; counts toward the tier-specific connection limit on the Atlas side
  minPoolSize: 10,
  maxIdleTimeMS: 60000,
});</code></pre></div><p>Each Atlas tier caps concurrent connections (this article works with roughly 1500 for M40 and 3000 for M80; confirm the current figure for the chosen tier in the Atlas documentation). Many application instances connecting to the same cluster can hit that limit. Compute total connections = <code>pod_count × maxPoolSize</code> up front and compare the result with the tier limit.</p>
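<p>The budget check above is plain arithmetic, so it can run as part of the audit. A minimal sketch, assuming the tier limit is passed in by the caller; <code>connectionBudget</code> and the pod count of 20 are hypothetical.</p>

```javascript
// Hypothetical helper: worst-case connections opened by the whole fleet,
// compared against a tier limit that the caller looks up for its tier.
function connectionBudget(podCount, maxPoolSize, tierLimit) {
  const total = podCount * maxPoolSize;
  return { total, tierLimit, fits: tierLimit >= total };
}

// 20 pods is an assumed fleet size; maxPoolSize 100 matches the config above.
console.log(connectionBudget(20, 100, 3000));
```

<p>The worst case is used on purpose: each driver pool can grow to <code>maxPoolSize</code> under load, which is exactly when the limit matters.</p>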
<h3 id="compliance-audit">Compliance audit</h3>
<ul>
<li><strong>Data residency</strong>: whether the Atlas deployment region satisfies GDPR and customer contracts</li>
<li><strong>Encryption at rest</strong>: enabled by default in Atlas, but <em>the encryption keys are Atlas-managed</em>; strict compliance regimes call for CMK / BYOK</li>
<li><strong>Audit log</strong>: Atlas provides audit logs that can be exported to S3 / Splunk</li>
</ul>
<h2 id="phase-1operational-infrastructure-準備">Phase 1: Operational infrastructure preparation</h2>
<h3 id="atlas-cluster-配置">Atlas cluster configuration</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-hcl" data-lang="hcl"># Uses the Terraform mongodbatlas provider
resource &#34;mongodbatlas_cluster&#34; &#34;production&#34; {
  project_id   = var.project_id
  name         = &#34;production-cluster&#34;
  cluster_type = &#34;REPLICASET&#34;

  provider_name               = &#34;AWS&#34;
  provider_region_name        = &#34;US_EAST_1&#34;
  provider_instance_size_name = &#34;M80&#34;

  # cloud_backup enables Cloud Backup; backup_enabled refers to the
  # deprecated Legacy Backup, and pit_enabled requires cloud_backup = true.
  cloud_backup           = true
  pit_enabled            = true # PITR
  mongo_db_major_version = &#34;7.0&#34;

  advanced_configuration {
    javascript_enabled           = false
    minimum_enabled_tls_protocol = &#34;TLS1_2&#34;
    no_table_scan                = false
    oplog_size_mb                = 51200
  }
}

# Backup retention
resource &#34;mongodbatlas_cloud_backup_schedule&#34; &#34;production&#34; {
  project_id   = var.project_id
  cluster_name = mongodbatlas_cluster.production.name

  reference_hour_of_day    = 3
  reference_minute_of_hour = 0
  restore_window_days      = 7

  policy_item_daily {
    frequency_interval = 1
    retention_unit     = &#34;days&#34;
    retention_value    = 7
  }
}</code></pre></div><h3 id="vpc-peering--private-endpoint">VPC peering / private endpoint</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text">Pattern A: VPC Peering
  AWS VPC &lt;──peering──&gt; Atlas project VPC
  - Works across regions; route tables must be aligned
  - Suits medium / large workloads with a stable network topology

Pattern B: Private Endpoint (Atlas private link)
  AWS VPC ──private link──&gt; Atlas
  - No route table changes needed
  - Suits complex multi-account / multi-region setups
  - Slightly higher cost</code></pre></div><p>For production, default to Private Endpoint: it is simpler to set up and integrates well with IAM.</p>
<h3 id="atlas-database-user-跟-iam-整合">Atlas Database User and IAM integration</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text">Pattern A: Traditional username / password
  - Create a Database User; the application connects with SCRAM-SHA-256
  - Suits legacy applications

Pattern B: AWS IAM authentication (recommended)
  - Atlas Database User type: &#34;AWS IAM&#34;
  - Application uses an AWS IAM role + the MongoDB driver (MONGODB-AWS mechanism)
  - Tokens rotate every 15 minutes; the application is responsible for refresh</code></pre></div><p>Schedule the IAM authentication migration inside the cutover timeline rather than retrofitting it afterwards.</p>
<h2 id="phase-2data-migration">Phase 2: Data migration</h2>
<h3 id="atlas-live-migration-tool小到中型">Atlas Live Migration tool (small to medium)</h3>
<p>The Atlas UI ships a built-in Live Migration tool:</p>
<ol>
<li>Provide the source cluster URI (self-managed MongoDB)</li>
<li>Select the Atlas target cluster</li>
<li>The tool runs a full sync followed by oplog tailing automatically</li>
<li>Perform the final cutover inside the cutover window</li>
</ol>
<p>Datasets under 100GB are straightforward; 100GB-1TB needs batching and a deliberate collection ordering.</p>
<h3 id="mongomirror大型">mongomirror (large)</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># mongomirror: source → Atlas (host names are placeholders)
mongomirror \
  --host source-replicaset/host1:27017,host2:27017 \
  --destination atlas-cluster-host:27017 \
  --destinationUsername admin \
  --destinationPassword &#34;$ATLAS_PASSWORD&#34; \
  --ssl</code></pre></div><p>mongomirror runs in two stages:</p>
<ol>
<li>Initial sync (full dump + restore)</li>
<li>Oplog tailing (continuous CDC)</li>
</ol>
<p>During cutover the application switches its connection string while mongomirror keeps streaming to drain the remaining oplog entries.</p>
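<p>Before the connection string is switched, the target should hold the same data as the source. One way to check is to compare per-collection document counts. The sketch below assumes the counts were already gathered on each side (for example with the Node.js driver's <code>countDocuments()</code>); <code>diffCounts</code> and the sample collection names are hypothetical.</p>

```javascript
// Hypothetical helper: compare per-collection document counts gathered from
// the source and from the Atlas target, returning the collections that differ.
// A collection missing on one side is treated as having a count of 0.
function diffCounts(sourceCounts, targetCounts) {
  const names = new Set([
    ...Object.keys(sourceCounts),
    ...Object.keys(targetCounts),
  ]);
  const mismatches = [];
  for (const name of names) {
    const source = sourceCounts[name] ?? 0;
    const target = targetCounts[name] ?? 0;
    if (source !== target) mismatches.push({ name, source, target });
  }
  return mismatches;
}

// Sample counts are made up for illustration.
console.log(diffCounts({ orders: 10, users: 5 }, { orders: 10, users: 4 }));
```

<p>Matching counts are a necessary signal, not proof of equality, so a few sample queries on the Atlas side are still worth running alongside this check.</p>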
<h2 id="phase-3cutover--verification">Phase 3: Cutover + verification</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text">1. Put the application into maintenance mode (block writes)
2. Wait for mongomirror to catch up (oplog gap → 0)
3. Verify collection counts + sample queries on the Atlas side
4. Switch the application connection string to Atlas
5. Lift maintenance mode, monitor for 24-48 hours
6. Keep self-managed mongo as a read-only standby for 1-2 weeks</code></pre></div><h2 id="production-故障演練">Production failure drills</h2>
<h3 id="case-1atlas-tier-connection-limit-撞牆">Case 1: Hitting the Atlas tier connection limit</h3>
<p><strong>Symptom</strong>: after cutover, the application sees a flood of <code>Connection refused</code> errors at peak traffic and Atlas reports that the connection limit has been reached; this never happened on self-managed.</p>
<p><strong>Root cause</strong>: the M80 tier connection limit is ~3000, while the application runs 100 pods × maxPoolSize=50 = 5000 connections, which exceeds the limit.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Pre-migration calculation</strong>: compare total connections against the Atlas tier limit and move to the next tier up if it is exceeded</li>
<li><strong>Lower maxPoolSize</strong>: 100 pods × 30 = 3000 lands exactly on the cap, so bursts can still hit it</li>
<li><strong>Add a connection proxy</strong>: put a connection pooler between the application and Atlas (e.g. a sharded-cluster mongos or a ProxySQL-style proxy)</li>
</ol>
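<p>Because a pool size that lands exactly on the cap leaves no room for bursts, it helps to derive the per-pod cap from the limit with a reserve held back. A minimal sketch; <code>maxPoolSizeFor</code> and the 20% reserve are hypothetical choices, not an Atlas recommendation.</p>

```javascript
// Hypothetical helper: largest per-pod maxPoolSize that keeps the fleet under
// the tier limit while holding back a percentage of connections as reserve
// (for bursts, monitoring agents, and ad-hoc admin sessions).
function maxPoolSizeFor(tierLimit, podCount, reservePercent = 20) {
  // Integer arithmetic avoids floating-point surprises in the percentage step.
  const usable = Math.floor((tierLimit * (100 - reservePercent)) / 100);
  return Math.floor(usable / podCount);
}

// Numbers from the case above: ~3000 connection limit, 100 pods.
console.log(maxPoolSizeFor(3000, 100));
```

<p>If the resulting pool size is too small for the workload, that is the signal to move up a tier or put a pooling layer in front rather than to shrink the reserve.</p>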
<h3 id="case-2ip-whitelist-漏-application-vpccutover-後完全連不上">Case 2: IP access list misses an application VPC, nothing connects after cutover</h3>
<p><strong>Symptom</strong>: after cutover the application immediately reports <code>connection timeout</code> and the Atlas dashboard shows zero traffic; it takes an hour of troubleshooting to discover that the IP access list is missing one application VPC CIDR.</p>
<p><strong>Root cause</strong>: the Atlas IP access list denies everything by default, so every application VPC must be added explicitly; the Phase 1 setup overlooked one VPC (for example the staging account in a multi-account organization).</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Pre-cutover connection test</strong>: run a sample MongoDB connection from every application VPC and confirm it succeeds</li>
<li><strong>Switch to Private Endpoint</strong>: stop relying on the IP access list and let PrivateLink handle routing</li>
<li><strong>Backup access</strong>: keep a bastion host with an allowlisted IP so that a direct connection is possible during an incident</li>
</ol>
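<p>A static check can catch a missing CIDR before any connection test runs. The sketch below assumes the application VPC CIDRs and the access-list entries are available as IPv4 CIDR strings; <code>uncoveredCidrs</code> and the sample ranges are hypothetical, and the check complements a real connection test from each VPC rather than replacing it.</p>

```javascript
// Convert a dotted IPv4 address to an unsigned integer.
function ipToInt(ip) {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Parse "a.b.c.d/n" into the first and last address of the block.
function cidrRange(cidr) {
  const [ip, bits] = cidr.split('/');
  const size = 2 ** (32 - Number(bits));
  const start = Math.floor(ipToInt(ip) / size) * size;
  return { start, end: start + size - 1 };
}

// True when the `inner` block lies entirely inside the `outer` block.
function covers(outer, inner) {
  const o = cidrRange(outer);
  const i = cidrRange(inner);
  if (o.start > i.start) return false;
  return o.end >= i.end;
}

// Hypothetical helper: application CIDRs that no access-list entry covers.
function uncoveredCidrs(appCidrs, accessList) {
  return appCidrs.filter(
    (app) => !accessList.some((entry) => covers(entry, app)),
  );
}

// Sample ranges are made up; the second VPC is the one that was forgotten.
console.log(uncoveredCidrs(['10.0.1.0/24', '10.2.0.0/16'], ['10.0.0.0/16']));
```

<p>An empty result means every listed VPC is covered on paper; anything returned should block the cutover gate until the access list is fixed.</p>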
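<h3 id="case-3backup-retention-設不夠compliance-audit-抓到">Case 3: Backup retention set too short, caught by a compliance audit</h3>
<p><strong>Symptom</strong>: three months after cutover, a SOX audit finds backup retention set to 7 days where compliance requires 90; the Atlas config is hastily changed to 90 days, but <em>the backups from the past three months can no longer be recovered</em>.</p>
<p><strong>Root cause</strong>: Atlas backup retention <em>only applies going forward</em> and cannot be extended retroactively; the Phase 1 default configuration was never checked in a compliance review.</p>
<p><strong>Fix</strong>:</p>
<ol>
<li><strong>Run a compliance review before Phase 1</strong>: confirm retention / data residency / audit log requirements with the legal and security teams</li>
<li><strong>Default retention to a conservative value</strong> (30 / 60 days): lowering it later is harmless, whereas raising it later cannot bring back backups that have already expired</li>
<li><strong>Set PITR and backup retention separately</strong>: PITR window 7-30 days, full backups 90-365 days</li>
</ol>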
<h3 id="case-4iam-token-過期application-端-reconnect-storm">Case 4: IAM token expiry triggers a reconnect storm in the application</h3>
<p><strong>Symptom</strong>: after production switches to IAM authentication, a wave of connection failures appears every 15 minutes; the Atlas log shows "auth token expired".</p>
<p><strong>Root cause</strong>: AWS IAM tokens rotate every 15 minutes and the application reconnects with a stale token; the token refresh logic was written incorrectly.</p>
<p><strong>Fix</strong>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">// Use the MongoDB Node.js driver with MONGODB-AWS instead of hand-rolled refresh
const { MongoClient } = require(&#39;mongodb&#39;);

// With @aws-sdk/credential-providers installed, the driver resolves credentials
// through the default AWS provider chain (environment variables, IAM role, ...).
const uri = process.env.MONGODB_URI; // assumed environment variable name

const client = new MongoClient(uri, {
  authMechanism: &#39;MONGODB-AWS&#39;,
  // No static credentials here: the driver fetches current ones when it authenticates
});</code></pre></div><p>Do not hand-roll token rotation; let the vendor SDK handle it.</p>
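<h3 id="case-5billing-暴漲iops-跟-backup-storage-超預估">Case 5: Billing spike, IOPS and backup storage exceed the estimate</h3>
<p><strong>Symptoms</strong>: The first month's Atlas bill came to $15K USD against an $8K estimate; the Atlas dashboard showed backup storage and IOPS each running 1.5-2x over the estimate.</p>
<p><strong>Root causes</strong>:</p>
<ul>
<li>Atlas backup defaults to being <em>replicated across regions</em>, which doubles storage cost</li>
<li>IOPS-heavy workloads can exhaust burst credits within an M tier, and auto-tier-up temporarily moves the cluster to a more expensive tier</li>
<li>Cross-region / cross-cloud data transfer charges were left out of the estimate</li>
</ul>
<p><strong>Fixes</strong>:</p>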
<ol>
<li><strong>Pre-migration cost estimate</strong>: use self-managed metrics to estimate IOPS / bandwidth, then apply Atlas pricing</li>
<li><strong>Single backup region</strong>: if you don't need cross-region DR, configure same-region backup and save 50% on backup storage</li>
<li><strong>Reserved Instance</strong>: prepay 1-3 years for stable workloads and save 30-40%</li>
<li><strong>Use Performance Advisor early</strong>: run it in the first week to find inefficient queries and reduce IOPS</li>
</ol>
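<p>The pre-migration estimate in step 1 can be sketched as a small calculation. This is a minimal sketch, not Atlas's billing formula: <code>estimateMonthlyCost</code>, the field names, and every rate below are hypothetical placeholders and should be replaced with figures from the current Atlas pricing page and your own self-managed metrics.</p>

```javascript
// Rough pre-migration estimate. All rates are hypothetical placeholders,
// not real Atlas prices.
function estimateMonthlyCost(metrics, rates) {
  // A cross-region backup copy doubles backup storage (root cause 1 above).
  const backupCopies = metrics.crossRegionBackup ? 2 : 1;
  const backup = metrics.backupGb * backupCopies * rates.backupPerGbMonth;

  // Only IOPS above what the tier includes are charged in this sketch.
  const extraIops = Math.max(0, metrics.peakIops - rates.includedIops);
  const iops = extraIops * rates.perExtraIopsMonth;

  // Cross-region / cross-cloud transfer, the item missing from the original estimate.
  const transfer = metrics.crossRegionTransferGb * rates.transferPerGb;

  const total = rates.clusterPerMonth + backup + iops + transfer;
  return { cluster: rates.clusterPerMonth, backup, iops, transfer, total };
}

// Assumed rates for illustration only.
const assumedRates = {
  clusterPerMonth: 3000,
  backupPerGbMonth: 0.1,
  includedIops: 3000,
  perExtraIopsMonth: 0.5,
  transferPerGb: 0.02,
};

// Metrics taken from the self-managed cluster (example values).
const measured = { backupGb: 2000, peakIops: 6000, crossRegionTransferGb: 5000 };

const crossRegion = estimateMonthlyCost({ ...measured, crossRegionBackup: true }, assumedRates);
const sameRegion = estimateMonthlyCost({ ...measured, crossRegionBackup: false }, assumedRates);

console.log('cross-region backup:', crossRegion);
console.log('same-region backup:', sameRegion);
```

<p>Running both variants side by side makes the effect of step 2 visible before the first bill arrives: only the backup line changes, and it halves.</p>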
<h2 id="capacity--cost">Capacity / cost</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Self-managed MongoDB</th>
          <th>Atlas</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Cluster cost (M80)</td>
          <td>EC2 r6g.4xlarge × 3 ≈ $1.5K / mo</td>
          <td>M80 + storage + backup ≈ $3K / mo</td>
      </tr>
      <tr>
          <td>Operational FTE</td>
          <td>0.5-1.5 FTE</td>
          <td>0.1-0.3 FTE</td>
      </tr>
      <tr>
          <td>Backup cost</td>
          <td>S3 + self-managed tooling</td>
          <td>Built-in + tiered storage</td>
      </tr>
      <tr>
          <td>Cross-region DR cost</td>
          <td>Manual + 2x infrastructure</td>
          <td>1-click + 1.5-2x billing</td>
      </tr>
      <tr>
          <td>Time to value</td>
          <td>1-3 months (HA + ops setup)</td>
          <td>1-2 weeks (cluster ready + IAM)</td>
      </tr>
      <tr>
          <td>Migration cost</td>
          <td>-</td>
          <td>1-3 FTE × 2-3 months</td>
      </tr>
  </tbody>
</table>
<p><strong>Break-even</strong>: at roughly 200GB / a medium-sized workload, Atlas's operational savings, amortized over 1-2 years, make it cheaper than self-managed; for TB+ large workloads, self-managed may still be cheaper, but it requires an ops team.</p>
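<p>The break-even reasoning can be sketched as arithmetic over the table's rows: monthly infrastructure plus operational FTE on each side, with the one-off migration effort paid back by the monthly difference. This is a minimal sketch; <code>breakEvenMonths</code> and the fully loaded FTE cost of $12,000 / month are assumptions for illustration, not figures from this article.</p>

```javascript
// Months until the one-off migration cost is repaid by monthly savings.
// Returns Infinity when Atlas is not cheaper per month under the given inputs.
function breakEvenMonths({ selfInfra, selfFte, atlasInfra, atlasFte, migrationFteMonths, fteMonthlyCost }) {
  const selfMonthly = selfInfra + selfFte * fteMonthlyCost;
  const atlasMonthly = atlasInfra + atlasFte * fteMonthlyCost;
  const monthlySaving = selfMonthly - atlasMonthly;
  if (monthlySaving <= 0) return Infinity;

  const migrationCost = migrationFteMonths * fteMonthlyCost;
  return migrationCost / monthlySaving;
}

// Infra figures come from the table above; FTE fractions sit inside the
// table's ranges; fteMonthlyCost is a hypothetical placeholder.
const months = breakEvenMonths({
  selfInfra: 1500,        // EC2 r6g.4xlarge × 3
  selfFte: 0.5,           // low end of 0.5-1.5 FTE
  atlasInfra: 3000,       // M80 + storage + backup
  atlasFte: 0.25,         // within 0.1-0.3 FTE
  migrationFteMonths: 3,  // e.g. 1.5 FTE × 2 months
  fteMonthlyCost: 12000,  // assumption
});

console.log(`break-even after ${months} months`);
```

<p>The result is dominated by the FTE inputs rather than the infrastructure line, so it is worth running with both ends of each range before committing to a timeline.</p>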
<h2 id="整合--下一步">Integration / next steps</h2>
<h3 id="跟-postgresql--aurora-migration-對照">Comparison with <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">PostgreSQL → Aurora migration</a></h3>
<p>Both articles are Type C operational redesign hybrids; they share a template and differ in the details:</p>
<ul>
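<li>On the Aurora side, RDS Proxy is the recommended approach; on the Atlas side, Private Endpoint is the more standard choice</li>
<li>On the Aurora side, IAM authentication is an <em>optional best practice</em>; on Atlas, IAM is the <em>recommended default</em></li>
<li>Both vendors have complex cost models, and I/O cost is the main source of surprises</li>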
</ul>
<h3 id="跟-application-端-iam-token-rotation-整合">Integration with <a href="/blog/backend/07-security-data-protection/vendors/hashicorp-vault/dynamic-credential/" data-link-title="HashiCorp Vault Dynamic Credential：lease 治理跟 application 整合的實作層" data-link-desc="Vault database secrets engine 怎麼配、application 怎麼 renew lease、production 五大踩雷（lease 過期 race、DB max_connections 撞牆、Vault sealed、token expire、scope 過寬）、容量規劃跟 vault-agent injector 整合">application-side IAM token rotation</a></h3>
<p>Vault dynamic credentials can issue Atlas Database User credentials whose lease lifecycle is aligned with the application's; this is good practice for high-stakes workloads, but the setup is complex.</p>
<h3 id="下一步議題">Next topics</h3>
<ul>
<li><strong>Atlas Data Federation</strong>: query across Atlas clusters, S3, and regions; evaluate this feature if you go multi-region</li>
<li><strong>Atlas Online Archive</strong>: cold data is archived to S3 automatically and stays transparently queryable; saves storage cost for retention-heavy workloads</li>
<li><strong>Atlas Serverless</strong>: a good fit for bursty workloads; not cost-effective for steady ones</li>
</ul>
<h2 id="相關連結">Related links</h2>
<ul>
<li>Source vendor: <a href="/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB</a></li>
<li>Parallel migration playbook (Type C): <a href="/blog/backend/01-database/vendors/postgresql/migrate-to-aurora/" data-link-title="PostgreSQL → Aurora Migration：protocol 相容、operational 重設計" data-link-desc="Aurora 號稱 PostgreSQL-compatible 但 operational model 不同（storage decouple / cluster endpoint / instance class / 自家備份）；遷移流程是混合（protocol drop-in &#43; operational phased）、5 個 production 踩雷（extension 不支援 / replication slot 不直通 / autovacuum 行為差 / IAM 認證強制 / cost model 換算）、跟 Patroni / read replica / DR 對位">PostgreSQL → Aurora</a></li>
<li>Parallel migration playbooks: <a href="/blog/backend/07-security-data-protection/vendors/splunk/migrate-to-elastic-security/" data-link-title="Splunk → Elastic Security Detection Rule Migration：6 段 phased playbook 跟 5 大踩雷" data-link-desc="從 Splunk Enterprise Security 遷到 Elastic Security 的 detection rule translation playbook：SPL ↔ KQL/ES|QL schema 對位、AI-assisted translation pipeline、parallel run 比對、cutover routing、5 個 production 踩雷（macro 沒對應 / time zone 差異 / summary index 不對位 / alert dedup key 衝突 / 過早 decommission）、capacity / cost 對照">Splunk → Elastic</a> (Type A, schema differences) / <a href="/blog/backend/03-message-queue/vendors/kafka/migrate-from-to-nats/" data-link-title="Kafka ↔ NATS：不是 migration、是 messaging paradigm 重設計" data-link-desc="Kafka 跟 NATS 不是同類產品（log-based event streaming vs subject-based messaging）、&#39;migration&#39; 字面上不成立；本文釐清兩家 paradigm 邊界、什麼情境真的能換、application 模式重設計的 5 個踩雷（consumer offset 觀念差 / retention model / exactly-once 假設 / schema registry 缺位 / fan-out 模式差）、跟 JetStream 對位 &#43; 混合架構">Kafka ↔ NATS</a> (Type E, paradigm shift)</li>
<li>Methodology: <a href="/blog/posts/migration-playbook-%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84stage-0-variant-%E8%A6%8F%E5%8A%83%E6%8A%8A-collapse-%E7%8E%87%E5%BE%9E-60-%E9%99%8D%E5%88%B0-0/" data-link-title="Migration Playbook 方法論的演化紀錄：Stage 0 variant 規劃把 collapse 率從 60% 降到 0%" data-link-desc="跨 vendor migration playbook 需要獨立寫作方法論的依據，以及這套方法論從三輪 batch dogfood 中演化出來的驗證證據。">Migration playbook methodology</a> (this article validates the standard form of Type C)</li>
</ul>
]]></content:encoded></item></channel></rss>