<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Schema-Design on Tarragon</title><link>https://tarrragon.github.io/blog/tags/schema-design/</link><description>Recent content in Schema-Design on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Wed, 27 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/schema-design/index.xml" rel="self" type="application/rss+xml"/><item><title>MongoDB Schema Design Pattern: where the contract layer lives vs embedded / reference</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/schema-design-pattern/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/schema-design-pattern/</guid><description>&lt;p>Beginner discussions of MongoDB schema design usually stop at the binary choice of embedded vs reference. Real production problems go well beyond that: the schema flexibility of the document model is a dividend at first, but within about half a year the same collection starts mixing three generations of schema, and application code grows three layers of if-else to handle missing fields and type drift. At that point the question to solve is not "embed or reference" but &lt;strong>who guards the schema contract, and at which layer&lt;/strong>. This article splits the question into three contract-layer paths (DB-layer validator / app-layer abstraction / hybrid) and discusses them together with the embedded / reference / polymorphic mechanisms and the boundary of time-series collections.&lt;/p>
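&lt;p>This article does not repeat the document-model fit criteria already covered in the &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview&lt;/a>; it is an implementation-level guide to production deployment, schema governance, and failure repair.&lt;/p>
&lt;h2 id="問題情境document-自由的後座力">Problem context: the recoil of document freedom&lt;/h2>
&lt;p>Three things to confirm before deciding whether MongoDB fits:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Does document shape dominate the data&lt;/strong>: data whose shape is inherently polymorphic and evolves with the product (sensor signals, CMS articles, order aggregates) suits the document model; data with fixed access patterns and settled fields belongs in a KV store or SQL instead&lt;/li>
&lt;li>&lt;strong>Where the contract layer should live&lt;/strong>: a DB-layer validator suits stable schemas and collections shared across services; an app-layer abstraction suits fast-evolving schemas and microservices with independent owners; a hybrid suits large production systems&lt;/li>
&lt;li>&lt;strong>Whether cross-cloud hedging is needed&lt;/strong>: if the team's future cloud-provider strategy is uncertain, Atlas multi-cloud support is a selection signal; if you only run on a single cloud, there is no need to pay extra for hedging&lt;/li>
&lt;/ul>
&lt;p>Once MongoDB is confirmed as the right choice, these are the symptoms teams actually hit in production:&lt;/p>
&lt;ul>
&lt;li>The document model's early schema-less dividend fades: after half a year the collection mixes three generations of schema, and the application writes if-else branches to handle missing fields and type drift&lt;/li>
&lt;li>Subdocuments nest deeper and deeper, a single document passes 1-2 MB, and a partial update still loads and writes the whole document, putting pressure on both IO and the working set&lt;/li>
&lt;li>Over-normalization in the other direction: orders and order items are split into two collections, a single query needs N+1 &lt;code>$lookup&lt;/code> calls, and aggregation cost spikes&lt;/li>
&lt;li>IoT / sensor / event-log workloads are written to a regular collection, and write throughput hits a wall without time-series collections ever being considered&lt;/li>
&lt;li>&lt;code>$lookup&lt;/code> appears on the hot path, document-size warnings show up (early signs of approaching the 16 MB limit), partial updates generate heavy disk writes, and the schema validation error rate suddenly climbs&lt;/li>
&lt;/ul>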
&lt;p>Case anchors: &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/toyota-connected-mongodb-telematics-iot/" data-link-title="9.C38 Toyota Connected：MongoDB Atlas 撐 900 萬車輛 telematics、月 180 億 transaction" data-link-desc="Toyota Connected 用 MongoDB Atlas 撐 Safety Connect 900 萬車、月 180 億 transaction、緊急訊號 3 秒內到 agent">9.C38 Toyota Connected&lt;/a> shows in-vehicle sensor schemas evolving with model / year / regulation, with polymorphic documents and schema governance coexisting; &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/forbes-mongodb-atlas-multi-cloud-migration/" data-link-title="9.C37 Forbes：自管 MongoDB → Atlas on GCP、build 時間 25 → 9 分鐘" data-link-desc="Forbes 把自管 MongoDB 遷到 Atlas on Google Cloud、6 個月完成、build 25 → 9 分鐘、120M 不重複訪客單月承接">9.C37 Forbes&lt;/a> shows 50+ CMS microservices isolated from schema changes by a self-built intermediary abstraction layer; &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/microsoft-365-cosmos-db-analytics/" data-link-title="9.C30 Microsoft 365：從 MongoDB 遷移到 Cosmos DB 的分析平台" data-link-desc="Microsoft 365 把使用分析平台從 MongoDB 遷移到 Cosmos DB、planet-scale 全球分散式分析">9.C30 Microsoft 365&lt;/a> shows the document model retained while shape is governed across vendors. Concrete incident details for an early-stage startup with three coexisting schema generations still need a future case; for now this article treats that as a "common failure pattern".&lt;/p></description><content:encoded><![CDATA[<p>Beginner discussions of MongoDB schema design usually stop at the binary choice of embedded vs reference. Real production problems go well beyond that: the schema flexibility of the document model is a dividend at first, but within about half a year the same collection starts mixing three generations of schema, and application code grows three layers of if-else to handle missing fields and type drift. At that point the question to solve is not "embed or reference" but <strong>who guards the schema contract, and at which layer</strong>. This article splits the question into three contract-layer paths (DB-layer validator / app-layer abstraction / hybrid) and discusses them together with the embedded / reference / polymorphic mechanisms and the boundary of time-series collections.</p>
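<p>This article does not repeat the document-model fit criteria already covered in the <a href="/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview</a>; it is an implementation-level guide to production deployment, schema governance, and failure repair.</p>
<h2 id="問題情境document-自由的後座力">Problem context: the recoil of document freedom</h2>
<p>Three things to confirm before deciding whether MongoDB fits:</p>
<ul>
<li><strong>Does document shape dominate the data</strong>: data whose shape is inherently polymorphic and evolves with the product (sensor signals, CMS articles, order aggregates) suits the document model; data with fixed access patterns and settled fields belongs in a KV store or SQL instead</li>
<li><strong>Where the contract layer should live</strong>: a DB-layer validator suits stable schemas and collections shared across services; an app-layer abstraction suits fast-evolving schemas and microservices with independent owners; a hybrid suits large production systems</li>
<li><strong>Whether cross-cloud hedging is needed</strong>: if the team's future cloud-provider strategy is uncertain, Atlas multi-cloud support is a selection signal; if you only run on a single cloud, there is no need to pay extra for hedging</li>
</ul>
<p>Once MongoDB is confirmed as the right choice, these are the symptoms teams actually hit in production:</p>
<ul>
<li>The document model's early schema-less dividend fades: after half a year the collection mixes three generations of schema, and the application writes if-else branches to handle missing fields and type drift</li>
<li>Subdocuments nest deeper and deeper, a single document passes 1-2 MB, and a partial update still loads and writes the whole document, putting pressure on both IO and the working set</li>
<li>Over-normalization in the other direction: orders and order items are split into two collections, a single query needs N+1 <code>$lookup</code> calls, and aggregation cost spikes</li>
<li>IoT / sensor / event-log workloads are written to a regular collection, and write throughput hits a wall without time-series collections ever being considered</li>
<li><code>$lookup</code> appears on the hot path, document-size warnings show up (early signs of approaching the 16 MB limit), partial updates generate heavy disk writes, and the schema validation error rate suddenly climbs</li>
</ul>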
<p>Case anchors: <a href="/blog/backend/09-performance-capacity/cases/toyota-connected-mongodb-telematics-iot/" data-link-title="9.C38 Toyota Connected：MongoDB Atlas 撐 900 萬車輛 telematics、月 180 億 transaction" data-link-desc="Toyota Connected 用 MongoDB Atlas 撐 Safety Connect 900 萬車、月 180 億 transaction、緊急訊號 3 秒內到 agent">9.C38 Toyota Connected</a> shows in-vehicle sensor schemas evolving with model / year / regulation, with polymorphic documents and schema governance coexisting; <a href="/blog/backend/09-performance-capacity/cases/forbes-mongodb-atlas-multi-cloud-migration/" data-link-title="9.C37 Forbes：自管 MongoDB → Atlas on GCP、build 時間 25 → 9 分鐘" data-link-desc="Forbes 把自管 MongoDB 遷到 Atlas on Google Cloud、6 個月完成、build 25 → 9 分鐘、120M 不重複訪客單月承接">9.C37 Forbes</a> shows 50+ CMS microservices isolated from schema changes by a self-built intermediary abstraction layer; <a href="/blog/backend/09-performance-capacity/cases/microsoft-365-cosmos-db-analytics/" data-link-title="9.C30 Microsoft 365：從 MongoDB 遷移到 Cosmos DB 的分析平台" data-link-desc="Microsoft 365 把使用分析平台從 MongoDB 遷移到 Cosmos DB、planet-scale 全球分散式分析">9.C30 Microsoft 365</a> shows the document model retained while shape is governed across vendors. Concrete incident details for an early-stage startup with three coexisting schema generations still need a future case; for now this article treats that as a "common failure pattern".</p>
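<h2 id="核心機制aggregate-rootembeddedreferencepolymorphic">Core mechanisms: aggregate root, embedded, reference, polymorphic</h2>
<p>The first layer of MongoDB schema design is that <em>the aggregate root determines the atomicity boundary</em>. MongoDB guarantees write atomicity only within a single document; anything spanning documents needs a multi-document transaction (available on replica sets since 4.0 and on sharded clusters since 4.2, with a performance cost when a transaction crosses shards). The aggregate root is the DDD concept made concrete in MongoDB: data that is read together, written together, and shares a consistency boundary goes into the same document.</p>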
<ul>
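<li><strong>Embedded (subdocument / array)</strong>: writes are atomic and one read returns everything; the cost is that updating a sub-element still rewrites the whole document, so it is a poor fit when sub-elements are written very frequently</li>
<li><strong>Reference (manual <code>_id</code> foreign key + <code>$lookup</code>)</strong>: document size stays under control, but joins happen in the application or in the aggregation stage; a JOIN-heavy workload on this path turns into N+1 queries</li>
<li><strong>Polymorphic pattern</strong>: one collection stores multiple entity types distinguished by a <code>type</code> discriminator; MongoDB has no inheritance, so the boundaries are maintained with schema validators and partial indexes</li>
<li><strong>16 MB document hard limit</strong>: this is MongoDB's mechanical boundary; the implicit soft limit of keeping the working set in RAM (document size directly affects cache efficiency) causes trouble much earlier</li>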
</ul>
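<p>To make the embedded and reference shapes above concrete, here is a minimal sketch in plain JavaScript. The order data, the field names, and the <code>attachItems</code> helper are all hypothetical; the helper illustrates an application-side join that groups child documents once instead of issuing one lookup per parent.</p>

```javascript
// Hypothetical order data, shown in both shapes.

// Embedded: items live inside the aggregate root, one read returns everything.
const embeddedOrder = {
  _id: "order-1",
  tenantId: "t-1",
  items: [
    { sku: "A", qty: 2 },
    { sku: "B", qty: 1 },
  ],
};

// Reference: items are separate documents pointing back via orderId.
const orders = [{ _id: "order-1", tenantId: "t-1" }];
const orderItems = [
  { _id: "item-1", orderId: "order-1", sku: "A", qty: 2 },
  { _id: "item-2", orderId: "order-1", sku: "B", qty: 1 },
];

// Application-side join: group items once, then attach them, instead of
// running one query per order (the N+1 shape).
function attachItems(orderDocs, itemDocs) {
  const byOrder = new Map();
  for (const item of itemDocs) {
    if (!byOrder.has(item.orderId)) byOrder.set(item.orderId, []);
    byOrder.get(item.orderId).push({ sku: item.sku, qty: item.qty });
  }
  return orderDocs.map((o) => ({ ...o, items: byOrder.get(o._id) ?? [] }));
}

const joined = attachItems(orders, orderItems);
console.log(JSON.stringify(joined[0]) === JSON.stringify(embeddedOrder)); // true
```

<p>The two shapes carry the same information; the difference is where the join cost and the atomicity boundary land.</p>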
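<h3 id="contract-layer-三條路徑">Contract layer: three paths</h3>
<p>Cross-case synthesis (assembled in this chapter from what Toyota and Forbes both show): in production the schema flexibility of the document model has to be hedged with schema governance, otherwise "schema freedom" turns into "production data inconsistency" (stated explicitly in the Toyota case). The choice is not whether to do schema governance but which layer guards the contract. The three paths:</p>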
<table>
  <thead>
      <tr>
          <th>Path</th>
          <th>Mechanism</th>
          <th>When it fits</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>DB-layer contract</td>
          <td>MongoDB <code>$jsonSchema</code> validator + <code>validationLevel</code> + <code>validationAction</code></td>
          <td>Stable schema, collection shared by multiple services, the DB must block dirty data</td>
      </tr>
      <tr>
          <td>App-layer contract</td>
          <td>Self-built API abstraction + middleware schema validation</td>
          <td>Fast-evolving schema, microservices with independent owners, need for cross-cloud flexibility</td>
      </tr>
      <tr>
          <td>Hybrid</td>
          <td>DB layer checks types / required fields, app layer checks business semantics / versions</td>
          <td>Large production systems, multiple owners, cross-team use</td>
      </tr>
  </tbody>
</table>
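<p><strong>DB-layer path</strong>: in production the <code>$jsonSchema</code> validator is a contract-enforcement tool, not a dev-time linter. With <code>validationAction: &quot;error&quot;</code> a violating write is rejected; with <code>&quot;warn&quot;</code> the violation is only logged. <code>validationLevel: &quot;moderate&quot;</code> validates inserts and updates to documents that already conform, while updates to existing non-conforming documents are let through; <code>&quot;strict&quot;</code> validates every insert and update. This path fits schemas stable enough for a collection to be shared across services.</p>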
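<p>How the two validator settings combine can be sketched as a small decision function. This is a simplified model written for illustration, not server code; the function name <code>validationOutcome</code> and its parameter names are assumptions.</p>

```javascript
// Simplified model (an assumption for illustration, not server code) of how
// validationLevel and validationAction combine for a single write.
function validationOutcome({ level, action, op, docWasValid, newDocValid }) {
  if (level === "off") return "accepted";
  // "moderate" skips updates to documents that already violated the schema.
  const checked =
    level === "strict" || op === "insert" || (op === "update" && docWasValid);
  if (!checked || newDocValid) return "accepted";
  // "error" blocks the write; "warn" lets it through and only logs it.
  return action === "error" ? "rejected" : "accepted-with-log";
}

// A legacy document that never matched the schema can still be updated
// under "moderate", even when the action is "error".
console.log(
  validationOutcome({
    level: "moderate",
    action: "error",
    op: "update",
    docWasValid: false,
    newDocValid: false,
  })
); // accepted
```

<p>The model makes the trade-off visible: <code>moderate</code> protects new data without breaking writes to legacy documents, which is why it is the usual first phase of a rollout.</p>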
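<p><strong>App-layer path</strong>: the pattern shown in 9.C37 Forbes. 50+ microservices see a stable contract API through a self-built intermediary abstraction layer, and DB schema changes stay confined to the owning microservice. The core reason Forbes could actually use cross-cloud flexibility is that the abstraction layer concentrates schema governance at a single point: during a cross-cloud migration the abstraction layer stays unchanged, and the microservices never learn that the underlying DB moved to another cluster or cloud.</p>
<p><strong>Hybrid path</strong>: Atlas Application Services and enterprise schema registries belong here. The DB-layer validator guards the baseline (field types, required fields) and the app-layer abstraction guards business rules (version fields / compatibility handling / cross-document consistency). The cost is maintaining both layers and keeping their versions in sync, so it suits teams whose production scale genuinely justifies the complexity.</p>
<p>Which path to choose depends on team size, how widely the collection is shared across services, and how fast the schema evolves.</p>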
<h3 id="time-series-collection60">Time-series collection (introduced in 5.0)</h3>
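<p>A time-series collection is a vendor-specific MongoDB mechanism designed for IoT / sensor / event-log / metrics workloads, with roughly 3-5x the write throughput of a regular collection and better storage compression. The data must follow a three-part <code>{ timestamp, metadata, measurement }</code> shape, with the timestamp leading.</p>
<p>Good fits: high-frequency sensor signal writes, time series in a metrics system, application event logs. <strong>Poor fits</strong>: schemas not centered on a timestamp, workloads that need cross-document updates, and workloads that need a polymorphic discriminator.</p>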
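<p>The three-part shape maps onto the <code>timeseries</code> options of <code>db.createCollection</code>. A minimal sketch follows; the helper <code>timeSeriesOptions</code>, the collection name <code>readings</code>, and the field names <code>ts</code> and <code>sensor</code> are hypothetical.</p>

```javascript
// Builds the options object for db.createCollection(name, options).
// The field names passed in below ("ts", "sensor") are hypothetical.
function timeSeriesOptions({ timeField, metaField, granularity = "seconds" }) {
  const allowed = ["seconds", "minutes", "hours"];
  if (!timeField) throw new Error("timeField is required");
  if (!allowed.includes(granularity)) {
    throw new Error(`granularity must be one of ${allowed.join(", ")}`);
  }
  const timeseries = { timeField, granularity };
  if (metaField) timeseries.metaField = metaField; // metaField is optional
  return { timeseries };
}

const options = timeSeriesOptions({ timeField: "ts", metaField: "sensor" });
// In mongosh: db.createCollection("readings", options)

// One document in the three-part shape: timestamp, metadata, measurement.
const reading = {
  ts: new Date("2026-01-01T00:00:00Z"),
  sensor: { vehicleId: "v-1", type: "temp" },
  value: 21.5,
};
console.log(JSON.stringify(options), reading.ts.toISOString());
```

<p>Keeping everything that identifies the source under the single <code>metaField</code> is what lets the server group measurements from the same series together.</p>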
<p>The 9.C38 Toyota Connected case notes that, across its 20 Atlas databases, the case study never states whether time-series collections are used; for an IoT case this is an important distinction, but it is not disclosed. To be explicit: IoT / sensor scenarios should consider time-series collections, the Toyota case does not reveal what it actually uses, and it must not be cited as "Toyota uses time-series collections".</p>
<p>Related knowledge cards: <a href="/blog/backend/knowledge-cards/document-store/" data-link-title="Document Store" data-link-desc="說明以 JSON 文件與彈性 schema 提供資料存取的模式，以及它仍需的治理邊界">document-store</a>, <a href="/blog/backend/knowledge-cards/transaction-boundary/" data-link-title="Transaction Boundary" data-link-desc="說明哪些資料變更應在同一個交易中一起成功或一起回復">transaction-boundary</a> (aggregate boundary = transaction boundary), <a href="/blog/backend/knowledge-cards/data-inconsistency/" data-link-title="Data Inconsistency" data-link-desc="說明多份資料暫時不同步時如何判斷產品後果與修復責任">data-inconsistency</a>.</p>
<h2 id="操作流程">Procedure</h2>
<p><strong>Step 1: inventory access patterns</strong>. List the top 10 queries and writes and mark which data is read together and written together; this list determines the candidates for embedded vs reference vs polymorphic.</p>
<p><strong>Step 2: contract layer decision</strong>.</p>
<table>
  <thead>
      <tr>
          <th>Condition</th>
          <th>Path</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Collection shared by multiple services + stable schema</td>
          <td>DB-layer validator</td>
      </tr>
      <tr>
          <td>Fast-evolving schema + microservices with independent owners</td>
          <td>App-layer abstraction</td>
      </tr>
      <tr>
          <td>Large production system + multiple owners + cross-team use</td>
          <td>Hybrid (both together)</td>
      </tr>
      <tr>
          <td>IoT / sensor / event log + timestamp-led data</td>
          <td>Time-series collection (replaces the regular collection)</td>
      </tr>
  </tbody>
</table>
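<p><strong>Step 3: embed criteria</strong>: 1:few cardinality, synchronized life cycle, expected size ceiling &lt; 1 MB; <strong>reference criteria</strong>: 1:many with asymmetric write frequency, or references across aggregates.</p>
<p><strong>Step 4: validator configuration for the DB-layer path</strong>:</p>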
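<p>The Step 3 criteria can be written down as a small decision helper. This is a sketch under stated assumptions: the function <code>chooseRelationship</code>, its input field names, and the <code>maxFew = 20</code> cutoff for "1:few" are illustrative, not thresholds from the cases.</p>

```javascript
const ONE_MB = 1024 * 1024;

// Encodes the Step 3 rules of thumb. maxFew = 20 is an assumed cutoff for
// "1:few" and should be tuned per workload.
function chooseRelationship(
  { expectedChildren, sharesLifecycle, expectedDocBytes, childWritesDominant, crossAggregate },
  maxFew = 20
) {
  // Reference criteria: cross-aggregate links or asymmetric write frequency.
  if (crossAggregate || childWritesDominant) return "reference";
  // Embed criteria: 1:few, same life cycle, expected size under 1 MB.
  const fitsEmbed =
    expectedChildren <= maxFew && sharesLifecycle && expectedDocBytes < ONE_MB;
  return fitsEmbed ? "embed" : "reference";
}

// Order line items: few per order, created and deleted with the order.
console.log(
  chooseRelationship({
    expectedChildren: 5,
    sharesLifecycle: true,
    expectedDocBytes: 4 * 1024,
    childWritesDominant: false,
    crossAggregate: false,
  })
); // embed
```

<p>Defaulting to <code>reference</code> when any embed criterion fails keeps the document size bounded, which is the harder property to restore later.</p>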





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="nx">db</span><span class="p">.</span><span class="nx">runCommand</span><span class="p">({</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">  <span class="nx">collMod</span><span class="o">:</span> <span class="s2">&#34;orders&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nx">validator</span><span class="o">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="nx">$jsonSchema</span><span class="o">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">      <span class="nx">bsonType</span><span class="o">:</span> <span class="s2">&#34;object&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">      <span class="nx">required</span><span class="o">:</span> <span class="p">[</span><span class="s2">&#34;_id&#34;</span><span class="p">,</span> <span class="s2">&#34;tenantId&#34;</span><span class="p">,</span> <span class="s2">&#34;createdAt&#34;</span><span class="p">,</span> <span class="s2">&#34;items&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">      <span class="nx">properties</span><span class="o">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="nx">tenantId</span><span class="o">:</span> <span class="p">{</span> <span class="nx">bsonType</span><span class="o">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="nx">createdAt</span><span class="o">:</span> <span class="p">{</span> <span class="nx">bsonType</span><span class="o">:</span> <span class="s2">&#34;date&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="nx">items</span><span class="o">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">          <span class="nx">bsonType</span><span class="o">:</span> <span class="s2">&#34;array&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">          <span class="nx">minItems</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">          <span class="nx">items</span><span class="o">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">            <span class="nx">bsonType</span><span class="o">:</span> <span class="s2">&#34;object&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">            <span class="nx">required</span><span class="o">:</span> <span class="p">[</span><span class="s2">&#34;sku&#34;</span><span class="p">,</span> <span class="s2">&#34;qty&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">            <span class="nx">properties</span><span class="o">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">              <span class="nx">sku</span><span class="o">:</span> <span class="p">{</span> <span class="nx">bsonType</span><span class="o">:</span> <span class="s2">&#34;string&#34;</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">              <span class="nx">qty</span><span class="o">:</span> <span class="p">{</span> <span class="nx">bsonType</span><span class="o">:</span> <span class="s2">&#34;int&#34;</span><span class="p">,</span> <span class="nx">minimum</span><span class="o">:</span> <span class="mi">1</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">            <span class="p">}</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">          <span class="p">}</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">      <span class="p">}</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">  <span class="p">},</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">  <span class="nx">validationLevel</span><span class="o">:</span> <span class="s2">&#34;moderate&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl">  <span class="nx">validationAction</span><span class="o">:</span> <span class="s2">&#34;warn&#34;</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl"><span class="p">})</span></span></span></code></pre></div><p>Gradual rollout: start with <code>validationLevel: &quot;moderate&quot;</code> + <code>validationAction: &quot;warn&quot;</code>, observe for two weeks to confirm the application writes no violating documents, then switch to <code>&quot;strict&quot;</code> + <code>&quot;error&quot;</code> to lock it down.</p>
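<p>The rollout phases can be kept as data so that each phase is one reviewed <code>collMod</code> command. A minimal sketch, assuming a hypothetical helper <code>validatorRollout</code> and a deliberately tiny schema:</p>

```javascript
// Produces the collMod command for each rollout phase.
// Each command would be passed to db.runCommand(...) in mongosh.
function validatorRollout(collection, jsonSchema) {
  const base = { collMod: collection, validator: { $jsonSchema: jsonSchema } };
  return {
    // Phase 1: log violations, never block writes.
    observe: { ...base, validationLevel: "moderate", validationAction: "warn" },
    // Phase 2: block every violating insert and update.
    enforce: { ...base, validationLevel: "strict", validationAction: "error" },
    // Rollback changes only the level, so the validator itself stays in place.
    rollback: { collMod: collection, validationLevel: "moderate" },
  };
}

const phases = validatorRollout("orders", {
  bsonType: "object",
  required: ["tenantId"],
  properties: { tenantId: { bsonType: "string" } },
});
console.log(Object.keys(phases).join(" -> "));
```

<p>Generating all three commands from one schema object avoids the phases drifting apart while the observation window is running.</p>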
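<p><strong>Step 5: abstraction interface for the app-layer path</strong>. The pattern shown in 9.C37 Forbes: middleware intercepts microservice writes, validates the schema, stamps a version field, and keeps the owning microservice's schema changes isolated inside the abstraction.</p>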
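<p>A thin write gateway along these lines might look like the sketch below. It is not Forbes's implementation, which the case does not disclose; <code>makeWriteGateway</code>, <code>validateOrder</code>, the <code>schemaVersion</code> field, and the in-memory <code>store</code> array are all assumptions standing in for a real collection handle.</p>

```javascript
// In-memory sketch of an app-layer contract; `store` stands in for a real
// collection handle and `validate` returns a list of error strings.
function makeWriteGateway({ store, currentVersion, validate }) {
  return {
    insert(doc) {
      const errors = validate(doc);
      if (errors.length > 0) return { ok: false, errors };
      // Stamp the contract version so readers can tell generations apart.
      const stamped = { ...doc, schemaVersion: currentVersion };
      store.push(stamped);
      return { ok: true, doc: stamped };
    },
  };
}

// Hypothetical contract for an order document.
function validateOrder(doc) {
  const errors = [];
  if (typeof doc.tenantId !== "string") errors.push("tenantId must be a string");
  if (!Array.isArray(doc.items) || doc.items.length === 0) {
    errors.push("items must be a non-empty array");
  }
  return errors;
}

const store = [];
const gateway = makeWriteGateway({ store, currentVersion: 3, validate: validateOrder });
const accepted = gateway.insert({ tenantId: "t-1", items: [{ sku: "A", qty: 1 }] });
const refused = gateway.insert({ tenantId: 42, items: [] });
console.log(accepted.ok, refused.ok, store.length);
```

<p>The gateway only validates and stamps; keeping business logic out of it is what keeps the layer cheap to carry across a later migration.</p>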
<p><strong>Step 6: polymorphic + partial index</strong>: use <code>partialFilterExpression</code> so cold branches do not pay the index cost:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">db</span><span class="p">.</span><span class="nx">events</span><span class="p">.</span><span class="nx">createIndex</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="p">{</span> <span class="nx">type</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span> <span class="nx">timestamp</span><span class="o">:</span> <span class="o">-</span><span class="mi">1</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">  <span class="p">{</span> <span class="nx">partialFilterExpression</span><span class="o">:</span> <span class="p">{</span> <span class="nx">type</span><span class="o">:</span> <span class="p">{</span> <span class="nx">$in</span><span class="o">:</span> <span class="p">[</span><span class="s2">&#34;click&#34;</span><span class="p">,</span> <span class="s2">&#34;purchase&#34;</span><span class="p">]</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="p">)</span></span></span></code></pre></div><p><strong>Step 7: measure document shape</strong>. Measure with <code>bsondump</code>, <code>$bsonSize</code>, and <code>collStats</code>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript"><span class="line"><span class="ln">1</span><span class="cl"><span class="nx">db</span><span class="p">.</span><span class="nx">coll</span><span class="p">.</span><span class="nx">aggregate</span><span class="p">([</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="p">{</span> <span class="nx">$group</span><span class="o">:</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">      <span class="nx">_id</span><span class="o">:</span> <span class="kc">null</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">      <span class="nx">avg</span><span class="o">:</span> <span class="p">{</span> <span class="nx">$avg</span><span class="o">:</span> <span class="p">{</span> <span class="nx">$bsonSize</span><span class="o">:</span> <span class="s2">&#34;$$ROOT&#34;</span> <span class="p">}</span> <span class="p">},</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">      <span class="nx">max</span><span class="o">:</span> <span class="p">{</span> <span class="nx">$max</span><span class="o">:</span> <span class="p">{</span> <span class="nx">$bsonSize</span><span class="o">:</span> <span class="s2">&#34;$$ROOT&#34;</span> <span class="p">}</span> <span class="p">}</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">  <span class="p">}}</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="p">])</span></span></span></code></pre></div><p>Checkpoints: avgObjSize within the expected range, validator failure rate &lt; SLO, abstraction-layer schema mismatch rate traceable.</p>
<p><strong>Rollback boundary</strong>: moving the validator from <code>strict</code> back to <code>moderate</code> is a single command and needs no application code change; swapping abstraction-layer versions needs a gradual application rollout; schema changes already embedded into documents need a backfill migration script and cannot be reverted in place.</p>
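<h2 id="失敗模式">Failure modes</h2>
<p><strong>Unbounded array growth</strong>: embedding "all of a user's messages" in the user document drives the document into the 16 MB limit, at which point writes are rejected outright. The fix is to switch to references: move messages to their own collection and index them by <code>userId</code>.</p>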
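<p>The fix can be sketched as the per-document transform a backfill script would apply. The helper <code>splitMessages</code>, the <code>messages</code> field, and the added <code>seq</code> ordering field are hypothetical names chosen for illustration.</p>

```javascript
// Splits an embedded messages array into standalone documents that point
// back to the user via userId. `seq` preserves the original array order.
function splitMessages(userDoc) {
  const { messages = [], ...user } = userDoc;
  const messageDocs = messages.map((m, i) => ({ ...m, userId: user._id, seq: i }));
  return { user, messageDocs };
}

const before = {
  _id: "u-1",
  name: "Ada",
  messages: [{ text: "hello" }, { text: "world" }],
};
const { user, messageDocs } = splitMessages(before);
console.log(Object.keys(user).join(","), messageDocs.length);
// In mongosh the new collection would then get: db.messages.createIndex({ userId: 1 })
```

<p>After the split the user document stops growing with activity, and message reads become an indexed query on <code>userId</code> rather than a full document load.</p>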
<p><strong>Hot subdocument update</strong>: every write hits the same nested field, so WiredTiger's document-level concurrency control degrades into a hotspot and writes are serialized even though the machine has many cores. The fix is to split the hot field into a referenced document, or to move to a sharded collection to spread the writes (see <a href="../shard-key-selection/">shard key selection</a>).</p>
<p><strong><code>$lookup</code> on the hot path</strong>: a poorly designed reference turns into a join, and p99 latency degrades linearly with collection size. The fix is to denormalize at schema design time and embed read-together data back into the aggregate root, or to write a materialized view with <code>$merge</code> (see <a href="../aggregation-pipeline-optimization/">aggregation pipeline optimization</a>).</p>
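<p>The <code>$merge</code> option can be sketched as a pipeline that pays the join cost off the hot path and writes the result into a read-optimized collection. The collection and field names (<code>orders</code>, <code>orderItems</code>, <code>orderId</code>, <code>orders_with_items</code>) are hypothetical.</p>

```javascript
// Pipeline that joins once and materializes the result.
// All collection and field names here are hypothetical.
const refreshOrderView = [
  {
    $lookup: {
      from: "orderItems",
      localField: "_id",
      foreignField: "orderId",
      as: "items",
    },
  },
  {
    // $merge must be the last stage of the pipeline.
    $merge: {
      into: "orders_with_items",
      on: "_id",
      whenMatched: "replace",
      whenNotMatched: "insert",
    },
  },
];

// In mongosh, run on a schedule rather than per request:
//   db.orders.aggregate(refreshOrderView)
console.log(refreshOrderView.map((stage) => Object.keys(stage)[0]).join(" -> "));
```

<p>Hot-path reads then query <code>orders_with_items</code> directly; the price is that the view is only as fresh as the last refresh.</p>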
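<p><strong>Three schema generations coexisting (no contract layer)</strong>: with neither a validator nor an abstraction layer, old fields linger, application code carries three layers of fallback, and newly onboarded developers cannot tell which fields are live. 9.C38 Toyota shows that the flexibility of the document model comes at the cost that "production must do schema governance", otherwise "schema freedom" becomes "production data inconsistency".</p>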
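<p>One common way to replace scattered fallback branches is a single chain of upgrade functions keyed by a version field. The sketch below assumes a made-up history (v1 stored <code>name</code>, v2 split it into first and last name, v3 renamed <code>mail</code> to <code>email</code>) and a hypothetical <code>schemaVersion</code> field; none of this comes from the cited cases.</p>

```javascript
// Hypothetical history: v1 stored `name`, v2 split it into firstName and
// lastName, v3 renamed `mail` to `email`. Documents without a
// schemaVersion are treated as v1.
const LATEST = 3;
const upgrades = {
  1: (d) => {
    const { name, ...others } = d;
    const [firstName = "", ...rest] = String(name ?? "").split(" ");
    return { ...others, firstName, lastName: rest.join(" "), schemaVersion: 2 };
  },
  2: (d) => {
    const { mail, ...others } = d;
    return { ...others, email: mail ?? null, schemaVersion: 3 };
  },
};

// Applies each upgrade step in order until the document is current.
function upgradeToLatest(doc) {
  let current = { ...doc, schemaVersion: doc.schemaVersion ?? 1 };
  while (current.schemaVersion < LATEST) {
    current = upgrades[current.schemaVersion](current);
  }
  return current;
}

const legacy = { _id: 1, name: "Ada Lovelace", mail: "ada@example.com" };
console.log(JSON.stringify(upgradeToLatest(legacy)));
```

<p>Readers call <code>upgradeToLatest</code> at the boundary and the rest of the code only ever sees the current shape; the same functions can later drive a backfill.</p>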
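<p><strong>Abstraction layer becomes lock-in</strong>: if the app-layer contract is written too heavily, the abstraction itself has to be rewritten during a cross-vendor migration. Keep the layer thin: schema isolation only, no business logic.</p>
<p><strong>Polymorphic collection scan</strong>: the discriminator is not in any index, so a <code>type: &quot;rare&quot;</code> query scans the whole collection. The fix is a partial index covering the hot types; cold types may still scan the collection, but only on a cold path.</p>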
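<p>Whether a query can use the partial index comes down to a subset check: every type the query asks for has to fall inside the index's filter. The function below is a simplified model of that rule for a <code>type</code>-only filter, written for illustration; <code>canUsePartialIndex</code> is a hypothetical name and the real planner handles more general expressions.</p>

```javascript
// Simplified model: a query on `type` can use a partial index only if every
// type it asks for is inside the index's partialFilterExpression.
function canUsePartialIndex(queriedTypes, indexedTypes) {
  const covered = new Set(indexedTypes);
  return queriedTypes.length > 0 && queriedTypes.every((t) => covered.has(t));
}

// Matches the Step 6 index, which covers "click" and "purchase".
const hotTypes = ["click", "purchase"];
console.log(canUsePartialIndex(["click"], hotTypes)); // true
console.log(canUsePartialIndex(["rare"], hotTypes)); // false
console.log(canUsePartialIndex(["click", "rare"], hotTypes)); // false
```

<p>The last case is the one that surprises people: mixing one cold type into an otherwise hot query is enough to lose the index.</p>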
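<p><strong>Time-series collection in the wrong scenario</strong>: putting data that is not timestamp-led into a time-series collection loses flexibility without gaining the throughput benefit. A time-series collection is a specialized optimization, not a general-purpose collection upgrade.</p>
<p>Anti-recommendations:</p>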
<ul>
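<li>An early MVP whose access patterns are not yet stable does not need a locked-down schema validator; start with an app-layer abstraction and decide whether to lock down the DB layer once production stabilizes</li>
<li>JOIN-heavy or strongly normalized workloads should go to PostgreSQL JSONB or SQL from the start, not into MongoDB followed by <code>$lookup</code></li>
<li>Cross-case synthesis: "not all data belongs in MongoDB". Document-shaped data whose shape changes frequently goes in; KV data with fixed access patterns goes to a KV store (9.C36 Coinbase shows MongoDB + DynamoDB split by workload)</li>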
</ul>
<h2 id="容量與觀測">Capacity and observability</h2>
<p>Key metrics:</p>
<ul>
<li><strong>Document shape</strong>: <code>collStats.avgObjSize</code>, and <code>collStats.size</code> vs <code>storageSize</code> (compression ratio)</li>
<li><strong>Contract health</strong>: document validation failure rate, abstraction-layer schema mismatch rate</li>
<li><strong>Working set pressure</strong>: <code>wiredTiger.cache.bytes currently in the cache</code> compared against the estimated working set</li>
<li><strong>Aggregation side effects</strong>: profiler slow ops, and where <code>$lookup</code> / <code>$unwind</code> appear on the hot path</li>
</ul>
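<p>The first and third metrics above reduce to simple ratios once the raw numbers are collected. A minimal sketch, assuming a hypothetical helper <code>shapeReport</code> and made-up sample values; the inputs correspond to <code>collStats</code> fields and to the WiredTiger cache counters in <code>serverStatus</code>.</p>

```javascript
// Turns raw stats into the ratios worth alerting on.
// collStats carries avgObjSize, size and storageSize (all in bytes).
function shapeReport(collStats, cacheBytesInUse, cacheBytesMax) {
  return {
    avgObjKiB: collStats.avgObjSize / 1024,
    // Uncompressed data size divided by on-disk size.
    compressionRatio: collStats.size / collStats.storageSize,
    // Share of the configured WiredTiger cache currently occupied.
    cacheFill: cacheBytesInUse / cacheBytesMax,
  };
}

// Sample numbers are invented for illustration.
const report = shapeReport(
  { avgObjSize: 2048, size: 1000, storageSize: 250 },
  3 * 1024 ** 3,
  4 * 1024 ** 3
);
console.log(report);
```

<p>Tracking these as ratios rather than absolute bytes makes the same alert thresholds reusable across collections of different sizes.</p>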
<p>Mongo commands:</p>
<ul>
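<li><code>db.coll.stats()</code> shows average document size (<code>avgObjSize</code>) plus storage and index size; it does not report a per-document maximum, which needs the <code>$bsonSize</code> aggregation from Step 7</li>
<li><code>db.runCommand({collMod: ..., validator: ...})</code> changes the validator</li>
<li><code>db.setProfilingLevel(1, {slowms: 100})</code> captures slow ops</li>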
</ul>
<p>Back to <a href="/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">4.20 observability evidence</a>: list the document size distribution, validator failure rate, abstraction-layer schema mismatch rate, and where <code>$lookup</code> appears as the evidence set.</p>
<p>Back to <a href="/blog/backend/09-performance-capacity/bottleneck-localization/" data-link-title="9.5 瓶頸定位流程" data-link-desc="從 app 到 DB / cache / broker / 第三方 quota 的逐層瓶頸定位">9.5 bottleneck localization</a>: page-fault signals when the working set overflows RAM correlate strongly with abnormal document size growth.</p>
<h2 id="邊界與整合">Boundaries and integration</h2>
<p>Sibling deep-dive articles:</p>
<ul>
<li><a href="../shard-key-selection/">shard key selection</a>: document shape determines the candidate space for shard keys</li>
<li><a href="../aggregation-pipeline-optimization/">aggregation pipeline optimization</a>: <code>$lookup</code> and schema references affect each other</li>
<li><a href="../connection-management-and-cache-layer/">connection management and cache layer</a>: the abstraction layer works together with the cache layer</li>
</ul>
<p>Migration playbook:</p>
<ul>
<li>When document shape has drifted beyond what governance can fix: the <a href="/blog/backend/01-database/large-scale-db-migration/" data-link-title="1.12 大規模 DB 遷移實戰" data-link-desc="跨 DB 遷移的 dual-write、[shadow read](/backend/knowledge-cards/shadow-read/)、cutover、rollback 流程 — 從實戰案例提煉的工程做法">MongoDB → PostgreSQL normalization split</a> path</li>
<li>Three ways to keep the document model while changing what sits around it: keep the primary DB and add peripheral stores (Coinbase) / same DB, different hosting (Forbes Atlas) / same model, different vendor (<a href="/blog/backend/01-database/vendors/cosmosdb/" data-link-title="Azure Cosmos DB" data-link-desc="全球分散式 multi-model DB、5 個 consistency levels、Microsoft 自家 dogfood 證據">Microsoft 365 Cosmos DB MongoDB API</a>)</li>
</ul>
<p>Cross-references to 1.x: <a href="/blog/backend/01-database/schema-design/" data-link-title="1.2 Schema Design 與資料建模" data-link-desc="整理 table、index、key、partition、denormalization 與命名規則">1.2 schema design</a> covers general schema evolution principles, and this article is the MongoDB-specific application; <a href="/blog/backend/01-database/transaction-boundary/" data-link-title="1.3 Transaction 與一致性邊界" data-link-desc="交易邊界、isolation level、retry 策略、distributed transaction（2PC、Saga）與跨 region 強一致取捨">1.3 transaction boundary</a> aligns with the aggregate = atomic boundary.</p>
<h2 id="相關連結">Related links</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview</a>: this article is the in-depth expansion of the "schema design pattern" backlog item at the end of that page</li>
<li><a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">Methodology for vendor deep-dive technical articles</a></li>
<li><a href="/blog/backend/09-performance-capacity/cases/toyota-connected-mongodb-telematics-iot/" data-link-title="9.C38 Toyota Connected：MongoDB Atlas 撐 900 萬車輛 telematics、月 180 億 transaction" data-link-desc="Toyota Connected 用 MongoDB Atlas 撐 Safety Connect 900 萬車、月 180 億 transaction、緊急訊號 3 秒內到 agent">9.C38 Toyota Connected</a>: polymorphic + governance</li>
<li><a href="/blog/backend/09-performance-capacity/cases/forbes-mongodb-atlas-multi-cloud-migration/" data-link-title="9.C37 Forbes：自管 MongoDB → Atlas on GCP、build 時間 25 → 9 分鐘" data-link-desc="Forbes 把自管 MongoDB 遷到 Atlas on Google Cloud、6 個月完成、build 25 → 9 分鐘、120M 不重複訪客單月承接">9.C37 Forbes</a>: abstraction layer pattern</li>
<li>Official: <a href="https://www.mongodb.com/docs/manual/core/data-modeling-introduction/">MongoDB Data Modeling</a>, <a href="https://www.mongodb.com/docs/manual/core/schema-validation/">Schema Validation</a>, <a href="https://www.mongodb.com/docs/manual/core/timeseries-collections/">Time Series Collections</a></li>
</ul>
]]></content:encoded></item></channel></rss>