<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Aggregation on Tarragon</title><link>https://tarrragon.github.io/blog/tags/aggregation/</link><description>Recent content in Aggregation on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Wed, 27 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/aggregation/index.xml" rel="self" type="application/rss+xml"/><item><title>MongoDB Aggregation Pipeline Optimization: stage order, index alignment, and memory limits</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/aggregation-pipeline-optimization/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/aggregation-pipeline-optimization/</guid><description>&lt;p>The MongoDB aggregation pipeline is the main interface for analytical queries on the document model. Its stage-stream design is intuitive, but it is easy to get burned in production: a pipeline that took 200ms at launch takes 8s six months later once the data volume has multiplied, and adding an index does not help; the profiler shows hundreds of MB of temp data accumulating in memory between stages. Optimizing an aggregation pipeline follows a different logic from an RDBMS SQL planner: an RDBMS relies on the planner to reorder joins and filters automatically, while MongoDB relies largely on the query author to order the stages by hand. This article explains stage mechanics, index alignment, memory limits, and cross-shard constraints, and gives a remediation path for the common anti-pattern of a report dashboard overloading the primary.&lt;/p>
&lt;p>This article does not repeat the aggregation introduction already covered in the &lt;a href="https://tarrragon.github.io/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview&lt;/a>; it is an implementation-level guide to production tuning and failure repair.&lt;/p>
&lt;blockquote>
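&lt;p>&lt;strong>Prerequisite reading&lt;/strong>: for assessing whether a workload fits MongoDB (document shape dominance / where the contract layer belongs / whether cross-cloud hedging is needed), see &lt;a href="../schema-design-pattern/#%e5%95%8f%e9%a1%8c%e6%83%85%e5%a2%83document-%e8%87%aa%e7%94%b1%e7%9a%84%e5%be%8c%e5%ba%a7%e5%8a%9b">the three-axis pre-assessment at the start of schema-design-pattern&lt;/a>. This article focuses on the operational layer of the aggregation pipeline, a query-layer engineering topic that applies &lt;em>after MongoDB has been chosen&lt;/em>, and does not repeat that assessment.&lt;/p>&lt;/blockquote>
&lt;h2 id="問題情境aggregation-是-hot-path-的反模式">Problem scenario: aggregation on the hot path as an anti-pattern&lt;/h2>
&lt;p>Typical trigger: a reporting pipeline takes 200ms at launch and 8s six months later once the data volume has multiplied, and adding an index does not help; the profiler shows hundreds of MB of temp data accumulating in memory between stages.&lt;/p>
&lt;p>Further symptoms:&lt;/p>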
&lt;ul>
&lt;li>A mixed workload that runs analytical queries on an OLTP collection: &lt;code>$group + $lookup + $sort&lt;/code> chained into a long pipeline, with the aggregation evicting the whole working set from cache&lt;/li>
&lt;li>Cross-shard aggregation on a sharded cluster: &lt;code>$group&lt;/code> / &lt;code>$sort&lt;/code> results must be merged on mongos (or on a single merging shard), which becomes a single-point bottleneck&lt;/li>
&lt;li>&lt;code>$lookup&lt;/code> on the hot path: every input doc triggers a query against another collection, which is strictly speaking an N+1 pattern&lt;/li>
&lt;li>&lt;code>db.serverStatus().metrics.aggStageCounters&lt;/code> spikes, and &lt;code>executionStats.executionTimeMillis&lt;/code> grows linearly with doc count&lt;/li>
&lt;li>The profiler reports &lt;code>usedDisk: true&lt;/code>, or the aggregation fails with the error &lt;code>QueryExceededMemoryLimitNoDiskUseAllowed&lt;/code>&lt;/li>
&lt;/ul>
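&lt;p>Case anchor: the concrete incident details of a report dashboard overloading the primary are left for a future case write-up; this article treats it as a common anti-pattern and does not invent incident numbers. As a side reference, see &lt;a href="https://tarrragon.github.io/blog/backend/09-performance-capacity/cases/microsoft-365-cosmos-db-analytics/" data-link-title="9.C30 Microsoft 365：從 MongoDB 遷移到 Cosmos DB 的分析平台" data-link-desc="Microsoft 365 把使用分析平台從 MongoDB 遷移到 Cosmos DB、planet-scale 全球分散式分析">9.C30 Microsoft 365&lt;/a>, which covers the drivers for separating analytics out of MongoDB.&lt;/p></description><content:encoded><![CDATA[<p>The MongoDB aggregation pipeline is the main interface for analytical queries on the document model. Its stage-stream design is intuitive, but it is easy to get burned in production: a pipeline that took 200ms at launch takes 8s six months later once the data volume has multiplied, and adding an index does not help; the profiler shows hundreds of MB of temp data accumulating in memory between stages. Optimizing an aggregation pipeline follows a different logic from an RDBMS SQL planner: an RDBMS relies on the planner to reorder joins and filters automatically, while MongoDB relies largely on the query author to order the stages by hand. This article explains stage mechanics, index alignment, memory limits, and cross-shard constraints, and gives a remediation path for the common anti-pattern of a report dashboard overloading the primary.</p>
<p>This article does not repeat the aggregation introduction already covered in the <a href="/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview</a>; it is an implementation-level guide to production tuning and failure repair.</p>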
<blockquote>
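<p><strong>Prerequisite reading</strong>: for assessing whether a workload fits MongoDB (document shape dominance / where the contract layer belongs / whether cross-cloud hedging is needed), see <a href="../schema-design-pattern/#%e5%95%8f%e9%a1%8c%e6%83%85%e5%a2%83document-%e8%87%aa%e7%94%b1%e7%9a%84%e5%be%8c%e5%ba%a7%e5%8a%9b">the three-axis pre-assessment at the start of schema-design-pattern</a>. This article focuses on the operational layer of the aggregation pipeline, a query-layer engineering topic that applies <em>after MongoDB has been chosen</em>, and does not repeat that assessment.</p></blockquote>
<h2 id="問題情境aggregation-是-hot-path-的反模式">Problem scenario: aggregation on the hot path as an anti-pattern</h2>
<p>Typical trigger: a reporting pipeline takes 200ms at launch and 8s six months later once the data volume has multiplied, and adding an index does not help; the profiler shows hundreds of MB of temp data accumulating in memory between stages.</p>
<p>Further symptoms:</p>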
<ul>
<li>A mixed workload that runs analytical queries on an OLTP collection: <code>$group + $lookup + $sort</code> chained into a long pipeline, with the aggregation evicting the whole working set from cache</li>
<li>Cross-shard aggregation on a sharded cluster: <code>$group</code> / <code>$sort</code> results must be merged on mongos (or on a single merging shard), which becomes a single-point bottleneck</li>
<li><code>$lookup</code> on the hot path: every input doc triggers a query against another collection, which is strictly speaking an N+1 pattern</li>
<li><code>db.serverStatus().metrics.aggStageCounters</code> spikes, and <code>executionStats.executionTimeMillis</code> grows linearly with doc count</li>
<li>The profiler reports <code>usedDisk: true</code>, or the aggregation fails with the error <code>QueryExceededMemoryLimitNoDiskUseAllowed</code></li>
</ul>
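<p>Case anchor: the concrete incident details of a report dashboard overloading the primary are left for a future case write-up; this article treats it as a common anti-pattern and does not invent incident numbers. As a side reference, see <a href="/blog/backend/09-performance-capacity/cases/microsoft-365-cosmos-db-analytics/" data-link-title="9.C30 Microsoft 365：從 MongoDB 遷移到 Cosmos DB 的分析平台" data-link-desc="Microsoft 365 把使用分析平台從 MongoDB 遷移到 Cosmos DB、planet-scale 全球分散式分析">9.C30 Microsoft 365</a>, which covers the drivers for separating analytics out of MongoDB.</p>
<h2 id="核心機制">Core mechanics</h2>
<p>An aggregation pipeline is a sequence of stages: each stage consumes a stream of documents and emits a stream of documents. Stage order directly determines how much work later stages do: whether the first stage is an IXSCAN or a COLLSCAN, whether <code>$match</code> runs early or late, and whether <code>$project</code> drops fields early or late all amplify or shrink downstream cost.</p>
<p><strong>Optimizer rewrite</strong>: MongoDB automatically moves <code>$match</code> / <code>$project</code> forward where it can and coalesces <code>$sort + $limit</code> into a top-K sort, but not in every case. Use <code>explain(&quot;executionStats&quot;)</code> to inspect the effective pipeline after rewriting rather than inferring the execution order from the pipeline as written.</p>
<p><strong>Index alignment</strong>: if the <em>first stage</em> of the pipeline is <code>$match</code> or <code>$sort</code> and it can use an index, it runs as an IXSCAN. Intermediate stages operate on an in-memory stream and have no notion of an index. So put <code>$match</code> first whenever the filter does not depend on fields produced by a later stage, and back it with a matching index.</p>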
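<p><strong>Memory limit</strong>: each memory-intensive aggregation stage (such as <code>$group</code> or <code>$sort</code>) has a default 100MB memory limit; beyond that it needs <code>allowDiskUse: true</code> (the default since MongoDB 6.0 through <code>allowDiskUseByDefault</code>; earlier versions must pass it explicitly). Once disk spill kicks in, I/O slows the aggregation severely, potentially by 50-100x.</p>
<p><strong><code>$lookup</code> on a sharded cluster</strong>: before MongoDB 5.1 the foreign (<code>from</code>) collection could not be sharded; 5.1+ relaxes this with limitations. <code>$lookup</code> is essentially a nested loop join (an indexed one when the foreign field has an index). MongoDB 6.0+ can choose a hash join in some cases, but there is no merge join, so joining against a large collection without a usable index is impractical.</p>
<p><strong><code>$facet</code> runs multiple sub-pipelines over the same input</strong>: all facets share the same 100MB limit, and the combined result is a single document subject to the 16MB BSON limit, so complex facets hit the memory ceiling easily.</p>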
<p><strong><code>$merge</code> / <code>$out</code></strong>: write the results back to a collection (a pre-computed or materialized view), which moves the hot analytical query off the read path. This is the main tool for remediating the anti-pattern.</p>
<p>Related knowledge cards: <a href="/blog/backend/knowledge-cards/hot-partition/" data-link-title="Hot Partition" data-link-desc="說明分散式 KV / OLTP 中、單一 partition 流量遠超其他的容量問題">hot-partition</a> (a side effect of aggregation reads concentrating on a single shard), <a href="/blog/backend/knowledge-cards/document-store/" data-link-title="Document Store" data-link-desc="說明以 JSON 文件與彈性 schema 提供資料存取的模式，以及它仍需的治理邊界">document-store</a>, and <a href="/blog/backend/knowledge-cards/stale-read/" data-link-title="Stale Read" data-link-desc="讀取到落後於最新寫入版本的舊資料">stale-read</a> (the trade-off of running aggregation on a secondary).</p>
<h2 id="操作流程">Procedure</h2>
<p><strong>Step 0: put the bad pipeline and the good pipeline side by side</strong>. Here is a simplified but typical optimization:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">// Bad: $lookup before $match, $sort without $limit, $project only at the end
db.orders.aggregate([
  { $lookup: { from: &#34;users&#34;, localField: &#34;userId&#34;, foreignField: &#34;_id&#34;, as: &#34;user&#34; } },
  { $match: { status: &#34;completed&#34;, &#34;user.region&#34;: &#34;ap-tokyo&#34; } },
  { $sort: { createdAt: -1 } },
  { $project: { _id: 1, total: 1, createdAt: 1 } }
])

// Better: the $match that can move forward goes first, $sort is paired with $limit,
// and $project keeps only the fields the caller needs
db.orders.aggregate([
  { $match: { status: &#34;completed&#34; } },
  { $sort: { createdAt: -1 } },
  { $limit: 100 },
  { $lookup: { from: &#34;users&#34;, localField: &#34;userId&#34;, foreignField: &#34;_id&#34;, as: &#34;user&#34; } },
  { $match: { &#34;user.region&#34;: &#34;ap-tokyo&#34; } },
  { $project: { _id: 1, total: 1, createdAt: 1, &#34;user.name&#34;: 1 } }
])</code></pre></div><p>Difference: the bad pipeline runs the lookup against the entire orders collection and only then filters; the good pipeline filters first, takes the top 100, runs the lookup for only those 100 docs, and then filters on the lookup result. The two are not result-equivalent: the good pipeline takes the 100 newest completed orders and then drops those outside ap-tokyo, so it fits only when that top-100 semantics is what the page needs. On a large collection the two can differ by 50-100x.</p>
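<p>To make the gap concrete, here is a minimal sketch in plain JavaScript (runnable in Node.js with no MongoDB connection) that counts how many <code>$lookup</code> probes each stage order would trigger. The function names, the 2M collection size, and the 30% completed ratio are assumptions for illustration, not measurements.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">// Hypothetical model: count $lookup probes for each stage order.
function lookupProbesLookupFirst(totalOrders) {
  // $lookup is the first stage, so every order triggers one probe.
  return totalOrders;
}

function lookupProbesMatchFirst(totalOrders, completedRatio, limit) {
  // $match runs first, then $sort + $limit caps the stream before $lookup.
  const completed = Math.round(totalOrders * completedRatio);
  return Math.min(completed, limit);
}

const totalOrders = 2000000; // assumed collection size
const badProbes = lookupProbesLookupFirst(totalOrders);
const goodProbes = lookupProbesMatchFirst(totalOrders, 0.3, 100);
console.log({ badProbes: badProbes, goodProbes: goodProbes });</code></pre></div>
<p>The probe count is only a proxy for cost. Actual latency also depends on whether the foreign field is indexed and on document size.</p>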
<p><strong>Step 1: get the explain plan</strong>.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">db.coll.explain(&#34;executionStats&#34;).aggregate([...])</code></pre></div><p>Check <code>stages[]</code> for the effective pipeline after rewriting, <code>executionTimeMillis</code>, the <code>totalDocsExamined / nReturned</code> ratio, and whether <code>usedDisk</code> is set.</p>
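<p><strong>Step 2: move <code>$match</code> to the front</strong>. The earlier the filter, the less work later stages do. The optimizer usually moves it on its own, but a <code>$match</code> placed after <code>$lookup</code> that references looked-up fields is not moved ahead of <code>$lookup</code>, because it logically depends on those fields. When writing the query, put every <code>$match</code> that can go first at the front.</p>
<p><strong>Step 3: build a compound index on the <code>$match</code> fields</strong>. Make sure <code>executionStages</code> shows <code>IXSCAN</code> rather than <code>COLLSCAN</code>. Compound index order matters: <code>{ status: 1, createdAt: -1 }</code> is efficient for <code>{ status: ..., createdAt: { $gte: ... } }</code>, but a filter of only <code>{ createdAt: { $gte: ... } }</code> cannot use the index effectively because it skips the leading field.</p>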
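<p>The prefix rule can be sketched as a small helper. This is a deliberately simplified rule of thumb, not the planner's actual logic (it ignores equality versus range ordering, for example), and <code>usablePrefixLength</code> is a hypothetical name.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">// Simplified rule of thumb: count how many leading index fields the filter
// constrains. Zero means the filter skips the index prefix entirely.
function usablePrefixLength(indexFields, filterFields) {
  let length = 0;
  for (const field of indexFields) {
    if (!filterFields.includes(field)) break;
    length += 1;
  }
  return length;
}

const indexFields = ["status", "createdAt"]; // { status: 1, createdAt: -1 }
console.log(usablePrefixLength(indexFields, ["status", "createdAt"])); // 2
console.log(usablePrefixLength(indexFields, ["createdAt"]));           // 0</code></pre></div>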
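<p><strong>Step 4: write <code>$sort + $limit</code> together</strong>. Only then can the optimizer use a top-K sort, which needs a bounded heap instead of a full sort. A bare <code>$sort</code> without a limit does a full sort and easily hits the memory limit.</p>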
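<p>Why top-K is cheaper can be seen in a minimal sketch: the buffer never holds more than K + 1 documents, however large the input is. This illustrates the idea only and is not MongoDB's implementation (a real one would use a heap instead of re-sorting); <code>topK</code> and <code>newestFirst</code> are hypothetical names.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">// Top-K selection with a bounded buffer: memory stays proportional to k.
function topK(docs, k, compare) {
  const buffer = [];
  for (const doc of docs) {
    buffer.push(doc);
    buffer.sort(compare);                      // at most k + 1 items here
    if (buffer.length === k + 1) buffer.pop(); // drop the worst candidate
  }
  return buffer;
}

function newestFirst(a, b) {
  return b.createdAt - a.createdAt;
}

const sampleDocs = [{ createdAt: 3 }, { createdAt: 9 }, { createdAt: 1 }, { createdAt: 7 }];
console.log(topK(sampleDocs, 2, newestFirst)); // keeps createdAt 9 and 7</code></pre></div>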
<p><strong>Step 5: write <code>$project</code> early</strong>. Drop unneeded fields early to reduce the doc size later stages have to process. This is especially effective for large documents.</p>
<p><strong>Step 6: turn the hot analytical pipeline into a materialized view</strong>.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">db.orders.aggregate([
  { $match: { createdAt: { $gte: ISODate(&#34;2026-05-01&#34;) } } },
  { $group: { _id: &#34;$customerId&#34;, total: { $sum: &#34;$amount&#34; } } },
  { $merge: {
      into: &#34;monthly_customer_summary&#34;,
      on: &#34;_id&#34;,
      whenMatched: &#34;merge&#34;,
      whenNotMatched: &#34;insert&#34;
  }}
])</code></pre></div><p>Refresh it on a schedule (cron, for example every 5 minutes); the application reads the materialized view instead of running the aggregation live.</p>
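<p>A scheduled refresh job might build this pipeline from a parameter instead of hard-coding the date. The sketch below only constructs the pipeline array so it runs in plain Node.js; <code>buildSummaryPipeline</code> is a hypothetical helper, and the commented calls show where it would be used from mongosh or a driver.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">// Hypothetical helper: build the refresh pipeline for a given start date.
function buildSummaryPipeline(since) {
  return [
    { $match: { createdAt: { $gte: since } } },
    { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
    { $merge: {
        into: "monthly_customer_summary",
        on: "_id",
        whenMatched: "merge",
        whenNotMatched: "insert"
    } }
  ];
}

const refreshPipeline = buildSummaryPipeline(new Date("2026-05-01T00:00:00Z"));
console.log(JSON.stringify(refreshPipeline, null, 2));

// Refresh job (assumed to run on a schedule):
//   db.orders.aggregate(refreshPipeline)
// Application read path, which no longer runs the aggregation:
//   db.monthly_customer_summary.find({ _id: customerId })</code></pre></div>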
<p><strong>Step 7: handle the sharded cluster</strong>. Avoid cross-shard <code>$lookup</code> / <code>$group</code> on the hot path, or route such queries to an analytical replica (tag set + read preference); see <a href="../replica-set-read-preference/">replica set read preference</a>.</p>
<p>Verification points:</p>
<ul>
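<li><code>executionTimeMillis</code> is within the expected budget</li>
<li>The <code>totalDocsExamined / nReturned</code> ratio is close to 1 (high filter efficiency)</li>
<li>No <code>usedDisk: true</code></li>
<li>No stage reports memory use above roughly 50MB (for example <code>maxAccumulatorMemoryUsageBytes</code> on <code>$group</code> or <code>totalDataSizeSortedBytesEstimate</code> on <code>$sort</code>, where the server version reports them)</li>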
</ul>
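<p>The verification points above can be turned into an automated check. The sketch below assumes a flattened summary object that uses the <code>executionStats</code> field names (<code>nReturned</code>, <code>totalDocsExamined</code>, <code>executionTimeMillis</code>) plus a <code>usedDisk</code> flag; real explain output nests these differently depending on server version, and <code>checkPlan</code> and its thresholds are hypothetical.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">// Sketch: apply the checklist to an explain-like summary object.
function checkPlan(stats, budgetMs, maxExaminedPerReturned) {
  const problems = [];
  if (stats.executionTimeMillis > budgetMs) {
    problems.push("over latency budget");
  }
  // Guard against division by zero when nothing is returned.
  const ratio = stats.totalDocsExamined / Math.max(stats.nReturned, 1);
  if (ratio > maxExaminedPerReturned) {
    problems.push("low filter efficiency");
  }
  if (stats.usedDisk === true) {
    problems.push("spilled to disk");
  }
  return problems;
}

const slowPlan = { executionTimeMillis: 8000, totalDocsExamined: 100000, nReturned: 100, usedDisk: true };
console.log(checkPlan(slowPlan, 500, 10));</code></pre></div>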
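<p>Rollback boundary: a pipeline rewrite is an application code change and can be rolled out gradually; a materialized view (<code>$merge</code>) requires a backup of the target collection before it can be restored.</p>
<h3 id="典型-tuning-過程200ms--8s--250ms">Typical tuning progression (200ms → 8s → 250ms)</h3>
<p>A common evolution path for a production pipeline:</p>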
<ol>
<li><strong>200ms at launch</strong>: the collection has 100K docs, <code>$match</code> filters out 95%, <code>$lookup</code> runs only 5K times, and the in-memory <code>$sort</code> handles 5K rows within 100MB</li>
<li><strong>8s six months later</strong>: the collection has grown to 2M docs, <code>$match</code> still filters out 95% but now leaves 100K rows, <code>$lookup</code> runs 100K times (5K → 100K is 20x), and the in-memory <code>$sort</code> hits 100MB and starts spilling to disk, with I/O degrading by up to 100x</li>
<li><strong>Adding a compound index does not help</strong>: the index serves <code>$match</code>, but the stages after <code>$match</code> (<code>$lookup</code> / <code>$sort</code>) run in the in-memory pipeline, where the index cannot help</li>
<li><strong>Fixes that bring it to 250ms</strong>: (a) pair <code>$sort + $limit</code> so the optimizer uses top-K and avoids a full sort (b) change the schema to embed the data and remove the <code>$lookup</code> (see <a href="../schema-design-pattern/">schema design pattern</a>) (c) turn the hot pipeline into a <code>$merge</code> materialized view so the application reads the view instead of running the aggregation</li>
</ol>
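<p>The arithmetic in the list above can be reproduced with a small model. The collection sizes and the 95% filter ratio come from the list; the 2KB average document size is an assumption added only to show how the sort crosses the 100MB limit.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">const SORT_LIMIT_BYTES = 100 * 1024 * 1024; // per-stage 100MB limit

// Rows that survive a $match filtering out the given ratio.
function rowsAfterMatch(collectionSize, filteredOutRatio) {
  return Math.round(collectionSize * (1 - filteredOutRatio));
}

// True when the rows to sort fit under the in-memory limit.
function sortFitsInMemory(rows, avgDocBytes) {
  return SORT_LIMIT_BYTES >= rows * avgDocBytes;
}

const atLaunch = rowsAfterMatch(100000, 0.95);
const sixMonthsLater = rowsAfterMatch(2000000, 0.95);
const assumedAvgDocBytes = 2048; // hypothetical average document size

console.log(atLaunch, sixMonthsLater, sixMonthsLater / atLaunch);
console.log(sortFitsInMemory(atLaunch, assumedAvgDocBytes));
console.log(sortFitsInMemory(sixMonthsLater, assumedAvgDocBytes));</code></pre></div>
<p>With these inputs the model gives 5K and 100K rows after <code>$match</code> (a 20x increase), and under the assumed document size only the launch-time sort fits in memory.</p>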
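<p>Key lesson: the aggregation is slow not because of the query text itself but because of how the <em>data shape has evolved</em>. An index is the first lever on the hot path, but it only helps a <code>$match</code> / <code>$sort</code> in the first stage; later stages have to be rescued by stage order, materialized views, or schema denormalization.</p>
<h2 id="失敗模式">Failure modes</h2>
<p><strong><code>$lookup</code> on the hot path</strong>: a list page queries another collection for every row, and p99 grows linearly with page size. Denormalize at the schema design stage and embed read-together data back into the aggregate root (see <a href="../schema-design-pattern/">schema design pattern</a>).</p>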
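<p><strong><code>$sort</code> without a limit and without an index</strong>: a full in-memory sort of the collection hits the 100MB limit and either fails or spills to disk. <code>allowDiskUse: true</code> avoids the failure, but I/O can degrade by up to 100x. The fix is to build a matching index so the sort runs as an IXSCAN, or to add a limit so it runs as top-K.</p>
<p><strong>Cross-shard aggregation on a sharded cluster</strong>: in the <code>$group</code> stage every partial result is sent to the merging node (mongos or a merging shard), whose memory and CPU blow up. The fix is to make the group key include the shard key prefix (so the grouping completes within each shard), or to route the query to an analytical replica.</p>
<p><strong>Wrong stage order</strong>: putting <code>$lookup</code> before <code>$match</code> means looking up the whole collection and filtering afterwards, so every input doc triggers a lookup. Any <code>$match</code> that does not depend on looked-up fields belongs first.</p>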
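<p><strong>Aggregation evicts the working set</strong>: hot OLTP pages are pushed out of cache by the aggregation's cold scan, and overall query latency degrades with it. The fix is to isolate the analytical workload from OLTP reads (read preference tags), or to move the analytical workload away (see the anti-recommendations below).</p>
<p><strong><code>$facet</code> overload</strong>: four facets each running a large pipeline blow through the shared 100MB limit immediately. The fix is to split them into independent queries instead of forcing them into one <code>$facet</code>.</p>
<p>Anti-recommendations:</p>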
<ul>
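<li><strong>Running reporting / BI / analytics workloads on the MongoDB primary is an anti-pattern</strong>: instead (a) set up an analytical secondary with read preference tags (b) use <code>$merge</code> to write into a reporting collection (c) for larger needs, use BI Connector or a data lake, or move the analytical workload wholesale to <a href="https://clickhouse.com">ClickHouse</a> / BigQuery</li>
<li><strong>The typical "report dashboard overloads the primary" anti-pattern</strong>: a BI tool connects directly to the MongoDB primary and runs long pipelines, cache eviction pushes out the OLTP working set, and p99 latency rises across the board during reporting hours. No concrete incident numbers were available, so none are invented here; the case is written up as a common anti-pattern and directed to the remediation path</li>
<li><strong>Aggregation does not solve read scaling</strong>: aggregation complements OLTP; it is not the main route for read scaling. At large OLTP scale, read scaling goes through cache + freshness tokens (see <a href="../connection-management-and-cache-layer/">connection management and cache layer</a>), not through overloading a secondary with aggregations</li>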
</ul>
<h2 id="容量與觀測">Capacity and observability</h2>
<p>Key metrics:</p>
<ul>
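<li>Distribution of aggregation operation time</li>
<li>Number of disk spills</li>
<li>Share of aggregate commands within <code>opcounters.command</code> (for example measured against <code>metrics.commands.aggregate.total</code>)</li>
<li>Change in cache eviction rate during aggregation peaks</li>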
</ul>
<p>Mongo commands:</p>
<ul>
<li><code>db.currentOp({ &quot;command.aggregate&quot;: { $exists: true } })</code>: aggregations currently running</li>
<li><code>db.serverStatus().metrics.aggStageCounters</code>: per-stage counters</li>
<li><code>explain(&quot;executionStats&quot;)</code>: detailed analysis of a single query</li>
</ul>
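<p>Because <code>serverStatus</code> counters are cumulative since server start, a share is only meaningful over an interval. The sketch below computes the aggregate share from two snapshots. It assumes plain numeric values and the field paths <code>opcounters.command</code> and <code>metrics.commands.aggregate.total</code> as this author understands the <code>serverStatus</code> output; the sample numbers are made up.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">// Sketch: share of aggregate commands between two serverStatus snapshots.
function aggregateShare(before, after) {
  const aggregates =
    after.metrics.commands.aggregate.total - before.metrics.commands.aggregate.total;
  const allCommands = after.opcounters.command - before.opcounters.command;
  return allCommands === 0 ? 0 : aggregates / allCommands;
}

// Made-up snapshots shaped like db.serverStatus() output.
const snapshotBefore = { opcounters: { command: 1000 }, metrics: { commands: { aggregate: { total: 100 } } } };
const snapshotAfter = { opcounters: { command: 1200 }, metrics: { commands: { aggregate: { total: 150 } } } };
console.log(aggregateShare(snapshotBefore, snapshotAfter)); // 0.25</code></pre></div>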
<p>Profiler: <code>db.setProfilingLevel(1, {slowms: 200})</code>, then look at the <code>usedDisk</code> flag and <code>numYield</code>.</p>
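<p>Once profiling is on, the slow aggregations that spilled to disk can be pulled out with a filter on the profiler collection. The sketch below only builds the filter object so it runs in plain Node.js; <code>slowAggregationFilter</code> is a hypothetical helper, and the profiler field names (<code>op</code>, <code>millis</code>, <code>usedDisk</code>) are as this author understands the <code>system.profile</code> document format.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-javascript" data-lang="javascript">// Hypothetical helper: filter for profiled aggregations that spilled to disk.
function slowAggregationFilter(slowMs) {
  return {
    op: "command",
    "command.aggregate": { $exists: true },
    millis: { $gte: slowMs },
    usedDisk: true
  };
}

console.log(JSON.stringify(slowAggregationFilter(200)));

// In mongosh the filter would be used like this:
//   db.system.profile.find(slowAggregationFilter(200)).sort({ ts: -1 }).limit(20)</code></pre></div>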
<p>Back to <a href="/blog/backend/04-observability/observability-evidence-package/" data-link-title="4.20 Observability Evidence Package" data-link-desc="把 log、metric、trace、audit 與資料品質限制包成可交接證據">4.20 observability evidence</a>: aggregation slow log + cache hit ratio + disk spill rate form the three-piece evidence set for analytical pressure.</p>
<p>Back to <a href="/blog/backend/09-performance-capacity/bottleneck-localization/" data-link-title="9.5 瓶頸定位流程" data-link-desc="從 app 到 DB / cache / broker / 第三方 quota 的逐層瓶頸定位">9.5 bottleneck localization</a>: use explain executionStats to map pipeline stages to the bottleneck (IXSCAN or COLLSCAN, in-memory or disk spill, shard-local or mongos merge).</p>
<h2 id="邊界與整合">Boundaries and integration</h2>
<p>Sibling deep-dive articles:</p>
<ul>
<li><a href="../schema-design-pattern/">schema design pattern</a>: embedded design can eliminate most <code>$lookup</code> stages</li>
<li><a href="../shard-key-selection/">shard key selection</a>: decides whether an aggregation is shard-local or cross-shard</li>
<li><a href="../replica-set-read-preference/">replica set read preference</a>: the stale read trade-off of running aggregation on a secondary</li>
<li><a href="../connection-management-and-cache-layer/">connection management and cache layer</a>: the main cache + read scaling route when a report dashboard overloads the primary</li>
</ul>
<p>Migration playbook: when the analytical workload grows too large to keep mixing into MongoDB, split it out to <a href="/blog/backend/01-database/vendors/cosmosdb/" data-link-title="Azure Cosmos DB" data-link-desc="全球分散式 multi-model DB、5 個 consistency levels、Microsoft 自家 dogfood 證據">→ Cosmos DB MongoDB API + Synapse</a> or <a href="/blog/backend/01-database/vendors/dynamodb/" data-link-title="DynamoDB" data-link-desc="AWS managed key-value、partition-based scaling、9000 萬 RPS sustained 實戰證據">→ DynamoDB + Athena/Glue</a> (with the access pattern redesigned).</p>
<p>Cross-references with 1.x: <a href="/blog/backend/01-database/kv-document-capacity-planning/" data-link-title="1.10 KV / Document DB 容量規劃" data-link-desc="DynamoDB / Cosmos DB / Bigtable / MongoDB 等 KV / Document DB 的容量設計、partition key 取捨、capacity mode 選擇">1.10 KV / Document DB capacity planning</a> lists aggregation as a cost dimension of read shape; <a href="/blog/backend/01-database/high-concurrency-access/" data-link-title="1.1 高併發下的 SQL 讀寫邊界" data-link-desc="說明高併發服務如何共用資料庫 client、控制 transaction、管理 connection pool、避免資料庫成為瓶頸">1.1 High-concurrency data access</a> covers the anti-pattern of running OLTP and analytical workloads on the same cluster.</p>
<h2 id="相關連結">Related links</h2>
<ul>
<li><a href="/blog/backend/01-database/vendors/mongodb/" data-link-title="MongoDB" data-link-desc="Document database 代表、Atlas managed、跨雲可用、許多大規模平台從 MongoDB 起家">MongoDB vendor overview</a>: this article is the in-depth expansion of the "aggregation pipeline optimization" backlog item at the end of that page</li>
<li><a href="/blog/posts/vendor-%E6%B7%B1%E5%BA%A6%E6%8A%80%E8%A1%93%E6%96%87%E7%AB%A0%E6%96%B9%E6%B3%95%E8%AB%96%E7%9A%84%E6%BC%94%E5%8C%96%E7%B4%80%E9%8C%84%E5%90%8C-vendor-%E7%B3%BB%E5%88%97%E7%9A%84%E9%96%8B%E5%A0%B4%E8%BC%AA%E6%9B%BF%E9%A9%97%E8%AD%89/" data-link-title="Vendor 深度技術文章方法論的演化紀錄：同 vendor 系列的開場輪替驗證" data-link-desc="vendor overview 飽和後要寫單一功能深度文章、需要選題與結構依據時回來。這套方法論的驗證來源與 cadence variant 在高風險場景（同 vendor sub-tool 系列）的實證。">Vendor deep technical article methodology</a></li>
<li>Official: <a href="https://www.mongodb.com/docs/manual/aggregation/">Aggregation Pipeline</a>, <a href="https://www.mongodb.com/docs/manual/core/aggregation-pipeline-optimization/">Optimize Pipelines</a>, <a href="https://www.mongodb.com/docs/manual/reference/operator/aggregation/merge/">$merge</a></li>
</ul>
]]></content:encoded></item></channel></rss>