Load-Test on Tarragon

k6: Threshold CI Gates and Scenario Design

Tue, 23 Jun 2026 00:00:00 +0000

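Problem

A load test run produces a large number of metrics, but a CI pipeline needs a pass/fail signal. Without thresholds to turn metrics into a verdict, performance regressions can only be spotted by someone watching a dashboard. By the time anyone notices, the regression has usually accumulated over several releases.

On the other hand, a threshold verdict is only as good as the workload model behind it. If a run with --vus 10 --duration 30s differs too much from the structure of production traffic, a passing threshold does not prove that production is safe.

This post addresses two questions: how to set thresholds so the CI gate is reliable, and how to design scenarios so the workload resembles real traffic.

Threshold Design

A threshold's job is to turn load test metrics into a pass/fail signal for CI. k6 exits with code 0 when every threshold passes and with a non-zero code (99 for failed thresholds) when any threshold fails, so the CI pipeline can gate directly on the exit code.

Multi-metric thresholds

A threshold on a single metric easily misses risk. Normal latency with a high error rate means the system is dropping requests; normal throughput with high latency means queues are starting to build up. A complete set of thresholds covers at least three dimensions: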

```javascript
export const options = {
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'], // latency
    http_req_failed:   ['rate<0.01'],                // error rate
    http_reqs:         ['rate>100'],                 // throughput
  },
};
```

Latency thresholds use percentiles rather than the average. The average is diluted by the long tail, while p95/p99 are closer to the worst experience users actually notice.
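To make the average-versus-percentile point concrete, here is a minimal sketch in plain JavaScript (run with Node, not inside k6) that evaluates threshold-like expressions against a made-up latency sample. The mini-parser, the helper names, and the nearest-rank percentile are assumptions for illustration; k6's own threshold engine and percentile interpolation may produce slightly different numbers.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

function average(samples) {
  return samples.reduce((sum, v) => sum + v, 0) / samples.length;
}

// Hypothetical mini-parser for expressions such as "p(95)<500" or "avg<300".
function evaluate(expr, samples) {
  const m = /^(avg|p\((\d+(?:\.\d+)?)\))\s*(<=|<|>=|>)\s*(\d+(?:\.\d+)?)$/.exec(expr);
  if (!m) throw new Error(`unsupported expression: ${expr}`);
  const observed = m[1] === 'avg' ? average(samples) : percentile(samples, Number(m[2]));
  const limit = Number(m[4]);
  const ok = {
    '<': observed < limit,
    '<=': observed <= limit,
    '>': observed > limit,
    '>=': observed >= limit,
  }[m[3]];
  return { expr, observed, ok };
}

// 90 fast requests plus a 10% tail of slow ones (made-up numbers).
const latenciesMs = [...Array(90).fill(120), ...Array(10).fill(1500)];

for (const expr of ['avg<300', 'p(95)<500', 'p(99)<1000']) {
  console.log(evaluate(expr, latenciesMs));
}
```

With a 10% tail at 1500 ms, the average (258 ms) passes `avg<300` while `p(95)<500` fails, which is exactly the tail the average hides.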

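Where the limits come from

Threshold limits start from the production baseline. First pull the last 7 to 30 days of p95/p99 latency and error rate from the observability system (Grafana / Datadog), then add an acceptable degradation margin (typically 10-20%) to get the threshold. Limits that are too tight let noise in the CI environment trigger false positives; limits that are too loose let real regressions slip through.

Recalibration cadence: realign with the production baseline every month or after every major architectural change, so the thresholds do not drift away from the real system.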
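The baseline-plus-margin rule is plain arithmetic. The sketch below turns a hypothetical 30-day baseline into k6 threshold strings; the baseline numbers, the 15% margin, and the helper names are placeholders, not recommendations.

```javascript
// Derive k6 threshold strings from a production baseline plus an allowed
// degradation margin. Baseline numbers below are made-up placeholders.
function latencyThreshold(stat, baselineMs, margin) {
  return `${stat}<${Math.round(baselineMs * (1 + margin))}`;
}

function errorRateThreshold(baselineRate, margin) {
  // Round to 4 decimal places to keep the expression readable.
  const limit = Math.round(baselineRate * (1 + margin) * 1e4) / 1e4;
  return `rate<${limit}`;
}

const baseline = { p95: 420, p99: 880, errorRate: 0.004 }; // hypothetical 30-day values
const margin = 0.15; // inside the 10-20% range discussed above

const thresholds = {
  http_req_duration: [
    latencyThreshold('p(95)', baseline.p95, margin),
    latencyThreshold('p(99)', baseline.p99, margin),
  ],
  http_req_failed: [errorRateThreshold(baseline.errorRate, margin)],
};

console.log(JSON.stringify(thresholds, null, 2));
```

The printed object can be pasted into `options.thresholds`; regenerating it on the recalibration cadence keeps the limits tied to current production numbers instead of hand-edited constants.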

Path-level thresholds

Different API paths have different performance characteristics. The checkout path may tolerate far less latency than the listing path. k6's group and tag mechanism lets thresholds be set per path:

```javascript
import { group } from 'k6';

export default function () {
  group('checkout', function () {
    // checkout requests
  });
  group('listing', function () {
    // listing requests
  });
}

export const options = {
  thresholds: {
    // k6 prefixes group tag values with "::", hence the triple colon
    'http_req_duration{group:::checkout}': ['p(95)<300'],
    'http_req_duration{group:::listing}':  ['p(95)<800'],
  },
};
```

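Path-level thresholds refine the gate's verdict from "overall performance" to "performance of the critical paths".

Scenario Design

A scenario's job is to make the structure of load test traffic resemble production. k6 provides seven scenario executors; the five relevant here are listed below (shared-iterations and per-vu-iterations, which fix the amount of work rather than the load, are omitted). Which one to pick depends on the variable you want to control.

Executor	Controlled variable	Use case
constant-vus	Number of concurrent users	Simple smoke test
ramping-vus	Number of concurrent users	Step-wise ramp-up to find saturation
constant-arrival-rate	Fixed RPS	CI regression (stable input)
ramping-arrival-rate	Varying RPS	Simulating production peak/off-peak
externally-controlled	VUs adjusted at runtime through an external API	Pairing with production traffic replay

Executor selection criteria

constant-vus is the simplest, but its throughput fluctuates with response time: when the server slows down, RPS drops on its own and masks the real pressure. constant-arrival-rate keeps the rate stable, so threshold verdicts share a consistent baseline, but it needs enough preAllocatedVUs; otherwise k6 cannot start iterations on schedule, the achieved rate falls below the target, and the missed iterations are reported in the dropped_iterations metric. Note that the rate counts iterations per timeUnit, which equals RPS only when each iteration sends exactly one request.

For CI regression tests, constant-arrival-rate is recommended: the input is fixed and the output is comparable, which is what makes differences between versions meaningful.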
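Sizing preAllocatedVUs follows Little's law: concurrent VUs are roughly the arrival rate multiplied by the iteration duration. The sketch below applies it and also shows why constant-vus masks pressure. The 1.5 headroom factor and the iteration times are assumptions for illustration, not k6 defaults.

```javascript
// Little's law: concurrent VUs ≈ arrival rate (iterations/s) × iteration duration (s).
// headroom is an assumed safety factor for latency spikes, not a k6 default.
function vusForArrivalRate(ratePerSec, iterationSec, headroom = 1.5) {
  return Math.ceil(ratePerSec * iterationSec * headroom);
}

// Closed model (constant-vus): each VU waits for its response, so throughput
// is bounded by VUs / iteration time and falls as the server slows down.
function closedModelRps(vus, iterationSec) {
  return vus / iterationSec;
}

const rate = 100;          // target iterations per second
const healthyIter = 0.25;  // seconds per iteration when the service is healthy
const degradedIter = 1.0;  // seconds per iteration after a regression

console.log('preAllocatedVUs (healthy):', vusForArrivalRate(rate, healthyIter));
console.log('maxVUs (degraded):', vusForArrivalRate(rate, degradedIter));

const vus = 25; // enough for 100 RPS only while iterations take 0.25 s
console.log('constant-vus RPS healthy :', closedModelRps(vus, healthyIter));
console.log('constant-vus RPS degraded:', closedModelRps(vus, degradedIter));
```

Under constant-vus the same regression shows up as a quiet drop from 100 to 25 RPS; under an arrival-rate executor sized this way the rate holds, and any shortfall surfaces in dropped_iterations instead of disappearing.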

Aligning with production traffic shape

Use ramping-arrival-rate to simulate the shape of production traffic:

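```javascript
import http from 'k6/http';

export const options = {
  scenarios: {
    peak_simulation: {
      executor: 'ramping-arrival-rate',
      startRate: 50,
      timeUnit: '1s', // rates are iterations per second
      stages: [
        { target: 200, duration: '2m' },  // ramp up
        { target: 200, duration: '5m' },  // sustain peak
        { target: 50,  duration: '1m' },  // ramp down
      ],
      preAllocatedVUs: 300,
    },
  },
};

// A scenario without an `exec` option runs the default function.
// BASE_URL is an assumed environment variable (k6 run -e BASE_URL=...).
export default function () {
  http.get(`${__ENV.BASE_URL}/`);
}
```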

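The shape parameters (startRate / target / duration) are derived from the peak windows in production access logs. Shopify's BFCM preparation aligns its game day load test scenarios with the shape of the actual peak: short, sharp bursts combined with a high write ratio need scenarios designed specifically to cover them.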
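One way to derive those parameters is to bucket the access log per minute and turn each bucket into a stage. The sketch below assumes you already have per-minute request counts for the peak window; the input data, the one-minute bucket size, and the optional scale factor are all assumptions.

```javascript
// Turn per-minute request counts (e.g. aggregated from an access log) into
// ramping-arrival-rate settings. Input data and bucket size are assumptions.
function stagesFromCounts(perMinuteCounts, scale = 1) {
  const rates = perMinuteCounts.map((count) => Math.round((count / 60) * scale));
  return {
    executor: 'ramping-arrival-rate',
    startRate: rates[0],
    timeUnit: '1s',
    // k6 ramps linearly from the current rate to each target over the duration.
    stages: rates.slice(1).map((target) => ({ target, duration: '1m' })),
  };
}

// Hypothetical peak window: requests per minute.
const peakWindow = [3000, 6000, 12000, 12000, 12000, 3000];
console.log(JSON.stringify(stagesFromCounts(peakWindow), null, 2));
```

The generated block still needs preAllocatedVUs sized for the peak rate before it can run. Passing a scale above 1 (for example 1.2) rehearses a peak larger than the one observed.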

Cohort simulation

Production traffic is not a single type. Run several scenarios in parallel to simulate different cohorts:

```javascript
// Both scenarios run concurrently; rate is per second (default timeUnit '1s').
export const options = {
  scenarios: {
    read_traffic: {
      executor: 'constant-arrival-rate',
      rate: 150, exec: 'readFlow',
      preAllocatedVUs: 200,
      duration: '5m',
    },
    write_traffic: {
      executor: 'constant-arrival-rate',
      rate: 30, exec: 'writeFlow',
      preAllocatedVUs: 50,
      duration: '5m',
    },
  },
};

export function readFlow() { /* GET requests */ }
export function writeFlow() { /* POST requests */ }
```

The read/write ratio is derived from production access logs or APM data. A skewed ratio misplaces the bottleneck: a read-dominated model will not catch lock contention caused by writes.
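A minimal sketch of that derivation, assuming a simplified access-log sample where each line starts with the HTTP method; treating GET/HEAD as reads and everything else as writes is a simplifying assumption, and the sample lines are hypothetical.

```javascript
// Split a total target RPS into read/write cohorts using method counts from an
// access-log sample. Classifying GET/HEAD as reads is a simplifying assumption.
function cohortRates(logLines, totalRps) {
  let reads = 0;
  let writes = 0;
  for (const line of logLines) {
    const method = line.split(' ')[0];
    if (method === 'GET' || method === 'HEAD') reads += 1;
    else writes += 1;
  }
  const readRate = Math.round((totalRps * reads) / (reads + writes));
  // Give the remainder to writes so the two rates always sum to totalRps.
  return { read_traffic: readRate, write_traffic: totalRps - readRate };
}

// Hypothetical log sample: 5 reads, 1 write.
const sample = [
  'GET /products', 'GET /products/42', 'GET /cart',
  'GET /products', 'POST /cart', 'GET /search?q=tea',
];
console.log(cohortRates(sample, 180));
```

The two numbers map directly onto the `rate` fields of the read_traffic and write_traffic scenarios; in practice the sample should span the same peak window used for the traffic shape.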

Data-driven tests

Load test data with SharedArray so that each VU does not load its own copy and waste memory:

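```javascript
import { SharedArray } from 'k6/data';

// open() is only available in the init context; SharedArray runs the callback
// once and shares the parsed array read-only across all VUs.
const users = new SharedArray('users', function () {
  return JSON.parse(open('./users.json'));
});
```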

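The data can come from production samples (after anonymization) or from synthetic generation. Its distribution needs to be close to production: ID ranges, key distribution, and payload sizes all affect query plans and cache behavior.

CI Integration in Practice

Fast path (every push)

A fixed scenario with a short duration (30 s to 2 min), using constant-arrival-rate for regression detection, with thresholds at production baseline + 10%. The purpose of this layer is to catch obvious regressions quickly; it does not need to simulate a full peak.

Slow path (merge gate)

A full scenario with a longer duration (5 to 15 min), including multiple cohorts and ramping simulation, with thresholds that cover path-level metrics. The purpose of this layer is a deeper check of how a change behaves under near-realistic pressure.

Retaining results

By default k6 only prints an end-of-test summary to stdout. In CI, use the --out flag to send results to a time-series database (InfluxDB / Prometheus Remote Write / Grafana Cloud k6) so historical trends can be queried. Trend comparison catches slow drift: metrics that stay within the threshold but keep getting worse.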
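Slow drift can be flagged with a simple trend line over recent runs. This is a minimal sketch, assuming you can query the p95 of each past CI run from the time-series store; the history values and the 5 ms-per-run tolerance are hypothetical.

```javascript
// Least-squares slope of p95 across consecutive CI runs (ms per run).
function slope(values) {
  const n = values.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = values.reduce((s, v) => s + v, 0) / n;
  let num = 0;
  let den = 0;
  values.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return num / den;
}

// Flag histories that pass the threshold on every run yet trend upward
// faster than maxSlopeMsPerRun.
function detectSlowDrift(p95History, thresholdMs, maxSlopeMsPerRun) {
  const allPass = p95History.every((v) => v < thresholdMs);
  const s = slope(p95History);
  return { allPass, slope: s, drifting: allPass && s > maxSlopeMsPerRun };
}

const p95History = [400, 410, 418, 431, 440, 452]; // hypothetical p95 per run, oldest first
console.log(detectSlowDrift(p95History, 500, 5));
```

Every run here passes a 500 ms threshold, but the slope is about 10 ms per run, so the gate stays green while the trend check raises a flag.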

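LinkedIn's automated load testing practice ties load test results to capacity forecasting: the trend of the saturation point over time directly drives scaling decisions.

Pitfalls and Boundaries

Threshold variance: hardware differences between CI runners (noisy neighbors on shared runners, network jitter, GC pauses) make the same code produce different results on different runs. Mitigations: use dedicated runners to remove noisy neighbors, discard the results of warmup iterations, and run several times and take the median. If the variance exceeds the degradation margin the threshold allows, the gate's verdict cannot be trusted.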
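A minimal sketch of those mitigations as a post-processing step. It assumes each entry is the p95 of one full run of the same build, applies warmup discarding at the run level for simplicity, and uses the coefficient of variation (relative noise) as a stand-in for "variance versus degradation margin"; all of these are simplifying assumptions.

```javascript
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Coefficient of variation: sample standard deviation divided by the mean.
function coefficientOfVariation(values) {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / (values.length - 1);
  return Math.sqrt(variance) / mean;
}

// Drop warmup runs, take the median, and only trust the verdict when
// run-to-run noise is smaller than the degradation margin the threshold allows.
function gateVerdict(p95Runs, { warmupRuns, thresholdMs, margin }) {
  const runs = p95Runs.slice(warmupRuns);
  const med = median(runs);
  const cv = coefficientOfVariation(runs);
  return { median: med, cv, trustworthy: cv < margin, pass: med < thresholdMs };
}

// Hypothetical p95s; the first run hit cold caches on a fresh runner.
console.log(gateVerdict([700, 470, 480, 455, 490, 462], { warmupRuns: 1, thresholdMs: 500, margin: 0.1 }));
```

If `trustworthy` comes back false, tightening the runner environment is a better fix than widening the threshold, because a wider threshold also lets real regressions through.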

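Thresholds too loose or too tight: a threshold that always passes makes the gate meaningless, and one that raises frequent false positives teaches the team to ignore CI results. Either way the gate loses its value. The calibration criterion: over the past 30 days of threshold results, were all regressions that truly needed attention caught, while the false positive rate stayed below 5%?

Scenario drift from production: the structure of production traffic changes as the product evolves. Recalibrate the scenario's RPS, cohort ratios, and data distribution against access logs regularly (monthly or with every major feature launch) so the model does not drift further over time.

Integration routing

Upstream concept: 6.2 workload model design for load testing
Downstream capability: 6.13 baseline management and regression localization for the performance regression gate
Parallel vendors: Gatling, Locust, JMeter
Case write-back: Shopify BFCM capacity governance (game day load tests aligned with the peak shape), LinkedIn Automated Load Testing (continuous load testing driving capacity forecasts)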

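k6

Fri, 15 May 2026 00:00:00 +0000

k6's core job is to turn a workload model into load test scenarios that are rerunnable, versionable, and pluggable into CI. It suits load validation of APIs, HTTP, gRPC, WebSocket, and browser-style flows. Its focus is using programmatic scripts to describe user behavior, load stages, thresholds, and result output.

Service positioning

k6 is a scriptable load testing tool from Grafana Labs, which acquired it in 2021. The product line has two layers. k6 OSS is a Go engine with a JavaScript API for describing scenarios; it is CLI-first and can send output to Prometheus / InfluxDB / JSON / CSV. Grafana Cloud k6 (formerly k6 Cloud) is a SaaS with multi-region runners and result retention, sharing a plane with Grafana Cloud dashboards / Loki / Tempo. The engine is Go, not JavaScript: JS is only the scenario description layer and the runtime is executed by Go, so a single machine can usually sustain far more VUs than Python-based tools (often cited as roughly an order of magnitude).

Compared with JMeter, k6 is code-first and CI-friendly, while JMeter relies on XML / GUI plus a plugin ecosystem. JMeter wins on protocol breadth (JDBC / LDAP / JMS / FTP) and on operation by non-engineering teams; k6 wins on version control, PR review, and artifact pipelines. Compared with Locust, k6 uses JS and Locust uses Python. Locust feels natural to Python teams, but the Python GIL limits the VU capacity of each process, so it needs multiple workers, while a single k6 machine can run thousands of VUs. Compared with Gatling, which runs on the JVM with a Scala/Java/Kotlin DSL and suits JVM-heavy teams, k6's thresholds and Grafana ecosystem integration are more direct for release gates.

Positioning

k6 suits bringing load testing into the engineering workflow. Once a team can describe traffic shape, endpoint mix, arrival rate, think time, and stop conditions, k6 can encode that model as scripts so the same validation can be rerun for every release, capacity review, or peak-event readiness check.

This positioning connects k6 to three main chapters. It receives the traffic model from 9.2 Workload Modeling, ramp-up and knee-point interpretation from 9.4 Saturation Discovery, and the safety boundaries for canary, dark launch, or production-like load tests from 9.10 Production-Side Validation.

Use cases

API load testing is k6's most reliable entry point. Checkout, login, search, order query, payment callback mocks, and internal APIs can all be expressed as scenarios, with thresholds turning latency, error rate, and throughput into pass/fail signals.

A CI performance gate is a common source of k6's value. Teams can run a fixed baseline on merge, nightly, pre-release, or before a game day, watch p95 / p99, error rate, throughput, and the regression trend, and then hand the results to 6.13 Performance Regression Gate.

Peak readiness rehearsals suit k6's staged load. Before an event, a ramping arrival rate can simulate the T-90, T-30, T-7, T-1, and T-0 load stages, with results written back to 9.11 Peak Event Preparation.

Shortest path to a verdict

To judge whether a k6 deployment is healthy, check at least four things:

Scenario design: use executor: ramping-arrival-rate instead of constant-vus so RPS / arrival rate is first-class; VUs are drawn by the engine from the preAllocatedVUs pool (growing up to maxVUs when set) rather than fixed by hand; the scenario lines up with the endpoint mix, think time, and cohorts from 9.2 Workload Modeling
Threshold gate: the thresholds block explicitly states p95 / p99 / error rate / throughput, the CI failure condition is clear, and nobody eyeballs the summary to decide pass/fail
Output into the observability stack: --out experimental-prometheus-rw remote-writes metrics to Prometheus, the Grafana dashboard reads k6 from the same datasource, and results sit on the same chart as the target service's saturation metrics
k6 Cloud vs CLI boundary: the local CLI runs baselines and CI, while Grafana Cloud k6 runs cross-region / large-scale tests and keeps results; do not put the CI gate in the Cloud (cost and time do not fit), and do not force 100k VUs onto a single local machine (the runner's own bottleneck produces a false picture)

If any of the four is missing, the scenario is incomplete, the threshold gate is ineffective, or runner observability is missing.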

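Selection criteria

Criterion	Value of k6	Capabilities to add
Scripting	Scenarios, thresholds, setup / teardown are versionable	Sampling production traffic and calibrating the model
CI-friendly	CLI and artifacts plug easily into pipelines	Long-term trend storage and release gate semantics
API-oriented	Common API cases such as HTTP / gRPC / WebSocket are straightforward	Complex browser interaction and end-to-end data preparation
Team learning cost	JavaScript scripts are easy for most backend teams to pick up	Large distributed runners and test data governance

The value of scripting comes from rerunnability. A one-off load test only tells you how much that day's configuration can handle; a versioned scenario tells you whether the capacity curve has drifted after a release, and it lets regression investigations return to the same workload model.

The value of CI-friendliness comes from low handoff cost. Only when results can be turned into artifacts, thresholds, trends, and gate decisions does load testing stop being "an engineer running a tool by hand" and become part of the release process.

The value of being API-oriented comes from well-defined backend paths. k6 fits checkout APIs, search APIs, internal APIs, and webhook receivers well. If the main concern is full browser UX, real third-party payments, or multi-device sync, the write-up has to cover data preparation, side effects, and environment isolation separately.

Trade-offs against other tools

The main difference between k6 and JMeter is the way of working. k6 leans toward programmatic scripts, CLI, CI artifacts, and engineering workflows; JMeter leans toward GUI, protocol plugins, established enterprise test processes, and collaboration with non-engineering teams.

The main difference between k6 and Gatling is ecosystem and language. k6 uses JavaScript-style scripts, while Gatling builds on the JVM / Scala / Java / Kotlin ecosystem; the team's language skills and existing pipelines drive maintenance cost.

The main difference between k6 and Locust is team skills and how the model is expressed. Locust uses Python, which is natural for Python teams and custom user behavior; k6's thresholds, CLI, and cloud / Grafana ecosystem make release gate integration more direct.

The main difference between k6 and Vegeta is scenario complexity. Vegeta suits simple HTTP load, CLI workflows, and quick saturation probes; k6 suits fuller multi-step scenarios, thresholds, and long-term baselines.

Core trade-off table

Dimension	k6	JMeter	Locust	Gatling
Scenario language	JavaScript (ES6+)	XML (GUI editing) / Groovy	Python	Scala / Java / Kotlin DSL
Engine runtime	Go	JVM	Python (gevent)	JVM (Akka)
Single-machine VU capacity	High (thousands+)	Medium (bound by JVM heap)	Medium-low (GIL, needs multiple workers)	High (Akka actors)
CI friendliness	Strong: CLI + thresholds + JSON / Prometheus	Medium: needs plugin / Jenkins integration	Medium: CLI-friendly but weaker result reporting	Strong: CLI + HTML report + Maven/Gradle plugin
Protocol breadth	HTTP / gRPC / WebSocket / Browser	Widest (JDBC / LDAP / JMS / FTP / SMTP)	Mostly HTTP, others via custom clients	HTTP / WebSocket / JMS / MQTT
Browser testing	k6 browser module (Playwright-compatible API)	None native (Selenium plugin)	None native	None native
Distributed	k6 Cloud / k6 Operator on k8s	Master / Slave (heavy to operate)	Master / Worker	Gatling Enterprise / FrontLine
Best fit	API-first + CI gate + Grafana ecosystem	Enterprise + many protocols + non-engineering teams	Python teams + custom user behavior	JVM teams + DSL expressiveness

The core reasons to choose k6: API-first scenarios, a CI gate, and an existing Grafana / Prometheus ecosystem, with a team that accepts a JS DSL. If protocol breadth is the main need, go with JMeter; for a Python team, Locust; for JVM-heavy teams, Gatling.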

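Advanced topics

k6 browser: built on Chromium with a Playwright-compatible API, it runs inside the same k6 scenario and can mix protocol-level and browser-level load (API calls first, then a real browser flow). The point is that pure API load and real user UX live in one scenario without maintaining two tools. A browser VU, however, is dozens of times heavier than a protocol VU, so runner cost has to be recalculated.

xk6 extensions: k6 extensions written in Go add protocols (Kafka / Redis / SQL / AMQP) or outputs (custom backends). xk6 build produces a custom binary, and an organization can maintain its own extensions. The point is that k6 is not limited to HTTP: Kafka producer load or Redis hot-key probes can use the same scenario harness.

Grafana Cloud k6 (formerly k6 Cloud): a SaaS with multi-region runners and result retention, sharing a plane with Grafana Cloud dashboards / Loki / Tempo / Prometheus. It suits realistic cross-region latency validation, large distributed runs, and result retention with team sharing. It fits teams already on Grafana Cloud; OSS-only teams use the k6 Operator on k8s.

Distributed execution: self-managed distributed runs use the k6 Operator on Kubernetes, which splits a scenario across instances whose results are aggregated in the output backend. This makes multi-machine load possible without k6 Cloud, at the price of running the runner pool and handling result aggregation yourself.

Output integration: --out experimental-prometheus-rw remote-writes directly to Prometheus, so a single Grafana dashboard shows k6 client metrics next to target service saturation; --out cloud sends results to Grafana Cloud k6; --out json=... writes a file for CI artifacts; --out influxdb connects to InfluxDB (legacy). Loki receives k6 console logs and Tempo receives k6 traces (if the scenario propagates W3C trace context).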
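The JSON output is also easy to summarize offline, for example to attach per-path p95 values to a CI artifact. This is a minimal Node sketch, assuming the NDJSON layout in which each sample is a `"type":"Point"` line carrying `metric` and `data.value` / `data.tags`; the sample lines are hand-written, and a real results file would be read with `fs.readFileSync` (or streamed line by line for large runs).

```javascript
// Offline summary of `k6 run --out json=results.json` output. Assumes each
// sample line looks like {"type":"Point","metric":...,"data":{"value":...,"tags":{...}}}.
function p95ByGroup(ndjson, metric = 'http_req_duration') {
  const byGroup = new Map();
  for (const line of ndjson.split('\n')) {
    if (!line.trim()) continue;
    const entry = JSON.parse(line);
    if (entry.type !== 'Point' || entry.metric !== metric) continue;
    const tags = entry.data.tags || {};
    const group = tags.group || '(no group)'; // root-level requests have an empty group tag
    if (!byGroup.has(group)) byGroup.set(group, []);
    byGroup.get(group).push(entry.data.value);
  }
  const result = {};
  for (const [group, values] of byGroup) {
    values.sort((a, b) => a - b);
    // nearest-rank p95; k6's own summary may interpolate slightly differently
    result[group] = values[Math.ceil(0.95 * values.length) - 1];
  }
  return result;
}

// Tiny hand-written sample in the same shape (values are made up).
const point = (value, group) => JSON.stringify({
  type: 'Point',
  metric: 'http_req_duration',
  data: { time: '2026-06-23T00:00:00Z', value, tags: { group } },
});
const sampleNdjson = [
  JSON.stringify({ type: 'Metric', metric: 'http_req_duration', data: { type: 'trend' } }),
  ...[120, 140, 180, 900].map((v) => point(v, '::checkout')),
  ...[300, 350].map((v) => point(v, '::listing')),
].join('\n');

console.log(p95ByGroup(sampleNdjson));
```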

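Troubleshooting and quick failure diagnosis

VUs cannot ramp up / runner CPU saturated: the scenario contains heavy JS logic (big JSON parsing, complex regexes, crypto); move one-time work into the init context or setup() instead of recomputing it on every VU iteration
False resource throttling: the runner's own CPU / network bandwidth / file descriptors are the bottleneck while the target service has not yet saturated; switch to a bigger machine or more runners, and check the runner's own saturation metrics to rule this out
Threshold too strict / CI permanently red: the threshold copies the production SLO with no budget; run the staging tenant 5-10 times to capture the baseline distribution and set the threshold to baseline + buffer instead of transplanting the SLO directly
p95 looks good but users complain about slowness: the scenario's endpoint mix does not match the production traffic shape; add the production endpoint distribution, weight the scenario accordingly, and align with 9.2 Workload Modeling
Script logic too heavy / unstable VU iterations: token refresh or large payload handling inside the scenario makes iteration time drift; use executor: ramping-arrival-rate to lock RPS instead of VU count, so the engine absorbs the drift by adding VUs
Results cannot be replayed / no baseline found: output was not saved as an artifact and the Grafana dashboard time range was not recorded; on every run, force --out json, tag the scenario version, and push to the evidence package

Operating cost

k6's main cost is maintaining the workload model. Scripts are easy to write; the real cost is continuously calibrating the production endpoint mix, data distribution, tenant / region / user cohorts, think time, and peak shape.

Runner cost grows with load scale. A single-machine runner suits small API baselines; cross-region tests, hundreds of thousands of RPS, or long soak tests need distributed runners, network budget, isolation of the target service, and storage for observability data.

Test data governance is a high-risk cost. Checkout, payment, order, email, notification, and webhook paths can all produce side effects, so the scenario must explicitly define the test tenant, idempotency keys, mock boundaries, cleanup, and stop conditions.

Evidence Package

k6 results should be written back to the evidence package. The minimum fields are scenario version, target environment, time range, VUs / arrival rate, thresholds, p95 / p99, error rate, throughput, target service saturation metrics, known gaps, and owner.

Field	k6 evidence source
Source	k6 summary, JSON output, dashboard link
Time range	Test start / end
Query link	Grafana / Prometheus / APM query links
Data quality	Scenario coverage, test data freshness
Confidence	Production similarity, runner capacity
Known gap	Uncovered endpoints, unsimulated third parties, data bias

The core purpose of the evidence package is to let the release gate decide. A k6 threshold pass is only one signal; the gate also has to look at the target service's CPU, connections, DB latency, cache hit rate, queue lag, and cloud cost.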
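A gate can refuse to evaluate an incomplete package before it looks at any numbers. The sketch below encodes the minimum fields listed above as a completeness check; the camelCase field names, the record layout, and every value in the example are hypothetical conventions, not a k6 format.

```javascript
// Hypothetical evidence-package convention mirroring the minimum fields above.
const REQUIRED_FIELDS = [
  'scenarioVersion', 'targetEnvironment', 'timeRange', 'arrivalRate',
  'thresholds', 'p95Ms', 'p99Ms', 'errorRate', 'throughputRps',
  'targetSaturation', 'knownGaps', 'owner',
];

// Returns the names of required fields that are absent or empty.
function missingFields(pkg) {
  return REQUIRED_FIELDS.filter((f) => pkg[f] === undefined || pkg[f] === null || pkg[f] === '');
}

// Placeholder record; none of these values come from a real run.
const evidence = {
  scenarioVersion: 'checkout-peak@a1b2c3d',
  targetEnvironment: 'staging',
  timeRange: { start: '2026-06-23T00:00:00Z', end: '2026-06-23T00:08:00Z' },
  arrivalRate: { startRate: 50, peak: 200 },
  thresholds: { http_req_duration: ['p(95)<500', 'p(99)<1000'] },
  p95Ms: 412,
  p99Ms: 780,
  errorRate: 0.003,
  throughputRps: 198,
  targetSaturation: { dbCpu: 0.71, queueLagSec: 2 },
  knownGaps: ['third-party payment mocked'],
  owner: 'team-checkout',
};

const missing = missingFields(evidence);
console.log(missing.length === 0 ? 'evidence complete' : `missing: ${missing.join(', ')}`);
```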

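Case write-back

In the 09 case library, k6 currently serves mainly as a tooling anchor; the protagonist of each case remains the load shape and the validation cadence. It can be written back to 9.C15 Tixcraft ticketing (pre-event load test interpretation), 9.C1 Prime Day readiness (staged validation), 9.C28 FanDuel (multi-model load testing for a bimodal workload), 9.C2 GR8 Tech FIFA World Cup readiness (validation at 54000 TPS @ 25ms p95), and 9.C7 Lyft (independent threshold design across 100+ microservices at 8x peak).

What these cases provide is load shape and engineering cadence. When the k6 page cites a case, it should turn the case into a workload model, ramp-up, thresholds, runner scale, and stop conditions, and keep the tool as a replaceable carrier. For example, GR8 Tech's 25ms p95 is a hard pass/fail threshold target, while Lyft's "8x applies to specific services, not everything at 8x" must be split into per-service scenarios.

Next steps

Apache JMeter

Fri, 15 May 2026 00:00:00 +0000

JMeter's core job is to turn multi-protocol tests and existing enterprise test assets into rerunnable load validation. It suits teams that work through a GUI, rely on a mature plugin ecosystem, or need JDBC, JMS, FTP, mail, or legacy protocols beyond HTTP. Its focus is keeping the test flow as an artifact that can be reviewed, handed over, and run in non-GUI mode.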

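Service positioning

JMeter is an open-source load testing tool from the Apache Software Foundation, written in Java. It describes test plans (.jmx files) in XML as compositions of thread groups, samplers, and listeners, and supports both GUI and CLI (non-GUI / headless) modes. It is the most established load testing tool in the industry and has the broadest protocol coverage: samplers directly cover HTTP, JDBC, JMS, SOAP, FTP, SMTP, IMAP, TCP, JUnit, OS processes, and more.

Compared with k6, JMeter is GUI-driven with broad protocol support, while k6 is code-first (JavaScript) and HTTP-centric; JMeter suits maintenance by QA teams, k6 suits dev / SRE writing tests into CI. Compared with Locust, JMeter uses XML plus plugins while Locust uses plain Python classes; Locust is more flexible for custom clients, but JMeter has broader built-in protocol support. Compared with Gatling, JMeter leans toward GUI and many protocols while Gatling leans toward a JVM DSL (Scala / Java / Kotlin) with an async runtime; Gatling gets higher single-machine throughput, but JMeter wins on protocol breadth and on carrying forward existing assets.

Key tension: GUI and protocol breadth versus single-machine throughput and CI friendliness is the fundamental trade-off of choosing JMeter. The GUI suits QA teams and cross-role collaboration, and .jmx comes with a plugin ecosystem and more than a decade of accumulated assets. The price is XML diffs that are hard to review, GUI listeners that eat memory, and one more packaging layer for CI integration compared with k6 / Gatling.

JMeter suits organizations whose test assets already exist. When a team has many .jmx test plans, a QA team maintains scenarios through the GUI, or load tests must span HTTP, JDBC, JMS, and other plugin protocols, JMeter's value lies in carrying the organization's process, not just in generating HTTP load. This positioning connects JMeter to 9.3 Load Testing Tool Selection and 9.10 Production-Side Validation. It can support the multi-system dependencies of production-like tests, but the evidence package must add the test plan version, plugin versions, runner configuration, and how results are retained.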

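Use cases

Multi-protocol load testing is JMeter's main entry point. Enterprise services often need to test HTTP APIs, JDBC queries, JMS queues, FTP, or mail flows at the same time, and JMeter's samplers and plugin ecosystem let one test plan cover many kinds of dependencies.

GUI collaboration suits teams that are not purely engineering. QA groups, testing centers, or regulated environments often need visual test design, review, and handover, and JMeter's GUI lowers the communication cost across roles.

Legacy test assets are a reason to stay on JMeter. If the existing .jmx files, listeners, plugins, and reporting process have run for years, the opportunity cost of rewriting them in k6, Gatling, or Locust has to be offset by maintenance gains.

Shortest path to a verdict

To judge whether a JMeter deployment is healthy, check at least four things:

Thread group design: do thread count / ramp-up / loop count / duration reflect the real traffic model; is the load schedule controlled with the Stepping Thread Group (plugin) or Concurrency Thread Group (or the Arrivals Thread Group for a true arrival rate), instead of binding each thread directly to a "user"
Listener configuration: GUI listeners (View Results Tree / Aggregate Report / Graph) are enabled only during design / debug; real runs must switch to Simple Data Writer output to JTL, with analysis handed to the offline HTML report or an external Grafana
Distributed mode setup: a single machine tops out at roughly 3000-5000 threads (limited by JVM heap and thread context switching), beyond which you need master + slave (remote engines); slave machines must match the master's plugins / JMeter version / JVM parameters, otherwise the results cannot be trusted
GUI vs CLI mode: the GUI is for design / debug only, and production load always runs as jmeter -n -t plan.jmx -l result.jtl; running a large test in the GUI makes listeners exhaust memory and actually distorts the results

If any of the four is missing, it is an open item at the boundary of 9.3 Load Testing Tool Selection.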
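Sizing remote engines is simple arithmetic, with one trap: JMeter runs the complete test plan on every remote engine, so thread counts in the plan are per engine, not totals. The sketch below assumes the per-node limit has been measured on your own hardware and applies an assumed 80% utilisation cap; both are placeholders.

```javascript
// Rough sizing of JMeter remote engines. perNodeLimit must be measured per machine
// (e.g. somewhere in the 3000-5000 thread range above); utilisation is an assumed cap.
function jmeterNodes(totalThreads, perNodeLimit, utilisation = 0.8) {
  const usable = Math.floor(perNodeLimit * utilisation);
  const nodes = Math.ceil(totalThreads / usable);
  // Every remote engine runs the full thread group, so the plan's thread
  // count must be the per-node share, not the total.
  return { nodes, threadsPerNode: Math.ceil(totalThreads / nodes) };
}

console.log(jmeterNodes(12000, 4000)); // plan for 12000 threads on ~4000-thread machines
```

Leaving the thread group at 12000 and starting four engines would generate about 48000 threads, which is a common source of "distributed results do not match single-machine results".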

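Selection criteria

Criterion	Value of JMeter	Capabilities to add
Multiple protocols	Broad sampler and plugin coverage	Plugin version governance and test environment consistency
GUI collaboration	Non-engineering roles can read and edit	Code review, diff, and version control discipline
Existing assets	`.jmx`, listeners, and reports carry forward	Scenario cleanup and artifact standardization
Distributed execution	Remote engines scale the load	Runner sizing, network bottlenecks, and result merging

The value of multiple protocols comes from dependency coverage. When the workload model includes databases, queues, file transfer, or legacy endpoints, JMeter can apply pressure to different dependencies in one test plan and observe them together.

The value of GUI collaboration comes from visibility across roles. This advantage carries a version control cost because XML diffs are hard to review; the team needs to add naming conventions, folder structure, parameterization, and a review checklist.

Trade-offs against other tools

The main difference between JMeter and k6 is workflow. JMeter leans toward GUI, plugins, and established enterprise processes; k6 leans toward code-first, CLI, thresholds, and CI artifacts.

The main difference between JMeter and Gatling is how scenarios are expressed. JMeter assembles test plans from thread groups, samplers, and listeners; Gatling describes simulations with a JVM DSL, which better suits engineering teams maintaining complex flows.

The main difference between JMeter and Locust is customizability. JMeter relies on plugins and samplers, while Locust can implement a custom client directly with Python libraries; for a particularly unusual protocol, a Python team may be better served by Locust.

The main difference between JMeter and Vegeta is complexity. Vegeta suits quick HTTP saturation probes; JMeter suits multi-step, multi-dependency test plans that can be handed over.

Dimension	JMeter	k6	Locust	Gatling
Description language	XML (`.jmx`) + GUI	JavaScript	Python (class-based)	Scala / Java / Kotlin DSL
Protocol coverage	HTTP/JDBC/JMS/SOAP/FTP/SMTP/TCP	HTTP/WebSocket/gRPC	HTTP + custom via any Python lib	HTTP/JMS/MQTT
Single-machine throughput	Medium (thread-per-user)	High (Go goroutines)	Medium (gevent / async)	High (Akka async)
Runtime model	JVM threads	Go runtime	Python gevent	JVM async actors
CI friendliness	Needs packaging of `.jmx` + plugins	Strong: single JS file + CLI	Strong: pip + Python file	Strong: sbt / Maven + Scala file
GUI	Full GUI (design / debug)	None (CLI only)	Web UI (runtime monitoring)	None (HTML report only)
Distributed	Master + Slave (remote engines)	k6 Cloud / Operator	Master + Worker	Gatling Enterprise / FrontLine
Best fit	Enterprise QA + many protocols	Dev / SRE + HTTP-heavy + CI	Python teams + custom protocols	JVM teams + complex scenarios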

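Operating cost

JMeter's main cost is test plan governance. .jmx files can accumulate many listeners, debug samplers, hard-coded variables, and stale assertions; left uncleaned over time, they make load test results untraceable.

Runner cost comes from the JVM and listeners. GUI listeners are fine for observation during development but not for large-scale load tests; real tests should use non-GUI mode and write results to JTL, HTML reports, or external metrics.

Plugin cost comes from version drift. If plugin versions differ across runners, engineers' machines, or CI images, the same test plan can produce different results, so the plugin list, JMeter version, and container image must be pinned.

Evidence Package

JMeter results should be written back to the evidence package. The minimum fields are test plan version, JMeter version, plugin list, runner topology, thread group settings, ramp-up, duration, p95 / p99, error rate, throughput, target saturation metrics, and known gaps.

Field	JMeter evidence source
Source	`.jmx`, JTL, HTML report, dashboard link
Time range	Test start / end
Query link	APM / Prometheus / DB / queue query links
Data quality	Test plan version, plugin versions
Confidence	Runner topology, production similarity
Known gap	Uncovered protocols, data bias, listener overhead

The core purpose of the evidence package is to make results reviewable. JMeter test plans are often maintained by several people, so a gate decision must be traceable to the exact .jmx version, set of runners, batch of test data, and target environment.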

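Advanced topics

JMeter Plugins ecosystem: the community-maintained plugin collection at jmeter-plugins.org fills gaps in stock JMeter. Custom Thread Groups (Stepping / Ultimate / Concurrency / Arrivals) let the thread schedule reflect real arrival rates, PerfMon collects remote server CPU / memory, the Throughput Shaping Timer targets RPS directly instead of thread count, and the Dummy Sampler can mock dependencies. Install through the Plugin Manager, and pin the plugin list in the CI image (PluginsManagerCMD.sh install) to avoid drift.

BlazeMeter Cloud / distributed execution: self-built distributed mode (master + slaves across many VMs) is expensive, because slaves need the same JMeter version, the same plugins, the same JVM parameters, open RMI ports, and enough network capacity to return results. BlazeMeter (Perforce, formerly CA) is a JMeter SaaS that runs .jmx files directly at cloud scale with geo-distributed runners; it suits teams running short-term spike tests that do not want to build a distributed cluster. The trade-off is vendor lock-in and per-test billing; for long-term, high-frequency testing, self-hosting is cheaper.

Distributed mode details: the master runs the control plane (thread group configuration, test plan distribution), and the slaves run the threads and send sample results back. Each slave runs the complete thread group, so thread counts in the plan are per slave, not totals. The bottleneck is often the master receiving results (RMI / custom protocol), not slaves running out of capacity. Large tests should turn off GUI listeners and use a Backend Listener to push metrics to an external time-series database in real time, so the master only receives aggregated metrics instead of every sample. Synchronization points: all slaves use the same .jmx and test data CSV, and the CSV must not depend on a local path on the master.

Backend Listener + Grafana integration: JMeter's built-in Backend Listener supports InfluxDB / Graphite (Elasticsearch through a third-party plugin) and pushes active threads / response time / hits / errors out in real time, while Grafana with a JMeter dashboard shows throughput / latency curves live. This combination replaces GUI listeners and is the standard way to observe distributed mode: listener overhead moves from the master to an external time-series system, and the master is no longer overwhelmed by the GUI. When the time-series database from chapter 4 (observability) already exists, JMeter metrics land in the same Grafana, side by side with application-side latency / errors, which speeds up comparison in 6.13 Performance Regression Gate.

Troubleshooting and quick failure diagnosis

GUI mode runs out of memory / OOM: GUI listeners (View Results Tree / Graph) keep every sample in the heap, so large runs hit OutOfMemoryError; open the GUI only during design, switch to jmeter -n non-GUI mode for real runs, and use Simple Data Writer to write JTL instead of in-memory aggregation
Listeners drag down throughput / distort results: too many listeners are active, every sample is processed by several of them, and JMeter itself becomes the bottleneck; keep only Simple Data Writer + Backend Listener for real tests, and generate the HTML report offline with jmeter -g result.jtl -o report/
Thread group miscalculated / does not match real traffic: threads are set directly as "users" while think time and ramp-up are ignored, so the test generates threads running flat out rather than business traffic; switch to the Concurrency Thread Group or Throughput Shaping Timer to target RPS directly, with a Constant Timer to simulate think time
Distributed results do not match single-machine results: slave plugins / JMeter version / JVM heap differ, or the CSV path exists only on the master; containerize the slave environment (same Docker image), distribute CSVs together with the .jmx, and start all engines uniformly with -R / --remotestart (or -r for every host listed in remote_hosts)
.jmx XML diffs cannot be reviewed / frequent merge conflicts: several people edit the test plan at once and a GUI edit reshapes the XML; split into fragments (Test Fragment + Module Controller), keep scenarios in separate files, move parameterization into external CSV / properties, and review PRs by screenshots plus run results rather than raw XML diffs
Plugin version drift / CI results not reproducible: plugins on dev machines differ from the CI image; pin a plugin manifest, install plugins in the CI image from the plan with PluginsManagerCMD.sh install-for-jmx plan.jmx, and lock versions to the image tag
HTTPS / TLS connection explosion: by default JMeter performs a TLS handshake per thread, so a large thread count overloads server-side TLS and the test ends up measuring TLS instead of the app; enable KeepAlive so connections are reused (and an HTTP Cache Manager so cacheable resources are not refetched), and tune httpclient4.idletimeout if needed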
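The thread-group miscalculation above comes down to the closed-model formula: each thread loops through request, response, and think time, so its throughput is roughly 1 / (response time + think time). A minimal sketch, with all timings assumed for illustration:

```javascript
// Closed model (classic thread group): each thread loops request -> response -> think.
// Throughput per thread ≈ 1 / (response + think), so threads ≈ RPS × (response + think).
function threadsForTargetRps(targetRps, avgResponseSec, thinkSec) {
  return Math.ceil(targetRps * (avgResponseSec + thinkSec));
}

// Conversely, the RPS a fixed thread count can actually produce.
function achievableRps(threads, avgResponseSec, thinkSec) {
  return threads / (avgResponseSec + thinkSec);
}

const threads = threadsForTargetRps(500, 0.5, 2.5); // 500 RPS, 0.5 s response, 2.5 s Constant Timer
console.log('threads needed:', threads);
console.log('RPS if responses slow to 1.5 s:', achievableRps(threads, 1.5, 2.5));
```

The second line shows the catch: once responses slow from 0.5 s to 1.5 s, the same 1500 threads only produce 375 RPS. A plain thread group therefore backs off exactly when the system is struggling, which is why RPS-targeting components such as the Throughput Shaping Timer exist.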

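Case write-back

In the 09 case library, JMeter fits as an enterprise load test anchor. It can be written back to 9.C15 Tixcraft ticketing (pre-event validation), 9.C17 BookMyShow (ticketing traffic model), 9.C1 Prime Day readiness (staged validation), 9.C13 Hotstar IPL (pre-event rehearsal for a global live stream with 18.6 million concurrent viewers), and 9.C14 Standard Chartered (Aurora capacity validation at 4000 TPS across 7 regulated markets).

What these cases provide is complex business flows and a pre-event validation cadence. When the JMeter page cites a case, it should turn the case into thread groups, ramp-up, data sets, dependency samplers, and result artifacts, and tie the load numbers back to business flow interpretation. For example, Hotstar's "CDN pressure concentrated in particular regions" should be simulated in JMeter with per-region thread groups, not by cramming global traffic into a single runner.

Next steps

Gatling

Fri, 15 May 2026 00:00:00 +0000

Gatling's core job is to turn complex user flows into maintainable JVM simulations. It suits JVM teams, strongly typed DSLs, scenarios over HTTP / WebSocket / JMS / MQTT, and load test processes that need injection profiles, assertions, reports, and the CI pipeline bound together.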

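Service positioning

Gatling is a load testing tool of Scala origin that is now mainly used through its Java DSL. It runs on the JVM, and its async / non-blocking engine (based on Akka / Netty) lets a single injector node drive high RPS. k6, JMeter, and Locust can all generate load as well; Gatling's core differences from them lie in language ecosystem, engine efficiency, and scenario expressiveness:

vs k6: k6 uses a Go runtime with JavaScript scripting and fits CLI / Grafana ecosystems; Gatling uses the JVM with a Java/Scala/Kotlin DSL and suits existing JVM toolchains and strongly typed review
vs JMeter: JMeter uses a GUI / XML test plan and suits collaboration with non-engineering roles; Gatling is code-first and suits PR / build pipeline / refactoring workflows
vs Locust: Locust uses Python coroutines with high scripting freedom; Gatling uses a DSL with injection profiles and gives scenarios more structure
Engine efficiency: the async / non-blocking model lets Gatling push tens of thousands of RPS from a single machine, while JMeter's thread-per-user model gets lower throughput on the same resources

The product line has two layers: Gatling OSS (open-source simulation runner + HTML report) and Gatling Enterprise (formerly FrontLine, which adds distributed injectors, cluster orchestration, live monitoring, long-term result storage, and role-based access). OSS suits single-machine baselines and CI smoke tests; Enterprise suits cross-region distributed runs, large pre-event load tests, and long-term result governance.

Shortest path to a verdict

To judge whether Gatling is healthy within the load testing process, check at least four things:

Scala DSL vs Java DSL version: Gatling 3.7+ (2022) officially added the Java DSL, and since 2024 most new projects use it; old Scala simulations still run, but the team must decide whether to stay on Scala or gradually rewrite in Java, to avoid governing two languages
Injection profile design: does the simulation clearly distinguish the open model (rampUsersPerSec / constantUsersPerSec for real arrivals, plus one-shot steps such as atOnceUsers / rampUsers) from the closed model (constantConcurrentUsers / rampConcurrentUsers for a fixed pool of concurrent users), matching the traffic shape from 9.2 Workload Modeling
Assertion gate: does the simulation have a hard gate such as setUp(...).assertions(global.responseTime.percentile3.lt(500)) (percentile3 is the 95th percentile under the default gatling.conf), so that CI fails the build when the run ends; a simulation without assertions is just a load test, not a release gate
Enterprise vs OSS boundary: is it clear which capabilities only Enterprise provides (distributed injectors / multi-region / long-term result storage / live dashboard), to avoid cobbling Enterprise-grade needs together from OSS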
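When reviewing an open-model profile, it helps to know how many virtual users each step will inject, for example to check that a queue feeder has enough records. The sketch below is plain arithmetic, not Gatling code; the helper names and the example profile are hypothetical, and Gatling may round per-second injection slightly differently.

```javascript
// Approximate number of users injected by Gatling open-model steps (durations in seconds).
const injected = {
  atOnceUsers: (n) => n,
  rampUsers: (n) => n, // n users spread evenly over the duration
  constantUsersPerSec: (rate, sec) => rate * sec,
  rampUsersPerSec: (from, to, sec) => ((from + to) / 2) * sec, // linear ramp: average rate × time
};

// Example profile: ramp 10 -> 50 users/s over 60 s, then hold 50 users/s for 300 s.
const profile = [
  injected.rampUsersPerSec(10, 50, 60),
  injected.constantUsersPerSec(50, 300),
];
const totalUsers = profile.reduce((s, v) => s + v, 0);
console.log('users injected:', totalUsers);
```

Closed-model steps (constantConcurrentUsers / rampConcurrentUsers) have no fixed total, because new users start only as others finish; their arrivals depend on response time. Gatling's default feeder strategy (queue) stops the run when records run out, so a total like the one above is worth comparing against the feeder size before a long test.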

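Positioning

Gatling suits code-first teams with strong JVM skills. When the workload model needs multi-step flows, data feeders, conditional branches, session state, and an explicit injection profile, Gatling can encode these behaviors in a simulation as an engineering artifact.

This positioning connects Gatling to 9.2 Workload Modeling and 9.4 Saturation Discovery. Its value is writing the traffic shape into the injection profile, so ramp-up, constant users, stress peaks, and soak tests can all be versioned.

Use cases

JVM teams are a natural fit for Gatling. Java, Scala, or Kotlin teams can review simulations like ordinary code and maintain them with their existing build, dependency, CI, and artifact processes.

Complex scenarios are well expressed in Gatling. Multi-step flows such as login, search, add to cart, checkout, payment mock, and order query can manage their data with sessions and feeders.

High-quality reports suit release review. Gatling's report shows reviewers the response time distribution, request groups, errors, and the injection profile, which makes it good for keeping readable evidence in a release gate.

Selection criteria

Criterion	Value of Gatling	Capabilities to add
JVM DSL	Simulations can be code reviewed	Scala / Java / Kotlin maintenance skills
Injection profile	Load stages can be expressed precisely	Calibration against production traffic shape
Session / feeder	Multi-step data and state are easy to manage	Test data governance and sensitive data masking
Report	Highly readable for release review	Long-term trend storage and cross-run comparison

The value of the JVM DSL comes from maintainability. If load test scenarios need long-term review, refactoring, helper extraction, or a build pipeline, Gatling's code-first workflow suits engineering teams better than a GUI test plan.

The value of injection profiles comes from precise load shapes. Teams can put steady load, spikes, ramps, the open model, and the closed model into a simulation, which makes knee-point interpretation in 9.4 Saturation Discovery more reproducible.

Trade-offs against other tools

The main difference between Gatling and k6 is language and ecosystem. Gatling suits JVM teams and strongly typed simulations; k6 suits JavaScript-style scripting, CLI workflows, and the Grafana ecosystem.

The main difference between Gatling and JMeter is the maintenance model. Gatling leans toward code review, build pipelines, and simulation abstractions; JMeter leans toward GUI, plugins, and cross-role test assets.

The main difference between Gatling and Locust is the customization language. Locust suits Python teams and arbitrary Python clients; Gatling suits JVM teams and structured load testing built around reports and injection profiles.

The main difference between Gatling and Vegeta is scenario depth. Vegeta suits quick HTTP pressure tests; Gatling suits long-lived tests that need sessions, feeders, assertions, and multiple request groups.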

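Operating cost

Gatling's main cost is JVM capability on the team. Non-JVM teams bear the learning cost of the language, build tools, dependencies, and simulation patterns; that cost only pays off when scenarios are complex enough.

Test data cost comes from feeders and sessions. Multi-step flows need account, cart, order, token, region, and tenant data, and stale data or a skewed distribution distorts load test results.

Enterprise / distributed cost should be assessed early. Single-machine Gatling suits small to medium baselines; cross-region runs, large pre-event validation, or long soak tests need a runner topology, centralized results, and cloud cost governance.

Evidence Package

Gatling results should be written back to the evidence package. The minimum fields are simulation version, injection profile, feeder source, target environment, assertions, response time distribution, error rate, throughput, target service saturation metrics, known gaps, and owner.

Field	Gatling evidence source
Source	Simulation code, HTML report, dashboard link
Time range	Test start / end
Query link	APM / metrics / logs query links
Data quality	Feeder freshness, scenario coverage
Confidence	Production similarity, runner capacity
Known gap	Uncovered flows, data bias, limits of downstream mocks

The core purpose of the evidence package is to make a simulation replayable. Reviewers must be able to trace the report back to the injection profile, scenario code, feeders, and target environment; only then can they judge whether a load test result is a capacity signal or an artifact of test design.

Core trade-off table

Dimension	Gatling	k6	JMeter	Locust
Language / DSL	Java / Kotlin / Scala DSL (JVM)	JavaScript (Go runtime)	GUI / XML test plan (JVM)	Python (coroutines / gevent)
Engine model	Async / non-blocking (Akka + Netty)	Async (Go goroutines)	Thread-per-user (synchronous)	Async coroutines
Single-machine RPS ceiling	High (tens of thousands of RPS)	High (tens of thousands of RPS)	Medium (thread overhead)	Medium (GIL + coroutines)
Scenario expressiveness	Strong (session / feeder / conditional branches built in)	Medium (hand-written JS functions)	Medium (GUI drag-and-drop + listeners)	Medium (Python classes + tasks)
Report quality	High (built-in HTML report with detailed distribution / groups)	Medium (CLI summary + Grafana integration)	Medium (HTML dashboard generated from JTL; GUI listeners unsuited to headless runs)	Medium (live web UI, no history)
CI integration	Strong (Maven / Gradle / sbt + assertion gate)	Strong (CLI + JSON output)	Medium (CLI mode works, but GUI-first)	Strong (CLI + Python ecosystem)
Distributed	OSS self-built / Enterprise built in	k6 Cloud / OSS self-built	Self-built (master-slave)	Self-built (master-worker)
Commercial edition	Gatling Enterprise (formerly FrontLine)	Grafana Cloud k6	None (pure OSS)	None (pure OSS)
Best fit	JVM teams, complex scenarios, release gates, high RPS efficiency	Full-stack teams, CLI workflows, Grafana ecosystem	Cross-role teams, legacy test plans, diverse protocols	Python teams, custom clients, lightweight setup

The core reasons to choose Gatling: a JVM team, complex scenarios (sessions / feeders / multiple groups), high single-machine RPS efficiency, and HTML reports as release gate evidence. Since 2024 the Java DSL has lowered the Scala learning barrier, so Java/Kotlin backend teams no longer need to adopt Scala just for load testing.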

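Advanced topics

Gatling Enterprise (formerly FrontLine): the commercial edition adds distributed injector clusters (pushing large loads across regions / clouds), a live monitoring dashboard (real-time RPS / response time trends without waiting for the simulation to finish and reading the HTML), long-term result storage (cross-run comparison, retention policies), and role-based access (different permissions for QA / dev / SRE). For teams that only run single-machine baselines, OSS is enough; for Black Friday or Spring Festival Gala-scale pre-event load tests, or simultaneous pressure from multiple regions, you need Enterprise or a self-built distributed topology.

Java DSL replacing Scala as the mainstream (2022-2024): Gatling 3.7 (2022) officially released the Java DSL, the 3.9+ docs present Java / Kotlin / Scala side by side, and since 2024 most new tutorials use Java. This lowers onboarding cost for Java backend teams, but note that the Scala syntax is not backward compatible from Gatling 2.x to 3.x (scenario builder, HTTP config, and feed usage were all rewritten), so upgrading an old simulation amounts to rewriting it.

Distributed execution (OSS): OSS has no built-in cluster orchestration and relies on multiple injectors plus result aggregation. Each injector runs the same simulation (with the user count split among them); afterwards the simulation.log files are collected in one place and gatling.sh is rerun for the report stage only. Common stopgaps are Kubernetes Jobs with a shared PVC, or going straight to Gatling Enterprise.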
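Because every OSS injector runs the same simulation, the load must be divided explicitly. A minimal sketch of an even split that keeps the total exact; how each injector receives its share (for example a hypothetical `-Dusers=...` system property read by the simulation) is an assumption, not a Gatling built-in.

```javascript
// Split a total user count (or per-second rate) across N injectors running the
// same simulation, giving the remainder to the first injectors so the sum is exact.
function splitUsers(totalUsers, injectors) {
  const base = Math.floor(totalUsers / injectors);
  const remainder = totalUsers % injectors;
  return Array.from({ length: injectors }, (_, i) => base + (i < remainder ? 1 : 0));
}

const shares = splitUsers(1000, 3);
// Hypothetical launch lines; the simulation would have to read the "users" property itself.
shares.forEach((users, i) => console.log(`injector-${i + 1}: -Dusers=${users}`));
```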

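HTML report and release gate: when a simulation finishes, an HTML report is generated automatically, including the response time percentile distribution (mean / p50 / p95 / p99 / max), a per-request-group breakdown, active users over time, and the error log. The standard release gate practice: the CI job runs the simulation, a failed assertion gate breaks the build immediately, and the HTML report is stored as a build artifact for reviewers, in line with Evidence Package governance.

CI integration patterns: Jenkins / GitLab CI / GitHub Actions all use mvn gatling:test / gradle gatlingRun / sbt gatling:test as the entry point. CI defines a baseline simulation (run on every PR to catch regressions) and a release simulation (run on the release branch or nightly, as a long soak). When load testing in a staging environment, isolate noise sources (other QA traffic / cron jobs), otherwise the RPS numbers get polluted.

Troubleshooting and quick failure diagnosis

Scala learning curve slows progress: nobody on the team knows Scala and people get stuck on implicits / case classes / pattern matching; switch to the Java DSL (3.7+) or Kotlin DSL to keep Gatling's expressiveness without the Scala learning cost
Simulations all red after upgrading Gatling 2.x to 3.x: bootstrap import paths / scenario builder API / feed syntax all changed; start new projects directly on 3.x and keep old projects on 2.x in parallel, or schedule a dedicated sprint for the rewrite, instead of discovering breakages mid-run
JVM heap OOM / GC pauses slow RPS: the default heap is insufficient at high RPS and young-generation GC runs frequently; set -Xmx4G -Xms4G, use G1GC / ZGC, and monitor the injector's GC log and CPU, not just the target service
Wrong injection profile leads to misjudged saturation: atOnceUsers(1000) injects a single burst of 1000 users (an open-model step), but real traffic is a continuous open arrival, so the knee point is found in the wrong place; check the production traffic shape, use constantUsersPerSec / rampUsersPerSec for continuous arrivals, and reserve constantConcurrentUsers / rampConcurrentUsers for a genuinely fixed user pool (closed model)
A single injector node hits a client-side bottleneck: injector CPU / network / file descriptors / source ports run out, and what looks like target saturation is actually injector saturation; monitor injector resources and scale out to distributed injectors or use Enterprise
Stale feeder data / skewed distribution: the same users.csv is replayed over and over, distorting the cache hit rate, and cache-miss paths seen in production never get tested; use random / shuffle feeders, regenerate the data periodically, and cover long-tail keys
HTML report looks green but production breaks: the assertion gate only checks average response time, with no p99 / error rate, so things blow up at peak after release; assertions must explicitly set p95 / p99 and error rate thresholds, not just the mean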
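Gatling feeders take a strategy (`.queue`, the default, plus `.random`, `.shuffle`, and `.circular`). The sketch below emulates the four strategies in plain JavaScript to show which ones run out and which ones cycle through the same keys; it is an illustration of the semantics, not Gatling code, and the seeded PRNG (mulberry32) is added only so runs are reproducible.

```javascript
// Small seeded PRNG (mulberry32) so shuffled / random draws are reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Emulation of the four feeder strategies over an in-memory record list.
function makeFeeder(records, strategy, rand = Math.random) {
  let i = 0;
  let order = records;
  if (strategy === 'shuffle') {
    order = [...records];
    for (let j = order.length - 1; j > 0; j--) { // Fisher-Yates shuffle
      const k = Math.floor(rand() * (j + 1));
      [order[j], order[k]] = [order[k], order[j]];
    }
  }
  return function next() {
    switch (strategy) {
      case 'queue':   // in file order, fails when exhausted
      case 'shuffle': // shuffled once, then consumed like a queue
        if (i >= order.length) throw new Error('feeder exhausted');
        return order[i++];
      case 'circular': // wraps around: the same keys come back, warming caches
        return order[i++ % order.length];
      case 'random':   // independent draws with replacement
        return records[Math.floor(rand() * records.length)];
      default:
        throw new Error(`unknown strategy ${strategy}`);
    }
  };
}

const demoKeys = ['u1', 'u2', 'u3', 'u4'];
const circular = makeFeeder(demoKeys, 'circular');
console.log([circular(), circular(), circular(), circular(), circular()]);
```

A circular feeder over a small file is exactly the "same users.csv replayed again and again" problem: the fifth draw is `u1` again, so the target's caches stay warm in a way production traffic may not.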

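Case write-back

Gatling suits write-back of cases with multiple steps and multiple load models. It can take 9.C28 FanDuel (separate live-streaming and betting models in a bimodal workload), 9.C16 SeatGeek (waiting room token / admission flow), 9.C17 BookMyShow (ticketing flow pressure), 9.C4 DraftKings Aurora financial ledger (offset double peak: "read surge during games + write surge at payout"), and 9.C2 GR8 Tech (an injection profile with three request groups: "bets / settlement / odds updates").

The key in these cases is scenarios and injection profiles. When the Gatling page cites a case, it should break the business flow into request groups, session state, feeders, assertions, and stop conditions. For example, the DraftKings offset double peak should be written as two scenarios injected in parallel, each with its own assertion budget.

Next steps

Locust

Fri, 15 May 2026 00:00:00 +0000

Locust's core job is to express highly customized user behavior and protocol clients in Python. It suits Python teams, load tests that need custom clients or distributed workers, and processes whose scenario logic is more complex than the samplers a tool has built in.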

服務定位

Locust 適合把壓測寫成一般 Python 程式。當 workload model 需要呼叫 internal SDK、特殊 protocol、複雜資料準備、狀態機、隨機行為或自訂 client、Locust 可以直接使用 Python 生態來表達。底層架構是 master + worker 分散式 swarm、worker 之間用 Gevent green-thread（非 OS thread）模擬大量並發 user、master 負責 spawn rate、aggregation 跟 Web UI。

這個定位讓 Locust 接到 9.2 Workload Modeling 與 9.5 瓶頸定位流程。它能把特殊 client 與下游 dependency 放進同一個 user behavior、但也要求團隊處理 runner、資料與可重現性。

跟 k6（JS / Go runtime）比、Locust 用 Python 換到 自訂能力與生態相容、但代價是單 worker capacity 低、CPU bound 容易先打到自己。跟 JMeter（GUI / XML）比、Locust 偏 code-first 工程團隊、scenario 直接走 Git review、不靠 GUI plugin 拼裝。跟 Gatling（Scala DSL）比、Locust 換到 Python team 友善 + 既有 domain library 重用、但失去 JVM injection profile 的精細度與報表內建。

關鍵張力：Python 表達力 ↔ runner 效能上限。Python team 想 reuse domain library、staging fixture、API client 寫壓測腳本時 Locust 是首選；但要心裡有數 單 worker RPS 上限不高、超過幾千 RPS 就要靠 worker scale-out、不是調 Locust 本身。

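Use cases

Python teams can maintain load tests in Locust over the long term. Existing domain libraries, API clients, fixtures, data generators, and validation helpers can all be reused by load-test scripts.

Custom protocols suit Locust. Beyond HTTP, if the service needs gRPC, WebSocket, a binary protocol, a message-broker client, or an in-house SDK, Locust can call the Python library directly.

Distributed load scales out through Locust workers. When a single-machine Python runner hits a CPU or connection bottleneck, master / worker mode spreads load generation across machines.

Goals of this page

After reading this page, readers should be able to judge:

Which part of the load-testing stack Locust owns (user behavior modeling / load generation / distributed swarm) and what must be attached externally (Prometheus / Grafana to observe the workers themselves, APM to see target saturation)
Ownership design for User classes, task weights, and the on_start lifecycle (who writes the locustfile, who reviews it, who tunes spawn rate)
Capacity planning for a distributed master-worker deployment (per-worker user ceiling, worker count, mapping target RPS to worker count)
When to use Locust and when to choose k6 / JMeter / Gatling instead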

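Shortest assessment path

To judge whether a Locust load test is healthy, check at least four things:

User class design: whether each HttpUser / User subclass is one clear persona (mobile user / API client / admin user), whether wait_time reflects real user intervals (not 0 to chase maximum RPS, but e.g. between(1, 5) to simulate think time), and whether user state stays encapsulated within the instance
Task mix: whether the @task(weight) values match the production traffic mix (e.g. 80% read / 15% write / 5% admin, not every endpoint weighted equally), and whether weights go through version-controlled review
on_start lifecycle: whether login / token fetch / session bootstrap lives in on_start (once per user) rather than in an @task (repeated on every task execution); in the wrong place, the auth endpoint becomes the dominant traffic
Distributed master-worker: whether there are enough workers (a single worker running thousands of users saturates its own CPU before the target service does), whether the master has its own machine (worker processes co-located on the master host compete with aggregation and the Web UI for CPU), whether --expect-workers is set, and whether worker sync drift is watched

If any of the four is missing, it is an open item against the credibility of the load-test evidence.

Day-to-day operation and decision shape

locustfile structure: locustfile.py is a Python module that defines User / HttpUser subclasses; each user has a wait_time, some @task(weight) methods, and on_start / on_stop lifecycle hooks. Run locust -f locustfile.py --host=https://target to start the Web UI, or locust --headless -u 1000 -r 100 -t 10m for UI-less mode in CI. The locustfile should go through Git review rather than being edited and run ad hoc.

Task weight / wait_time design: weights are relative, not percentages; @task(8) + @task(2) yields 80% / 20%. wait_time = between(1, 5) waits 1-5 seconds between tasks to simulate think time; to push maximum RPS use constant(0), but be aware that this is no longer a user behavior model, it is a throughput probe.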
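Because weights are relative, each task's expected share of executions is its weight divided by the sum of all weights. The sketch below is written in JavaScript to match the k6 examples elsewhere in this series (a real locustfile declares the weights in Python via @task); the task names and the 80 / 15 / 5 target mix are hypothetical. It converts a weight table into shares and flags tasks that drift from the intended production mix:

```javascript
// Convert relative @task weights into the expected fraction of task executions.
function weightShares(weights) {
  const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
  if (total <= 0) throw new Error("weights must sum to a positive number");
  const shares = {};
  for (const [name, w] of Object.entries(weights)) {
    shares[name] = w / total;
  }
  return shares;
}

// Return the tasks whose expected share differs from the target mix by more than `tolerance`.
function mixDrift(weights, targetMix, tolerance = 0.02) {
  const shares = weightShares(weights);
  return Object.keys(targetMix).filter(
    (name) => Math.abs((shares[name] ?? 0) - targetMix[name]) > tolerance
  );
}

// Hypothetical weights: @task(8) and @task(2) give an 80% / 20% split.
console.log(weightShares({ browse: 8, checkout: 2 }));
// Hypothetical review check against an 80% read / 15% write / 5% admin production mix.
console.log(mixDrift({ read: 16, write: 3, admin: 1 }, { read: 0.8, write: 0.15, admin: 0.05 }));
```

Locust picks tasks at random by weight, so these are expected values; short runs with few users will wobble around them.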

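The on_start vs @task boundary: on_start(self) runs once when each user instance starts and suits login, token fetch, cache warm-up, and fixture lookup; @task methods form the user's main behavior loop, with one task chosen per iteration. Putting login in an @task is a common mistake that turns the IdP, not the target API, into the main source of load.

Gevent-based concurrency: Locust simulates many concurrent users with gevent green threads, not OS threads. A single worker can therefore run thousands of users, but CPU-bound work (JSON serialization, encryption, local computation) blocks the whole worker's event loop. Monkey-patching must take effect before socket / ssl / time are used: Locust applies gevent.monkey.patch_all() as soon as the locust package is imported, which covers the standard library when tests start from the locust CLI, but code that imports network libraries before locust (for example a helper script reusing locustfile modules) must patch first, or blocking calls will stall the swarm.

Distributed master-worker: when a single machine reaches its limit, go distributed: locust --master starts the master, and locust --worker --master-host=master.example.com starts a worker. The master handles the Web UI, spawn-rate control, result aggregation, and stats collection; workers run the users. The master does not run users itself, and worker processes should stay off the master host too (they compete with aggregation for CPU and distort stats). Worker count: ramp a single worker to about 80% CPU to see how many users it sustains, divide the target user count by that number, and add a 20% buffer.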
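The sizing rule above is simple arithmetic. A JavaScript sketch, assuming each task issues exactly one request, so the users needed for a target RPS follow from the interactive response time law, a form of Little's law (users ≈ RPS × (mean wait time + mean response time)). usersForRps and workersNeeded are hypothetical helper names, and the per-worker capacity must come from your own single-worker measurement at about 80% CPU:

```javascript
// Users needed to offer `targetRps`, assuming one request per task and the given
// mean wait_time and mean response time (both in milliseconds).
function usersForRps(targetRps, meanWaitMs, meanResponseMs) {
  return Math.ceil((targetRps * (meanWaitMs + meanResponseMs)) / 1000);
}

// Workers needed: target users / measured users per worker, plus a safety buffer.
// Integer arithmetic keeps exact multiples from being rounded up by float error.
function workersNeeded(targetUsers, usersPerWorker, bufferPercent = 20) {
  if (usersPerWorker <= 0) throw new Error("usersPerWorker must be positive");
  return Math.ceil((targetUsers * (100 + bufferPercent)) / (100 * usersPerWorker));
}

// Example: 1000 RPS with between(1, 5) (mean wait 3000 ms) and ~200 ms responses.
const users = usersForRps(1000, 3000, 200); // 3200 users
// If one worker was measured to sustain 1000 users at ~80% CPU:
console.log(users, workersNeeded(users, 1000)); // 3200 4
```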

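Custom load shape: beyond a fixed -u 1000, Locust supports a LoadTestShape subclass that defines a load curve over time: spike test (an instant jump from 0 to 5000 users), ramp test (linear climb), wave test (periodic high / low alternation), and step test (stepwise increases). The tick() method is called about once per second and returns (user_count, spawn_rate), or None to stop the test. A custom shape is what it takes to simulate the instant shock of a ticket drop in the 9.C16 SeatGeek waiting room.

Prometheus exporter / observability: Locust's built-in stats are in-memory p50 / p95 / p99 / RPS only and vanish when the run ends. For long-term observability, use locust-prometheus-exporter (or collect CSV yourself with --csv=result, which writes result_stats.csv and related files) and feed the metrics into Prometheus + Grafana. Always observe the workers' own CPU / memory / network at the same time; otherwise you cannot tell target saturation from a dead worker.

Locust Cloud (managed SaaS): since 2024 the Locust project has offered the official Locust Cloud, which hosts the master, workers, and result storage, trading money for ops effort. Self-managed master-worker is reasonable for CI / staging; a production-grade scale test (10k+ concurrent users) needs dozens of workers per run, and using Cloud to save infra ops is a reasonable trade-off.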

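Core trade-off table

Dimension	Locust	k6	JMeter	Gatling
Script language	Python (general-purpose)	JavaScript (k6 runtime)	XML / GUI / Groovy	Scala DSL (Java / Kotlin also supported)
Runtime	Python + gevent green threads	Go-based, single binary, low overhead	JVM, heavyweight	JVM, asynchronous non-blocking engine
Per-worker capacity	Low to medium (Python overhead, thousands of users)	High (Go runtime, tens of thousands of VUs per machine)	Medium (usable after JVM tuning)	High (asynchronous engine, good performance)
Distributed mode	Built-in master-worker	Not built into OSS k6; Grafana Cloud k6 or k6-operator (Kubernetes)	Built-in master-slave (client + remote servers)	Gatling Enterprise (formerly FrontLine)
User behavior flexibility	High: plain Python, any library	Medium: JS, but constrained by the k6 runtime	Medium: GUI assembly + plugins	Medium-high: Scala DSL expresses the simulation
Custom protocols	Strong: any Python library	Strong: built-in gRPC / WebSocket, Kafka etc. via xk6 extensions	Strong but tedious: broad plugin ecosystem	Medium: mainly HTTP / WebSocket
CI / headless	`--headless` supported	CI-first design	Non-GUI mode supported	Built-in support
Report / UI	Live Web UI + CSV export	Grafana Cloud k6 / Grafana / plain stdout	GUI listeners / HTML report	Built-in HTML report, visually rich
Learning curve	Gentle (Python teams) / steep (non-Python)	Medium: JS-style scripting	Gentle (GUI) / steep (deep tuning)	Steep: Scala syntax
Best fit	Python team + custom behavior / client	DevOps + CI / standard HTTP / high RPS from one machine	Non-engineer collaboration / legacy enterprise	JVM team + fine-grained injection profiles
Exit cost	Low: Python scripts are portable	Medium: tied to the k6 runtime	Medium: XML .jmx is hard to migrate	Medium: tied to the Scala DSL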

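The core reasons to choose Locust: a Python team, custom user behavior, and reuse of existing domain libraries, plus budget for scaling out workers (per-worker capacity is low, so distribution must make up for it) and scenarios reviewed in Git rather than assembled in a GUI. For standard high-RPS HTTP load from a single machine, k6 is faster; for load tests that non-engineers collaborate on, use JMeter; for fine-grained simulation by a JVM team, use Gatling.

Advanced topics

Distributed Locust master-worker swarm: a production scale test usually needs 10-100 workers. Key points: workers should not share state, and shared resources should be handed out by the master (over the ZeroMQ message bus); users are redistributed when workers join or leave, so do not use a user index as a unique key; when workers run in other regions, measured latency includes the worker → target network path, not just time inside the target, so compare results only against runs from the same worker region.

Custom load shape (spike / wave / step): LoadTestShape.tick(self) returns a (user_count, spawn_rate) tuple and is called about once per second. Spike test: 0 users for the first 60 seconds, then a jump to 5000 in the 61st second, simulating the admission storm of the 9.C16 SeatGeek waiting room. Wave test: a sine wave oscillating between 1000 and 3000 users to test how fast autoscaling reacts. Step test: add 1000 users every 5 minutes and see at which step degradation begins. Because tick() is arbitrary Python, it can express shapes such as a continuous sine wave that k6's declarative stages can only approximate piecewise, which makes custom shapes one of Locust's strengths over k6.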
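The three shapes reduce to small functions of elapsed time. The sketch below mirrors the tick() contract (return [userCount, spawnRate], or null to stop, where Locust expects None) in JavaScript to match the other examples in this series; a real LoadTestShape subclass would implement the same logic in Python, reading elapsed time from self.get_run_time(). User counts and intervals come from the paragraph above; the spawn rates and total durations are assumptions.

```javascript
// Spike: idle for quietSec, then jump to peakUsers almost at once.
function spikeShape(t, { quietSec = 60, peakUsers = 5000, totalSec = 600 } = {}) {
  if (t >= totalSec) return null; // stop the test
  if (t < quietSec) return [0, 1];
  return [peakUsers, peakUsers]; // spawn rate = peak, so all users start within ~1 s
}

// Wave: sine oscillation between minUsers and maxUsers with the given period.
function waveShape(t, { minUsers = 1000, maxUsers = 3000, periodSec = 600, spawnRate = 100, totalSec = 3600 } = {}) {
  if (t >= totalSec) return null;
  const mid = (minUsers + maxUsers) / 2;
  const amplitude = (maxUsers - minUsers) / 2;
  return [Math.round(mid + amplitude * Math.sin((2 * Math.PI * t) / periodSec)), spawnRate];
}

// Step: add stepUsers every stepSec; stop after the maxUsers step has run its interval.
function stepShape(t, { stepUsers = 1000, stepSec = 300, maxUsers = 10000, spawnRate = 100 } = {}) {
  const users = stepUsers * (Math.floor(t / stepSec) + 1);
  return users > maxUsers ? null : [users, spawnRate];
}

// Print a few ticks of each shape.
for (const t of [0, 59, 60, 150, 300, 450]) {
  console.log(t, spikeShape(t), waveShape(t), stepShape(t));
}
```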

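Integrating a Prometheus exporter: locust-prometheus-exporter exposes Locust stats to Prometheus / Grafana for long-term baselines, cross-test comparison, and p99 regression detection. In practice the dashboard should show Locust's internal stats, worker host metrics, and target-service APM side by side; only with all three layers stacked can you tell runner saturation from target saturation.

Locust Cloud (managed SaaS): the official SaaS (2024 onward) hosts the master, workers, results, and dashboard. Trade-off: self-managed fits CI / staging / internal-network tests (Cloud cannot reach a target that runs on an internal network); Cloud fits large one-off scale tests (spin up 50 workers for 2 hours, stop when done, no infra ops of your own).

Operating cost

Locust's main costs are runner overhead and distributed governance. The Python runner's performance ceiling has to be solved by scaling out workers; load-test conclusions must check both target-service saturation and whether the workers' own CPU, connections, or network have become the bottleneck.

Script engineering cost comes from freedom. Python makes it quick to write complex behavior, but also easy to scatter test data, randomness, side effects, sleeps, and exception handling; the team needs to maintain standards for scenario structure, fixtures, logging, and artifacts.

Custom client cost comes from calibration. When using an SDK or a custom protocol client, confirm that its retry, timeout, connection pool, and serialization behavior are close to production, so the runner does not simulate a load shape that does not exist.

Troubleshooting: quick failure diagnosis

Worker CPU at 100% but the target service is idle: the Python runner died first, not the target. Add workers, or check whether a task does CPU-bound local work (large JSON parsing, encryption, local fixture generation) that starves the event loop
Gevent monkey-patch gotcha: the whole worker stalls because a library blocks, either a pure-Python client (requests, an in-house SDK) imported before patching took effect, or a C-extension driver (psycopg2, a native MySQL driver) that bypasses Python sockets. Make sure patching happens before those libraries are imported (Locust patches when the locust package is imported); for C extensions that cannot be patched, switch to a gevent-friendly client
RPS falls short of the target / looks like the target is slow: the worker's connection pool is actually exhausted, or the worker's own NIC is saturated. Observe the worker's TCP socket count (netstat ESTABLISHED) and network throughput; do not blame the target straight away
Distributed sync drift: user counts are uneven across workers and the aggregated RPS jitters. In headless mode, set --expect-workers=N so the master waits for all workers to join before starting; when workers span regions, message-bus latency also affects sync
Login in an @task instead of on_start: the auth endpoint is flooded the moment the test starts, and high IdP latency is mistaken for the target. Move login / token fetch into on_start so each user does it only once
wait_time = 0 to chase maximum RPS yields strange conclusions: this is no longer user behavior but a throughput probe, and p99 will not match production. Switch to between(1, 5) to simulate think time, or write a custom shape
Web UI lags / master CPU at 100%: worker processes share the master's machine and compete with aggregation. Run locust --master and the workers on separate machines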

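When to switch to another tool

Requirement shape	Switch to
Standard HTTP / high RPS from one machine / CI-first	k6
Non-engineer collaboration / GUI assembly	JMeter
JVM team / fine-grained injection profiles	Gatling
Minimal HTTP probe / one-shot from the command line	Vegeta
Production traffic replay / shadowing	GoReplay / Service Mesh Mirroring
Writing load-test results back into the performance-engineering lifecycle	9.5 bottleneck-localization process, 9.3 load-testing tool selection

Out of scope for this page

Full locustfile syntax reference and attribute details of User and HttpUser
Locust Cloud pricing and quota details (see the official docs)
gevent vs asyncio trade-offs (Locust chose gevent; alternatives are not discussed here)
How to archive load-test evidence (see the 9.7 evidence package conventions)

Evidence Package

Locust results should be written back to the evidence package. Minimum fields: locustfile version, user class, task weights, spawn rate, worker count, client library version, target environment, p95 / p99, error rate, throughput, target saturation metric, known gaps, and owner.

Field	Locust evidence source
Source	locustfile, CSV / JSON results, dashboard link
Time range	test start / end
Query link	APM / metrics / logs query links
Data quality	user behavior coverage, fixture freshness
Confidence	worker capacity, client realism
Known gap	worker bottleneck, custom client deviation, data skew

The core purpose of the evidence package is to separate target bottlenecks from runner bottlenecks. A distributed Locust test must record worker count, worker resources, spawn rate, and client behavior, so reviewers can tell whether the load actually reached the target service.

Case write-back

Locust is a good fit for writing back cases that need highly customized user behavior. It maps onto the betting behavior model of the 9.C28 FanDuel bimodal workload, the admission / token flow of the 9.C16 SeatGeek waiting room, the external push and downstream quota simulation of 9.C26 PayPay mobile payment messaging, the mixed player movement + interaction behavior of the 9.C8 Niantic Pokémon GO 50x surge, and the mixed meeting create / join / leave behavior of the 9.C18 Zoom COVID 30x surge.

What these cases share is domain behavior. When the Locust page cites a case, turn it into user classes, task weights, a custom client, downstream mocks, and worker capacity, then interpret total RPS under those behavioral conditions. For example, Pokémon GO player behavior differs completely from a typical web user (continuous GPS reporting + occasional interaction) and cannot be measured by HTTP RPS alone; the SeatGeek waiting room needs a LoadTestShape that simulates the instant shock of a ticket drop, not a steady-state RPS.

Next steps

Upstream: 9.2 Workload Modeling
Upstream: 9.3 load-testing tool selection
Upstream: 9.5 bottleneck-localization process
Parallel: k6, JMeter, Gatling, Vegeta
Cross-category: GoReplay (production traffic replay instead of synthetic load)
Cross-module: 4 Observability (observing both the workers themselves and the target APM)
Official: Locust documentation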

Vegeta

Fri, 15 May 2026 00:00:00 +0000

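Vegeta's core job is generating fixed-rate load against HTTP endpoints from a concise CLI, to quickly probe latency, throughput, error rate, and saturation. It suits single endpoints, small header / body variations, quick baselines, post-incident verification, and lightweight load tests on an engineer's machine or in CI.

Positioning

Vegeta is an HTTP load-testing CLI written in Go, and its core model is the constant-rate attack: ask for N requests per second and it keeps sending N rps without slowing down when the server slows down. That behaves very differently from fire-and-wait tools (hey, and wrk by default, are closed-loop). Constant rate is an open-loop model: it mimics real traffic, which does not shrink just because the service got slow, so the saturation point shows up clearly.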
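A small model makes the difference concrete. This JavaScript sketch is an idealized calculation, not Vegeta or hey code: a closed-loop tool with a fixed number of workers can only offer workers / latency requests per second, while an open-loop tool keeps arriving at the requested rate and the backlog of in-flight requests grows instead (Little's law: in-flight = rate × latency). The worker count and rates are hypothetical.

```javascript
// Closed loop: each worker waits for its response before sending the next request.
function closedLoopRps(workers, latencySec) {
  return workers / latencySec;
}

// Open loop: arrivals stay at `rate`; Little's law gives the outstanding requests.
function openLoopInFlight(rate, latencySec) {
  return rate * latencySec;
}

// As latency climbs from 50 ms to 2 s, 10 closed-loop workers back off from about
// 200 rps to about 5 rps, while an open-loop 200 rps attack keeps pushing and
// piles up about 400 in-flight requests.
for (const latencySec of [0.05, 0.5, 2]) {
  console.log(
    `latency=${latencySec}s closedLoopRps=${closedLoopRps(10, latencySec).toFixed(1)} ` +
      `openLoopInFlight=${openLoopInFlight(200, latencySec).toFixed(1)}`
  );
}
```

The closed-loop tool quietly lowers the load exactly when the server struggles, which hides the knee; the open-loop attack keeps the pressure constant, so rising latency and errors show up in the report.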

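Vegeta follows the Unix philosophy: targets are read from stdin (so a complex generator can be piped in), and binary results go to stdout (so they can be piped into vegeta report / vegeta plot / vegeta encode). This design makes Vegeta easy to wire into shell pipelines and CI scripts, and it is also why Vegeta is a poor fit for multi-step sessions.

Compared with k6: Vegeta is CLI-first with open-loop constant rate, while k6 offers JS scenarios, thresholds, and CI artifacts. Vegeta suits one-off tests like "hit this URL at 200 rps for 60 seconds"; k6 suits maintainable scenarios like "three user journeys at 40/30/30%, run on a ramp-up profile". Compared with hey: Vegeta's constant rate is truly open-loop, whereas hey's -q is a per-worker rate (when workers slow down, the overall rate drops), so Vegeta is more honest when probing saturation. Compared with wrk / wrk2: Vegeta lacks the raw single-machine performance of the LuaJIT-scripted wrk, but its binary results, vegeta plot, and targets piping fit an engineer's daily workflow better.

Goals of this page

After reading this page, readers should be able to judge:

When to use Vegeta and when to choose k6 / hey / wrk / Gatling / Locust instead
What the constant-rate attack implies by design (open-loop vs closed-loop, and why this matters for saturation discovery)
The baseline workflow built from four pieces (target file / rate / duration / report) and how it maps to the evidence package
Common pitfalls when troubleshooting: runner-side TCP socket exhaustion, the open file limit, and a disconnect between the constant rate and a target server that throttles

Where it fits

Vegeta is good at quickly answering "how does this endpoint behave at a given rate". When a team needs to find a rough knee point, verify that a fix lowers latency, or run a small performance smoke test in CI, Vegeta's CLI workflow is direct.

This positioning connects Vegeta to 9.4 Saturation Discovery and the 9.5 bottleneck-localization process. It provides a quick load probe; expressing a complex workload model afterwards usually means moving to k6, Gatling, Locust, or JMeter.

Shortest assessment path

To judge whether a Vegeta run is valid, check at least four things:

Target completeness: whether the targets file includes method / URL / headers / body and reflects the real request shape (auth header, content-type, representative payload size); missing any of these skews results away from the production environment
Rate model design: constant rate (-rate=200/s) or a ramp (several attack segments run in sequence). Constant rate suits saturation probes; a ramp-up has to be staged by a wrapper script, because Vegeta has no native ramp profile
Report interpretation: vegeta report gives mean / p50 / p95 / p99 / max latency plus success rate and throughput. Focus on the gap between p99 and max, and on whether the requested rate and the actual throughput disconnect; a disconnect means something on the server or runner side is limiting the rate
Duration vs warm-up: short runs (< 30s) pick up warm-up noise from JIT, caches, and connection pools. Baseline runs should last at least 60s and discard the first segment of results; otherwise p99 gets inflated by the first 5s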
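These report checks can be sketched over decoded per-request results. The sketch assumes results have already been exported (for example with vegeta encode -to=json) and mapped into a hypothetical { t, latencyMs, ok } shape, where t is seconds since the attack started; summarize, the 5 s warm-up, and the 95% disconnect ratio are illustrative choices, not Vegeta defaults. Percentiles use the nearest-rank method, which may differ slightly from Vegeta's own estimator.

```javascript
// Nearest-rank percentile over an ascending array.
function percentile(sorted, p) {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Summarize one attack, discarding the warm-up window first.
function summarize(results, { warmupSec, durationSec, requestedRate, disconnectRatio = 0.95 }) {
  const kept = results.filter((r) => r.t >= warmupSec);
  if (kept.length === 0) throw new Error("no results left after warm-up discard");
  const latencies = kept.map((r) => r.latencyMs).sort((a, b) => a - b);
  const windowSec = durationSec - warmupSec;
  const successes = kept.filter((r) => r.ok).length;
  const sentRate = kept.length / windowSec;
  return {
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
    max: latencies[latencies.length - 1],
    successRate: successes / kept.length,
    sentRate,
    throughput: successes / windowSec,
    // Sent rate well below the requested rate: the runner or a limiter capped the load.
    disconnected: sentRate < requestedRate * disconnectRatio,
  };
}

// Synthetic 60 s run at 10 rps: 900 ms during the first 5 s, then 10-100 ms.
const results = [];
for (let s = 0; s < 60; s++) {
  for (let k = 0; k < 10; k++) {
    results.push({ t: s + k / 10, latencyMs: s < 5 ? 900 : 10 * (k + 1), ok: true });
  }
}
console.log(summarize(results, { warmupSec: 0, durationSec: 60, requestedRate: 10 }).p95); // 900
console.log(summarize(results, { warmupSec: 5, durationSec: 60, requestedRate: 10 }).p95); // 100
```

Without the discard, the cold first 5 s alone push p95 to 900 ms even though steady-state latency never exceeds 100 ms.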

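Use cases

Single-endpoint saturation probes are Vegeta's main entry point. Engineers can apply a fixed rate to login, search, a read API, a feature-flag endpoint, or an internal health-check-style endpoint and watch when p95 / p99 and error rate start to rise.

Vegeta suits regression smoke tests. CI or pre-release can run a short fixed-rate test to confirm the hot path has not obviously regressed, then hand the fuller scenario to k6, Gatling, or Locust.

Vegeta suits post-incident fix verification. When the root cause is a query, cache miss, lock contention, or timeout on one endpoint, rerun the same request set after the fix to compare latency distributions quickly.

Selection criteria

Criterion	Vegeta's value	Capability to add elsewhere
Concise CLI	Easy to wire into local, CI, and shell workflows	Long-term reporting and standardized artifacts
Fixed rate	Clear view of the rate / latency relationship	Complex user behavior and arrival patterns
HTTP-oriented	Quick verification of API hot paths	Non-HTTP protocols and multi-step flows
Quick probe	Good for smoke tests and fix verification	Full workload models and data governance

The value of a concise CLI comes from low friction. While the problem is still being localized, engineers can quickly produce a rerunnable command and target file, get a baseline, and then decide whether a full load-testing platform is needed.

The value of a fixed rate comes from comparability. Rerunning with the same request set, rate, duration, and target environment gives a clean before / after comparison of latency distributions around a fix.

Trade-offs against other tools

The main difference between Vegeta and k6 is scenario depth. Vegeta suits fixed-rate HTTP probes; k6 suits multi-step scenarios, thresholds, CI artifacts, and browser-style flows.

The main difference between Vegeta and JMeter is tool weight. Vegeta suits a quick CLI; JMeter suits GUI work, multiple protocols, plugins, and enterprise test assets.

The main difference between Vegeta and Gatling is the long-term maintenance model. Vegeta stays simple with commands and target files; Gatling maintains complex flows and injection profiles as simulations.

The main difference between Vegeta and Locust is customization. Locust suits Python user behavior and custom clients; Vegeta suits direct load measurement of HTTP endpoints.

Operating cost

Vegeta's main cost is limited workload coverage. It tests endpoints quickly, but multi-step sessions, data dependencies, payment mocks, queue side effects, and realistic user journeys need other tools or scripts.

Artifact cost comes from command traceability. Each run should keep rate, duration, targets, headers, body, environment, version, and result files; otherwise a quick probe easily becomes a one-off observation that cannot be compared.

Runner cost is usually low, but the local machine can still become the bottleneck. At high rates, the load-generating machine may hit CPU, network, file-descriptor, or connection limits first.

Evidence Package

Vegeta results should be written back to the evidence package. Minimum fields: command, target file hash, rate, duration, workers, target environment, p95 / p99, max latency, error rate, throughput, target saturation metric, known gaps, and owner.

Field	Vegeta evidence source
Source	command, targets file, binary results, report
Time range	test start / end
Query link	APM / metrics / logs query links
Data quality	target set freshness, header / body correctness
Confidence	runner capacity, endpoint representativeness
Known gap	multi-step flows not covered, data skew, runner limits

The core purpose of the evidence package is to make quick tests comparable. Vegeta runs are usually short, which makes it even more important to keep the command and target set so the next fix verification runs under the same conditions.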

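Core trade-off table

Dimension	Vegeta	k6	hey	wrk / wrk2
Load model	Open-loop constant rate (rps does not drop as latency rises)	Closed-loop by default (VU-based executors) / open-loop with arrival-rate executors	Per-worker rate (leans closed-loop)	wrk closed-loop / wrk2 open-loop
Scenario depth	Single endpoint via piped targets; multiple endpoints need a script	JS scripts, multi-step, stages / thresholds / SLOs built in	Single URL via CLI flags	Lua scripts can express complex logic, but the idioms are steeper
Output	Binary stream + `vegeta report/plot/encode`	stdout summary + JSON + built-in dashboard	Plain-text stdout summary	Plain-text stdout summary; wrk2 reports HdrHistogram percentiles
CI integration	Wrap in shell, write your own threshold gate	Built-in thresholds / exit code, standardized CI artifacts	Simple smoke, no thresholds	Needs a custom wrapper
Learning cost	Low: a few flags and you are running	Medium: write JS scenarios	Very low: one CLI line	Medium: Lua plus HdrHistogram concepts
Best fit	Fix verification, CI smoke, saturation probes	Full load-testing platform, SLO gates, multiple scenarios	One-off ad-hoc probes	Maximum single-machine load, low-overhead measurement

The core reasons to choose Vegeta: local runs, CI smoke, fix verification, and saturation probes all need to be quick and rerunnable, with results saved for comparison, and neither a full scenario model nor a GUI report is required. If the team needs full user journeys, threshold / SLO gates, or long-term trend dashboards, go straight to k6 or Gatling.

Advanced topics

Multiple report formats: vegeta report defaults to a text summary; add -type='hist[0,10ms,50ms,100ms,500ms]' (quoted, since the brackets are shell glob characters) for a latency bucket histogram, or -type=json for machine-readable results; vegeta plot produces an HTML latency chart, and vegeta encode -to=csv converts results to CSV for spreadsheets or dashboards. A binary results file can be decoded into different formats repeatedly without rerunning the test. The standard practice for fix verification is to keep results.bin so the report can be re-rendered at any time.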
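To read a bucket histogram correctly it helps to know what each row counts. This JavaScript sketch assumes half-open buckets [b0, b1), [b1, b2), ..., with an open-ended last bucket, which is how I understand Vegeta's histogram report to group latencies; check the bucket labels printed by your Vegeta version before relying on the edge behavior. The latencies are hypothetical.

```javascript
// Count latencies into half-open buckets [b0, b1), [b1, b2), ..., [bn, +Inf).
function histogram(latenciesMs, boundsMs) {
  const counts = new Array(boundsMs.length).fill(0);
  for (const latency of latenciesMs) {
    // Walk down from the last bound to find the bucket whose lower bound <= latency.
    let i = boundsMs.length - 1;
    while (i > 0 && latency < boundsMs[i]) i--;
    counts[i]++;
  }
  return boundsMs.map((lower, i) => {
    const upper = i + 1 < boundsMs.length ? `${boundsMs[i + 1]}ms` : "+Inf";
    return { bucket: `[${lower}ms, ${upper})`, count: counts[i] };
  });
}

// Hypothetical latencies from a fix-verification run, bucketed like hist[0,10ms,50ms,100ms,500ms].
console.table(histogram([3, 12, 48, 75, 640], [0, 10, 50, 100, 500]));
```

A latency exactly on a boundary (10 ms here) lands in the bucket that starts at that boundary, so a budget of "under 10 ms" corresponds to the first row only.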

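Pipe attack workflow: Vegeta's stdin and stdout are both streams. You can pipe jq output in to generate targets dynamically (jq -r '.urls[] | "GET " + .'), run vegeta attack | tee results.bin | vegeta report to write the file and watch the summary at the same time, and compare two runs by reporting each file separately (vegeta report results-old.bin, then vegeta report results-new.bin) or by overlaying them with vegeta plot results-old.bin results-new.bin; concatenating both files into a single vegeta report merges them into one aggregate instead of comparing them. This design makes Vegeta easy to combine with incident drills and chaos-test scripts: after a fix deploys, run an attack and commit the result to git as evidence.

CI integration pattern: Vegeta has no built-in thresholds like k6, so write your own gate: vegeta report -type=json results.bin | jq '.latencies["99th"]' extracts p99 (in nanoseconds), and a bash step compares it with the budget and exits non-zero when it is over. Commit targets.txt + attack.sh + expected-budget.json to the repo and upload results.bin + plot.html as CI artifacts so the next regression can be diffed.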
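The same gate can be written as a small script instead of jq + bash. This JavaScript sketch takes an already-parsed vegeta report -type=json object; the field names (latencies["99th"] in nanoseconds, success as a 0-1 ratio, rate as the achieved requests per second) are my understanding of Vegeta's JSON report and should be checked against your version's output, and the budget object is a hypothetical stand-in for expected-budget.json.

```javascript
const NS_PER_MS = 1e6;

// Check a parsed `vegeta report -type=json` object against a latency / success / rate budget.
function checkBudget(report, budget) {
  const violations = [];
  const p99Ms = report.latencies["99th"] / NS_PER_MS;
  if (p99Ms > budget.p99Ms) {
    violations.push(`p99 ${p99Ms}ms exceeds ${budget.p99Ms}ms`);
  }
  if (report.success < budget.minSuccess) {
    violations.push(`success ratio ${report.success} below ${budget.minSuccess}`);
  }
  // A rate far below the requested rate means the runner or a limiter capped the load,
  // so the latency numbers do not describe the requested load.
  if (report.rate < budget.requestedRate * budget.minRateRatio) {
    violations.push(`achieved rate ${report.rate}/s below ${budget.minRateRatio} x ${budget.requestedRate}/s`);
  }
  return violations;
}

// A CI wrapper would print the violations and pass this value to process.exit().
function exitCodeFor(violations) {
  return violations.length > 0 ? 1 : 0;
}

// Hypothetical budget and report excerpt.
const budget = { p99Ms: 250, minSuccess: 0.99, requestedRate: 200, minRateRatio: 0.95 };
const report = { latencies: { "99th": 180e6 }, success: 0.998, rate: 199.7 };
console.log(checkBudget(report, budget), exitCodeFor(checkBudget(report, budget)));
```

Checking the achieved rate alongside p99 keeps a throttled run from passing the gate with deceptively good latency.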

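Troubleshooting: quick failure diagnosis

Requested rate disconnects from actual throughput (ask for 200 rps, get 80 rps): the runner saturated first, not the server. Check whether vegeta attack's stderr reports socket: too many open files and check ulimit -n (set at least 65535 on production load-test runners). Alternatively, the server side has throttling, a rate limit, or a connection cap that rejects requests at the TCP layer, so Vegeta never receives a complete response
TCP socket exhaustion (runner side): under the constant-rate model, a slow server makes connections pile up, and TIME_WAIT sockets exhaust the ephemeral port range. Keep -keepalive=true (the default) and set net.ipv4.tcp_tw_reuse=1, or cap connections per target host with -max-connections=N so sockets cannot pile up without bound
p99 / max latency abnormally high but invisible in server-side metrics: runner-side GC pauses, CPU steal, or network jitter contaminate the latency measurement. Move the runner into the same placement group / same AZ as the target, make sure no other process competes for the runner's CPU, and extend the duration to 5 min so outliers are diluted
Success rate is 100% but the backend is not actually under load: the targets lack the auth header or hit the LB instead of the backend, every request gets a 200 / cache hit upstream, and the server never sees the pressure. Check whether the request count in the target server's access log matches Vegeta's requested rate × duration
Short runs are unstable (the same command differs a lot between runs): the duration is too short (< 30s) and warm-up noise dominates. Run at least 60s and discard the first 5-10s; if the endpoint has lazy initialization (cache / connection pool / JIT compile), run a separate warm-up attack before the real measurement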
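Several of these failure modes reduce to back-of-the-envelope arithmetic worth doing before blaming the target. A JavaScript sketch with hypothetical helper names; the port range and the 60 s TIME_WAIT are assumed Linux defaults (check sysctl net.ipv4.ip_local_port_range on your runner), the TIME_WAIT estimate assumes the runner is the side that closes connections, and the 5% access-log tolerance is an arbitrary example:

```javascript
// Little's law: average concurrent requests (connections) ≈ arrival rate × mean latency.
function inFlight(ratePerSec, latencySec) {
  return ratePerSec * latencySec;
}

// Assumed Linux default: net.ipv4.ip_local_port_range = "32768 60999".
const EPHEMERAL_PORTS = 60999 - 32768 + 1; // 28232 ports

// Without connection reuse, each connection the runner closes sits in TIME_WAIT
// (60 s on Linux) and holds its source port for that destination IP:port.
function portsExhausted(newConnsPerSec, timeWaitSec = 60) {
  return newConnsPerSec * timeWaitSec > EPHEMERAL_PORTS;
}

// Requests the target's access log should show for one attack.
function expectedRequests(ratePerSec, durationSec) {
  return ratePerSec * durationSec;
}

// True when the access log count is within `tolerance` (fraction) of what Vegeta sent.
function accessLogMatches(logCount, ratePerSec, durationSec, tolerance = 0.05) {
  const expected = expectedRequests(ratePerSec, durationSec);
  return Math.abs(logCount - expected) <= expected * tolerance;
}

console.log(inFlight(200, 5)); // 1000 concurrent connections at 200 rps with 5 s latency
console.log(portsExhausted(500)); // true: 30000 TIME_WAIT sockets > 28232 ports
console.log(accessLogMatches(0, 200, 60)); // false: the backend never saw the 12000 requests
```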

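Case write-back

Vegeta is a good fit for writing back single-endpoint hot-path and fix-verification cases. It maps onto the sub-millisecond latency distribution analysis of 9.C3 Coinbase ultra-low latency, the p99 < 10ms lookup verification of the 9.C25 Tubi feature store, the RDB bottleneck probing of 9.C29 Lemino connection limits, the sub-millisecond cache lookup verification of 9.C6 Tinder ElastiCache, and the hot-partition probing of 9.C5 Amazon Ads DynamoDB.

What these cases share is quick localization and comparison. When the Vegeta page cites a case, turn it into endpoint, rate, duration, latency budget, target saturation metric, and runner limits. For example, Coinbase's sub-ms target requires the Vegeta runner to sit in the same placement group as the target; otherwise the runner's own network jitter eats the measurement precision.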