Siem on Tarragon

Splunk

Mon, 18 May 2026 00:00:00 +0000

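Splunk is the de facto standard for SIEM (Security Information and Event Management) and the mainstream SOC choice for large enterprises, financial institutions, and government. Cisco acquired it in 2024, and the product line continues to develop independently. Compared with Elastic Security, Datadog Security, and Google Security Operations, the differences lie in the billing model, ecosystem maturity, and depth of detection content; raw detection capability is comparable. Splunk's ingestion-based pricing is among the most expensive SIEM billing models in the industry, but its detection content and SOC tooling ecosystem are also the most mature.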

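Service positioning

Splunk's core positioning is a unified query platform for any log source, with SIEM as an application layer on top (the Splunk Enterprise Security app). The foundation is Splunk Enterprise (self-managed) or Splunk Cloud Platform (SaaS). The top-layer products are: Enterprise Security (ES), the premium SIEM app with correlation rules, Risk-Based Alerting, and ITSI integration; SOAR (formerly Phantom) for security orchestration and automated response; and UBA (User Behavior Analytics) for ML-based anomaly detection.

Compared with Elastic Security, Splunk goes deeper but costs more: SPL is more expressive than KQL / EQL, its detection content (Splunk Security Content, published as public YAML rules) has broad coverage, and the ES app's Risk-Based Alerting was an industry pioneer; but ingestion-based pricing hurts at the TB/day level. Compared with Datadog Security, Splunk is security-first, whereas Datadog Cloud SIEM is an observability platform with a security view added; Datadog suits cloud-native, mid-sized deployments, and Splunk suits enterprises that span on-prem. Compared with Google Security Operations (formerly Chronicle), Google Security Ops offers fixed pricing by data tier at massive scale, while Splunk charges per GB with progressive tiers, so at very large scale Google tends to be cheaper.

Key tension: ingestion-based billing versus detection coverage is the biggest trade-off for Splunk customers. Teams that ingest logs selectively to save money (Windows Event Log but not Linux auth logs, prod but not dev) end up unable to catch cross-source correlations of the kind seen in Storm-0558 or the Uber MFA incident. Be clear about how many detection blind spots you are willing to accept in exchange for how much budget.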

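Goals of this chapter

After reading this page, the reader can judge:

Which segments of the SOC stack Splunk covers (log aggregation / SIEM / SOAR / UBA) and which need external tools (Vault for managing service tokens, governance of IdP log sources)
Ownership design for SPL, correlation rules, and detection content (who writes, who reviews, who tunes false positives)
How to handle the ingestion pricing trap (log priority tiering, Cribl Stream as a pre-filter, Splunk SmartStore to move cold data to S3)
When to use Splunk and when to choose Elastic / Datadog / Google Security Ops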

Shortest assessment path

To judge whether a Splunk deployment is healthy, check at least four things:

Who can change correlation rules: the headcount of Splunk admins / ES admins / KV store admins; whether SPL searches and saved searches are under version control (Git → git-fusion / Splunk Cloud Versioned Configs); whether rule changes go through PR review
Ingestion governance: which sources enter Splunk (IdP audit log / cloud control plane log / endpoint log / network log / app log); whether log priority tiers exist (critical / standard / archive); whether Cribl Stream sits in front for pre-filtering / routing
Detection content coverage: how much of Splunk Security Content (the public YAML rule library) is enabled; whether it is mapped to MITRE ATT&CK; whether in-house custom rules cover organization-specific anti-patterns
Alert quality / SOC handoff: alert volume per day, SOC analyst triage time, false positive rate, whether low-risk alerts are handled automatically by SOAR playbooks, and whether routing to module 8 (incident response) is defined

If any of the four is missing, it is an open item at the Detection Coverage and Signal Governance boundary.

Daily operations and decision shapes

Ingestion architecture: logs reach Splunk through three paths: Universal Forwarder / Heavy Forwarder (agent-based, on self-managed hosts), HTTP Event Collector (HEC) (logs pushed to an HTTP endpoint; the default for SaaS / serverless workloads), and Splunk Add-ons for individual cloud / SaaS platforms (cloud-native log pull). Production usually mixes them: Universal Forwarder for endpoints, Add-ons for the cloud control plane (AWS / GCP / Azure / Okta), and HEC for in-house apps. Placing Cribl Stream in front for routing / filtering / sampling is a standard complement in large deployments.

SPL (Search Processing Language): chains commands with a Unix-pipe-style | (index=ids sourcetype=auth | stats count by user | where count > 100); expressive, but with a steep learning curve. SPL is a first-class concept, more than a query tool: a saved search becomes a correlation rule, a scheduled search becomes an alert, and an accelerated search backs data model acceleration. The quality of the SPL directly determines both detection rule quality and query cost.

Correlation rule / Notable Event: the ES app turns high-confidence findings into Notable Events that land in the Incident Review queue. The anti-pattern is the single-event alert (alerting on every single SSH brute-force attempt, so SOC analysts wade through 10,000 meaningless alerts a day). A production rule should combine time-bounded aggregation (100 brute-force attempts from the same IP within the past 5 minutes) with cross-source correlation (the brute-force IP also appears in cloud control plane access).
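The two ingredients of a production rule named above, time-bounded aggregation and cross-source correlation, can be sketched outside of SPL. This is a minimal illustration in Python, not Splunk's implementation: the function names, the `(timestamp, ip)` input shape, and the default window and threshold are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # assumed aggregation window
THRESHOLD = 100                 # assumed failures-per-window threshold


def brute_force_ips(auth_failures, window=WINDOW, threshold=THRESHOLD):
    """Return source IPs with at least `threshold` failures inside any sliding window.

    `auth_failures` is an iterable of (timestamp, source_ip) tuples (hypothetical shape).
    """
    by_ip = defaultdict(list)
    for ts, ip in auth_failures:
        by_ip[ip].append(ts)

    flagged = set()
    for ip, stamps in by_ip.items():
        stamps.sort()
        start = 0
        for end, ts in enumerate(stamps):
            # Shrink the window from the left until it spans at most `window`.
            while ts - stamps[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(ip)
                break
    return flagged


def correlate(auth_failures, control_plane_ips, **kwargs):
    """Cross-source step: keep only brute-force IPs also seen in control-plane access."""
    return brute_force_ips(auth_failures, **kwargs) & set(control_plane_ips)
```

Only IPs that pass both steps would become a Notable Event in this sketch; a single failed login never alerts on its own.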

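Detection content lifecycle: Splunk Security Content is an OSS detection rule library maintained by Splunk, in YAML format and mapped to MITRE ATT&CK. Organizations usually import the whole baseline first, then selectively disable noisy rules and add organization-specific ones. Rule changes go through PR review and run in a staging tenant for 24-48 hours to observe the false positive curve before promotion to production. This follows the principles of the Detection Engineering Lifecycle chapter.

Risk-Based Alerting (RBA): introduced in ES app 7.0+, it accumulates a risk score per user / asset (instead of alerting on each finding) and alerts only once the score crosses a threshold. It is an engineering approach to alert fatigue: five low-confidence signals that together exceed the threshold are closer to a real attack pattern than a single high-confidence alert. See Alert Fatigue and Signal Quality.

SOAR integration: Splunk SOAR (formerly Phantom) receives alerts and runs playbooks automatically, for example rotating a leaked credential (via the Vault API), adding a suspect IP to a firewall block (via a Cloudflare WAF custom rule), or forcing MFA re-enrollment for a suspect user (via the Okta API). Playbooks belong in version control with regular dry-runs; they must not run as black-box, fire-and-forget automation in production.

Ingestion pricing governance: Splunk bills by ingestion volume (GB/day), and TB-scale deployments cost on the order of ten million US dollars a year. In practice: tier 1 logs (IdP / cloud control plane / payment processor / DB audit) go to the Splunk hot index; tier 2 logs (app logs / web access logs) enter Splunk after sampling / filtering; tier 3 logs (debug / verbose) go to cold storage on S3 / GCS via SmartStore, or bypass Splunk and go straight to Elastic or a data lake. Pre-filtering with Cribl Stream before data reaches the indexers is a standard industry practice and can cut ingestion cost by 30-50%.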
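The tiering decision described above is essentially a routing function that runs before the data is billed. The sketch below shows that shape in Python. It is not a Cribl Stream or Splunk configuration: the source names, the destination labels (`splunk_hot`, `archive`), the event fields, and the hash-based sampling are all assumptions chosen for illustration.

```python
import zlib

# Hypothetical source labels; a real deployment would use its own sourcetype names.
TIER1_SOURCES = {"idp", "cloud_control_plane", "payment", "db_audit"}
TIER2_SOURCES = {"app", "web_access"}


def route(event, tier2_sample_pct=20):
    """Pick a destination for one event based on its log priority tier.

    `event` is a dict with at least "source" and "id" keys (assumed shape).
    """
    source = event["source"]
    if source in TIER1_SOURCES:
        # Tier 1 is never sampled: these sources feed cross-source correlation.
        return "splunk_hot"
    if source in TIER2_SOURCES:
        # Deterministic sampling: the same event id always gets the same decision.
        bucket = zlib.crc32(event["id"].encode("utf-8")) % 100
        return "splunk_hot" if bucket < tier2_sample_pct else "archive"
    # Tier 3 (debug / verbose) and unknown sources go to cold storage.
    return "archive"
```

Hashing the event id instead of calling a random number generator keeps the decision reproducible, which makes it easier to explain after an incident why a given event was or was not indexed.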

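SmartStore and hot/cold separation: SmartStore moves the indexers' warm + cold buckets to S3 / Azure Blob / GCS, while the indexers keep only hot data plus a cache. The point is that retention can extend from months to years without cost growing linearly. Almost every production deployment should enable it; not doing so means paying for EBS every year.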

Core trade-off table

Trade-off dimension	Splunk	Elastic Security	Datadog Security	Google Security Operations
Billing model	Ingestion-based (GB/day, progressive)	Resource-based (node / cluster size)	Per-host + per-event (events/month)	Fixed price by data tier (cost-effective at PB scale)
Learning curve	Steep: SPL is expressive but idiom-heavy	Medium: KQL / EQL are more intuitive	Gentle: reuses Datadog observability syntax	Medium: YARA-L is a new syntax but clearly structured
Deployment model	Self-hosted (Splunk Enterprise) / SaaS (Cloud)	Self-hosted / Elastic Cloud / Serverless	SaaS only	SaaS only (Google Cloud)
Detection content	Splunk Security Content (richest, active community)	Elastic Prebuilt rules + Sigma support	Datadog Security Rules (moderate)	Built-in Google YARA-L rules + Google threat intel
SOAR / Response	Splunk SOAR (formerly Phantom, industry pioneer)	Built-in Cases + Endpoint response (Elastic Defend)	Workflow Automation (basic)	Built-in SOAR (formerly Siemplify)
Cross-source correlation	Strong: backed by data models + SPL	Strong: EQL sequence + Lucene	Medium: log + metrics + trace on the same plane	Strong: UDM normalization + cross-tenant
Multi-cloud	Strong: Add-ons cover the three major clouds	Strong: Beats / Agent across clouds	Strong: Datadog Agent across clouds	GCP-first; cross-cloud relies on Forwarder
Best fit	Enterprise spanning on-prem / multi-cloud, budget permitting	OSS-friendly, mid-to-large, Elastic stack already in use	Cloud-native, Datadog observability already in use	Very large-scale ingestion, Google Cloud + multi-cloud SOC
Exit cost	High: large volume of SPL / detection content / dashboards	Medium: Sigma / Lucene are relatively portable	Medium	Medium

Core reasons to choose Splunk: enterprise scale, an on-prem footprint, a mature detection content and SOC tooling ecosystem, the budget to match (licenses on the order of ten million US dollars, plus Cribl pre-filtering and SmartStore cold-storage governance), and a SOC team that maintains correlation rules and SOAR playbooks. Mid-sized cloud-native organizations get better value going straight to Datadog or Google Security Ops.

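Advanced topics

Risk-Based Alerting in the Enterprise Security app: RBA changes "event → alert" into "event → risk score → accumulation → alert", an engineering answer to alert fatigue. Implementation requires decisions on the risk decay window (how long before a risk score decays), risk attribution (how to split risk among multiple users on the same EC2 instance), and per-asset versus per-user thresholds. Pair this with the lesson from Uber 2022 MFA Fatigue: a single MFA failure should not alert, but 50 failures within 5 minutes plus a new device plus anomalous geography is high risk.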
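The "event → risk score → accumulation → alert" flow and the decay-window decision can be made concrete with a small model. The sketch below is an illustration only and does not reflect how the ES app stores or computes risk: the `RiskLedger` class, the exponential half-life decay, and the numeric defaults are assumptions.

```python
class RiskLedger:
    """Accumulate per-entity risk with exponential decay (illustrative model)."""

    def __init__(self, threshold=100.0, half_life_s=3600.0):
        self.threshold = threshold        # score at which an alert fires
        self.half_life_s = half_life_s    # assumed decay: score halves every hour
        self._scores = {}                 # entity -> (score, last_update_seconds)

    def current(self, entity, now):
        """Return the entity's score after applying decay up to `now`."""
        score, last = self._scores.get(entity, (0.0, now))
        return score * 0.5 ** ((now - last) / self.half_life_s)

    def add(self, entity, risk, now):
        """Add a risk contribution; return True when the threshold is crossed."""
        score = self.current(entity, now) + risk
        self._scores[entity] = (score, now)
        return score >= self.threshold
```

With these defaults, three moderate signals within two minutes cross the threshold, while the same contributions spread over several hours decay away before they add up. Tuning the half-life is the code-level equivalent of the risk decay window decision above.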

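Common Information Model (CIM) + Data Model: Splunk CIM normalizes fields from different sources into a unified schema (data models such as authentication / network_traffic / web). The point is that SPL is written once across sources, with no separate version for Okta logs, Azure AD logs, and CrowdStrike logs. Add-ons apply the CIM mapping automatically; an organization that onboards a custom source has to define the CIM mapping itself.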
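The idea behind CIM, mapping vendor-specific field names onto one schema so that a detection is written once, can be sketched as a lookup table plus one query function. This is a conceptual illustration, not Splunk's CIM implementation: the flattened input keys, the `custom_app` source, and the target field names `user`, `src`, and `action` are assumptions for the example.

```python
from collections import Counter

# Per-source field mappings (illustrative). Keys are vendor field names on a
# flattened event; values are the normalized field names the detection reads.
FIELD_MAPS = {
    "okta": {
        "actor.alternateId": "user",
        "client.ipAddress": "src",
        "outcome.result": "action",
    },
    "custom_app": {  # hypothetical in-house source that needs its own mapping
        "login_name": "user",
        "remote_addr": "src",
        "result": "action",
    },
}


def normalize(source, raw):
    """Map one raw event onto the shared schema for its source."""
    mapping = FIELD_MAPS[source]
    event = {target: raw[field] for field, target in mapping.items() if field in raw}
    if "action" in event:
        event["action"] = str(event["action"]).lower()
    event["sourcetype"] = source
    return event


def failed_logins_by_user(events):
    """One detection over normalized events, independent of the original source."""
    return dict(Counter(e["user"] for e in events if e.get("action") == "failure"))
```

Onboarding a new source in this sketch means adding one entry to `FIELD_MAPS`; `failed_logins_by_user` does not change, which is the property the CIM paragraph describes.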

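Multi-tenant deployment: when an MSSP or a large group with multiple business units shares one Splunk deployment, isolation uses three layers: index (data isolation), role / capability (access isolation), and App (dashboard / search isolation). Note that Splunk admin is a highly privileged role in cross-tenant settings and should go through a break-glass process with audit.

Cisco integration (2024+): since the acquisition, integration between Splunk and Cisco XDR / Talos threat intel / Cisco Secure Endpoint has accelerated. For Cisco-heavy environments this means more ecosystem consistency; for non-Cisco environments the impact is limited for now, although the long-term roadmap will include Cisco-specific value-adds.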

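Troubleshooting and quick failure diagnosis

Alert volume explodes / SOC cannot keep up: correlation rules written as single-event alerts, or the false positive baseline never tuned. Switch to risk-based alerting with RBA, and run rules in a staging tenant for 48 hours before promotion
Detection coverage gaps discovered only during an incident: critical log sources were kept out of Splunk (to save money). Restore tier 1 log priority, and use Cribl Stream to sample tier 2 / 3 instead of skipping ingestion entirely
Ingestion cost spikes: new sources added without review, debug logs sent straight into Splunk. Put Cribl Stream in front, alert on the license usage dashboard, and set indexer ingestion quotas
SPL search slow / search head stalls: full-fidelity search over 1 TB of raw events without data model acceleration. Use accelerated data models, restrict the time range, and use tstats instead of stats
Correlation rule produces many false positives: the rule is too broad and environment-specific noise was never tuned. Run it in a staging tenant for a week to measure FPs, tune the threshold, and add a lookup table that excludes known legitimate sources
SOAR playbook runs as black-box fire-and-forget: an automatic account disable ends up locking out the CEO. Put high-impact actions behind an approval gate and default to containment, not deletion
Too many Splunk admins / no break-glass: daily operations use an admin token, so the blast radius of an admin compromise is too large. Reduce the admin role, switch to power user plus specific capabilities, and route break-glass through Vault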

When to choose another service

Requirement shape	Choose instead
OSS-friendly / budget-sensitive	Elastic Security
Cloud-native + observability already in use	Datadog Security
Very large-scale ingestion + Google Cloud	Google Security Operations
DLP / sensitive data discovery	Google DLP / Microsoft Purview
Primarily endpoint detection	CrowdStrike Falcon / Microsoft Defender for Endpoint
Pre-filter / log routing	Cribl Stream (a forwarding layer in front of the SIEM, not a SIEM replacement)
Incident routing	Module 8 incident response vendor list

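Topics not covered on this page

Full SPL syntax reference, advanced use of saved searches and macros
Detailed feature comparison of Splunk Cloud Platform versus Splunk Enterprise
Splunk Observability Cloud (from the SignalFx acquisition; competes directly with Datadog; belongs to observability, not security)
ITSI (IT Service Intelligence): belongs to ITSM / observability, outside the security scope
Concrete SOAR playbook implementation (Phantom Python SDK)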

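Case cross-references

The module 07 case library has no vendor-level incident involving Splunk directly, but every detection-related case serves as a reference point for SIEM detection coverage:

Case	Relationship to Splunk (lesson by comparison)
Uber 2022 MFA Fatigue	MFA request density should be a first-class signal for Splunk correlation rules: alert directly when the count in a 5-minute window exceeds N, and use RBA to raise the risk score of the affected user
Microsoft Storm-0558 Signing Key Chain	Detecting anomalous cross-tenant token validation requires ingesting both the Splunk Add-on for Azure AD data and cloud control plane logs; only cross-source correlation enables detection within seconds
Snowflake 2024 Credential Abuse	A composite correlation rule over data platform query volume, cross-schema scans, and anomalous source IPs; look beyond audit logs and correlate query metrics as well
SolarWinds 2020 Sunburst	A binary that passes signature verification but behaves abnormally at runtime needs endpoint log + network log correlation, not IoC-only rules
Detection Engineering Lifecycle (section)	Splunk Security Content plus in-house custom rules follow an engineering lifecycle of propose → staging tune → promote → review, not direct edits in the console
Alert Fatigue and Signal Quality (section)	RBA is an engineering answer to alert fatigue, not a way to "ignore low risk"; it needs a risk decay and threshold tuning lifecycle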

Next-step routing

Upstream: 7.13 Detection Coverage and Signal Governance, Detection Engineering Lifecycle
Parallel: Elastic Security, Datadog Security, Google Security Operations
Downstream: Google DLP / Microsoft Purview (DLP signals into Splunk)
Cross-category: Okta (IdP log source), HashiCorp Vault (API called from SOAR playbooks), Cloudflare WAF (WAF logs + auto-block)
Cross-module: module 8 incident response vendor list (Notable Event → IR routing), module 4 observability (shared log pipeline)
Official: Splunk Documentation

Elastic Security

Mon, 18 May 2026 00:00:00 +0000

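Elastic Security is the SIEM + EDR + Cloud Security suite on the Elastic Stack (Elasticsearch + Kibana + Beats / Agent); it has OSS origins and is now a Solution in Elastic's commercial offering. Compared with Splunk, Datadog Security, and Google Security Operations, the differences lie in the billing model, the query language model, and ecosystem openness; raw detection capability is comparable. Elastic uses resource-based pricing (by cluster size, not ingestion volume) and offers four complementary query languages: KQL / EQL / Lucene / ES|QL.

Service positioning

Elastic Security's core positioning is a security solution on the Elastic Stack. The foundation is Elasticsearch (data layer) + Kibana (query and UI layer) + Fleet / Elastic Agent (collection layer). The top layer has three product lines: Elastic SIEM (log aggregation + detection rules + Cases + Timeline), Elastic Defend (from the Endgame acquisition; EDR + endpoint protection, in the same tier as CrowdStrike / SentinelOne), and Elastic Cloud Security (CSPM + CWP, covering cloud resource misconfiguration and workload protection).

Compared with Splunk, Elastic is OSS-friendly with resource-based pricing: TB-scale ingestion does not raise fees directly (nodes must scale, but the marginal cost is far below Splunk's progressive per-GB pricing), and the Sigma community's 5000+ rules can be converted and imported; however, Splunk Security Content, SOAR, RBA, and the rest of its detection content and SOC tooling remain considerably more mature. Compared with Datadog Security, Elastic spans on-prem and multi-cloud and can be self-managed or run as Elastic Cloud SaaS; Datadog is SaaS-only and suits purely cloud-native environments. Compared with Google Security Operations, Elastic has multiple query languages (KQL / EQL / Lucene / ES|QL) while Google uses YARA-L as a single unified language, and Google is cheaper for very large-scale ingestion.

Key tension: the multi-language query model is both Elastic's advantage and its burden. EQL expresses attack chain sequences more directly than SPL correlation, KQL filters quickly, ES|QL writes aggregations in an intuitive SQL-like form, and Lucene handles full-text search; but the SOC team has to decide which language each rule uses instead of letting every analyst pick their own.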

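Goals of this chapter

After reading this page, the reader can judge:

Which segments of the SOC stack Elastic Security covers (log aggregation / SIEM / EDR / CSPM) and which need external tools (Okta IdP logs, Vault secret rotation)
The division of responsibility among the four query languages KQL / EQL / Lucene / ES|QL (which is used for which kind of rule, and who trains the SOC)
Governance of resource-based pricing (cluster sizing, hot-warm-cold tiers, Searchable Snapshots, Elastic Cloud Serverless)
When to use Elastic and when to choose Splunk / Datadog / Google Security Ops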

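Shortest assessment path

To judge whether an Elastic Security deployment is healthy, check at least four things:

Who can change detection rules: rule editor permissions in the Elastic Security app; whether the detection-rules repo (Elastic's official OSS rule library) has been forked into the organization's version control; whether rule changes go through PR review and staging space validation
Collection governance: whether Fleet centrally manages Elastic Agent policies or scattered Beats (filebeat / metricbeat / auditbeat / winlogbeat) are each configured separately; whether log sources are split into hot / warm / cold tiers; whether Searchable Snapshots are enabled
Detection content coverage: how many Elastic Prebuilt rules and imported Sigma community rules are enabled; whether they are mapped to MITRE ATT&CK; how many attack chain patterns the EQL sequence rules cover
Alert quality / SOC handoff: alert volume per day; whether Cases and Timeline are part of the daily SOC workflow; whether ML anomaly jobs are running with tuned thresholds; whether routing to module 8 (incident response) is defined

If any of the four is missing, it is an open item at the Detection Coverage and Signal Governance boundary.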

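Daily operations and decision shapes

Ingestion architecture: logs reach Elastic through three main paths: Elastic Agent + Fleet (the default for modern deployments; a single agent collects system / endpoint / cloud / app logs, and a central Fleet server manages policy), Beats (dedicated agents such as filebeat / metricbeat / auditbeat / winlogbeat; the traditional approach before Fleet, still supported, with migration to Elastic Agent recommended), and Logstash (pipeline-style ETL for complex enrich / filter / route scenarios). Production usually centers on Elastic Agent + Fleet, with Logstash filling ETL gaps.

Division of responsibility among KQL / EQL / Lucene / ES|QL: each of the four query languages has its own first-class scenario. KQL (Kibana Query Language) is Kibana's default filter syntax, for example user.name : "alice" and event.action : "logon-failed"; it is simple and suits dashboard / Discover filtering. EQL (Event Query Language) does sequence pattern matching, for example sequence by user.name [authentication where event.outcome=="failure"] [authentication where event.outcome=="success" and source.geo.country != "TW"], and expresses attack chains more directly than SPL correlation. Lucene is the underlying full-text query syntax, written directly when a special case needs it. ES|QL (Elasticsearch Query Language, 2024+) is the newer SQL-like language, for example FROM logs-* | WHERE event.category == "authentication" | STATS count = COUNT(*) BY user.name, and makes aggregations intuitive; it is still new, and production adoption is catching up.

Detection rule types: Elastic Security treats six rule types as first-class concepts, not just a single "query rule": Query rule (triggered by KQL / Lucene), EQL rule (sequence pattern), Threshold rule (an aggregation exceeds a threshold, for example more than 100 login failures from one IP within 5 minutes), ML rule (bound to an Elastic ML anomaly job; fires when the anomaly score exceeds a threshold), New term rule (an entity seen for the first time, for example a user's first login from a given country), and Indicator match rule (events are enriched and matched against a threat intel feed; fires on an IoC hit). Production rule sets often combine them: query rules for coarse filtering, EQL rules for sequences, and threshold + ML rules for baseline anomalies.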
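Of the rule types above, the new term rule is the least obvious from its name, so here is a small model of its logic. This is a sketch of the idea, not Elastic's implementation: the function name, the event dict shape, and the choice of `user` and `country` as the tracked fields are assumptions.

```python
def new_term_alerts(history, events, fields=("user", "country")):
    """Return events whose combination of `fields` was never seen before.

    `history` stands in for the look-back window a real rule would query;
    `events` are the newly arrived events, in arrival order.
    """
    seen = {tuple(e[f] for f in fields) for e in history}
    alerts = []
    for event in events:
        key = tuple(event[f] for f in fields)
        if key not in seen:
            seen.add(key)          # alert once per new combination, not per event
            alerts.append(event)
    return alerts
```

A threshold rule asks "how many", while this asks "ever before"; that is why the two are usually deployed side by side instead of one replacing the other.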

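Sigma rule import: Sigma is an OSS, general-purpose detection rule format (YAML, portable across SIEMs) with 5000+ community-maintained rules. Sigma rules can be converted into Elastic detection rules through Sigma conversion tooling, which is the OSS lever Elastic uses to close the gap with commercial SIEMs. In practice: import the Sigma baseline, run everything in a staging space to observe false positives, then enable in production. Do not enable everything at once; because Sigma rules are generic across SIEMs, environment-specific tuning is your own job.

Case + Timeline: a Case is an incident container that aggregates alerts, comments, assignment, and status; Timeline is the SOC analyst's investigation workspace, where events can be pinned, annotated, and linked to related alerts to produce an investigation narrative. Together they are a first-class SOC workflow in Elastic, not a bolt-on. They correspond to Notable Event + Incident Review in Splunk ES, but Elastic takes the more open route, and a Case can be exported as markdown into ticketing.

Elastic Defend (EDR): integrated from the Endgame acquisition, it provides endpoint detection + prevention (malware blocking / ransomware protection / behavior detection), in the same tier as CrowdStrike Falcon / SentinelOne. Elastic Defend runs inside Elastic Agent, with policy pushed from Fleet. In practice most SIEM customers do not use the built-in EDR and instead feed a dedicated EDR into Elastic SIEM; OSS-friendly, budget-sensitive mid-sized customers, however, can consolidate into one stack.

Cross-cluster search: unified queries across multiple Elastic clusters (remote_cluster:index-name), suited to multi-region / multi-tenant SOCs, with no need to move all logs into a single cluster. It corresponds to Splunk Cloud federated search. A practical scenario: European GDPR data stays in the EU cluster, and the US cluster queries it for incident investigation without copying the data.

ML jobs (anomaly detection): Elastic ML has built-in unsupervised anomaly detection, and the pre-built ML job library covers common SOC scenarios (user behavior baseline, host login pattern, port scan detection, rare process). An ML rule binds to an ML job and fires the detection rule when the anomaly score exceeds a threshold. It corresponds to Splunk UBA, but Elastic ML is built into the stack, not an add-on app.

Resource-based pricing governance: Elastic Cloud bills by cluster size (node count × node size), not ingestion volume. Ingesting more logs does not raise fees directly, but nodes must scale to maintain query performance. In practice: hot tier (the most recent 7-30 days, high-performance SSD nodes), warm tier (30-90 days, low-IO nodes), and cold / frozen tier (beyond 90 days, Searchable Snapshots on S3 / GCS, slow to query but very cheap). This corresponds to Splunk SmartStore; the frozen tier likewise extends retention from months to years without cost growing linearly.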
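The hot / warm / frozen split above is normally expressed as an ILM policy. The sketch below builds such a policy body as a Python dict so the tier boundaries stay parameterized. The function name, the rollover values, the priority number, and the repository name `security-snapshots` are assumptions for illustration; check the phase actions against the ILM documentation for your Elasticsearch version before using them.

```python
import json


def build_ilm_policy(hot_days=30, warm_days=90, snapshot_repo="security-snapshots"):
    """Build an ILM policy body with hot, warm, and frozen phases.

    `hot_days` and `warm_days` are the ages at which an index leaves the hot
    and warm tiers; `snapshot_repo` is a hypothetical snapshot repository name.
    """
    if not 0 < hot_days < warm_days:
        raise ValueError("expected 0 < hot_days < warm_days")
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over daily or at 50 GB per primary shard (assumed values).
                        "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                    }
                },
                "warm": {
                    "min_age": f"{hot_days}d",
                    "actions": {"set_priority": {"priority": 50}},
                },
                "frozen": {
                    "min_age": f"{warm_days}d",
                    "actions": {
                        "searchable_snapshot": {"snapshot_repository": snapshot_repo}
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_ilm_policy(), indent=2))
```

Generating the policy from code instead of editing it by hand in Kibana keeps the retention boundaries reviewable in version control, in line with the rule lifecycle this page recommends.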

Core trade-off table

Trade-off dimension	Elastic Security	Splunk	Datadog Security	Google Security Operations
Billing model	Resource-based (node / cluster size)	Ingestion-based (GB/day, progressive)	Per-host + per-event (events/month)	Fixed price by data tier (cost-effective at PB scale)
Query language	KQL / EQL / Lucene / ES\|QL, four complementary languages	SPL (single, highly expressive)	Datadog Query (reuses observability syntax)	YARA-L (unified, clearly structured)
Sequence expression	EQL `sequence by` expresses attack chains directly	SPL transaction / streamstats	log + metrics + trace on the same plane	UDM + YARA-L multi-event rules
Deployment model	Self-hosted / Elastic Cloud / Serverless	Self-hosted (Enterprise) / SaaS (Cloud)	SaaS only	SaaS only (Google Cloud)
Detection content	Elastic Prebuilt rules + Sigma community 5000+	Splunk Security Content (richest, active community)	Datadog Security Rules (moderate)	Google YARA-L + Google threat intel
EDR integration	Elastic Defend built in (formerly Endgame)	External CrowdStrike / Defender	Workload Security (container focus)	External (via forwarder)
SOAR / Response	Cases + Endpoint response (Elastic Defend)	Splunk SOAR (formerly Phantom, industry pioneer)	Workflow Automation (basic)	Built-in SOAR (formerly Siemplify)
Best fit	OSS-friendly, mid-to-large, Elastic stack already in use	Enterprise spanning on-prem, budget permitting	Cloud-native + Datadog observability already in use	Very large-scale ingestion, Google Cloud + multi-cloud SOC
Exit cost	Medium: Sigma / Lucene / EQL are partly portable	High: large volume of SPL / detection content / dashboards	Medium	Medium

Core reasons to choose Elastic: an OSS-friendly culture, favorable resource-based pricing, the Elastic Stack already in use for observability, a team able to work across the four query languages (or at least to divide EQL and KQL duties clearly), and acceptance of the trade-off in detection content and SOAR maturity. At TB-scale ingestion, saving 60-80% of license cost relative to Splunk is the biggest draw, but the hidden costs of cluster sizing and SRE operations have to be counted in.

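Advanced topics

EQL sequence pattern (time-ordered attack chain): EQL's sequence by is Elastic's first-class tool for expressing attack chains and is more direct than SPL correlation. For example, MFA fatigue can be written as sequence by user.name with maxspan=5m [authentication where event.outcome=="failure"] [authentication where event.outcome=="failure"] [authentication where event.outcome=="success" and source.ip != known_ip], where known_ip is a placeholder for the user's known addresses and not a built-in field; the sequence logic is stated directly. Pair this with the Uber 2022 MFA Fatigue lesson: a run of MFA failures followed by a success from a new device triggers directly.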
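To make the sequence semantics concrete, the matching logic of the MFA fatigue example can be approximated in a few lines of Python. This is a simplified model and not how Elasticsearch evaluates EQL: the function name, the event dict shape (`ts` in seconds, `user`, `outcome`, `ip`), and the `known_ips` lookup are assumptions.

```python
def mfa_fatigue_hits(events, known_ips, maxspan_s=300):
    """Find failure, failure, success-from-unknown-IP sequences per user.

    Returns (user, first_failure_ts, success_ts) tuples. `known_ips` maps a
    user to the set of IPs considered normal for that user (hypothetical input).
    """
    recent_failures = {}   # user -> failure timestamps still inside maxspan
    hits = []
    for event in sorted(events, key=lambda e: e["ts"]):
        user, ts = event["user"], event["ts"]
        failures = recent_failures.setdefault(user, [])
        # Drop failures that can no longer fit in one maxspan window ending now.
        failures[:] = [t for t in failures if ts - t <= maxspan_s]
        if event["outcome"] == "failure":
            failures.append(ts)
        elif event["outcome"] == "success":
            unknown_ip = event["ip"] not in known_ips.get(user, set())
            if len(failures) >= 2 and unknown_ip:
                hits.append((user, failures[0], ts))
            failures.clear()   # a success ends the current failure run
    return hits
```

Grouping state per user mirrors `sequence by user.name`, and the pruning step mirrors `maxspan`: failures that are too old can never complete a sequence, so they are discarded instead of being counted.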

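Elastic Defend endpoint response: beyond detection, Defend supports response actions such as host isolation (isolating an infected endpoint while preserving the SOC's connection), process kill, and file quarantine, triggered directly from the Kibana Security app. It corresponds to CrowdStrike Real Time Response. Set an approval gate before adopting it in production so that a SOC analyst cannot act on a production server by mistake.

CSPM / CWP (Elastic Cloud Security): CSPM (Cloud Security Posture Management) scans AWS / GCP / Azure accounts for misconfiguration (public S3 buckets, over-permissive IAM, security groups open to 0.0.0.0/0) against the CIS Benchmark; CWP (Cloud Workload Protection) runs runtime detection on Kubernetes workloads. These are newer features, and coverage is still catching up with dedicated CNAPP products such as Wiz / Lacework.

Cross-cluster search for federated queries across environments: a first-class tool for multi-region SOCs. A query is written as FROM logs-auth-*, eu-cluster:logs-auth-* and Elastic routes it across clusters automatically. Practical notes: cross-cluster queries have higher latency, so set a timeout; for data compliance (GDPR), check whether query results contain cross-border data. No data is moved, but whether returning query results counts as a transfer is a question for legal counsel.

Sigma rule community: Sigma is the general-purpose OSS format for detection rules, and Elastic is a major Sigma user (importer tooling, plus Elastic engineers participating in Sigma upstream). In practice: fork the SigmaHQ repo into the organization's version control, have a CI pipeline convert Sigma → Elastic detection rules automatically, run the false positive curve in a staging space, then promote to production. Do not import manually every time.

Elastic Cloud Serverless (2024+): a new model billed by workload type (search / observability / security) instead of cluster size; it reduces sizing decisions, and Elastic manages autoscaling. Because the model is new, production adoption is still catching up; it suits greenfield deployments or PoCs, and the migration roadmap for existing clusters is still evolving.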

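Troubleshooting and quick failure diagnosis

Alert volume explodes / SOC cannot keep up: all Sigma rules enabled without tuning, or threshold rule thresholds set too low. Run in a staging space for a week to measure FPs, tune thresholds, add an exception list for known legitimate sources, and add ML rules for user-specific baselines
EQL sequence rule cannot finish / times out: the sequence span is too long (24h) or the by field has too high a cardinality, so query cost explodes. Shorten maxspan, restrict the index pattern, and add pre-filter conditions
Cluster queries slow / Kibana stalls: the hot tier holds too much old data and no hot-warm-cold tiering exists. Enable an ILM (Index Lifecycle Management) policy for automatic rollover, use cheaper nodes for the warm tier, and move cold / frozen data to Searchable Snapshots
Fleet agent enrollment fails: network / certificate / token problems between the Fleet server and Elasticsearch. Check Fleet server health, confirm the enrollment token has not expired, and read the agent log for the specific error
Many FPs after a Sigma rule import: Sigma rules are generic across SIEMs and carry no environment-specific exclusions. Do not enable everything; tune in staging before promotion, and add an exception list (known scanner IPs / internal test accounts)
Resource-based pricing exceeds budget: nodes over-scaled or too much data kept in the hot tier. Enable hot-warm-cold ILM, push indices with retention beyond 30 days to the frozen tier on S3, and treat Searchable Snapshots as on by default
ML job anomaly scores are inaccurate: the training data includes a period of compromise, so the baseline is contaminated. Confirm the training window falls in a clean period, retrain regularly, and have the detection rule use anomaly_score > 75 instead of > 50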

When to choose another service

Requirement shape	Choose instead
Enterprise + richest detection content	Splunk
Cloud-native + Datadog observability already in use	Datadog Security
Very large-scale ingestion + Google Cloud	Google Security Operations
DLP / sensitive data discovery	Google DLP / Microsoft Purview
Primarily endpoint detection, without a full stack	CrowdStrike Falcon / Microsoft Defender for Endpoint / SentinelOne
Primarily CNAPP (cloud posture + workload)	Wiz / Lacework / Prisma Cloud (Elastic Cloud Security is newer)
Incident routing	Module 8 incident response vendor list

Topics not covered on this page

Full KQL / EQL / ES|QL syntax reference, advanced use of the Lucene query DSL
Details of Elasticsearch index sharding / replicas / ILM tuning (belongs to observability / data engineering)
Elastic Observability (APM / logs / metrics): belongs to observability, not security
Detailed sizing and pricing model of Elastic Cloud Serverless (a new 2024+ model, still changing)
Operating a self-managed Elastic Stack (cluster upgrades, Kibana plugin development)

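Case cross-references

The module 07 case library has no vendor-level incident involving Elastic Security directly, but every detection-related case serves as a reference point for SIEM detection coverage:

Case	Relationship to Elastic Security (lesson by comparison)
Uber 2022 MFA Fatigue	Elastic EQL `sequence by user.name [auth fail count > 50 in 5min] [auth success from new device]` (pseudo-syntax) expresses the MFA fatigue pattern directly; the Sigma community has ready-made rules to import as a starting point
Microsoft Storm-0558 Signing Key Chain	Detecting anomalous cross-tenant token validation requires Elastic Cross-cluster search to query Azure AD logs + GCP audit logs + in-house app logs together, without moving the data first
3CX 2023 Desktop App Supply Chain	Elastic Defend sees the desktop app's process spawn and anomalous network callback directly, with no external EDR feed; an EQL `sequence` catches the process → DNS → C2 behavior chain
Detection Engineering Lifecycle (section)	Elastic rules follow an engineering lifecycle of `detection-rules` repo (OSS, maintained by Elastic) + Sigma fork + staging space + promote, not direct edits in the Kibana UI
Alert Fatigue and Signal Quality (section)	Elastic has no direct counterpart to Splunk RBA; it reduces noise in three layers with ML anomaly rules, threshold rule severity, and Case grouping, and needs an ML job retraining lifecycle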

Next-step routing

Upstream: 7.13 Detection Coverage and Signal Governance, Detection Engineering Lifecycle
Parallel: Splunk, Datadog Security, Google Security Operations
Downstream: Google DLP / Microsoft Purview (DLP signals into Elastic SIEM)
Cross-category: Okta (IdP log source), HashiCorp Vault (secret rotation API), Cloudflare WAF (WAF logs + Sigma rule integration)
Cross-module: module 8 incident response vendor list (Case → IR routing), module 4 observability (Elastic Stack shared log pipeline)
Official: Elastic Security Documentation, detection-rules repo

Datadog Security

Mon, 18 May 2026 00:00:00 +0000

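Datadog Security is the security suite on the Datadog observability platform; it shares one control plane and data plane with Datadog logs / metrics / APM / infrastructure. Its design starting point is observability rather than SIEM: security signals are treated as one more dimension of observability, so an alert can pivot beyond logs to APM traces, infra metrics, and host context. This positioning determines both its strengths (cloud-native, detection of mixed incidents) and its limits (SaaS-only, billing that grows linearly with host count, a poor fit for on-prem-heavy or budget-sensitive environments).

Service positioning

Datadog Security consists of four products that share the Datadog Agent and backend: Cloud SIEM (log-based detection, in the same category as Splunk Enterprise Security); Cloud Security Management (CSM), covering CSPM (cloud config posture) and Cloud Workload Security (CWS) (container / Linux runtime via eBPF); App and API Protection (AAP, formerly ASM), which collects attack signals RASP-style in the app runtime; and Sensitive Data Scanner, which scans logs for PII / credentials and redacts them.

Compared with Splunk, Datadog is observability-first with security as a view, while Splunk is security-first. Splunk is more mature in enterprise SOC tooling depth (SOAR playbooks, RBA, CIM data model) and in on-prem deployment; Datadog is SaaS-only but shares a plane with APM / Infra, so the assessment path for mixed incidents (is the latency anomaly an attack or a capacity problem?) is shorter. Compared with Elastic Security, Elastic spans on-prem and is OSS, while Datadog is SaaS only; Elastic requires you to integrate observability signals yourself, while Datadog ships with them. Compared with Google Security Operations, Google uses fixed pricing by data tier and is cost-effective at PB scale, while Datadog grows linearly with hosts: friendly at mid-scale, but the cost curve steepens past a thousand hosts.

Key tension: sharing one plane between observability and security is Datadog's biggest selling point and also its source of cost risk. Host count and events/month are the billing basis for both observability and security, so adding security does not produce a separate bill. Budget has to be assessed from the whole Datadog invoice, not a security line item.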

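Goals of this chapter

After reading this page, the reader can judge:

Which segments of the SOC stack Datadog Security covers (log SIEM / CSPM / container runtime / WAF-runtime / log DLP) and which need external tools (Vault, Okta IdP logs, edge WAF)
When the advantage of observability + security on one plane holds, and when it becomes a vendor lock-in risk
Cost governance for Cloud SIEM billing (events/month + indexed) and the Standard / Flex Logs retention tiers
When to use Datadog and when to choose Splunk / Elastic / Google Security Ops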

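Shortest assessment path

To judge whether a Datadog Security deployment is healthy, check at least four things:

Datadog Agent coverage: whether the agent is installed on every host / container / serverless wrapper; whether log forwarders cover the cloud control plane (AWS CloudTrail / GCP Audit Log / Azure Activity Log); whether IdP (Okta) audit logs come in. Each missing piece is a detection blind spot
Detection rule ownership: whether Cloud SIEM rules are built-in or custom; whether custom rules are under Git version control (Terraform datadog_security_monitoring_rule); whether a staging environment dry-runs them for 24-48 hours before promotion to production
CSPM compliance check governance: which CIS / NIST / PCI baselines are enabled; whether findings enter a ticket workflow; whether a misconfiguration remediation SLA is defined (critical 24hr, high 7d, medium 30d)
Events/month + Indexed Log budget: Cloud SIEM bills by events/month + indexed events; whether ingestion impact is estimated before a new source is added; whether Standard / Flex Logs retention tiers are split by log priority

If any of the four is missing, it is an open item at the Detection Coverage and Signal Governance boundary.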

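Daily operations and decision shapes

Datadog Agent collection: logs / metrics / traces / security events go through the same Agent, and integrations (150+) pull from cloud / SaaS / database / queue sources. On the backend, security events and observability events are linked through attribute tags (env, service, host, trace_id), so during an incident you can pivot from a log alert to the APM trace with the same trace_id and see the application context in which the attack occurred.

Cloud SIEM detection rule: rules resemble an SPL-style query, for example source:okta @evt.name:user.authentication.auth_via_mfa @outcome:failure, combined with signal aggregation (rolling window count, new value, anomaly detection, impossible travel). Built-in rules are mapped to MITRE ATT&CK and are in the same category as Splunk Security Content, though fewer in number; custom rules go into version control through the Terraform provider, with no direct production edits in the UI.

CSPM compliance check: scans AWS / GCP / Azure configuration against CIS / NIST 800-53 / PCI / SOC 2 baselines and finds misconfigurations (public S3 buckets, overly permissive IAM, insecure SG rules). It is in the same category as Wiz / Prisma Cloud but shares a dashboard with Datadog Infra, so findings lead directly to the affected resource's metrics / logs. The advantage is that a security finding shows its business impact directly; the limit is that graph-based attack paths (a Wiz strength) fall short of dedicated CNAPP products.

Cloud Workload Security (CWS): uses Linux eBPF probes to observe container / process behavior at the kernel level and detects cryptominers / privilege escalation / anomalous syscalls / file integrity changes. It is in the same category as Falco but shares a plane with Datadog Infra, so a CWS alert can pivot directly to the container's CPU / memory / traces. Linux eBPF is sensitive to kernel version and some features are unavailable on old kernels, so confirm the fleet kernel matrix before production.

App and API Protection (AAP): RASP-style protection in which the Datadog APM library collects attack signals in the application runtime (SQLi / XSS / SSRF / anomalous traffic patterns). It sits at a different layer from Cloudflare WAF / AWS WAF: the WAF is at the edge / CDN, while AAP in the app runtime sees the real request handler / DB query. The two complement each other: the edge WAF blocks volumetric attacks and known patterns, and AAP covers app-specific business logic abuse.

Sensitive Data Scanner: scans ingested logs with built-in or custom patterns to detect PII / credentials / payment cards / API keys, then redacts, quarantines, or alerts. It is DLP-lite: it does not match the full sensitive data discovery / classification / lineage of Google DLP / Microsoft Purview, but it is sufficient for secrets leaked into logs by mistake, and it serves as both a detection signal source and a DLP supplement.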
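The scan-then-redact behavior described above comes down to pattern matching applied at ingest time. The sketch below shows that mechanism in Python; it is not Datadog's scanner or its rule syntax. The pattern set, the `[REDACTED:name]` replacement format, and the function name are assumptions, and the two regexes are deliberately simple.

```python
import re

# Illustrative patterns only; a production rule set would be broader and stricter.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(line):
    """Replace matches with a labeled placeholder and report which patterns hit.

    Returns (redacted_line, list_of_pattern_names) so the caller can both
    store the safe line and raise a detection signal.
    """
    found = []
    for name, pattern in PATTERNS.items():
        line, count = pattern.subn(f"[REDACTED:{name}]", line)
        if count:
            found.append(name)
    return line, found
```

Returning the matched pattern names alongside the redacted line reflects the dual role noted above: the same pass that removes the secret also produces the signal that a secret was logged.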

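Notebooks + Workflow Automation: Notebooks are query workbooks for incident investigation that mix log queries, metric charts, APM traces, and annotations; compared with Splunk Search, they are more like a SOC edition of Jupyter notebooks. Workflow Automation is a lightweight SOAR that connects to PagerDuty / Slack / Jira / Webhook / Vault API, with playbooks built in a visual builder plus Python. Its SOAR depth falls short of Splunk SOAR, but it is sufficient for the common response actions (rotate credential / block IP / open ticket) of a mid-sized SOC (10-50 people).

Standard Logs / Flex Logs + retention tier: once in Datadog, logs fall into three tiers: Indexed (hot, full-text searchable, expensive), Flex Logs (warm, long retention, higher query latency, roughly 1/3 to 1/5 of the cost), and Archive (cold, sent to S3 / GCS, storage only). Cloud SIEM detection runs on indexed logs, so the choice of which logs are indexed directly determines both detection coverage and the bill. Tier 1 sources (IdP / cloud control plane / payment) must be indexed, tier 2 sources (app logs) are sampled, and tier 3 (debug) goes to Flex or Archive.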

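Core trade-off table

Trade-off dimension	Datadog Security	Splunk	Elastic Security	Google Security Operations
Design starting point	Observability + security on the same plane	Security-first, unified log query platform	Search-first, extension of the ELK stack	Massive-scale ingestion, Google threat intel
Billing model	Per-host + per-event (events/month)	Ingestion-based (GB/day, progressive)	Resource-based (node / cluster)	Fixed price by data tier (cost-effective at PB scale)
Deployment model	SaaS only	Self-hosted / SaaS	Self-hosted / Cloud / Serverless	SaaS only (Google Cloud)
Observability integration	Native: log + APM + metrics + infra in one query	Self-integrated (Splunk Observability billed separately)	Self-integrated (Elastic Observability enabled separately)	Weak: cross-product federation
Cloud posture (CSPM)	Built in (CSM)	Third-party add-on / Cisco integration	Third-party / Wazuh	Third-party / Mandiant integration
Container runtime	Built-in CWS (eBPF)	Needs Falco / third-party	Elastic Defend	Needs Falco / third-party
App runtime (RASP)	Built-in AAP	Needs third-party	Third-party	Third-party
SOAR / Response	Workflow Automation (lightweight)	Splunk SOAR (industry pioneer)	Cases + Endpoint response	Built-in SOAR (formerly Siemplify)
Best fit	Cloud-native + Datadog already in use + mid-sized SOC	Enterprise spanning on-prem, budget permitting	OSS-friendly, Elastic stack already in use	Very large-scale ingestion, Google Cloud

Core reasons to choose Datadog: Datadog observability is already in use, the environment is mainly cloud-native, the SOC is mid-sized (10-50 people), and the incident assessment path benefits from observability + security on one plane. If the environment is mainly on-prem, the budget is sensitive (1000+ hosts), or enterprise SOAR / RBA depth is needed, choose Splunk; if it is OSS-friendly and spans on-prem, choose Elastic.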

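Advanced topics

Cross-product correlation (log + APM + metrics under the same trace_id): Datadog's most distinctive detection shape. A security alert is more than a log line; it is an integrated incident view tied to a trace_id. For example, when an API endpoint sees a SQLi attempt, Cloud SIEM opens a signal while APM shows the DB query and latency of that request and infra shows the CPU of that host. This gives a structural advantage for mixed incidents such as "is this query latency anomaly an attack?", and it corresponds directly to the investigation path of Snowflake 2024 Credential Abuse.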
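The pivot described above is a join on shared attributes: the signal carries a trace_id, spans carry the same trace_id plus a host, and metrics are keyed by host. The sketch below models that join in Python. It does not call any Datadog API; the function name and the dict shapes for signals, spans, and host metrics are assumptions made for illustration.

```python
def incident_view(signal, spans, host_metrics):
    """Assemble one view from a security signal, APM spans, and host metrics.

    `signal` needs "rule" and "trace_id"; each span needs "trace_id" and
    "host"; `host_metrics` maps host name to a metrics dict (assumed shapes).
    """
    # Step 1: pivot from the signal to the spans of the same request.
    trace_spans = [s for s in spans if s["trace_id"] == signal["trace_id"]]
    # Step 2: pivot from those spans to the hosts that served the request.
    hosts = sorted({s["host"] for s in trace_spans})
    return {
        "rule": signal["rule"],
        "trace_id": signal["trace_id"],
        "spans": trace_spans,
        "hosts": {h: host_metrics[h] for h in hosts if h in host_metrics},
    }
```

With a log-only SIEM, each of the two pivots would be a separate query in a separate tool; sharing the attribute tags is what collapses them into one lookup.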

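CWS Linux eBPF behavior detection: eBPF probes run at the kernel level, need no kernel module, and have negligible impact on process performance (< 1% overhead). Detectable behavior includes file integrity (changes to /etc/passwd), process trees (an anomalous bash → curl → /tmp/payload chain), network connections (a container connecting out to a cryptominer pool), and syscall patterns (ptrace used for process injection). Like Falco it uses eBPF; the difference is that Datadog CWS needs no separate deployment and shares a plane with the other Datadog signals.

Datadog Threat Intelligence: a built-in threat feed (malicious IPs / domains / file hashes) automatically tags log / network events that hit an IoC. You can add your own STIX/TAXII feed, but the depth does not match Mandiant / Recorded Future / dedicated TI platforms; it is sufficient for a mid-sized SOC, while serious APT scenarios need an external dedicated TI source.

Integration with Datadog Incident Management: a security signal can open a Datadog Incident directly (built-in incident channel + timeline + post-mortem template), in the same category as PagerDuty but on the same plane as observability. When a security event escalates into a company-wide incident (on the scale of Change Healthcare 2024 Operations Impact), teams can share one incident commander view without stitching two timelines together.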

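Troubleshooting and quick failure diagnosis

Cloud SIEM detection lags / no alert: events never reached indexed logs (they went to Flex), or the retention tier is set incorrectly. Check whether the log pipeline rules mark security-critical sources as indexed
Events/month spikes: debug / verbose logs enter the Cloud SIEM index, or CWS event volume explodes. Add a pre-filter in front of the log pipeline (Datadog Observability Pipeline or Cribl), and tighten CWS rules around noisy behavior
100+ CSPM findings with nobody fixing them: findings never enter a ticket workflow and have no priority. Integrate Jira / ServiceNow, map severity to SLA, and escalate findings older than 30 days
No CWS data on hosts with old kernels: eBPF features are sensitive to kernel version (some are unsupported below 4.18). Upgrade the kernel, or mark the host as CWS-incompatible and cover it with a host-based agent
AAP false positives block users: RASP blocks directly in the app runtime and hits legitimate requests. Run AAP in monitor mode for 1-2 weeks to collect a baseline, tune, then switch to protect mode
Sensitive Data Scanner misses PII: the custom pattern is wrong, or the log format is nested (JSON inside JSON). Dry-run with sample logs, and remember the scanner runs at ingest time and is not retroactive
Workflow Automation playbook runs as a black box: an automatic credential rotation ends up breaking a prod service account. Put high-impact playbook actions behind an approval gate and default to containment, not deletion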

When to choose another service

Requirement shape	Choose instead
Enterprise spanning on-prem, budget permitting	Splunk
OSS-friendly / Elastic stack already in use	Elastic Security
Very large-scale ingestion + Google Cloud	Google Security Operations
Strict DLP / data classification	Google DLP / Microsoft Purview
Cloud posture graph / attack path	Wiz / Prisma Cloud / Lacework
Edge WAF / volumetric attack	Cloudflare WAF / AWS WAF
Endpoint EDR	CrowdStrike Falcon / Microsoft Defender for Endpoint
Incident routing	Module 8 incident response vendor list

Topics Not Covered on This Page

Full Datadog Agent configuration reference and custom check authoring
Datadog observability details (APM / RUM / Synthetics / DBM), which belong to module 4 observability
Full Cloud SIEM rule syntax reference
Details of authoring CWS eBPF probes (custom rules via Agent Expression Language)
Datadog Incident Management workflow (belongs to module 8 IR)

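Case Cross-References

Datadog Security has no direct vendor-level incident in the module 07 case library, but because observability and security share a plane, its detection shape shortens the investigation path for some cases. These are worth comparing:

Case	Relation to Datadog Security (comparative takeaway)
Snowflake 2024 Credential Abuse	Anomalies in query volume, connection count, and CPU load are a same-plane strength for Datadog; Cloud SIEM rules and DBM metrics are queried together, with no stitching between a SIEM and a monitoring tool
Change Healthcare 2024 Operations Impact	For impact assessment of a business-hub incident, APM + Infra can tell within seconds whether a latency anomaly comes from a security event or from capacity; Datadog Incident provides a shared IC view
Mailchimp 2023 Support Tool Abuse	APM span correlation exposes the trace pattern of a single operator accessing many tenants in a short time; a log-only SIEM cannot see application-level tenant switching
Uber 2022 MFA Fatigue	A Cloud SIEM detection rule correlates Okta MFA logs with APM error rate instead of relying on a single log source
Detection Coverage and Signal Governance (section)	Standard / Flex Logs plus retention tiers are the tools for governing detection coverage: tier 1 sources must be indexed; tier 2 / 3 go to Flex / Archive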

Next-Step Routing

Upstream: 7.13 Detection Coverage and Signal Governance, Detection Engineering Lifecycle
Parallel: Splunk, Elastic Security, Google Security Operations
Downstream: Google DLP / Microsoft Purview (DLP signals into Datadog)
Cross-category: Okta (IdP log source), HashiCorp Vault (Workflow Automation calls its API), Cloudflare WAF / AWS WAF (edge WAF logs into Cloud SIEM, with AAP covering the app layer)
Cross-module: 4 observability (same Agent / same plane), module 8 incident response vendor list (Datadog Incident → IR routing)
Official: Datadog Security Documentation

Google Security Operations

Mon, 18 May 2026 00:00:00 +0000

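Google Security Operations is Google Cloud's integrated SOC platform. Starting in 2023 it merged three product lines into a single brand: the former Chronicle SIEM, Siemplify SOAR (acquired 2022), and Mandiant threat intel (acquired 2022). It differs from Splunk / Elastic Security / Datadog Security in its assumed data scale, its billing philosophy, and how much threat intel is built in; raw detection capability is comparable. Google's design assumes PB/day ingestion, Google-grade infrastructure, and a flat rate by data tier, which is the opposite of Splunk's progressive per-GB billing.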

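Service Positioning

Google Security Operations positions itself as a cloud-native SIEM + SOAR + threat intel appliance built for very large SOCs. The bottom layer runs on Google's own search infrastructure, and the top layer rests on four first-class concepts: UDM (Unified Data Model, a Google-defined schema that every source is forced to normalize to), YARA-L (Google's own detection rule language), Curated Detection (a subscription of Google-maintained detection rules that customers do not have to pull themselves), and Mandiant Applied Threat Intel (automatic enrichment and IoC push during an incident).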

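Compared with Splunk, Google goes for fixed price by data tier plus forced schema normalization. Splunk's per-GB ingestion billing hurts at PB scale, and Google is typically 3-5x cheaper at multi-PB, but customers must accept UDM's forced schema and YARA-L's new syntax. Compared with Elastic Security, Google is SaaS-only and optimized for large scale, while Elastic can be self-managed and is OSS-friendly. Compared with Datadog Security, Google is a pure SOC tool, while Datadog is a security view on an observability plane. Datadog fits mid-sized deployments that already use Datadog for observability; Google fits large SOCs that do not need observability on the same plane.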

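Key tension: the fixed-price tier is a poor deal at small scale and only pays off at PB scale. An organization needs a clear view of its ingestion magnitude. Below TB/day, Datadog or Elastic is usually cheaper; between TB/day and PB/day is a gray zone; above PB/day, Google is one of the few options that both holds up and stays affordable. Mandiant threat intel and Gemini for Security are Google-only extras, but they are enhancements, not the main reason to choose Google.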

Chapter Goals

After reading this page, the reader can judge:

Which part of the SOC stack Google Security Ops takes on (log aggregation + SIEM + SOAR + threat intel in one) and how it integrates with Google Cloud IAM / Google Secret Manager
How UDM forced normalization and YARA-L shape detection design (schema-first rather than query-first)
Where Curated Detection + Mandiant Applied Threat Intel sit in the detection lifecycle (subscribed, not self-pulled)
When to choose Google Security Ops and when to go with Splunk / Elastic / Datadog

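Shortest Assessment Path

To judge whether a Google Security Ops deployment is healthy, check at least four things:

Ingestion boundary: which sources come in (Forwarder / GCS bucket / Pub/Sub feed / cloud-native API feed), whether UDM normalization covers every source, and whether parsers for in-house app logs are written
Detection governance: who can change YARA-L rules, which Curated Detections are enabled, whether in-house rules are under version control (Git → API push), and whether a staging tenant sanity-checks rules before production
Threat intel flow: whether Mandiant Applied Threat Intel is enabled, whether Curated Detection syncs automatically with new IoCs, and whether IoC enrichment flows back into alert context
Response flow: whether Siemplify SOAR receives alerts, whether playbooks are under version control, and whether routing to module 8 incident response is defined

If any of the four is missing, it is an open item under Detection Coverage and Signal Governance.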

Daily Operations and Decision Shapes

Ingestion paths: logs reach Google Security Ops through three main paths: Chronicle Forwarder (agent-based, on-prem / VM, syslog / file tail), Cloud Storage feed (logs land in a GCS bucket first and Google pulls them), and Pub/Sub feed (serverless / GCP-native push). A fourth path, Direct API feed, covers cloud SaaS such as Okta / Azure AD / AWS CloudTrail through vendor connectors. SaaS-heavy environments usually rely mainly on Direct API feeds; only on-prem needs the Forwarder.

UDM (Unified Data Model): UDM is Google's own unified event schema. Every source (CloudTrail / Azure AD / Okta / endpoint / DNS) is forcibly normalized to UDM fields at ingestion (principal.user, target.resource, security_result.action, and so on). The concept matches Splunk CIM, but Splunk CIM is an optional mapping while Google UDM is forced normalization: without a parser, a custom source cannot be ingested. The design trade-off is that schema-first keeps cross-source queries consistent but makes onboarding a custom source heavier.
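To make "forced normalization" concrete, here is a minimal Python sketch of the idea, not of Google's parser tooling (real UDM parsers are written in Google's own parser configuration format, and the real schema is far larger). The raw field names and the mapping table are hypothetical; the UDM-style paths are simplified versions of the fields named above. The point it illustrates is the schema-first rule: an event that cannot be mapped is rejected rather than ingested half-normalized.

```python
from typing import Any

# Hypothetical mapping from a raw IdP-style event to simplified UDM-style paths.
OKTA_LIKE_MAPPING = {
    "actor_id": "principal.user.userid",
    "client_ip": "principal.ip",
    "target_app": "target.resource.name",
    "outcome": "security_result.action",
}


def set_path(event: dict, dotted_path: str, value: Any) -> None:
    """Write value into a nested dict, creating intermediate objects as needed."""
    *parents, leaf = dotted_path.split(".")
    node = event
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def normalize(raw: dict, mapping: dict) -> dict:
    """Map a raw event to the unified shape, refusing events that cannot be mapped."""
    missing = [field for field in mapping if field not in raw]
    if missing:
        # Schema-first: no complete mapping, no ingestion.
        raise ValueError(f"parser cannot map fields: {missing}")
    event: dict = {}
    for raw_field, udm_path in mapping.items():
        set_path(event, udm_path, raw[raw_field])
    return event


raw_event = {
    "actor_id": "u-1001",
    "client_ip": "203.0.113.7",
    "target_app": "payroll",
    "outcome": "BLOCK",
}
normalized = normalize(raw_event, OKTA_LIKE_MAPPING)
print(normalized)
```

Once every source lands in the same shape, a detection can refer to principal.user.userid without caring whether the event came from an IdP, a cloud audit log, or an endpoint agent.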

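YARA-L detection rule: Google's own detection rule language, in the same family as SPL / EQL but with a more explicit structure. The events section defines source patterns, the match section defines the join keys and time window, the condition section defines thresholds, and the outcome section defines the risk score. Compared with SPL's pipe style it is closer to a relational declaration, and it is especially well suited to time-bounded sequences and cross-source joins. A pattern like the Uber MFA case ("50 MFA failures within 5 min + new device + anomalous geography") reads more clearly as a YARA-L sequence pattern than in SPL.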
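The time-bounded sequence described above can be sketched in plain Python to show the logic a rule engine evaluates. This is not YARA-L and not Google's engine; the Event type, the event kinds, and the constants are hypothetical, and the geography signal is left out to keep the sketch short. It keeps a sliding window of MFA failures per user and flags a new-device login that arrives while the window holds at least the threshold number of failures.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

WINDOW_SECONDS = 300   # 5-minute window, as in the example above
FAIL_THRESHOLD = 50    # MFA failures required inside the window


@dataclass(frozen=True)
class Event:
    ts: int                 # seconds since some epoch
    user: str
    kind: str               # "MFA_FAIL" or "LOGIN" (hypothetical event kinds)
    new_device: bool = False


def detect(events):
    """Return (user, ts) for new-device logins preceded by a burst of MFA failures."""
    failures = defaultdict(deque)   # user -> timestamps of recent MFA failures
    flagged = []
    for ev in sorted(events, key=lambda e: e.ts):
        window = failures[ev.user]
        # Evict failures that fell out of the time window.
        while window and ev.ts - window[0] > WINDOW_SECONDS:
            window.popleft()
        if ev.kind == "MFA_FAIL":
            window.append(ev.ts)
        elif ev.kind == "LOGIN" and ev.new_device and len(window) >= FAIL_THRESHOLD:
            flagged.append((ev.user, ev.ts))
    return flagged


burst = [Event(ts=t, user="alice", kind="MFA_FAIL") for t in range(50)]
burst.append(Event(ts=60, user="alice", kind="LOGIN", new_device=True))
print(detect(burst))
```

Backtesting a rule against historical events, as the troubleshooting list later recommends, amounts to replaying stored events through logic like this while varying the window and the threshold.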

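Curated Detection: a Google-maintained subscription set of detection rules. It is in the same category as Splunk Security Content, but Google's is a built-in subscription, so customers do not pull or merge anything themselves. Google syncs it with Mandiant threat intel automatically, and the corresponding rules are enabled once a new IoC is published. Organizations usually enable the whole baseline first, then selectively disable noisy rules and add their own custom YARA-L.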

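Applied Threat Intel (Mandiant): when an event occurs, Google automatically checks the IoCs in the alert (IP / domain / hash) against the Mandiant feed. On a hit against known APT activity it raises the risk score and attaches the Mandiant report. Other SIEMs rely on third-party threat intel feeds and an enrichment pipeline you maintain yourself; Google chose vertical integration and built the feed in after acquiring Mandiant.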

Siemplify SOAR: integrated into Google Security Ops after the 2022 Siemplify acquisition. Playbooks handle alert triage and automated response, for example rotating a leaked credential (via the Google Secret Manager API), disabling a suspect user (via the Okta / Google Workspace API), or adding a firewall block for a suspect IP (via a Cloudflare WAF custom rule). Playbooks belong in version control and high-impact actions go through an approval gate; black-box fire-and-forget is not acceptable.

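Entity Graph: Google Security Ops builds a graph of entities such as user / asset / IP / domain / hash and uses it for correlation and lateral movement detection. Lateral spread of the Snowflake 2024 kind ("the same credential / IP across multiple Snowflake accounts") can be visualized directly as relationships in the Entity Graph.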

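Google Cloud integration: integration with Google Cloud IAM / Workload Identity Federation is tight. GCP audit logs have a built-in connector, IAM policy changes surface directly as alert candidates, and federation across GCP projects authenticates through Google Cloud IAM. Non-GCP environments (AWS / Azure / on-prem) are also supported, but the setup path is somewhat steeper than with Splunk add-ons.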

Core Trade-off Table

Trade-off dimension	Google Security Operations	Splunk	Elastic Security	Datadog Security
Billing model	Fixed price by data tier (pays off at PB scale)	Ingestion-based (GB/day, progressive)	Resource-based (node / cluster size)	Per-host + per-event (events/month)
Schema handling	UDM forced normalization	CIM optional mapping	ECS optional mapping	Tag-based, highly flexible
Detection language	YARA-L (structured events / match / condition)	SPL (pipe-based, highly expressive)	KQL / EQL	Datadog query
Detection content	Curated Detection, built-in subscription	Splunk Security Content (OSS, self-pulled)	Elastic Prebuilt + Sigma	Datadog Security Rules
Threat intel	Mandiant Applied Threat Intel built in	Third-party feed + in-house pipeline required	Third-party feed required	Datadog built-in + third-party
SOAR / Response	Siemplify SOAR built in	Splunk SOAR (formerly Phantom, industry pioneer)	Cases + Elastic Defend	Workflow Automation (basic)
LLM-assisted	Gemini for Security built in (2024+)	Splunk AI Assistant	Elastic AI Assistant	Bits AI
Deployment model	SaaS only (Google Cloud)	Self-hosted / SaaS	Self-hosted / SaaS / Serverless	SaaS only
Best fit	PB-scale SOC, Google Cloud heavy, wants Mandiant	Enterprise + cross on-prem, budget allows	OSS-friendly, Elastic stack already in use	Cloud-native + already using Datadog for observability
Exit cost	Medium: YARA-L and UDM are Google-specific	High: large volume of SPL / detections / dashboards	Medium: Sigma / Lucene are more portable	Medium

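The core reasons to choose Google Security Ops: PB-scale ingestion, predictable fixed-price billing, built-in Mandiant threat intel, and tight Google Cloud integration. Mid-sized deployments, mainly on-prem estates, budget-sensitive teams, and teams that need observability on the same plane are all better served by Splunk / Elastic / Datadog.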

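Advanced Topics

Risk Score multi-signal aggregation: Google Security Ops accumulates a risk score per entity (user / asset), sums it across multiple rules, and escalates to an alert only above a threshold. The design is in the same family as Splunk RBA, but Google routes risk decay and attribution through the Entity Graph, so risk propagation across entity relationships is finer-grained. It pairs with the lesson from Uber 2022 MFA Fatigue: accumulated MFA failures + a new-device login + anomalous geography add up to an alert, while none of the three should alert on its own.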
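The aggregation idea can be sketched in a few lines of Python. This is a generic illustration, not Google's scoring model: the exponential half-life decay, the point values, and the threshold are all assumptions chosen for the example, and graph-based risk propagation between entities is not modeled.

```python
import math

HALF_LIFE_SECONDS = 3600.0   # hypothetical: accumulated risk halves every hour
ALERT_THRESHOLD = 80.0       # hypothetical escalation threshold


class RiskLedger:
    """Accumulates per-entity risk from multiple signals, with time decay."""

    def __init__(self):
        self._score = {}     # entity -> score as of the last update
        self._last_ts = {}   # entity -> timestamp of the last update

    def current(self, entity, now):
        """Return the decayed score for an entity at time `now`."""
        if entity not in self._score:
            return 0.0
        elapsed = max(0.0, now - self._last_ts[entity])
        return self._score[entity] * math.pow(0.5, elapsed / HALF_LIFE_SECONDS)

    def add(self, entity, points, now):
        """Add a signal's points; return True when the entity crosses the threshold."""
        score = self.current(entity, now) + points
        self._score[entity] = score
        self._last_ts[entity] = now
        return score >= ALERT_THRESHOLD


ledger = RiskLedger()
first = ledger.add("alice", 30, now=0)      # MFA failure burst
second = ledger.add("alice", 30, now=60)    # new-device login
third = ledger.add("alice", 30, now=120)    # anomalous geography
print(first, second, third)
```

With these assumed values, no single 30-point signal reaches the threshold, and signals that arrive hours apart decay before they can combine, which is the behavior the Uber lesson asks for.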

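Cross-tenant federated search: MSSPs and large groups with multiple BUs can run federated search across several Google Security Ops tenants and view cross-organization detections from a single console. Permissions are granted through Google Cloud IAM role assignment. Cross-tenant admin is a high-privilege role and should go through break-glass with audit.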

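Applied Threat Intel + Curated Detection sync: after Mandiant discloses new APT activity, the matching Curated Detection rules are enabled automatically and Applied Threat Intel IoCs are pushed automatically, so the customer SOC does not onboard anything by hand. When SolarWinds was disclosed in 2020, Mandiant clients were among the few SOCs able to enable matching detections immediately.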

Siemplify playbook engineering: playbooks are graph-based workflows (not linear pipelines) and support branching, approval gates, and human-in-the-loop steps. Production rules go containment-first (disable the session, do not delete the account) and put irreversible actions behind an approval gate.
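The containment-first rule with an approval gate can be expressed as a small piece of control logic. The sketch below is a hypothetical illustration in Python, not Siemplify's playbook format; the Action and Impact types and the action names are invented for the example. Reversible containment actions run immediately, while irreversible ones are parked until a human approves them by name.

```python
from dataclasses import dataclass, field
from enum import Enum


class Impact(Enum):
    CONTAINMENT = "containment"     # reversible, e.g. disable a session
    IRREVERSIBLE = "irreversible"   # e.g. delete an account


@dataclass(frozen=True)
class Action:
    name: str
    impact: Impact


@dataclass
class PlaybookRun:
    executed: list = field(default_factory=list)
    pending_approval: list = field(default_factory=list)


def run_playbook(actions, approved=frozenset()):
    """Run containment actions; hold irreversible ones until explicitly approved."""
    run = PlaybookRun()
    for action in actions:
        if action.impact is Impact.IRREVERSIBLE and action.name not in approved:
            run.pending_approval.append(action.name)   # wait for a human decision
            continue
        run.executed.append(action.name)               # stand-in for the real API call
    return run


actions = [
    Action("disable_session", Impact.CONTAINMENT),
    Action("delete_account", Impact.IRREVERSIBLE),
]
unattended = run_playbook(actions)
attended = run_playbook(actions, approved={"delete_account"})
print(unattended.executed, unattended.pending_approval)
```

Keeping the gate in the playbook definition, rather than in an operator's head, is what makes the behavior reviewable in version control.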

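Gemini for Security (2024+): LLM-assisted investigation. A natural-language question such as "which users showed anomalous GCP API behavior in the past 24 hours" is turned directly into a UDM query, and alerts are summarized automatically with suggested next steps. It does not replace a SOC analyst, but it shortens triage time.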

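Troubleshooting and Quick Failure Triage

Custom source ingestion fails: the UDM parser is missing or wrong, so the source does not come in or its fields are NULL. Add or fix the parser, run a sanity check in the staging tenant, and confirm normalization by checking UDM event count by source.
Detection does not fire / misses: the time window in the YARA-L match section is too short, or the threshold in the condition section is too high. Backtest against historical data in the staging tenant, tune the window and threshold, then promote.
Too many alerts: Curated Detection is fully enabled without tuning, and environment-specific noise has not been disabled. As with Splunk, watch the false positive curve in staging, then tune or disable individual rules.
Mandiant threat intel never hits: the licensing tier does not include Mandiant Advantage, or the enrichment pipeline is not enabled. Check the tier and confirm that Applied Threat Intel is on.
Siemplify playbook runs as black-box fire-and-forget: an automatic disable took out a legitimate user. Put the playbook behind an approval gate, default to containment rather than deletion, and dry-run regularly.
Too many cross-tenant admins: cross-tenant admin is used for daily operations and the blast radius is too large. Reduce admins, switch to tenant-scoped roles with specific capabilities, and use break-glass for cross-tenant access.
Cost higher than expected: the wrong data tier was chosen (Enterprise Plus purchased but only Enterprise features used), or retention is set too long. Look at actual ingestion and retention usage, and review tier and retention together.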

When to Switch to Another Service

Requirement shape	Switch to
Enterprise + cross on-prem + mature detection	Splunk
OSS-friendly / self-managed / budget-sensitive	Elastic Security
Cloud-native + already using Datadog for observability	Datadog Security
DLP / sensitive data discovery	Google DLP / Microsoft Purview
Primarily endpoint detection	CrowdStrike Falcon / Microsoft Defender for Endpoint
Incident routing	Module 8 incident response vendor list

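Topics Not Covered on This Page

Full YARA-L syntax reference and the complete UDM field schema
Historical details of the Chronicle / Siemplify / Mandiant product lines before integration
The Mandiant Advantage platform (a threat intel subscription that integrates with the SIEM but is a separate product)
VirusTotal (owned by Google, complementary to Mandiant but a separate service)
Prompt engineering details for Gemini for Security
Google Workspace security center (part of Google Workspace, outside the scope of Security Ops)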

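Case Cross-References

Google Security Ops has no direct vendor-level incident in the module 07 case library, but every detection-related case serves as a comparison point for SIEM detection coverage:

Case	Relation to Google Security Ops (comparative takeaway)
Microsoft Storm-0558 Signing Key Chain	UDM forces normalization of token validation fields across Azure AD / GCP / Okta; a YARA-L cross-source join expresses the cross-tenant token forging pattern directly; Entity Graph visualizes it
Uber 2022 MFA Fatigue	A YARA-L sequence pattern expresses "MFA fail count + new-device login" directly; once the Risk Score reaches the threshold it triggers a Siemplify playbook that disables the session automatically
SolarWinds 2020 Sunburst	After Mandiant discloses IoCs, Applied Threat Intel pushes them automatically and the matching Curated Detection rules are enabled automatically; customers do not onboard rules by hand
Snowflake 2024 Credential Abuse	YARA-L expresses a three-axis correlation rule over query volume / cross-schema scan / source IP baseline; Entity Graph aggregates credential / IP / data warehouse account to visualize anomalous spread (the public UNC5537 cross-customer pattern is an extension beyond the case)
Detection Engineering Lifecycle (section)	Curated Detection plus in-house YARA-L rules follow a propose → staging → promote lifecycle; Google Security Ops has built-in rule versioning plus Git → API push
Alert Fatigue and Signal Quality (section)	Risk Score multi-signal aggregation is an engineering answer to alert fatigue; it is in the same family as Splunk RBA, but risk propagates through the Entity Graph, so cross-entity relationships are finer-grained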

Next-Step Routing

Upstream: 7.13 Detection Coverage and Signal Governance, Detection Engineering Lifecycle
Parallel: Splunk, Elastic Security, Datadog Security
Downstream: Google DLP / Microsoft Purview (DLP signals into Google Security Ops)
Cross-category: Google Cloud IAM (GCP IAM logs + Workload Identity Federation), Google Secret Manager (SOAR playbooks call its API), Okta (IdP log source), Cloudflare WAF (WAF logs + auto-block)
Cross-module: module 8 incident response vendor list (alert → IR routing), 4 observability (shared log pipeline decisions)
Official: Google Security Operations Documentation

MySQL Audit Log + SIEM

Fri, 22 May 2026 00:00:00 +0000

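The core responsibility of MySQL audit log + SIEM is to turn database operation events into security evidence that can be queried, retained, and alerted on. An audit log is an investigable record of behavior. It must answer who did what, when, from where, and to which data object, and whether the action followed the authorization process.

The anchor for this article is that audit logging must serve investigation and compliance. The slow query log, general log, binary log, error log, managed service audit log, and plugin audit log each carry different evidence and should not be lumped together as one kind of log.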

Event Taxonomy

The core responsibility of the event taxonomy is to define which database events to collect.

Event type	Purpose
Login / logout	Identity and source tracking
Failed access	Brute force, credential misuse
DDL	Schema changes and migration evidence
DCL	Grant / revoke / role changes
Sensitive read	PII / payment / high-risk tables
Data modification	Bulk update / delete, admin actions
Replication / backup	Binlog, backup, restore access

Event categories must map to alerts. DDL can feed the release audit; failed logins can feed security alerts; sensitive reads must link to a support ticket or a break-glass process.

Log Sources

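The core responsibility of log sources is to pick the right source for each purpose.

Source	Suitable use	Risk
Error log	Startup, crash, replication errors	Lacks full query context
Slow log	Performance investigation	Insufficient coverage of security events
General log	Debug / short-term tracing	High volume, high PII risk
Binary log	Data change recovery / CDC	Requires parsing; not a full substitute for user audit
Audit plugin / managed audit	Security evidence	Limited by provider / edition / config

Use the general log with caution in production. It provides complete SQL, but volume, PII exposure, and cost are all high. It is usually enabled only for a short incident window or in test environments.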

SIEM Pipeline

The core responsibility of the SIEM pipeline is to turn database events into centralized queries and alerts.

Pipeline step	Content
Collect	Log file, managed log export, agent
Normalize	Actor, source IP, database, object, action
Mask	Remove SQL literals / PII
Retain	Retention, legal hold, storage class
Alert	Rule, severity, owner, runbook
Review	Periodic access review

Normalization should avoid sending complete SQL straight into the SIEM. For sensitive systems, keep the query fingerprint, table, operation, row count, actor, and ticket id rather than literal values.
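A query fingerprint replaces literal values with placeholders so that the SIEM sees the statement's shape but not its data. The following is a simplified Python sketch under stated assumptions: it handles only single-quoted strings and plain numeric literals, and the function name fingerprint is invented for this example. It ignores comments, hex literals, and double-quoted strings, so a production pipeline would more likely rely on a real SQL parser or on the database's own statement digest facilities.

```python
import hashlib
import re

# Single-quoted strings, allowing backslash escapes and doubled quotes ('').
_STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.|'')*'")
# Standalone integers or decimals; digits inside identifiers (table1) are left alone.
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")
_WHITESPACE = re.compile(r"\s+")


def fingerprint(sql: str) -> tuple[str, str]:
    """Return (normalized statement, short digest) with literal values removed."""
    normalized = _STRING_LITERAL.sub("?", sql)
    normalized = _NUMBER_LITERAL.sub("?", normalized)
    normalized = _WHITESPACE.sub(" ", normalized).strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return normalized, digest


text_a, digest_a = fingerprint("SELECT * FROM users WHERE email = 'a@b.com' AND id = 42")
text_b, digest_b = fingerprint("select *  from users where email = 'x@y.org' and id = 7")
print(text_a, digest_a)
```

Both statements collapse to the same fingerprint, so the SIEM can count and alert on the query shape while the email address and id never leave the database tier.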

Alert Rules

The core responsibility of alert rules is to turn high-risk events into actionable signals.

Rule	Risk represented	First response
Admin login outside window	Credential misuse / emergency access	Confirm ticket, restrict session
Grant / revoke event	Permission boundary change	Access review
Drop / truncate table	Destructive DDL	Freeze release, restore decision
Bulk update / delete	Application bug / misuse	Check transaction, binlog, backup
Sensitive table read	PII exposure	Ticket match, scope review

Every alert needs an owner and a runbook. If logs are merely shipped to the SIEM without triage rules, it is still hard to locate the problem quickly during an incident.
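Several rows of the rule table can be sketched as a small evaluator over normalized audit events. This is an illustrative Python sketch, not a SIEM rule language; the AuditEvent fields, the admin window, the action names, and the sensitive table list are all hypothetical, and the bulk update / delete rule is omitted because it needs row counts. Each alert carries the first response from the table so that the signal arrives with its next step.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    actor: str
    action: str                      # e.g. "LOGIN", "GRANT", "DROP_TABLE", "SELECT"
    obj: str                         # database object the action touched
    occurred_at: datetime
    is_admin: bool = False
    ticket_id: Optional[str] = None


ADMIN_WINDOW = (time(9, 0), time(18, 0))    # hypothetical approved admin hours
SENSITIVE_TABLES = {"payments.cards"}       # hypothetical high-risk tables


def evaluate(event: AuditEvent) -> list:
    """Return (rule_name, first_response) pairs triggered by one event."""
    alerts = []
    start, end = ADMIN_WINDOW
    in_window = start <= event.occurred_at.time() <= end
    if event.action == "LOGIN" and event.is_admin and not in_window:
        alerts.append(("admin_login_outside_window", "confirm ticket, restrict session"))
    if event.action in {"GRANT", "REVOKE"}:
        alerts.append(("grant_revoke_event", "access review"))
    if event.action in {"DROP_TABLE", "TRUNCATE_TABLE"}:
        alerts.append(("destructive_ddl", "freeze release, restore decision"))
    if event.action == "SELECT" and event.obj in SENSITIVE_TABLES and not event.ticket_id:
        alerts.append(("sensitive_table_read", "ticket match, scope review"))
    return alerts


night_login = AuditEvent("root", "LOGIN", "", datetime(2026, 5, 22, 2, 30), is_admin=True)
print(evaluate(night_login))
```

In this sketch a sensitive read with a ticket id attached passes silently; whether such reads should still be sampled for scope review is a policy decision the code does not settle.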

Retention and Privacy

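The core responsibility of retention and privacy is to keep the audit log both usable and compliant. An audit log may contain accounts, IPs, SQL, table names, literal values, and PII; the longer it is kept, the heavier the duty to protect it.

A retention policy must define:

Retention days and storage class.
Which fields can be masked.
Who can query the audit log.
How a legal hold overrides normal retention.
The data boundary for export to an external SIEM.

The audit log itself must also be under access control. Anyone who can query sensitive audit records can usually infer sensitive data activity.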
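Two of the policy items above, legal hold overriding retention and field masking at the export boundary, are easy to get subtly wrong, so a small sketch may help. This is a hypothetical Python illustration; the RetentionPolicy fields, the 90-day figure, and the masked field names are assumptions for the example, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    retention_days: int
    storage_class: str
    masked_fields: frozenset


def is_expired(record_date: date, today: date, policy: RetentionPolicy,
               legal_hold: bool = False) -> bool:
    """A record under legal hold never expires, whatever its age."""
    if legal_hold:
        return False
    return today - record_date > timedelta(days=policy.retention_days)


def mask_for_export(record: dict, policy: RetentionPolicy) -> dict:
    """Return a copy with masked fields redacted before leaving the boundary."""
    return {key: ("***" if key in policy.masked_fields else value)
            for key, value in record.items()}


policy = RetentionPolicy(
    retention_days=90,                      # hypothetical
    storage_class="cold",                   # hypothetical
    masked_fields=frozenset({"sql_text", "source_ip"}),
)
record = {"actor": "support1", "source_ip": "198.51.100.4", "sql_text": "SELECT ..."}
exported = mask_for_export(record, policy)
print(exported)
```

Checking the legal hold first, before any age arithmetic, keeps the override impossible to bypass through a later change to the retention window.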

Next-Step Routing

Once audit log + SIEM is in place, read Encryption / TLS / Key Management for encryption and credentials, PITR / Backup for backup incidents, and Data Protection for security governance.