Production

2026-05-14 Frozen baseline: The standard practice of pinning a specific prompt + model in an eval system as a long-term reference, so that behavioral drift becomes visible.
2026-05-12 LLM Tracing: Encodes every LLM call / tool call / memory op in an LLM application as a structured span, standardized with the OpenTelemetry GenAI semantic conventions.
2026-05-12 LLM-as-Judge: Uses one LLM to evaluate the output quality of another LLM. The mainstream method for production eval; it cuts cost by 500-5000× but has biases that must be handled.
2026-05-12 Prefix Cache: Reuses the KV cache of a prompt prefix shared by multiple requests, an optimization that saves repeated prefill compute. A common design in production multi-user serving.
2026-04-22 6.1 Graceful shutdown and signal handling: Propagating stop signals with signal and context.
2026-04-22 6.2 Health checks and diagnostic endpoints: Separating service availability from engineering diagnostic entry points.
2026-04-22 6.3 Structured log field design: Making logs greppable, aggregatable, and traceable.
2026-04-22 6.4 Version detection and feature gates: Enabling features based on version and environment capabilities.
2026-04-22 7.4 Observability pipeline, metrics, and tracing: Combining structured logs, metrics, traces, and profiles into an operable diagnostic system.
2026-04-22 7.5 Kubernetes, systemd, and load balancer contracts: Understanding how the deployment platform affects a Go service's shutdown, health, and resource limits.
2026-05-12 6.5 Routing hub for moving into production: The three-stage evolution from personal dev to team to production LLM service, with a routing list to the corresponding cards in backend/07.
2026-04-22 1.6 Rate limiting and backpressure: Protecting the service entry point and downstream dependencies with local rate limiting and backpressure strategies.
2026-05-12 4.9 Principles of resource estimation for production deployment: Design trade-offs when moving from local single-user to production multi-tenant, covering concurrent users, cost model, observability, SLA, and capacity planning.
2026-05-12 4.20 LLM tracing and observability: OpenTelemetry GenAI semantic conventions, structured span design, cost / latency monitoring, failure debugging workflow, and integration with LLM-as-judge eval.
2026-05-12 4.21 LLM-as-Judge evaluation methods: Using an LLM to evaluate an LLM in production eval, covering rubric design, pairwise / direct scoring, mitigation of the three major biases, the closed loop with traces, and calibration.