Tarragon

"LLM-as-Judge"

2026-05-12 Hands-on: Running a judge harness with a local LLM (minimum viable version). Run a local reasoning model as the judge on Ollama / LM Studio, evaluate cases from your own workflow, and build a minimal JSONL-in / JSONL-out harness.
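2026-05-12 4.21 LLM-as-Judge evaluation methods. Production eval methods in which an LLM evaluates an LLM: rubric design, pairwise vs. direct scoring, mitigating the three major biases, closing the loop by linking evals to traces, and calibration.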