Tarragon

"Quantization"

2026-05-12 QLoRA: quantize the base model to 4-bit and fine-tune it with LoRA, a combination that lets even a consumer-grade GPU fine-tune large models
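2026-05-12 5.2 KV cache quantization strategy: on a PC, quantizing the KV cache (e.g. K=Q8 / V=Q4) compresses it and frees VRAM; how to judge whether to spend that VRAM on a larger context window or on more concurrent requests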
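
The trade-off in the KV cache entry can be made concrete with rough arithmetic. The sketch below is an illustration, not taken from the linked note: it assumes "Q8" and "Q4" mean llama.cpp-style Q8_0 and Q4_0 blocks (32 values plus one fp16 scale, so 8.5 and 4.5 bits per element), and it uses a hypothetical model shape of 32 layers, 8 KV heads, and head dimension 128. It ignores any other per-request memory, so treat the numbers as an estimate only.

```python
# Bits per cached element. Assumption: llama.cpp-style block formats,
# where 32 values share one fp16 scale (Q8_0 = 34 bytes, Q4_0 = 18 bytes per block).
BITS_PER_ELEMENT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, k_type, v_type):
    """Approximate KV cache bytes needed for one token across all layers."""
    elements = n_layers * n_kv_heads * head_dim  # per tensor (K or V)
    bits = elements * (BITS_PER_ELEMENT[k_type] + BITS_PER_ELEMENT[v_type])
    return int(bits // 8)


def max_tokens(budget_bytes, bytes_per_token):
    """How many cached tokens fit in a given VRAM budget."""
    return budget_bytes // bytes_per_token


# Hypothetical model shape, chosen only for illustration.
SHAPE = dict(n_layers=32, n_kv_heads=8, head_dim=128)
BUDGET = 1024 ** 3  # 1 GiB set aside for the KV cache

f16 = kv_bytes_per_token(**SHAPE, k_type="f16", v_type="f16")
mixed = kv_bytes_per_token(**SHAPE, k_type="q8_0", v_type="q4_0")

print(f16, max_tokens(BUDGET, f16))      # 131072 8192
print(mixed, max_tokens(BUDGET, mixed))  # 53248 20164
```

Under these assumptions the same 1 GiB holds roughly 2.5 times as many tokens with K=Q8 / V=Q4 as with an f16 cache. Those tokens can go to one longer context or be split across several concurrent sequences, which is the judgment the entry refers to.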