<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Qlora on Tarragon</title><link>https://tarrragon.github.io/blog/tags/qlora/</link><description>Recent content in Qlora on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 12 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/qlora/index.xml" rel="self" type="application/rss+xml"/><item><title>Hands-on：用 QLoRA 在本機 fine-tune coding 模型</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/local-fine-tuning/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/local-fine-tuning/</guid><description>&lt;p>&lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/qlora/" data-link-title="QLoRA" data-link-desc="把 base model 量化到 4-bit &amp;#43; LoRA fine-tune 的組合、消費級 GPU 也能 fine-tune 大模型">QLoRA&lt;/a>（4-bit 量化 base model + &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/lora/" data-link-title="LoRA" data-link-desc="Low-Rank Adaptation：凍住原模型權重、只訓兩個小矩陣的 parameter-efficient fine-tuning">LoRA&lt;/a> adapter）讓消費級硬體也能 fine-tune 7B-32B 模型、是 2026/5 本地 fine-tuning 的主流方法。「在本機 fine-tune 一個小 coding 模型懂我 codebase 的慣例」是個人 dev 的合理目標、特別是在「本地 RAG 不夠精準、prompt engineering 已到天花板」的場景。本篇用 QLoRA 把 fine-tuning 的最短路徑走完：環境準備、資料蒐集、訓練、evaluation、合併權重、部署到 Ollama / llama.cpp 配 VS Code Continue.dev。&lt;/p>
&lt;p>本篇 framing 是「&lt;strong>真實會跑、不只跑 demo&lt;/strong>」、所以包含：硬體預算估算、catastrophic forgetting 防護、evaluation 確認真的有提升、回退方案（fine-tune 失敗時怎麼辦）。&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>驗證日期&lt;/strong>：2026-05-12
&lt;strong>環境&lt;/strong>：M4 Max 64GB + Hugging Face PEFT 0.13、或 5090 24GB + bitsandbytes
&lt;strong>目標模型&lt;/strong>：Qwen3-Coder-7B-Instruct（fine-tune 後輸出符合自己 codebase 慣例的 code）&lt;/p>&lt;/blockquote>
&lt;h2 id="為什麼這個議題重要">為什麼這個議題重要&lt;/h2>
&lt;p>寫 code 場景的常見 fine-tune 動機：&lt;/p>
&lt;ol>
&lt;li>&lt;strong>私有 codebase 慣例&lt;/strong>：自家專案有特殊 naming、特殊 design pattern、prompt engineering 拉不到、希望模型「自然知道」&lt;/li>
&lt;li>&lt;strong>特殊框架 / library&lt;/strong>：用 obscure 的內部 framework、通用模型沒看過、補完品質差&lt;/li>
&lt;li>&lt;strong>特定文檔風格&lt;/strong>：commit message、PR description、code comment 有 team-specific 格式&lt;/li>
&lt;li>&lt;strong>Reduce RAG dependence&lt;/strong>：把高頻 knowledge 編進模型權重、減少每次 query 都要 retrieve&lt;/li>
&lt;/ol>
&lt;p>但&lt;strong>不該 fine-tune&lt;/strong>的情境（先排除）：&lt;/p>
&lt;ol>
&lt;li>&lt;strong>新增世界知識&lt;/strong>：fine-tune 不擅長加新事實、用 &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &amp;#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">RAG&lt;/a> 即可&lt;/li>
&lt;li>&lt;strong>複雜 reasoning 能力&lt;/strong>：fine-tune 一般不會讓模型變更會 reason、reasoning 來自 pre-training + RL&lt;/li>
&lt;li>&lt;strong>改善通用對話品質&lt;/strong>：通用對話品質取決於 RLHF、fine-tune 多半會 &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/catastrophic-forgetting/" data-link-title="Catastrophic Forgetting" data-link-desc="Fine-tune 模型時、新訓練資料覆蓋掉原本學到的能力的現象、LoRA / 資料 mixing 是主要緩解">catastrophic forgetting&lt;/a>&lt;/li>
&lt;li>&lt;strong>資料太少（&amp;lt; 500 對）&lt;/strong>：fine-tune 收益低、不如優化 prompt + RAG&lt;/li>
&lt;/ol>
&lt;h2 id="整體流程">整體流程&lt;/h2>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">1. 硬體預算估算 → 知道能跑哪個 size 的 base model
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl">2. 蒐集 fine-tune 資料 → 50-5000 對 (prompt, response)
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl">3. 環境準備 → Python + bitsandbytes / PEFT / transformers
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">4&lt;/span>&lt;span class="cl">4. 跑 QLoRA 訓練 → 1-3 epochs、看 loss 趨勢
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">5&lt;/span>&lt;span class="cl">5. Evaluation → 在 held-out set + 通用 benchmark 都跑
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">6&lt;/span>&lt;span class="cl">6. Merge LoRA → base → 得到合併權重 .safetensors
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">7&lt;/span>&lt;span class="cl">7. Convert → GGUF → 用 llama.cpp convert 工具
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">8&lt;/span>&lt;span class="cl">8. Deploy 到 Ollama → ollama create my-coder -f Modelfile
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">9&lt;/span>&lt;span class="cl">9. 配 Continue.dev → config.json 加新 provider&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="step-1硬體預算估算">Step 1：硬體預算估算&lt;/h2>
&lt;p>QLoRA 訓練的記憶體需求（粗略估算）：&lt;/p></description><content:encoded><![CDATA[<p><a href="/blog/llm/knowledge-cards/qlora/" data-link-title="QLoRA" data-link-desc="把 base model 量化到 4-bit &#43; LoRA fine-tune 的組合、消費級 GPU 也能 fine-tune 大模型">QLoRA</a>（4-bit 量化 base model + <a href="/blog/llm/knowledge-cards/lora/" data-link-title="LoRA" data-link-desc="Low-Rank Adaptation：凍住原模型權重、只訓兩個小矩陣的 parameter-efficient fine-tuning">LoRA</a> adapter）讓消費級硬體也能 fine-tune 7B-32B 模型、是 2026/5 本地 fine-tuning 的主流方法。「在本機 fine-tune 一個小 coding 模型懂我 codebase 的慣例」是個人 dev 的合理目標、特別是在「本地 RAG 不夠精準、prompt engineering 已到天花板」的場景。本篇用 QLoRA 把 fine-tuning 的最短路徑走完：環境準備、資料蒐集、訓練、evaluation、合併權重、部署到 Ollama / llama.cpp 配 VS Code Continue.dev。</p>
<p>本篇 framing 是「<strong>真實會跑、不只跑 demo</strong>」、所以包含：硬體預算估算、catastrophic forgetting 防護、evaluation 確認真的有提升、回退方案（fine-tune 失敗時怎麼辦）。</p>
<blockquote>
<p><strong>驗證日期</strong>：2026-05-12
<strong>環境</strong>：M4 Max 64GB + Hugging Face PEFT 0.13、或 5090 24GB + bitsandbytes
<strong>目標模型</strong>：Qwen3-Coder-7B-Instruct（fine-tune 後輸出符合自己 codebase 慣例的 code）</p></blockquote>
<h2 id="為什麼這個議題重要">為什麼這個議題重要</h2>
<p>寫 code 場景的常見 fine-tune 動機：</p>
<ol>
<li><strong>私有 codebase 慣例</strong>：自家專案有特殊 naming、特殊 design pattern、prompt engineering 拉不到、希望模型「自然知道」</li>
<li><strong>特殊框架 / library</strong>：用 obscure 的內部 framework、通用模型沒看過、補完品質差</li>
<li><strong>特定文檔風格</strong>：commit message、PR description、code comment 有 team-specific 格式</li>
<li><strong>Reduce RAG dependence</strong>：把高頻 knowledge 編進模型權重、減少每次 query 都要 retrieve</li>
</ol>
<p>但<strong>不該 fine-tune</strong>的情境（先排除）：</p>
<ol>
<li><strong>新增世界知識</strong>：fine-tune 不擅長加新事實、用 <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">RAG</a> 即可</li>
<li><strong>複雜 reasoning 能力</strong>：fine-tune 一般不會讓模型變更會 reason、reasoning 來自 pre-training + RL</li>
<li><strong>改善通用對話品質</strong>：通用對話品質取決於 RLHF、fine-tune 多半會 <a href="/blog/llm/knowledge-cards/catastrophic-forgetting/" data-link-title="Catastrophic Forgetting" data-link-desc="Fine-tune 模型時、新訓練資料覆蓋掉原本學到的能力的現象、LoRA / 資料 mixing 是主要緩解">catastrophic forgetting</a></li>
<li><strong>資料太少（&lt; 500 對）</strong>：fine-tune 收益低、不如優化 prompt + RAG</li>
</ol>
<h2 id="整體流程">整體流程</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">1. 硬體預算估算       → 知道能跑哪個 size 的 base model
</span></span><span class="line"><span class="ln">2</span><span class="cl">2. 蒐集 fine-tune 資料 → 50-5000 對 (prompt, response)
</span></span><span class="line"><span class="ln">3</span><span class="cl">3. 環境準備           → Python + bitsandbytes / PEFT / transformers
</span></span><span class="line"><span class="ln">4</span><span class="cl">4. 跑 QLoRA 訓練      → 1-3 epochs、看 loss 趨勢
</span></span><span class="line"><span class="ln">5</span><span class="cl">5. Evaluation         → 在 held-out set + 通用 benchmark 都跑
</span></span><span class="line"><span class="ln">6</span><span class="cl">6. Merge LoRA → base  → 得到合併權重 .safetensors
</span></span><span class="line"><span class="ln">7</span><span class="cl">7. Convert → GGUF     → 用 llama.cpp convert 工具
</span></span><span class="line"><span class="ln">8</span><span class="cl">8. Deploy 到 Ollama   → ollama create my-coder -f Modelfile
</span></span><span class="line"><span class="ln">9</span><span class="cl">9. 配 Continue.dev    → config.json 加新 provider</span></span></code></pre></div><h2 id="step-1硬體預算估算">Step 1：硬體預算估算</h2>
<p>QLoRA 訓練的記憶體需求（粗略估算）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">記憶體 ≈ N (B 參數) × 0.6 GB     ← 訓練時
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">        ≈ N (B 參數) × 0.3 GB     ← 推論（4-bit）
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">Apple Silicon Mac：
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">  M4 Pro 24GB → 訓 7B 可、訓 14B 緊
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  M4 Pro 36GB → 訓 7B 寬鬆、訓 14B 可
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">  M4 Max 64GB+ → 訓 30B 可、推論 70B 可
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">PC 獨立 GPU：
</span></span><span class="line"><span class="ln">10</span><span class="cl">  RTX 4090 / 5090 24GB → 訓 7B 寬鬆、訓 14B / 30B with `--n-cpu-moe` 可
</span></span><span class="line"><span class="ln">11</span><span class="cl">  RTX A6000 48GB → 訓 30-32B 寬鬆</span></span></code></pre></div><blockquote>
<p><strong>事實查核註</strong>：Apple Silicon 上的 QLoRA 支援度跟 bitsandbytes / MLX 工具鏈版本相關、2026/5 主流是用 MLX 自己的 LoRA 實作（<code>mlx-lm</code>）、CUDA 路線用 transformers + bitsandbytes + PEFT。具體支援度以對應 release 為準。</p></blockquote>
<p>本篇假設 fine-tune Qwen3-Coder-7B、所以 24GB+ Mac 或 16GB+ GPU 都能跑。</p>
<h2 id="step-2蒐集-fine-tune-資料">Step 2：蒐集 fine-tune 資料</h2>
<p>最關鍵的 step。資料品質決定 fine-tune 成敗。</p>
<h3 id="資料格式典型-sft-format">資料格式（典型 SFT format）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">[</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="nt">&#34;instruction&#34;</span><span class="p">:</span> <span class="s2">&#34;用我們 codebase 的慣例寫一個 REST endpoint 處理 user signup&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">    <span class="nt">&#34;input&#34;</span><span class="p">:</span> <span class="s2">&#34;需求：accept email + password、回 JWT&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">    <span class="nt">&#34;output&#34;</span><span class="p">:</span> <span class="s2">&#34;// 完整符合我們慣例的 code...&#34;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">  <span class="p">},</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">  <span class="err">...</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="p">]</span></span></span></code></pre></div><p>或對話格式（ChatML）：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln">1</span><span class="cl"><span class="p">[</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">  <span class="p">{</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">    <span class="nt">&#34;messages&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">      <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;system&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;你是我們 codebase 的 coding assistant&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">      <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;user&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;...&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">      <span class="p">{</span><span class="nt">&#34;role&#34;</span><span class="p">:</span> <span class="s2">&#34;assistant&#34;</span><span class="p">,</span> <span class="nt">&#34;content&#34;</span><span class="p">:</span> <span class="s2">&#34;...&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="ln">7</span><span class="cl">    <span class="p">]</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">  <span class="p">}</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="p">]</span></span></span></code></pre></div><h3 id="資料來源">資料來源</h3>
<table>
  <thead>
      <tr>
          <th>來源</th>
          <th>取得方式</th>
          <th>品質</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>過往 commit 的「good code」</td>
          <td>從 main branch 抽函式 + git log message</td>
          <td>中（人工挑）</td>
      </tr>
      <tr>
          <td>Code review 通過的 PR diff</td>
          <td>從 GitHub API 抽 merged PR</td>
          <td>高</td>
      </tr>
      <tr>
          <td>內部 wiki 跟 design docs</td>
          <td>轉成 Q&amp;A 對</td>
          <td>中</td>
      </tr>
      <tr>
          <td>Synthetic data：用大模型生</td>
          <td>給雲端旗艦 prompt「以這個 codebase 風格寫 X」</td>
          <td>中（要 review）</td>
      </tr>
      <tr>
          <td>Pair programming 紀錄</td>
          <td>自己跟 IDE 互動的 log</td>
          <td>高（最貼近真實使用）</td>
      </tr>
  </tbody>
</table>
<h3 id="資料量門檻">資料量門檻</h3>
<table>
  <thead>
      <tr>
          <th>資料量</th>
          <th>預期效果</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>&lt; 50 對</td>
          <td>通常無感、不如優化 prompt + RAG</td>
      </tr>
      <tr>
          <td>50-500 對</td>
          <td>開始有 in-domain 效果、但易 forgetting</td>
      </tr>
      <tr>
          <td>500-5000 對</td>
          <td>顯著效果、QLoRA fine-tune 甜蜜點</td>
      </tr>
      <tr>
          <td>5000+ 對</td>
          <td>邊際收益遞減、開始接近 full fine-tune 效果</td>
      </tr>
  </tbody>
</table>
<h3 id="資料-mixing防-catastrophic-forgetting">資料 mixing（防 <a href="/blog/llm/knowledge-cards/catastrophic-forgetting/" data-link-title="Catastrophic Forgetting" data-link-desc="Fine-tune 模型時、新訓練資料覆蓋掉原本學到的能力的現象、LoRA / 資料 mixing 是主要緩解">catastrophic forgetting</a>）</h3>
<p>訓練 batch 內 mix 通用資料、避免 fine-tune 把通用能力洗掉：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">80% in-domain data（你的 codebase 範例）
</span></span><span class="line"><span class="ln">2</span><span class="cl">20% 通用 instruction data（如 Alpaca、ShareGPT subset）</span></span></code></pre></div><p>通用 data 可從 Hugging Face datasets 抓（如 <code>tatsu-lab/alpaca</code>、<code>teknium/OpenHermes-2.5</code>）。</p>
<h2 id="step-3環境準備">Step 3：環境準備</h2>
<h3 id="apple-silicon-mac用-mlx">Apple Silicon Mac（用 MLX）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># MLX 是 Apple 的 ML framework、原生支援 Apple Silicon</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">pip install mlx mlx-lm
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 或用 conda（推薦）</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">conda create -n llm-ft <span class="nv">python</span><span class="o">=</span>3.11
</span></span><span class="line"><span class="ln">6</span><span class="cl">conda activate llm-ft
</span></span><span class="line"><span class="ln">7</span><span class="cl">pip install mlx-lm</span></span></code></pre></div><h3 id="pccuda--transformers--bitsandbytes">PC（CUDA + transformers + bitsandbytes）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 安裝 CUDA 12.x（依 GPU 驅動）</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># Python 套件</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">pip install torch transformers peft bitsandbytes accelerate datasets trl</span></span></code></pre></div><h2 id="step-4跑-qlora-訓練">Step 4：跑 QLoRA 訓練</h2>
<h3 id="apple-siliconmlx方式">Apple Silicon（MLX）方式</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 把 base model 下載到本機</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">huggingface-cli download Qwen/Qwen3-Coder-7B-Instruct <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  --local-dir ~/models/qwen3-coder-7b
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># 把資料整理成 JSONL（一行一筆）</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># data/train.jsonl、data/valid.jsonl</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"># 跑 LoRA fine-tune（MLX 內建 4-bit）</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">mlx_lm.lora <span class="se">\
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="se"></span>  --train <span class="se">\
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="se"></span>  --model ~/models/qwen3-coder-7b <span class="se">\
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="se"></span>  --data data/ <span class="se">\
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="se"></span>  --batch-size <span class="m">4</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="se"></span>  --lora-layers <span class="m">16</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="se"></span>  --iters <span class="m">1000</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="se"></span>  --learning-rate 1e-4 <span class="se">\
</span></span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="se"></span>  --steps-per-eval <span class="m">100</span> <span class="se">\
</span></span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="se"></span>  --adapter-path ./adapters</span></span></code></pre></div><h3 id="pccuda方式">PC（CUDA）方式</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># train.py（簡化版）</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="kn">from</span> <span class="nn">transformers</span> <span class="kn">import</span> <span class="n">AutoTokenizer</span><span class="p">,</span> <span class="n">AutoModelForCausalLM</span><span class="p">,</span> <span class="n">TrainingArguments</span><span class="p">,</span> <span class="n">BitsAndBytesConfig</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="kn">from</span> <span class="nn">peft</span> <span class="kn">import</span> <span class="n">LoraConfig</span><span class="p">,</span> <span class="n">get_peft_model</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="kn">from</span> <span class="nn">trl</span> <span class="kn">import</span> <span class="n">SFTTrainer</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="kn">from</span> <span class="nn">datasets</span> <span class="kn">import</span> <span class="n">load_dataset</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># 4-bit 量化載入 base</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="n">bnb_config</span> <span class="o">=</span> <span class="n">BitsAndBytesConfig</span><span class="p">(</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="n">load_in_4bit</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="n">bnb_4bit_quant_type</span><span class="o">=</span><span class="s2">&#34;nf4&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">    <span class="n">bnb_4bit_compute_dtype</span><span class="o">=</span><span class="s2">&#34;bfloat16&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="n">model</span> <span class="o">=</span> <span class="n">AutoModelForCausalLM</span><span class="o">.</span><span class="n">from_pretrained</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="s2">&#34;Qwen/Qwen3-Coder-7B-Instruct&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="n">quantization_config</span><span class="o">=</span><span class="n">bnb_config</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">
</span></span><span class="line"><span class="ln">18</span><span class="cl"><span class="c1"># LoRA 配置</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="n">lora_config</span> <span class="o">=</span> <span class="n">LoraConfig</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="n">r</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">    <span class="n">lora_alpha</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">    <span class="n">target_modules</span><span class="o">=</span><span class="p">[</span><span class="s2">&#34;q_proj&#34;</span><span class="p">,</span> <span class="s2">&#34;v_proj&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln">23</span><span class="cl">    <span class="n">lora_dropout</span><span class="o">=</span><span class="mf">0.05</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">    <span class="n">task_type</span><span class="o">=</span><span class="s2">&#34;CAUSAL_LM&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="n">model</span> <span class="o">=</span> <span class="n">get_peft_model</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">lora_config</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">27</span><span class="cl">
</span></span><span class="line"><span class="ln">28</span><span class="cl"><span class="c1"># 資料</span>
</span></span><span class="line"><span class="ln">29</span><span class="cl"><span class="n">dataset</span> <span class="o">=</span> <span class="n">load_dataset</span><span class="p">(</span><span class="s2">&#34;json&#34;</span><span class="p">,</span> <span class="n">data_files</span><span class="o">=</span><span class="s2">&#34;data/train.jsonl&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">30</span><span class="cl">
</span></span><span class="line"><span class="ln">31</span><span class="cl"><span class="c1"># 訓練</span>
</span></span><span class="line"><span class="ln">32</span><span class="cl"><span class="n">training_args</span> <span class="o">=</span> <span class="n">TrainingArguments</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">33</span><span class="cl">    <span class="n">output_dir</span><span class="o">=</span><span class="s2">&#34;./checkpoints&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">34</span><span class="cl">    <span class="n">learning_rate</span><span class="o">=</span><span class="mf">1e-4</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">35</span><span class="cl">    <span class="n">num_train_epochs</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">36</span><span class="cl">    <span class="n">per_device_train_batch_size</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">37</span><span class="cl">    <span class="n">gradient_accumulation_steps</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">38</span><span class="cl">    <span class="n">save_steps</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">39</span><span class="cl">    <span class="n">logging_steps</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">40</span><span class="cl">    <span class="n">optim</span><span class="o">=</span><span class="s2">&#34;paged_adamw_8bit&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">41</span><span class="cl">    <span class="n">bf16</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">42</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">43</span><span class="cl"><span class="n">trainer</span> <span class="o">=</span> <span class="n">SFTTrainer</span><span class="p">(</span>
</span></span><span class="line"><span class="ln">44</span><span class="cl">    <span class="n">model</span><span class="o">=</span><span class="n">model</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">45</span><span class="cl">    <span class="n">args</span><span class="o">=</span><span class="n">training_args</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">46</span><span class="cl">    <span class="n">train_dataset</span><span class="o">=</span><span class="n">dataset</span><span class="p">[</span><span class="s2">&#34;train&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="ln">47</span><span class="cl">    <span class="n">max_seq_length</span><span class="o">=</span><span class="mi">2048</span><span class="p">,</span>
</span></span><span class="line"><span class="ln">48</span><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="ln">49</span><span class="cl"><span class="n">trainer</span><span class="o">.</span><span class="n">train</span><span class="p">()</span>
</span></span><span class="line"><span class="ln">50</span><span class="cl"><span class="n">trainer</span><span class="o">.</span><span class="n">save_model</span><span class="p">(</span><span class="s2">&#34;./adapters&#34;</span><span class="p">)</span></span></span></code></pre></div><p>關鍵超參數的判讀邏輯：</p>
<table>
  <thead>
      <tr>
          <th>參數</th>
          <th>預設</th>
          <th>怎麼調</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>r</code>（LoRA rank）</td>
          <td>16</td>
          <td>小 dataset（&lt; 1000 對）可降到 8、大 dataset 升到 32 / 64</td>
      </tr>
      <tr>
          <td><code>lora_alpha</code></td>
          <td>32（通常 = 2 × r）</td>
          <td>增大會放大 LoRA 影響、太大易 catastrophic forgetting</td>
      </tr>
      <tr>
          <td><code>target_modules</code></td>
          <td>q_proj, v_proj</td>
          <td>8B+ 模型可加 k_proj + o_proj 提品質、加 ffn 是進階</td>
      </tr>
      <tr>
          <td><code>lora_dropout</code></td>
          <td>0.05</td>
          <td>dataset 小時加大（0.1）防 overfit</td>
      </tr>
      <tr>
          <td><code>num_train_epochs</code></td>
          <td>2</td>
          <td>1-3 是常見範圍、看 validation loss 何時開始升</td>
      </tr>
      <tr>
          <td><code>per_device_train_batch_size</code></td>
          <td>4</td>
          <td>視 GPU 記憶體；不夠用 <code>gradient_accumulation_steps</code> 補</td>
      </tr>
      <tr>
          <td><code>learning_rate</code></td>
          <td>1e-4</td>
          <td>LoRA 適合較大 lr（vs full fine-tune 的 1e-5）、初值可 1e-4 ~ 5e-4</td>
      </tr>
  </tbody>
</table>
<h3 id="看-training-loss-趨勢">看 training loss 趨勢</h3>
<p>訓練過程中、loss 應該：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">Initial：~2.5（cross-entropy on next-token）
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">1/4 訓練：降到 ~1.5
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">1/2 訓練：降到 ~1.0
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">3/4 訓練：降到 ~0.7
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">末段：穩定在 ~0.5
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">警示訊號：
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">- Loss 不降（≈ 2.0+ 持平） → lr 太小、或資料品質差、或 base 跟資料分佈完全不合
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">- Loss 降到 &lt; 0.1 → over-fit、validation loss 應該已升、stop training
</span></span><span class="line"><span class="ln">10</span><span class="cl">- Loss 出 NaN → lr 太大、降 lr 重來</span></span></code></pre></div><h2 id="step-5evaluation">Step 5：Evaluation</h2>
<p>訓練完不能只看 training loss、要實測：</p>
<h3 id="1-held-out-test-set你自己的-in-domain-資料">1. Held-out test set（你自己的 in-domain 資料）</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 拿 valid.jsonl 跑、看模型輸出 vs expected</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># 用 BLEU / ROUGE / 或 LLM-as-judge 評分</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">mlx_lm.generate <span class="se">\
</span></span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="se"></span>  --model ~/models/qwen3-coder-7b <span class="se">\
</span></span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="se"></span>  --adapter ./adapters <span class="se">\
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="se"></span>  --prompt <span class="s2">&#34;&lt;test prompt from valid.jsonl&gt;&#34;</span></span></span></code></pre></div><h3 id="2-通用-benchmark防-catastrophic-forgetting">2. 通用 benchmark（防 catastrophic forgetting）</h3>
<p>跑通用 HumanEval、看分數有沒有崩：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 用 lm-evaluation-harness</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">git clone https://github.com/EleutherAI/lm-evaluation-harness
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">cd</span> lm-evaluation-harness
</span></span><span class="line"><span class="ln">4</span><span class="cl">pip install -e .
</span></span><span class="line"><span class="ln">5</span><span class="cl">
</span></span><span class="line"><span class="ln">6</span><span class="cl">lm_eval --model hf <span class="se">\
</span></span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="se"></span>  --model_args <span class="nv">pretrained</span><span class="o">=</span>~/models/qwen3-coder-7b,peft<span class="o">=</span>./adapters <span class="se">\
</span></span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="se"></span>  --tasks humaneval <span class="se">\
</span></span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="se"></span>  --batch_size <span class="m">8</span></span></span></code></pre></div><p>判讀：</p>
<ul>
<li>HumanEval 從 75% → 75%：通用能力保留、in-domain 提升、成功</li>
<li>HumanEval 從 75% → 55%：catastrophic forgetting、要重新 fine-tune（用 LoRA + 資料 mixing 加強）</li>
</ul>
<h3 id="3-自己工作流測試最重要">3. 自己工作流測試（最重要）</h3>
<p>實際在 Continue.dev 用幾天、看：</p>
<ul>
<li>In-domain 任務輸出是否確實貼近 codebase 慣例</li>
<li>通用 coding 任務（如「寫一個 helper function」）是否仍 OK</li>
<li>對話流暢度有沒有變差</li>
<li>出現怪行為的頻率</li>
</ul>
<h2 id="step-6合併-lora-跟-base-model">Step 6：合併 LoRA 跟 base model</h2>
<p>訓練完得到 adapter（小檔、&lt; 100MB）。要用於日常推論、通常 merge 進 base：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># MLX 方式</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">mlx_lm.fuse <span class="se">\
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="se"></span>  --model ~/models/qwen3-coder-7b <span class="se">\
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="se"></span>  --adapter-path ./adapters <span class="se">\
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="se"></span>  --save-path ~/models/qwen3-coder-7b-mycodebase
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># PEFT 方式</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">python -c <span class="s2">&#34;
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s2">from peft import AutoPeftModelForCausalLM
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s2">import torch
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s2">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s2">model = AutoPeftModelForCausalLM.from_pretrained(&#39;./adapters&#39;, torch_dtype=torch.bfloat16)
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s2">merged = model.merge_and_unload()
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s2">merged.save_pretrained(&#39;./merged-model&#39;)
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s2">&#34;</span></span></span></code></pre></div><h2 id="step-7convert-成-gguf給-ollama--llamacpp-用">Step 7：Convert 成 GGUF（給 Ollama / llama.cpp 用）</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 安裝 llama.cpp</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">git clone https://github.com/ggml-org/llama.cpp
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="nb">cd</span> llama.cpp
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">pip install -r requirements.txt
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># Convert HF → GGUF</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">python convert_hf_to_gguf.py ~/models/qwen3-coder-7b-mycodebase <span class="se">\
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="se"></span>  --outfile ~/models/qwen3-coder-7b-mycodebase.gguf
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># 量化（可選、Q4_K_M 是甜蜜點）</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">./llama-quantize <span class="se">\
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="se"></span>  ~/models/qwen3-coder-7b-mycodebase.gguf <span class="se">\
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="se"></span>  ~/models/qwen3-coder-7b-mycodebase-Q4_K_M.gguf <span class="se">\
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="se"></span>  Q4_K_M</span></span></code></pre></div><h2 id="step-8deploy-到-ollama">Step 8：Deploy 到 Ollama</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 寫 Modelfile</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">cat &gt; ~/models/Modelfile-mycodebase <span class="s">&lt;&lt;EOF
</span></span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="s">FROM ~/models/qwen3-coder-7b-mycodebase-Q4_K_M.gguf
</span></span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="s">TEMPLATE &#34;&#34;&#34;&lt;|im_start|&gt;system
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="s">{{ .System }}&lt;|im_end|&gt;
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="s">&lt;|im_start|&gt;user
</span></span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="s">{{ .Prompt }}&lt;|im_end|&gt;
</span></span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="s">&lt;|im_start|&gt;assistant
</span></span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="s">&#34;&#34;&#34;
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="s">
</span></span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="s">PARAMETER temperature 0.3
</span></span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="s">PARAMETER top_p 0.9
</span></span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="s">PARAMETER num_ctx 32768
</span></span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="s">EOF</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="c1"># 註冊到 Ollama</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">ollama create mycodebase-coder -f ~/models/Modelfile-mycodebase
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="c1"># 測試</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">ollama run mycodebase-coder <span class="s2">&#34;寫一個 user signup endpoint&#34;</span></span></span></code></pre></div><h2 id="step-9配-continuedev">Step 9：配 Continue.dev</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-json" data-lang="json"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1">// ~/.continue/config.json 加：
</span></span></span><span class="line"><span class="ln"> 2</span><span class="cl"><span class="c1"></span><span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nt">&#34;models&#34;</span><span class="p">:</span> <span class="p">[</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">      <span class="nt">&#34;title&#34;</span><span class="p">:</span> <span class="s2">&#34;My Codebase Coder&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">      <span class="nt">&#34;provider&#34;</span><span class="p">:</span> <span class="s2">&#34;ollama&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">      <span class="nt">&#34;model&#34;</span><span class="p">:</span> <span class="s2">&#34;mycodebase-coder&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">      <span class="nt">&#34;apiBase&#34;</span><span class="p">:</span> <span class="s2">&#34;http://localhost:11434&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">    <span class="p">},</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">    <span class="c1">// ... 既有 models
</span></span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"></span>  <span class="p">]</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="p">}</span></span></span></code></pre></div><p>VS Code restart 後、Continue panel 下拉就能切換。</p>
<h2 id="失敗模式跟回退">失敗模式跟回退</h2>
<h3 id="失敗-1訓練-loss-不降">失敗 1：訓練 loss 不降</h3>
<p>可能原因：</p>
<ul>
<li>資料品質差 → 人工 review 50 對、看 instruction-response 是否真有對應</li>
<li>資料 token 太短 → 多數 &lt; 100 token、模型學不到複雜 pattern</li>
<li>lr 太小 → 試 lr 5e-4</li>
</ul>
<p>回退：把資料品質提升、或放棄 fine-tune 用 RAG。</p>
<h3 id="失敗-2humaneval-大幅下降catastrophic-forgetting">失敗 2：HumanEval 大幅下降（catastrophic forgetting）</h3>
<p>緩解：</p>
<ul>
<li>加入 20% 通用 data mixing、重訓</li>
<li>降低 epochs（從 3 → 1）</li>
<li>降低 LoRA rank（從 16 → 8）</li>
</ul>
<h3 id="失敗-3in-domain-test-進步但日常用感覺沒變">失敗 3：In-domain test 進步、但日常用感覺沒變</h3>
<p>可能原因：</p>
<ul>
<li>Test set 跟真實工作流分佈不符</li>
<li>Prompt template 在訓練跟推論不一致</li>
</ul>
<p>緩解：實際在 Continue.dev 跑 1-2 週、看真實效果再判斷。</p>
<h3 id="失敗-4訓練爆-oom">失敗 4：訓練爆 OOM</h3>
<p>緩解：</p>
<ul>
<li>降 batch size（4 → 2 → 1）</li>
<li>加 gradient_accumulation_steps（保持 effective batch size）</li>
<li>用更小的 LoRA rank</li>
<li>換更小的 base model（7B → 3B）</li>
</ul>
<h2 id="何時不該繼續-fine-tune-路線">何時不該繼續 fine-tune 路線</h2>
<p>跑完一次 fine-tune 評估後、若：</p>
<ol>
<li><strong>In-domain 提升 &lt; 10%</strong>：相對成本（時間 + 維護）不划算、用 RAG</li>
<li><strong>Catastrophic forgetting &gt; 10%</strong>：跟其他能力 trade-off 不值得</li>
<li><strong>資料量不夠（&lt; 500 對）</strong>：RAG 比 fine-tune 更有效</li>
<li><strong>工作流變化快（codebase 慣例每月變）</strong>：fine-tune 過時得快、RAG 更靈活</li>
</ol>
<h2 id="跟其他模組的關係">跟其他模組的關係</h2>
<ul>
<li>原理層的 LoRA 設計見 <a href="/blog/llm/knowledge-cards/lora/" data-link-title="LoRA" data-link-desc="Low-Rank Adaptation：凍住原模型權重、只訓兩個小矩陣的 parameter-efficient fine-tuning">LoRA 卡片</a> 跟 <a href="/blog/llm/knowledge-cards/qlora/" data-link-title="QLoRA" data-link-desc="把 base model 量化到 4-bit &#43; LoRA fine-tune 的組合、消費級 GPU 也能 fine-tune 大模型">QLoRA 卡片</a></li>
<li>Catastrophic forgetting 跟整體 alignment 議題見 <a href="/blog/llm/03-theoretical-foundations/training-pipeline/" data-link-title="3.4 訓練流程：pre-train → SFT → RLHF" data-link-desc="LLM 的三階段訓練：預訓練、指令微調、人類反饋強化學習；各階段目標與最新替代方案">3.4 訓練流程</a></li>
<li>Fine-tune 後的模型評估見 <a href="/blog/llm/04-applications/benchmarking-and-evaluation/" data-link-title="4.14 Benchmarking 與評估方法論" data-link-desc="判讀 model card benchmark 數字、做自己工作流的 in-house benchmark、量測本地推論速度的完整方法論">4.14 Benchmarking</a></li>
<li>隱私 / 供應鏈面：fine-tune 後 model 怎麼分享（給 team / 上 HuggingFace）見 <a href="/blog/llm/06-security/model-supply-chain-trust/" data-link-title="6.0 模型供應鏈與信任邊界" data-link-desc="個人 dev 用本地 LLM 時的模型權重來源信任：GGUF 完整性、Hugging Face / Ollama registry 信任、量化版本污染、檔案完整性檢查">6.0 模型供應鏈</a></li>
<li>跟 RAG 的取捨見 <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG 原理</a> 的「RAG vs Fine-tuning vs Long Context」段</li>
</ul>
]]></content:encoded></item></channel></rss>