<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Quickstart on Tarragon</title><link>https://tarrragon.github.io/blog/tags/quickstart/</link><description>Recent content in Quickstart on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 12 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/quickstart/index.xml" rel="self" type="application/rss+xml"/><item><title>Hands-on Quickstart: run every demo after cloning the repo</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/quickstart/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/quickstart/</guid><description>&lt;p>This post is the &lt;strong>guide&lt;/strong> to the hands-on series. It consolidates the setup steps scattered across the &lt;code>ollama-setup&lt;/code> / &lt;code>rag-demo&lt;/code> / &lt;code>mcp-demo&lt;/code> / &lt;code>permission-boundary&lt;/code> chapters into one shortest path, so that anyone who clones the repo can run every demo within 15 minutes. There are three demos: &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/rag/" data-link-title="RAG" data-link-desc="Retrieval-Augmented Generation：動態外掛知識給 LLM、繞開模型參數記憶的靜態限制">RAG&lt;/a>, &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/mcp/" data-link-title="MCP（Model Context Protocol）" data-link-desc="LLM application ↔ 外部 tool server 之間的標準化協議、複用 OpenAI 相容 API 的成功模式">MCP&lt;/a>, and the permission boundary. RAG is "retrieval finds the relevant content, then the LLM answers"; MCP is "a standard protocol between an LLM application and a tool server".&lt;/p>
&lt;p>Each hands-on post focuses on "why it is designed this way"; this one focuses on "getting everything running, in order". If you want the underlying principles afterwards, read the corresponding chapter.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Verified&lt;/strong>: 2026-05-12
&lt;strong>Environment&lt;/strong>: macOS 14+, Apple Silicon, Ollama 0.23.2, Python 3.11+
&lt;strong>Total time&lt;/strong>: ~15 minutes (including model downloads)
&lt;strong>Disk space&lt;/strong>: budget ~5 GB for Steps 1 to 4 (Ollama 200 MB + nomic-embed-text 274 MB + gemma3:1b 815 MB, about 1.3 GB in total, plus headroom for the index); the optional Step 5 ComfyUI adds ~10 GB (SDXL base model).
&lt;strong>Supported platforms&lt;/strong>: this quick path has only been verified on Apple Silicon Macs. Ollama still installs on Intel Macs and Linux, but GPU acceleration and model tag behavior may differ; defer to the official release notes.&lt;/p>&lt;/blockquote>
&lt;h2 id="適合誰讀">Who this is for&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Your situation&lt;/th>
 &lt;th>What to do&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>You just cloned my blog repo and want to try the demos&lt;/td>
 &lt;td>&lt;strong>Start here&lt;/strong> and follow the steps&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>You want to understand the design trade-offs of a particular demo&lt;/td>
 &lt;td>Once everything runs, move on to &lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &amp;#43; cosine retrieval &amp;#43; Ollama chat、validating 4.0 RAG 原理">RAG demo&lt;/a> / &lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo&lt;/a> / &lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">permission-boundary&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>You want the Ollama / ComfyUI installation details&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &amp;#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama setup&lt;/a> / &lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &amp;#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI setup&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>You want to see how to approach resource planning for production&lt;/td>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/04-applications/production-resource-planning/" data-link-title="4.9 Production 部署的資源評估原理" data-link-desc="從本地單 user 到 production multi-tenant：concurrent users、cost model、observability、SLA、capacity planning 的設計取捨">4.9 Production resource planning&lt;/a>&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="為什麼不是pre-builtclone-就能跑">Why it isn't "pre-built, clone and run"&lt;/h2>
&lt;p>Derived artifacts (&lt;code>index.pkl&lt;/code>, &lt;code>__pycache__/&lt;/code>, Ollama model weights; that is, the caches, indexes, and weights produced by running things, as opposed to source code) are deliberately &lt;strong>kept out of git&lt;/strong>. The reasoning is in &lt;a href="https://tarrragon.github.io/blog/llm/04-applications/artifact-management/" data-link-title="4.10 衍生產物管理原理：什麼進 git、什麼不該" data-link-desc="LLM 應用的 source / derived / external 三類產物對應 git / build cache / registry、與 production 部署的 reproducibility / cost / share 取捨">4.10 Principles of derived-artifact management&lt;/a>. So after cloning the repo you need to:&lt;/p></description><content:encoded><![CDATA[<p>This post is the <strong>guide</strong> to the hands-on series. It consolidates the setup steps scattered across the <code>ollama-setup</code> / <code>rag-demo</code> / <code>mcp-demo</code> / <code>permission-boundary</code> chapters into one shortest path, so that anyone who clones the repo can run every demo within 15 minutes. There are three demos: <a href="/blog/llm/knowledge-cards/rag/" data-link-title="RAG" data-link-desc="Retrieval-Augmented Generation：動態外掛知識給 LLM、繞開模型參數記憶的靜態限制">RAG</a>, <a href="/blog/llm/knowledge-cards/mcp/" data-link-title="MCP（Model Context Protocol）" data-link-desc="LLM application ↔ 外部 tool server 之間的標準化協議、複用 OpenAI 相容 API 的成功模式">MCP</a>, and the permission boundary. RAG is "retrieval finds the relevant content, then the LLM answers"; MCP is "a standard protocol between an LLM application and a tool server".</p>
<p>Each hands-on post focuses on "why it is designed this way"; this one focuses on "getting everything running, in order". If you want the underlying principles afterwards, read the corresponding chapter.</p>
<blockquote>
<p><strong>Verified</strong>: 2026-05-12
<strong>Environment</strong>: macOS 14+, Apple Silicon, Ollama 0.23.2, Python 3.11+
<strong>Total time</strong>: ~15 minutes (including model downloads)
<strong>Disk space</strong>: budget ~5 GB for Steps 1 to 4 (Ollama 200 MB + nomic-embed-text 274 MB + gemma3:1b 815 MB, about 1.3 GB in total, plus headroom for the index); the optional Step 5 ComfyUI adds ~10 GB (SDXL base model).
<strong>Supported platforms</strong>: this quick path has only been verified on Apple Silicon Macs. Ollama still installs on Intel Macs and Linux, but GPU acceleration and model tag behavior may differ; defer to the official release notes.</p></blockquote>
<h2 id="適合誰讀">Who this is for</h2>
<table>
  <thead>
      <tr>
          <th>Your situation</th>
          <th>What to do</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>You just cloned my blog repo and want to try the demos</td>
          <td><strong>Start here</strong> and follow the steps</td>
      </tr>
      <tr>
          <td>You want to understand the design trade-offs of a particular demo</td>
          <td>Once everything runs, move on to <a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo</a> / <a href="/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo</a> / <a href="/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">permission-boundary</a></td>
      </tr>
      <tr>
          <td>You want the Ollama / ComfyUI installation details</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama setup</a> / <a href="/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI setup</a></td>
      </tr>
      <tr>
          <td>You want to see how to approach resource planning for production</td>
          <td><a href="/blog/llm/04-applications/production-resource-planning/" data-link-title="4.9 Production 部署的資源評估原理" data-link-desc="從本地單 user 到 production multi-tenant：concurrent users、cost model、observability、SLA、capacity planning 的設計取捨">4.9 Production resource planning</a></td>
      </tr>
  </tbody>
</table>
<h2 id="為什麼不是pre-builtclone-就能跑">Why it isn't "pre-built, clone and run"</h2>
<p>Derived artifacts (<code>index.pkl</code>, <code>__pycache__/</code>, Ollama model weights; that is, the caches, indexes, and weights produced by running things, as opposed to source code) are deliberately <strong>kept out of git</strong>. The reasoning is in <a href="/blog/llm/04-applications/artifact-management/" data-link-title="4.10 衍生產物管理原理：什麼進 git、什麼不該" data-link-desc="LLM 應用的 source / derived / external 三類產物對應 git / build cache / registry、與 production 部署的 reproducibility / cost / share 取捨">4.10 Principles of derived-artifact management</a>. So after cloning the repo you need to:</p>
<ol>
<li>Install the Ollama daemon and pull the models (one-time)</li>
<li>Run <code>ingest.py</code> to build the RAG index (rerun whenever the corpus changes)</li>
<li>After that, the demos are ready to use</li>
</ol>
<p>This post walks through that flow step by step.</p>
<h2 id="step-1裝-ollama-daemonbrew-install-ollama--brew-services-start">Step 1: Install the Ollama daemon (<code>brew install ollama</code> + <code>brew services start</code>)</h2>
<blockquote>
<p>A daemon is a resident background process that starts automatically at login; see the <a href="/blog/llm/knowledge-cards/launchd-service/" data-link-title="launchd Service" data-link-desc="macOS 原生的服務管理機制、把 process 註冊成自動啟動的 daemon 或 agent">launchd service card</a>.</p></blockquote>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">brew install ollama
</span></span><span class="line"><span class="ln">2</span><span class="cl">brew services start ollama</span></span></code></pre></div><p>Verify:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">curl -s http://localhost:11434/api/version
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># {&#34;version&#34;:&#34;0.x.x&#34;}</span></span></span></code></pre></div><p>For detailed installation steps and troubleshooting, see the <a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama setup chapter</a>.</p>
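<p>The same check can be scripted. The following is a minimal sketch, not part of the repo: the function names <code>parse_version</code> and <code>ollama_version</code> are made up for illustration, and it assumes the daemon listens on the default address used by the <code>curl</code> command above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # same address as the curl check


def parse_version(body):
    """Extract the version string from an /api/version response body."""
    return json.loads(body)["version"]


def ollama_version(base_url=OLLAMA_URL, timeout=3.0):
    """Return the daemon's version, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(base_url + "/api/version", timeout=timeout) as resp:
            return parse_version(resp.read().decode("utf-8"))
    except OSError:
        # URLError subclasses OSError, so this covers refused connections and timeouts.
        return None


if __name__ == "__main__":
    version = ollama_version()
    if version is None:
        print("Ollama daemon is not reachable")
    else:
        print("Ollama is up, version " + version)
</code></pre></div>
<p>Returning <code>None</code> instead of raising keeps the caller simple: a later step can refuse to start when the daemon is down.</p>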
<h2 id="step-2拉-modelembed--chat-兩種角色">Step 2: Pull the models (two roles: embedding + chat)</h2>
<blockquote>
<p>Why pull two models: RAG needs an embedding model to compress text into vectors for semantic comparison, and a chat model to generate the answer from the retrieved results. The two are trained for different objectives and are not interchangeable (see <a href="/blog/llm/03-theoretical-foundations/embedding-spaces/" data-link-title="3.1 Embedding 空間" data-link-desc="token 怎麼變成向量、為什麼相似 token 在向量空間中靠近、embedding 是怎麼學出來的">3.1 Embedding spaces</a>).</p></blockquote>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Embedding model (needed by both RAG and MCP, 274 MB)</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">ollama pull nomic-embed-text
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Chat model (start with 1B to verify the flow; switch to a larger one later)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">ollama pull gemma3:1b</span></span></code></pre></div><p>Verify:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ollama list
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># NAME                       SIZE      MODIFIED</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># gemma3:1b                  815 MB    ...</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># nomic-embed-text:latest    274 MB    ...</span></span></span></code></pre></div><p>For the trade-offs in choosing a chat model size, see <a href="/blog/llm/01-local-llm-services/model-selection-priority/" data-link-title="1.4 寫 code 場景的模型選型優先順序" data-link-desc="Gemma 4 31B MTP → Qwen3-Coder 30B → Qwen3 14B → gpt-oss 20B 的取捨與適用情境">1.4 Model selection priority</a>. This quickstart uses 1B mainly to verify that the flow runs end to end. For sustained daily use (following multi-part formatting instructions, complex reasoning), start at 4B / 8B (see <a href="/blog/llm/01-local-llm-services/hands-on/instruction-following-test/" data-link-title="Hands-on：跨資料夾風格 follow 任務的模型對比" data-link-desc="1B / 4B / 8B / 跨代 4B 在「讀風格參考、follow 既有格式、寫新章節」任務上的 structural metrics 對比、揭示 model size 不是唯一因素">instruction-following-test</a>); 1B is fine for very short checks and simple question answering. This series defaults to <a href="/blog/llm/knowledge-cards/instruction-tuned/" data-link-title="Instruction-Tuned Model" data-link-desc="經過指令微調的模型：會跟著 prompt 走、回答使用者問題">instruction-tuned model</a> variants (tags containing <code>:Xb</code> without <code>-base</code>), which suit conversation and coding.</p>
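<p>To check programmatically that both models are present, one option is to inspect the response of Ollama's <code>/api/tags</code> endpoint. The sketch below works on the response body only and assumes its shape is <code>{"models": [{"name": ...}]}</code>; the helper names are hypothetical and nothing here comes from the repo.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import json

REQUIRED_MODELS = ("nomic-embed-text", "gemma3:1b")  # the two models pulled above


def normalize(tag):
    """Treat a bare name as its ':latest' tag, as `ollama list` prints it."""
    return tag if ":" in tag else tag + ":latest"


def missing_models(tags_body, required=REQUIRED_MODELS):
    """Return the required models that are absent from an /api/tags body."""
    installed = {normalize(m["name"]) for m in json.loads(tags_body)["models"]}
    return [name for name in required if normalize(name) not in installed]


if __name__ == "__main__":
    # Sample payload standing in for a real response: only the embedding model is installed.
    sample = json.dumps({"models": [{"name": "nomic-embed-text:latest"}]})
    print(missing_models(sample))  # prints ['gemma3:1b']
</code></pre></div>
<p>Normalizing to <code>:latest</code> matters because <code>ollama pull nomic-embed-text</code> is listed as <code>nomic-embed-text:latest</code>, as the output above shows.</p>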
<h2 id="step-3建-rag-index跑-ingestpy">Step 3: Build the RAG index (run <code>ingest.py</code>)</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> /path/to/blog
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 scripts/rag-demo/ingest.py</span></span></code></pre></div><p>Expected output:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Found 71 markdown files under content/llm
</span></span><span class="line"><span class="ln">2</span><span class="cl">  [10/71] 86 chunks in 4.5s
</span></span><span class="line"><span class="ln">3</span><span class="cl">  ...
</span></span><span class="line"><span class="ln">4</span><span class="cl">Wrote 463 records to scripts/rag-demo/index.pkl (22.3s)</span></span></code></pre></div><p>The actual numbers depend on how much blog content you have. The index file is written to <code>scripts/rag-demo/index.pkl</code> and ranges from 3 to 50 MB.</p>
<p>For the chunking strategy, the embedding design, and why pickle is used, see the <a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo chapter</a>.</p>
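<p>To give a feel for what an ingest step of this kind does, here is a rough sketch. It is not the repo's <code>ingest.py</code>: the fixed-size overlapping chunker, the record fields, and the injected <code>embed</code> callable are all assumptions chosen for illustration, and the real script may chunk and store things differently.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import pickle
from pathlib import Path


def chunk_text(text, size=800, overlap=100):
    """Slice text into fixed-size windows that overlap by `overlap` characters."""
    step = size - overlap
    stop = max(len(text) - overlap, 1)  # avoids a trailing window that only repeats the overlap
    windows = (text[i:i + size] for i in range(0, stop, step))
    return [w for w in windows if w.strip()]


def build_index(content_dir, index_path, embed):
    """Embed every chunk of every markdown file and pickle the records."""
    records = []
    for md_file in sorted(Path(content_dir).rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for n, chunk in enumerate(chunk_text(text)):
            # Hypothetical record layout: source path, chunk number, text, vector.
            records.append(
                {"path": str(md_file), "chunk": n, "text": chunk, "vector": embed(chunk)}
            )
    with open(index_path, "wb") as fh:
        pickle.dump(records, fh)
    return len(records)
</code></pre></div>
<p>Passing <code>embed</code> in as a parameter keeps the sketch independent of any particular embedding API; in the demo that role is played by the <code>nomic-embed-text</code> model served by Ollama.</p>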
<h2 id="step-4跑-rag--mcp--permission-demo">Step 4: Run the RAG / MCP / permission demos</h2>
<p>Once steps 1-3 are done, all three demos can run:</p>
<h3 id="rag-demo語意搜尋--llm-回答">RAG demo (semantic search + LLM answer)</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python3 scripts/rag-demo/query.py --show-retrieved <span class="s2">&#34;your question&#34;</span></span></span></code></pre></div><p>Example:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python3 scripts/rag-demo/query.py --show-retrieved <span class="s2">&#34;What is MCP?&#34;</span></span></span></code></pre></div><p>Expect to see the retrieved chunks (with similarity scores and source paths), followed by the answer the LLM generates from that context.</p>
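<p>The similarity score attached to each retrieved chunk comes from comparing vectors; the RAG demo chapter describes its retrieval as embedding plus cosine retrieval. A minimal sketch of that ranking step follows, using toy 3-dimensional vectors and made-up paths in place of real embeddings. The function names are illustrative, not taken from <code>query.py</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, records, k=3):
    """Rank (path, vector) records by similarity to the query, best first."""
    scored = [(cosine(query_vec, vec), path) for path, vec in records]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    # Toy vectors and hypothetical paths, for illustration only.
    records = [
        ("cards/mcp.md", [0.9, 0.1, 0.0]),
        ("cards/rag.md", [0.1, 0.9, 0.0]),
        ("cards/token.md", [0.0, 0.2, 0.9]),
    ]
    for score, path in top_k([1.0, 0.0, 0.0], records, k=2):
        print(format(score, ".3f"), path)
</code></pre></div>
<p>A linear scan like this is enough for an index of a few hundred records; larger corpora usually call for a vector index instead.</p>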
<h3 id="mcp-demostdio-json-rpc-server">MCP demo (stdio JSON-RPC server)</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python3 scripts/mcp-demo/test_client.py</span></span></code></pre></div><p>Expect a five-phase JSON-RPC exchange: initialize / tools/list / tools/call (search_blog) / tools/call (read_chunk) / error.</p>
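<p>For orientation, the five phases correspond to five JSON-RPC 2.0 requests. The sketch below only builds the request lines a client might write to the server's stdin; it is not the repo's <code>test_client.py</code>. The argument names (<code>query</code>, <code>chunk_id</code>), the client name, and the deliberately unknown tool used to provoke an error are assumptions for illustration.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def request(method, params=None):
    """Build one JSON-RPC 2.0 request as a single line of JSON."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message, ensure_ascii=False)


def tool_call(name, arguments):
    """MCP wraps a tool invocation in a tools/call request."""
    return request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    handshake = {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "quickstart-sketch", "version": "0.0.1"},
    }
    print(request("initialize", handshake))
    print(request("tools/list"))
    print(tool_call("search_blog", {"query": "MCP"}))      # argument name assumed
    print(tool_call("read_chunk", {"chunk_id": 0}))        # argument name assumed
    print(tool_call("no_such_tool", {}))                   # expected to be rejected
</code></pre></div>
<p>Each line would be followed by a newline on the wire, and the server answers with a message carrying the same <code>id</code>; how the last request is rejected depends on the server.</p>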
<h3 id="permission-boundary-demollm-mediated-file-edit">Permission boundary demo (LLM-mediated file edit)</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Back up the file you are going to try this on</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">cp content/llm/knowledge-cards/token.md /tmp/token-orig.md
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># Dry-run (the default: writes nothing, prints a diff)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">python3 scripts/permission-demo/edit_with_llm.py <span class="se">\
</span></span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="se"></span>  content/llm/knowledge-cards/token.md <span class="se">\
</span></span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;Add one explanatory sentence&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># Restore (if you did not use dry-run just now)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">cp /tmp/token-orig.md content/llm/knowledge-cards/token.md</span></span></code></pre></div><p>For the trade-offs among the three modes <code>--dry-run</code> / <code>--confirm</code> / <code>--auto</code>, see the <a href="/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">Permission boundary chapter</a>.</p>
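<p>The idea behind a dry-run default can be sketched in a few lines: the wrapper, not the model, decides whether anything touches the disk. The following is an illustrative sketch under that assumption, not the code of <code>edit_with_llm.py</code>; <code>apply_edit</code> is a made-up name and <code>new_text</code> stands for whatever the LLM proposed.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import difflib
from pathlib import Path


def apply_edit(path, new_text, dry_run=True):
    """Return a unified diff; write to disk only when dry_run is False."""
    target = Path(path)
    old_text = target.read_text(encoding="utf-8")
    diff = "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=str(target),
            tofile=str(target) + " (proposed)",
        )
    )
    if not dry_run:
        # The only line in this sketch that changes the file system.
        target.write_text(new_text, encoding="utf-8")
    return diff
</code></pre></div>
<p>Keeping the write behind a single flag that defaults to off means a forgotten argument fails safe: the worst outcome is an unapplied diff.</p>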
<h2 id="step-5可選comfyui-text-to-image-demo">Step 5 (optional): ComfyUI text-to-image demo</h2>
<p>This requires installing ComfyUI and downloading the SDXL model separately (~10 GB of disk); the flow is independent of the steps above:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># A track parallel to step 1; see the ComfyUI setup chapter</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> ~/Projects
</span></span><span class="line"><span class="ln">3</span><span class="cl">git clone --depth <span class="m">1</span> https://github.com/comfyanonymous/ComfyUI.git
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="nb">cd</span> ComfyUI
</span></span><span class="line"><span class="ln">5</span><span class="cl">python3 -m venv venv
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="nb">source</span> venv/bin/activate
</span></span><span class="line"><span class="ln">7</span><span class="cl">pip install -r requirements.txt
</span></span><span class="line"><span class="ln">8</span><span class="cl"><span class="c1"># Download SDXL base into ~/Projects/ComfyUI/models/checkpoints/</span>
</span></span><span class="line"><span class="ln">9</span><span class="cl"><span class="c1"># See the ComfyUI setup chapter for the commands</span></span></span></code></pre></div><p>Start the server and run a generation. ComfyUI is launched by absolute path (<code>$PWD/main.py</code>) so that the <code>pkill -f</code> pattern <code>ComfyUI/main.py</code> used in Cleanup matches the process's command line:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/ComfyUI <span class="o">&amp;&amp;</span> <span class="nb">source</span> venv/bin/activate <span class="o">&amp;&amp;</span> nohup python <span class="s2">&#34;</span><span class="nv">$PWD</span><span class="s2">/main.py&#34;</span> &gt; /tmp/comfyui.log 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># Wait until the server is ready</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="k">until</span> curl -s -o /dev/null -w <span class="s2">&#34;%{http_code}&#34;</span> http://127.0.0.1:8188/ <span class="p">|</span> grep -q 200<span class="p">;</span> <span class="k">do</span> sleep 2<span class="p">;</span> <span class="k">done</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># Run a generation (using the script in the repo)</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="nb">cd</span> /path/to/blog
</span></span><span class="line"><span class="ln">7</span><span class="cl">python3 scripts/comfyui-test/generate.py --steps <span class="m">15</span></span></span></code></pre></div><p>For detailed installation steps and a walkthrough of the workflow JSON, see the <a href="/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI setup chapter</a>.</p>
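<p>The shell <code>until</code> loop above waits forever if the server never comes up. A bounded variant might look like the sketch below; the helper names and the default of 60 attempts are assumptions for illustration, not something the repo provides.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import time
import urllib.request


def http_ok(url, timeout=2.0):
    """True when the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(probe, attempts=60, delay=2.0, sleep=time.sleep):
    """Call probe() until it returns True; give up after `attempts` tries."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(delay)
    return False
</code></pre></div>
<p>A call such as <code>wait_until_ready(lambda: http_ok("http://127.0.0.1:8188/"))</code> mirrors the shell loop but returns <code>False</code> after about two minutes instead of hanging. Taking the probe and the sleep function as parameters also makes the loop easy to exercise without a running server.</p>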
<h2 id="cleanup完事釋放資源">Cleanup (release resources when you are done)</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Stop the Ollama daemon</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">brew services stop ollama
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># Kill ComfyUI (if it is running)</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">pkill -9 -f <span class="s2">&#34;ComfyUI/main.py&#34;</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">
</span></span><span class="line"><span class="ln">7</span><span class="cl"><span class="c1"># Remove build artifacts (optional; they can be rebuilt)</span>
</span></span><span class="line"><span class="ln">8</span><span class="cl">rm -f scripts/rag-demo/index.pkl
</span></span><span class="line"><span class="ln">9</span><span class="cl">find scripts -name __pycache__ -type d -exec rm -rf <span class="o">{}</span> +</span></span></code></pre></div><p>For the resource lifecycle and cleanup idioms in detail, see the <a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management chapter</a>.</p>
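<p>The artifact-removal part of the cleanup can also be expressed in Python, which may be convenient if it is called from a test or a script. This is a sketch assuming the same paths as the shell commands above; <code>clean_artifacts</code> is a hypothetical helper, not something the repo ships.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import shutil
from pathlib import Path


def clean_artifacts(repo_root):
    """Delete the RAG index and every __pycache__ directory under scripts/."""
    scripts = Path(repo_root) / "scripts"
    removed = []
    index = scripts / "rag-demo" / "index.pkl"
    if index.exists():
        index.unlink()
        removed.append(index)
    # Materialize the list first so deletion does not disturb the directory walk.
    for cache_dir in list(scripts.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    return removed
</code></pre></div>
<p>Returning the removed paths gives the caller something to log. Only derived artifacts are touched, so the next <code>ingest.py</code> run rebuilds the index.</p>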
<h2 id="跑通後該往哪讀">Where to read next once everything runs</h2>
<table>
  <thead>
      <tr>
          <th>What you want to understand</th>
          <th>Where to read</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>"Why RAG retrieves the right chunks yet generates a weak answer"</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo</a></td>
      </tr>
      <tr>
          <td>"The details of the MCP wire protocol"</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo</a></td>
      </tr>
      <tr>
          <td>"Why an <code>rm -rf</code> written by the LLM never actually executes"</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">Permission boundary</a></td>
      </tr>
      <tr>
          <td>"How far apart different models are at instruction following"</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/instruction-following-test/" data-link-title="Hands-on：跨資料夾風格 follow 任務的模型對比" data-link-desc="1B / 4B / 8B / 跨代 4B 在「讀風格參考、follow 既有格式、寫新章節」任務上的 structural metrics 對比、揭示 model size 不是唯一因素">Instruction following test</a></td>
      </tr>
      <tr>
          <td>"How much RAM the demos use and how to release it"</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management</a> + <a href="/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">RAG/MCP resource footprint</a></td>
      </tr>
      <tr>
          <td>"How to think about production deployment"</td>
          <td><a href="/blog/llm/04-applications/production-resource-planning/" data-link-title="4.9 Production 部署的資源評估原理" data-link-desc="從本地單 user 到 production multi-tenant：concurrent users、cost model、observability、SLA、capacity planning 的設計取捨">4.9 Production resource planning</a></td>
      </tr>
      <tr>
          <td>"What belongs in git and what does not"</td>
          <td><a href="/blog/llm/04-applications/artifact-management/" data-link-title="4.10 衍生產物管理原理：什麼進 git、什麼不該" data-link-desc="LLM 應用的 source / derived / external 三類產物對應 git / build cache / registry、與 production 部署的 reproducibility / cost / share 取捨">4.10 Principles of derived-artifact management</a></td>
      </tr>
  </tbody>
</table>
<h2 id="跑不過時">When something fails</h2>
<table>
  <thead>
      <tr>
          <th>Symptom</th>
          <th>Where to look</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>ollama: command not found</code></td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama setup § Common prerequisite issues</a></td>
      </tr>
      <tr>
          <td><code>curl http://localhost:11434/api/version</code> gets no response</td>
          <td>Same as above</td>
      </tr>
      <tr>
          <td><code>python3 ingest.py</code> reports an HTTP error</td>
          <td>Confirm that the Ollama daemon is running and that nomic-embed-text has been pulled</td>
      </tr>
      <tr>
          <td>RAG retrieval results are all irrelevant</td>
          <td><a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG § Root causes of retrieval failure</a></td>
      </tr>
      <tr>
          <td>MCP test_client hangs</td>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP demo § subprocess and bufsize</a></td>
      </tr>
      <tr>
          <td>Nothing works at all</td>
          <td><a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">1.7 Troubleshooting methodology</a></td>
      </tr>
  </tbody>
</table>
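<h2 id="何時這篇會過時">When this post will go stale</h2>
<p><strong>What will change</strong>:</p>
<ul>
<li>The <code>brew install ollama</code> flow (macOS and brew both evolve)</li>
<li>The specific model tags passed to <code>ollama pull</code> (models get superseded)</li>
<li>Python version compatibility (each release from 3.11 to 3.14 has its own quirks)</li>
</ul>
<p><strong>What will not go stale</strong>:</p>
<ul>
<li>The order of the four steps (install the daemon → pull the models → build the index → run the demos), which is the general setup pattern for RAG, MCP, and any other LLM application</li>
<li>The design decision to keep derived artifacts (index, cache) out of git</li>
<li>The cleanup steps and the logic for releasing resources</li>
</ul>
<p>If a command fails, first check the troubleshooting section of the chapter for that step, then search the web or open an issue.</p>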
]]></content:encoded></item></channel></rss>