<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Comfyui on Tarragon</title><link>https://tarrragon.github.io/blog/tags/comfyui/</link><description>Recent content in Comfyui on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Tue, 12 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/comfyui/index.xml" rel="self" type="application/rss+xml"/><item><title>Hands-on: Installing ComfyUI + SDXL base</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/comfyui-setup/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/comfyui-setup/</guid><description>&lt;p>This post documents installing ComfyUI and the Stable Diffusion XL base model, and getting a minimal text-to-image pipeline running on an Apple Silicon Mac. In 2026, ComfyUI is the most popular way to run &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/diffusion/" data-link-title="Diffusion" data-link-desc="產圖用的生成式 AI 架構：跟寫 code 用的 Transformer 是不同路線">Diffusion&lt;/a> models on Apple Silicon. It uses a node-based workflow: you drag nodes onto a canvas and wire them together, much like visual programming, and each node handles one step of the computation. It also runs cross-platform in a Python environment and is easy to customize. Draw Things (a native Mac GUI) is simpler, but ComfyUI is far more capable when it comes to workflows and custom nodes.&lt;/p>
</description><content:encoded><![CDATA[<p>This post documents installing ComfyUI and the Stable Diffusion XL base model, and getting a minimal text-to-image pipeline running on an Apple Silicon Mac. In 2026, ComfyUI is the most popular way to run <a href="/blog/llm/knowledge-cards/diffusion/" data-link-title="Diffusion" data-link-desc="產圖用的生成式 AI 架構：跟寫 code 用的 Transformer 是不同路線">Diffusion</a> models on Apple Silicon. It uses a node-based workflow: you drag nodes onto a canvas and wire them together, much like visual programming, and each node handles one step of the computation. It also runs cross-platform in a Python environment and is easy to customize. Draw Things (a native Mac GUI) is simpler, but ComfyUI is far more capable when it comes to workflows and custom nodes.</p>
<blockquote>
<p><strong>Verified on</strong>: 2026-05-12<br>
<strong>ComfyUI</strong>: main branch, shallow clone<br>
<strong>Demo model</strong>: Stable Diffusion XL base 1.0 (6.5 GB, <code>stabilityai/stable-diffusion-xl-base-1.0</code>)<br>
<strong>Python</strong>: 3.14 (isolated in a venv so the system Python stays untouched)</p></blockquote>
<h2 id="前置設定">Prerequisites</h2>
<table>
  <thead>
      <tr>
          <th>Item</th>
          <th>Check command</th>
          <th>Expected</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Git</td>
          <td><code>which git</code></td>
          <td><code>/usr/bin/git</code> or a Homebrew build</td>
      </tr>
      <tr>
          <td>Python 3.10+</td>
          <td><code>python3 --version</code></td>
          <td>Anything from 3.10 to 3.14 works; this demo uses 3.14</td>
      </tr>
      <tr>
          <td>Disk space</td>
          <td><code>df -h ~</code></td>
          <td>At least 15 GB (runtime 3 GB + SDXL 6.5 GB + cache)</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/knowledge-cards/unified-memory/" data-link-title="Unified Memory Architecture" data-link-desc="Apple Silicon 讓 CPU / GPU / NE 共用同一塊記憶體：跑大模型的優勢來源">Unified memory</a></td>
          <td><code>system_profiler SPHardwareDataType | grep Memory</code></td>
          <td>At least 16 GB; 32 GB+ recommended</td>
      </tr>
  </tbody>
</table>
<p>On Apple Silicon, ComfyUI runs diffusion models through the <a href="/blog/llm/knowledge-cards/gpu-compute-backend/" data-link-title="GPU Compute Backend" data-link-desc="GPU 加速計算的底層 API 介面（CUDA / ROCm / Vulkan / Metal / SYCL）、決定推論軟體能否用 GPU 跑得快">MPS (Metal Performance Shaders) backend</a>, so NVIDIA CUDA is not required. SDXL still needs at least 12 GB of unified memory for the model and its activations, so a 16 GB Mac gets tight once other apps are running too.</p>
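<p>The checks in the table above can also be run as a single script. This is a minimal, standard-library-only sketch and is not part of ComfyUI. The function names are hypothetical. The thresholds (Python 3.10+, 15 GB free disk, 16 GB memory) are copied from the table and compared in GiB. The memory probe relies on <code>os.sysconf</code>, which works on macOS and Linux but not on Windows.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import os
import shutil
import sys

GIB = 1024 ** 3


def total_memory_gib():
    """Physical memory in GiB via POSIX sysconf (macOS / Linux only)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / GIB


def check_prereqs(path="~", min_disk_gib=15, min_mem_gib=16):
    """Return {item: (ok, detail)} mirroring the prerequisite table."""
    git_path = shutil.which("git")
    free_gib = shutil.disk_usage(os.path.expanduser(path)).free / GIB
    mem_gib = total_memory_gib()
    return {
        "git": (git_path is not None, git_path or "not found"),
        "python": (sys.version_info[:2] >= (3, 10), sys.version.split()[0]),
        "disk": (free_gib >= min_disk_gib, f"{free_gib:.1f} GiB free"),
        "memory": (mem_gib >= min_mem_gib, f"{mem_gib:.1f} GiB total"),
    }


if __name__ == "__main__":
    for item, (ok, detail) in check_prereqs().items():
        print("OK" if ok else "NO", item.ljust(8), detail)</code></pre></div>
<p>Any line starting with <code>NO</code> points at the table row to sort out before cloning.</p>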
<h2 id="clone-comfyui">Clone ComfyUI</h2>
<p>Put it under <code>~/Projects/</code>, alongside your other dev projects:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects
</span></span><span class="line"><span class="ln">2</span><span class="cl">git clone --depth <span class="m">1</span> https://github.com/comfyanonymous/ComfyUI.git
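</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">cd</span> ComfyUI</span></span></code></pre></div><p><code>--depth 1</code> fetches only the latest commit instead of the full history, which saves a few hundred MB. You only need a full clone if you want to browse the history or submit a PR.</p>
<p>ComfyUI's directory layout (core parts):</p>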





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">ComfyUI/
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">├── main.py              # entry point that launches ComfyUI
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">├── server.py            # HTTP server
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">├── nodes.py             # built-in node implementations
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">├── custom_nodes/        # third-party / custom nodes go here
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">├── models/
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">│   ├── checkpoints/     # main SD / SDXL model files go here
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">│   ├── loras/           # LoRA fine-tuning weights
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">│   ├── vae/             # VAE models
</span></span><span class="line"><span class="ln">10</span><span class="cl">│   ├── controlnet/      # ControlNet models
</span></span><span class="line"><span class="ln">11</span><span class="cl">│   └── ...
</span></span><span class="line"><span class="ln">12</span><span class="cl">├── output/              # generated images
</span></span><span class="line"><span class="ln">13</span><span class="cl">├── input/               # images dragged into ComfyUI
</span></span><span class="line"><span class="ln">14</span><span class="cl">└── requirements.txt</span></span></code></pre></div><h2 id="建-venv--裝-dependencies">Create a venv and install dependencies</h2>
<p>ComfyUI's requirements include PyTorch, numpy, PIL, safetensors, einops and more. There are many packages and they are version-sensitive, so isolate them in a venv:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/ComfyUI
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 -m venv venv
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">source</span> venv/bin/activate
</span></span><span class="line"><span class="ln">4</span><span class="cl">python --version  <span class="c1"># check that the venv is active</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">pip install --upgrade pip</span></span></code></pre></div><p>Install the dependencies:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">pip install -r requirements.txt</span></span></code></pre></div><p>Measured time: 10 to 15 minutes (torch plus assorted dependencies). The first install may also compile some C extensions. When it finishes, you should see something like:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Successfully installed Mako-... MarkupSafe-... Pillow-... PyOpenGL-... ...
</span></span><span class="line"><span class="ln">2</span><span class="cl">  torch-... torchvision-... torchaudio-... ...
</span></span><span class="line"><span class="ln">3</span><span class="cl">  safetensors-... transformers-... ...</span></span></code></pre></div><p>Verify PyTorch + MPS:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python -c <span class="s2">&#34;import torch; print(&#39;torch:&#39;, torch.__version__, &#39;mps:&#39;, torch.backends.mps.is_available())&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># torch: 2.x.x mps: True</span></span></span></code></pre></div><p><code>mps: True</code> means GPU acceleration on Apple Silicon is available.</p>
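<p>If that prints <code>mps: False</code>, two follow-up questions help narrow it down. First, is the venv's interpreter actually the one running? Second, was the installed PyTorch built with MPS support at all? Below is a minimal sketch, assuming it is saved under a hypothetical name such as <code>check_env.py</code> and run with the venv's <code>python</code>. <code>torch.backends.mps.is_built()</code> and <code>torch.backends.mps.is_available()</code> are real PyTorch functions. If torch is not installed, the script still runs and reports <code>None</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import importlib.util
import sys


def in_venv():
    """A venv's interpreter has sys.prefix pointing away from the base installation."""
    return sys.prefix != sys.base_prefix


def torch_mps_status():
    """Collect venv, torch version and MPS flags; None means torch is not importable."""
    status = {"in_venv": in_venv(), "torch": None, "mps_built": None, "mps_available": None}
    if importlib.util.find_spec("torch") is None:
        return status
    import torch

    status["torch"] = torch.__version__
    status["mps_built"] = torch.backends.mps.is_built()          # compiled with MPS support?
    status["mps_available"] = torch.backends.mps.is_available()  # usable on this machine now?
    return status


if __name__ == "__main__":
    for key, value in torch_mps_status().items():
        print(key.ljust(14), value)</code></pre></div>
<p>How to read the output:</p>
<ul>
<li><code>in_venv False</code> means the system Python ran instead of the venv's.</li>
<li><code>torch None</code> means torch is not installed for the interpreter that ran.</li>
<li><code>mps_built True</code> together with <code>mps_available False</code> points at the machine or macOS version rather than the install.</li>
</ul>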
<h2 id="下載-sdxl-base-模型">Download the SDXL base model</h2>
<p>SDXL base is about 6.5 GB and is the foundation model of Stable Diffusion XL. Download it from Hugging Face into ComfyUI's <code>models/checkpoints/</code>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">mkdir -p ~/Projects/ComfyUI/models/checkpoints
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">cd</span> ~/Projects/ComfyUI/models/checkpoints
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># -L follows redirects; --continue-at - resumes an interrupted transfer instead of re-downloading 6.5 GB</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl">curl -L --continue-at - -o sd_xl_base_1.0.safetensors <span class="se">\
</span></span></span><span class="line"><span class="ln">6</span><span class="cl"><span class="se"></span>  <span class="s2">&#34;https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors?download=true&#34;</span></span></span></code></pre></div><p>Download time depends on your connection; 10 to 30 minutes on broadband is normal. If the connection drops, rerun the same command. <code>--continue-at -</code> resumes from where the transfer stopped, so you don't download 6.5 GB again. When it finishes:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">ls -lh sd_xl_base_1.0.safetensors
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># the size column should read about 6.5G</span></span></span></code></pre></div><p>Optional alternative models:</p>
<table>
  <thead>
      <tr>
          <th>Model</th>
          <th>Size</th>
          <th>Use</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>SDXL base 1.0</td>
          <td>6.5 GB</td>
          <td>Base model; used in this demo</td>
      </tr>
      <tr>
          <td>SDXL refiner 1.0</td>
          <td>6.1 GB</td>
          <td>Paired with base to refine details</td>
      </tr>
      <tr>
          <td>SD 1.5</td>
          <td>4.0 GB</td>
          <td>Smaller; the most mature ecosystem (lots of LoRAs)</td>
      </tr>
      <tr>
          <td>Flux.1 schnell</td>
          <td>12 GB</td>
          <td>Among the strongest open-source models in the SD class (2024+)</td>
      </tr>
      <tr>
          <td>Flux.1 dev</td>
          <td>24 GB</td>
          <td>The larger Flux variant; best quality in this list</td>
      </tr>
  </tbody>
</table>
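<p>At 6.5 GB, SDXL is the sweet spot: large enough to be a meaningful test, but not too large. For something smaller, pick SD 1.5 (4 GB). Running Flux takes 24 GB of disk plus 16 GB or more of unified memory.</p>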
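<p><code>ls -lh</code> only shows the size. You may also want to confirm that the download is a complete safetensors file. A truncated transfer, or an HTML error page saved under the <code>.safetensors</code> name, would fail this check. You only need to read the file's header to tell.</p>
<p>The safetensors format begins with an 8-byte little-endian header length. A JSON header follows, in which every tensor lists <code>data_offsets</code> into the data section that comes after it. The sketch below is a hypothetical helper using only the standard library. It checks that the file is at least as long as its own header says. Because it reads only the header, it is quick even on the 6.5 GB checkpoint.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import json
import os


def inspect_safetensors(path):
    """Read only the safetensors header and check the file is not truncated.

    Layout: 8-byte little-endian header length N, N bytes of JSON, then the
    tensor byte buffer. Each tensor's data_offsets are relative to that buffer.
    """
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        raw_len = f.read(8)
        if len(raw_len) != 8:
            raise ValueError("file is too short to be safetensors")
        header_len = int.from_bytes(raw_len, "little")
        if header_len + 8 > file_size:
            # An HTML error page decodes to an absurd length and lands here too.
            raise ValueError("header length exceeds file size (truncated or not safetensors)")
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    data_end = max((t["data_offsets"][1] for t in header.values()), default=0)
    expected_size = 8 + header_len + data_end
    if expected_size > file_size:
        raise ValueError(f"truncated: expected {expected_size} bytes, found {file_size}")
    return {"tensors": len(header), "expected_bytes": expected_size}


if __name__ == "__main__":
    import sys

    print(inspect_safetensors(sys.argv[1]))</code></pre></div>
<p>For example, save it as a hypothetical <code>inspect_safetensors.py</code> and run <code>python3 inspect_safetensors.py models/checkpoints/sd_xl_base_1.0.safetensors</code>. A <code>ValueError</code> means the file needs to be resumed or downloaded again. If it turns out not to be safetensors at all, delete it first.</p>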
<h2 id="啟動-comfyui-server">Start the ComfyUI server</h2>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/ComfyUI
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="nb">source</span> venv/bin/activate
</span></span><span class="line"><span class="ln">3</span><span class="cl">python main.py</span></span></code></pre></div><p>Expected output:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">[Prompt Server] Starting ComfyUI...
</span></span><span class="line"><span class="ln">2</span><span class="cl">Total VRAM 32768 MB, total RAM 32768 MB
</span></span><span class="line"><span class="ln">3</span><span class="cl">pytorch version: 2.x.x
</span></span><span class="line"><span class="ln">4</span><span class="cl">Set vram state to: SHARED
</span></span><span class="line"><span class="ln">5</span><span class="cl">Device: mps
</span></span><span class="line"><span class="ln">6</span><span class="cl">Using sub quadratic attention for cross-attention
</span></span><span class="line"><span class="ln">7</span><span class="cl">...
</span></span><span class="line"><span class="ln">8</span><span class="cl">Starting server
</span></span><span class="line"><span class="ln">9</span><span class="cl">To see the GUI go to: http://127.0.0.1:8188</span></span></code></pre></div><p>Seeing Apple Silicon's unified memory reported as VRAM is expected, not a bug. On the MPS backend, ComfyUI treats the whole unified memory pool as GPU-visible memory, so a 32 GB Mac shows <code>Total VRAM 32768 MB</code>. In practice, ComfyUI shares that same pool with your other apps and with macOS itself.</p>
<p>Key checks:</p>
<ul>
<li><code>Device: mps</code> → the Apple Silicon GPU is in use</li>
<li><code>Starting server</code> + <code>http://127.0.0.1:8188</code> → the server is up</li>
</ul>
<p>Open <code>http://127.0.0.1:8188</code> in a browser. If you see the node-based UI, it worked. On first launch it loads the default workflow, a simple text-to-image graph.</p>
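<p>When ComfyUI is launched from a script rather than by hand, it helps to wait until the HTTP server actually answers before sending it work. Below is a minimal standard-library sketch. It polls ComfyUI's <code>/system_stats</code> endpoint, which returns system and device information as JSON. The function name and timeouts are illustrative. The <code>.get</code> calls guard against field names that may differ between ComfyUI versions.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import json
import time
import urllib.request


def wait_for_comfyui(base_url="http://127.0.0.1:8188", timeout=60.0, interval=1.0):
    """Poll GET /system_stats until the server answers.

    Returns the parsed JSON on success, or None if nothing answered within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(base_url + "/system_stats", timeout=5) as resp:
                return json.load(resp)
        except OSError:
            pass  # refused / reset / timed out (URLError is an OSError): not ready yet
        if time.monotonic() + interval > deadline:
            return None
        time.sleep(interval)


if __name__ == "__main__":
    stats = wait_for_comfyui()
    if stats is None:
        raise SystemExit("ComfyUI did not answer in time")
    for device in stats.get("devices", []):
        print(device.get("name"), device.get("type"))</code></pre></div>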
<p>To expose the server beyond this machine, so that other machines on the LAN can connect:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python main.py --listen 0.0.0.0 --port <span class="m">8188</span></span></span></code></pre></div><p>As discussed in <a href="/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 Privacy data flow</a>, <code>0.0.0.0</code> exposes the server to the entire local network. That is fine at home, but be careful on public networks.</p>
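<p>This rule of thumb can be made mechanical. A loopback address (<code>127.0.0.1</code>, <code>::1</code>) is reachable only from the Mac itself. The unspecified address <code>0.0.0.0</code> means every network interface. Below is a small sketch using Python's standard <code>ipaddress</code> module; the helper name and labels are illustrative.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import ipaddress


def listen_exposure(host):
    """Describe who can reach a server bound to `host` (an IP literal or "localhost")."""
    if host == "localhost":
        return "this machine only"
    addr = ipaddress.ip_address(host)
    if addr.is_loopback:
        return "this machine only"
    if addr.is_unspecified:  # 0.0.0.0 or ::
        return "all interfaces: the LAN and any other attached network"
    if addr.is_private:
        return "one private interface, typically the LAN"
    return "one public interface, reachable from outside if routed"</code></pre></div>
<p>Without <code>--listen</code>, ComfyUI binds to <code>127.0.0.1</code>, which this classifies as this machine only. <code>listen_exposure("0.0.0.0")</code> returns the all-interfaces label, which is the case the privacy warning is about.</p>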
<h2 id="跑第一張圖">Generate the first image</h2>
<p>ComfyUI's default workflow is text-to-image:</p>
<ol>
<li><strong>Load Checkpoint node (CheckpointLoaderSimple)</strong>: select <code>sd_xl_base_1.0.safetensors</code>.</li>
<li><strong>CLIPTextEncode (Prompt) node</strong>: enter the prompt, for example <code>a photograph of a cat sitting on a wooden chair, natural lighting</code>.</li>
<li><strong>CLIPTextEncode (Negative) node</strong>: enter the negative prompt, for example <code>blurry, low quality, artifacts</code>.</li>
<li><strong>EmptyLatentImage node</strong>: set it to 1024×1024 (SDXL's native resolution).</li>
<li><strong>KSampler node</strong>: steps=20, cfg=7, sampler=<code>euler</code> or <code>dpmpp_2m</code>.</li>
<li><strong>VAEDecode node</strong>: converts the latent into an RGB image.</li>
<li><strong>SaveImage node</strong>: saves the result to <code>output/</code>.</li>
</ol>
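<p>Click <code>Queue Prompt</code> in the right-hand panel to start generating. Newer frontend versions may name and place this button differently.</p>
<p>Measured times (M4 Pro, 32 GB, SDXL base, 1024×1024, MPS backend):</p>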
<table>
  <thead>
      <tr>
          <th>Steps</th>
          <th>First image (incl. model load)</th>
          <th>Later images (same model)</th>
          <th>Notes</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>15</td>
          <td>~100-110 s</td>
          <td>~30-40 s</td>
          <td>Measured 106 s in this run (incl. load)</td>
      </tr>
      <tr>
          <td>20</td>
          <td>~130-150 s</td>
          <td>~40-60 s</td>
          <td>ComfyUI default</td>
      </tr>
      <tr>
          <td>30</td>
          <td>~200 s</td>
          <td>~80 s</td>
          <td>Higher quality, diminishing returns</td>
      </tr>
  </tbody>
</table>
<p>On a 16 GB Mac, SDXL takes 60 to 180 seconds per image, and the machine may throttle.</p>
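<p>When generation finishes, the PNG appears in <code>output/</code>. With the GUI's default prefix it is named like <code>ComfyUI_00001_.png</code>. The API script below uses the prefix <code>comfyui-test</code>, which gives <code>comfyui-test_00001_.png</code>.</p>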
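<p>SaveImage names its files with the prefix, an underscore, a zero-padded counter and a trailing underscore, which is why the names end in <code>_00001_.png</code>. To pick up the newest result from a script, a minimal sketch can parse that counter instead of relying on file modification times. <code>latest_output</code> is a hypothetical helper.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import re
from pathlib import Path


def latest_output(output_dir, prefix="ComfyUI"):
    """Return the Path of the highest-numbered PNG saved with `prefix`, or None."""
    pattern = re.compile(re.escape(prefix) + r"_(\d{5,})_\.png")
    best_counter, best_path = -1, None
    for path in Path(output_dir).glob(prefix + "_*_.png"):
        match = pattern.fullmatch(path.name)  # skip e.g. ComfyUI_foo_00001_.png
        if match and int(match.group(1)) > best_counter:
            best_counter, best_path = int(match.group(1)), path
    return best_path</code></pre></div>
<p>For example, run <code>latest_output("output", "comfyui-test")</code> from the ComfyUI directory to get the most recent image written by the API script.</p>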
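<h2 id="用-rest-api-直接生成不開瀏覽器">Generate directly via the REST API (no browser)</h2>
<p>The GUI suits interactive exploration; automation goes through the REST API. The full script is at <code>scripts/comfyui-test/generate.py</code>. With the ComfyUI server from the previous section still running, this is the command used for verification:</p>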





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/blog
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3 scripts/comfyui-test/generate.py --steps <span class="m">15</span></span></span></code></pre></div><p>How the script works:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">build_workflow</span><span class="p">(</span><span class="n">prompt_text</span><span class="p">,</span> <span class="n">neg_text</span><span class="p">,</span> <span class="n">steps</span><span class="p">):</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="k">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">        <span class="s2">&#34;3&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;seed&#34;</span><span class="p">:</span> <span class="mi">42</span><span class="p">,</span> <span class="s2">&#34;steps&#34;</span><span class="p">:</span> <span class="n">steps</span><span class="p">,</span> <span class="s2">&#34;cfg&#34;</span><span class="p">:</span> <span class="mf">7.0</span><span class="p">,</span> <span class="s2">&#34;sampler_name&#34;</span><span class="p">:</span> <span class="s2">&#34;euler&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">                         <span class="s2">&#34;scheduler&#34;</span><span class="p">:</span> <span class="s2">&#34;normal&#34;</span><span class="p">,</span> <span class="s2">&#34;denoise&#34;</span><span class="p">:</span> <span class="mf">1.0</span><span class="p">,</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">                         <span class="s2">&#34;model&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;4&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="s2">&#34;positive&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;6&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">                         <span class="s2">&#34;negative&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;7&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="s2">&#34;latent_image&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;5&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">]},</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;KSampler&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="s2">&#34;4&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;ckpt_name&#34;</span><span class="p">:</span> <span class="s2">&#34;sd_xl_base_1.0.safetensors&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;CheckpointLoaderSimple&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="s2">&#34;5&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;width&#34;</span><span class="p">:</span> <span class="mi">1024</span><span class="p">,</span> <span class="s2">&#34;height&#34;</span><span class="p">:</span> <span class="mi">1024</span><span class="p">,</span> <span class="s2">&#34;batch_size&#34;</span><span class="p">:</span> <span class="mi">1</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;EmptyLatentImage&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">        <span class="s2">&#34;6&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;text&#34;</span><span class="p">:</span> <span class="n">prompt_text</span><span class="p">,</span> <span class="s2">&#34;clip&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;4&#34;</span><span class="p">,</span> <span class="mi">1</span><span class="p">]},</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;CLIPTextEncode&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">        <span class="s2">&#34;7&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;text&#34;</span><span class="p">:</span> <span class="n">neg_text</span><span class="p">,</span> <span class="s2">&#34;clip&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;4&#34;</span><span class="p">,</span> <span class="mi">1</span><span class="p">]},</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;CLIPTextEncode&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">        <span class="s2">&#34;8&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;samples&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;3&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span> <span class="s2">&#34;vae&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;4&#34;</span><span class="p">,</span> <span class="mi">2</span><span class="p">]},</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;VAEDecode&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">        <span class="s2">&#34;9&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;inputs&#34;</span><span class="p">:</span> <span class="p">{</span><span class="s2">&#34;filename_prefix&#34;</span><span class="p">:</span> <span class="s2">&#34;comfyui-test&#34;</span><span class="p">,</span> <span class="s2">&#34;images&#34;</span><span class="p">:</span> <span class="p">[</span><span class="s2">&#34;8&#34;</span><span class="p">,</span> <span class="mi">0</span><span class="p">]},</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">              <span class="s2">&#34;class_type&#34;</span><span class="p">:</span> <span class="s2">&#34;SaveImage&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">    <span class="p">}</span></span></span></code></pre></div><p><strong>How the workflow JSON is structured</strong>:</p>
<ul>
<li><strong>Each key (&ldquo;3&rdquo;, &ldquo;4&rdquo;, …) is a node ID</strong>. Any integer string works, as long as it is unique within the workflow.</li>
<li><strong><code>class_type</code></strong>: the node type (KSampler, CheckpointLoaderSimple, CLIPTextEncode, and so on). Every type used here is built into ComfyUI.</li>
<li><strong><code>inputs</code></strong>: the node's parameters. Scalar values (such as <code>1024</code> or <code>&quot;euler&quot;</code>) are written directly. A connection to another node's output takes the form <code>[node_id, output_index]</code>.</li>
<li><strong><code>[&quot;4&quot;, 0]</code></strong> means &ldquo;output 0 of node 4&rdquo;. CheckpointLoaderSimple has three outputs: <code>model</code> (0), <code>clip</code> (1) and <code>vae</code> (2). So <code>[&quot;4&quot;, 0]</code> is the model, <code>[&quot;4&quot;, 1]</code> is clip and <code>[&quot;4&quot;, 2]</code> is vae.</li>
</ul>
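<p>Every connection is a <code>[node_id, output_index]</code> pair, so the workflow dict describes a directed graph. The nodes can run in any order where each node comes after the nodes it reads from, which is a topological order. The sketch below recovers the edges from <code>inputs</code> and orders the nodes with the standard library's <code>graphlib</code> (Python 3.9+). It only illustrates the dependency structure; ComfyUI's executor has its own scheduling and caching.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">from graphlib import TopologicalSorter


def is_link(value):
    """A link is a two-item list: [source node id (str), output index (int)]."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and isinstance(value[1], int)
    )


def execution_order(workflow):
    """Return node ids so that every node comes after the nodes it reads from."""
    graph = {}
    for node_id, node in workflow.items():
        # Predecessors are the source nodes of all link-valued inputs.
        graph[node_id] = {v[0] for v in node["inputs"].values() if is_link(v)}
    return list(TopologicalSorter(graph).static_order())</code></pre></div>
<p>Applied to the dict returned by <code>build_workflow</code>, the result starts with <code>"4"</code> or <code>"5"</code>, the two nodes with no incoming links. It ends with <code>"9"</code> (SaveImage), the only node nothing else reads from.</p>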
<p><strong>What each node does</strong>:</p>
<ul>
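<li><strong>4 CheckpointLoaderSimple</strong>: loads the SDXL safetensors file and outputs three things: model, clip and vae. Most of the graph hangs off this node; EmptyLatentImage is the only other node with no incoming links.</li>
<li><strong>5 EmptyLatentImage</strong>: creates an empty latent tensor for a 1024×1024 image. This is not an RGB image but a 4-channel latent-space tensor, 128×128 because the VAE works at 1/8 resolution. It is SDXL's &ldquo;canvas&rdquo;.</li>
<li><strong>6 CLIPTextEncode (positive)</strong>: turns the prompt text into a conditioning vector with the CLIP text encoder.</li>
<li><strong>7 CLIPTextEncode (negative)</strong>: the same, but for the negative prompt (the features to avoid).</li>
<li><strong>3 KSampler</strong>: the core denoising loop. Over the configured number of steps (15 to 30 in this article), it turns the latent from noise into a latent that matches the conditioning.</li>
<li><strong>8 VAEDecode</strong>: decodes the latent into an RGB image (1024×1024×3) with the VAE.</li>
<li><strong>9 SaveImage</strong>: writes a PNG into the <code>output/</code> directory with the filename prefix <code>comfyui-test</code>.</li>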
</ul>
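<p>The latent &ldquo;canvas&rdquo; is much smaller than the image, because the SD/SDXL VAE downsamples width and height by 8 and uses 4 channels. Below is a small arithmetic sketch. The helper names are illustrative. The factor of 8 and the 4 channels hold for SD 1.5 and SDXL but may differ for other model families.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">def sdxl_latent_shape(width, height, batch_size=1, channels=4, factor=8):
    """(batch, channels, height, width) of the latent EmptyLatentImage creates for SD / SDXL."""
    if width % factor or height % factor:
        raise ValueError(f"width and height must be multiples of {factor}")
    return (batch_size, channels, height // factor, width // factor)


def value_counts(width, height):
    """How many numbers the latent holds compared with the decoded RGB image."""
    n, c, h, w = sdxl_latent_shape(width, height)
    return {"latent": n * c * h * w, "rgb": width * height * 3}


if __name__ == "__main__":
    print(sdxl_latent_shape(1024, 1024))  # (1, 4, 128, 128)
    print(value_counts(1024, 1024))       # {'latent': 65536, 'rgb': 3145728}</code></pre></div>
<p>For 1024×1024 the latent holds 65,536 values, against 3,145,728 in the decoded RGB image: 48 times fewer. This is why latent diffusion models like SDXL run the denoising loop in latent space rather than on pixels.</p>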
<p><strong>Why the graph is built this way</strong>:</p>
<ul>
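<li><strong>Why model / clip / vae all come from the same checkpoint</strong>: the SDXL UNet is trained to work with that checkpoint's text encoders and with the latent space of its VAE, so the three have to match. Taking them from unrelated checkpoints degrades or breaks the output.</li>
<li><strong>Why EmptyLatentImage sets <code>batch_size</code> explicitly</strong>: it keeps the batch dimension visible in the graph. To generate a batch later, say 4 images at once, change it to <code>batch_size: 4</code>; no other node needs to change.</li>
<li><strong>Why sampler <code>euler</code> with scheduler <code>normal</code></strong>: it is the simplest combination, and its quality on SDXL base is predictable. Other options (<code>dpmpp_2m</code>, the <code>karras</code> scheduler, and so on) may give better quality, but the effect varies from model to model.</li>
<li><strong>Why cfg=7.0</strong>: cfg is the classifier-free guidance scale, which sets how strongly sampling follows the prompt. 7 is a common choice for SDXL; ComfyUI's KSampler defaults to 8. As a rough guide, with a value that is too low (&lt; 3) the model largely ignores the prompt, and with one that is too high (&gt; 12) images come out oversaturated.</li>
<li><strong>Why seed=42</strong>: a fixed seed makes results reproducible. The same prompt, seed, model and sampler settings on the same setup give the same image. That is a prerequisite for tuning prompts or comparing models.</li>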
</ul>
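<p>The batch_size and seed points above come down to editing values in the workflow dict before it is submitted. Below is a minimal sketch, assuming the dict shape returned by <code>build_workflow</code>. <code>with_overrides</code> is a hypothetical helper. It finds nodes by <code>class_type</code> rather than hard-coded IDs, and it deep-copies the dict so the original stays reusable.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python">import copy


def with_overrides(workflow, seed=None, steps=None, batch_size=None):
    """Return a modified deep copy of an API-format workflow; the input is left untouched."""
    wf = copy.deepcopy(workflow)
    for node in wf.values():
        kind, inputs = node["class_type"], node["inputs"]
        if kind == "KSampler":
            if seed is not None:
                inputs["seed"] = seed
            if steps is not None:
                inputs["steps"] = steps
        elif kind == "EmptyLatentImage" and batch_size is not None:
            inputs["batch_size"] = batch_size
    return wf</code></pre></div>
<p>For example, <code>[with_overrides(wf, seed=s) for s in (1, 2, 3)]</code> produces three workflows that differ only in seed. Their images can be compared side by side, and each one can be regenerated later.</p>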





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">    <span class="n">workflow</span> <span class="o">=</span> <span class="n">build_workflow</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">prompt</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">neg</span><span class="p">,</span> <span class="n">args</span><span class="o">.</span><span class="n">steps</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">    <span class="n">client_id</span> <span class="o">=</span> <span class="nb">str</span><span class="p">(</span><span class="n">uuid</span><span class="o">.</span><span class="n">uuid4</span><span class="p">())</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">    <span class="n">resp</span> <span class="o">=</span> <span class="n">http_post_json</span><span class="p">(</span><span class="s2">&#34;/prompt&#34;</span><span class="p">,</span> <span class="p">{</span><span class="s2">&#34;prompt&#34;</span><span class="p">:</span> <span class="n">workflow</span><span class="p">,</span> <span class="s2">&#34;client_id&#34;</span><span class="p">:</span> <span class="n">client_id</span><span class="p">})</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">    <span class="n">prompt_id</span> <span class="o">=</span> <span class="n">resp</span><span class="p">[</span><span class="s2">&#34;prompt_id&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">        <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">        <span class="n">history</span> <span class="o">=</span> <span class="n">http_get_json</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;/history/</span><span class="si">{</span><span class="n">prompt_id</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">        <span class="k">if</span> <span class="n">prompt_id</span> <span class="ow">in</span> <span class="n">history</span><span class="p">:</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">            <span class="n">outputs</span> <span class="o">=</span> <span class="n">history</span><span class="p">[</span><span class="n">prompt_id</span><span class="p">]</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;outputs&#34;</span><span class="p">,</span> <span class="p">{})</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">            <span class="k">break</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">
</span></span><span class="line"><span class="ln">14</span><span class="cl">    <span class="n">img</span> <span class="o">=</span> <span class="n">outputs</span><span class="p">[</span><span class="s2">&#34;9&#34;</span><span class="p">][</span><span class="s2">&#34;images&#34;</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
</span></span><span class="line"><span class="ln">15</span><span class="cl">    <span class="n">qs</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">parse</span><span class="o">.</span><span class="n">urlencode</span><span class="p">({</span><span class="s2">&#34;filename&#34;</span><span class="p">:</span> <span class="n">img</span><span class="p">[</span><span class="s2">&#34;filename&#34;</span><span class="p">],</span> <span class="s2">&#34;type&#34;</span><span class="p">:</span> <span class="s2">&#34;output&#34;</span><span class="p">})</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">    <span class="n">blob</span> <span class="o">=</span> <span class="n">http_get_bytes</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;/view?</span><span class="si">{</span><span class="n">qs</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    <span class="n">Path</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">out</span><span class="p">)</span><span class="o">.</span><span class="n">write_bytes</span><span class="p">(</span><span class="n">blob</span><span class="p">)</span></span></span></code></pre></div><p><strong>What each part does</strong>:</p>
<ol>
<li><strong><code>client_id = str(uuid.uuid4())</code></strong>: a per-client identifier. ComfyUI uses client_id to route progress events to the right WebSocket subscriber. This demo polls instead, so any randomly generated client_id will do.</li>
<li><strong><code>POST /prompt</code></strong>: sends the workflow plus client_id, and the server returns a <code>prompt_id</code> (this job's UUID). The server drops the workflow into its internal queue and returns immediately, without waiting for generation.</li>
<li><strong><code>while True: time.sleep(2); GET /history/{prompt_id}</code></strong>: polls until the job finishes. Only finished jobs appear in <code>/history</code>; running or queued jobs do not.</li>
<li><strong><code>if prompt_id in history</code></strong>: the completion check. Once the prompt_id shows up in history, generation is over.</li>
<li><strong><code>outputs[&quot;9&quot;][&quot;images&quot;][0]</code></strong>: the output of node 9 (SaveImage), including <code>filename</code>, <code>subfolder</code>, <code>type</code> and so on.</li>
<li><strong><code>/view?filename=...&amp;type=output</code></strong>: fetches the generated PNG bytes. <code>type=output</code> is ComfyUI's internal directory marker (it distinguishes output / input / temp).</li>
</ol>
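<p>The script hardcodes node <code>"9"</code> and leaves <code>subfolder</code> out of the <code>/view</code> query. That works only while SaveImage is node 9 and writes straight into <code>output/</code>. The following is a minimal sketch that collects every image from every output node instead. It assumes the output shape described in the list above: an <code>images</code> list whose entries carry <code>filename</code>, <code>subfolder</code> and <code>type</code>. <code>view_paths</code> is a hypothetical helper, not part of ComfyUI.</p>

```python
import urllib.parse


def view_paths(outputs: dict) -> list[str]:
    """Build a /view query path for every image in a /history `outputs` dict.

    Assumes each output node may hold {"images": [{"filename", "subfolder", "type"}]};
    nodes without an "images" key (e.g. text outputs) are skipped.
    """
    paths = []
    for node_id in sorted(outputs, key=str):  # deterministic order
        for img in outputs[node_id].get("images", []):
            qs = urllib.parse.urlencode({
                "filename": img["filename"],
                "subfolder": img.get("subfolder", ""),  # needed if the prefix had a folder
                "type": img.get("type", "output"),
            })
            paths.append(f"/view?{qs}")
    return paths


example = {"9": {"images": [{"filename": "ComfyUI_00001_.png", "subfolder": "", "type": "output"}]}}
print(view_paths(example))
```

Each returned path can then be fetched with the script's <code>http_get_bytes</code> in place of the single hardcoded request.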
<p><strong>Why it is designed this way</strong>:</p>
<ul>
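<li><strong>Why polling instead of WebSocket</strong>: WebSocket means subscribing to events and handling the connection lifecycle, which adds logic. Polling takes two lines and is enough for a teaching demo. For production automation, prefer WebSocket so you see every progress event.</li>
<li><strong>Why <code>time.sleep(2)</code></strong>: too short (&lt; 1 s) adds needless polling load on the server; too long (&gt; 5 s) makes the delay noticeable. Two seconds is a demo-friendly balance.</li>
<li><strong>Why query history by prompt_id rather than client_id</strong>: one client can submit several jobs, and prompt_id uniquely identifies a job. client_id is mainly for WebSocket subscriptions; it is not the key for history queries.</li>
<li><strong>Why <code>Path(args.out).write_bytes(blob)</code></strong>: PNG is binary, so <code>write_bytes</code> writes the bytes as they are. A text-mode <code>open(...).write()</code> accepts only <code>str</code> and raises <code>TypeError</code> on bytes, and decoding the bytes to text to get around that would corrupt the file.</li>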
</ul>
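<p>As written, the loop never gives up. If the job never shows up in <code>/history</code>, the script polls forever. The following is a minimal sketch that adds a deadline and a failure check, under two assumptions:</p>
<ul>
<li><code>fetch_history</code> stands in for the script's <code>http_get_json</code>.</li>
<li>A failed run is reported as <code>status.status_str == "error"</code> in its history entry. Verify this field against your ComfyUI version.</li>
</ul>

```python
import time
from typing import Callable


def wait_for_outputs(prompt_id: str,
                     fetch_history: Callable[[str], dict],
                     interval: float = 2.0,
                     timeout: float = 600.0) -> dict:
    """Poll /history/{prompt_id} until the job appears, then return its outputs.

    Raises TimeoutError after `timeout` seconds without a history entry, and
    RuntimeError if the entry reports an error (assumed `status.status_str` field).
    """
    deadline = time.monotonic() + timeout
    while True:
        history = fetch_history(f"/history/{prompt_id}")
        if prompt_id in history:
            entry = history[prompt_id]
            if entry.get("status", {}).get("status_str") == "error":
                raise RuntimeError(f"prompt {prompt_id} finished with an error")
            return entry.get("outputs", {})
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prompt {prompt_id} not in /history after {timeout}s")
        time.sleep(interval)


# Illustration only: a fake history that reports the job as done on the third poll.
calls = {"n": 0}


def fake_fetch(path: str) -> dict:
    calls["n"] += 1
    return {"abc": {"outputs": {"9": {"images": []}}}} if calls["n"] >= 3 else {}


print(wait_for_outputs("abc", fake_fetch, interval=0.01, timeout=5))
```

In the script itself this would be <code>outputs = wait_for_outputs(prompt_id, http_get_json)</code>. The first request goes out immediately and the sleep happens between polls, so a job that is already finished costs no extra wait.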
<p><strong>Measured</strong>: M4 Pro 32GB, prompt "a photograph of an orange cat sitting on a wooden chair, soft natural lighting, detailed fur", 15 steps, cfg=7, euler sampler with the normal scheduler, seed=42 → a 1024×1024 PNG (1.65 MB) in 106 seconds.</p>
<h2 id="comfyui-的-rest-api-形狀無-openai-相容層">ComfyUI's REST API shape (no OpenAI-compatible layer)</h2>
<p>ComfyUI does not offer an OpenAI-compatible API. Its API is its own REST + WebSocket design:</p>
<ul>
<li><code>POST /prompt</code>: submit a workflow JSON; returns a job id.</li>
<li><code>GET /history/{prompt_id}</code>: look up the generation result.</li>
<li><code>GET /view?filename=X</code>: fetch the generated image.</li>
<li>WebSocket: subscribe to job progress events.</li>
</ul>
<p>The API shape fits the diffusion task and is entirely different from the LLM <code>/chat/completions</code>. It is a concrete case of the point made in the <a href="/blog/llm/04-applications/rag-principles/" data-link-title="4.1 RAG 原理：retrieval &#43; augmentation 模式" data-link-desc="為什麼模型需要外掛知識、語意相似 vs 字面相似、chunking 的本質取捨、retrieval 失敗的根本原因">4.1 RAG chapter</a> that Diffusion and Transformer toolchains are not interchangeable. Connecting Ollama / LM Studio to Continue.dev through the OpenAI-compatible path and connecting ComfyUI to SDXL are two fully parallel routes.</p>
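<p>The script's <code>http_post_json</code>, <code>http_get_json</code> and <code>http_get_bytes</code> helpers are defined outside the excerpt above. The following sketch shows one way to write them with only the standard library. It assumes ComfyUI listens on its default <code>http://127.0.0.1:8188</code>. The names mirror the script, but the bodies are illustrative, not the author's code. <code>build_post_json</code> is split out so a request can be inspected without a running server.</p>

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8188"  # assumption: ComfyUI's default host and port


def build_post_json(path: str, payload: dict, base: str = BASE_URL) -> urllib.request.Request:
    """Build, without sending, a JSON POST such as POST /prompt."""
    return urllib.request.Request(
        base + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def http_post_json(path: str, payload: dict, timeout: float = 30.0) -> dict:
    with urllib.request.urlopen(build_post_json(path, payload), timeout=timeout) as resp:
        return json.load(resp)


def http_get_json(path: str, timeout: float = 30.0) -> dict:
    with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
        return json.load(resp)


def http_get_bytes(path: str, timeout: float = 60.0) -> bytes:
    """Raw bytes, e.g. the PNG returned by GET /view."""
    with urllib.request.urlopen(BASE_URL + path, timeout=timeout) as resp:
        return resp.read()


req = build_post_json("/prompt", {"prompt": {}, "client_id": "demo"})
print(req.get_method(), req.full_url)
```

Nothing here speaks the OpenAI wire format. The three helpers map one-to-one onto <code>/prompt</code>, <code>/history</code> and <code>/view</code>.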
<h2 id="常用-custom-nodes">Common custom nodes</h2>
<p>Much of ComfyUI's power comes from community-maintained custom nodes. The most commonly used:</p>
<table>
  <thead>
      <tr>
          <th>Custom Node</th>
          <th>Function</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>ComfyUI-Manager</td>
          <td>Manage other custom nodes: install / update</td>
      </tr>
      <tr>
          <td>ComfyUI-Impact-Pack</td>
          <td>Object detection, masking, inpainting</td>
      </tr>
      <tr>
          <td>ComfyUI-AnimateDiff</td>
          <td>Animation / video generation</td>
      </tr>
      <tr>
          <td>ComfyUI-ControlNet-Aux</td>
          <td>ControlNet preprocessor</td>
      </tr>
      <tr>
          <td>ComfyUI-IPAdapter-plus</td>
          <td>Reference-image embedding</td>
      </tr>
  </tbody>
</table>
<p>Installation (via ComfyUI-Manager):</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="nb">cd</span> ~/Projects/ComfyUI/custom_nodes
</span></span><span class="line"><span class="ln">2</span><span class="cl">git clone https://github.com/ltdrdata/ComfyUI-Manager.git
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="c1"># Restart ComfyUI; the UI gains a Manager button, then install other nodes through Manager</span></span></span></code></pre></div><h2 id="常見坑">Common pitfalls</h2>
<h3 id="python-版本太新torch-沒-wheel">Python too new, no torch wheel</h3>
<p>PyTorch wheels for the newest Python releases (3.13, 3.14) can lag behind. When they do, <code>pip install -r requirements.txt</code> may fall back to building from source, which is slow and can fail. Drop back to Python 3.11 / 3.12:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">brew install python@3.11
</span></span><span class="line"><span class="ln">2</span><span class="cl">python3.11 -m venv venv
</span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="nb">source</span> venv/bin/activate
</span></span><span class="line"><span class="ln">4</span><span class="cl">pip install -r requirements.txt</span></span></code></pre></div><h3 id="mps-false跑在-cpu-上"><code>mps: False</code>, running on CPU</h3>
<p>Confirm that the Python interpreter, and therefore the PyTorch wheels pip installs for it, is the native Apple Silicon build rather than x86_64 under emulation:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">python -c <span class="s2">&#34;import platform; print(platform.machine())&#34;</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl"><span class="c1"># arm64 ← correct; x86_64 ← running under Rosetta, reinstall</span></span></span></code></pre></div><p>If it prints x86_64, the venv was built with an Intel (x86_64) Python. Rebuild the venv:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">deactivate
</span></span><span class="line"><span class="ln">2</span><span class="cl">rm -rf venv
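</span></span><span class="line"><span class="ln">3</span><span class="cl">arch -arm64 python3 -m venv venv
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="nb">source</span> venv/bin/activate
</span></span><span class="line"><span class="ln">5</span><span class="cl">pip install -r requirements.txt</span></span></code></pre></div><h3 id="記憶體不夠推論時-crash">Out of memory, crash during inference</h3>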
<p>SDXL is tight on a 16 GB Mac and may swap or crash. Mitigations:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># VRAM modes, from most to least memory</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">python main.py --normalvram   <span class="c1"># default, ~12 GB</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">python main.py --lowvram      <span class="c1"># leaner, ~8 GB, slower</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">python main.py --novram       <span class="c1"># leanest, ~4 GB, very slow; the practical limit</span></span></span></code></pre></div><p>Or switch to SD 1.5 (~4 GB checkpoint), which needs less than half the memory of SDXL.</p>
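<h3 id="workflow-json-載入失敗">Workflow JSON fails to load</h3>
<p>A ComfyUI workflow is JSON describing nodes and their connections. A workflow someone else shared may use custom nodes you have not installed. The error message lists the missing nodes; install them with ComfyUI-Manager.</p>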
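<p>To see what a shared workflow needs before loading it, you can list its node types and diff them against your install. The following is a minimal sketch that assumes two JSON shapes:</p>
<ul>
<li>The API format that <code>POST /prompt</code> takes: <code>{node_id: {"class_type": ...}}</code>.</li>
<li>The UI-saved format: a <code>"nodes"</code> list whose entries carry <code>"type"</code>.</li>
</ul>
<p>The installed set is assumed to come from the keys of ComfyUI's <code>GET /object_info</code> response. Frontend-only nodes such as notes or reroutes may show up as false positives. <code>required_node_types</code> and <code>missing_node_types</code> are hypothetical helpers.</p>

```python
def required_node_types(workflow: dict) -> set[str]:
    """Node class names used by a workflow, in either API or UI-saved format."""
    nodes = workflow.get("nodes")
    if isinstance(nodes, list):  # UI-saved format: {"nodes": [{"type": ...}, ...], ...}
        return {n["type"] for n in nodes if isinstance(n, dict) and "type" in n}
    # API format: {"3": {"class_type": "KSampler", "inputs": {...}}, ...}
    return {n["class_type"] for n in workflow.values()
            if isinstance(n, dict) and "class_type" in n}


def missing_node_types(workflow: dict, installed: set[str]) -> list[str]:
    """Sorted node types the workflow needs but `installed` does not contain."""
    return sorted(required_node_types(workflow) - installed)


# Illustration: `installed` would normally be the keys of the GET /object_info response.
installed = {"CheckpointLoaderSimple", "KSampler", "SaveImage"}
api_workflow = {
    "4": {"class_type": "CheckpointLoaderSimple", "inputs": {}},
    "3": {"class_type": "KSampler", "inputs": {}},
    "12": {"class_type": "HypotheticalCustomNode", "inputs": {}},
}
print(missing_node_types(api_workflow, installed))
```

The names this prints are the ones to search for in ComfyUI-Manager.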
<h3 id="port-8188-被佔">Port 8188 already in use</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl">lsof -i :8188
</span></span><span class="line"><span class="ln">2</span><span class="cl">python main.py --port <span class="m">8189</span>  <span class="c1"># use a different port</span></span></span></code></pre></div><h2 id="跟-llm-stack-並存">Running alongside the LLM stack</h2>
<p>ComfyUI uses port 8188, which does not collide with Ollama (11434) or LM Studio (1234), so they can all run at the same time. A practical setup:</p>
<table>
  <thead>
      <tr>
          <th>Service</th>
          <th>Port</th>
          <th>Purpose</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Ollama</td>
          <td>11434</td>
          <td>Writing code, chat</td>
      </tr>
      <tr>
          <td>ComfyUI</td>
          <td>8188</td>
          <td>Image generation</td>
      </tr>
      <tr>
          <td>LM Studio</td>
          <td>1234</td>
          <td>Trying out new LLMs</td>
      </tr>
      <tr>
          <td>Open WebUI</td>
          <td>3000</td>
          <td>ChatGPT-style browser UI</td>
      </tr>
  </tbody>
</table>
<p>Each service is independent and does not interfere with the others, so one Mac can run all of them if the memory budget allows.</p>
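<p>To check which of these services are actually up, probing the ports in the table is enough. The following is a minimal standard-library sketch. A successful TCP connect only shows that some program is listening on the port, not that it is the service named in the table; <code>lsof -i :PORT</code> shows the owning process. <code>next_free_port</code> is a hypothetical helper for picking an alternative to 8188 to pass to <code>python main.py --port</code>.</p>

```python
import socket

# Ports from the table above. A listener on a port does not prove which program owns it.
SERVICES = {"Ollama": 11434, "ComfyUI": 8188, "LM Studio": 1234, "Open WebUI": 3000}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.3) -> bool:
    """True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0


def next_free_port(start: int, host: str = "127.0.0.1", limit: int = 20) -> int:
    """First port from `start` upward that can be bound."""
    for port in range(start, min(start + limit, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind((host, port))
            except OSError:
                continue  # already taken
            return port
    raise RuntimeError(f"no bindable port in {start}..{start + limit - 1}")


def report() -> None:
    for name, port in SERVICES.items():
        state = "listening" if is_listening(port) else "not listening"
        print(name.ljust(11), str(port).rjust(5), state)


if __name__ == "__main__":
    report()
```

A port can be free when checked and taken a moment later, so treat <code>next_free_port</code> as a suggestion, not a reservation.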
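<h2 id="何時這篇會過時">When this article goes out of date</h2>
<ul>
<li>The ComfyUI main-branch API should stay stable in the near term, since a lot of the community depends on it.</li>
<li>SDXL base 1.0 will not disappear, but newer models (such as Flux) will supersede it. The "download a .safetensors file into models/checkpoints/" flow stays the same.</li>
<li>The MPS backend keeps getting optimized, so performance will improve while the interface stays the same.</li>
<li>Python version compatibility keeps shifting, so <code>pip install -r requirements.txt</code> will occasionally need an older Python.</li>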
</ul>
<p>If pip install fails when you read this, check ComfyUI's GitHub issues and the PyTorch release notes for the supported Python versions.</p>
<p>Related hands-on material: the full series is listed in the <a href="/blog/llm/01-local-llm-services/hands-on/" data-link-title="Hands-on：本地 AI 工具實作筆記" data-link-desc="Ollama / ComfyUI / Whisper / Piper TTS：實際安裝、驗證、跑通的紀錄。隨工具版本演化、跟 1.x 原理章節互補。">Hands-on index</a>. Cross-service lifecycle and memory management is covered in <a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">Resource management</a>. Memory budget planning for running ComfyUI next to Ollama is in <a href="/blog/llm/00-foundations/hardware-memory-budget/" data-link-title="0.5 Apple Silicon 記憶體預算" data-link-desc="記憶體決定能跑什麼，Q4 量化下的可運作模型對照與系統保留">0.5 Apple Silicon memory budget</a>.</p>
]]></content:encoded></item><item><title>Hands-on: resource management while an LLM runs and after it stops</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/resource-management/</link><pubDate>Tue, 12 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/resource-management/</guid><description>&lt;p>The core invariant of running LLMs locally differs from the cloud: &lt;strong>the Mac is a shared resource, not a dedicated GPU&lt;/strong>. A cloud inference server runs in a dedicated container, and shutting the instance down reclaims every resource automatically. A local &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/inference-server/" data-link-title="Inference Server" data-link-desc="載入模型權重、處理 prompt、產生 token 的常駐 process">inference server&lt;/a> runs on the Mac you use every day and shares one pool of &lt;a href="https://tarrragon.github.io/blog/llm/knowledge-cards/unified-memory/" data-link-title="Unified Memory Architecture" data-link-desc="Apple Silicon 讓 CPU / GPU / NE 共用同一塊記憶體：跑大模型的優勢來源">unified memory&lt;/a> with everything else. If you forget to manage it, it silently eats RAM, disk and ports until the system slows down or even starts swapping.&lt;/p>
&lt;p>This article records the tools for observing three dimensions (RAM / disk / port) and the ways to release each one. It compares the two typical lifecycles of Ollama and ComfyUI and adds measured release numbers. It applies the "audit every hop" mindset from &lt;a href="https://tarrragon.github.io/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 privacy data-flow principles&lt;/a>: resource management is a hop-level audit too, not "install and forget".&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>Verified on&lt;/strong>: 2026-05-12
&lt;strong>Environment&lt;/strong>: macOS 14, Apple Silicon, Ollama 0.23.2, ComfyUI 0.21.0, SDXL base 1.0&lt;/p>&lt;/blockquote>
&lt;h2 id="為什麼這事重要">Why this matters&lt;/h2>
&lt;p>Cloud inference:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">Container start → load model → serve requests → container stop → 所有 RAM / 磁碟 / port 自動回收&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>本地 inference：&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="ln">1&lt;/span>&lt;span class="cl">brew services start → load model on demand → serve → ??? → 你忘記 stop
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">2&lt;/span>&lt;span class="cl"> → RAM / 磁碟一直被佔
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="ln">3&lt;/span>&lt;span class="cl"> → 下次重開機才釋放&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>具體會踩到的問題：&lt;/p>
&lt;ul>
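&lt;li>&lt;strong>RAM&lt;/strong>: once loaded, the SDXL model (a 6.5 GB checkpoint) is not unloaded automatically; even while ComfyUI is idle, the Python process keeps holding the RAM&lt;/li>
&lt;li>&lt;strong>Disk&lt;/strong>: &lt;code>ollama pull&lt;/code> downloads pile up; &lt;code>~/.ollama/models/blobs&lt;/code> can grow to 50 GB+ in six months and never shrinks unless you clean it&lt;/li>
&lt;li>&lt;strong>Port&lt;/strong>: an &lt;code>ollama serve&lt;/code> process left over from a crash was not cleaned up and still holds port 11434, so the next start fails with "address already in use"&lt;/li>
&lt;li>&lt;strong>GPU / Metal&lt;/strong>: a loaded model keeps its Metal context and competes with other GPU-heavy apps (video editing, games)&lt;/li>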
&lt;/ul>
&lt;h2 id="三個-dimension--觀察工具">三個 dimension + 觀察工具&lt;/h2>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Dimension&lt;/th>
 &lt;th>Command&lt;/th>
 &lt;th>What to look at&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>RAM&lt;/td>
 &lt;td>&lt;code>vm_stat | head -5&lt;/code>&lt;/td>
 &lt;td>Pages free (16 KB per page); the more free, the better&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>RAM (per process)&lt;/td>
 &lt;td>Activity Monitor or &lt;code>ps aux | sort -k6 -rn | head&lt;/code>&lt;/td>
 &lt;td>Which process uses the most memory&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Disk&lt;/td>
 &lt;td>&lt;code>df -h ~ | tail -1&lt;/code>&lt;/td>
 &lt;td>Free space on the system volume&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Disk (per directory)&lt;/td>
 &lt;td>&lt;code>du -sh ~/.ollama/models/blobs&lt;/code>&lt;/td>
 &lt;td>How much the LLM models have accumulated&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Port&lt;/td>
 &lt;td>&lt;code>lsof -i :11434&lt;/code>&lt;/td>
 &lt;td>Who is listening on that port&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Process&lt;/td>
 &lt;td>&lt;code>ps aux | grep -i ollama | grep -v grep&lt;/code>&lt;/td>
 &lt;td>Which Ollama / ComfyUI / Python processes are running&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Ollama loaded models&lt;/td>
 &lt;td>&lt;code>ollama ps&lt;/code>&lt;/td>
 &lt;td>Which models are in RAM, their size, the idle timer&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
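&lt;p>Measured: right after killing ComfyUI (SDXL + Python venv), &lt;code>vm_stat&lt;/code> showed free pages going from 619K to 1090K (16 KB per page), roughly &lt;strong>+7.5 GB of RAM released&lt;/strong>. That is how much memory the SDXL + ComfyUI process had been holding.&lt;/p></description><content:encoded><![CDATA[<p>The core invariant of running LLMs locally differs from the cloud: <strong>the Mac is a shared resource, not a dedicated GPU</strong>. A cloud inference server runs in a dedicated container, and shutting the instance down reclaims every resource automatically. A local <a href="/blog/llm/knowledge-cards/inference-server/" data-link-title="Inference Server" data-link-desc="載入模型權重、處理 prompt、產生 token 的常駐 process">inference server</a> runs on the Mac you use every day and shares one pool of <a href="/blog/llm/knowledge-cards/unified-memory/" data-link-title="Unified Memory Architecture" data-link-desc="Apple Silicon 讓 CPU / GPU / NE 共用同一塊記憶體：跑大模型的優勢來源">unified memory</a> with everything else. If you forget to manage it, it silently eats RAM, disk and ports until the system slows down or even starts swapping.</p>
<p>This article records the tools for observing three dimensions (RAM / disk / port) and the ways to release each one. It compares the two typical lifecycles of Ollama and ComfyUI and adds measured release numbers. It applies the "audit every hop" mindset from <a href="/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 privacy data-flow principles</a>: resource management is a hop-level audit too, not "install and forget".</p>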
<blockquote>
<p><strong>Verified on</strong>: 2026-05-12
<strong>Environment</strong>: macOS 14, Apple Silicon, Ollama 0.23.2, ComfyUI 0.21.0, SDXL base 1.0</p></blockquote>
<h2 id="為什麼這事重要">Why this matters</h2>
<p>Cloud inference:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">Container start → load model → serve requests → container stop → all RAM / disk / ports reclaimed automatically</span></span></code></pre></div><p>Local inference:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">brew services start → load model on demand → serve → ??? → you forget to stop
</span></span><span class="line"><span class="ln">2</span><span class="cl">                                              → RAM / disk stay occupied
</span></span><span class="line"><span class="ln">3</span><span class="cl">                                              → released only at the next reboot</span></span></code></pre></div><p>Problems you will actually run into:</p>
<ul>
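<li><strong>RAM</strong>: once loaded, the SDXL model (a 6.5 GB checkpoint) is not unloaded automatically; even while ComfyUI is idle, the Python process keeps holding the RAM</li>
<li><strong>Disk</strong>: <code>ollama pull</code> downloads pile up; <code>~/.ollama/models/blobs</code> can grow to 50 GB+ in six months and never shrinks unless you clean it</li>
<li><strong>Port</strong>: an <code>ollama serve</code> process left over from a crash was not cleaned up and still holds port 11434, so the next start fails with "address already in use"</li>
<li><strong>GPU / Metal</strong>: a loaded model keeps its Metal context and competes with other GPU-heavy apps (video editing, games)</li>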
</ul>
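<p>The disk item above is easy to track from a script, for example to log how the blobs directory grows over time. The following is a minimal sketch. The path is Ollama's default model location and differs if you set <code>OLLAMA_MODELS</code>. The total is the apparent file size, which can differ slightly from what <code>du</code> reports in disk blocks. <code>dir_size_bytes</code> and <code>human</code> are hypothetical helpers.</p>

```python
import os
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Total apparent size of regular files under `root`; symlinks are not followed."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue
            try:
                total += path.stat().st_size
            except FileNotFoundError:
                continue  # removed while we were walking
    return total


def human(n: float) -> str:
    """Binary units, so 1 GB here means 1024**3 bytes."""
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TB"


blobs = Path.home() / ".ollama" / "models" / "blobs"  # default location (assumption)
if blobs.is_dir():
    print(blobs, human(dir_size_bytes(blobs)))
```

Running it on a schedule and appending the result to a log shows the growth without having to remember to run <code>du</code>.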
<h2 id="三個-dimension--觀察工具">Three dimensions + observation tools</h2>
<table>
  <thead>
      <tr>
          <th>Dimension</th>
          <th>Command</th>
          <th>What to look at</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>RAM</td>
          <td><code>vm_stat | head -5</code></td>
          <td>Pages free (16 KB per page); the more free, the better</td>
      </tr>
      <tr>
          <td>RAM (per process)</td>
          <td>Activity Monitor or <code>ps aux | sort -k6 -rn | head</code></td>
          <td>Which process uses the most memory</td>
      </tr>
      <tr>
          <td>Disk</td>
          <td><code>df -h ~ | tail -1</code></td>
          <td>Free space on the system volume</td>
      </tr>
      <tr>
          <td>Disk (per directory)</td>
          <td><code>du -sh ~/.ollama/models/blobs</code></td>
          <td>How much the LLM models have accumulated</td>
      </tr>
      <tr>
          <td>Port</td>
          <td><code>lsof -i :11434</code></td>
          <td>Who is listening on that port</td>
      </tr>
      <tr>
          <td>Process</td>
          <td><code>ps aux | grep -i ollama | grep -v grep</code></td>
          <td>Which Ollama / ComfyUI / Python processes are running</td>
      </tr>
      <tr>
          <td>Ollama loaded models</td>
          <td><code>ollama ps</code></td>
          <td>Which models are in RAM, their size, the idle timer</td>
      </tr>
  </tbody>
</table>
<p>Measured: right after killing ComfyUI (SDXL + Python venv), <code>vm_stat</code> showed free pages going from 619K to 1090K (16 KB per page), roughly <strong>+7.5 GB of RAM released</strong>. That is how much memory the SDXL + ComfyUI process had been holding.</p>
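<p>The arithmetic behind that figure is: freed bytes = (free pages after − free pages before) × page size. The following is a minimal sketch that parses <code>vm_stat</code> text. The two samples are abbreviated, hand-written inputs that use the rounded page counts above; they are not captured logs. With 619K → 1090K pages at 16,384 bytes each, the result is about 7.7 GB (7.2 GiB). <code>parse_vm_stat</code> and <code>freed_bytes</code> are hypothetical helpers.</p>

```python
import re


def parse_vm_stat(text: str) -> tuple[int, dict[str, int]]:
    """Return (page size in bytes, {"Pages free": n, ...}) from `vm_stat` output."""
    m = re.search(r"page size of (\d+) bytes", text)
    page_size = int(m.group(1)) if m else 16384  # fallback: 16 KB pages on Apple Silicon
    counts: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip().rstrip(".")  # counts end with a period, e.g. "619000."
        if sep and value.isdigit():
            counts[key.strip()] = int(value)
    return page_size, counts


def freed_bytes(before: str, after: str) -> int:
    """(free pages after - free pages before) * page size."""
    page_size, b = parse_vm_stat(before)
    _, a = parse_vm_stat(after)
    return (a["Pages free"] - b["Pages free"]) * page_size


# Hand-written samples with the rounded counts from the text; for live data capture
# subprocess.run(["vm_stat"], capture_output=True, text=True).stdout before and after.
HEADER = "Mach Virtual Memory Statistics: (page size of 16384 bytes)\n"
before = HEADER + "Pages free:                              619000.\n"
after = HEADER + "Pages free:                             1090000.\n"
n = freed_bytes(before, after)
print(f"{n / 1e9:.2f} GB ({n / 2**30:.2f} GiB)")
```

"Pages free" alone understates what macOS can hand out, because inactive and purgeable pages are reclaimable too. It is still a consistent before/after yardstick.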
<h2 id="ollama-的-lifecycleauto-unload-模式">Ollama's lifecycle (auto-unload mode)</h2>
<p>Ollama follows a "load on demand, unload when idle" design:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln">1</span><span class="cl">brew services start ollama          → daemon starts, no model loaded, ~200 MB RAM
</span></span><span class="line"><span class="ln">2</span><span class="cl">                                     port 11434 listening
</span></span><span class="line"><span class="ln">3</span><span class="cl">ollama run gemma3:4b &#34;hello&#34;        → loads the model into RAM (~4-5 GB)
</span></span><span class="line"><span class="ln">4</span><span class="cl">                                     generates the response right away
</span></span><span class="line"><span class="ln">5</span><span class="cl">                                     model stays in RAM
</span></span><span class="line"><span class="ln">6</span><span class="cl">(idle 5 min, no new request)        → Ollama unloads the model automatically
</span></span><span class="line"><span class="ln">7</span><span class="cl">                                     RAM released, daemon still running
</span></span><span class="line"><span class="ln">8</span><span class="cl">ollama run gemma3:4b &#34;next&#34;         → reloads the model (~5-10 s), generates
</span></span><span class="line"><span class="ln">9</span><span class="cl">brew services stop ollama           → daemon exits, port released</span></span></code></pre></div><p><strong>Key parameter <code>OLLAMA_KEEP_ALIVE</code></strong> (environment variable, default <code>5m</code>):</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># See which models are currently loaded</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">ollama ps
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"># NAME         ID              SIZE      PROCESSOR    UNTIL</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># gemma3:4b    a2af6cc3eb7f    5.5 GB    100% GPU     4 minutes from now</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># brew services starts ollama through launchd, which does not inherit env vars from this shell,</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># so stop the service and run the server in the foreground with the variable set</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">brew services stop ollama
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="nv">OLLAMA_KEEP_ALIVE</span><span class="o">=</span>-1 ollama serve   <span class="c1"># keep models in RAM until ollama restarts</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="nv">OLLAMA_KEEP_ALIVE</span><span class="o">=</span><span class="m">0</span> ollama serve    <span class="c1"># or: unload each model right after it generates</span></span></span></code></pre></div><p>Choosing keep_alive is a trade-off:</p>
<table>
  <thead>
      <tr>
          <th>Setting</th>
          <th>RAM usage</th>
          <th>Time to first token</th>
          <th>Good for</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>0</code></td>
          <td>Lowest (released right after generating)</td>
          <td>High (reloads every time)</td>
          <td>Occasional use, tight RAM</td>
      </tr>
      <tr>
          <td><code>5m</code> (default)</td>
          <td>Medium (held while active, released after 5 idle minutes)</td>
          <td>Low (no reload while active)</td>
          <td>Most scenarios</td>
      </tr>
      <tr>
          <td><code>-1</code></td>
          <td>High (held permanently)</td>
          <td>Lowest</td>
          <td>Heavy all-day use, plenty of RAM</td>
      </tr>
  </tbody>
</table>
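<p>The same checks can be scripted against Ollama's HTTP API instead of the CLI. The following is a minimal sketch, assuming the default <code>http://localhost:11434</code>:</p>
<ul>
<li><code>GET /api/ps</code> lists the loaded models. The <code>name</code>, <code>size</code> and <code>expires_at</code> fields used here are an assumption to check against your Ollama version.</li>
<li>A <code>POST /api/generate</code> carrying only <code>model</code> and <code>keep_alive: 0</code> asks Ollama to unload that model. It is the same request as the <code>curl</code> command in this section.</li>
</ul>
<p>The sample response is illustrative, and the helper names are hypothetical.</p>

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434"  # Ollama's default address


def loaded_models(ps: dict) -> list[tuple[str, float, str]]:
    """(name, size in GB, expires_at) for each entry of a GET /api/ps response."""
    return [(m["name"], m.get("size", 0) / 1e9, m.get("expires_at", ""))
            for m in ps.get("models", [])]


def unload_request(model: str) -> urllib.request.Request:
    """POST /api/generate with keep_alive 0 and no prompt: unload `model` now."""
    body = json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
    return urllib.request.Request(f"{OLLAMA}/api/generate", data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def unload_all(timeout: float = 10.0) -> list[str]:
    """Ask Ollama to unload every loaded model; the daemon keeps running."""
    with urllib.request.urlopen(f"{OLLAMA}/api/ps", timeout=timeout) as resp:
        names = [name for name, _size, _exp in loaded_models(json.load(resp))]
    for name in names:
        with urllib.request.urlopen(unload_request(name), timeout=timeout):
            pass
    return names


# Illustrative /api/ps response, not captured output.
sample = {"models": [{"name": "gemma3:4b", "size": 5_500_000_000,
                      "expires_at": "2026-05-12T10:00:00Z"}]}
for name, gb, until in loaded_models(sample):
    print(f"{name}  {gb:.1f} GB  until {until}")
```

<code>unload_all</code> gives the per-request equivalent of <code>keep_alive = 0</code> without restarting the daemon or changing its environment.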
<p><strong>Unloading on demand</strong>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># Unload the model from RAM right now; the daemon keeps running</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">curl -s http://localhost:11434/api/generate <span class="se">\
</span></span></span><span class="line"><span class="ln">3</span><span class="cl"><span class="se"></span>  -d <span class="s1">&#39;{&#34;model&#34;: &#34;gemma3:4b&#34;, &#34;keep_alive&#34;: 0}&#39;</span>
</span></span><span class="line"><span class="ln">4</span><span class="cl">
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># Or shut down the whole daemon</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">brew services stop ollama</span></span></code></pre></div><h2 id="comfyui-的-lifecycle持續占用模式">ComfyUI's lifecycle (persistent-load mode)</h2>
<p>ComfyUI works in a completely different mode: <strong>once loaded, a model stays in RAM until the server process exits</strong>. There is no auto-unload mechanism.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-text" data-lang="text"><span class="line"><span class="ln"> 1</span><span class="cl">python main.py                      → ComfyUI server starts, port 8188 listening
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">                                     RAM ~3 GB (Python venv + framework)
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">First Queue Prompt (SDXL)           → loads sd_xl_base_1.0.safetensors (~6 GB)
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">                                     RAM jumps to ~9-10 GB
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">                                     generation done, model stays in RAM
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">Several generations in a row        → stays at ~9-10 GB, no unload
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">Idle for 1 hour                     → still ~9-10 GB (no timer)
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">Switch to a ControlNet workflow     → also loads a ControlNet model (~2 GB); ComfyUI swaps automatically
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">                                     RAM rises temporarily; part of the SD model may be evicted to disk
</span></span><span class="line"><span class="ln">10</span><span class="cl">Ctrl+C / pkill                      → process exits, RAM fully released</span></span></code></pre></div><p>To release the RAM ComfyUI holds, <strong>the only way is to stop the server</strong>:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Find the PID</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">ps aux <span class="p">|</span> grep <span class="s2">&#34;ComfyUI/main.py&#34;</span> <span class="p">|</span> grep -v grep
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># Graceful shutdown (lets it clean up)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">pkill -INT -f <span class="s2">&#34;ComfyUI/main.py&#34;</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># Force kill, only if it is still running after waiting up to 5 seconds</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">pkill -KILL -f <span class="s2">&#34;ComfyUI/main.py&#34;</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># Confirm the port is released</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">lsof -i :8188 <span class="p">|</span> head -3</span></span></code></pre></div><p>Measured: M4 Pro 32GB. With SDXL base loaded, the ComfyUI process held ~8 GB of RAM. After <code>pkill -9</code>, <code>vm_stat</code> showed free pages up by ~470K (<strong>7.5 GB released</strong>).</p>
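<p>The shutdown steps above call for a wait of up to 5 seconds before forcing, but nothing in the block actually waits. The following is a minimal Python sketch of that SIGINT, wait, SIGKILL sequence for known PIDs. <code>graceful_stop</code> and <code>stop_comfyui</code> are hypothetical helpers, and the PIDs come from <code>pgrep -f</code>. The <code>is_alive</code> hook lets the function also work on a child process, because an unreaped child still passes the signal-0 check.</p>

```python
import os
import signal
import subprocess
import time
from typing import Callable, Optional


def pid_alive(pid: int) -> bool:
    """Signal 0 delivers nothing; it only checks that the PID exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def graceful_stop(pid: int, grace: float = 5.0,
                  is_alive: Optional[Callable[[], bool]] = None) -> str:
    """SIGINT (like Ctrl+C), wait up to `grace` seconds, then SIGKILL.

    Returns "not running", "interrupted" or "killed". For a child of this
    process pass e.g. is_alive=lambda: proc.poll() is None.
    """
    alive = is_alive or (lambda: pid_alive(pid))
    if not alive():
        return "not running"
    try:
        os.kill(pid, signal.SIGINT)
    except ProcessLookupError:
        return "not running"
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not alive():
            return "interrupted"
        time.sleep(0.1)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "interrupted"  # it exited between the last check and the kill
    return "killed"


def stop_comfyui(pattern: str = "ComfyUI/main.py") -> list[str]:
    """Stop every process whose command line matches `pattern` (found via pgrep -f)."""
    out = subprocess.run(["pgrep", "-f", pattern], capture_output=True, text=True).stdout
    return [graceful_stop(int(pid)) for pid in out.split()]
```

The returned status shows whether the process had to be force-killed. A process that repeatedly ends as "killed" is not handling SIGINT cleanly, which is worth investigating on its own.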
<h3 id="為什麼-ollama-跟-comfyui-設計不同">Why Ollama and ComfyUI are designed differently</h3>
<table>
  <thead>
      <tr>
          <th>Factor</th>
          <th>Ollama design</th>
          <th>ComfyUI design</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Primary usage pattern</td>
          <td>API service, used over HTTP by IDE plugins</td>
          <td>Interactive GUI, user keeps tweaking prompts</td>
      </tr>
      <tr>
          <td>Model switching frequency</td>
          <td>High (different models for different tasks)</td>
          <td>Low (usually one model per session)</td>
      </tr>
      <tr>
          <td>Latency users expect</td>
          <td>Low time to first token (IDE completion)</td>
          <td>High throughput (continuous image generation)</td>
      </tr>
      <tr>
          <td>Conclusion</td>
          <td>Auto-unload frees RAM for other models</td>
          <td>Stay loaded to avoid wasteful reloads</td>
      </tr>
  </tbody>
</table>
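<p>Both designs are valid and suit different usage patterns. Once you understand the difference, it is clear that ComfyUI holding on to RAM is a design choice, not a bug.</p>
<h2 id="跟其他本地-server-對比">Comparison with other local servers</h2>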
<table>
  <thead>
      <tr>
          <th>Server</th>
          <th>Auto-unload</th>
          <th>Unload on demand</th>
          <th>Observing RAM usage</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Ollama</td>
          <td>Yes (after 5 idle minutes)</td>
          <td><code>keep_alive: 0</code> or stop the daemon</td>
          <td><code>ollama ps</code></td>
      </tr>
      <tr>
          <td>LM Studio</td>
          <td>No (released only when you eject the model in the GUI)</td>
          <td>GUI Eject Model</td>
          <td>Activity Monitor</td>
      </tr>
      <tr>
          <td>llama.cpp <code>llama-server</code></td>
          <td>No</td>
          <td>kill process</td>
          <td><code>lsof -i :8080</code></td>
      </tr>
      <tr>
          <td>ComfyUI</td>
          <td>No</td>
          <td>kill process</td>
          <td><code>ps aux | grep ComfyUI</code></td>
      </tr>
      <tr>
          <td>oMLX</td>
          <td>Yes (configurable per model)</td>
          <td>API endpoint</td>
          <td>server log</td>
      </tr>
  </tbody>
</table>
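<p><strong>Conclusion</strong>: only Ollama and oMLX have built-in auto-unload; everything else has to be released by hand. GUI servers (LM Studio) usually give the user an "Eject" button, while CLI servers usually require killing the process.</p>
<h2 id="標準釋放程序">Standard release procedure</h2>
<p>At the end of a day of coding, work through the following steps in order to release every resource:</p>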





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 1. Check the current state (note the RAM figures to compare against later)</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">vm_stat <span class="p">|</span> head -3
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">df -h ~ <span class="p">|</span> tail -1
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">ollama ps
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">ps aux <span class="p">|</span> grep -E <span class="s2">&#34;ollama|ComfyUI|llama-server&#34;</span> <span class="p">|</span> grep -v grep
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># 2. Release the currently loaded LLM models (Ollama)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">brew services stop ollama
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># Or keep the daemon and unload only the model:</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># curl -s http://localhost:11434/api/generate -d &#39;{&#34;model&#34;: &#34;&lt;your model&gt;&#34;, &#34;keep_alive&#34;: 0}&#39;</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># 3. Stop ComfyUI and other model servers</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">pkill -INT -f <span class="s2">&#34;ComfyUI/main.py&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">14</span><span class="cl">pkill -INT -f <span class="s2">&#34;llama-server&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">15</span><span class="cl">sleep <span class="m">5</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl"><span class="c1"># 強制（如果上面沒清乾淨）</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">pkill -KILL -f <span class="s2">&#34;ComfyUI/main.py&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">18</span><span class="cl">pkill -KILL -f <span class="s2">&#34;llama-server&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl"><span class="c1"># 4. 驗證所有 port 釋放</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">lsof -i :11434 -i :1234 -i :8080 -i :8188 -i :8000 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="p">|</span> head
</span></span><span class="line"><span class="ln">22</span><span class="cl">
</span></span><span class="line"><span class="ln">23</span><span class="cl"><span class="c1"># 5. 確認釋放量</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">vm_stat <span class="p">|</span> head -3
</span></span><span class="line"><span class="ln">25</span><span class="cl"><span class="c1"># free pages 該明顯增加</span></span></span></code></pre></div><h3 id="容易出錯的釋放方式">容易出錯的「釋放方式」</h3>
<ul>
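<li><strong><code>killall Python</code></strong>: kills every Python process, including unrelated dev tools (e.g. Jupyter, a Django dev server). Use an explicit pattern such as <code>pkill -f &quot;ComfyUI/main.py&quot;</code> instead.</li>
<li><strong><code>rm -rf ~/.ollama</code></strong>: wipes the entire local model store, so every model has to be pulled again next time. For cleanup, <code>ollama rm &lt;model&gt;</code> is the precise tool.</li>
<li><strong><code>brew uninstall ollama</code></strong>: removes Ollama itself, and reinstalling is extra hassle. Stopping the service is enough.</li>
<li><strong>Rebooting</strong>: works, but it is heavy-handed and interrupts everything else you are doing. Process-level operations are enough.</li>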
</ul>
<h2 id="磁碟長期累積管理">磁碟長期累積管理</h2>
<p>Models 一旦 <code>pull</code> 進 <code>~/.ollama/models/blobs</code>、不主動 <code>rm</code> 不會減少。半年累積可長到 50 GB+。</p>
<p>Ollama models 只是磁碟大戶之一。整台 Mac 突然被吃光、要從哪裡查起的全機診斷順序（先排除快照浮動、再用實際佔用值逐層找大戶），見 <a href="/blog/other/macos-%E7%A3%81%E7%A2%9F%E7%A9%BA%E9%96%93%E8%A2%AB%E5%90%83%E5%85%89%E7%9A%84%E8%A8%BA%E6%96%B7%E6%B5%81%E7%A8%8B/" data-link-title="macOS 磁碟空間被吃光的診斷流程" data-link-desc="Mac 空間莫名歸零、清 cache 沒救、或空間掉了又回來時的排查順序。避開 sparse 假大小和本地快照浮動的誤判。含 disk-report 腳本。">macOS 磁碟空間診斷流程</a>——那篇的佔用大戶表也會把 ollama 列為其中一項、再連回本篇的專屬清理 idiom。</p>
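<p>To see how close you are to that 50 GB figure, you can total the SIZE column of <code>ollama list</code>. This is a rough sketch with three assumptions. Each row is laid out as NAME, ID, then a number and a unit such as <code>GB</code> or <code>MB</code>. The units are decimal. <code>ollama_total_size</code> is a hypothetical helper name.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper: sum the SIZE column of `ollama list` output read on stdin.
# Assumes each data row looks like: NAME ID number unit MODIFIED...
# and that units are decimal (1 GB = 1000 MB).
ollama_total_size() {
  tail -n +2 | awk '
    $4 == "GB" { total += $3 }
    $4 == "MB" { total += $3 / 1000 }
    $4 == "KB" { total += $3 / 1000000 }
    END { printf "%.1f GB\n", total }'
}

# Usage: ollama list | ollama_total_size
</code></pre></div>
<p>Comparing this total with <code>du -sh ~/.ollama/models/blobs</code> is a quick sanity check. The two can differ, for example when several models share layers.</p>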
<h3 id="觀察累積">觀察累積</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Ollama models 總占用</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">du -sh ~/.ollama/models/blobs
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"># 4.1G    /Users/tarragon/.ollama/models/blobs</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># 逐 model 看大小</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">ollama list
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># NAME                       ID              SIZE      MODIFIED</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl"><span class="c1"># gemma4:e4b                 c6eb396dbd59    9.6 GB    Less than a second ago</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># nomic-embed-text:latest    0a109f422b47    274 MB    3 hours ago</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl"><span class="c1"># ComfyUI checkpoints 累積</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">du -sh ~/.ollama ~/Projects/ComfyUI/models 2&gt;/dev/null
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"># 4.2G    /Users/tarragon/.ollama</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl"><span class="c1"># 7.0G    /Users/tarragon/Projects/ComfyUI/models</span></span></span></code></pre></div><h3 id="清理策略">清理策略</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 刪掉很久沒用的 model</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">ollama rm &lt;model-tag&gt;
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># 一次清掉所有 Ollama models（保留 daemon）</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">ollama list <span class="p">|</span> tail -n +2 <span class="p">|</span> awk <span class="s1">&#39;{print $1}&#39;</span> <span class="p">|</span> xargs -I <span class="o">{}</span> ollama rm <span class="o">{}</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># 看 ComfyUI checkpoints 哪些可清</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">ls -lh ~/Projects/ComfyUI/models/checkpoints/
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># 手動刪不要的 .safetensors（小心、不能 undo）</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">rm ~/Projects/ComfyUI/models/checkpoints/&lt;old-model&gt;.safetensors</span></span></code></pre></div><h3 id="磁碟管理-idiom">磁碟管理 idiom</h3>
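<p>Periodically (monthly, or when less than 20% of the disk is free), do the following:</p>
<ol>
<li><code>du -sh ~/.ollama ~/Projects/ComfyUI/models</code> to see the current total</li>
<li><code>ollama list</code> to spot models you no longer use. The <code>MODIFIED</code> column shows when a model was pulled or updated, not when it was last used, so treat old entries as candidates to review.</li>
<li>Delete experimental models and keep your daily drivers</li>
<li>Review ComfyUI checkpoints the same way</li>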
</ol>
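<p>Step 2 of the routine above can be partly scripted. The sketch below assumes MODIFIED values read like <code>3 hours ago</code> or <code>5 months ago</code>. It only lists models pulled or updated months or years ago as review candidates, because MODIFIED does not record last use. <code>stale_models</code> is a hypothetical name.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper: print models from `ollama list` output (stdin) whose
# MODIFIED column is in months or years, i.e. candidates for review.
stale_models() {
  tail -n +2 | awk '/ (months?|years?) ago/ { print $1 }'
}

# Preview first, then delete once you have reviewed the list:
#   ollama list | stale_models
#   ollama list | stale_models | xargs -n 1 ollama rm
</code></pre></div>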
<h2 id="port--process-排錯">Port / Process 排錯</h2>
<h3 id="啟動報address-already-in-use">啟動報「address already in use」</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 找誰占</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">lsof -i :11434
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"># COMMAND  PID  USER   ...   NAME</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># ollama   xxx  ...    ...   TCP localhost:11434 (LISTEN)</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl"><span class="c1"># 看是不是 zombie process</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">ps aux <span class="p">|</span> grep <span class="k">$(</span>lsof -ti :11434 <span class="p">|</span> head -1<span class="k">)</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># 清掉</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="nb">kill</span> -9 <span class="k">$(</span>lsof -ti :11434<span class="k">)</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl"><span class="c1"># 或重啟 service（會自動清舊 instance）</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl">brew services restart ollama</span></span></code></pre></div><h3 id="ollama-daemon-掛了不知道">Ollama daemon 掛了不知道</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 健康檢查</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">curl -s http://localhost:11434/api/version
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">
</span></span><span class="line"><span class="ln"> 4</span><span class="cl"><span class="c1"># 沒回應、看 service 狀態</span>
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">brew services list <span class="p">|</span> grep ollama
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">
</span></span><span class="line"><span class="ln"> 7</span><span class="cl"><span class="c1"># 沒在跑、重啟</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">brew services start ollama
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># 看 log</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">tail -50 /opt/homebrew/var/log/ollama.log</span></span></code></pre></div><h3 id="comfyui-看似跑著但-queue-不動">ComfyUI 看似跑著但 Queue 不動</h3>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln">1</span><span class="cl"><span class="c1"># 看 stdout / stderr log</span>
</span></span><span class="line"><span class="ln">2</span><span class="cl">tail -30 /tmp/comfyui.log  <span class="c1"># 如果啟動時 redirect 到 log</span>
</span></span><span class="line"><span class="ln">3</span><span class="cl">
</span></span><span class="line"><span class="ln">4</span><span class="cl"><span class="c1"># 看是不是 GPU / Metal stuck（極少見、但 SDXL 大量並發可能踩到）</span>
</span></span><span class="line"><span class="ln">5</span><span class="cl"><span class="c1"># 解法：kill + 重啟</span>
</span></span><span class="line"><span class="ln">6</span><span class="cl">pkill -9 -f <span class="s2">&#34;ComfyUI/main.py&#34;</span></span></span></code></pre></div><p>完整排錯流程跟「先確認哪一層壞」見 <a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">1.7 排錯方法論</a>。</p>
<h2 id="觀察記憶體佔用實測對照">觀察記憶體佔用：實測對照</h2>
<p>跑這幾步紀錄 baseline → load model → kill 的 RAM 變化：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># Baseline</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">vm_stat <span class="p">|</span> grep <span class="s2">&#34;Pages free&#34;</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl"><span class="c1"># Pages free:                              1090076.   ← ~17 GB free</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">
</span></span><span class="line"><span class="ln"> 5</span><span class="cl"><span class="c1"># 啟動 Ollama + load 4B model</span>
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">brew services start ollama
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">ollama run gemma3:4b <span class="s2">&#34;hello&#34;</span>
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">ollama ps
</span></span><span class="line"><span class="ln"> 9</span><span class="cl"><span class="c1"># NAME       SIZE     PROCESSOR    UNTIL</span>
</span></span><span class="line"><span class="ln">10</span><span class="cl"><span class="c1"># gemma3:4b  5.5 GB   100% Metal   4 minutes from now</span>
</span></span><span class="line"><span class="ln">11</span><span class="cl">
</span></span><span class="line"><span class="ln">12</span><span class="cl">vm_stat <span class="p">|</span> grep <span class="s2">&#34;Pages free&#34;</span>
</span></span><span class="line"><span class="ln">13</span><span class="cl"><span class="c1"># Pages free:                               750000.   ← 跌 ~5 GB（model 載入）</span>
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl"><span class="c1"># 額外啟動 ComfyUI + load SDXL</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">nohup python main.py &gt; /tmp/comfyui.log 2&gt;<span class="p">&amp;</span><span class="m">1</span> <span class="p">&amp;</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl"><span class="c1"># 在 GUI 上 Queue Prompt 跑一次 SDXL generation</span>
</span></span><span class="line"><span class="ln">18</span><span class="cl">vm_stat <span class="p">|</span> grep <span class="s2">&#34;Pages free&#34;</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl"><span class="c1"># Pages free:                               280000.   ← 再跌 ~7.5 GB（SDXL 載入 + Python venv）</span>
</span></span><span class="line"><span class="ln">20</span><span class="cl">
</span></span><span class="line"><span class="ln">21</span><span class="cl"><span class="c1"># kill 全部</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl">brew services stop ollama
</span></span><span class="line"><span class="ln">23</span><span class="cl">pkill -9 -f <span class="s2">&#34;ComfyUI/main.py&#34;</span>
</span></span><span class="line"><span class="ln">24</span><span class="cl">sleep <span class="m">3</span>
</span></span><span class="line"><span class="ln">25</span><span class="cl">vm_stat <span class="p">|</span> grep <span class="s2">&#34;Pages free&#34;</span>
</span></span><span class="line"><span class="ln">26</span><span class="cl"><span class="c1"># Pages free:                              1090000.   ← 回到 baseline</span></span></span></code></pre></div><p>每 page 16 KB、所以 free pages 數字 × 16 KB = 實際 free RAM bytes。</p>
<h2 id="自動化釋放launchd--shell-alias">自動化釋放：launchd / shell alias</h2>
<p>寫個 shell function 一鍵 cleanup：</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="ln"> 1</span><span class="cl"><span class="c1"># 加進 ~/.zshrc</span>
</span></span><span class="line"><span class="ln"> 2</span><span class="cl">llm-cleanup<span class="o">()</span> <span class="o">{</span>
</span></span><span class="line"><span class="ln"> 3</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Stopping Ollama...&#34;</span>
</span></span><span class="line"><span class="ln"> 4</span><span class="cl">  brew services stop ollama 2&gt;/dev/null
</span></span><span class="line"><span class="ln"> 5</span><span class="cl">
</span></span><span class="line"><span class="ln"> 6</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Killing ComfyUI...&#34;</span>
</span></span><span class="line"><span class="ln"> 7</span><span class="cl">  pkill -INT -f <span class="s2">&#34;ComfyUI/main.py&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln"> 8</span><span class="cl">  sleep <span class="m">3</span>
</span></span><span class="line"><span class="ln"> 9</span><span class="cl">  pkill -KILL -f <span class="s2">&#34;ComfyUI/main.py&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">10</span><span class="cl">
</span></span><span class="line"><span class="ln">11</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Killing other model servers...&#34;</span>
</span></span><span class="line"><span class="ln">12</span><span class="cl">  pkill -KILL -f <span class="s2">&#34;llama-server&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">13</span><span class="cl">  pkill -KILL -f <span class="s2">&#34;lm-studio-server&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="ln">14</span><span class="cl">
</span></span><span class="line"><span class="ln">15</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Verifying ports...&#34;</span>
</span></span><span class="line"><span class="ln">16</span><span class="cl">  <span class="k">for</span> p in <span class="m">11434</span> <span class="m">1234</span> <span class="m">8080</span> <span class="m">8188</span> 8000<span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="ln">17</span><span class="cl">    lsof -i :<span class="nv">$p</span> 2&gt;/dev/null <span class="p">|</span> head -2
</span></span><span class="line"><span class="ln">18</span><span class="cl">  <span class="k">done</span>
</span></span><span class="line"><span class="ln">19</span><span class="cl">
</span></span><span class="line"><span class="ln">20</span><span class="cl">  <span class="nb">echo</span> <span class="s2">&#34;[*] Free RAM:&#34;</span>
</span></span><span class="line"><span class="ln">21</span><span class="cl">  vm_stat <span class="p">|</span> grep <span class="s2">&#34;Pages free&#34;</span>
</span></span><span class="line"><span class="ln">22</span><span class="cl"><span class="o">}</span></span></span></code></pre></div><p>完事打 <code>llm-cleanup</code> 一鍵釋放、不用記每個 process 怎麼 kill。</p>
<h2 id="何時這篇會過時">何時這篇會過時</h2>
<p><strong>不會過時的部分</strong>：</p>
<ul>
<li>RAM / 磁碟 / port 三個 dimension 是長期 invariant、用什麼 LLM server 都成立。</li>
<li>「Mac 是 shared resource、需要主動管理」這個 framing。</li>
<li>Ollama 跟 ComfyUI 兩種典型 lifecycle 對比（auto-unload vs persistent）。</li>
<li>觀察工具（<code>vm_stat</code>、<code>lsof</code>、<code>ps</code>、<code>du</code>、Activity Monitor）是 macOS 系統 API、不會 deprecate。</li>
<li>標準釋放程序、自動化 shell function 模式。</li>
</ul>
<p><strong>會變的部分</strong>：</p>
<ul>
<li>具體 model size / RAM 占用數字（隨模型架構演化）。</li>
<li><code>OLLAMA_KEEP_ALIVE</code> 等具體環境變數名（Ollama API 演化）。</li>
<li>ComfyUI 可能加 auto-unload feature（社群有 issue 在討論）。</li>
</ul>
<p>讀的時候若指令跑不過、先 <code>--help</code> 看當前版本 flag；釋放 RAM 的「kill process」這個機制本身永遠成立。</p>
<h2 id="跟其他-hands-on-章節的關係">跟其他 hands-on 章節的關係</h2>
<ul>
<li><a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama 安裝</a>：介紹 <code>brew services start/stop</code>、本篇延伸 lifecycle 細節</li>
<li><a href="/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI 安裝</a>：介紹 ComfyUI 啟動、本篇延伸 RAM 占用 + 釋放</li>
<li><a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">1.7 排錯方法論</a>：用三層架構定位故障、本篇是 lifecycle 視角的補完</li>
<li><a href="/blog/llm/00-foundations/privacy-data-flow/" data-link-title="0.7 隱私 / 資安的資料流原理" data-link-desc="從「位置」到「資料流」的思考升級：信任邊界、合約模型、零信任原則套用到 LLM 工作流">0.7 隱私資料流原理</a>：「每個 hop 都要 audit」延伸到資源層</li>
</ul>
<p>整體心法：本地 LLM 工作流跟雲端不一樣、要主動管理 lifecycle、不能裝完就忘。</p>
]]></content:encoded></item><item><title>Hands-on: Local AI Tool Notes</title><link>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/</link><pubDate>Mon, 11 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/</guid><description>&lt;p>This subfolder collects hands-on installation and verification records for local AI tools. How it maps to the 1.x principle chapters:&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>1.x principle chapter&lt;/th>
 &lt;th>Hands-on record&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Why choose Ollama&lt;/td>
 &lt;td>The actual &lt;code>brew install&lt;/code> + &lt;code>ollama pull&lt;/code> flow&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>How speculative decoding works&lt;/td>
 &lt;td>Loading an MTP model and measuring its speed&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Where ComfyUI sits in the ecosystem&lt;/td>
 &lt;td>The actual git clone, Python environment, and model path setup&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
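&lt;p>The content in this folder &lt;strong>evolves with tool versions&lt;/strong>: commands, directory layouts, and dependency versions all change. Each article notes its writing date at the top and its version information in the frontmatter. The difference from the 1.x principle chapters: principles hold across tool generations, while these implementation notes are a snapshot of the current version.&lt;/p>
&lt;h2 id="章節列表">Chapter list&lt;/h2>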
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Chapter&lt;/th>
 &lt;th>Topic&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/quickstart/" data-link-title="Hands-on Quickstart：clone repo 後跑通所有 demo" data-link-desc="4 步驟跑通 RAG / MCP / permission demo 的 setup 跟驗證指令、整合 hands-on 系列所有章節的 prerequisite">Quickstart：clone repo 後跑通所有 demo&lt;/a>&lt;/td>
 &lt;td>4 步驟整合 setup、跑 RAG / MCP / permission demo、跨 hands-on 系列導讀&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &amp;#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama 安裝 + Gemma 模型&lt;/a>&lt;/td>
 &lt;td>brew install、ollama pull、curl 驗證&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &amp;#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI + Stable Diffusion XL&lt;/a>&lt;/td>
 &lt;td>git clone, Python environment, where the SDXL model goes&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/whisper-setup/" data-link-title="Hands-on：安裝 whisper.cpp 做語音轉文字" data-link-desc="brew install whisper-cpp、下載 GGML model、Metal 加速、ffmpeg 餵 WAV、484ms 完成 7 秒音訊轉錄">Whisper 語音轉文字&lt;/a>&lt;/td>
 &lt;td>&lt;code>brew install whisper-cpp&lt;/code> + Metal 加速、GGML 模型選擇、&lt;code>whisper-cli&lt;/code> + ffmpeg 驗證轉錄&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/piper-tts-setup/" data-link-title="Hands-on：安裝 Piper TTS 做文字轉語音" data-link-desc="pip install piper-tts、ONNX voice model、stdin 餵文字、WAV 輸出、跟 Whisper 互為 round-trip 驗證">Piper TTS 文字轉語音&lt;/a>&lt;/td>
 &lt;td>下載 binary、voice 選擇、wav 輸出&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &amp;#43; cosine retrieval &amp;#43; Ollama chat、validating 4.0 RAG 原理">RAG demo：用 blog content 當 corpus&lt;/a>&lt;/td>
 &lt;td>embedding + retrieval、串 Ollama&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP server demo：暴露 blog content&lt;/a>&lt;/td>
 &lt;td>最小 MCP server、給 LLM 用&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">權限邊界實驗：LLM 改檔案 / 寫 shell 誰執行&lt;/a>&lt;/td>
 &lt;td>LLM 是 pure function、wrapper 才是權限 gate、&lt;code>--dry-run&lt;/code> / &lt;code>--confirm&lt;/code> / &lt;code>--auto&lt;/code> 取捨&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/instruction-following-test/" data-link-title="Hands-on：跨資料夾風格 follow 任務的模型對比" data-link-desc="1B / 4B / 8B / 跨代 4B 在「讀風格參考、follow 既有格式、寫新章節」任務上的 structural metrics 對比、揭示 model size 不是唯一因素">跨資料夾風格 follow 任務的 model size 對比&lt;/a>&lt;/td>
 &lt;td>1B vs 4B 在「讀資料夾、follow 既有格式、寫新章節」任務上的 structural metrics phase transition&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &amp;#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">LLM 運行中 + 結束的資源管理&lt;/a>&lt;/td>
 &lt;td>RAM / 磁碟 / port 三 dimension 觀察、Ollama auto-unload vs ComfyUI persistent lifecycle、實測釋放數字、自動化 cleanup shell function&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;a href="https://tarrragon.github.io/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">RAG / MCP 的資源 footprint&lt;/a>&lt;/td>
 &lt;td>RAG ingest / query / MCP server 三階段 RAM / 磁碟 / process 實測、多模型並存 RAM 衝突、長期累積管理&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;h2 id="通用前置">通用前置&lt;/h2>
&lt;p>所有工具都假設你的 Mac 滿足：&lt;/p></description><content:encoded><![CDATA[<p>本子資料夾收錄本地 AI 工具的實際安裝跟驗證紀錄。跟 1.x 原理章節的關係：</p>
<table>
  <thead>
      <tr>
          <th>1.x principle chapter</th>
          <th>Hands-on record</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Why choose Ollama</td>
          <td>The actual <code>brew install</code> + <code>ollama pull</code> flow</td>
      </tr>
      <tr>
          <td>How speculative decoding works</td>
          <td>Loading an MTP model and measuring its speed</td>
      </tr>
      <tr>
          <td>Where ComfyUI sits in the ecosystem</td>
          <td>The actual git clone, Python environment, and model path setup</td>
      </tr>
  </tbody>
</table>
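<p>The content in this folder <strong>evolves with tool versions</strong>: commands, directory layouts, and dependency versions all change. Each article notes its writing date at the top and its version information in the frontmatter. The difference from the 1.x principle chapters: principles hold across tool generations, while these implementation notes are a snapshot of the current version.</p>
<h2 id="章節列表">Chapter list</h2>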
<table>
  <thead>
      <tr>
          <th>Chapter</th>
          <th>Topic</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/quickstart/" data-link-title="Hands-on Quickstart：clone repo 後跑通所有 demo" data-link-desc="4 步驟跑通 RAG / MCP / permission demo 的 setup 跟驗證指令、整合 hands-on 系列所有章節的 prerequisite">Quickstart：clone repo 後跑通所有 demo</a></td>
          <td>4 步驟整合 setup、跑 RAG / MCP / permission demo、跨 hands-on 系列導讀</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/ollama-setup/" data-link-title="Hands-on：安裝 Ollama &#43; 拉第一個 Gemma 模型" data-link-desc="brew install ollama、launchd service、ollama pull、curl 驗證 OpenAI 相容 API">Ollama 安裝 + Gemma 模型</a></td>
          <td>brew install、ollama pull、curl 驗證</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/comfyui-setup/" data-link-title="Hands-on：安裝 ComfyUI &#43; SDXL base" data-link-desc="git clone、venv、pip install requirements、SDXL safetensors 放哪、--listen 啟動 server、瀏覽器 workflow 驗證">ComfyUI + Stable Diffusion XL</a></td>
          <td>git clone, Python environment, where the SDXL model goes</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/whisper-setup/" data-link-title="Hands-on：安裝 whisper.cpp 做語音轉文字" data-link-desc="brew install whisper-cpp、下載 GGML model、Metal 加速、ffmpeg 餵 WAV、484ms 完成 7 秒音訊轉錄">Whisper 語音轉文字</a></td>
          <td><code>brew install whisper-cpp</code> + Metal 加速、GGML 模型選擇、<code>whisper-cli</code> + ffmpeg 驗證轉錄</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/piper-tts-setup/" data-link-title="Hands-on：安裝 Piper TTS 做文字轉語音" data-link-desc="pip install piper-tts、ONNX voice model、stdin 餵文字、WAV 輸出、跟 Whisper 互為 round-trip 驗證">Piper TTS 文字轉語音</a></td>
          <td>下載 binary、voice 選擇、wav 輸出</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/rag-demo/" data-link-title="Hands-on：用 blog content 當 corpus 跑 RAG" data-link-desc="200 行 Python：embedding &#43; cosine retrieval &#43; Ollama chat、validating 4.0 RAG 原理">RAG demo：用 blog content 當 corpus</a></td>
          <td>embedding + retrieval、串 Ollama</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/mcp-demo/" data-link-title="Hands-on：用 blog content 寫一個最小 MCP server" data-link-desc="stdio JSON-RPC、stdlib-only Python、暴露 blog content 給 LLM 用、validating 4.3 應用層協議">MCP server demo：暴露 blog content</a></td>
          <td>最小 MCP server、給 LLM 用</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/permission-boundary/" data-link-title="Hands-on：Ollama 改檔案 / 寫程式碼的權限邊界在哪" data-link-desc="四組對照實驗：Ollama 自己沒 FS / shell 權限、wrapper 才有；--dry-run / --confirm / --auto 三檔審查粒度的取捨">權限邊界實驗：LLM 改檔案 / 寫 shell 誰執行</a></td>
          <td>LLM 是 pure function、wrapper 才是權限 gate、<code>--dry-run</code> / <code>--confirm</code> / <code>--auto</code> 取捨</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/instruction-following-test/" data-link-title="Hands-on：跨資料夾風格 follow 任務的模型對比" data-link-desc="1B / 4B / 8B / 跨代 4B 在「讀風格參考、follow 既有格式、寫新章節」任務上的 structural metrics 對比、揭示 model size 不是唯一因素">跨資料夾風格 follow 任務的 model size 對比</a></td>
          <td>1B vs 4B 在「讀資料夾、follow 既有格式、寫新章節」任務上的 structural metrics phase transition</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/resource-management/" data-link-title="Hands-on：LLM 運行中 &#43; 結束的資源管理" data-link-desc="RAM / 磁碟 / port 三個 dimension 的觀察跟釋放、Ollama keep_alive 跟 ComfyUI 兩種 lifecycle 對比、實測釋放數字">LLM 運行中 + 結束的資源管理</a></td>
          <td>RAM / 磁碟 / port 三 dimension 觀察、Ollama auto-unload vs ComfyUI persistent lifecycle、實測釋放數字、自動化 cleanup shell function</td>
      </tr>
      <tr>
          <td><a href="/blog/llm/01-local-llm-services/hands-on/rag-mcp-resources/" data-link-title="Hands-on：RAG / MCP 的資源 footprint" data-link-desc="RAG ingest / query / MCP server 三階段的 RAM / 磁碟 / process 實測、多模型並存的 RAM 衝突、本地 LLM 跑 RAG 跟單純 chat 的差異">RAG / MCP 的資源 footprint</a></td>
          <td>RAG ingest / query / MCP server 三階段 RAM / 磁碟 / process 實測、多模型並存 RAM 衝突、長期累積管理</td>
      </tr>
  </tbody>
</table>
<h2 id="通用前置">通用前置</h2>
<p>所有工具都假設你的 Mac 滿足：</p>
<ul>
<li>Apple Silicon Mac（M1 / M2 / M3 / M4）</li>
<li>macOS 14 (Sonoma) 或以上</li>
<li>Homebrew 安裝完成（<code>brew --version</code> 可看版本）</li>
<li>至少 16 GB 統一記憶體（24 GB+ 較順）</li>
<li>至少 20 GB 可用磁碟空間（本系列總共會佔約 15 GB）</li>
</ul>
<p>Tools that need a Python environment (ComfyUI, Piper) are isolated in a venv so they do not pollute the system Python.</p>
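<p>The hardware and OS items in the list above can be checked in one go. This sketch assumes the stock macOS tools <code>uname</code>, <code>sw_vers</code>, <code>sysctl</code> and <code>df</code>. The helper names are hypothetical. The thresholds mirror the list (16 GB memory, 20 GB disk) but are measured here in GiB.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helpers for the prerequisite list (names are illustrative).
# major_at_least VERSION MIN: succeeds if the major part of VERSION (e.g. "14.5") is at least MIN
major_at_least() {
  [ "${1%%.*}" -ge "$2" ]
}

# bytes_to_gib BYTES: whole GiB, rounded down
bytes_to_gib() {
  echo $(( $1 / 1073741824 ))
}

# check_prereqs: print one line per unmet prerequisite (macOS only; silent if all pass)
check_prereqs() {
  arch=$(uname -m)
  [ "$arch" = "arm64" ] || echo "not Apple Silicon (uname -m: $arch)"
  ver=$(sw_vers -productVersion)
  major_at_least "$ver" 14 || echo "macOS $ver is older than 14 (Sonoma)"
  [ -n "$(command -v brew)" ] || echo "Homebrew not found on PATH"
  ram=$(bytes_to_gib "$(sysctl -n hw.memsize)")
  [ "$ram" -ge 16 ] || echo "only $ram GiB of memory"
  free_kib=$(df -k "$HOME" | tail -1 | awk '{ print $4 }')
  free_gib=$(( free_kib / 1048576 ))
  [ "$free_gib" -ge 20 ] || echo "only $free_gib GiB of free disk"
}
</code></pre></div>
<p>Run <code>check_prereqs</code> in a terminal. No output means every checked item passes.</p>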
<h2 id="驗證紀錄環境">驗證紀錄環境</h2>
<p>本系列的指令在以下環境驗證：</p>
<table>
  <thead>
      <tr>
          <th>Item</th>
          <th>Version</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>macOS</td>
          <td>Darwin 24.3.0 (macOS 15.x Sequoia)</td>
      </tr>
      <tr>
          <td>Homebrew</td>
          <td>Provided by <code>/opt/homebrew/bin/brew</code></td>
      </tr>
      <tr>
          <td>Python</td>
          <td>3.x (system Python or pyenv both work)</td>
      </tr>
      <tr>
          <td>Verification date</td>
          <td>2026-05-11</td>
      </tr>
  </tbody>
</table>
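<p>On a different Mac, a different macOS version, or reading this series six months from now, commands may need small adjustments. <strong>The kinds of prerequisites and the structure of the verification steps</strong> usually stay the same, though. When a command fails, go back to the three-layer model in 1.7 <a href="/blog/llm/01-local-llm-services/troubleshooting/" data-link-title="1.7 排錯方法論：用三層架構做故障定位" data-link-desc="故障定位的分層思考、症狀到層級的對應反射、log 在三層的角色差異、最小可重現的縮減策略">Troubleshooting methodology</a> to locate the fault, rather than treating the error message as the final word.</p>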
]]></content:encoded></item></channel></rss>