<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Metadata-Lock on Tarragon</title><link>https://tarrragon.github.io/blog/tags/metadata-lock/</link><description>Recent content in Metadata-Lock on Tarragon</description><generator>Hugo -- gohugo.io</generator><language>zh-TW</language><copyright>Tarragon (CC BY 4.0)</copyright><lastBuildDate>Fri, 22 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://tarrragon.github.io/blog/tags/metadata-lock/index.xml" rel="self" type="application/rss+xml"/><item><title>MySQL Metadata Lock Deep Dive</title><link>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/metadata-lock-deep-dive/</link><pubDate>Fri, 22 May 2026 00:00:00 +0000</pubDate><guid>https://tarrragon.github.io/blog/backend/01-database/vendors/mysql/metadata-lock-deep-dive/</guid><description>&lt;p>This MySQL metadata lock deep dive explains how DDL, transactions, and table metadata block one another. MySQL acquires a &lt;a href="https://tarrragon.github.io/blog/backend/knowledge-cards/metadata-lock/" data-link-title="Metadata Lock" data-link-desc="Explains how DDL and existing transactions queue behind and block each other at the table metadata layer">metadata lock&lt;/a> whenever a statement accesses a table; a DDL statement must wait for existing conflicting metadata locks to be released, and while it waits it blocks the queries that arrive after it, producing the pile-up that is common in production.&lt;/p>
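&lt;p>The key reading for this article: an MDL incident usually starts when a DDL statement queues behind a long transaction and then blocks every query that arrives after it. The fix has to cover long transactions, DDL windows, OSC tooling, and observability together.&lt;/p>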
&lt;h2 id="lock-lifecycle">Lock Lifecycle&lt;/h2>
&lt;p>The lock lifecycle section builds a mental model of MDL.&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Action&lt;/th>
 &lt;th>MDL impact&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>&lt;code>SELECT&lt;/code> / DML&lt;/td>
 &lt;td>Acquires a metadata lock on the table; released when the transaction ends&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Long transaction&lt;/td>
 &lt;td>Extends how long the metadata lock is held&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>&lt;code>ALTER TABLE&lt;/code>&lt;/td>
 &lt;td>Waits for conflicting locks to be released; may block later queries while it waits&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Online schema change&lt;/td>
 &lt;td>Still needs a metadata lock for the cutover / rename&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Idle transaction&lt;/td>
 &lt;td>Looks inactive but may still hold a metadata lock&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>The risk of MDL lies in queueing. When an &lt;code>ALTER TABLE&lt;/code> waits on a long transaction, new queries can queue behind the DDL, turning a small change into a service outage.&lt;/p>
&lt;h2 id="detection">Detection&lt;/h2>
&lt;p>Detection is about quickly finding who holds the lock and who is waiting.&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="cl">&lt;span class="k">SELECT&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">*&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">FROM&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">performance_schema&lt;/span>&lt;span class="p">.&lt;/span>&lt;span class="n">metadata_locks&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">WHERE&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">OBJECT_SCHEMA&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="s1">&amp;#39;appdb&amp;#39;&lt;/span>&lt;span class="w">
&lt;/span>&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="w">&lt;/span>&lt;span class="k">ORDER&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">BY&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">OBJECT_NAME&lt;/span>&lt;span class="p">,&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">LOCK_STATUS&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Combine it with the processlist:&lt;/p>





&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-sql" data-lang="sql">&lt;span class="line">&lt;span class="cl">&lt;span class="k">SHOW&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="k">FULL&lt;/span>&lt;span class="w"> &lt;/span>&lt;span class="n">PROCESSLIST&lt;/span>&lt;span class="p">;&lt;/span>&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>A production dashboard should monitor running DDL, metadata lock waits, long transaction age, threads running, blocked query count, and replication lag.&lt;/p>
&lt;h2 id="ddl-risk-review">DDL Risk Review&lt;/h2>
&lt;p>The DDL risk review predicts MDL risk before the change ships.&lt;/p>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>DDL type&lt;/th>
 &lt;th>Risk&lt;/th>
 &lt;th>Controls&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>Add nullable column&lt;/td>
 &lt;td>Can be low, depending on version / algorithm&lt;/td>
 &lt;td>staging dry run, algorithm check&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Add index&lt;/td>
 &lt;td>Can run for a long time and takes a lock at switch-over&lt;/td>
 &lt;td>online DDL / OSC, off-peak window&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Change column type&lt;/td>
 &lt;td>High risk of a table rebuild&lt;/td>
 &lt;td>ghost table / phased migration&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Rename / swap table&lt;/td>
 &lt;td>Brief but critical MDL&lt;/td>
 &lt;td>kill blocker, short window&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>Drop column / table&lt;/td>
 &lt;td>Destructive and requires a lock&lt;/td>
 &lt;td>backup, approval, blocked query watch&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;p>A DDL review should list the algorithm, lock mode, estimated duration, rollback plan, kill blocker policy, and replication impact.&lt;/p>
&lt;h2 id="incident-runbook">Incident Runbook&lt;/h2>
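&lt;p>The incident runbook triages MDL incidents.&lt;/p></description><content:encoded><![CDATA[<p>This MySQL metadata lock deep dive explains how DDL, transactions, and table metadata block one another. MySQL acquires a <a href="/blog/backend/knowledge-cards/metadata-lock/" data-link-title="Metadata Lock" data-link-desc="Explains how DDL and existing transactions queue behind and block each other at the table metadata layer">metadata lock</a> whenever a statement accesses a table; a DDL statement must wait for existing conflicting metadata locks to be released, and while it waits it blocks the queries that arrive after it, producing the pile-up that is common in production.</p>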
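<p>The key reading for this article: an MDL incident usually starts when a DDL statement queues behind a long transaction and then blocks every query that arrives after it. The fix has to cover long transactions, DDL windows, OSC tooling, and observability together.</p>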
<h2 id="lock-lifecycle">Lock Lifecycle</h2>
<p>The lock lifecycle section builds a mental model of MDL.</p>
<table>
  <thead>
      <tr>
          <th>Action</th>
          <th>MDL impact</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><code>SELECT</code> / DML</td>
          <td>Acquires a metadata lock on the table; released when the transaction ends</td>
      </tr>
      <tr>
          <td>Long transaction</td>
          <td>Extends how long the metadata lock is held</td>
      </tr>
      <tr>
          <td><code>ALTER TABLE</code></td>
          <td>Waits for conflicting locks to be released; may block later queries while it waits</td>
      </tr>
      <tr>
          <td>Online schema change</td>
          <td>Still needs a metadata lock for the cutover / rename</td>
      </tr>
      <tr>
          <td>Idle transaction</td>
          <td>Looks inactive but may still hold a metadata lock</td>
      </tr>
  </tbody>
</table>
<p>The risk of MDL lies in queueing. When an <code>ALTER TABLE</code> waits on a long transaction, new queries can queue behind the DDL, turning a small change into a service outage.</p>
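<p>A minimal sketch of that queue, played out across three separate connections. The table <code>appdb.orders</code>, its <code>created_at</code> column, and the index name are assumptions for illustration only.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Session A: open a transaction and read the table.
-- The shared metadata lock is held until COMMIT or ROLLBACK.
START TRANSACTION;
SELECT COUNT(*) FROM appdb.orders;
-- (no COMMIT yet: the transaction stays open)

-- Session B: the DDL needs an exclusive metadata lock,
-- so it waits behind session A.
ALTER TABLE appdb.orders ADD INDEX idx_created_at (created_at);

-- Session C: a new query queues behind the pending DDL and
-- typically shows the state "Waiting for table metadata lock".
SELECT id FROM appdb.orders LIMIT 1;

-- Session A: ending the transaction releases the lock,
-- which lets session B and then session C proceed.
COMMIT;
</code></pre></div>
<p>Run the statements in the order shown, one session per connection; session C stays blocked for as long as session A keeps its transaction open.</p>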
<h2 id="detection">Detection</h2>
<p>Detection is about quickly finding who holds the lock and who is waiting.</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">FROM</span><span class="w"> </span><span class="n">performance_schema</span><span class="p">.</span><span class="n">metadata_locks</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">WHERE</span><span class="w"> </span><span class="n">OBJECT_SCHEMA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;appdb&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">OBJECT_NAME</span><span class="p">,</span><span class="w"> </span><span class="n">LOCK_STATUS</span><span class="p">;</span></span></span></code></pre></div><p>Combine it with the processlist:</p>





<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql"><span class="line"><span class="cl"><span class="k">SHOW</span><span class="w"> </span><span class="k">FULL</span><span class="w"> </span><span class="n">PROCESSLIST</span><span class="p">;</span></span></span></code></pre></div><p>A production dashboard should monitor running DDL, metadata lock waits, long transaction age, threads running, blocked query count, and replication lag.</p>
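<p>The two lookups can also be folded into one query by joining <code>metadata_locks</code> to <code>performance_schema.threads</code>. This is a sketch rather than a canonical query: it assumes the <code>wait/lock/metadata/sql/mdl</code> instrument is enabled (it is by default in MySQL 8.0 but not in 5.7), and it reuses the <code>appdb</code> schema name from the example above.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Granted vs. pending table metadata locks, with the owning connection.
SELECT ml.OBJECT_SCHEMA,
       ml.OBJECT_NAME,
       ml.LOCK_TYPE,
       ml.LOCK_DURATION,
       ml.LOCK_STATUS,       -- GRANTED = holder, PENDING = waiter
       t.PROCESSLIST_ID,     -- same id that SHOW PROCESSLIST reports
       t.PROCESSLIST_USER,
       t.PROCESSLIST_TIME,   -- seconds in the current state
       t.PROCESSLIST_INFO    -- current statement; NULL when idle
FROM performance_schema.metadata_locks AS ml
JOIN performance_schema.threads AS t
  ON t.THREAD_ID = ml.OWNER_THREAD_ID
WHERE ml.OBJECT_TYPE = 'TABLE'
  AND ml.OBJECT_SCHEMA = 'appdb'
ORDER BY ml.OBJECT_NAME, ml.LOCK_STATUS;
</code></pre></div>
<p>A <code>GRANTED</code> row whose <code>PROCESSLIST_INFO</code> is <code>NULL</code> is a likely idle transaction that still holds its lock.</p>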
<h2 id="ddl-risk-review">DDL Risk Review</h2>
<p>The DDL risk review predicts MDL risk before the change ships.</p>
<table>
  <thead>
      <tr>
          <th>DDL type</th>
          <th>Risk</th>
          <th>Controls</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Add nullable column</td>
          <td>Can be low, depending on version / algorithm</td>
          <td>staging dry run, algorithm check</td>
      </tr>
      <tr>
          <td>Add index</td>
          <td>Can run for a long time and takes a lock at switch-over</td>
          <td>online DDL / OSC, off-peak window</td>
      </tr>
      <tr>
          <td>Change column type</td>
          <td>High risk of a table rebuild</td>
          <td>ghost table / phased migration</td>
      </tr>
      <tr>
          <td>Rename / swap table</td>
          <td>Brief but critical MDL</td>
          <td>kill blocker, short window</td>
      </tr>
      <tr>
          <td>Drop column / table</td>
          <td>Destructive and requires a lock</td>
          <td>backup, approval, blocked query watch</td>
      </tr>
  </tbody>
</table>
<p>A DDL review should list the algorithm, lock mode, estimated duration, rollback plan, kill blocker policy, and replication impact.</p>
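<p>The algorithm and lock mode from a review can be pinned in the statement itself, so MySQL returns an error instead of silently choosing a heavier path. A hedged sketch, assuming MySQL 8.0.12 or later (needed for <code>ALGORITHM = INSTANT</code>) and a hypothetical table <code>appdb.orders</code>; the column name, index name, and the 5-second timeout are illustrative values, not recommendations.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Fail fast instead of queueing: give up after 5 seconds of waiting
-- for a metadata lock (the server default is one year).
SET SESSION lock_wait_timeout = 5;

-- Request a specific algorithm so the statement errors out
-- if the server cannot perform the change that way.
ALTER TABLE appdb.orders
  ADD COLUMN note VARCHAR(255) NULL,
  ALGORITHM = INSTANT;

-- For an index build, request an in-place change with concurrent DML.
ALTER TABLE appdb.orders
  ADD INDEX idx_created_at (created_at),
  ALGORITHM = INPLACE, LOCK = NONE;
</code></pre></div>
<p>With a short <code>lock_wait_timeout</code>, a DDL that cannot get its lock fails and can be retried later, rather than sitting in the queue and blocking the queries behind it.</p>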
<h2 id="incident-runbook">Incident Runbook</h2>
<p>The incident runbook triages MDL incidents.</p>
<table>
  <thead>
      <tr>
          <th>Step</th>
          <th>Action</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Identify blocker</td>
          <td>Check long transactions / metadata_locks</td>
      </tr>
      <tr>
          <td>Stop new DDL</td>
          <td>Pause the migration pipeline</td>
      </tr>
      <tr>
          <td>Decide kill</td>
          <td>Based on owner / transaction age / impact</td>
      </tr>
      <tr>
          <td>Protect app</td>
          <td>Reduce traffic, disable heavy endpoints</td>
      </tr>
      <tr>
          <td>Validate</td>
          <td>Confirm queries have recovered; check replication lag</td>
      </tr>
      <tr>
          <td>Retrospective</td>
          <td>Add a DDL gate and a long transaction alert</td>
      </tr>
  </tbody>
</table>
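<p>Killing a session is a high-risk operation. The decision record should capture the transaction owner, how long the transaction has been running, the likely rollback cost, and the business impact.</p>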
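<p>For the identify and decide steps, the <code>sys.schema_table_lock_waits</code> view pairs each waiter with its blocker. The sketch below assumes the <code>sys</code> schema is installed and metadata lock instrumentation is enabled; the schema name <code>appdb</code> and the connection id <code>12345</code> are placeholders.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- One row per (waiter, blocker) pair on table metadata locks.
SELECT object_schema,
       object_name,
       waiting_pid,
       waiting_query,
       waiting_query_secs,
       blocking_pid,
       blocking_account,
       sql_kill_blocking_connection  -- ready-made KILL statement text
FROM sys.schema_table_lock_waits
WHERE object_schema = 'appdb';

-- Only after the decision is recorded: terminate the blocker.
-- 12345 is a placeholder for a blocking_pid returned above.
KILL 12345;
</code></pre></div>
<p>Rows may also name the waiting DDL itself as the blocker of later queries, so cross-check transaction age and owner before choosing which session to kill.</p>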
<h2 id="osc-interaction">OSC Interaction</h2>
<p>The OSC interaction section explains why gh-ost / pt-online-schema-change still need MDL management. Ghost table tools move most of the copy and backfill work off the critical path, but the final cutover / rename still needs a brief metadata lock.</p>
<table>
  <thead>
      <tr>
          <th>Tool phase</th>
          <th>MDL risk</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Create ghost table</td>
          <td>Low</td>
      </tr>
      <tr>
          <td>Copy / backfill</td>
          <td>Mainly load / replication lag</td>
      </tr>
      <tr>
          <td>Trigger / binlog</td>
          <td>Depends on the tool's mode</td>
      </tr>
      <tr>
          <td>Cutover / rename</td>
          <td>Critical MDL window</td>
      </tr>
  </tbody>
</table>
<p>An OSC runbook should check for long transactions before cutover. If a blocker exists, postpone the cutover rather than forcing it.</p>
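<p>One way to sketch that pre-check is a query against <code>information_schema.innodb_trx</code>. The 30-second threshold is an assumed value, and the view only lists InnoDB transactions, so treat an empty result as a good sign rather than a guarantee.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Transactions that have been open for more than 30 seconds.
SELECT trx_mysql_thread_id AS processlist_id,
       trx_state,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds,
       trx_query             -- NULL when the session is idle in transaction
FROM information_schema.innodb_trx
WHERE trx_started &lt; NOW() - INTERVAL 30 SECOND
ORDER BY trx_started;
</code></pre></div>
<p>If the query returns rows, delay the cutover and follow up with the owners of those sessions first.</p>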
<h2 id="prevention">Prevention</h2>
<p>Prevention is about stopping MDL incidents before release.</p>
<ol>
<li>Long transaction alerts.</li>
<li>DDL dry run, with the algorithm / lock mode recorded.</li>
<li>Migration window and kill blocker policy.</li>
<li>OSC cutover pre-check.</li>
<li>Application transaction timeout.</li>
<li>Test the schema change on a read-only replica first.</li>
</ol>
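<p>Transaction timeouts belong in the application, but server-side limits can act as a backstop. A minimal sketch with assumed values; both settings apply only to connections opened after the change, and neither limits the total length of a transaction.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-sql" data-lang="sql">-- Cap read-only SELECT statements at 5 seconds (value in milliseconds).
SET GLOBAL max_execution_time = 5000;

-- Close non-interactive connections that sit idle for 5 minutes.
-- An open transaction on a closed connection is rolled back,
-- which releases its metadata locks.
SET GLOBAL wait_timeout = 300;
</code></pre></div>
<p>Check the values against connection pool settings first, since a pool that keeps idle connections longer than <code>wait_timeout</code> will see them dropped.</p>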
<p>MDL is a core topic in MySQL schema governance. Every production DDL needs a metadata lock plan.</p>
<h2 id="下一步路由">Next Steps</h2>
<p>After the metadata lock deep dive: for schema change tools, read <a href="../online-schema-change-tools/">Online Schema Change Tools</a>; for lock behavior, read <a href="../lock-contention/">Lock Contention</a>; for hands-on practice, read <a href="../hands-on/online-schema-change-lab/">Online Schema Change Lab</a>.</p>
]]></content:encoded></item></channel></rss>