3.4 The GIL and the Threading Model

2026-01-20

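The GIL (Global Interpreter Lock) is one of the most controversial design decisions in CPython. This chapter examines the GIL's history and implementation, along with the technical details of free-threading in Python 3.13+.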

Prerequisites

Chapter Goals

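After completing this chapter, you will be able to:

Explain the historical reasons the GIL exists
Explain when the GIL is released
Describe the implementation challenges of free-threading
Choose an appropriate parallelism strategy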

[Principles] Why Is the GIL Needed?

Historical Background

The GIL is a design decision dating back to 1992, in Python's earliest years:

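Considerations at the time:
1. Single-core CPUs were the norm
2. Simplify memory management (reference counting)
3. Simplify C extension development
4. Avoid the complexity of fine-grained locking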

Reference Counting and Thread Safety

The GIL primarily protects reference-count updates:

// Without the GIL, this operation is not atomic
Py_INCREF(obj);
// Conceptually it expands to:
// 1. Read ob_refcnt
// 2. Add 1
// 3. Write ob_refcnt back

// If two threads run it at the same time, this can happen:
// Thread 1: reads refcnt = 1
// Thread 2: reads refcnt = 1
// Thread 1: writes refcnt = 2
// Thread 2: writes refcnt = 2  ← wrong! It should be 3
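
The lost update described above can be replayed deterministically in Python. This is only a sketch: `FakeObject` and its `ob_refcnt` attribute are hypothetical stand-ins for a C object header, and the two "threads" are simulated by interleaving the read and write steps by hand rather than by running real threads.

```python
class FakeObject:
    """Hypothetical stand-in for a C object header with a reference count."""

    def __init__(self):
        self.ob_refcnt = 1


def racy_incref_interleaving(obj):
    """Replay the unlucky schedule: both threads read before either writes."""
    t1_value = obj.ob_refcnt      # Thread 1 reads 1
    t2_value = obj.ob_refcnt      # Thread 2 reads 1
    obj.ob_refcnt = t1_value + 1  # Thread 1 writes 2
    obj.ob_refcnt = t2_value + 1  # Thread 2 writes 2, overwriting thread 1
    return obj.ob_refcnt


def serialized_incref(obj):
    """With a lock such as the GIL, each read-modify-write completes in turn."""
    for _ in range(2):
        obj.ob_refcnt = obj.ob_refcnt + 1
    return obj.ob_refcnt


lost = racy_incref_interleaving(FakeObject())
correct = serialized_incref(FakeObject())
print(lost, correct)  # 2 3
```

An undercounted reference count is dangerous because the object can be freed while something still points to it.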

GIL Implementation

// Simplified, illustrative GIL structure (the real one in CPython has more fields)
typedef struct {
    PyMutex mutex;           // mutex
    PyThread_type_lock lock; // thread lock
    int locked;              // lock state
} _gil_runtime_state;

[Design] When the GIL Is Released

Automatic Release

The GIL is released automatically in the following situations:

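# 1. I/O operations
f = open('file.txt', 'r')   # the GIL is released during the blocking system call
content = f.read()          # the GIL is released during the blocking read
f.close()

# 2. sleep
import time
time.sleep(1)  # releases the GIL

# 3. Some C extension operations
import numpy as np
a = np.ones((200, 200))
b = np.ones((200, 200))
result = np.dot(a, b)  # NumPy may release the GIL

# 4. Periodic release (time-based since Python 3.2, not every N bytecode instructions)
# By default, the running thread is asked to give up the GIL after about 5 ms
# if another thread is waiting for it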

Switch Interval

import sys

# Read the switch interval (in seconds)
print(sys.getswitchinterval())  # 0.005 (5 ms) by default

# Set the switch interval
sys.setswitchinterval(0.001)  # 1 ms
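
Because the switch interval is a process-wide setting, experiments that change it should restore the previous value afterwards. One possible sketch wraps the two `sys` calls in a context manager; the helper name `switch_interval` is an assumption made for this example, not part of the standard library.

```python
import sys
from contextlib import contextmanager


@contextmanager
def switch_interval(seconds):
    """Temporarily set the thread switch interval, then restore the old value."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield
    finally:
        sys.setswitchinterval(previous)


before = sys.getswitchinterval()
with switch_interval(0.001):
    inside = sys.getswitchinterval()   # the temporary value is active here
after = sys.getswitchinterval()        # the previous value is back
print(f"before={before}, inside={inside}, after={after}")
```

The `try`/`finally` ensures the old interval is restored even if the benchmarked code raises.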

Manually Releasing the GIL in a C Extension

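#include <Python.h>

// Placeholder for the real work; it must not touch any Python objects
static void do_heavy_computation(void) { }

// A C extension can release the GIL explicitly
static PyObject* compute_intensive(PyObject* self, PyObject* args) {
    // Release the GIL
    Py_BEGIN_ALLOW_THREADS

    // Code here can run in parallel with other Python threads,
    // as long as it makes no Python C API calls
    do_heavy_computation();

    // Reacquire the GIL
    Py_END_ALLOW_THREADS

    Py_RETURN_NONE;
}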

[Experiment] Measuring the Impact of the GIL

CPU-bound Tasks

import threading
import time

def cpu_intensive(n):
    """Pure-Python CPU-bound computation."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def benchmark(func, args, num_threads):
    """Run func(*args) in num_threads threads and return the elapsed time."""
    threads = []
    start = time.perf_counter()

    for _ in range(num_threads):
        t = threading.Thread(target=func, args=args)
        threads.append(t)
        t.start()

    for t in threads:
        t.join()

    return time.perf_counter() - start

n = 5_000_000

# Single thread
time_1 = benchmark(cpu_intensive, (n,), 1)
print(f"1 thread: {time_1:.3f}s")

# Multiple threads (limited by the GIL); note that this does 4x the total work
time_4 = benchmark(cpu_intensive, (n,), 4)
print(f"4 threads: {time_4:.3f}s")

# Result with the GIL: 4 threads take about 4x as long as 1 thread, or longer
# because of thread-switching overhead, so the threads give no parallel speedup
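
The benchmark above gives every thread the full workload, so the 4-thread run performs four times the total work. A complementary measurement keeps the total work constant and splits it across threads. The sketch below is one way to do that; the helper names `sum_squares` and `run_split` and the smaller `n` are assumptions chosen for illustration.

```python
import threading
import time


def sum_squares(start, stop, out, index):
    """Sum i*i over [start, stop) and store the result in out[index]."""
    total = 0
    for i in range(start, stop):
        total += i * i
    out[index] = total


def run_split(n, num_threads):
    """Split range(n) into num_threads chunks; return (total, elapsed seconds)."""
    bounds = [n * k // num_threads for k in range(num_threads + 1)]
    out = [0] * num_threads  # each thread writes only to its own slot
    threads = [
        threading.Thread(target=sum_squares, args=(bounds[k], bounds[k + 1], out, k))
        for k in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(out), time.perf_counter() - start


n = 200_000
total_1, time_1 = run_split(n, 1)
total_4, time_4 = run_split(n, 4)
print(f"1 thread:  {time_1:.3f}s")
print(f"4 threads: {time_4:.3f}s (same total work)")
```

On a build with the GIL, the two timings are expected to be similar. On a free-threaded build, the 4-thread run has a chance to finish sooner.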

I/O-bound Tasks

import threading
import time
import urllib.request

def io_intensive(url):
    """I/O-bound operation: fetch a URL and return the body length."""
    try:
        with urllib.request.urlopen(url, timeout=5) as f:
            return len(f.read())
    except OSError:  # URLError and socket timeouts are both OSError subclasses
        return 0

urls = ["https://example.com"] * 10

# Sequential
start = time.perf_counter()
for url in urls:
    io_intensive(url)
time_sequential = time.perf_counter() - start
print(f"Sequential: {time_sequential:.3f}s")

# Multithreaded
start = time.perf_counter()
threads = [threading.Thread(target=io_intensive, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
time_parallel = time.perf_counter() - start
print(f"Threaded: {time_parallel:.3f}s")

# Result: the threaded version is clearly faster (the GIL is released while waiting on I/O)
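
The same effect can be reproduced without network access by simulating the wait. In this sketch, `fake_io` is a hypothetical stand-in for a blocking call, and `time.sleep` plays the role of the I/O wait during which the GIL is released.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.05):
    """Simulated blocking I/O: time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    return task_id


task_ids = list(range(8))

# Sequential: the waits add up
start = time.perf_counter()
sequential_results = [fake_io(i) for i in task_ids]
sequential_time = time.perf_counter() - start

# Threaded: the waits overlap
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as executor:
    threaded_results = list(executor.map(fake_io, task_ids))
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.3f}s, threaded: {threaded_time:.3f}s")
```

`executor.map` returns results in input order, so both runs produce the same list.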

[Deep Dive] Free-threading Technical Details

Biased Reference Counting

Free-threaded builds of Python 3.13+ use biased reference counting to keep reference counts correct across threads:

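Traditional reference counting:
┌─────────────────────────────────────┐
│  ob_refcnt = 2                      │
│  Without the GIL, every update      │
│  needs an atomic op or a lock       │
└─────────────────────────────────────┘

Biased reference counting:
┌─────────────────────────────────────┐
│  ob_tid        = owner thread id    │  ← the thread that owns the object
│  ob_ref_local  = 1                  │  ← updated only by the owning thread
│  ob_ref_shared = 1                  │  ← updated atomically by other threads
└─────────────────────────────────────┘

Advantages:
- Most operations come from the owning thread and update only the local count (no lock, no atomic instruction)
- Only references taken from other threads need to update the shared count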
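
The fast-path and slow-path split can be modeled in a few lines of Python. This is a toy model under stated assumptions, not CPython's implementation: the class `BiasedRefCount` and its attribute names are invented for this example, and a `threading.Lock` stands in for the atomic instruction that real code would use.

```python
import threading


class BiasedRefCount:
    """Toy model: a fast owner-only counter plus a synchronized shared counter."""

    def __init__(self):
        self.owner_tid = threading.get_ident()  # thread that created the object
        self.ref_local = 1                      # touched only by the owner
        self.ref_shared = 0                     # touched by all other threads
        self._shared_lock = threading.Lock()    # stands in for an atomic add

    def incref(self):
        if threading.get_ident() == self.owner_tid:
            self.ref_local += 1                 # fast path: no synchronization
        else:
            with self._shared_lock:             # slow path: synchronized update
                self.ref_shared += 1

    def total(self):
        return self.ref_local + self.ref_shared


obj = BiasedRefCount()
obj.incref()                                    # owner thread: local count

other = threading.Thread(target=obj.incref)     # another thread: shared count
other.start()
other.join()

print(obj.ref_local, obj.ref_shared, obj.total())  # 2 1 3
```

The design bets that most objects are only ever touched by the thread that created them, so the unsynchronized fast path is the common case.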

Deferred Reference Counting

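# Optimization strategies in free-threading

# Immortal objects
# such as None, True, False, and small integers:
# reference counting is skipped entirely

# Objects used only by the thread that created them:
# use the local count, with no synchronization

# Only objects shared across threads
# need atomic operations

# Frequently shared objects such as top-level functions, code objects, and modules:
# use deferred reference counting, which skips most count updates and
# relies on the garbage collector to reclaim them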

Changes to the Memory Model

Traditional CPython (with GIL):
┌──────────────────────────────────────────┐
│  Thread 1  │  Thread 2  │  Thread 3      │
│     ↓            ↓            ↓          │
│     └────────────┼────────────┘          │
│                  ↓                       │
│               [ GIL ]                    │
│                  ↓                       │
│        [ Python Interpreter ]            │
│                  ↓                       │
│          [ Shared memory ]               │
└──────────────────────────────────────────┘

Free-threaded CPython:
┌──────────────────────────────────────────┐
│  Thread 1  │  Thread 2  │  Thread 3      │
│     ↓            ↓            ↓          │
│  [Local]     [Local]     [Local]         │
│  State       State       State           │
│     ↓            ↓            ↓          │
│     └────────────┼────────────┘          │
│                  ↓                       │
│   [ atomics / locks / lock-free data ]   │
│                  ↓                       │
│          [ Shared memory ]               │
└──────────────────────────────────────────┘

[In Practice] Programming with Free-threading

Checking the Runtime Environment

import sys

def check_environment():
    """Check whether this interpreter is running free-threaded."""
    try:
        gil_enabled = sys._is_gil_enabled()
        print(f"GIL enabled: {gil_enabled}")
        return not gil_enabled
    except AttributeError:
        # sys._is_gil_enabled() was added in Python 3.13
        print("Traditional Python (with GIL)")
        return False

is_free_threaded = check_environment()
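
Whether the interpreter was built with free-threading support and whether the GIL is actually disabled at runtime are two separate questions, because a free-threaded build can still run with the GIL re-enabled. The sketch below reports both; the helper name `describe_gil` is an assumption made for this example.

```python
import sys
import sysconfig


def describe_gil():
    """Return (build_supports_free_threading, gil_enabled_at_runtime)."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; it is 0 or None elsewhere.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on Python 3.13+; assume the GIL is on otherwise.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return free_threaded_build, gil_enabled


free_threaded_build, gil_enabled = describe_gil()
print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```

Checking the runtime state is usually the more useful of the two when deciding how to parallelize.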

Thread-safe Programming

import threading
from concurrent.futures import ThreadPoolExecutor

# Unsafe: shared mutable state
counter = 0

def unsafe_increment():
    global counter
    for _ in range(100000):
        counter += 1  # race condition!

# Safe: use a lock
counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(100000):
        with lock:
            counter += 1

# Better: avoid shared mutable state (give each thread its own data)

def process(item):
    """Placeholder for the real per-item work."""
    return item

# Placeholder input: each chunk is handled by one worker
data_chunks = [range(0, 1000), range(1000, 2000)]

def better_approach(data_chunk):
    """Each thread processes its own data; results are merged at the end."""
    local_count = 0
    for item in data_chunk:
        local_count += process(item)
    return local_count

with ThreadPoolExecutor() as executor:
    results = executor.map(better_approach, data_chunks)
    total = sum(results)
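
To see that the lock-based version produces an exact count, the threads have to actually be started and joined. A minimal sketch follows; the function name `run_counter` and the thread and iteration counts are arbitrary choices for this example.

```python
import threading


def run_counter(num_threads=4, iterations=10_000):
    """Increment one shared counter from several threads under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(iterations):
            with lock:            # serializes the read-modify-write
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


final_count = run_counter()
print(final_count)  # 40000
```

Removing the `with lock:` line turns this back into the unsafe version, where the final count is no longer guaranteed.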

Adaptive Code

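import sys

def compute(data):
    """Choose a strategy based on the runtime environment."""
    # sys._is_gil_enabled() exists on Python 3.13+; assume the GIL is on otherwise
    free_threaded = not getattr(sys, '_is_gil_enabled', lambda: True)()

    if free_threaded:
        # Threads can run CPU-bound code in parallel
        return parallel_compute_threading(data)     # placeholder, defined elsewhere
    else:
        # Use multiple processes or stay single-threaded
        return parallel_compute_multiprocess(data)  # placeholder, defined elsewhere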
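
Since `ThreadPoolExecutor` and `ProcessPoolExecutor` share the same interface, one way to implement this kind of switch is to select the executor class and leave the calling code unchanged. This is a sketch; the helper names `gil_is_enabled` and `pick_executor_class` are assumptions made for this example.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def gil_is_enabled():
    """True unless this interpreter reports that the GIL is disabled."""
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


def pick_executor_class():
    """Threads when the GIL is off, processes otherwise (for CPU-bound work)."""
    return ProcessPoolExecutor if gil_is_enabled() else ThreadPoolExecutor


executor_class = pick_executor_class()
print(executor_class.__name__)
```

When the process-based class is selected, the work function must be defined at module level so it can be pickled, and on platforms that spawn worker processes the entry point should be guarded with `if __name__ == "__main__":`.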

[Selection Guide] Parallelism Strategies

Decision Flow

What type of task do you have?
│
├── I/O-bound (network, files, databases)
│   └── Use threading or asyncio
│       (the GIL is not a bottleneck)
│
└── CPU-bound (computation, processing)
    │
    ├── Running free-threaded Python (3.13+)?
    │   ├── Yes → threading works
    │   └── No → choose one of the options below
    │
    ├── Can you use a C extension?
    │   └── NumPy, Cython, etc. (they can release the GIL)
    │
    └── Pure Python?
        └── Use multiprocessing or ProcessPoolExecutor
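
The decision flow can also be written as a small function, which makes the order of the checks explicit. This is only a sketch of the tree above; the function name `choose_strategy`, its parameters, and the returned strings are assumptions made for illustration.

```python
def choose_strategy(task_kind, free_threaded=False, has_native_extension=False):
    """Map the decision flow onto a function; task_kind is 'io' or 'cpu'."""
    if task_kind == "io":
        return "threading or asyncio"
    if task_kind != "cpu":
        raise ValueError(f"unknown task kind: {task_kind!r}")
    if free_threaded:
        return "threading"
    if has_native_extension:
        return "C extension that releases the GIL"
    return "multiprocessing or ProcessPoolExecutor"


print(choose_strategy("io"))
print(choose_strategy("cpu", free_threaded=True))
print(choose_strategy("cpu", has_native_extension=True))
print(choose_strategy("cpu"))
```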

Performance Comparison Summary

Criterion	threading (with GIL)	threading (free-threaded)	multiprocessing
I/O-bound	Good	Good	Too heavyweight
CPU-bound	No speedup	Good	Good
Memory sharing	Simple	Simple	Complex
Startup cost	Low	Low	High

[Future] The Evolution of the GIL

Roadmap

Python 3.13 (2024): experimental free-threading
Python 3.14 (2025): officially supported free-threading (still an optional build)
Python 3.15/3.16:   may become the default
Future:             the GIL may be removed entirely

Ecosystem Migration

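# Check whether a package supports free-threading
# pip index versions package-name
# (this lists published versions; free-threaded wheels carry a "t" ABI suffix such as cp313t)

# Support status of major libraries (late 2025)
# NumPy 2.1+:        supported
# pandas 2.2+:       supported
# scikit-learn 1.6+: supported
# PyTorch 2.6+:      supported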

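Questions for Reflection

Without the GIL, what would CPython have to change to guarantee memory safety?
Why do other Python implementations (such as Jython and IronPython) have no GIL?
Where does the performance cost of free-threading mainly come from, and how can it be minimized?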

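Hands-on Exercises

Write a program that measures how the GIL switch interval affects performance
Compare free-threaded and traditional Python on the same CPU-bound task
Rewrite a program that uses multiprocessing as a free-threading version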

Further Reading

Previous chapter: Bytecode and the Virtual Machine
Next module: Module 4: Extending Python with C
