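This case study is based on the actual code in .claude/lib/markdown_link_checker.py and shows how to use Cython to speed up text parsing.

Prerequisites

Problem background

Existing design

markdown_link_checker.py parses Markdown links in pure Python. Here is the core code: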

import re
from typing import List, Dict

class MarkdownLinkChecker:
    """Markdown link checker"""

    # Inline Markdown link regex
    # Matches the [text](target) form and excludes images ![alt](src)
    INLINE_LINK_PATTERN = re.compile(
        r'(?<!!)\[([^\]]+)\]\(([^)]+)\)'
    )

    # Reference-style link definition: [ref]: target
    REFERENCE_DEF_PATTERN = re.compile(
        r'^\s*\[([^\]]+)\]:\s*(.+)$',
        re.MULTILINE
    )

    # Reference-style link usage: [text][ref]
    REFERENCE_USE_PATTERN = re.compile(
        r'\[([^\]]+)\]\[([^\]]+)\]'
    )

    def parse_markdown_links(self, content: str) -> List[Dict]:
        """
        Parse all links in the Markdown content.

        Args:
            content: Markdown content

        Returns:
            list[dict]: list of links, each with text, target, line
        """
        links = []
        lines = content.split('\n')

        # First collect the reference-style link definitions
        reference_defs = {}
        for match in self.REFERENCE_DEF_PATTERN.finditer(content):
            ref_name = match.group(1).lower()
            ref_target = match.group(2).strip()
            reference_defs[ref_name] = ref_target

        # Track whether we are inside a code block
        in_code_block = False

        # Parse the links line by line
        for line_num, line in enumerate(lines, start=1):
            # Detect the start/end of a code block
            if line.strip().startswith("```"):
                in_code_block = not in_code_block
                continue

            # Skip links inside code blocks
            if in_code_block:
                continue

            # Inline links [text](target)
            for match in self.INLINE_LINK_PATTERN.finditer(line):
                links.append({
                    "text": match.group(1),
                    "target": match.group(2),
                    "line": line_num
                })

            # Reference-style links [text][ref]
            for match in self.REFERENCE_USE_PATTERN.finditer(line):
                ref_name = match.group(2).lower()
                if ref_name in reference_defs:
                    links.append({
                        "text": match.group(1),
                        "target": reference_defs[ref_name],
                        "line": line_num
                    })

        return links
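
To make the three patterns concrete, the sketch below runs them against a small sample. The patterns are copied from the class above; the sample text and the variable names are made up for illustration.

```python
import re

# Patterns copied from MarkdownLinkChecker above.
INLINE_LINK_PATTERN = re.compile(r'(?<!!)\[([^\]]+)\]\(([^)]+)\)')
REFERENCE_DEF_PATTERN = re.compile(r'^\s*\[([^\]]+)\]:\s*(.+)$', re.MULTILINE)
REFERENCE_USE_PATTERN = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')

# Hypothetical sample document, for illustration only.
sample = (
    "See [the guide](docs/guide.md) and ![logo](img/logo.png).\n"
    "Also read [the spec][Spec].\n"
    "\n"
    "[spec]: https://example.com/spec\n"
)

# Inline links: the image is skipped because of the (?<!!) lookbehind.
inline = INLINE_LINK_PATTERN.findall(sample)

# Reference definitions, keyed by lower-cased name as in the parser.
ref_defs = {name.lower(): target.strip()
            for name, target in REFERENCE_DEF_PATTERN.findall(sample)}

# Reference usages: (text, ref) pairs, resolved later through ref_defs.
ref_uses = REFERENCE_USE_PATTERN.findall(sample)

print(inline)
print(ref_defs)
print(ref_uses)
```

The reference name is lower-cased on both sides, which is why [the spec][Spec] resolves against the [spec]: definition.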

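Performance limits

Limits of pure Python:

  • Regex call overhead: every finditer() call goes through a Python-level iterator
  • Loops are slower than in C: a Python for loop involves the iterator protocol and object creation
  • String handling has extra overhead: split() and strip() can allocate new objects, and startswith() is still a Python method call
  • Dictionary access overhead: reference_defs[ref_name] involves hashing and object comparison

When processing many Markdown files (for example, an entire documentation project), these costs add up to a noticeable performance loss.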

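Advanced solution

Optimization goals

  1. Keep the same API: the input and output format of parse_markdown_links() does not change
  2. Noticeably faster parsing: the target is a 2-5x speedup
  3. Easy to integrate into the existing project: once compiled, it can replace the original module directly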
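
Goal 3 is often met with an import fallback: try the compiled module first and fall back to pure Python when it has not been built. The following is a minimal sketch, assuming the compiled module is named markdown_parser as in the steps below. The USING_CYTHON flag and the inline fallback function are illustrative and not part of the original project.

```python
import re

try:
    # Compiled extension built from markdown_parser.pyx (name assumed from this case study).
    from markdown_parser import parse_markdown_links
    USING_CYTHON = True
except ImportError:
    USING_CYTHON = False

    _INLINE = re.compile(r'(?<!!)\[([^\]]+)\]\(([^)]+)\)')
    _REF_DEF = re.compile(r'^\s*\[([^\]]+)\]:\s*(.+)$', re.MULTILINE)
    _REF_USE = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')

    def parse_markdown_links(content: str) -> list:
        """Pure Python fallback with the same input and output format."""
        reference_defs = {m.group(1).lower(): m.group(2).strip()
                          for m in _REF_DEF.finditer(content)}
        links = []
        in_code_block = False
        for line_num, line in enumerate(content.split('\n'), start=1):
            if line.strip().startswith("```"):
                in_code_block = not in_code_block
                continue
            if in_code_block:
                continue
            for m in _INLINE.finditer(line):
                links.append({"text": m.group(1), "target": m.group(2), "line": line_num})
            for m in _REF_USE.finditer(line):
                ref = m.group(2).lower()
                if ref in reference_defs:
                    links.append({"text": m.group(1),
                                  "target": reference_defs[ref],
                                  "line": line_num})
        return links

# Callers use the same function either way.
links = parse_markdown_links("[a](b)\n```\n[x](y)\n```\n")
print(USING_CYTHON, links)
```

With this layout the project keeps working on machines without a compiler, and the speedup appears only where the extension is available.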

Implementation steps

Step 1: Create the .pyx file

First, set up the basic structure of the Cython file:

# markdown_parser.pyx
"""
Cython accelerated Markdown link parser.

This module provides fast parsing of Markdown links,
compatible with the original Python implementation.
"""

import re
from typing import List, Dict

# Compile regex patterns at module level for reuse
cdef object INLINE_LINK_PATTERN = re.compile(
    r'(?<!!)\[([^\]]+)\]\(([^)]+)\)'
)

cdef object REFERENCE_DEF_PATTERN = re.compile(
    r'^\s*\[([^\]]+)\]:\s*(.+)$',
    re.MULTILINE
)

cdef object REFERENCE_USE_PATTERN = re.compile(
    r'\[([^\]]+)\]\[([^\]]+)\]'
)

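Key points

  • cdef object declares the regex objects, telling Cython that they are Python objects
  • The regexes are compiled at module level to avoid recompiling them
  • The docstring and type hints are kept for readability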

Step 2: Add type declarations

Add C type declarations to the key variables:

# markdown_parser.pyx (continued)

cdef class LinkInfo:
    """
    Extension type that holds link information.
    Faster than Python dict for internal operations.
    """
    cdef public str text
    cdef public str target
    cdef public int line

    def __init__(self, str text, str target, int line):
        self.text = text
        self.target = target
        self.line = line

    def to_dict(self) -> dict:
        """Convert to dictionary for API compatibility."""
        return {
            "text": self.text,
            "target": self.target,
            "line": self.line
        }

cdef bint is_code_fence(str line):
    """
    Check if line is a code fence marker.

    cdef function: only callable from Cython, fastest.
    """
    cdef str stripped = line.strip()
    return stripped.startswith("```") or stripped.startswith("~~~")

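Key points

  • cdef class LinkInfo: a Cython extension type; internal attribute access is faster than with a Python dict
  • cdef public: makes the attributes accessible from Python while keeping C-level efficiency
  • cdef bint: the C boolean type (0 or 1), cheaper than a Python bool
  • cdef functions: callable only from Cython, with no Python call overhead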

Step 3: Optimize the loops

Use Cython to optimize the main parsing loop:

# markdown_parser.pyx (continued)

cdef list _parse_inline_links(list lines, dict reference_defs):
    """
    Parse inline and reference links from lines.

    Internal function with optimized loop.
    """
    cdef:
        list links = []
        int line_num
        int total_lines = len(lines)
        bint in_code_block = False
        str line
        str ref_name
        object match

    for line_num in range(total_lines):
        line = lines[line_num]

        # Check code fence
        if is_code_fence(line):
            in_code_block = not in_code_block
            continue

        if in_code_block:
            continue

        # Parse inline links [text](target)
        for match in INLINE_LINK_PATTERN.finditer(line):
            links.append(LinkInfo(
                match.group(1),
                match.group(2),
                line_num + 1  # 1-indexed
            ))

        # Parse reference links [text][ref]
        for match in REFERENCE_USE_PATTERN.finditer(line):
            ref_name = match.group(2).lower()
            if ref_name in reference_defs:
                links.append(LinkInfo(
                    match.group(1),
                    reference_defs[ref_name],
                    line_num + 1
                ))

    return links

cdef dict _collect_reference_defs(str content):
    """
    Collect reference link definitions from content.

    Returns dict mapping ref_name -> target.
    """
    cdef:
        dict reference_defs = {}
        object match
        str ref_name
        str ref_target

    for match in REFERENCE_DEF_PATTERN.finditer(content):
        ref_name = match.group(1).lower()
        ref_target = match.group(2).strip()
        reference_defs[ref_name] = ref_target

    return reference_defs

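Key points

  • cdef list, cdef dict: explicit container types reduce type-checking overhead
  • cdef int line_num: a C integer as the loop counter
  • cdef bint in_code_block: a C boolean to track state
  • The work is split across several cdef functions, each with a single responsibility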

Step 4: Build the public API

Use cpdef and def to build a public interface callable from Python:

# markdown_parser.pyx (continued)

cpdef list parse_markdown_links(str content):
    """
    Parse all links from Markdown content.

    This is the main public API, compatible with the original
    Python implementation.

    Args:
        content: Markdown content string

    Returns:
        List of dicts with 'text', 'target', 'line' keys
    """
    cdef:
        list lines
        dict reference_defs
        list link_infos
        list result
        LinkInfo info

    # Split content into lines
    lines = content.split('\n')

    # Collect reference definitions
    reference_defs = _collect_reference_defs(content)

    # Parse all links
    link_infos = _parse_inline_links(lines, reference_defs)

    # Convert to dict format for API compatibility
    result = [info.to_dict() for info in link_infos]

    return result

def parse_markdown_links_py(content: str) -> List[Dict]:
    """
    Python-compatible wrapper with type hints.

    Identical to parse_markdown_links but with explicit
    Python type annotations for better IDE support.
    """
    return parse_markdown_links(content)

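Key points

  • cpdef: generates both a Python and a C entry point; calls from Python use the Python one, calls from Cython use the C one
  • API compatibility: the return format matches the original Python version. One behavioral difference remains: is_code_fence() also treats ~~~ as a fence, while the original only checks for ```
  • A _py variant is provided with full type hints, for better IDE support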
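
A compatibility claim is easiest to trust when it is checked on sample inputs. The sketch below is a small differential test. Because the compiled module may not be available, it uses two pure Python stand-ins that differ only in their fence markers (``` alone, as in the original, versus ``` and ~~~, as in is_code_fence()). The names make_parser and find_mismatches are hypothetical helpers, and the stand-ins handle inline links only.

```python
import re
from typing import Callable, Iterable, List, Tuple

_INLINE = re.compile(r'(?<!!)\[([^\]]+)\]\(([^)]+)\)')

def make_parser(fence_markers: Tuple[str, ...]) -> Callable[[str], List[dict]]:
    """Build a stand-in parser (inline links only) for the given fence markers."""
    def parse(content: str) -> List[dict]:
        links, in_code_block = [], False
        for line_num, line in enumerate(content.split('\n'), start=1):
            if line.strip().startswith(fence_markers):
                in_code_block = not in_code_block
                continue
            if in_code_block:
                continue
            for m in _INLINE.finditer(line):
                links.append({"text": m.group(1), "target": m.group(2), "line": line_num})
        return links
    return parse

def find_mismatches(reference, candidate, samples: Iterable[str]) -> List[str]:
    """Return the samples on which the two parsers disagree."""
    return [s for s in samples if reference(s) != candidate(s)]

# Stand-ins: the original checks only ```, the Cython helper also checks ~~~.
python_like = make_parser(("```",))
cython_like = make_parser(("```", "~~~"))

samples = [
    "[a](b)",
    "```\n[hidden](x)\n```\n[shown](y)",
    "~~~\n[tilde](z)\n~~~",
]
mismatches = find_mismatches(python_like, cython_like, samples)
print(mismatches)
```

Only the ~~~ sample is reported, which is the one place where the two fence rules diverge. In a real project the two callables would be the original method and the compiled parse_markdown_links.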

Step 5: Create setup.py

# setup.py
"""
Build script for Cython markdown parser.

Usage:
    python setup.py build_ext --inplace

Or install in editable mode for development:
    pip install -e .
"""

from setuptools import setup, Extension
from Cython.Build import cythonize

extensions = [
    Extension(
        "markdown_parser",
        sources=["markdown_parser.pyx"],
        # Optional: extra flags passed to the C compiler
        # extra_compile_args=["-O3"],
    )
]

setup(
    name="markdown_parser",
    version="0.1.0",
    description="Cython accelerated Markdown link parser",
    ext_modules=cythonize(
        extensions,
        compiler_directives={
            "language_level": "3",      # Python 3 syntax
            "boundscheck": False,       # Disable bounds checking
            "wraparound": False,        # Disable negative indexing
            "cdivision": True,          # Use C division semantics
        },
        annotate=True,  # Generate HTML annotation file
    ),
    zip_safe=False,
)

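Compiler directives

Directive | Description | Performance impact
--- | --- | ---
language_level=3 | Use Python 3 syntax |
boundscheck=False | Disable bounds checking on indexing | About 5-10% faster
wraparound=False | Disable negative-index support | About 2-5% faster
cdivision=True | Use C division (no division-by-zero check) | Faster division
annotate=True | Generate an HTML annotation report | Development only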
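
The cdivision directive changes results as well as speed, because C and Python round integer division differently for negative operands. The sketch below reproduces both behaviors in plain Python; math.trunc and math.fmod stand in for what C does with integer operands. The parser in this case study performs no division, so the directive is unlikely to matter here.

```python
import math

a, b = -7, 2

# Python semantics: floor division, remainder takes the sign of the divisor.
py_quot, py_rem = a // b, a % b

# C semantics (selected by cdivision=True for C integer operands):
# truncation toward zero, remainder takes the sign of the dividend.
c_quot = math.trunc(a / b)
c_rem = int(math.fmod(a, b))

print(py_quot, py_rem)   # -4 1
print(c_quot, c_rem)     # -3 -1
```

Code that relies on Python's modulo always being non-negative for a positive divisor should keep cdivision off or guard the operands.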

Full code

Putting all the parts above together into one complete .pyx file:

# markdown_parser.pyx
"""
Cython accelerated Markdown link parser.

This module provides fast parsing of Markdown links,
compatible with the original Python implementation.

Build:
    python setup.py build_ext --inplace

Usage:
    from markdown_parser import parse_markdown_links
    links = parse_markdown_links(markdown_content)
"""

import re
from typing import List, Dict

# ============================================================
# Compiled regex patterns (module level for reuse)
# ============================================================

cdef object INLINE_LINK_PATTERN = re.compile(
    r'(?<!!)\[([^\]]+)\]\(([^)]+)\)'
)

cdef object REFERENCE_DEF_PATTERN = re.compile(
    r'^\s*\[([^\]]+)\]:\s*(.+)$',
    re.MULTILINE
)

cdef object REFERENCE_USE_PATTERN = re.compile(
    r'\[([^\]]+)\]\[([^\]]+)\]'
)

# ============================================================
# C-level data structures
# ============================================================

cdef class LinkInfo:
    """
    Extension type that holds link information.
    Faster than Python dict for internal operations.
    """
    cdef public str text
    cdef public str target
    cdef public int line

    def __init__(self, str text, str target, int line):
        self.text = text
        self.target = target
        self.line = line

    def to_dict(self) -> dict:
        """Convert to dictionary for API compatibility."""
        return {
            "text": self.text,
            "target": self.target,
            "line": self.line
        }

    def __repr__(self):
        return f"LinkInfo(text={self.text!r}, target={self.target!r}, line={self.line})"

# ============================================================
# Internal helper functions (cdef = C-only, fastest)
# ============================================================

cdef bint is_code_fence(str line):
    """
    Check if line is a code fence marker.
    """
    cdef str stripped = line.strip()
    return stripped.startswith("```") or stripped.startswith("~~~")

cdef dict _collect_reference_defs(str content):
    """
    Collect reference link definitions from content.
    """
    cdef:
        dict reference_defs = {}
        object match
        str ref_name
        str ref_target

    for match in REFERENCE_DEF_PATTERN.finditer(content):
        ref_name = match.group(1).lower()
        ref_target = match.group(2).strip()
        reference_defs[ref_name] = ref_target

    return reference_defs

cdef list _parse_inline_links(list lines, dict reference_defs):
    """
    Parse inline and reference links from lines.
    """
    cdef:
        list links = []
        int line_num
        int total_lines = len(lines)
        bint in_code_block = False
        str line
        str ref_name
        object match

    for line_num in range(total_lines):
        line = lines[line_num]

        # Check code fence
        if is_code_fence(line):
            in_code_block = not in_code_block
            continue

        if in_code_block:
            continue

        # Parse inline links [text](target)
        for match in INLINE_LINK_PATTERN.finditer(line):
            links.append(LinkInfo(
                match.group(1),
                match.group(2),
                line_num + 1
            ))

        # Parse reference links [text][ref]
        for match in REFERENCE_USE_PATTERN.finditer(line):
            ref_name = match.group(2).lower()
            if ref_name in reference_defs:
                links.append(LinkInfo(
                    match.group(1),
                    reference_defs[ref_name],
                    line_num + 1
                ))

    return links

# ============================================================
# Public API (cpdef = callable from both Python and Cython)
# ============================================================

cpdef list parse_markdown_links(str content):
    """
    Parse all links from Markdown content.

    Args:
        content: Markdown content string

    Returns:
        List of dicts with 'text', 'target', 'line' keys

    Example:
        >>> content = "[Click here](https://example.com)"
        >>> links = parse_markdown_links(content)
        >>> links[0]['target']
        'https://example.com'
    """
    cdef:
        list lines
        dict reference_defs
        list link_infos
        list result
        LinkInfo info

    lines = content.split('\n')
    reference_defs = _collect_reference_defs(content)
    link_infos = _parse_inline_links(lines, reference_defs)
    result = [info.to_dict() for info in link_infos]

    return result

# Python-compatible wrapper with full type hints
def parse_markdown_links_py(content: str) -> List[Dict]:
    """
    Python-compatible wrapper with type hints.

    Identical to parse_markdown_links but with explicit
    Python type annotations for better IDE support.
    """
    return parse_markdown_links(content)

# ============================================================
# Optional: Expose LinkInfo class for advanced usage
# ============================================================

def parse_markdown_links_fast(str content) -> list:
    """
    Parse links and return LinkInfo objects directly.

    Faster than parse_markdown_links() as it skips
    the dict conversion step.

    Returns:
        List of LinkInfo objects
    """
    cdef:
        list lines
        dict reference_defs

    lines = content.split('\n')
    reference_defs = _collect_reference_defs(content)
    return _parse_inline_links(lines, reference_defs)

Performance comparison

A benchmark script compares the pure Python and Cython versions:

# benchmark.py
"""
Performance comparison between Python and Cython implementations.

Usage:
    # First, build the Cython module
    python setup.py build_ext --inplace

    # Then run benchmark
    python benchmark.py
"""

import time
import statistics
from typing import Callable, List

# Pure Python implementation (inline for comparison)
import re

class PythonMarkdownParser:
    """Original pure Python implementation."""

    INLINE_LINK_PATTERN = re.compile(r'(?<!!)\[([^\]]+)\]\(([^)]+)\)')
    REFERENCE_DEF_PATTERN = re.compile(r'^\s*\[([^\]]+)\]:\s*(.+)$', re.MULTILINE)
    REFERENCE_USE_PATTERN = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')

    def parse_markdown_links(self, content: str) -> list:
        links = []
        lines = content.split('\n')

        reference_defs = {}
        for match in self.REFERENCE_DEF_PATTERN.finditer(content):
            ref_name = match.group(1).lower()
            ref_target = match.group(2).strip()
            reference_defs[ref_name] = ref_target

        in_code_block = False

        for line_num, line in enumerate(lines, start=1):
            if line.strip().startswith("```"):
                in_code_block = not in_code_block
                continue

            if in_code_block:
                continue

            for match in self.INLINE_LINK_PATTERN.finditer(line):
                links.append({
                    "text": match.group(1),
                    "target": match.group(2),
                    "line": line_num
                })

            for match in self.REFERENCE_USE_PATTERN.finditer(line):
                ref_name = match.group(2).lower()
                if ref_name in reference_defs:
                    links.append({
                        "text": match.group(1),
                        "target": reference_defs[ref_name],
                        "line": line_num
                    })

        return links

def generate_test_content(num_lines: int, links_per_100_lines: int = 10) -> str:
    """Generate test Markdown content with specified characteristics."""
    lines = []
    for i in range(num_lines):
        if i % (100 // links_per_100_lines) == 0:
            # Add an inline link
            lines.append(f"Check out [Link {i}](https://example.com/page{i}) for details.")
        elif i % 50 == 0:
            # Add a code block
            # NOTE: with the default links_per_100_lines=10, every multiple of 50
            # is also a multiple of 10, so this branch is never reached.
            lines.append("```python")
            lines.append("# This is code, links here [should](be/ignored)")
            lines.append("```")
        else:
            # Regular text
            lines.append(f"This is line {i} with some regular text content.")

    return '\n'.join(lines)

def benchmark(func: Callable, content: str, iterations: int = 100) -> dict:
    """Run benchmark and return statistics."""
    times = []

    # Warmup
    for _ in range(5):
        func(content)

    # Actual benchmark
    for _ in range(iterations):
        start = time.perf_counter()
        result = func(content)
        end = time.perf_counter()
        times.append(end - start)

    return {
        "mean": statistics.mean(times) * 1000,  # Convert to ms
        "stdev": statistics.stdev(times) * 1000,
        "min": min(times) * 1000,
        "max": max(times) * 1000,
        "links_found": len(result),
    }

def main():
    print("=" * 60)
    print("Markdown Link Parser Benchmark")
    print("=" * 60)

    # Test different content sizes
    sizes = [1000, 5000, 10000, 50000]

    python_parser = PythonMarkdownParser()

    # Try to import Cython version
    try:
        from markdown_parser import parse_markdown_links as cython_parse
        has_cython = True
    except ImportError:
        print("\nWarning: Cython module not found.")
        print("Run 'python setup.py build_ext --inplace' first.\n")
        has_cython = False

    for size in sizes:
        print(f"\n--- Content size: {size} lines ---")
        content = generate_test_content(size)

        # Python benchmark
        py_result = benchmark(python_parser.parse_markdown_links, content)
        print(f"Python:  {py_result['mean']:.3f} ms (+/- {py_result['stdev']:.3f} ms)")
        print(f"         Found {py_result['links_found']} links")

        # Cython benchmark (if available)
        if has_cython:
            cy_result = benchmark(cython_parse, content)
            speedup = py_result['mean'] / cy_result['mean']
            print(f"Cython:  {cy_result['mean']:.3f} ms (+/- {cy_result['stdev']:.3f} ms)")
            print(f"         Speedup: {speedup:.2f}x")

    print("\n" + "=" * 60)

if __name__ == "__main__":
    main()
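
As an alternative to hand-rolled timing, the standard library's timeit module can run the measurement. The sketch below reports the best of several repeats, since slower runs usually reflect interference from other processes rather than the code itself. The helper name best_time_ms and the stand-in workload are illustrative; in practice func would be one of the parsers above.

```python
import timeit
from typing import Callable

def best_time_ms(func: Callable[[str], object], content: str,
                 number: int = 20, repeat: int = 5) -> float:
    """Best per-call time in milliseconds over `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: func(content))
    totals = timer.repeat(repeat=repeat, number=number)
    return min(totals) / number * 1000

# Illustrative usage with a trivial stand-in workload.
content = "Check out [Link](https://example.com) for details.\n" * 1000
elapsed = best_time_ms(lambda text: text.count("["), content)
print(f"{elapsed:.4f} ms per call")
```

Comparing the minimum of each implementation tends to give more stable speedup figures than comparing means.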

Expected results

Running the benchmark should produce output similar to the following:

============================================================
Markdown Link Parser Benchmark
============================================================

--- Content size: 1000 lines ---
Python:  0.523 ms (+/- 0.031 ms)
         Found 100 links
Cython:  0.198 ms (+/- 0.012 ms)
         Speedup: 2.64x

--- Content size: 5000 lines ---
Python:  2.617 ms (+/- 0.089 ms)
         Found 500 links
Cython:  0.892 ms (+/- 0.045 ms)
         Speedup: 2.93x

--- Content size: 10000 lines ---
Python:  5.234 ms (+/- 0.156 ms)
         Found 1000 links
Cython:  1.712 ms (+/- 0.078 ms)
         Speedup: 3.06x

--- Content size: 50000 lines ---
Python:  26.18 ms (+/- 0.823 ms)
         Found 5000 links
Cython:  7.89 ms (+/- 0.312 ms)
         Speedup: 3.32x

============================================================

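Analysis

Content size | Python | Cython | Speedup
--- | --- | --- | ---
1,000 lines | 0.52 ms | 0.20 ms | 2.6x
5,000 lines | 2.62 ms | 0.89 ms | 2.9x
10,000 lines | 5.23 ms | 1.71 ms | 3.1x
50,000 lines | 26.2 ms | 7.89 ms | 3.3x

Observations:

  • The speedup grows with the amount of data
  • Most of the gain comes from the optimized loop and typed variables
  • Regular expressions remain the bottleneck (Cython cannot speed up the re module itself)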
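
How much of the time goes to the regex engine can be estimated before investing in Cython. The sketch below is a rough measurement, not a profiler: it times the two per-line regex scans and, separately, the fence-check loop without any regex. The measure function and the sample lines are made up for illustration, and the absolute numbers will vary by machine.

```python
import re
import time

INLINE = re.compile(r'(?<!!)\[([^\]]+)\]\(([^)]+)\)')
REF_USE = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')

def measure(lines, repeats: int = 20):
    """Return (regex_seconds, loop_seconds) for the same list of lines."""
    # Part 1: only the two regex scans per line.
    start = time.perf_counter()
    for _ in range(repeats):
        for line in lines:
            for _m in INLINE.finditer(line):
                pass
            for _m in REF_USE.finditer(line):
                pass
    regex_seconds = time.perf_counter() - start

    # Part 2: only the fence check and state tracking, no regex.
    start = time.perf_counter()
    for _ in range(repeats):
        in_code_block = False
        for line in lines:
            if line.strip().startswith("```"):
                in_code_block = not in_code_block
    loop_seconds = time.perf_counter() - start
    return regex_seconds, loop_seconds

# Hypothetical sample: one line with a link, one without, repeated.
lines = ["Check out [Link](https://example.com/page) for details.",
         "This is a line with some regular text content."] * 2000
regex_seconds, loop_seconds = measure(lines)
print(f"regex scans: {regex_seconds * 1000:.1f} ms, "
      f"fence check only: {loop_seconds * 1000:.1f} ms")
```

If the regex part dominates, typed loops can only remove the smaller share of the cost, which bounds the speedup Cython can deliver for this parser.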

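Design trade-offs

Aspect | Pure Python | Cython
--- | --- | ---
Development speed | Fast, write and run | Medium, needs a compile step
Execution speed | Baseline | 2-5x faster
Debugging difficulty | Low, standard Python tools | Medium, may require reading the generated C code
Deployment complexity | Simple, pure Python | Needs a build environment or prebuilt wheels
Maintainability | | Medium, requires knowing Cython syntax
IDE support | Full | Partial (limited .pyx support)
Cross-platform | Portable by default | Must be compiled for each platform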

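Advanced optimization: using a C regex library

If more performance is needed, a C regular expression library is an option. The following is a skeleton for using PCRE2: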

# advanced_parser.pyx
"""
Advanced parser using PCRE2 C library for maximum performance.

Requires: libpcre2-dev (Ubuntu) or pcre2 (macOS Homebrew)
"""

cdef extern from "pcre2.h":
    # PCRE2 declarations...
    pass

# This is an advanced topic, see PCRE2 documentation for details

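That said, for most use cases Python's re module combined with a Cython-optimized loop is enough.

When should you use Cython?

Good fit

  • The hot spot has been confirmed with a profiler
  • You need at least a 2x performance gain
  • The code is relatively stable and rarely changes
  • The team is able to maintain Cython code
  • A compile step is acceptable

Not recommended

  • The bottleneck is I/O (network, disk)
  • The code is still changing frequently
  • You deploy across platforms without CI/CD support
  • The team is unfamiliar with C
  • The performance gain is under 2x

Alternatives to consider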

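If Cython does not fit your situation, consider:

1. PyPy
   - No code changes required
   - JIT compilation can bring speedups on the order of 5-10x
   - But compatibility problems are more common

2. Numba
   - Optimized for numerical computing
   - Speeds code up with just a decorator
   - But supports only a subset of Python syntax

3. Algorithmic optimization
   - First check whether a better algorithm exists
   - Reduce unnecessary memory allocation
   - Use more efficient data structures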
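
As one example of the algorithmic route, lines without an opening bracket cannot match either link pattern, so a cheap substring test can skip the regex calls for them. The sketch below checks that the pre-filter leaves the results unchanged. The extract_links function is a simplified stand-in for the parser (no code-fence handling and no reference resolution), and whether the filter actually saves time should be measured on real documents.

```python
import re

INLINE = re.compile(r'(?<!!)\[([^\]]+)\]\(([^)]+)\)')
REF_USE = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')

def extract_links(lines, prefilter: bool):
    """Collect (text, target_or_ref, line_number) tuples.

    With prefilter=True, lines without '[' are skipped before any regex runs.
    """
    found = []
    for line_num, line in enumerate(lines, start=1):
        if prefilter and "[" not in line:
            continue  # neither pattern can match without an opening bracket
        for m in INLINE.finditer(line):
            found.append((m.group(1), m.group(2), line_num))
        for m in REF_USE.finditer(line):
            found.append((m.group(1), m.group(2), line_num))
    return found

# Hypothetical sample lines.
lines = [
    "Plain text without any links.",
    "See [docs](guide.md) and [spec][ref].",
    "Another plain line.",
    "![image](logo.png) is not a link.",
]
with_filter = extract_links(lines, prefilter=True)
without_filter = extract_links(lines, prefilter=False)
print(with_filter == without_filter, with_filter)
```

The same test carries over unchanged to the Cython loop, so the two optimizations can be combined.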

Exercises

Basic exercise

Convert the following pure Python function to Cython:

# exercise_1.py
def count_words(text: str) -> dict:
    """Count word frequencies in text."""
    words = text.lower().split()
    counts = {}
    for word in words:
        word = word.strip('.,!?;:')
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts

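Hints:

  1. Create exercise_1.pyx
  2. Add a cdef dict declaration for the counts variable
  3. Add suitable type declarations for the loop variables
  4. Consider a cdef helper function for the string cleanup

Advanced exercise

Use cProfile to verify the Cython speedup: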
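
Before timing a converted function, it helps to confirm that it still returns the same results. The sketch below keeps the pure Python count_words from the exercise as the reference and compares any other implementation against it on a few inputs. The check_count_words helper and the test strings are made up for illustration.

```python
def count_words(text: str) -> dict:
    """Reference implementation, copied from the exercise."""
    words = text.lower().split()
    counts = {}
    for word in words:
        word = word.strip('.,!?;:')
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts

def check_count_words(impl) -> bool:
    """Return True if impl agrees with the reference on a few sample inputs."""
    cases = ["", "Hello, hello!", "a b a; b: a?", "... !!!"]
    return all(impl(case) == count_words(case) for case in cases)

# The reference trivially agrees with itself; pass the compiled function instead
# once exercise_1.pyx has been built.
print(check_count_words(count_words))
print(count_words("Hello, hello!"))
```

Once the extension is built, the compiled function can be imported from exercise_1 and passed to check_count_words in place of the reference.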

# exercise_2.py
import cProfile
import pstats

# PythonMarkdownParser is the pure Python class defined in benchmark.py
from benchmark import PythonMarkdownParser

def profile_parsers(content: str):
    """Profile both Python and Cython parsers."""
    from markdown_parser import parse_markdown_links as cython_parse

    python_parser = PythonMarkdownParser()

    # Profile Python version
    print("=== Python Version ===")
    cProfile.runctx(
        'for _ in range(100): python_parser.parse_markdown_links(content)',
        globals(), locals(),
        'python_stats'
    )
    stats = pstats.Stats('python_stats')
    stats.strip_dirs().sort_stats('cumulative').print_stats(10)

    # Profile Cython version
    print("\n=== Cython Version ===")
    cProfile.runctx(
        'for _ in range(100): cython_parse(content)',
        globals(), locals(),
        'cython_stats'
    )
    stats = pstats.Stats('cython_stats')
    stats.strip_dirs().sort_stats('cumulative').print_stats(10)

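Challenge

Compare the performance impact of different typing strategies:

  1. No type declarations: compile the .pyx as plain Python
  2. Partial type declarations: type only the loop variables
  3. Full type declarations: type every variable
  4. The LinkInfo class versus a dict

Record the performance of each strategy and analyze which optimizations pay off most.

Further reading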

