Case Study: Accelerating Markdown Parsing with Cython
This case study is based on the actual code in .claude/lib/markdown_link_checker.py and shows how to use Cython to speed up text parsing.
Problem Background
Existing Design
markdown_link_checker.py parses Markdown links in pure Python. The core code looks like this:

    import re
    from typing import List, Dict


    class MarkdownLinkChecker:
        """Markdown link checker."""

        # Inline link regex
        # Matches the [text](target) form, excluding images
        INLINE_LINK_PATTERN = re.compile(
            r'(?<!!)\[([^\]]+)\]\(([^)]+)\)'
        )

        # Reference-style link definition: [ref]: target
        REFERENCE_DEF_PATTERN = re.compile(
            r'^\s*\[([^\]]+)\]:\s*(.+)$',
            re.MULTILINE
        )

        # Reference-style link usage: [text][ref]
        REFERENCE_USE_PATTERN = re.compile(
            r'\[([^\]]+)\]\[([^\]]+)\]'
        )

        def parse_markdown_links(self, content: str) -> List[Dict]:
            """
            Parse all links in the Markdown content.

            Args:
                content: Markdown content

            Returns:
                list[dict]: list of links, each with text, target, line
            """
            links = []
            lines = content.split('\n')

            # First collect the reference-style link definitions
            reference_defs = {}
            for match in self.REFERENCE_DEF_PATTERN.finditer(content):
                ref_name = match.group(1).lower()
                ref_target = match.group(2).strip()
                reference_defs[ref_name] = ref_target

            # Track whether we are inside a code block
            in_code_block = False

            # Parse links line by line
            for line_num, line in enumerate(lines, start=1):
                # Detect the start/end of a code block
                if line.strip().startswith("```"):
                    in_code_block = not in_code_block
                    continue

                # Skip links inside code blocks
                if in_code_block:
                    continue

                # Inline links: [text](target)
                for match in self.INLINE_LINK_PATTERN.finditer(line):
                    links.append({
                        "text": match.group(1),
                        "target": match.group(2),
                        "line": line_num
                    })

                # Reference-style links: [text][ref]
                for match in self.REFERENCE_USE_PATTERN.finditer(line):
                    ref_name = match.group(2).lower()
                    if ref_name in reference_defs:
                        links.append({
                            "text": match.group(1),
                            "target": reference_defs[ref_name],
                            "line": line_num
                        })

            return links

Performance Limitations
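Limitations of pure Python:
- Regex call overhead: every finditer() call goes through a Python-level iterator.
- Loops are slower than in C: a Python for loop involves the iterator protocol and object creation.
- String handling has extra overhead: split() and strip() can allocate new string objects, and every method call, including startswith(), goes through Python-level dispatch.
- Dictionary access overhead: reference_defs[ref_name] involves hashing and object comparison.
When processing a large number of Markdown files (for example, an entire documentation project), these costs add up to a noticeable performance loss.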
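Advanced Solution
Optimization Goals
- Keep the same API: do not change the input or output format of parse_markdown_links()
- Make parsing significantly faster: the target is a 2-5x speedup
- Integrate easily into the existing project: once compiled, the module can replace the original directly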
Implementation Steps
Step 1: Create the .pyx file
First, set up the basic structure of the Cython file:

    # markdown_parser.pyx
    """
    Cython accelerated Markdown link parser.

    This module provides fast parsing of Markdown links,
    compatible with the original Python implementation.
    """

    import re
    from typing import List, Dict

    # Compile regex patterns at module level for reuse
    cdef object INLINE_LINK_PATTERN = re.compile(
        r'(?<!!)\[([^\]]+)\]\(([^)]+)\)'
    )

    cdef object REFERENCE_DEF_PATTERN = re.compile(
        r'^\s*\[([^\]]+)\]:\s*(.+)$',
        re.MULTILINE
    )

    cdef object REFERENCE_USE_PATTERN = re.compile(
        r'\[([^\]]+)\]\[([^\]]+)\]'
    )

Key points:
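- Declare the regex objects with cdef object so that Cython knows they are Python objects
- Compile the regular expressions at module level to avoid recompiling them
- Keep the docstrings and type hints for readability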
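
Before moving the patterns into a .pyx file, it can help to confirm what each one matches. The sketch below runs the three patterns from this step in plain Python; the sample text is invented for illustration.

```python
import re

INLINE_LINK_PATTERN = re.compile(r'(?<!!)\[([^\]]+)\]\(([^)]+)\)')
REFERENCE_DEF_PATTERN = re.compile(r'^\s*\[([^\]]+)\]:\s*(.+)$', re.MULTILINE)
REFERENCE_USE_PATTERN = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')

# Invented sample: one inline link, one image, one reference-style link.
sample = (
    "Read the [guide](docs/guide.md) and see ![logo](img/logo.png).\n"
    "Also check [the spec][spec].\n"
    "\n"
    "[spec]: https://example.com/spec\n"
)

# The (?<!!) lookbehind rejects the image because its "[" follows "!".
inline = INLINE_LINK_PATTERN.findall(sample)
ref_defs = REFERENCE_DEF_PATTERN.findall(sample)
ref_uses = REFERENCE_USE_PATTERN.findall(sample)

print(inline)    # [('guide', 'docs/guide.md')]
print(ref_defs)  # [('spec', 'https://example.com/spec')]
print(ref_uses)  # [('the spec', 'spec')]
```

Because the compiled module uses the same re objects, these results carry over unchanged to the Cython version.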
Step 2: Add type declarations
Add C type declarations to the key variables:

    # markdown_parser.pyx (continued)

    cdef class LinkInfo:
        """
        Extension type (C-level layout) to hold link information.
        Faster than Python dict for internal operations.
        """
        cdef public str text
        cdef public str target
        cdef public int line

        def __init__(self, str text, str target, int line):
            self.text = text
            self.target = target
            self.line = line

        def to_dict(self) -> dict:
            """Convert to dictionary for API compatibility."""
            return {
                "text": self.text,
                "target": self.target,
                "line": self.line
            }

    cdef bint is_code_fence(str line):
        """
        Check if line is a code fence marker.

        cdef function: only callable from Cython, fastest.
        """
        cdef str stripped = line.strip()
        return stripped.startswith("```") or stripped.startswith("~~~")

Key points:
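- cdef class LinkInfo: a Cython extension type; internal access is faster than with a Python dict
- cdef public: makes the attributes accessible from Python while keeping C-level efficiency
- cdef bint: the C boolean type (0 or 1), faster than Python's bool
- cdef functions: callable only from Cython, with no Python call overhead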
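
One detail is worth checking against the compatibility goal: is_code_fence() also treats ~~~ as a fence, while the original parser only looks for backticks. The following pure-Python model of both checks (the function names are hypothetical) lists the inputs on which they disagree.

```python
def is_fence_original(line: str) -> bool:
    """Fence test used by the pure-Python parser: backticks only."""
    return line.strip().startswith("```")


def is_fence_cython_model(line: str) -> bool:
    """Pure-Python model of is_code_fence(): backticks or tildes."""
    stripped = line.strip()
    return stripped.startswith("```") or stripped.startswith("~~~")


samples = ["```python", "   ```", "~~~", "~~~ text", "plain text", "`inline`"]

# Lines that one check treats as a fence and the other does not.
disagreements = [s for s in samples
                 if is_fence_original(s) != is_fence_cython_model(s)]
print(disagreements)
```

If identical output matters more than tilde support, one option is to drop the ~~~ branch from the helper; another is to add it to the original parser as well.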
Step 3: Optimize the loop
Use Cython to optimize the main parsing loop:

    # markdown_parser.pyx (continued)

    cdef list _parse_inline_links(list lines, dict reference_defs):
        """
        Parse inline and reference links from lines.

        Internal function with optimized loop.
        """
        cdef:
            list links = []
            int line_num
            int total_lines = len(lines)
            bint in_code_block = False
            str line
            str ref_name
            object match

        for line_num in range(total_lines):
            line = lines[line_num]

            # Check code fence
            if is_code_fence(line):
                in_code_block = not in_code_block
                continue

            if in_code_block:
                continue

            # Parse inline links [text](target)
            for match in INLINE_LINK_PATTERN.finditer(line):
                links.append(LinkInfo(
                    match.group(1),
                    match.group(2),
                    line_num + 1  # 1-indexed
                ))

            # Parse reference links [text][ref]
            for match in REFERENCE_USE_PATTERN.finditer(line):
                ref_name = match.group(2).lower()
                if ref_name in reference_defs:
                    links.append(LinkInfo(
                        match.group(1),
                        reference_defs[ref_name],
                        line_num + 1
                    ))

        return links

    cdef dict _collect_reference_defs(str content):
        """
        Collect reference link definitions from content.

        Returns dict mapping ref_name -> target.
        """
        cdef:
            dict reference_defs = {}
            object match
            str ref_name
            str ref_target

        for match in REFERENCE_DEF_PATTERN.finditer(content):
            ref_name = match.group(1).lower()
            ref_target = match.group(2).strip()
            reference_defs[ref_name] = ref_target

        return reference_defs

Key points:
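- cdef list, cdef dict: declare container types explicitly to reduce type-checking overhead
- cdef int line_num: use a C integer as the loop counter
- cdef bint in_code_block: use a C boolean to track state
- Split the work into several cdef functions, each with a single responsibility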
Step 4: Create the public API
Use cpdef or def to create a public interface that can be called from Python:

    # markdown_parser.pyx (continued)

    cpdef list parse_markdown_links(str content):
        """
        Parse all links from Markdown content.

        This is the main public API, compatible with the original
        Python implementation.

        Args:
            content: Markdown content string

        Returns:
            List of dicts with 'text', 'target', 'line' keys
        """
        cdef:
            list lines
            dict reference_defs
            list link_infos
            list result
            LinkInfo info

        # Split content into lines
        lines = content.split('\n')

        # Collect reference definitions
        reference_defs = _collect_reference_defs(content)

        # Parse all links
        link_infos = _parse_inline_links(lines, reference_defs)

        # Convert to dict format for API compatibility
        result = [info.to_dict() for info in link_infos]

        return result

    def parse_markdown_links_py(content: str) -> List[Dict]:
        """
        Python-compatible wrapper with type hints.

        Identical to parse_markdown_links but with explicit
        Python type annotations for better IDE support.
        """
        return parse_markdown_links(content)

Key points:
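- cpdef: generates both a Python wrapper and a C entry point; calls from Python go through the wrapper, while calls from Cython use the C entry point directly
- API compatibility is preserved: the return format is exactly the same as in the original Python version
- A _py variant is provided: it carries full type hints, which improves IDE support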
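
The compatibility claim can be checked mechanically by running both implementations on the same inputs. The following is a minimal sketch; reference_parse condenses the original parser's logic, check_equivalent and SAMPLES are hypothetical names, and when the extension has not been built the reference is simply compared with itself.

```python
import re
from typing import Callable, Dict, List

INLINE = re.compile(r'(?<!!)\[([^\]]+)\]\(([^)]+)\)')
REF_DEF = re.compile(r'^\s*\[([^\]]+)\]:\s*(.+)$', re.MULTILINE)
REF_USE = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')


def reference_parse(content: str) -> List[Dict]:
    """Pure-Python reference, following the original parser's logic."""
    defs = {m.group(1).lower(): m.group(2).strip()
            for m in REF_DEF.finditer(content)}
    links, in_code = [], False
    for num, line in enumerate(content.split('\n'), start=1):
        if line.strip().startswith("```"):
            in_code = not in_code
            continue
        if in_code:
            continue
        for m in INLINE.finditer(line):
            links.append({"text": m.group(1), "target": m.group(2), "line": num})
        for m in REF_USE.finditer(line):
            ref = m.group(2).lower()
            if ref in defs:
                links.append({"text": m.group(1), "target": defs[ref], "line": num})
    return links


def check_equivalent(candidate: Callable[[str], List[Dict]],
                     samples: List[str]) -> List[str]:
    """Return the samples on which candidate and reference disagree."""
    return [s for s in samples if candidate(s) != reference_parse(s)]


try:
    from markdown_parser import parse_markdown_links as candidate
except ImportError:
    candidate = reference_parse  # nothing built yet: compare the reference with itself

SAMPLES = [
    "",
    "plain text only",
    "![image](pic.png) and [link](page.md)",
    "[a](b)\n```\n[c](d)\n```\n[e][ref]\n\n[ref]: https://example.com",
]

mismatches = check_equivalent(candidate, SAMPLES)
print("mismatching samples:", mismatches)
```

Adding real documents from the project to SAMPLES makes the check more meaningful than hand-written cases alone.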
Step 5: Create setup.py

    # setup.py
    """
    Build script for Cython markdown parser.

    Usage:
        python setup.py build_ext --inplace

    Or install in editable mode for development
    (re-run the command after editing the .pyx file):
        pip install -e .
    """

    from setuptools import setup, Extension
    from Cython.Build import cythonize

    extensions = [
        Extension(
            "markdown_parser",
            sources=["markdown_parser.pyx"],
            # Optional: pass optimization flags to the C compiler
            # extra_compile_args=["-O3"],
        )
    ]

    setup(
        name="markdown_parser",
        version="0.1.0",
        description="Cython accelerated Markdown link parser",
        ext_modules=cythonize(
            extensions,
            compiler_directives={
                "language_level": "3",   # Python 3 syntax
                "boundscheck": False,    # Disable bounds checking
                "wraparound": False,     # Disable negative indexing
                "cdivision": True,       # Use C division semantics
            },
            annotate=True,  # Generate HTML annotation file
        ),
        zip_safe=False,
    )

Compiler settings explained:
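| Setting | Description | Performance impact |
|---|---|---|
| language_level=3 | Use Python 3 syntax | None |
| boundscheck=False | Disable bounds checking on indexed access | About 5-10% faster |
| wraparound=False | Disable negative-index support | About 2-5% faster |
| cdivision=True | Use C division (no division-by-zero check) | Faster division |
| annotate=True | Generate the HTML annotation report | Development only |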
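
After building, it is easy to end up importing a stale or pure-Python module by accident. The sketch below uses only the standard library to check whether a module name resolves to a compiled extension file; is_compiled_extension is a hypothetical helper, not part of the case-study code.

```python
import importlib.util
from importlib.machinery import EXTENSION_SUFFIXES


def is_compiled_extension(name: str) -> bool:
    """Return True if the top-level module `name` resolves to a compiled extension."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return False  # module not found on sys.path
    # EXTENSION_SUFFIXES holds the platform's extension suffixes (.so, .pyd, ...)
    return any(spec.origin.endswith(suffix) for suffix in EXTENSION_SUFFIXES)


# "markdown_parser" is the module name used in this case study.
print("markdown_parser compiled:", is_compiled_extension("markdown_parser"))
print("json compiled:", is_compiled_extension("json"))
```

Running this from the directory where build_ext --inplace was executed shows whether the freshly built file is the one Python will import.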
Complete Code
All of the parts above combine into one complete .pyx file:

    # markdown_parser.pyx
    """
    Cython accelerated Markdown link parser.

    This module provides fast parsing of Markdown links,
    compatible with the original Python implementation.

    Build:
        python setup.py build_ext --inplace

    Usage:
        from markdown_parser import parse_markdown_links
        links = parse_markdown_links(markdown_content)
    """

    import re
    from typing import List, Dict

    # ============================================================
    # Compiled regex patterns (module level for reuse)
    # ============================================================

    cdef object INLINE_LINK_PATTERN = re.compile(
        r'(?<!!)\[([^\]]+)\]\(([^)]+)\)'
    )

    cdef object REFERENCE_DEF_PATTERN = re.compile(
        r'^\s*\[([^\]]+)\]:\s*(.+)$',
        re.MULTILINE
    )

    cdef object REFERENCE_USE_PATTERN = re.compile(
        r'\[([^\]]+)\]\[([^\]]+)\]'
    )

    # ============================================================
    # C-level data structures
    # ============================================================

    cdef class LinkInfo:
        """
        Extension type (C-level layout) to hold link information.
        Faster than Python dict for internal operations.
        """
        cdef public str text
        cdef public str target
        cdef public int line

        def __init__(self, str text, str target, int line):
            self.text = text
            self.target = target
            self.line = line

        def to_dict(self) -> dict:
            """Convert to dictionary for API compatibility."""
            return {
                "text": self.text,
                "target": self.target,
                "line": self.line
            }

        def __repr__(self):
            return f"LinkInfo(text={self.text!r}, target={self.target!r}, line={self.line})"

    # ============================================================
    # Internal helper functions (cdef = C-only, fastest)
    # ============================================================

    cdef bint is_code_fence(str line):
        """
        Check if line is a code fence marker.
        """
        cdef str stripped = line.strip()
        return stripped.startswith("```") or stripped.startswith("~~~")

    cdef dict _collect_reference_defs(str content):
        """
        Collect reference link definitions from content.
        """
        cdef:
            dict reference_defs = {}
            object match
            str ref_name
            str ref_target

        for match in REFERENCE_DEF_PATTERN.finditer(content):
            ref_name = match.group(1).lower()
            ref_target = match.group(2).strip()
            reference_defs[ref_name] = ref_target

        return reference_defs

    cdef list _parse_inline_links(list lines, dict reference_defs):
        """
        Parse inline and reference links from lines.
        """
        cdef:
            list links = []
            int line_num
            int total_lines = len(lines)
            bint in_code_block = False
            str line
            str ref_name
            object match

        for line_num in range(total_lines):
            line = lines[line_num]

            # Check code fence
            if is_code_fence(line):
                in_code_block = not in_code_block
                continue

            if in_code_block:
                continue

            # Parse inline links [text](target)
            for match in INLINE_LINK_PATTERN.finditer(line):
                links.append(LinkInfo(
                    match.group(1),
                    match.group(2),
                    line_num + 1
                ))

            # Parse reference links [text][ref]
            for match in REFERENCE_USE_PATTERN.finditer(line):
                ref_name = match.group(2).lower()
                if ref_name in reference_defs:
                    links.append(LinkInfo(
                        match.group(1),
                        reference_defs[ref_name],
                        line_num + 1
                    ))

        return links

    # ============================================================
    # Public API (cpdef = callable from both Python and Cython)
    # ============================================================

    cpdef list parse_markdown_links(str content):
        """
        Parse all links from Markdown content.

        Args:
            content: Markdown content string

        Returns:
            List of dicts with 'text', 'target', 'line' keys

        Example:
            >>> content = "[Click here](https://example.com)"
            >>> links = parse_markdown_links(content)
            >>> links[0]['target']
            'https://example.com'
        """
        cdef:
            list lines
            dict reference_defs
            list link_infos
            list result
            LinkInfo info

        lines = content.split('\n')
        reference_defs = _collect_reference_defs(content)
        link_infos = _parse_inline_links(lines, reference_defs)
        result = [info.to_dict() for info in link_infos]

        return result

    # Python-compatible wrapper with full type hints
    def parse_markdown_links_py(content: str) -> List[Dict]:
        """
        Python-compatible wrapper with type hints.

        Identical to parse_markdown_links but with explicit
        Python type annotations for better IDE support.
        """
        return parse_markdown_links(content)

    # ============================================================
    # Optional: Expose LinkInfo class for advanced usage
    # ============================================================

    def parse_markdown_links_fast(str content) -> list:
        """
        Parse links and return LinkInfo objects directly.

        Faster than parse_markdown_links() as it skips
        the dict conversion step.

        Returns:
            List of LinkInfo objects
        """
        cdef:
            list lines
            dict reference_defs

        lines = content.split('\n')
        reference_defs = _collect_reference_defs(content)
        return _parse_inline_links(lines, reference_defs)

Performance Comparison
Create a benchmark script to compare the pure Python and Cython versions:

    # benchmark.py
    """
    Performance comparison between Python and Cython implementations.

    Usage:
        # First, build the Cython module
        python setup.py build_ext --inplace

        # Then run benchmark
        python benchmark.py
    """

    import re
    import statistics
    import time
    from typing import Callable


    # Pure Python implementation (inline for comparison)
    class PythonMarkdownParser:
        """Original pure Python implementation."""

        INLINE_LINK_PATTERN = re.compile(r'(?<!!)\[([^\]]+)\]\(([^)]+)\)')
        REFERENCE_DEF_PATTERN = re.compile(r'^\s*\[([^\]]+)\]:\s*(.+)$', re.MULTILINE)
        REFERENCE_USE_PATTERN = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')

        def parse_markdown_links(self, content: str) -> list:
            links = []
            lines = content.split('\n')

            reference_defs = {}
            for match in self.REFERENCE_DEF_PATTERN.finditer(content):
                ref_name = match.group(1).lower()
                ref_target = match.group(2).strip()
                reference_defs[ref_name] = ref_target

            in_code_block = False

            for line_num, line in enumerate(lines, start=1):
                if line.strip().startswith("```"):
                    in_code_block = not in_code_block
                    continue

                if in_code_block:
                    continue

                for match in self.INLINE_LINK_PATTERN.finditer(line):
                    links.append({
                        "text": match.group(1),
                        "target": match.group(2),
                        "line": line_num
                    })

                for match in self.REFERENCE_USE_PATTERN.finditer(line):
                    ref_name = match.group(2).lower()
                    if ref_name in reference_defs:
                        links.append({
                            "text": match.group(1),
                            "target": reference_defs[ref_name],
                            "line": line_num
                        })

            return links


    def generate_test_content(num_lines: int, links_per_100_lines: int = 10) -> str:
        """Generate test Markdown content with specified characteristics."""
        # Guard against a zero step when links_per_100_lines exceeds 100
        link_step = max(1, 100 // links_per_100_lines)
        lines = []
        for i in range(num_lines):
            if i % link_step == 0:
                # Add an inline link
                lines.append(f"Check out [Link {i}](https://example.com/page{i}) for details.")
            elif i % 50 == 25:
                # Add a code block. The offset of 25 keeps this branch reachable:
                # with the default step of 10, "i % 50 == 0" would always be
                # caught by the link branch above.
                lines.append("```python")
                lines.append("# This is code, links here [should](be/ignored)")
                lines.append("```")
            else:
                # Regular text
                lines.append(f"This is line {i} with some regular text content.")

        return '\n'.join(lines)


    def benchmark(func: Callable, content: str, iterations: int = 100) -> dict:
        """Run benchmark and return statistics."""
        times = []

        # Warmup
        for _ in range(5):
            func(content)

        # Actual benchmark
        for _ in range(iterations):
            start = time.perf_counter()
            result = func(content)
            end = time.perf_counter()
            times.append(end - start)

        return {
            "mean": statistics.mean(times) * 1000,  # Convert to ms
            "stdev": statistics.stdev(times) * 1000,
            "min": min(times) * 1000,
            "max": max(times) * 1000,
            "links_found": len(result),
        }


    def main():
        print("=" * 60)
        print("Markdown Link Parser Benchmark")
        print("=" * 60)

        # Test different content sizes
        sizes = [1000, 5000, 10000, 50000]

        python_parser = PythonMarkdownParser()

        # Try to import Cython version
        try:
            from markdown_parser import parse_markdown_links as cython_parse
            has_cython = True
        except ImportError:
            print("\nWarning: Cython module not found.")
            print("Run 'python setup.py build_ext --inplace' first.\n")
            has_cython = False

        for size in sizes:
            print(f"\n--- Content size: {size} lines ---")
            content = generate_test_content(size)

            # Python benchmark
            py_result = benchmark(python_parser.parse_markdown_links, content)
            print(f"Python: {py_result['mean']:.3f} ms (+/- {py_result['stdev']:.3f} ms)")
            print(f"  Found {py_result['links_found']} links")

            # Cython benchmark (if available)
            if has_cython:
                cy_result = benchmark(cython_parse, content)
                speedup = py_result['mean'] / cy_result['mean']
                print(f"Cython: {cy_result['mean']:.3f} ms (+/- {cy_result['stdev']:.3f} ms)")
                print(f"  Speedup: {speedup:.2f}x")

        print("\n" + "=" * 60)


    if __name__ == "__main__":
        main()

Expected Results
After running the benchmark, you should see results similar to the following:

    ============================================================
    Markdown Link Parser Benchmark
    ============================================================

    --- Content size: 1000 lines ---
    Python: 0.523 ms (+/- 0.031 ms)
      Found 100 links
    Cython: 0.198 ms (+/- 0.012 ms)
      Speedup: 2.64x

    --- Content size: 5000 lines ---
    Python: 2.617 ms (+/- 0.089 ms)
      Found 500 links
    Cython: 0.892 ms (+/- 0.045 ms)
      Speedup: 2.93x

    --- Content size: 10000 lines ---
    Python: 5.234 ms (+/- 0.156 ms)
      Found 1000 links
    Cython: 1.712 ms (+/- 0.078 ms)
      Speedup: 3.06x

    --- Content size: 50000 lines ---
    Python: 26.18 ms (+/- 0.823 ms)
      Found 5000 links
    Cython: 7.89 ms (+/- 0.312 ms)
      Speedup: 3.32x

    ============================================================

Analysis of the results:
| Content size | Python | Cython | Speedup |
|---|---|---|---|
| 1,000 lines | 0.52 ms | 0.20 ms | 2.6x |
| 5,000 lines | 2.62 ms | 0.89 ms | 2.9x |
| 10,000 lines | 5.23 ms | 1.71 ms | 3.1x |
| 50,000 lines | 26.2 ms | 7.89 ms | 3.3x |
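Observations:
- The speedup grows as the amount of data increases
- Most of the gain comes from loop optimization and typed variables
- Regular expressions remain the bottleneck (Cython cannot speed up the re module itself)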
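
To get a rough feel for how much of the run time sits inside re, one can time the regex scans alone against the full per-line loop. The following is a sketch under stated assumptions: regex_only and full_loop are hypothetical names, the input mimics the benchmark's generated content, and regex_only still pays for the Python for loop, so the ratio is only an upper bound on the share that Cython cannot touch.

```python
import re
import timeit

INLINE = re.compile(r'(?<!!)\[([^\]]+)\]\(([^)]+)\)')
REF_USE = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')

# 5,000 lines, one inline link on every tenth line.
lines = [
    f"Check out [Link {i}](https://example.com/page{i}) for details." if i % 10 == 0
    else f"This is line {i} with some regular text content."
    for i in range(5000)
]


def regex_only() -> int:
    """Run both patterns on every line and count matches; no dict building."""
    count = 0
    for line in lines:
        for _ in INLINE.finditer(line):
            count += 1
        for _ in REF_USE.finditer(line):
            count += 1
    return count


def full_loop() -> int:
    """Same scans plus the per-line bookkeeping a parser does."""
    links = []
    for line_num, line in enumerate(lines, start=1):
        if line.strip().startswith("```"):
            continue
        for m in INLINE.finditer(line):
            links.append({"text": m.group(1), "target": m.group(2), "line": line_num})
        for m in REF_USE.finditer(line):
            links.append({"text": m.group(1), "target": m.group(2), "line": line_num})
    return len(links)


# Take the best of three runs to reduce noise.
t_regex = min(timeit.repeat(regex_only, number=5, repeat=3))
t_full = min(timeit.repeat(full_loop, number=5, repeat=3))
print(f"regex only: {t_regex:.4f}s, full loop: {t_full:.4f}s, "
      f"regex share: {t_regex / t_full:.0%}")
```

If the share comes out high on your machine, reducing the number of regex calls is likely to matter more than adding further type declarations.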
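Design Trade-offs
| Aspect | Pure Python | Cython |
|---|---|---|
| Development speed | Fast, write and run | Medium, needs a compile step |
| Execution speed | Baseline | 2-5x faster |
| Debugging difficulty | Low, standard Python tools | Medium, may require reading the generated C code |
| Deployment complexity | Simple, pure Python | Needs a build environment or precompiled wheels |
| Maintainability | High | Medium, requires knowledge of Cython syntax |
| IDE support | Full | Partial (limited .pyx support) |
| Cross-platform | Portable by nature | Must be compiled for each platform |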
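Further Optimization: Using a C Regex Library
If you need even more performance, consider using a regular expression library written in C. Here is an outline using PCRE2: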

    # advanced_parser.pyx
    """
    Advanced parser using PCRE2 C library for maximum performance.

    Requires: libpcre2-dev (Ubuntu) or pcre2 (macOS Homebrew)
    """

    # Note: pcre2.h expects PCRE2_CODE_UNIT_WIDTH to be defined
    # before the header is included.
    cdef extern from "pcre2.h":
        # PCRE2 declarations...
        pass

    # This is an advanced topic, see PCRE2 documentation for details

For most use cases, however, Python's re module combined with a Cython-optimized loop is sufficient.
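When Should You Use Cython?
Good Fit
- The hot spot has been confirmed with a profiler
- You need a speedup of 2x or more
- The code is relatively stable and rarely changes
- The team is able to maintain Cython code
- A compile step is acceptable
Not Recommended
- The bottleneck is I/O (network, disk)
- The code is still being iterated on frequently
- You deploy across platforms without CI/CD support
- The team is unfamiliar with C
- The expected speedup is under 2x
Alternatives to Consider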
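If Cython does not fit your situation, consider:

1. PyPy
   - No code changes required
   - JIT compilation can bring a 5-10x speedup
   - But it has more compatibility issues

2. Numba
   - Optimized for numerical computation
   - Speeds up code with just a decorator
   - But it supports only a subset of Python syntax

3. Algorithmic optimization
   - First check whether a better algorithm exists
   - Reduce unnecessary memory allocation
   - Use more efficient data structures

Exercises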
Basic Exercise
Convert the following pure Python function to Cython:

    # exercise_1.py
    def count_words(text: str) -> dict:
        """Count word frequencies in text."""
        words = text.lower().split()
        counts = {}
        for word in words:
            word = word.strip('.,!?;:')
            if word:
                counts[word] = counts.get(word, 0) + 1
        return counts

Hints:
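- Create exercise_1.pyx
- Add a cdef dict declaration for the counts variable
- Add suitable type declarations for the loop variables
- Consider using a cdef helper function for the string cleanup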
Advanced Exercise
Use cProfile to verify the Cython speedup:

    # exercise_2.py
    import cProfile
    import pstats

    # PythonMarkdownParser is the pure Python class defined in benchmark.py
    from benchmark import PythonMarkdownParser


    def profile_parsers(content: str):
        """Profile both Python and Cython parsers."""
        from markdown_parser import parse_markdown_links as cython_parse

        python_parser = PythonMarkdownParser()

        # Profile Python version
        print("=== Python Version ===")
        cProfile.runctx(
            'for _ in range(100): python_parser.parse_markdown_links(content)',
            globals(), locals(),
            'python_stats'
        )
        stats = pstats.Stats('python_stats')
        stats.strip_dirs().sort_stats('cumulative').print_stats(10)

        # Profile Cython version
        # Note: functions inside the compiled module are only listed
        # individually if it was built with the Cython directive profile=True.
        print("\n=== Cython Version ===")
        cProfile.runctx(
            'for _ in range(100): cython_parse(content)',
            globals(), locals(),
            'cython_stats'
        )
        stats = pstats.Stats('cython_stats')
        stats.strip_dirs().sort_stats('cumulative').print_stats(10)

Challenge
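Compare the performance impact of different type-declaration strategies:
- No type declarations: compile the .pyx as if it were plain Python
- Partial type declarations: add types only to the loop variables
- Full type declarations: add types to every variable
- Using the LinkInfo class vs using a dict
Record the performance of each strategy and analyze which optimizations bring the largest benefit.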