fundou1081/sv-trace

GitHub: fundou1081/sv-trace

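sv-trace is a tool for tracing and visualizing data flow: it reports which expressions drive a signal and which expressions read it in SystemVerilog (SV) code.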


# sv-trace

## Human-friendly arrow output (M5.1j)

Every trace can be rendered in arrow notation that shows the direction of data flow, so a reader in a terminal, a document, or a chat can see at a glance which expression drives a signal and which expressions read it.

### Arrow semantics (fixed)

| Symbol | Meaning |
|------|------|
| `←` | driver (the signal is driven by this expression) |
| `→` | load (the signal is read by this expression) |
| `⚠` | multi-driver conflict |
| `✓` | verified (credibility >= 0.8) |
| `✗` | not verified (credibility < 0.8) |
| `⤴` | cross-file |
| `↻` | cycle detected |

### 5 API levels (all support arrow output)

```
from signal_tracer import trace_signal, SignalTracer

# `sv` and `multi_sv` are strings holding SystemVerilog source text (not shown here).

# 1. TraceResult / TraceSummary: all drivers and loads in one call
result = trace_signal("count", sv, "counter.sv")
print(result.to_arrow())
# DRIVERS (2):
#   count ← 8'h00 @ counter.sv:9 [counter] ✓ cred=1.00
#   count ← count + data_in @ counter.sv:10 [counter] ✓ cred=1.00
# LOADS (0):
#   (none)

# 2. A single trace
for d in result.drivers:
    print(d.to_arrow())
# count ← 8'h00 @ counter.sv:9 [counter] ✓ cred=1.00

# 3. SignalTracer: multi-driver report in one call
t = SignalTracer()
t.add_file("buggy.sv", multi_sv)
t.build()
print(t.multi_drivers_to_arrow())
# data ⚠ 2 drivers:
#   data ← 8'hAA @ buggy.sv:9 [buggy] ✓ cred=1.00
#   data ← 8'h55 @ buggy.sv:12 [buggy] ✓ cred=1.00

# 4. Chain tracing: the full upstream/downstream chain
print(t.chain_to_arrow("data_out", direction="driver"))
# data_out ← c ⤴ ← b ← a

# 5. Dump as arrows: full chain plus summary
print(t.dump_to_arrow("data_out"))
# Chain data_out: 4 hops, avg_cred=0.95, cross-file ✓, cycle ✗
#   data_out ← c ✓ ← b ✓ ← a ✓
```

### Using the formatter functions directly

```
from signal_tracer import format_driver, format_load, format_all, ARROW_DRIVER, ARROW_LOAD

# `result` is the object returned by trace_signal(...) in the previous example.
print(format_driver(result.drivers[0]))
print(format_all(result))
print(ARROW_DRIVER)  # '←'
print(ARROW_LOAD)    # '→'
```

### Difference from `summary()`

| Method | Best for |
|------|----------|
| `summary()` | Short, field-oriented output, suited for use as LLM context (e.g. 'counter.sv:10 (always_ff) clk=clk reset=rst_n cond=[!rst_n]') |
| `to_arrow()` | Arrow-style data-flow output, easy to scan by eye or paste into chat (e.g. 'count ← count + data_in @ counter.sv:11 ✓ cred=1.0') |

The two coexist; choose according to the use case.

### Tree / Vertical styles (M5.1k): friendly to long chains, documents, and chat

When a chain is too long (≥ 4 signals) or has to be pasted into a document or chat, a one-line arrow is hard to read. Switch to the **tree style** (similar to the output of the `tree(1)` tool) or the **vertical style** (one signal per line, with an arrow):

```
from signal_tracer import SignalTracer

# top_code, mid_code and leaf_code are SystemVerilog source strings (not shown here).
t = SignalTracer()
t.add_file('top.sv', top_code)
t.add_file('mid.sv', mid_code)
t.add_file('leaf.sv', leaf_code)
t.build()
```

**5 styles (all with tree nodes), pick one**:

```
# 1. arrow (default): one line, good for short chains
print(t.chain_to_arrow('top.u_mid.u_leaf_a.out_data', style='arrow'))
# out_data ← out_data ← mid_data (↻ cycle detected)

# 2. tree: tree style, Unicode box-drawing
print(t.chain_to_arrow('top.u_mid.u_leaf_a.out_data', style='tree'))
# Driver chain: top.u_mid.u_leaf_a.out_data (3 hops, ↻ cycle)
# ├─ out_data [leaf.sv:11] ✓ cred=1.00
# │  ← out_data [leaf.sv:12] ✓ cred=1.00
# └─ ← mid_data [leaf.sv:9] ✓ cred=1.00

# 3. ascii: same as tree but in ASCII (old terminals / email / plain-text logs)
print(t.chain_to_arrow('top.u_mid.u_leaf_a.out_data', style='ascii'))
# Driver chain: top.u_mid.u_leaf_a.out_data (3 hops, ↻ cycle)
# +-- out_data [leaf.sv:11] ✓ cred=1.00
# |   ← out_data [leaf.sv:12] ✓ cred=1.00
# +-- ← mid_data [leaf.sv:9] ✓ cred=1.00

# 4. vertical: one signal per line, indentation shows depth
print(t.chain_to_arrow('top.u_mid.u_leaf_a.out_data', style='vertical'))
# out_data @ leaf.sv:11 ✓ cred=1.00
#   ← out_data @ leaf.sv:12 ✓ cred=1.00
#     ← mid_data @ leaf.sv:9 ✓ cred=1.00

# 5. all / both: returns both arrow and tree
print(t.chain_to_arrow('top.u_mid.u_leaf_a.out_data', style='all'))
```

**`dump` also supports tree/vertical**:

```
# dump_to_arrow prints one line by default; style='tree' switches to a tree
print(t.dump_to_arrow('top.u_mid.u_leaf_a.out_data', style='tree'))
# Driver chain: top.u_mid.u_leaf_a.out_data (3 hops)
# ├─ out_data [leaf.sv:11]
# │  ← out_data [leaf.sv:12]
# └─ ← mid_data [leaf.sv:9]

# Aliases are also available (`signal` is the signal name to trace)
t.chain_to_tree(signal, use_box=True)    # tree style
t.chain_to_tree(signal, use_box=False)   # ASCII
t.chain_to_vertical(signal)              # vertical
t.dump_to_tree(signal, use_box=True)     # dump + tree
t.dump_to_tree(signal, use_box=False)    # dump + ascii
```

**`format_driver_chain` / `format_dump_summary` also accept the `style` parameter**, for users of the plain functions.

**Choosing a style**:

- **Short chain (≤ 3 signals)**: `arrow` (default), one line is enough
- **Medium chain (4-7) while reading code**: `tree`, shows node, location, and cred together
- **Medium chain pasted into chat/markdown**: `vertical`, does not depend on box-drawing characters
- **Old terminals / email / plain-text logs**: `ascii`, needs no Unicode
- **Full picture**: `all`, gives both arrow and tree

## Public API

### Functional

```
from signal_tracer import trace_signal, trace_signal_from_file

result = trace_signal("signal_name", sv_code, "file.sv")
result = trace_signal_from_file("signal_name", "path/to/file.sv")
```

### Class-based (multi-file + hierarchical paths)

```
from signal_tracer import SignalTracer, TraceSummary, ContextBundle

t = SignalTracer()
t.add_file('top.sv', top_code)
t.add_file('sub.sv', sub_code)
t.build()
result = t.trace("signal_name")  # TraceSummary
```

### Main SignalTracer methods

| Method | Description |
|------|------|
| `add_file(path, code)` | Add a .sv file to the project (chainable) |
| `build()` | Parse all files and build the trace index (must be called first) |
| `trace(name)` | Trace a signal and return a `TraceSummary` (smart matching on hpath / leaf / array / suffix) |
| `trace_drivers(name)` | Return only the list of drivers |
| `trace_loads(name)` | Return only the list of loads |
| `find_multi_drivers()` | Find every signal driven by ≥ 2 scopes (multi-driver detection) |
| `get_driver_count(name)` | Return the number of distinct scopes driving a signal |
| `get_driver_chain(name, max_depth=10)` | Recursively walk the upstream driver chain (with cycle detection) |

### TraceSummary methods

| Method | Description |
|------|------|
| `get_clock_domains()` | All clocks the signal is involved with |
| `is_multi_driver()` | Whether the signal is driven by multiple scopes |
| `get_driver_scopes()` | Source text of all driving scopes (deduplicated) |
| `to_contexts()` | Pack all drivers into a `List[ContextBundle]` |

### ContextBundle fields

`ContextBundle` (frozen=True, immutable) bundles:

- `file` / `line` / `char_offset`: location
- `scope_text` / `scope_line_start/end` / `scope_kind`: scope information
- `clock` / `reset`: clock and reset
- `condition` / `condition_stack`: nested condition stack
- `is_port` / `port_direction` / `hierarchical_path`: port and hierarchy
- `confidence`: confidence
- `to_dict()` / `summary()`: serialization / one-line readable form

## Status

| Metric | Value |
|------|------|
| Public API tests | **210/210 passing** (~4s) (including 50 arrow-output tests: 28 M5.1j + 22 M5.1k tree/vertical/ascii) |
| Cross-version validation | ✅ pyslang 10.x **and** 11.x both 210/210 (make test-cross-version) |
| Real-project validation | ✅ OpenTitan, 6 modules (30,218 drivers, 0 warning, 0 empty) |
| Cross-file fixture | 3 files / 3 instance levels (`tests/fixtures/m3_hierarchical/`) |
| Benchmark | 11/11 (0 warning, 0 exception) |
| Legacy architecture tests | Moved to `tests/_legacy/`; main tests pass cleanly, 68/68 |
| Version | alpha |

Run the tests:

```
python -m pytest tests/ -v

# Validate across pyslang 10/11
make test-cross-version
```

## Test coverage (M0–M4)

The main test file `tests/unit/test_signal_tracer.py` contains **23 TestClasses and 117 tests**:

| Stage | TestClass | Tests | Coverage |
|------|-----------|--------|--------|
| M0 | `TestBasic`, `TestControlFlow`, `TestArrays`, `TestNoCrashes` | — | Basic always_ff/comb/latch, if/else/case conditions, 1D/2D arrays |
| M1 | `TestTraceResultFields` | — | All TraceResult fields populated |
| M1.5 | `TestMultiDriver`, `TestClockResetExtraction`, `TestDriverChain` | — | Multi-driver detection, clock/reset extraction, recursive driver_chain (cycle detection) |
| M2 | `TestContextAccuracy`, `TestContextBundle` | — | Accuracy of line/scope_text, ContextBundle frozen dataclass |
| M3 | `TestMultiFile` | — | Multi-file build, hierarchical paths (`top.u_mid.u_leaf`), suffix matching |
| M4 | `TestExpressionCoverage`, `TestContinuousAssignRobustness`, `TestMultiFileLineFallback`, `TestScopeFilePath`, `TestAdditionalExpressions` | +5 | 17 kinds of SV expressions, InvalidExpression defense, cross-file line numbers (SourceManager), exact TraceResult.file, nested MemberAccess+RangeSelect |
| M4.1 | `TestInterfaceModport` | +6 | Interface/Modport signal tracing (HierarchicalValue), reads and writes across modports, bit select m.data[3:0] |
| M5.1 | `TestCodeEvidence` | +8 | Code evidence chain (CodeEvidence), credibility_score quantified on 0-1, is_verified flag, automatic verification via `trace_verified()` |
| M5.1b | `TestMultiDriverEvidence` | +4 | `find_multi_drivers(verify=True)` attaches evidence by default (each conflict comes with concrete proof) |
| M5.1c | `TestDriverChainEvidence` | +4 | `get_driver_chain(verify=True)` attaches evidence to every hop by default (the chain is followed with credibility attached) |
| M5.1d | `TestTraceLoadsEvidence` | +7 | `trace()`/`trace_drivers()`/`trace_loads()` default to verify=True; drivers and loads both carry evidence (find out who reads a signal) |
| M5.1e | `TestLoadChainEvidence` | +5 | `get_load_chain(verify=True)` follows the chain downstream (symmetric with the driver chain) |
| M5.1f | `TestDumpChain` | +9 | `dump_driver_chain()`/`dump_load_chain()` dump a whole chain as a dict in one call (with summary, LLM-friendly) |
| M5.1g | `TestDumpMultiDrivers` | +6 | `dump_multi_drivers()` dumps multi-driver detection in one call (conflict list plus evidence for each driver) |
| M5.1h | `TestSyntaxNodeSnapshot` | +6 | Syntax-based evidence path: frozen SyntaxNodeSnapshot plus snippet accuracy across OpenTitan files |
| M5.1h+ | (Makefile target) | — | Validation across pyslang 10.x/11.x (`make test-cross-version` runs 160+160 tests in two venvs) |

Evolution by stage:

| Stage | New tests | Cumulative |
|------|---------|------|
| M0 | 13 | 13 |
| M1 | 13 | 26 |
| M1.5 | 20 | 46 |
| M2 | 13 | 59 |
| M3 | 9 | 68 |
| M4 | 5 | 73 |
| M4.1 | 6 | 74 |
| M5.1 | 8 | 82 |
| M5.1b | 4 | 86 |
| M5.1c | 4 | 90 |
| M5.1d | 7 | 97 |
| M5.1e | 5 | 102 |
| M5.1f | 9 | 111 |
| M5.1g | 6 | 117 |
| M5.1h | 6 | 123 |

The main test suite (including `test_signal_tracer.py` and `test_evidence_via_syntax.py`) totals **160** tests (other test files: 37 edge-case/CI/legacy tests). See [tests/README.md](tests/README.md) and [TEST_PLAN.md](TEST_PLAN.md) for details.

## Code evidence chain (M5.1)

Every trace carries a **falsifiable code evidence chain**: the actual file is read back to verify that `source_expr` and `signal_name` really appear on the reported line. An LLM or a user can check that a trace is correct instead of trusting it blindly.

### Core API

```
from signal_tracer import trace_signal, SignalTracer

# Option 1: trace_signal, passing file_content
result = trace_signal('count', sv_code, 'counter.sv')
for ctx in result.to_contexts(file_content=sv_code):
    d = ctx.to_dict()
    print(f" credibility={d['credibility_score']} is_verified={d['is_verified']}")
    print(f" snippet: {d['evidence_snippet']}")
    print(ctx.code_evidence.to_evidence_string())

# Option 2: multi-file SignalTracer with automatic in-memory verification
t = SignalTracer()
t.add_file('top.sv', top_code)
t.add_file('sub.sv', sub_code)
t.build()
result = t.trace_verified('top.u_sub.signal')  # verifies against self._files automatically
```

### Credibility score (credibility_score 0-1)

| Check | Points | Description |
|--------|------|------|
| `file_readable` | +0.2 | The file can be read |
| `snippet_present` | +0.2 | The line exists |
| `matches_source_expr` | +0.4 | source_expr is actually found in the text |
| `matches_signal_name` | +0.2 | signal_name is actually found in the text |

`is_verified = file_readable ∧ snippet_present ∧ (matches_source ∨ matches_signal)`

### OpenTitan verification

```
tx_enable @ uart_core.sv:77:
  snippet: 'assign tx_enable = reg2hw.ctrl.tx.q;'
  matches: source_expr ✓, signal_name ✓
  credibility: 1.0/1.0 (VERIFIED)
  context_before: ['']
  context_after: [' assign rx_enable = reg2hw.ctrl.rx.q;', ...]

readbuf_threshold @ spi_device.sv:600:
  snippet: 'assign readbuf_threshold = reg2hw.read_threshold.q[BufferAw:0];'
  credibility: 1.0/1.0 (VERIFIED)   (a RangeSelect containing BufferAw also verifies)
```

### Defensive: mismatches are reported truthfully

| Scenario | credibility | is_verified |
|------|-------------|-------------|
| File does not exist | 0.0 | ❌ |
| Readable but nothing matches | 0.4 | ❌ |
| Only signal_name matches | 0.6 | ✅ |
| Everything matches | 1.0 | ✅ |

Evidence never "pretends to be OK"; it reports the actual credibility.

## Code evidence chain: syntax path (M5.1h)

**Core problem**: file-based evidence depends on `
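
The scoring table and the `is_verified` formula from the M5.1 section can be illustrated with a small standalone sketch. It does not call `signal_tracer`; `EvidenceChecks`, `check_line`, and the use of plain substring matching are assumptions made for illustration, and only the four weights and the boolean formula come from the tables above.

```python
from dataclasses import dataclass
from typing import Optional

# Weights from the credibility table, kept in tenths so sums stay exact.
WEIGHTS_TENTHS = {
    "file_readable": 2,
    "snippet_present": 2,
    "matches_source_expr": 4,
    "matches_signal_name": 2,
}


@dataclass(frozen=True)
class EvidenceChecks:
    """Hypothetical container for the four checks (not part of signal_tracer)."""
    file_readable: bool
    snippet_present: bool
    matches_source_expr: bool
    matches_signal_name: bool


def credibility_score(c: EvidenceChecks) -> float:
    """Sum the weights of the checks that passed, as a score in 0-1."""
    tenths = sum(w for name, w in WEIGHTS_TENTHS.items() if getattr(c, name))
    return tenths / 10


def is_verified(c: EvidenceChecks) -> bool:
    """file_readable AND snippet_present AND (matches_source OR matches_signal)."""
    return (c.file_readable and c.snippet_present
            and (c.matches_source_expr or c.matches_signal_name))


def check_line(file_text: Optional[str], line_no: int,
               source_expr: str, signal_name: str) -> EvidenceChecks:
    """Run the checks against in-memory text; line_no is 1-based.

    Substring matching is an assumption; the real matching rules may differ.
    """
    readable = file_text is not None
    lines = file_text.splitlines() if readable else []
    present = 1 <= line_no <= len(lines)
    snippet = lines[line_no - 1] if present else ""
    return EvidenceChecks(readable, present,
                          source_expr in snippet, signal_name in snippet)


sv = "module m;\n  assign tx_enable = reg2hw.ctrl.tx.q;\nendmodule\n"
checks = check_line(sv, 2, "reg2hw.ctrl.tx.q", "tx_enable")
print(credibility_score(checks), is_verified(checks))  # 1.0 True
```

With these weights, a readable file whose line matches neither string scores 0.4 and is not verified, while a line matching only the signal name scores 0.6 and is verified, which agrees with the scenario table.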

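Tags: API development, SV code analysis, SV language, code tracing, signal processing, signal tracing, credibility assessment, multi-level tracing, multi-file analysis, multi-driver detection, integrity analysis, average credibility, cycle detection, data flow, data pipeline, formatted output, electronic design automation, arrow-style output, system-level verification, cross-file tracing, software engineering, reverse-engineering tools, chain tracing, verification and testing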