dannote/reach
GitHub: dannote/reach
Reach 为 Elixir/Erlang 项目构建程序依赖图,解决代码依赖关系不清与安全数据流难以追踪的问题。
Stars: 0 | Forks: 0
# 到达
Elixir 和 Erlang 的程序依赖图。
Reach 构建一个图,捕捉**你的代码依赖于什么**。
对于任意表达式,你可以询问:什么会影响它?它会影响什么?
这两个语句可以安全地重排吗?用户输入在没有经过净化的情况下会到达这个
数据库调用吗?
## 使用场景
### 安全性:污点分析
用户输入在没有经过净化的情况下会到达危险的数据接收器吗?
```
graph = Reach.file_to_graph!("lib/my_app_web/controllers/user_controller.ex")
Reach.taint_analysis(graph,
sources: [type: :call, function: :params],
sinks: [type: :call, module: System, function: :cmd, arity: 2],
sanitizers: [type: :call, function: :sanitize]
)
#=> [%{source: ..., sink: ..., path: [...], sanitized: false}]
```
### 代码质量:死代码检测
```
graph = Reach.file_to_graph!("lib/accounts.ex")
for node <- Reach.dead_code(graph) do
IO.warn("Dead code at #{inspect(node.source_span)}: #{node.type}")
end
```
### 重构:安全重排
```
graph = Reach.file_to_graph!("lib/pipeline.ex")
for block <- Reach.nodes(graph, type: :block),
{a, b} <- pairs(block.children) do
if Reach.independent?(graph, a.id, b.id) do
IO.puts("Statements at lines #{a.source_span.start_line} and " <>
"#{b.source_span.start_line} can be safely reordered")
end
end
```
### 克隆检测:重排等价代码
如果语句是独立的,那么具有相同语句但不同顺序的两个块是克隆体。
`canonical_order` 会根据结构哈希对独立的兄弟节点进行排序,使得两种顺序产生相同的序列:
```
# 用于 ExDNA 集成
order = Reach.canonical_order(graph, block_node_id)
hash = :erlang.phash2(Enum.map(order, fn {_, node} -> node end))
```
### OTP:GenServer 状态流分析
```
graph = Reach.file_to_graph!("lib/my_server.ex")
edges = Reach.edges(graph)
# 哪些回调读取状态?
state_reads = Enum.filter(edges, &(&1.label == :state_read))
# 状态在回调之间流动吗?
state_passes = Enum.filter(edges, &(&1.label == :state_pass))
# ETS 写后读依赖
ets_deps = Enum.filter(edges, &match?({:ets_dep, _table}, &1.label))
```
### 并发:崩溃传播
```
graph = Reach.file_to_graph!("lib/my_supervisor.ex")
edges = Reach.edges(graph)
# 哪些监控连接到哪个 :DOWN 处理程序?
Enum.filter(edges, &(&1.label == :monitor_down))
# 此进程是否捕获退出?哪个处理程序接收它们?
Enum.filter(edges, &(&1.label == :trap_exit))
# Task.async → Task.await 数据流
Enum.filter(edges, &(&1.label == :task_result))
# 监督子进程启动顺序
Enum.filter(edges, &(&1.label == :startup_order))
```
## 安装
```
def deps do
[{:reach, "~> 1.0"}]
end
```
## 构建图
```
# 来自 Elixir 源代码
{:ok, graph} = Reach.string_to_graph("def foo(x), do: x + 1")
{:ok, graph} = Reach.file_to_graph("lib/my_module.ex")
# 来自 Erlang 源代码
{:ok, graph} = Reach.string_to_graph(source, language: :erlang)
{:ok, graph} = Reach.file_to_graph("src/my_module.erl") # auto-detected
# 来自预解析 AST(用于 Credo/ExDNA 集成)
{:ok, ast} = Code.string_to_quoted(source)
{:ok, graph} = Reach.ast_to_graph(ast)
# 来自已编译的 BEAM 字节码(看到宏展开后的代码)
{:ok, graph} = Reach.module_to_graph(MyApp.Accounts)
{:ok, graph} = Reach.compiled_to_graph(source)
# Bang 变体
graph = Reach.file_to_graph!("lib/my_module.ex")
```
BEAM 前端捕获了源代码分析无法获取的信息 ——
`use GenServer` 回调、宏展开后的 `try/rescue`、生成的函数:
```
# 来源:仅看到 init/1 和 handle_call/3
Reach.string_to_graph(genserver_source)
# BEAM:也看到 child_spec/1、terminate/2、handle_info/2
Reach.compiled_to_graph(genserver_source)
```
## 多文件项目分析
```
# 分析整个 Mix 项目
project = Reach.Project.from_mix_project()
# 或特定路径
project = Reach.Project.from_glob("lib/**/*.ex")
# 跨模块污点分析
Reach.Project.taint_analysis(project,
sources: [type: :call, function: :params],
sinks: [type: :call, module: System, function: :cmd]
)
# 外部模块的依赖项摘要
summaries = Reach.Project.summarize_dependency(Ecto.Adapters.SQL)
project = Reach.Project.from_mix_project(summaries: summaries)
```
## API 参考
### 节点
```
Reach.nodes(graph) # all nodes
Reach.nodes(graph, type: :call) # by type
Reach.nodes(graph, type: :call, module: Enum) # by module
Reach.nodes(graph, type: :call, module: Repo, function: :insert, arity: 1)
node = Reach.node(graph, node_id)
node.type #=> :call
node.meta #=> %{module: Repo, function: :insert, arity: 1}
node.source_span #=> %{file: "lib/accounts.ex", start_line: 12, ...}
```
### 切片
```
Reach.backward_slice(graph, node_id) # what affects this?
Reach.forward_slice(graph, node_id) # what does this affect?
Reach.chop(graph, source_id, sink_id) # how does A influence B?
```
### 数据流与独立性
```
Reach.data_flows?(graph, source_id, sink_id)
Reach.depends?(graph, id_a, id_b)
Reach.independent?(graph, id_a, id_b)
Reach.has_dependents?(graph, node_id)
Reach.passes_through?(graph, source_id, sink_id, &predicate/1)
```
### 效应
涵盖了 `Enum`、`Map`、`List`、`String`、`Keyword`、`Tuple`、`Integer`、
`Float`、`Atom`、`MapSet`、`Range`、`Regex`、`URI`、`Path`、`Base` 以及
Erlang 等价类型。`Enum.each` 被正确归类为不纯函数。
### 依赖关系
```
Reach.control_deps(graph, node_id) #=> [{controller_id, label}, ...]
Reach.data_deps(graph, node_id) #=> [{source_id, :variable_name}, ...]
Reach.edges(graph) # all dependence edges
```
### 过程间分析
```
Reach.function_graph(graph, {MyModule, :my_function, 2})
Reach.context_sensitive_slice(graph, node_id) # Horwitz-Reps-Binkley
Reach.call_graph(graph) # {module, function, arity} vertices
```
### 污点分析
```
# 关键词过滤器(与 nodes/2 格式相同)
Reach.taint_analysis(graph,
sources: [type: :call, function: :params],
sinks: [type: :call, module: Repo, function: :query],
sanitizers: [type: :call, function: :sanitize]
)
# 谓词也有效
Reach.taint_analysis(graph,
sources: &(&1.meta[:function] in [:params, :body_params]),
sinks: [type: :call, module: System, function: :cmd]
)
#=> [%{source: node, sink: node, path: [node_id, ...], sanitized: boolean}]
```
### 死代码
```
Reach.dead_code(graph)
#=> [%Reach.IR.Node{type: :call, meta: %{function: :upcase}}, ...]
```
### 规范排序
```
Reach.canonical_order(graph, block_id)
#=> [{node_id, %Reach.IR.Node{}}, ...] sorted so independent
# 兄弟节点具有确定性顺序,与来源顺序无关
```
### 邻居
```
Reach.neighbors(graph, node_id) # all direct neighbors
Reach.neighbors(graph, node_id, :state_read) # only state_read edges
Reach.neighbors(graph, node_id, :data) # matches {:data, _} labels
```
### 原始图访问
```
raw = Reach.to_graph(graph) # returns the underlying libgraph Graph.t()
Graph.vertices(raw) # use any libgraph function
Graph.get_shortest_path(raw, id_a, id_b)
```
### 导出
```
{:ok, dot} = Reach.to_dot(graph)
File.write!("graph.dot", dot)
# dot -Tpng graph.dot -o graph.png
```
## 边类型
| 标签 | 来源 | 含义 |
|------|------|------|
| `{:data, var}` | DDG | 数据通过变量 `var` 流动 |
| `:containment` | DDG | 父表达式依赖于子表达式 |
| `{:control, label}` | CDG | 执行由分支条件控制 |
| `:call` | SDG | 调用位置到被调用函数 |
| `:parameter_in` | SDG | 参数流向形式参数 |
| `:parameter_out` | SDG | 返回值流回调用者 |
| `:summary` | SDG | 快捷方式:参数流向返回值 |
| `:state_read` | OTP | GenServer 回调读取状态参数 |
| `:state_pass` | OTP | 状态在连续回调之间流动 |
| `{:ets_dep, table}` | OTP | ETS 写入 → 同一表的读取 |
| `{:pdict_dep, key}` | OTP | 进程字典放入 → 同一键的获取 |
| `:message_order` | OTP | 对同一 pid 的顺序发送 |
| `:monitor_down` | 并发 | `Process.monitor` → `handle_info({:DOWN, ...})` |
| `:trap_exit` | 并发 | `Process.flag(:trap_exit)` → `handle_info({:EXIT, ...})` |
| `:link_exit` | 并发 | `spawn_link` / `Process.link` → `:EXIT` 处理器 |
| `:task_result` | 并发 | `Task.async` → `Task.await` 数据流 |
| `:startup_order` | 并发 | 监督子进程 A 在子进程 B 之前启动 |
| `{:message_content, tag}` | OTP | `send(pid, {tag, data})` 负载 → 处理模式变量 |
| `:call_reply` | OTP | `{:reply, value, state}` → `GenServer.call` 返回 |
| `:match_binding` | DDG | 匹配右侧流向左侧定义变量 |
| `:higher_order` | SDG | 参数通过高阶函数流向结果 |
## 架构
```
graph TD
Source["Source (.ex / .erl / .beam)"] --> Frontend
Frontend["Frontend → IR Nodes
Elixir AST · Erlang abstract forms · BEAM bytecode"] --> CFG CFG["Control Flow Graph
entry/exit · branching · exceptions"] --> Dom Dom["Dominators
post-dominator tree · dominance frontier"] --> CD CD["Control Dependence
Ferrante et al. 1987"] --> PDG CFG --> DD DD["Data Dependence
def-use chains · containment edges"] --> PDG PDG["Program Dependence Graph
control ∪ data"] --> SDG SDG["System Dependence Graph + OTP + Concurrency
call/summary edges · GenServer · ETS · monitors · tasks"] --> Query Query["Query / Effects
slicing · independence · taint · purity"] ``` ## 性能 在真实项目上进行基准测试(单线程,Apple M1 Pro): | 项目 | 文件数 | 函数数 | IR 节点数 | 耗时 | 每文件耗时 | |------|--------|--------|-----------|------|------------| | ex_slop | 26 | 109 | 5,186 | 36ms | 1.4 | | ex_dna | 32 | 208 | 10,649 | 87ms | 2.7 | | Livebook | 72 | 589 | 22,156 | 160ms | 2.2 | | Oban | 64 | 606 | 31,413 | 195ms | 3.0 | | Keila | 190 | 1,074 | 49,354 | 282ms | 1.5 | | Phoenix | 74 | 1,090 | 48,292 | 333ms | 4.5 | | Absinthe | 282 | 1,411 | 68,416 | 375ms | 1.3 | 740 个文件,零崩溃。 ## 参考 - Ferrante, Ottenstein, Warren — *The Program Dependence Graph and Its Use in Optimization* (1987) - Horwitz, Reps, Binkley — *Interprocedural Slicing Using Dependence Graphs* (1990) - Silva, Tamarit, Tomás — *System Dependence Graphs for Erlang Programs* (2012) - Cooper, Harvey, Kennedy — *A Simple, Fast Dominance Algorithm* (2001) ## 许可证 [MIT](LICENSE)
Elixir AST · Erlang abstract forms · BEAM bytecode"] --> CFG CFG["Control Flow Graph
entry/exit · branching · exceptions"] --> Dom Dom["Dominators
post-dominator tree · dominance frontier"] --> CD CD["Control Dependence
Ferrante et al. 1987"] --> PDG CFG --> DD DD["Data Dependence
def-use chains · containment edges"] --> PDG PDG["Program Dependence Graph
control ∪ data"] --> SDG SDG["System Dependence Graph + OTP + Concurrency
call/summary edges · GenServer · ETS · monitors · tasks"] --> Query Query["Query / Effects
slicing · independence · taint · purity"] ``` ## 性能 在真实项目上进行基准测试(单线程,Apple M1 Pro): | 项目 | 文件数 | 函数数 | IR 节点数 | 耗时 | 每文件耗时 | |------|--------|--------|-----------|------|------------| | ex_slop | 26 | 109 | 5,186 | 36ms | 1.4 | | ex_dna | 32 | 208 | 10,649 | 87ms | 2.7 | | Livebook | 72 | 589 | 22,156 | 160ms | 2.2 | | Oban | 64 | 606 | 31,413 | 195ms | 3.0 | | Keila | 190 | 1,074 | 49,354 | 282ms | 1.5 | | Phoenix | 74 | 1,090 | 48,292 | 333ms | 4.5 | | Absinthe | 282 | 1,411 | 68,416 | 375ms | 1.3 | 740 个文件,零崩溃。 ## 参考 - Ferrante, Ottenstein, Warren — *The Program Dependence Graph and Its Use in Optimization* (1987) - Horwitz, Reps, Binkley — *Interprocedural Slicing Using Dependence Graphs* (1990) - Silva, Tamarit, Tomás — *System Dependence Graphs for Erlang Programs* (2012) - Cooper, Harvey, Kennedy — *A Simple, Fast Dominance Algorithm* (2001) ## 许可证 [MIT](LICENSE)
标签:Elixir, Erlang, OTP, WebSocket, 云安全监控, 代码安全, 代码路径分析, 代码重构, 依赖分析, 克隆检测, 函数式编程, 死代码检测, 源码分析, 漏洞枚举, 独立性分析, 程序依赖图, 自定义脚本, 静态分析