dannote/reach

GitHub: elixir-vibe/reach

Reach 为 Elixir/Erlang 项目构建程序依赖图，解决代码依赖关系不清与安全数据流难以追踪的问题。

Stars: 160 | Forks: 11

# 到达 Elixir 和 Erlang 的程序依赖图。 Reach 构建一个图，捕捉**你的代码依赖于什么**。对于任意表达式，你可以询问：什么会影响它？它会影响什么？这两个语句可以安全地重排吗？用户输入在没有经过净化的情况下会到达这个数据库调用吗？ ## 使用场景 ### 安全性：污点分析用户输入在没有经过净化的情况下会到达危险的数据接收器吗？ ``` graph = Reach.file_to_graph!("lib/my_app_web/controllers/user_controller.ex") Reach.taint_analysis(graph, sources: [type: :call, function: :params], sinks: [type: :call, module: System, function: :cmd, arity: 2], sanitizers: [type: :call, function: :sanitize] ) #=> [%{source: ..., sink: ..., path: [...], sanitized: false}] ``` ### 代码质量：死代码检测 ``` graph = Reach.file_to_graph!("lib/accounts.ex") for node <- Reach.dead_code(graph) do IO.warn("Dead code at #{inspect(node.source_span)}: #{node.type}") end ``` ### 重构：安全重排 ``` graph = Reach.file_to_graph!("lib/pipeline.ex") for block <- Reach.nodes(graph, type: :block), {a, b} <- pairs(block.children) do if Reach.independent?(graph, a.id, b.id) do IO.puts("Statements at lines #{a.source_span.start_line} and " <> "#{b.source_span.start_line} can be safely reordered") end end ``` ### 克隆检测：重排等价代码如果语句是独立的，那么具有相同语句但不同顺序的两个块是克隆体。 `canonical_order` 会根据结构哈希对独立的兄弟节点进行排序，使得两种顺序产生相同的序列： ``` # 用于 ExDNA 集成 order = Reach.canonical_order(graph, block_node_id) hash = :erlang.phash2(Enum.map(order, fn {_, node} -> node end)) ``` ### OTP：GenServer 状态流分析 ``` graph = Reach.file_to_graph!("lib/my_server.ex") edges = Reach.edges(graph) # 哪些回调读取状态？ state_reads = Enum.filter(edges, &(&1.label == :state_read)) # 状态在回调之间流动吗？ state_passes = Enum.filter(edges, &(&1.label == :state_pass)) # ETS 写后读依赖 ets_deps = Enum.filter(edges, &match?({:ets_dep, _table}, &1.label)) ``` ### 并发：崩溃传播 ``` graph = Reach.file_to_graph!("lib/my_supervisor.ex") edges = Reach.edges(graph) # 哪些监控连接到哪个 :DOWN 处理程序？ Enum.filter(edges, &(&1.label == :monitor_down)) # 此进程是否捕获退出？哪个处理程序接收它们？ Enum.filter(edges, &(&1.label == :trap_exit)) # Task.async → Task.await 数据流 Enum.filter(edges, &(&1.label == :task_result)) # 监督子进程启动顺序 Enum.filter(edges, &(&1.label == :startup_order)) ``` ## 安装 ``` def deps do [{:reach, "~> 1.0"}] end ``` ## 构建图 ``` # 来自 Elixir 源代码 {:ok, graph} = Reach.string_to_graph("def foo(x), do: x + 1") {:ok, graph} = Reach.file_to_graph("lib/my_module.ex") # 来自 Erlang 源代码 {:ok, graph} = Reach.string_to_graph(source, language: :erlang) {:ok, graph} = Reach.file_to_graph("src/my_module.erl") # auto-detected # 来自预解析 AST（用于 Credo/ExDNA 集成） {:ok, ast} = Code.string_to_quoted(source) {:ok, graph} = Reach.ast_to_graph(ast) # 来自已编译的 BEAM 字节码（看到宏展开后的代码） {:ok, graph} = Reach.module_to_graph(MyApp.Accounts) {:ok, graph} = Reach.compiled_to_graph(source) # Bang 变体 graph = Reach.file_to_graph!("lib/my_module.ex") ``` BEAM 前端捕获了源代码分析无法获取的信息 —— `use GenServer` 回调、宏展开后的 `try/rescue`、生成的函数： ``` # 来源：仅看到 init/1 和 handle_call/3 Reach.string_to_graph(genserver_source) # BEAM：也看到 child_spec/1、terminate/2、handle_info/2 Reach.compiled_to_graph(genserver_source) ``` ## 多文件项目分析 ``` # 分析整个 Mix 项目 project = Reach.Project.from_mix_project() # 或特定路径 project = Reach.Project.from_glob("lib/**/*.ex") # 跨模块污点分析 Reach.Project.taint_analysis(project, sources: [type: :call, function: :params], sinks: [type: :call, module: System, function: :cmd] ) # 外部模块的依赖项摘要 summaries = Reach.Project.summarize_dependency(Ecto.Adapters.SQL) project = Reach.Project.from_mix_project(summaries: summaries) ``` ## API 参考 ### 节点 ``` Reach.nodes(graph) # all nodes Reach.nodes(graph, type: :call) # by type Reach.nodes(graph, type: :call, module: Enum) # by module Reach.nodes(graph, type: :call, module: Repo, function: :insert, arity: 1) node = Reach.node(graph, node_id) node.type #=> :call node.meta #=> %{module: Repo, function: :insert, arity: 1} node.source_span #=> %{file: "lib/accounts.ex", start_line: 12, ...} ``` ### 切片 ``` Reach.backward_slice(graph, node_id) # what affects this? Reach.forward_slice(graph, node_id) # what does this affect? Reach.chop(graph, source_id, sink_id) # how does A influence B? ``` ### 数据流与独立性 ``` Reach.data_flows?(graph, source_id, sink_id) Reach.depends?(graph, id_a, id_b) Reach.independent?(graph, id_a, id_b) Reach.has_dependents?(graph, node_id) Reach.passes_through?(graph, source_id, sink_id, &predicate/1) ``` ### 效应涵盖了 `Enum`、`Map`、`List`、`String`、`Keyword`、`Tuple`、`Integer`、 `Float`、`Atom`、`MapSet`、`Range`、`Regex`、`URI`、`Path`、`Base` 以及 Erlang 等价类型。`Enum.each` 被正确归类为不纯函数。 ### 依赖关系 ``` Reach.control_deps(graph, node_id) #=> [{controller_id, label}, ...] Reach.data_deps(graph, node_id) #=> [{source_id, :variable_name}, ...] Reach.edges(graph) # all dependence edges ``` ### 过程间分析 ``` Reach.function_graph(graph, {MyModule, :my_function, 2}) Reach.context_sensitive_slice(graph, node_id) # Horwitz-Reps-Binkley Reach.call_graph(graph) # {module, function, arity} vertices ``` ### 污点分析 ``` # 关键词过滤器（与 nodes/2 格式相同） Reach.taint_analysis(graph, sources: [type: :call, function: :params], sinks: [type: :call, module: Repo, function: :query], sanitizers: [type: :call, function: :sanitize] ) # 谓词也有效 Reach.taint_analysis(graph, sources: &(&1.meta[:function] in [:params, :body_params]), sinks: [type: :call, module: System, function: :cmd] ) #=> [%{source: node, sink: node, path: [node_id, ...], sanitized: boolean}] ``` ### 死代码 ``` Reach.dead_code(graph) #=> [%Reach.IR.Node{type: :call, meta: %{function: :upcase}}, ...] ``` ### 规范排序 ``` Reach.canonical_order(graph, block_id) #=> [{node_id, %Reach.IR.Node{}}, ...] sorted so independent # 兄弟节点具有确定性顺序，与来源顺序无关 ``` ### 邻居 ``` Reach.neighbors(graph, node_id) # all direct neighbors Reach.neighbors(graph, node_id, :state_read) # only state_read edges Reach.neighbors(graph, node_id, :data) # matches {:data, _} labels ``` ### 原始图访问 ``` raw = Reach.to_graph(graph) # returns the underlying libgraph Graph.t() Graph.vertices(raw) # use any libgraph function Graph.get_shortest_path(raw, id_a, id_b) ``` ### 导出 ``` {:ok, dot} = Reach.to_dot(graph) File.write!("graph.dot", dot) # dot -Tpng graph.dot -o graph.png ``` ## 边类型 | 标签 | 来源 | 含义 | |------|------|------| | `{:data, var}` | DDG | 数据通过变量 `var` 流动 | | `:containment` | DDG | 父表达式依赖于子表达式 | | `{:control, label}` | CDG | 执行由分支条件控制 | | `:call` | SDG | 调用位置到被调用函数 | | `:parameter_in` | SDG | 参数流向形式参数 | | `:parameter_out` | SDG | 返回值流回调用者 | | `:summary` | SDG | 快捷方式：参数流向返回值 | | `:state_read` | OTP | GenServer 回调读取状态参数 | | `:state_pass` | OTP | 状态在连续回调之间流动 | | `{:ets_dep, table}` | OTP | ETS 写入 → 同一表的读取 | | `{:pdict_dep, key}` | OTP | 进程字典放入 → 同一键的获取 | | `:message_order` | OTP | 对同一 pid 的顺序发送 | | `:monitor_down` | 并发 | `Process.monitor` → `handle_info({:DOWN, ...})` | | `:trap_exit` | 并发 | `Process.flag(:trap_exit)` → `handle_info({:EXIT, ...})` | | `:link_exit` | 并发 | `spawn_link` / `Process.link` → `:EXIT` 处理器 | | `:task_result` | 并发 | `Task.async` → `Task.await` 数据流 | | `:startup_order` | 并发 | 监督子进程 A 在子进程 B 之前启动 | | `{:message_content, tag}` | OTP | `send(pid, {tag, data})` 负载 → 处理模式变量 | | `:call_reply` | OTP | `{:reply, value, state}` → `GenServer.call` 返回 | | `:match_binding` | DDG | 匹配右侧流向左侧定义变量 | | `:higher_order` | SDG | 参数通过高阶函数流向结果 | ## 架构 ``` graph TD Source["Source (.ex / .erl / .beam)"] --> Frontend Frontend["Frontend → IR Nodes
Elixir AST · Erlang abstract forms · BEAM bytecode"] --> CFG CFG["Control Flow Graph
entry/exit · branching · exceptions"] --> Dom Dom["Dominators
post-dominator tree · dominance frontier"] --> CD CD["Control Dependence
Ferrante et al. 1987"] --> PDG CFG --> DD DD["Data Dependence
def-use chains · containment edges"] --> PDG PDG["Program Dependence Graph
control ∪ data"] --> SDG SDG["System Dependence Graph + OTP + Concurrency
call/summary edges · GenServer · ETS · monitors · tasks"] --> Query Query["Query / Effects
slicing · independence · taint · purity"] ``` ## 性能在真实项目上进行基准测试（单线程，Apple M1 Pro）： | 项目 | 文件数 | 函数数 | IR 节点数 | 耗时 | 每文件耗时 | |------|--------|--------|-----------|------|------------| | ex_slop | 26 | 109 | 5,186 | 36ms | 1.4 | | ex_dna | 32 | 208 | 10,649 | 87ms | 2.7 | | Livebook | 72 | 589 | 22,156 | 160ms | 2.2 | | Oban | 64 | 606 | 31,413 | 195ms | 3.0 | | Keila | 190 | 1,074 | 49,354 | 282ms | 1.5 | | Phoenix | 74 | 1,090 | 48,292 | 333ms | 4.5 | | Absinthe | 282 | 1,411 | 68,416 | 375ms | 1.3 | 740 个文件，零崩溃。 ## 参考 - Ferrante, Ottenstein, Warren — *The Program Dependence Graph and Its Use in Optimization* (1987) - Horwitz, Reps, Binkley — *Interprocedural Slicing Using Dependence Graphs* (1990) - Silva, Tamarit, Tomás — *System Dependence Graphs for Erlang Programs* (2012) - Cooper, Harvey, Kennedy — *A Simple, Fast Dominance Algorithm* (2001) ## 许可证 [MIT](LICENSE)

标签：Elixir, Erlang, OTP, WebSocket, 云安全监控, 代码安全, 代码路径分析, 代码重构, 依赖分析, 克隆检测, 函数式编程, 死代码检测, 源码分析, 漏洞枚举, 独立性分析, 程序依赖图, 自定义脚本, 静态分析