dannote/reach

GitHub: dannote/reach

Reach 为 Elixir/Erlang 项目构建程序依赖图,解决代码依赖关系不清与安全数据流难以追踪的问题。

Stars: 0 | Forks: 0

# 到达 Elixir 和 Erlang 的程序依赖图。 Reach 构建一个图,捕捉**你的代码依赖于什么**。 对于任意表达式,你可以询问:什么会影响它?它会影响什么? 这两个语句可以安全地重排吗?用户输入在没有经过净化的情况下会到达这个 数据库调用吗? ## 使用场景 ### 安全性:污点分析 用户输入在没有经过净化的情况下会到达危险的数据接收器吗? ``` graph = Reach.file_to_graph!("lib/my_app_web/controllers/user_controller.ex") Reach.taint_analysis(graph, sources: [type: :call, function: :params], sinks: [type: :call, module: System, function: :cmd, arity: 2], sanitizers: [type: :call, function: :sanitize] ) #=> [%{source: ..., sink: ..., path: [...], sanitized: false}] ``` ### 代码质量:死代码检测 ``` graph = Reach.file_to_graph!("lib/accounts.ex") for node <- Reach.dead_code(graph) do IO.warn("Dead code at #{inspect(node.source_span)}: #{node.type}") end ``` ### 重构:安全重排 ``` graph = Reach.file_to_graph!("lib/pipeline.ex") for block <- Reach.nodes(graph, type: :block), {a, b} <- pairs(block.children) do if Reach.independent?(graph, a.id, b.id) do IO.puts("Statements at lines #{a.source_span.start_line} and " <> "#{b.source_span.start_line} can be safely reordered") end end ``` ### 克隆检测:重排等价代码 如果语句是独立的,那么具有相同语句但不同顺序的两个块是克隆体。 `canonical_order` 会根据结构哈希对独立的兄弟节点进行排序,使得两种顺序产生相同的序列: ``` # 用于 ExDNA 集成 order = Reach.canonical_order(graph, block_node_id) hash = :erlang.phash2(Enum.map(order, fn {_, node} -> node end)) ``` ### OTP:GenServer 状态流分析 ``` graph = Reach.file_to_graph!("lib/my_server.ex") edges = Reach.edges(graph) # 哪些回调读取状态? state_reads = Enum.filter(edges, &(&1.label == :state_read)) # 状态在回调之间流动吗? state_passes = Enum.filter(edges, &(&1.label == :state_pass)) # ETS 写后读依赖 ets_deps = Enum.filter(edges, &match?({:ets_dep, _table}, &1.label)) ``` ### 并发:崩溃传播 ``` graph = Reach.file_to_graph!("lib/my_supervisor.ex") edges = Reach.edges(graph) # 哪些监控连接到哪个 :DOWN 处理程序? Enum.filter(edges, &(&1.label == :monitor_down)) # 此进程是否捕获退出?哪个处理程序接收它们? Enum.filter(edges, &(&1.label == :trap_exit)) # Task.async → Task.await 数据流 Enum.filter(edges, &(&1.label == :task_result)) # 监督子进程启动顺序 Enum.filter(edges, &(&1.label == :startup_order)) ``` ## 安装 ``` def deps do [{:reach, "~> 1.0"}] end ``` ## 构建图 ``` # 来自 Elixir 源代码 {:ok, graph} = Reach.string_to_graph("def foo(x), do: x + 1") {:ok, graph} = Reach.file_to_graph("lib/my_module.ex") # 来自 Erlang 源代码 {:ok, graph} = Reach.string_to_graph(source, language: :erlang) {:ok, graph} = Reach.file_to_graph("src/my_module.erl") # auto-detected # 来自预解析 AST(用于 Credo/ExDNA 集成) {:ok, ast} = Code.string_to_quoted(source) {:ok, graph} = Reach.ast_to_graph(ast) # 来自已编译的 BEAM 字节码(看到宏展开后的代码) {:ok, graph} = Reach.module_to_graph(MyApp.Accounts) {:ok, graph} = Reach.compiled_to_graph(source) # Bang 变体 graph = Reach.file_to_graph!("lib/my_module.ex") ``` BEAM 前端捕获了源代码分析无法获取的信息 —— `use GenServer` 回调、宏展开后的 `try/rescue`、生成的函数: ``` # 来源:仅看到 init/1 和 handle_call/3 Reach.string_to_graph(genserver_source) # BEAM:也看到 child_spec/1、terminate/2、handle_info/2 Reach.compiled_to_graph(genserver_source) ``` ## 多文件项目分析 ``` # 分析整个 Mix 项目 project = Reach.Project.from_mix_project() # 或特定路径 project = Reach.Project.from_glob("lib/**/*.ex") # 跨模块污点分析 Reach.Project.taint_analysis(project, sources: [type: :call, function: :params], sinks: [type: :call, module: System, function: :cmd] ) # 外部模块的依赖项摘要 summaries = Reach.Project.summarize_dependency(Ecto.Adapters.SQL) project = Reach.Project.from_mix_project(summaries: summaries) ``` ## API 参考 ### 节点 ``` Reach.nodes(graph) # all nodes Reach.nodes(graph, type: :call) # by type Reach.nodes(graph, type: :call, module: Enum) # by module Reach.nodes(graph, type: :call, module: Repo, function: :insert, arity: 1) node = Reach.node(graph, node_id) node.type #=> :call node.meta #=> %{module: Repo, function: :insert, arity: 1} node.source_span #=> %{file: "lib/accounts.ex", start_line: 12, ...} ``` ### 切片 ``` Reach.backward_slice(graph, node_id) # what affects this? Reach.forward_slice(graph, node_id) # what does this affect? Reach.chop(graph, source_id, sink_id) # how does A influence B? ``` ### 数据流与独立性 ``` Reach.data_flows?(graph, source_id, sink_id) Reach.depends?(graph, id_a, id_b) Reach.independent?(graph, id_a, id_b) Reach.has_dependents?(graph, node_id) Reach.passes_through?(graph, source_id, sink_id, &predicate/1) ``` ### 效应 涵盖了 `Enum`、`Map`、`List`、`String`、`Keyword`、`Tuple`、`Integer`、 `Float`、`Atom`、`MapSet`、`Range`、`Regex`、`URI`、`Path`、`Base` 以及 Erlang 等价类型。`Enum.each` 被正确归类为不纯函数。 ### 依赖关系 ``` Reach.control_deps(graph, node_id) #=> [{controller_id, label}, ...] Reach.data_deps(graph, node_id) #=> [{source_id, :variable_name}, ...] Reach.edges(graph) # all dependence edges ``` ### 过程间分析 ``` Reach.function_graph(graph, {MyModule, :my_function, 2}) Reach.context_sensitive_slice(graph, node_id) # Horwitz-Reps-Binkley Reach.call_graph(graph) # {module, function, arity} vertices ``` ### 污点分析 ``` # 关键词过滤器(与 nodes/2 格式相同) Reach.taint_analysis(graph, sources: [type: :call, function: :params], sinks: [type: :call, module: Repo, function: :query], sanitizers: [type: :call, function: :sanitize] ) # 谓词也有效 Reach.taint_analysis(graph, sources: &(&1.meta[:function] in [:params, :body_params]), sinks: [type: :call, module: System, function: :cmd] ) #=> [%{source: node, sink: node, path: [node_id, ...], sanitized: boolean}] ``` ### 死代码 ``` Reach.dead_code(graph) #=> [%Reach.IR.Node{type: :call, meta: %{function: :upcase}}, ...] ``` ### 规范排序 ``` Reach.canonical_order(graph, block_id) #=> [{node_id, %Reach.IR.Node{}}, ...] sorted so independent # 兄弟节点具有确定性顺序,与来源顺序无关 ``` ### 邻居 ``` Reach.neighbors(graph, node_id) # all direct neighbors Reach.neighbors(graph, node_id, :state_read) # only state_read edges Reach.neighbors(graph, node_id, :data) # matches {:data, _} labels ``` ### 原始图访问 ``` raw = Reach.to_graph(graph) # returns the underlying libgraph Graph.t() Graph.vertices(raw) # use any libgraph function Graph.get_shortest_path(raw, id_a, id_b) ``` ### 导出 ``` {:ok, dot} = Reach.to_dot(graph) File.write!("graph.dot", dot) # dot -Tpng graph.dot -o graph.png ``` ## 边类型 | 标签 | 来源 | 含义 | |------|------|------| | `{:data, var}` | DDG | 数据通过变量 `var` 流动 | | `:containment` | DDG | 父表达式依赖于子表达式 | | `{:control, label}` | CDG | 执行由分支条件控制 | | `:call` | SDG | 调用位置到被调用函数 | | `:parameter_in` | SDG | 参数流向形式参数 | | `:parameter_out` | SDG | 返回值流回调用者 | | `:summary` | SDG | 快捷方式:参数流向返回值 | | `:state_read` | OTP | GenServer 回调读取状态参数 | | `:state_pass` | OTP | 状态在连续回调之间流动 | | `{:ets_dep, table}` | OTP | ETS 写入 → 同一表的读取 | | `{:pdict_dep, key}` | OTP | 进程字典放入 → 同一键的获取 | | `:message_order` | OTP | 对同一 pid 的顺序发送 | | `:monitor_down` | 并发 | `Process.monitor` → `handle_info({:DOWN, ...})` | | `:trap_exit` | 并发 | `Process.flag(:trap_exit)` → `handle_info({:EXIT, ...})` | | `:link_exit` | 并发 | `spawn_link` / `Process.link` → `:EXIT` 处理器 | | `:task_result` | 并发 | `Task.async` → `Task.await` 数据流 | | `:startup_order` | 并发 | 监督子进程 A 在子进程 B 之前启动 | | `{:message_content, tag}` | OTP | `send(pid, {tag, data})` 负载 → 处理模式变量 | | `:call_reply` | OTP | `{:reply, value, state}` → `GenServer.call` 返回 | | `:match_binding` | DDG | 匹配右侧流向左侧定义变量 | | `:higher_order` | SDG | 参数通过高阶函数流向结果 | ## 架构 ``` graph TD Source["Source (.ex / .erl / .beam)"] --> Frontend Frontend["Frontend → IR Nodes
Elixir AST · Erlang abstract forms · BEAM bytecode"] --> CFG CFG["Control Flow Graph
entry/exit · branching · exceptions"] --> Dom Dom["Dominators
post-dominator tree · dominance frontier"] --> CD CD["Control Dependence
Ferrante et al. 1987"] --> PDG CFG --> DD DD["Data Dependence
def-use chains · containment edges"] --> PDG PDG["Program Dependence Graph
control ∪ data"] --> SDG SDG["System Dependence Graph + OTP + Concurrency
call/summary edges · GenServer · ETS · monitors · tasks"] --> Query Query["Query / Effects
slicing · independence · taint · purity"] ``` ## 性能 在真实项目上进行基准测试(单线程,Apple M1 Pro): | 项目 | 文件数 | 函数数 | IR 节点数 | 耗时 | 每文件耗时 | |------|--------|--------|-----------|------|------------| | ex_slop | 26 | 109 | 5,186 | 36ms | 1.4 | | ex_dna | 32 | 208 | 10,649 | 87ms | 2.7 | | Livebook | 72 | 589 | 22,156 | 160ms | 2.2 | | Oban | 64 | 606 | 31,413 | 195ms | 3.0 | | Keila | 190 | 1,074 | 49,354 | 282ms | 1.5 | | Phoenix | 74 | 1,090 | 48,292 | 333ms | 4.5 | | Absinthe | 282 | 1,411 | 68,416 | 375ms | 1.3 | 740 个文件,零崩溃。 ## 参考 - Ferrante, Ottenstein, Warren — *The Program Dependence Graph and Its Use in Optimization* (1987) - Horwitz, Reps, Binkley — *Interprocedural Slicing Using Dependence Graphs* (1990) - Silva, Tamarit, Tomás — *System Dependence Graphs for Erlang Programs* (2012) - Cooper, Harvey, Kennedy — *A Simple, Fast Dominance Algorithm* (2001) ## 许可证 [MIT](LICENSE)
标签:Elixir, Erlang, OTP, WebSocket, 云安全监控, 代码安全, 代码路径分析, 代码重构, 依赖分析, 克隆检测, 函数式编程, 死代码检测, 源码分析, 漏洞枚举, 独立性分析, 程序依赖图, 自定义脚本, 静态分析