droidsaw/droidsaw-common

GitHub: droidsaw/droidsaw-common

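The format-independent algorithm core of a decompiler pipeline. It provides compiler middle-end analysis passes (dominator computation, SSA construction, region structuring) and backs their correctness with differential fuzzing and formal verification.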

Stars: 0 | Forks: 1

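# droidsaw-common

`droidsaw-common` is the workspace's shared algorithm crate. It contains the format-independent middle of the decompiler pipeline (dominators, SSA construction after Braun et al., CC 2013, and region structuring) together with the finding model and byte-encoding primitives that every bundle parser depends on. No I/O, no bytecode knowledge. Pure Rust, BSD-3-Clause license.

**Common hosts the algorithms; bundles host the instructions.** DEX and Hermes use the same dominator pass, the same SSA builder, and the same region structurer. The correctness bar set here applies to both.

## Trait boundaries

The algorithms are generic over what the format crates own:

- Graph algorithms accept `impl Graph` (`src/graph/mod.rs:79`). The trait has exactly four methods: `entry`, `nodes`, `successors`, `predecessors`. No opcodes, no instruction types, no operand semantics.
- SSA construction accepts `impl SsaCfg` plus a register type `R` (`src/ssa/mod.rs`). The caller drives `read_variable` / `write_variable` at every operand use; common does not interpret instructions.
- Region structuring accepts `impl StmtBackend` (`src/region/mod.rs`). The backend supplies `extract_condition`, `emit_block_ops`, `build_if`, `build_while`, `build_try_catch`, and so on. Common owns the tree shape and the descent order; the backend owns every language-specific string.

What never goes here:

- Format parsers (DEX, HBC, AXML, ZIP, MUTF-8 string tables).
- Opcode tables and per-format encyclopedic data.
- I/O (with one scoped exception: `diag` writes panic repro bundles from a `panic::set_hook` handler).

## Modules

| Module | Purpose |
|---|---|
| `graph` | RPO, Cooper-Harvey-Kennedy dominators, post-dominators with a virtual exit, natural loops, reducibility, irreducible SCC detection |
| `ssa` | Braun on-the-fly SSA with two-phase iterative name resolution (`Builder::read_variable` / `write_variable` / `seal_phis`) |
| `region` | Region IR (`RegionNode`), structurer (`build_region_tree`), lowerer (`lower_region`), depth cap `MAX_REGION_DEPTH = 512` |
| `signature` | Recognizer engine for inversion-driven decompilation (per-opcode fingerprints live in the bundle crates) |
| `encoding` | Little-endian readers, ULEB128 / SLEB128, MUTF-8, base64, UTF-8-safe truncation. Every reader returns `Result<_, EncodingError>` and bounds-checks before any slice access |
| `entropy` | Shannon entropy plus sliding-window `entropy_profile` (`Encrypted` / `Compressed` / `PackedPayload` / `Plaintext`) |
| `strings` | ASCII run extraction and entropy classification (`Text` < 3.5 < `Structured` < 4.2 < `Mixed` < 4.8 < `HighEntropy` < 5.5 ≤ `RandomLooking`) |
| `finding` | `Finding`, `Severity` (`Critical < High < Medium < Low < Info`), `Layer`, `GaugeClass` (`Cryptographic` / `Semantic` / `Representational`) |
| `analysis` | `TaintSource`, `TaintSink`, `TaintFinding`: the cross-layer taint vocabulary |
| `provenance` | `Provenance`, `SdkId`, `AssetFormat`: type-only carriers used for cross-detector suppression |
| `crypto` | `KeyMaterial`, `CryptoExtractor` trait: the shared shape for batch GCD / ROCA across layers |
| `budget` | `ParseBudget` (memory + step count + wall-clock time), `BudgetExhausted` |
| `guard` | `bound_count`: cross-checks every count read during parsing against the slice length before `Vec::with_capacity(n)` |
| `diag` | Panic-hook repro bundles, per-thread stage dumps (`DROIDSAW_CRASH_DIR`, `DROIDSAW_DUMP`) |
| `threat_model` | Collection metadata (RFC-3339 timestamps via chrono) |
| `telemetry` | `SilentSkipCounters`: observability for silent-skip code paths |

## Layered oracle differential testing

Three production algorithms in this crate run against textbook differential oracles behind fuzz targets. The oracle is the only detector for the **silently wrong answer without a crash** bug class; panic-fuzz and ASAN cannot observe semantically wrong output.

| Production | Oracle | Reference | Fuzz target |
|---|---|---|---|
| Cooper-Harvey-Kennedy dominators (`src/graph/dominators.rs`) | O(N³) reachability-based (`src/graph/naive_oracle.rs`) | SAS 2001 | `fuzz/fuzz_targets/dominators_differential.rs` |
| Braun on-the-fly SSA (`src/ssa/mod.rs`) | CFRWZ post-pass via dominance frontiers (`src/ssa/naive_oracle.rs`) | TOPLAS 1991 | `fuzz/fuzz_targets/ssa_differential.rs` |
| `build_region_tree` recursive descent (`src/region/mod.rs`) | Cifuentes reducible-CFG structurer (`src/region/naive_oracle.rs`) | Cifuentes 1994, ch. 6 | `fuzz/fuzz_targets/region_differential.rs` |

The pattern is the same in every case: production code and oracle share no code, use different algorithm families, and the fuzz target asserts equivalence on every libFuzzer-generated input. Precedent and lineage are recorded in the module-level doc comments. The dominator oracle was added after a real silently-wrong dominator incident slipped past clippy, ASAN, and panic-fuzz (`src/graph/naive_oracle.rs:18-21`).

A fourth fuzz target, `fuzz_entropy_profile`, verifies that `entropy_profile` does not panic on arbitrary byte input.

## Property tests

The proptest cases live next to the algorithms they cover:

- `tests/graph_proptest.rs`: 7 proptests against `Graph`. RPO length equals the number of reachable nodes; RPO is a topological order over non-back edges; the entry dominates every reachable node; dominance is reflexive, antisymmetric, and transitive; dominators are deterministic.
- `tests/proptest_invariants.rs`: `Graph::successors` / `predecessors` symmetry; SSA phi argument count matches predecessor count; SSA phi argument variables are defined at the corresponding predecessor.
- `tests/region_invariants.rs`: a `Sequence` never directly contains another `Sequence`; loop headers stay inside their loop body; the basic-block partition covers the scope (nothing lost, nothing duplicated except self-loop headers).
- `tests/base64_proptest.rs`: `base64_decode` rejects `len % 4 == 1` and accepts every other length class.
- `tests/region_depth_cap.rs`, `tests/region_builder.rs`: `MAX_REGION_DEPTH = 512` enforcement under adversarial nesting.
- `tests/finding_global_cap.rs`, `tests/keymaterial_modulus_cap.rs`: caps on finding count and RSA modulus size.

Local verification:

```
cargo test -p droidsaw-common
```

## Kani proofs

The Tier-1 proof bodies live in `proofs/`, are gated by `#[cfg(kani)]`, and are invisible to normal builds. There are 17 harnesses across 7 files:

| File | Harnesses | Property |
|---|---|---|
| `leb128_roundtrip.rs` | 4 | `write_uleb128 ∘ read_uleb128 ≡ id` for all `u32`; symmetric for `i32`; encoded length is bounded |
| `mutf8_codepoint.rs` | 4 | `decode_one_codepoint` round-trips through the Rust standard library's UTF-8 / UTF-16 encoders (an independent oracle: different author, different bit-extraction strategy) |
| `agaps.rs` | 5 | Negative proofs: adversarial gaps in the current verification suite (cumulative amplification across `bound_count` calls, etc.) |
| `base64_capacity.rs` | 1 | The `base64_decode` capacity hint never overflows and is bounded by the input length |
| `base64_decode_quartet.rs` | 1 | Totality of base64 quartet decoding |
| `is_string_byte_ascii_subset.rs` | 1 | Correctness of the ASCII-subset predicate |
| `read_lp.rs` | 1 | Length-prefixed string reads never go out of bounds |

These harnesses target specification invariants that the lint baseline (`clippy::unwrap_used` + `clippy::indexing_slicing` + `clippy::arithmetic_side_effects`) cannot yet prove.

## Lean proofs

The 16 theorems in the `droidsaw-lean` workspace crate cover structural facts about the graph algorithms that the type system cannot enforce:

- **Dominators**: `Dominates.refl`, `Dominates.entry`, `Dominates.trans`, `Dominates.antisymm`, `Dominates.total`, `immediate_dom_unique`.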
- **Post-dominators**: `PostDominates.refl`, `PostDominates.antisymm`, `immediate_postdom_unique`; soundness of replacing the old `u32::MAX` sentinel with `PostDom::Node(N) | PostDom::Exit`.
- **Paths**: 6 path lemmas on vertex membership and path composition, used by the dominator proofs.
- **Lattice**: `propagation_monotonic`, the Kleene ascending-chain result that underpins data-flow fixed-point iteration.

## Public interface

The crate root re-exports the key identifiers (`src/lib.rs:77-101`):

```
pub use graph::{
    Graph, NaturalLoop, PostDom, back_edges, dominates, dominators,
    dominators_with_rpo, is_reducible, natural_loops, post_dominators,
    post_dominators_with_virtual_exit, reverse_post_order, /* … */
};
pub use ssa::{ /* via module: Builder, SsaCfg, … */ };
pub use region::{ /* via module: RegionNode, StmtBackend, RegionError, … */ };
pub use encoding::{
    EncodingError, base64_decode, decode_mutf8, decode_one_codepoint,
    read_sleb128, read_uleb128, truncate_str, u16_le, u32_le, u64_le,
};
pub use finding::{
    Confidence, Finding, FindingProvenance, GaugeClass, Layer, Severity,
    Source, /* … */
};
```

Every public type implements `serde::Serialize` + `serde::Deserialize` so that it can cross the SQLite / NDJSON export boundary.

## Compile-time baseline

The crate root pins the lint discipline for the whole workspace's algorithm core (`src/lib.rs:3-32`):

```
#![cfg_attr(not(test), deny(
    clippy::unwrap_used,
    clippy::expect_used,
    clippy::panic,
    clippy::unreachable,
    clippy::todo,
    clippy::arithmetic_side_effects,
    clippy::indexing_slicing,
    clippy::string_slice,
    clippy::cast_lossless,
    clippy::cast_possible_truncation,
    clippy::cast_sign_loss,
    clippy::cast_precision_loss,
    clippy::cast_possible_wrap,
    /* … */
))]
#![deny(missing_docs)]
#![deny(unsafe_code)]
```

Three modules carry a scoped `#![allow(clippy::arithmetic_side_effects, reason = "…")]` whose reason string records the discharged proof obligation: `encoding` (LEB128 / base64 / MUTF-8 / ASCII case mapping), `region` (DFS cursor, label counter, depth cap), and `diag` (Hinnant date math, dump-stage counter). Everywhere else follows the `checked_*` / `?` discipline.

## License

BSD-3-Clause.
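## Sketch: differential oracle for dominators

The differential-testing pattern described above (a fast algorithm checked against an independent, obviously-correct one) can be illustrated with a minimal, self-contained sketch. Nothing below is taken from the crate. The `Graph` trait here only mirrors the four method names the README lists, and its signatures are assumptions; `AdjGraph`, `reachable`, `dominators_iterative`, and `dominators_naive` are hypothetical names. The "production" side uses the classic set-based iterative data-flow formulation as a stand-in for Cooper-Harvey-Kennedy, and the oracle uses node-removal reachability.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the crate's `Graph` trait (signatures assumed).
trait Graph {
    fn entry(&self) -> usize;
    fn nodes(&self) -> Vec<usize>;
    fn successors(&self, n: usize) -> Vec<usize>;
    fn predecessors(&self, n: usize) -> Vec<usize>;
}

/// Adjacency-list CFG used only for this sketch; node ids are `0..succ.len()`.
struct AdjGraph {
    entry: usize,
    succ: Vec<Vec<usize>>,
}

impl Graph for AdjGraph {
    fn entry(&self) -> usize { self.entry }
    fn nodes(&self) -> Vec<usize> { (0..self.succ.len()).collect() }
    fn successors(&self, n: usize) -> Vec<usize> { self.succ[n].clone() }
    fn predecessors(&self, n: usize) -> Vec<usize> {
        (0..self.succ.len()).filter(|&p| self.succ[p].contains(&n)).collect()
    }
}

/// Nodes reachable from the entry, optionally treating `removed` as deleted.
fn reachable(g: &impl Graph, removed: Option<usize>) -> BTreeSet<usize> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![g.entry()];
    while let Some(n) = stack.pop() {
        if Some(n) == removed || !seen.insert(n) { continue; }
        stack.extend(g.successors(n));
    }
    seen
}

/// "Production" stand-in: dom(v) = {v} ∪ ⋂ dom(p) over reachable preds p,
/// iterated to a fixed point. Unreachable nodes get an empty set.
fn dominators_iterative(g: &impl Graph) -> Vec<BTreeSet<usize>> {
    let live = reachable(g, None);
    let mut dom = vec![BTreeSet::new(); g.nodes().len()];
    for &v in &live {
        dom[v] = if v == g.entry() { BTreeSet::from([v]) } else { live.clone() };
    }
    let mut changed = true;
    while changed {
        changed = false;
        for &v in &live {
            if v == g.entry() { continue; }
            let mut acc: Option<BTreeSet<usize>> = None;
            for p in g.predecessors(v) {
                if !live.contains(&p) { continue; }
                acc = Some(match acc {
                    None => dom[p].clone(),
                    Some(a) => a.intersection(&dom[p]).copied().collect(),
                });
            }
            let mut next = acc.unwrap_or_default();
            next.insert(v);
            if next != dom[v] { dom[v] = next; changed = true; }
        }
    }
    dom
}

/// Oracle: d dominates v iff v is reachable, but not once d is removed.
fn dominators_naive(g: &impl Graph) -> Vec<BTreeSet<usize>> {
    let live = reachable(g, None);
    let mut dom = vec![BTreeSet::new(); g.nodes().len()];
    for &d in &live {
        let without_d = reachable(g, Some(d));
        for &v in &live {
            if !without_d.contains(&v) { dom[v].insert(d); }
        }
    }
    dom
}

fn main() {
    // Irreducible loop {1, 3} entered from both 0->1 and 2->3; node 4 is unreachable.
    let g = AdjGraph { entry: 0, succ: vec![vec![1, 2], vec![3], vec![3], vec![1], vec![3]] };
    assert_eq!(dominators_iterative(&g), dominators_naive(&g));
    println!("dominator sets agree: {:?}", dominators_naive(&g));
}
```

In a fuzz target, the hand-written graph in `main` would be replaced by one decoded from the fuzzer's byte input, with the same equality assertion. The two functions share only the `reachable` helper here for brevity; the README states that the real production code and oracle share no code at all.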

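Tags: Rust, cloud security monitoring, cloud asset inventory, decompiler, visual interface, control-flow analysis, compiler intermediate representation, network traffic auditing, reverse engineering, notification system, static analysis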