ForensicFoundry/ualforge

GitHub: ForensicFoundry/ualforge

A deduplication and query tool that parses Microsoft 365 Unified Audit Log CSV exports into a SQLite database, with field indexes optimized for BEC investigations.


# ualforge

Parses Microsoft 365 **Unified Audit Log (UAL)** CSV exports into a single SQLite database, optimized for digital forensics and **Business Email Compromise (BEC)** investigations.

Every row of every CSV export is preserved as-is, the JSON `auditData` blob is stored in both raw (byte-for-byte) and canonical form, and a large number of BEC-relevant fields are promoted into **indexed and normalized** columns for fast querying. Re-importing the same data is safe: a full-row SHA-256 hash serves as the deduplication key.

The companion script `bec-triage` uses this database to generate a BEC triage report. See the `bec-triage` repository [`here`](https://github.com/ForensicFoundry/bec-triage).

## What's new in v2 (schema_version = 2)

The database has `user_version = 2`. See [Migrating from v1](#migrating-from-v1).

- **Comprehensive promoters** - Every field whose name differs across Microsoft `RecordType` schemas (`ClientIP` vs `ClientIPAddress` vs `IPAddress`; `ApplicationId` vs `AppId`; top-level `UserAgent` vs `ExtendedProperties[Name='UserAgent'].Value` vs embedded inside `ActorInfoString`) is now resolved through alias chains and dedicated per-RecordType parsers. This closes the gap for `MailItemsAccessed` (RecordType 50) events, which in v1 ended up with NULL `client_ip` / `user_agent` / `application_id` even though the data was present in the canonical JSON.
- **Identity columns normalized at ingest:**
  - `client_ip` has the port stripped (IPv4 `1.2.3.4:54321` -> `1.2.3.4`) and IPv6 brackets removed; the original value is kept in the new `client_ip_raw` column.
  - `user_principal_name` is lowercased; the original casing is kept in the new `user_principal_name_raw` column. `bec-triage` v2 no longer needs `LOWER(user_principal_name) = LOWER(?)` predicates.
- **Two new structured child tables**, populated at ingest:
  - `mail_items_accessed` - one row per accessed mail item from `auditData.Folders[].FolderItems[]` (Microsoft aggregates multiple reads into each audit row). This lets analysts answer "which InternetMessageIds did the actor read?" with a plain `SELECT`.
  - `consent_grants` - one row per (event, target app) for OAuth consent and service principal events. `app_id`, `app_display_name`, `consent_type`, `is_admin_consent`, and `permission_scope` are first-class indexed columns instead of being buried in the `ModifiedProperties` JSON.
- **Coverage diagnostics.** Every ingest / `--reextract` run computes the fill percentages for `client_ip` / `user_agent` / `application_id` / `user_principal_name`, along with per-message / consent-grant counts, and persists them to `ingest_runs.coverage_json`. `bec-triage` shows this information in its report banner so analysts know up front how complete the data is.
- **`--reextract <DB>` option** re-runs the promoters in place against a database without re-reading any source CSV. This is how a v1 database is upgraded to v2 (or how promoted columns are refreshed after a future promoter improvement): every event row is re-derived from the canonical JSON that ualforge always preserves.

## Requirements

- Python **3.13+** (uses only `from __future__ import annotations` and modern standard-library features, with no third-party dependencies)
- SQLite 3.35+ (ships with any modern Linux/macOS/Windows)

## Optional, but recommended

- [`uv`](https://docs.astral.sh/uv/)

`uv` is my preferred Python package and project manager. I encourage you to consider using it too. Because I use `uv`, the installation instructions assume `uv` is used to create and sync the Python venv.

## Installation

```
git clone https://github.com/ForensicFoundry/ualforge.git
cd ualforge
uv venv --python 3.13 ./.venv
uv sync
chmod +x ualforge
sed -i "1c#!$(pwd)/.venv/bin/python" ualforge
# No runtime dependencies; nothing else to install
```

## Quick start

```
# Import UAL CSVs found anywhere under ./input/ into ./out/ualforge-YYYYMMDD.sqlite
./ualforge -o ./out ./input/

# Same as above, but also write ualforge-YYYYMMDD-HHMMSS.log next to the database
./ualforge -o ./out -l ./input/

# Append more CSVs to an existing ualforge database
./ualforge -a ./out/ualforge-20260427.sqlite ./more-input/
```

If you re-run the same input on the same day, rows are deduplicated, and the on-disk database is left unchanged for any row that already exists.

## Usage reference

```
usage: ualforge [-h] [-v] (-o DIR | -a DB | --reextract DB) [-l] [INPUT_DIR]

positional arguments:
  INPUT_DIR         directory to recursively scan for UAL CSV files
                    (required for -o / -a; ignored for --reextract)

options:
  -h, --help        show this help message and exit
  -v, --version     print version and exit

target (mutually exclusive, exactly one required):
  -o, --output DIR  write to <DIR>/ualforge-YYYYMMDD.sqlite (creates DIR if
                    needed; same-day re-runs append to the same file)
  -a, --append DB   append to a pre-existing ualforge database at PATH (must
                    be a valid ualforge database, verified via PRAGMA
                    application_id + ualforge_meta table + user_version)
  --reextract DB    rebuild promoted columns + child tables in place from
                    the preserved auditData JSON, without re-reading any
                    source CSV. Used to upgrade a v1 database to v2, or to
                    refresh promoted values after a future promoter
                    improvement.

logging:
  -l, --log         write a ualforge-YYYYMMDD-HHMMSS.log next to the
                    database capturing all stdout output for the run
```

Exit codes:

- `0` success (even if some rows had JSON parse errors; those are recorded in `parse_errors`)
- `1` runtime error (input could not be read, database could not be written, etc.)
- `2` invalid CLI usage / append-target validation failure

## Output: the database file

`./out/ualforge-YYYYMMDD.sqlite` is a standard SQLite database with WAL mode enabled. Open it with:

- the `sqlite3` CLI
- DB Browser for SQLite (`sqlitebrowser`)
- any SQLite-capable client (DBeaver, JetBrains DataGrip, DuckDB, etc.)
- the companion **`bec-triage`** script - see the `bec-triage` repository [`here`](https://github.com/ForensicFoundry/bec-triage)

Same-day re-runs append to the same file; the date in the filename is the date of the **first** ingest. Use `-a / --append` to add data to a database that was created on a different date.

The database carries a SQLite `application_id` of `0x55414C48` (ASCII `UALH`) and a `user_version` of `2` (as of v2026.05.01; v1 databases can be upgraded in place with `--reextract`). Append mode and `bec-triage` both check these values.

## Companion script: bec-triage

`bec-triage` uses the database produced by `ualforge` to generate a colorized BEC triage report covering: caveats / methodology, tool fingerprinting, behavioral anomaly screening (UA-independent heuristics), candidate compromised UPNs, OAuth app abuse, source IP analysis, authentication pivots, exfiltration timeline, persistence indicators (inbox rules, transport rules, and mailbox permissions), OAuth consent-grant timeline, per-message exfiltration depth analysis, and an optional `--per-upn` focused report.

`bec-triage` v2 expects a v2 database (`PRAGMA user_version == 2`) and refuses to run on a v1 database, showing an explicit error that points to `--reextract`. See `bec-triage`'s own README for usage and methodology.

## Companion script: ual-normalize

`ualforge` enforces a strict 13-column header (the canonical Microsoft 365 / Purview Audit portal export schema). See the `ual-normalize` repository [`here`](https://github.com/ForensicFoundry/ual-normalize).

UAL evidence in the wild often arrives in **other** shapes:

| Source format | Typical origin | What it looks like |
|---|---|---|
| **Canonical** (13 columns) | Purview portal CSV download / Microsoft Graph audit export converter | `id, createdDateTime, ..., auditData` |
| **PowerShell** (~12 columns) | `Search-UnifiedAuditLog` cmdlet output piped to `Export-Csv` | `RunspaceId, PSComputerName, CreationDate, UserIds, Operations, RecordType, AuditData, ResultIndex, ResultCount, Identity, IsValid, ObjectState` |
| **Splunk re-export** (40+ columns) | Splunk `outputcsv` of an indexed PowerShell UAL source | PowerShell columns + `_bkt, _cd, _raw, _si, _sourcetype, _time, splunk_server, ...` |
| **AuditData only** | Custom tooling, third-party SIEM dumps | A single `AuditData` column (or that column plus arbitrary unrelated columns) |

All of these embed the same per-event JSON in the `AuditData` column, and that JSON is what `ualforge`'s promoted columns and `bec-triage`'s analysis actually depend on.

`ual-normalize` reads any of the four shapes, re-projects every row onto the canonical 13 columns (deriving `id`, `createdDateTime`, `auditLogRecordType`, `operation`, `organizationId`, `userType`, `userId`, `service`, `objectId`, `userPrincipalName`, and `clientIp` from the JSON when the surrounding columns are missing or named differently), and writes a clean CSV that `ualforge` accepts directly.

## Sample queries

Below are some sample queries that `bec-triage` runs.

```
-- All FileDownloaded events for a UPN, sorted by time
SELECT created_at_utc, client_ip, user_agent, object_id
FROM events
WHERE user_principal_name = 'user@example.com'
  AND operation = 'FileDownloaded'
ORDER BY created_at_utc;

-- Top user agents by event count
SELECT user_agent, COUNT(*) AS hits
FROM events
WHERE user_agent IS NOT NULL
GROUP BY user_agent
ORDER BY hits DESC
LIMIT 20;

-- Inbox rule changes per UPN per day
SELECT user_principal_name,
       substr(created_at_utc, 1, 10) AS day,
       operation,
       COUNT(*) AS hits
FROM events
WHERE operation IN ('New-InboxRule','Set-InboxRule','Remove-InboxRule')
GROUP BY user_principal_name, day, operation
ORDER BY day, hits DESC;

-- Drill into the raw rule contents
SELECT created_at_utc, user_principal_name, operation,
       json_extract(audit_data_canonical, '$.Parameters')
FROM events
WHERE operation IN ('New-InboxRule','Set-InboxRule')
ORDER BY created_at_utc DESC
LIMIT 50;

-- Verify provenance for a specific row
SELECT source_file, source_line, export_batch, ingested_at, parse_error
FROM events
WHERE id = '...the GUID from the original CSV...';

-- Audit the ingest run history
SELECT run_id, started_at, ended_at, files_processed,
       rows_inserted, rows_duplicate, rows_with_errors, status
FROM ingest_runs
ORDER BY run_id;
```

## More information

For more in-depth information, read `MANUAL.md`.

## License

GPL-3.0-or-later
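
## Example: checking the database header

The README says that append mode and `bec-triage` check the database's `application_id` (ASCII `UALH`) and `user_version` (`2`) before using it. The following is a minimal sketch of such a check using only Python's standard `sqlite3` module. It is not ualforge's actual code: the function names `read_header` and `looks_like_ualforge_v2` are hypothetical, the big-endian interpretation of `UALH` is an assumption, and the `ualforge_meta` table check mentioned in the usage text is left out.

```python
import sqlite3
from pathlib import Path

# Assumption: the application_id is the ASCII bytes "UALH" read as a
# big-endian 32-bit integer. user_version 2 is the v2 schema marker.
EXPECTED_APPLICATION_ID = int.from_bytes(b"UALH", "big")
EXPECTED_USER_VERSION = 2


def read_header(db_path: str) -> tuple[int, int]:
    """Return (application_id, user_version) from a SQLite file.

    The file is opened read-only so that a mistyped path raises an error
    instead of silently creating an empty database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        app_id = conn.execute("PRAGMA application_id").fetchone()[0]
        user_version = conn.execute("PRAGMA user_version").fetchone()[0]
    finally:
        conn.close()
    return app_id, user_version


def looks_like_ualforge_v2(db_path: str) -> bool:
    """Hypothetical helper: True if both header values match the v2 markers."""
    app_id, user_version = read_header(db_path)
    return (
        app_id == EXPECTED_APPLICATION_ID
        and user_version == EXPECTED_USER_VERSION
    )
```

Calling `looks_like_ualforge_v2("./out/ualforge-20260427.sqlite")` before running ad-hoc queries is a cheap way to confirm that the file is the kind of database the sample queries expect. A v1 database would fail the `user_version` comparison and would need `--reextract` first.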
