toneillcodes/DataHound

GitHub: toneillcodes/DataHound

A modular BloodHound OpenGraph data pipeline engine that supports multi-source data collection, normalizing transformations, and cross-graph correlation.


Graph The Planet

# Overview

A modular data pipeline engine designed to extract, normalize, and correlate data into the BloodHound OpenGraph framework.

## Quick Start and Prerequisites

DataHound requires Python 3.x and Pandas.

1. Clone the repository
```
git clone https://github.com/toneillcodes/DataHound.git
cd DataHound
```
2. Install the dependencies
```
pip install -r requirements.txt
```

## Usage

```
usage: DataHound.py [-h] --operation {collect,connect} --output OUTPUT
                    [--source-kind SOURCE_KIND] [--config CONFIG]
                    [--graphA GRAPHA] [--rootA ROOTA] [--idA IDA]
                    [--matchA MATCHA] [--graphB GRAPHB] [--rootB ROOTB]
                    [--idB IDB] [--matchB MATCHB] [--edge-kind EDGE_KIND]

A versatile data pipeline engine that ingests information from diverse
external sources and transforms the extracted node and edge data into the
BloodHound OpenGraph format.

options:
  -h, --help            show this help message and exit

General Options:
  --operation {collect,connect}
                        Operation to complete.
  --output OUTPUT       Output file path for graph JSON

Collect Options:
  --source-kind SOURCE_KIND
                        The 'source_kind' to use for nodes in the graph.
  --config CONFIG       The path to the collection config file.

Connect Options:
  --graphA GRAPHA       Graph containing Start nodes.
  --rootA ROOTA         Element containing the root of the node data (ex: nodes).
  --idA IDA             Element containing the field to use as the start node ID (ex: id) from Graph A.
  --matchA MATCHA       Element containing the field to match on in Graph A.
  --graphB GRAPHB       Graph containing End nodes.
  --rootB ROOTB         Element containing the field to match on in Graph B.
  --idB IDB             Element containing the field to use as the end node ID (ex: id) from Graph B.
  --matchB MATCHB       Element containing the field to match on in Graph B.
  --edge-kind EDGE_KIND
                        Kind value to use when generating connection edges (ex: MapsTo).
```

## Core Features

DataHound operates in two distinct modes: **collect** and **connect**.

### ```collect```: **Data Extraction and Normalization**

The collect operation extracts raw data from external sources (APIs, databases, files), applies initial transformations such as column renaming and type conversion, and produces normalized node and edge data in the BloodHound OpenGraph format.

#### How It Works

1. Reads a JSON configuration file that defines the sources and transformation rules.
2. Calls the specified data source to collect the raw data.
3. Converts the raw data into a Pandas DataFrame for efficient processing.
4. Creates the final BloodHound OpenGraph nodes and edges by calling the transformation methods.

#### Collect Usage

```
python DataHound.py --operation collect \
  --config /path/to/config.json \
  --source-kind MyCustomSource \
  --output my_transformed_graph.json
```

Example output from a BHCE collection using the HTTP module:

```
$ python DataHound.py --operation collect --source-kind BHCE --config my-bloodhound-collection-definitions.json --output bhce-collection-exmaple.json
[INFO] Successfully read config from: my-bloodhound-collection-definitions.json
[INFO] Processing Item: Users (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "c8205c99-2ebd-4494-926b-c9e760fc8cd4", "url": "http://127.0.0.1:8080/api/v2/bloodhound-users", "status_code": 200, "elapsed_seconds": 0.03598, "content_length": 16699}
[INFO] Successfully processed 5 nodes.
[INFO] Processing Item: Roles (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "79c72ffd-f670-4a72-a69c-7c07ae14181a", "url": "http://127.0.0.1:8080/api/v2/roles", "status_code": 200, "elapsed_seconds": 0.012322, "content_length": 11990}
[INFO] Successfully processed 5 nodes.
[INFO] Processing Item: Permissions (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "ffd005fc-19dc-4568-ba83-a4268aeaa9a9", "url": "http://127.0.0.1:8080/api/v2/permissions", "status_code": 200, "elapsed_seconds": 0.017549, "content_length": 4106}
[INFO] Successfully processed 21 nodes.
[INFO] Processing Item: SSO Providers (Type: node)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "eccb7a40-5f0d-42c0-b3d1-94c0f82c7c07", "url": "http://127.0.0.1:8080/api/v2/sso-providers", "status_code": 200, "elapsed_seconds": 0.012122, "content_length": 961}
[INFO] Successfully processed 1 nodes.
[INFO] Processing Item: User Roles Edges (Type: edge)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "f7b1a952-e482-4bf2-8caf-6dd1021d13d8", "url": "http://127.0.0.1:8080/api/v2/bloodhound-users", "status_code": 200, "elapsed_seconds": 0.01173, "content_length": 16699}
[INFO] Successfully processed 5 edges.
[INFO] Processing Item: Role Permissions Edges (Type: edge)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "6b6acc5d-f77f-4ab1-bef8-412ca69da669", "url": "http://127.0.0.1:8080/api/v2/roles", "status_code": 200, "elapsed_seconds": 0.015697, "content_length": 11990}
[INFO] Successfully processed 55 edges.
[INFO] Processing Item: User SSO Provider Edges (Type: edge)
[INFO] {"event": "HTTP_REQUEST_SUCCESS", "correlation_id": "7c3c9644-22c7-4de2-a501-4a89e92388ae", "url": "http://127.0.0.1:8080/api/v2/bloodhound-users", "status_code": 200, "elapsed_seconds": 0.011963, "content_length": 16699}
[INFO] Successfully processed 1 edges.
[INFO] Writing graph to output file: bhce-collection-exmaple.json
[INFO] Successfully Wrote graph to bhce-collection-exmaple.json
[INFO] Done.
$
```

#### Supported Collectors

| Type | Description |
|----|----|
| CSV | Generic file-based CSV collector |
| DPAPI | Windows DPAPI blob and master key collector |
| Host | Generic host collector for Windows and Linux machines |
| HTTP | Generic HTTP collector |
| JSON | Generic file-based JSON collector |
| LDAP | Generic LDAP collector |
| Nmap | Nmap XML and Gnmap output collector |
| PE | Windows Portable Executable (PE) file format collector |
| SMB | Windows Server Message Block (SMB) share collector |
| XML | Generic file-based XML collector |
| YAML | Generic file-based YAML collector |

* See the [Collector Guide](CollectorGuide.md) for an extended list of collectors under development, their status, and any known limitations or issues.
* See the [Collector Configuration Guide](CollectorConfigurationGuide.md) for details on the JSON file format and the properties available for existing collectors (for example, ```source_type``` and ```column_mapping```).

#### Parameters

| Parameter | Argument Values | Description |
|----|----|----|
| --operation | collect | The primary function to perform. |
| --config | filename | The collection and transformation definitions. |
| --source-kind | source_kind | The source_kind to use in the generated graph. |
| --output | filename | Output file path for the resulting graph JSON. (Default: output_graph.json) |

## Examples

Explore practical examples to see the DataHound collect operation in action.

### Collect Examples

- [BloodHound collector](examples/collection/bloodhound/README.md)
- [LDAP collector](examples/collection/ldap/README.md)
- [Nmap collector](examples/collection/nmap/README.md)

### ```connect```: **Graph Correlation and Linking**

The connect operation takes two JSON files (```--graphA``` and ```--graphB```) and creates new edges between nodes that share a common, correlatable property.

#### How It Works

1. Performs an outer merge using Pandas DataFrames, matching nodes on the specified properties (--matchA and --matchB).
2. For each successful match, generates a new edge object of the specified kind (--edge-kind) that connects the matched nodes.
3. Writes the generated edges to a new graph file.

#### Connect Usage

Example usage that connects a BHCE graph to an Azure sample dataset:

```
python DataHound.py --operation connect \
  --graphA dev\bhce-collection-20251204.json --rootA nodes --idA id --matchA properties.email \
  --graphB entra_sampledata\azurehound_example.json --rootB data --idB data.id --matchB data.userPrincipalName \
  --edge-kind MapsTo --output ..\bhce-connected-to-azure.json
```

Example output:

```
$ python DataHound.py --operation connect \
  --graphA dev\bhce-collection-20251204.json --rootA nodes --idA id --matchA properties.email \
  --graphB entra_sampledata\azurehound_example.json --rootB data --idB data.id --matchB data.userPrincipalName \
  --edge-kind MapsTo --output ..\bhce-connected-to-azure.json
[INFO] Correlating dev\bhce-collection-20251204.json (root: nodes) and entra_sampledata\azurehound_example.json (root: data) using keys 'properties.email' and 'data.userPrincipalName'.
[INFO] Success! Output written to: ..\bhce-connected-to-azure.json
[INFO] Successfully connected graphs with MapsTo edge kind.
[INFO] Done.
$
```

#### Parameters

| Parameter | Argument Values | Description |
|----|----|----|
| --operation | connect | The primary function to perform. |
| --graphA | filename | File name of Graph A, whose nodes are the start nodes of the generated edges. |
| --rootA | NA | Data element containing the node data to process. |
| --idA | NA | Data element containing the node ID to use in the edge output. |
| --matchA | NA | Name of the field to match on in Graph A. |
| --graphB | filename | File name of Graph B, whose nodes are the end nodes of the generated edges. |
| --rootB | NA | Data element containing the node data to process. |
| --idB | NA | Data element containing the node ID to use in the edge output. |
| --matchB | NA | Name of the field to match on in Graph B. |
| --edge-kind | NA | The edge kind value used in the generated JSON. |
| --output | filename | Output file path for the resulting graph JSON. (Default: output_graph.json) |

## Examples

Explore practical examples to see DataHound in action.

### Connect Examples

- Connecting two sample OG graphs with static edges
- Connecting a sample OG graph to a sample AD dataset
- Connecting a sample OG graph to a sample Azure dataset

## To-Do and Future Features

* Debug or verbose messages with logging
* Support for encrypted secrets
* HTTP collector with basic authentication
* ~File-based collectors for CSV and JSON formats~
* Robust error handling
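
As an illustration of the connect workflow described earlier (match nodes with a Pandas outer merge, then emit one edge per match), here is a minimal sketch. It is not DataHound's actual implementation: the function name `connect_graphs`, its parameters, and the sample node data are hypothetical, and the edge layout (`kind`, `start.value`, `end.value`) is an assumption about the OpenGraph edge shape rather than something confirmed by this README.

```python
import pandas as pd


def connect_graphs(nodes_a, nodes_b, id_a, match_a, id_b, match_b, edge_kind):
    """Hypothetical helper: return edges linking nodes whose match fields are equal."""
    # Flatten nested fields into dotted column names such as "properties.email".
    # Rows missing an ID or a match value are dropped, because pandas would
    # otherwise treat two missing keys as equal during the merge.
    df_a = pd.json_normalize(nodes_a)[[id_a, match_a]].dropna()
    df_b = pd.json_normalize(nodes_b)[[id_b, match_b]].dropna()

    # Use fixed column names so the two sides cannot collide.
    df_a.columns = ["start_id", "key"]
    df_b.columns = ["end_id", "key"]

    # Outer merge keeps unmatched rows too; the indicator column marks
    # which rows were found on both sides.
    merged = df_a.merge(df_b, on="key", how="outer", indicator=True)
    matched = merged.loc[merged["_merge"] == "both", ["start_id", "end_id"]]

    # Assumed edge layout: one edge per matched pair of nodes.
    return [
        {
            "kind": edge_kind,
            "start": {"value": str(row.start_id)},
            "end": {"value": str(row.end_id)},
        }
        for row in matched.itertuples(index=False)
    ]


# Made-up sample data mirroring the field names in the usage example.
nodes_a = [
    {"id": "u1", "properties": {"email": "alice@example.com"}},
    {"id": "u2", "properties": {"email": "bob@example.com"}},
]
nodes_b = [
    {"data": {"id": "az-1", "userPrincipalName": "alice@example.com"}},
    {"data": {"id": "az-9", "userPrincipalName": "carol@example.com"}},
]

edges = connect_graphs(
    nodes_a, nodes_b,
    id_a="id", match_a="properties.email",
    id_b="data.id", match_b="data.userPrincipalName",
    edge_kind="MapsTo",
)
print(edges)
```

The comparison in this sketch is exact, so values that differ only in case or surrounding whitespace will not match unless they are normalized first.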
