emixor/pysoc

GitHub: emixor/pysoc

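A modular, lightweight SOC platform written in pure Python that provides log parsing, security detection, and report generation with zero runtime dependencies.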

Stars: 0 | Forks: 0

# PySOC: A Local-First Mini Security Operations Center

[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/downloads/) [![Tests](https://img.shields.io/badge/tests-80%20passing-brightgreen.svg)](#how-i-validated-this) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE) [![TDD](https://img.shields.io/badge/methodology-TDD-orange.svg)](#how-i-validated-this) [![OWASP aligned](https://img.shields.io/badge/OWASP-aligned-purple.svg)](docs/DETECTION_RULES.md) [![Zero runtime deps](https://img.shields.io/badge/runtime%20deps-0-success.svg)](pyproject.toml)

PySOC is built to demonstrate professional software-engineering practice **and** practical detection-engineering judgment: test-driven development (TDD), a modular architecture, OWASP-aligned detection content, SANS-style threat-hunting reports, and an explicit false-positive handling strategy.

## Table of Contents

1. [Why PySOC? (Issues It Solves)](#why-pysoc-issues-it-solves)
2. [What PySOC Catches](#what-pysoc-catches)
3. [Architecture](#architecture)
4. [Installation](#installation)
5. [Usage](#usage)
6. [How I Validated This](#how-i-validated-this)
7. [False-Positive Handling](#false-positive-handling)
8. [Repository Layout](#repository-layout)
9. [Documentation](#documentation)
10. [Roadmap](#roadmap)
11. [Contributing](#contributing)
12. [License](#license)

## Why PySOC? (Issues It Solves)

Small and mid-sized security teams face a set of recurring problems that commercial SIEMs either over-engineer or ignore entirely:

| Problem | How PySOC addresses it |
|---|---|
| **Vendor lock-in**: most SIEMs require a proprietary agent, a proprietary query language, and a proprietary dashboard format. | PySOC reads **plain files** (`.log`, `.json`, `.jsonl`) and emits **open formats** (JSON + static HTML). No agent, no proprietary language. |
| **High cost**: per-GB/day pricing pushes teams to collect fewer logs. | PySOC runs on a $5 VM against locally stored logs. Zero runtime dependencies; easy to move into air-gapped environments. |
| **Opaque detection logic**: vendor rules are hidden behind a UI, so you cannot read them. | Every PySOC rule is commented, plain Python with a MITRE ATT&CK mapping and a documented false-positive strategy. See [`docs/DETECTION_RULES.md`](docs/DETECTION_RULES.md). |
| **Untested detection content**: vendors ship rules without tests, so you cannot tell whether they still work after a content upgrade. | Every detector comes with unit tests and integration tests that prove it fires on realistic mock data. See [`tests/`](tests/) and the [How I Validated This](#how-i-validated-this) section. |
| **Format sprawl**: Windows EVTX, Linux syslog, Nginx combined, Apache combined, custom JSON; every source speaks a different dialect. | PySOC normalises every record into a single ECS-inspired schema in one place (`src/pysoc/models.py`), so detectors never need to know the log format. |
| **No false-positive strategy**: detectors fire, the alert queue floods, and analysts burn out. | Every detector carries an explicit `note` field describing common false positives and how to triage them. See [`docs/FALSE_POSITIVES.md`](docs/FALSE_POSITIVES.md). |
| **No demo data**: "show me how it works" turns into a 30-minute screen share. | PySOC ships a deterministic synthetic log generator (`data/generator/generate_logs.py`) that produces both malicious and benign traffic, so a recruiter can run `make demo` and see real alerts within 5 seconds. |

## What PySOC Catches

PySOC ships with four production-grade detection rules out of the box:

| Rule ID | Name | Source | MITRE ATT&CK | Severity | What it detects |
|---|---|---|---|---|---|
| **BF-001** | Brute-force login (SSH/Windows) | Linux `auth.log`, Windows 4624/4625 | T1110 | HIGH | At least N failed logins from the same IP against the same user within a sliding window. |
| **SP-001** | Suspicious process execution | Windows 4688 | T1059.001, T1003, T1218, T1204 | HIGH/CRITICAL | Encoded PowerShell, download cradles, mimikatz, procdump, suspicious parent→child chains (Office→PowerShell), certutil LOLBin. |
| **WA-001** | Web attack patterns (OWASP Top-10) | Nginx / Apache access logs | T1190, T1059.007, T1083 | HIGH/CRITICAL | SQLi (UNION, OR with comment, sleep), XSS (script tags, event handlers, javascript: URIs), path traversal (`../`, encoded `/etc/passwd`), command injection, SSRF probes (cloud metadata endpoints), RFI. |
| **IT-001** | Impossible travel (geo-velocity) | Any successful login | T1078 | MEDIUM | The same user logs in from two countries whose separation cannot physically be covered in the elapsed time (default: implied speed > 900 km/h). |

Each rule is documented in depth in [`docs/DETECTION_RULES.md`](docs/DETECTION_RULES.md).

## Architecture

PySOC is designed as a classic four-stage pipeline: **Ingest → Parse → Detect → Report**.

```
flowchart LR
    subgraph Sources[Log Sources]
        A1[Linux auth.log]
        A2[Nginx access.log]
        A3[Apache access.log]
        A4[Windows EVTX/JSON]
        A5[JSON-lines]
    end
    subgraph Ingest[1. Ingest]
        I[File walker + extension sniffing]
    end
    subgraph Parse[2. Parse + Normalise]
        P1[LinuxAuthParser]
        P2[NginxParser]
        P3[ApacheParser]
        P4[WindowsJsonParser]
        P5[JSONLinesParser]
        S[(ECS Event schema)]
        P1 --> S
        P2 --> S
        P3 --> S
        P4 --> S
        P5 --> S
    end
    subgraph Detect[3. Detect]
        D1[BruteForceDetector]
        D2[SuspiciousProcessDetector]
        D3[WebAttackDetector]
        D4[ImpossibleTravelDetector]
    end
    subgraph Report[4. Report]
        R1[JSONReporter]
        R2[HTMLReporter]
    end
    A1 --> I
    A2 --> I
    A3 --> I
    A4 --> I
    A5 --> I
    I --> P1
    I --> P2
    I --> P3
    I --> P4
    I --> P5
    S --> D1
    S --> D2
    S --> D3
    S --> D4
    D1 --> R1
    D2 --> R1
    D3 --> R1
    D4 --> R1
    D1 --> R2
    D2 --> R2
    D3 --> R2
    D4 --> R2
    Geo[(GeoIP helper)] -.-> D4
```

### Design principles

1. **Stateless parsers, stateful detectors.** A parser must produce the same stream of `Event` objects every time it sees the same input. A detector may keep state *within a single `analyze()` call*, but must be deterministic.
2. **Immutable models.** `Event` and `Alert` are `frozen=True` dataclasses, so no detector can mutate an event in flight.
3. **Zero runtime dependencies.** Only the Python standard library is required. This keeps the install footprint tiny and makes PySOC easy to move into air-gapped environments.
4. **ECS-inspired schema.** The `Event` dataclass is a pragmatic subset of the Elastic Common Schema; adding a new field is backward compatible because every field has a default value.
5. **Pluggable detectors.** Each detector is a small class that inherits from `BaseDetector` and implements `analyze(events)`. A new rule takes fewer than 50 lines of code, and the pipeline discovers and loads it automatically.

## Installation

PySOC runs on **Python 3.10+** and has **zero runtime dependencies**.

### Install from source (recommended for this repo)

```
git clone <repository-url>
cd pysoc

# Create a virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate    # Linux/macOS
# .venv\Scripts\activate     # Windows

# Install PySOC in editable mode, plus dev dependencies
pip install -e ".[dev]"
```

### Verify the installation

```
pytest                       # 80 tests should pass
python -m pysoc list-rules   # Print all registered detection rules
```

## Usage

### Quick start (first alert in under 5 seconds)

```
# 1. Generate synthetic mock logs (5 files in data/raw/)
python -m pysoc generate --out data/raw

# 2. Run the full pipeline against all generated files
python -m pysoc run data/raw/auth.log data/raw/nginx_access.log \
    data/raw/apache_access.log \
    data/raw/windows_events.json \
    data/raw/impossible_travel.jsonl \
    --json-out data/output/report.json \
    --html-out data/output/report.html

# 3. Open the HTML dashboard in a browser
open data/output/report.html       # macOS
xdg-open data/output/report.html   # Linux
```

Expected output (abridged):

```
PySOC run complete.
Events analysed : 86
Alerts raised   : 30
  high     : 28
  medium   : 1
  critical : 1
JSON report : data/output/report.json
HTML report : data/output/report.html
```

### Running against real logs

```
# Linux SSH brute-force detection
python -m pysoc run /var/log/auth.log --json-out report.json

# Nginx web-attack detection
python -m pysoc run /var/log/nginx/access.log --html-out dashboard.html

# Windows events (after exporting with Get-WinEvent | ConvertTo-Json)
python -m pysoc run windows_events.json

# Force a specific parser if auto-detection fails
python -m pysoc run weird-named-file.xyz --parser nginx
```

### Using PySOC as a library

```
from pysoc import run_pipeline

# Run the full pipeline on two log files and write both report formats
result = run_pipeline(
    ["data/raw/auth.log", "data/raw/nginx_access.log"],
    json_out="report.json",
    html_out="report.html",
)

# Print one line per alert: severity, rule ID, description
for alert in result["alerts"]:
    print(f"[{alert.severity.value:>8}] {alert.rule_id} {alert.description}")
```

See [`examples/`](examples/) for more.

### Makefile shortcuts

```
make install   # pip install -e ".[dev]"
make test      # pytest -v
make demo      # generate data + run pipeline + open dashboard
make lint      # ruff / flake8 (if installed)
make clean     # remove build artefacts and generated data
```

## How I Validated This

PySOC was developed under strict **test-driven development (TDD)**: tests were written *before* the implementation, and the implementation was iterated until every test passed. The validation strategy has three layers:

### Layer 1: Unit tests (69 tests)

Located in [`tests/unit/`](tests/unit/). Each unit test exercises a single class or function in isolation, using hand-crafted in-memory `Event` fixtures. Examples:

- `test_detect_brute_force.py`: verifies that the sliding-window logic fires at the threshold, stays silent below it, and separates bursts by user/IP.
- `test_detect_suspicious_process.py`: verifies encoded-PowerShell decoding, mimikatz detection, the certutil LOLBin, and Word→PowerShell parent/child detection.
- `test_detect_web_attacks.py`: verifies the SQLi, XSS, path-traversal, and SSRF patterns, and that a multi-category match raises the severity.
- `test_detect_impossible_travel.py`: verifies the geo-velocity calculation, same-country suppression, and internal-IP filtering.
- `test_parsers.py`: one happy-path test and one failure-path test per parser.
- `test_models.py`: verifies `Event` immutability, fingerprint stability, and `Severity` ordering.

### Layer 2: Integration tests (11 tests)

Located in [`tests/integration/`](tests/integration/). The end-to-end test (`test_end_to_end.py`) does the following:

1. Invokes the data generator as a subprocess to produce 5 mock log files.
2. Runs the full PySOC pipeline (`run_pipeline`) over all 5 files.
3. Asserts that **every rule fires at least once** (BF-001, SP-001, WA-001, IT-001), i.e. the simulated attacks are actually detected.
4. Asserts that the JSON and HTML reports are written and well-formed.
5. Asserts that the pipeline is **idempotent**: running it twice yields the same number of alerts.

### Layer 3: Synthetic data generator

Located at [`data/generator/generate_logs.py`](data/generator/generate_logs.py). It produces deterministic, **harmless** mock data (never executable code, only log lines and JSON records). The generator simulates:

- SSH brute force (8 failures against the `root` account from `203.0.113.5`, followed by a successful login as a credential-stuffing follow-up).
- Web SQLi, XSS, path-traversal, SSRF, and command-injection probes.
- Windows 4625 brute force against `Administrator`.
- Encoded PowerShell (`-EncodedCommand` with a base64 payload that decodes to `Write-Host 'pysoc-test: harmless encoded payload'`).
- A mimikatz invocation.
- The Word → PowerShell macro-malware pattern.
- Impossible travel (alice logs in from the United States at 14:00 and from China at 14:30).

### Run the validation yourself

```
# Full validation: 80 tests complete in under 1 second
pytest -v

# With a coverage report
pytest --cov=pysoc --cov-report=term-missing

# Only the end-to-end integration tests
pytest tests/integration -v
```

Sample output:

```
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.2
collected 80 items

tests/integration/test_data_generator.py ...                            [  3%]
tests/integration/test_end_to_end.py ........                           [ 13%]
tests/unit/test_detect_brute_force.py ......                            [ 21%]
tests/unit/test_detect_impossible_travel.py ......                      [ 28%]
tests/unit/test_detect_suspicious_process.py .......                    [ 37%]
tests/unit/test_detect_web_attacks.py .........                         [ 48%]
tests/unit/test_ingest.py ........                                      [ 58%]
tests/unit/test_models.py ..........                                    [ 71%]
tests/unit/test_parsers.py ..................                           [ 93%]
tests/unit/test_report.py .....                                         [100%]

============================== 80 passed in 0.47s ==============================
```

## False-Positive Handling

Every PySOC detector carries an explicit `note` field in its alert context that describes the most common false positives and how to triage them. The full strategy is documented in [`docs/FALSE_POSITIVES.md`](docs/FALSE_POSITIVES.md).

| Rule | Common false positives | PySOC's strategy |
|---|---|---|
| **BF-001** | Load-balancer health checks using wrong credentials; scripts retrying with an expired password. | Emits the alert with a `note` field; the analyst correlates it with any subsequent successful login from the same IP. |
| **SP-001** | Legitimate administrators using encoded PowerShell; signed vendor installers that invoke PowerShell. | Decodes the payload and includes it in the alert context, so the analyst can judge intent immediately. |
| **WA-001** | Security scanners (Nessus, Burp); aggressive WAF probing. | Includes the source IP, User-Agent, and full URL; the analyst can allowlist known scanner IPs. |
| **IT-001** | Corporate VPNs that egress through multiple POPs; mobile devices switching between cellular and Wi-Fi. | Emits the alert with the implied speed; the analyst correlates it with MFA challenge responses. |

PySOC also publishes a per-rule **estimated true-positive-rate prior** in every report (see `summary.true_positive_estimates` in the JSON output). These are priors derived from public incident-response data, not values measured from the current run.

## Repository Layout

```
pysoc/
├── README.md                 # This file
├── LICENSE                   # MIT
├── CONTRIBUTING.md           # How to contribute
├── CODE_OF_CONDUCT.md        # Contributor Covenant
├── SECURITY.md               # Vulnerability disclosure
├── CHANGELOG.md              # Semantic versioning changelog
├── pyproject.toml            # PEP 621 project metadata + pytest config
├── requirements.txt          # Pinned dev requirements (for CI)
├── requirements-dev.txt      # Pinned dev requirements
├── Makefile                  # install / test / demo / lint / clean
├── .gitignore
├── .env.example
├── .github/
│   └── workflows/
│       └── ci.yml            # GitHub Actions: pytest on push/PR
├── docs/
│   ├── ARCHITECTURE.md       # Deep-dive on the pipeline design
│   ├── DETECTION_RULES.md    # Per-rule documentation
│   ├── FALSE_POSITIVES.md    # FP handling strategy
│   ├── ROADMAP.md            # What's next
│   └── DEVELOPMENT.md        # How to add a new detector
├── data/
│   ├── generator/
│   │   ├── __init__.py
│   │   ├── generate_logs.py  # Synthetic mock-log generator
│   │   └── README.md
│   ├── raw/                  # Generated logs (gitignored)
│   ├── sample/               # Committed sample inputs
│   └── output/               # Generated reports (gitignored)
├── examples/
│   ├── run_pysoc.py          # Library usage example
│   └── custom_rule.py        # How to add a custom detector
├── screenshots/
│   └── README.md             # How to regenerate dashboard screenshots
├── scripts/
│   ├── scaffold.py           # Create the directory structure
│   └── run_all.sh            # End-to-end demo script
├── src/
│   └── pysoc/
│       ├── __init__.py
│       ├── __main__.py       # python -m pysoc
│       ├── cli.py            # Argparse CLI
│       ├── models.py         # Event / Alert / Severity (ECS-inspired)
│       ├── geo.py            # Pseudo-GeoIP + haversine
│       ├── ingest.py         # File walker + parser dispatch
│       ├── pipeline.py       # run_pipeline() orchestrator
│       ├── parsers/
│       │   ├── __init__.py   # Registry
│       │   ├── base.py
│       │   ├── linux_auth.py
│       │   ├── nginx.py
│       │   ├── apache.py
│       │   ├── json_parser.py
│       │   └── windows_json.py
│       ├── detect/
│       │   ├── __init__.py   # Registry
│       │   ├── base.py
│       │   ├── brute_force.py
│       │   ├── suspicious_process.py
│       │   ├── web_attacks.py
│       │   └── impossible_travel.py
│       └── report/
│           ├── __init__.py
│           ├── base.py
│           ├── json_reporter.py
│           └── html_reporter.py
└── tests/
    ├── conftest.py           # Shared fixtures
    ├── unit/
    │   ├── test_parsers.py
    │   ├── test_models.py
    │   ├── test_ingest.py
    │   ├── test_detect_brute_force.py
    │   ├── test_detect_suspicious_process.py
    │   ├── test_detect_web_attacks.py
    │   ├── test_detect_impossible_travel.py
    │   └── test_report.py
    └── integration/
        ├── test_data_generator.py
        └── test_end_to_end.py
```

## Documentation

| Document | Contents |
|---|---|
| [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) | Deep-dive on the four-stage pipeline, schema design, and detector model. |
| [`docs/DETECTION_RULES.md`](docs/DETECTION_RULES.md) | Per-rule reference: trigger conditions, MITRE ATT&CK mapping, sample alerts, tuning parameters. |
| [`docs/FALSE_POSITIVES.md`](docs/FALSE_POSITIVES.md) | Per-rule false-positive strategy, with concrete triage playbooks. |
| [`docs/ROADMAP.md`](docs/ROADMAP.md) | What's next: Sigma rule import, MaxMind GeoLite2, ECS alignment, Kafka ingestion, and more. |
| [`docs/DEVELOPMENT.md`](docs/DEVELOPMENT.md) | How to add a new detector in under 50 lines of code (TDD recipe). |

## Roadmap

PySOC's scope is deliberately limited; the goal is a polished, well-tested foundation, not feature parity with Splunk. See [`docs/ROADMAP.md`](docs/ROADMAP.md) for the full list. Highlights:

- **Sigma rule import**: load detection rules from the open Sigma format.
- **Real GeoLite2**: replace the synthetic geolocation mapping with MaxMind's GeoLite2.
- **Live tail mode**: run `python -m pysoc tail /var/log/auth.log` for real-time detection.
- **Stream correlation**: a sliding-window correlator for multi-stage attacks.
- **ECS alignment**: extend `Event` to a fuller ECS subset for interoperability with Elastic / OpenSearch.
- **Native EVTX reader**: integrate `python-evtx` to read `.evtx` files directly (no PowerShell export step).
- **Threat-intel enrichment**: VirusTotal / AbuseIPDB lookups for source IPs.
- **Webhook alerting**: POST alerts to Slack / MS Teams / Discord.

## License

MIT. See [`LICENSE`](LICENSE).
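
## Sketch: the IT-001 implied-speed check

To make the IT-001 description concrete, here is a minimal sketch of an implied-speed check built on the haversine formula. It is not PySOC's actual code: the `Login` class, the function names, the city coordinates, and the date are all assumptions made for illustration, and only the 900 km/h default comes from the rule table. Same-country suppression and internal-IP filtering, which the unit tests mention, are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 900.0  # default threshold quoted for IT-001


@dataclass(frozen=True)
class Login:
    """Hypothetical stand-in for a successful-login event (not PySOC's Event)."""
    user: str
    when: datetime
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def implied_speed_kmh(a: Login, b: Login) -> float:
    """Speed needed to get from login a to login b in the elapsed time."""
    dist = haversine_km(a.lat, a.lon, b.lat, b.lon)
    hours = abs((b.when - a.when).total_seconds()) / 3600.0
    if hours == 0:
        return float("inf") if dist > 0 else 0.0
    return dist / hours


def is_impossible_travel(a: Login, b: Login, max_speed: float = MAX_SPEED_KMH) -> bool:
    """True when the same user would have had to exceed max_speed."""
    return a.user == b.user and implied_speed_kmh(a, b) > max_speed


# alice at 14:00 near New York, then at 14:30 near Beijing.
# Coordinates are approximate city centres; the date is arbitrary.
first = Login("alice", datetime(2024, 1, 1, 14, 0), 40.7128, -74.0060)
second = Login("alice", first.when + timedelta(minutes=30), 39.9042, 116.4074)
print(is_impossible_travel(first, second))  # True
```

A real detector would resolve coordinates from the source IP (the repository lists a `geo.py` helper for this) rather than taking them as input.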

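## Sketch: the BF-001 sliding window

The BF-001 rule counts failed logins per (source IP, user) pair inside a sliding window. The sketch below shows one way such a window could work. It is an illustration under stated assumptions, not PySOC's implementation: `FailedLogin`, `find_bursts`, the threshold of 5, and the 5-minute window are all hypothetical, since the README only says "N" and "sliding window".

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class FailedLogin(NamedTuple):
    """Hypothetical minimal record of one failed login."""
    when: datetime
    source_ip: str
    user: str


def find_bursts(
    events: Iterable[FailedLogin],
    threshold: int = 5,                        # assumed value for "N"
    window: timedelta = timedelta(minutes=5),  # assumed window length
) -> set[tuple[str, str]]:
    """Return the (source_ip, user) pairs with >= threshold failures in any window."""
    recent: dict[tuple[str, str], deque] = defaultdict(deque)
    flagged: set[tuple[str, str]] = set()
    for ev in sorted(events, key=lambda e: e.when):
        key = (ev.source_ip, ev.user)
        times = recent[key]
        times.append(ev.when)
        # Drop timestamps that have fallen out of the window.
        while times and ev.when - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            flagged.add(key)
    return flagged


# Eight failures against root from 203.0.113.5, ten seconds apart (mirroring the
# generator's scenario), plus two unrelated failures that stay below the threshold.
start = datetime(2024, 1, 1, 9, 0)  # arbitrary date
events = [FailedLogin(start + timedelta(seconds=10 * i), "203.0.113.5", "root") for i in range(8)]
events += [FailedLogin(start + timedelta(seconds=i), "198.51.100.7", "alice") for i in range(2)]
print(find_bursts(events))  # {('203.0.113.5', 'root')}
```

Keying the window on the (IP, user) pair is what keeps bursts from different sources or against different accounts separate.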
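Tags: Homebrew installation, Python, multimodal security, security operations center, no backdoors, red-team operations, network mapping, reverse-engineering tools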