seerflow/seerflow

GitHub: seerflow/seerflow

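A lightweight streaming log intelligence agent that detects anomalies and security threats across log sources using a traditional ML ensemble and Sigma rules.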

Stars: 0 | Forks: 1

# Seerflow

A streaming, entity-centric log intelligence agent that detects operational failures and security threats across log sources. It combines traditional ML (fast and low-cost) for bulk detection with Sigma rules (3,000+ community detections) for identifying known threat patterns.

## Status

**Alpha**: the full ingestion + detection + Sigma rule pipeline is operational.

[![CI](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/1e742f3ad0152245.svg)](https://github.com/seerflow/seerflow/actions/workflows/ci.yml) [![PyPI](https://img.shields.io/pypi/v/seerflow)](https://pypi.org/project/seerflow/) [![Python 3.13+](https://img.shields.io/badge/python-3.13%2B-blue)](https://www.python.org/) [![License: AGPL-3.0](https://img.shields.io/badge/license-AGPL--3.0-green)](LICENSE)

## Quick Start

```
# Install from source
git clone https://github.com/seerflow/seerflow.git
cd seerflow
uv sync

# Copy and edit the example config
cp seerflow.example.yaml seerflow.yaml

# Start the pipeline
uv run python -m seerflow start
```

### CLI

```
# Start with the default config (seerflow.yaml in the current directory)
uv run python -m seerflow start

# Start with a specific config file
uv run python -m seerflow --config /path/to/seerflow.yaml start

# Show the version
uv run python -m seerflow --version
```

### Docker

```
# Build and run with SQLite defaults (zero-config)
docker compose up -d

# Run with PostgreSQL (set the password first)
export POSTGRES_PASSWORD=your-secure-password
docker compose --profile postgres up -d

# Or run standalone from the registry image
docker run -p 8080:8080 -p 4317:4317 -p 514:514/udp seerflow/seerflow

# Mount a custom config
docker run -v ./seerflow.yaml:/app/seerflow.yaml:ro seerflow/seerflow
```

### What It Does

1. **Ingests** logs from multiple sources simultaneously (syslog, OTLP gRPC/HTTP, file tailing, webhooks)
2. **Parses** each log line with Drain3 (template extraction) and regex entity extraction (IPs, users, hosts, files, domains, processes)
3. **Resolves entities** to deterministic UUID5 IDs for cross-source correlation
4. **Scores** events with an ML ensemble: Half-Space Trees (content), Holt-Winters (volume), CUSUM (change), Markov chains (sequence), blended via z-normalization
5. **Thresholds** scores with biDSPOT (EVT-based auto-thresholding, no manual tuning)
6. **Evaluates** 63 bundled Sigma rules (Linux, web, DNS, process, network) with MITRE ATT&CK tags
7. **Graphs** entity relationships with igraph: PageRank, Louvain, fan-out, betweenness centrality
8. **Accumulates** per-entity risk with exponential decay, which catches slow-burn multi-step attacks
9. **Alerts** on anomalies, Sigma matches, and risk threshold breaches
10. **Persists** all events, alerts, graph edges, and ML model state to SQLite

### Example: Detecting Anomalies in Syslog

```
# seerflow.yaml
receivers:
  syslog_enabled: true
  syslog_udp_port: 5514  # use high port to avoid root
  otlp_grpc_enabled: false
  otlp_http_enabled: false
  webhook_enabled: false
detection:
  hst_window_size: 100  # lower for faster calibration
  dspot:
    calibration_window: 200
    risk_level: 0.01  # more sensitive for testing
```

```
# Terminal 1: start Seerflow
uv run python -m seerflow start

# Terminal 2: send normal traffic
for i in $(seq 1 300); do
  echo "<134>1 2026-03-24T19:00:00Z web nginx $i - - GET /api/v$((i%5)) 200 ${i}ms" \
    | nc -u -w1 127.0.0.1 5514
done

# Terminal 2: send an anomaly
echo '<11>1 2026-03-24T19:01:00Z db postgres 999 - - FATAL connection limit exceeded 847/100' \
  | nc -u -w1 127.0.0.1 5514
```

Output:

```
INFO Seerflow 0.3.0 starting
INFO Receivers: syslog
INFO Pipeline running — Ctrl+C to stop
WARNING ANOMALY [syslog] score=0.952 threshold=0.009 dir=upper
WARNING template: [7] <*> <*> postgres <*> - - FATAL connection limit exceeded <*>
WARNING message: <11>1 2026-03-24T19:01:00Z db postgres 999 - - FATAL connection limit exceeded 847/100
WARNING entities: 203.0.113.1
```

### Shutdown Summary

Press Ctrl+C to see session statistics:

```
INFO --- Session Summary ---
INFO Events processed: 312
INFO Anomalies detected: 10
INFO Unique templates: 7
INFO Duration: 45.3s
INFO Throughput: 7 events/sec
INFO Seerflow stopped
```

## Configuration

See [SETTINGS.md](SETTINGS.md) for the full configuration reference. All settings are optional; Seerflow runs with sensible defaults (zero-config).

Key configuration sections:

- **receivers** -- syslog, OTLP gRPC/HTTP, file tailing, webhooks (enable/disable + ports)
- **detection** -- HST window size, DSPOT calibration, scoring weights, custom Sigma rule directories
- **storage** -- SQLite (default) or PostgreSQL
- **alerting** -- deduplication window, webhook/PagerDuty targets

## Receivers

| Receiver | Port | Protocol | Status |
|----------|------|----------|--------|
| Syslog UDP/TCP | 514 (5514) | RFC 5424/3164 | Done |
| OTLP gRPC | 4317 | Protobuf | Done |
| OTLP HTTP | 4318 | Protobuf + JSON | Done |
| File tailing | -- | Glob + watchfiles | Done |
| Webhooks | 8081 | JSON/form + auth | Done |

## Detection Pipeline

```
Log Sources → Receivers → Drain3 → UUID5 Entities → ML Ensemble → Sigma Rules
                                         ↓               ↓             ↓
                                   Entity Graph    blended score  ATT&CK tags
                                   Window Buffer   [0.0 - 1.0]    tactic/technique
                                   Risk Register         ↓             ↓
                                         ↓       Risk Accumulation → Alert
                                   PageRank, Louvain
                                   Fan-out, Betweenness
```

- **Drain3**: streaming log template extraction (120K msgs/sec)
- **UUID5 Entity Resolution**: deterministic cross-source entity IDs (same entity = same UUID)
- **Half-Space Trees**: content anomaly detection via River (constant time and memory)
- **Holt-Winters**: volume anomaly detection (trend + seasonal decomposition)
- **CUSUM**: change-point detection (two-sided cumulative sum)
- **Markov Chains**: sequence anomaly detection (per-entity transition matrices)
- **biDSPOT**: bidirectional EVT auto-thresholding (upper spikes + lower drops)
- **DetectionEnsemble**: orchestrates all detectors + per-source blended scoring
- **Sigma Engine**: 63 bundled SigmaHQ rules with logsource-indexed dispatch
- **Entity Graph**: igraph-backed relationship graph with typed edges + 6 algorithms
- **Risk Accumulation**: per-entity risk register with exponential decay + configurable thresholds
- **Sliding Window**: per-entity event buffer with watermark-based late-arrival tolerance

## Development

Requires Python 3.13+ and [uv](https://docs.astral.sh/uv/).

```
# Install dependencies
uv sync

# Run tests
uv run pytest

# Run quality gates
uv run ruff check . && \
  uv run ruff format --check . && \
  uv run mypy src/ && \
  uv run bandit -r src/ -c pyproject.toml && \
  uv run pytest --cov=src/seerflow --cov-fail-under=95
```

### Project Structure

```
src/seerflow/
  __main__.py        # CLI entry point (config → pipeline → detection → storage)
  cli.py             # argparse (--config, --version)
  config.py          # YAML config loader with ${ENV_VAR} interpolation
  models/            # SeerflowEvent, Alert, entity structs (msgspec)
  storage/
    protocols.py     # Protocol interfaces (LogStore, AlertStore, ModelStore, EntityStore)
    sqlite.py        # SQLite backend (WAL, FTS5, WriteBuffer)
    migrations.py    # Schema versioning + forward-only migration runner
  receivers/
    base.py          # RawEvent dataclass, Receiver protocol
    manager.py       # ReceiverManager (bounded queue, backpressure, shutdown)
    syslog.py        # UDP/TCP syslog (RFC 5424/3164)
    otlp_grpc.py     # OTLP gRPC receiver (protobuf LogRecord)
    otlp_http.py     # OTLP HTTP receiver (/v1/logs, protobuf + JSON)
    file_tail.py     # File tailing (glob, rotation, checkpoint)
    webhook.py       # Webhooks (JSON/form, field mapping, auth)
  parsing/
    drain.py         # Drain3 wrapper for template extraction
    entities.py      # Regex entity extraction (6 types, params-aware tagging)
    normalizer.py    # EventNormalizer: RawEvent → SeerflowEvent
  detection/
    protocols.py     # Detector Protocol (score, learn, serialize, deserialize)
    hst.py           # Half-Space Trees detector (River)
    threshold.py     # biDSPOT auto-threshold (scipy GPD)
    ensemble.py      # DetectionEnsemble orchestrator (4 detectors + blended scoring)
  sigma/
    engine.py        # SigmaEngine: rule loading, logsource dispatch, evaluation
    matcher.py       # Custom detection matcher (condition tree walker, regex cache)
    pipeline.py      # pySigma processing pipeline (22 field mappings)
    attack.py        # MITRE ATT&CK tactic/technique extraction
    bundled.py       # Bundled rule path discovery (importlib.resources)
    loader.py        # Custom rule directory discovery + validation
    rules/           # 63 curated SigmaHQ YAML rules (linux, web, dns, process, network)
  graph/
    entity_graph.py  # igraph wrapper: vertices, edges, queries, algorithms
    edges.py         # Typed edge inference from entity pairs
    algorithms.py    # PageRank, Louvain, fan-out, fan-in, betweenness, ego-graph
  correlation/
    window.py        # Per-entity sliding window buffer (deque, LRU eviction)
    watermark.py     # Watermark-based late arrival tolerance
    risk.py          # Risk accumulation with exponential decay
  pipeline/
    handler.py       # Event handler: parse → detect → graph → correlate → store
    run.py           # Pipeline runner (config → receivers → handler → storage)
tests/
  unit/              # 1200+ unit tests
  integration/       # Integration tests (pipeline, graph, correlation, real SQLite)
  benchmarks/        # Throughput benchmarks (pytest-benchmark, CI history tracking)
```

### Benchmarks

```
uv run pytest tests/benchmarks/ --benchmark-autosave
uv run pytest tests/benchmarks/ --benchmark-compare
```

| Component | Throughput |
|-----------|-----------|
| Syslog parse | ~561K msgs/sec |
| Drain3 templates | ~120K msgs/sec |
| Entity extraction | ~41K msgs/sec |
| Full normalizer | ~39.5K msgs/sec |
| **Full pipeline** (parse + ML + Sigma + storage) | **~1,800 events/sec** |

## License

[AGPL-3.0](LICENSE)
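### Sketch: Deterministic Entity IDs

The README states that entities are resolved to deterministic UUID5 IDs so that the same entity seen in different log sources maps to the same identifier. The snippet below is a minimal sketch of that idea using only Python's standard `uuid` module. The namespace value, the `entity_id` helper, and the lowercase/strip normalization are illustrative assumptions and are not taken from Seerflow's source.

```python
import uuid

# Hypothetical namespace for this sketch; Seerflow's real namespace is not
# documented in this README.
ENTITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "entities.example.invalid")


def entity_id(entity_type: str, value: str) -> uuid.UUID:
    """Derive a stable ID from an entity's type and normalized value.

    UUID5 hashes (namespace, name) with SHA-1, so equal inputs always yield
    the same UUID, with no lookup table or shared state between sources.
    """
    normalized = value.strip().lower()  # assumed normalization rule
    return uuid.uuid5(ENTITY_NAMESPACE, f"{entity_type}:{normalized}")


# The same IP seen by two receivers resolves to one ID.
from_syslog = entity_id("ip", "203.0.113.1")
from_webhook = entity_id("ip", " 203.0.113.1 ")

# Including the type in the hashed name keeps an IP and a host that happen
# to share a string from colliding.
from_host = entity_id("host", "203.0.113.1")

print(from_syslog == from_webhook)  # True
print(from_syslog == from_host)     # False
```

Because the ID is a pure function of its inputs, separate receivers could compute it independently and still agree, which is what makes cross-source correlation possible without a central registry.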

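### Sketch: Risk Accumulation with Exponential Decay

The README describes a per-entity risk register with exponential decay and configurable thresholds, intended to catch slow-burn multi-step attacks. The following is a rough sketch of how such a register might work. The class names, the half-life parameterization, and the threshold value are assumptions made for illustration and do not reflect the contents of `correlation/risk.py`.

```python
import math
from dataclasses import dataclass


@dataclass
class RiskEntry:
    score: float = 0.0
    last_update: float = 0.0  # timestamp in seconds


class RiskRegister:
    """Hypothetical per-entity risk register (not Seerflow's actual class)."""

    def __init__(self, half_life_s: float = 3600.0, threshold: float = 1.0):
        # Convert the half-life into a decay rate: score halves every half_life_s.
        self.decay_rate = math.log(2) / half_life_s
        self.threshold = threshold
        self.entries: dict[str, RiskEntry] = {}

    def add(self, entity_id: str, contribution: float, now: float) -> bool:
        """Decay the stored score to `now`, add the new contribution,
        and report whether the entity has reached the threshold."""
        entry = self.entries.setdefault(entity_id, RiskEntry(last_update=now))
        elapsed = max(0.0, now - entry.last_update)  # ignore out-of-order time
        entry.score = entry.score * math.exp(-self.decay_rate * elapsed) + contribution
        entry.last_update = max(entry.last_update, now)
        return entry.score >= self.threshold


# Three moderate events, ten minutes apart, none alarming on its own.
register = RiskRegister(half_life_s=3600.0, threshold=1.0)
results = [register.add("entity-a", 0.4, now=t) for t in (0.0, 600.0, 1200.0)]
print(results)  # [False, False, True]
```

With a one-hour half-life, each 0.4 contribution has only decayed by about 11% when the next one arrives, so the third event pushes the total past 1.0. The same three events spread over several days would decay away between arrivals and never reach the threshold.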