# Seerflow
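A streaming, entity-centric log intelligence agent that detects operational failures and security threats across log sources. It combines traditional ML (fast, low-cost) for bulk detection with Sigma rules (3,000+ community detections) to identify known threat patterns.
## Status
**Alpha**: the full ingestion + detection + Sigma rules pipeline is operational.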
[CI](https://github.com/seerflow/seerflow/actions/workflows/ci.yml) | [PyPI](https://pypi.org/project/seerflow/) | [Python](https://www.python.org/) | [License](LICENSE)
## Quick start
```
# Install from source
git clone https://github.com/seerflow/seerflow.git
cd seerflow
uv sync
# Copy and edit the example config
cp seerflow.example.yaml seerflow.yaml
# Start the pipeline
uv run python -m seerflow start
```
### Command line
```
# Start with the default config (seerflow.yaml in the current directory)
uv run python -m seerflow start
# Start with a specific config file
uv run python -m seerflow --config /path/to/seerflow.yaml start
# Show the version
uv run python -m seerflow --version
```
### Docker
```
# Build and run with the SQLite defaults (zero config)
docker compose up -d
# Run with PostgreSQL (set the password first)
export POSTGRES_PASSWORD=your-secure-password
docker compose --profile postgres up -d
# Or run standalone from the registry image
docker run -p 8080:8080 -p 4317:4317 -p 514:514/udp seerflow/seerflow
# Mount a custom config
docker run -v ./seerflow.yaml:/app/seerflow.yaml:ro seerflow/seerflow
```
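### What it does
1. **Ingests** logs from multiple sources simultaneously (syslog, OTLP gRPC/HTTP, file tailing, webhooks)
2. **Parses** every log line with Drain3 (template extraction) and regex entity extraction (IPs, users, hosts, files, domains, processes)
3. **Resolves entities** to deterministic UUID5 IDs for cross-source correlation
4. **Scores** events with an ML ensemble: Half-Space Trees (content), Holt-Winters (volume), CUSUM (change), Markov chains (sequence), blended via z-normalization
5. **Thresholds** scores with biDSPOT (EVT-based auto-thresholding, no manual tuning)
6. **Evaluates** 63 bundled Sigma rules (Linux, web, DNS, process, network) with MITRE ATT&CK tags
7. **Graphs** entity relationships with igraph: PageRank, Louvain, fan-out, betweenness centrality
8. **Accumulates** per-entity risk with exponential decay, catching slow-burn multi-step attacks
9. **Alerts** on anomalies, Sigma matches, and risk threshold breaches
10. **Persists** all events, alerts, graph edges, and ML model state to SQLite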
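
To illustrate step 3, here is a minimal sketch of deterministic entity IDs using Python's standard `uuid.uuid5`. The namespace, the `type:value` key format, and the normalization (strip + lowercase) are assumptions made for illustration; Seerflow's actual scheme may differ.

```python
import uuid

# Hypothetical namespace; the one Seerflow actually uses is not documented here.
ENTITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "entities.example.invalid")


def entity_id(entity_type: str, value: str) -> uuid.UUID:
    """Derive a stable ID from an entity's type and normalized value."""
    normalized = value.strip().lower()  # assumed normalization
    return uuid.uuid5(ENTITY_NAMESPACE, f"{entity_type}:{normalized}")


# The same IP seen by two different receivers maps to the same ID,
# while the same string under a different entity type does not.
from_syslog = entity_id("ip", "203.0.113.1")
from_otlp = entity_id("ip", " 203.0.113.1 ")
print(from_syslog == from_otlp)
print(entity_id("host", "203.0.113.1") == from_syslog)
```

Because UUID5 is a hash of namespace plus name, no lookup table is needed to agree on an ID across sources; any component that sees the same normalized value computes the same UUID.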
### Example: detecting anomalies in syslog
```
# seerflow.yaml
receivers:
  syslog_enabled: true
  syslog_udp_port: 5514       # use high port to avoid root
  otlp_grpc_enabled: false
  otlp_http_enabled: false
  webhook_enabled: false
detection:
  hst_window_size: 100        # lower for faster calibration
  dspot:
    calibration_window: 200
    risk_level: 0.01          # more sensitive for testing
```
```
# Terminal 1: start Seerflow
uv run python -m seerflow start
# Terminal 2: send normal traffic
for i in $(seq 1 300); do
  echo "<134>1 2026-03-24T19:00:00Z web nginx $i - - GET /api/v$((i%5)) 200 ${i}ms" \
    | nc -u -w1 127.0.0.1 5514
done
# Terminal 2: send an anomaly
echo '<11>1 2026-03-24T19:01:00Z db postgres 999 - - FATAL connection limit exceeded 847/100' \
  | nc -u -w1 127.0.0.1 5514
```
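
If `nc` is not available, the normal-traffic loop above can be approximated with Python's standard `socket` module. This is a sketch, not part of Seerflow; the helper names `build_message` and `send_messages` are hypothetical.

```python
import socket


def build_message(i: int) -> str:
    """Format one test line, mirroring the shell loop above."""
    return (
        f"<134>1 2026-03-24T19:00:00Z web nginx {i} - - "
        f"GET /api/v{i % 5} 200 {i}ms"
    )


def send_messages(host: str, port: int, count: int) -> int:
    """Send `count` test lines as UDP datagrams and return how many were sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for i in range(1, count + 1):
            sock.sendto(build_message(i).encode("utf-8"), (host, port))
    return count
```

Calling `send_messages("127.0.0.1", 5514, 300)` sends the same 300 lines as the shell loop, assuming the syslog receiver is listening on UDP port 5514 as configured in the example.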
Output:
```
INFO Seerflow 0.3.0 starting
INFO Receivers: syslog
INFO Pipeline running — Ctrl+C to stop
WARNING ANOMALY [syslog] score=0.952 threshold=0.009 dir=upper
WARNING template: [7] <*> <*> postgres <*> - - FATAL connection limit exceeded <*>
WARNING message: <11>1 2026-03-24T19:01:00Z db postgres 999 - - FATAL connection limit exceeded 847/100
WARNING entities: 203.0.113.1
```
### Shutdown summary
Press Ctrl+C to see session statistics:
```
INFO --- Session Summary ---
INFO Events processed: 312
INFO Anomalies detected: 10
INFO Unique templates: 7
INFO Duration: 45.3s
INFO Throughput: 7 events/sec
INFO Seerflow stopped
```
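## Configuration
See [SETTINGS.md](SETTINGS.md) for the full configuration reference.
All settings are optional: Seerflow runs with sensible defaults (zero config).
Key configuration sections:
- **receivers** -- syslog, OTLP gRPC/HTTP, file tailing, webhooks (enable/disable + ports)
- **detection** -- HST window size, DSPOT calibration, scoring weights, custom Sigma rules directory
- **storage** -- SQLite (default) or PostgreSQL
- **alerting** -- dedup window, webhook/PagerDuty targets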
## Receivers
| Receiver | Port | Protocol | Status |
|----------|------|----------|--------|
| Syslog UDP/TCP | 514 (5514) | RFC 5424/3164 | Done |
| OTLP gRPC | 4317 | Protobuf | Done |
| OTLP HTTP | 4318 | Protobuf + JSON | Done |
| File tailing | -- | Glob + watchfiles | Done |
| Webhooks | 8081 | JSON/form + auth | Done |
## Detection pipeline
```
Log Sources → Receivers → Drain3 → UUID5 Entities → ML Ensemble → Sigma Rules
                                          ↓              ↓             ↓
                                    Entity Graph   blended score  ATT&CK tags
                                    Window Buffer   [0.0 - 1.0]  tactic/technique
                                    Risk Register        ↓             ↓
                                          ↓       Risk Accumulation → Alert
                                    PageRank, Louvain
                                    Fan-out, Betweenness
```
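- **Drain3**: streaming log template extraction (~120K msgs/sec)
- **UUID5 Entity Resolution**: deterministic cross-source entity IDs (same entity = same UUID)
- **Half-Space Trees**: content anomaly detection via River (constant time/memory)
- **Holt-Winters**: volume anomaly detection (trend + seasonal decomposition)
- **CUSUM**: change-point detection (two-sided cumulative sum)
- **Markov Chains**: sequence anomaly detection (per-entity transition matrices)
- **biDSPOT**: bidirectional EVT auto-thresholding (upward spikes + downward drops)
- **DetectionEnsemble**: orchestrates all detectors + per-source blended scoring
- **Sigma Engine**: 63 bundled SigmaHQ rules with logsource-indexed dispatch
- **Entity Graph**: igraph-backed relationship graph with typed edges + 6 algorithms
- **Risk Accumulation**: per-entity risk register with exponential decay + configurable thresholds
- **Sliding Window**: per-entity event buffer with watermark-based late-arrival tolerance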
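
The risk accumulation idea can be sketched as follows. This is an illustrative model only: the class name `RiskRegister`, the half-life parameterization, and the threshold semantics are assumptions, not Seerflow's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class RiskEntry:
    score: float = 0.0
    updated_at: float = 0.0  # seconds, same clock as event timestamps


class RiskRegister:
    """Per-entity risk that decays exponentially between events (illustrative)."""

    def __init__(self, half_life_s: float, threshold: float) -> None:
        self.decay_rate = math.log(2) / half_life_s
        self.threshold = threshold
        self.entries: dict[str, RiskEntry] = {}

    def add(self, entity_id: str, risk: float, now: float) -> bool:
        """Decay the stored score to `now`, add `risk`, report a threshold breach."""
        entry = self.entries.setdefault(entity_id, RiskEntry(updated_at=now))
        elapsed = max(0.0, now - entry.updated_at)  # late events do not "un-decay"
        entry.score = entry.score * math.exp(-self.decay_rate * elapsed) + risk
        entry.updated_at = max(entry.updated_at, now)
        return entry.score >= self.threshold


# Three moderate events ten minutes apart: none breaches the threshold alone,
# but together they do, because the earlier risk has only partly decayed.
register = RiskRegister(half_life_s=3600.0, threshold=1.0)
print([register.add("entity-a", 0.4, now=t) for t in (0.0, 600.0, 1200.0)])
# prints [False, False, True]
```

With a short half-life the register forgets quickly and behaves like a per-event threshold; a longer half-life is what lets low-severity steps spread over time add up to an alert.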
## Development
Requires Python 3.13+ and [uv](https://docs.astral.sh/uv/).
```
# Install dependencies
uv sync
# Run tests
uv run pytest
# Run quality gates
uv run ruff check . && uv run ruff format --check . && uv run mypy src/ && uv run bandit -r src/ -c pyproject.toml && uv run pytest --cov=src/seerflow --cov-fail-under=95
```
### Project structure
```
src/seerflow/
  __main__.py          # CLI entry point (config → pipeline → detection → storage)
  cli.py               # argparse (--config, --version)
  config.py            # YAML config loader with ${ENV_VAR} interpolation
  models/              # SeerflowEvent, Alert, entity structs (msgspec)
  storage/
    protocols.py       # Protocol interfaces (LogStore, AlertStore, ModelStore, EntityStore)
    sqlite.py          # SQLite backend (WAL, FTS5, WriteBuffer)
    migrations.py      # Schema versioning + forward-only migration runner
  receivers/
    base.py            # RawEvent dataclass, Receiver protocol
    manager.py         # ReceiverManager (bounded queue, backpressure, shutdown)
    syslog.py          # UDP/TCP syslog (RFC 5424/3164)
    otlp_grpc.py       # OTLP gRPC receiver (protobuf LogRecord)
    otlp_http.py       # OTLP HTTP receiver (/v1/logs, protobuf + JSON)
    file_tail.py       # File tailing (glob, rotation, checkpoint)
    webhook.py         # Webhooks (JSON/form, field mapping, auth)
  parsing/
    drain.py           # Drain3 wrapper for template extraction
    entities.py        # Regex entity extraction (6 types, params-aware tagging)
    normalizer.py      # EventNormalizer: RawEvent → SeerflowEvent
  detection/
    protocols.py       # Detector Protocol (score, learn, serialize, deserialize)
    hst.py             # Half-Space Trees detector (River)
    threshold.py       # biDSPOT auto-threshold (scipy GPD)
    ensemble.py        # DetectionEnsemble orchestrator (4 detectors + blended scoring)
  sigma/
    engine.py          # SigmaEngine: rule loading, logsource dispatch, evaluation
    matcher.py         # Custom detection matcher (condition tree walker, regex cache)
    pipeline.py        # pySigma processing pipeline (22 field mappings)
    attack.py          # MITRE ATT&CK tactic/technique extraction
    bundled.py         # Bundled rule path discovery (importlib.resources)
    loader.py          # Custom rule directory discovery + validation
    rules/             # 63 curated SigmaHQ YAML rules (linux, web, dns, process, network)
  graph/
    entity_graph.py    # igraph wrapper: vertices, edges, queries, algorithms
    edges.py           # Typed edge inference from entity pairs
    algorithms.py      # PageRank, Louvain, fan-out, fan-in, betweenness, ego-graph
  correlation/
    window.py          # Per-entity sliding window buffer (deque, LRU eviction)
    watermark.py       # Watermark-based late arrival tolerance
    risk.py            # Risk accumulation with exponential decay
  pipeline/
    handler.py         # Event handler: parse → detect → graph → correlate → store
    run.py             # Pipeline runner (config → receivers → handler → storage)
tests/
  unit/                # 1200+ unit tests
  integration/         # Integration tests (pipeline, graph, correlation, real SQLite)
  benchmarks/          # Throughput benchmarks (pytest-benchmark, CI history tracking)
```
### Benchmarks
```
uv run pytest tests/benchmarks/ --benchmark-autosave
uv run pytest tests/benchmarks/ --benchmark-compare
```
| Component | Throughput |
|-----------|-----------|
| Syslog parse | ~561K msgs/sec |
| Drain3 templates | ~120K msgs/sec |
| Entity extraction | ~41K msgs/sec |
| Full normalizer | ~39.5K msgs/sec |
| **Full pipeline** (parse + ML + Sigma + storage) | **~1,800 events/sec** |
## License
[AGPL-3.0](LICENSE)