# Cognitive Anomaly Detector
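Network anomaly detection built on a three-engine ensemble: Isolation Forest (40%) + LSTM Autoencoder/PyTorch (40%) + rule engine (20%). It supports Scapy packet capture, 18-feature per-IP extraction, MLflow tracking, and a Streamlit dashboard.
## Architecture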
```
[Scapy capture] --> [Packet queue] --> [Feature extraction (18 features/IP)]
                              |
          +-------------------+-------------------+
     Rule-based       Isolation Forest    LSTM Autoencoder
        (20%)               (40%)               (40%)
          +-------------------+-------------------+
                       Ensemble scorer
                      (threshold: 0.6)
                              |
                      [Alert / SQLite]
```
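The weighted fusion in the diagram can be sketched as follows. This is a minimal illustration, not the project's `ensemble.py`: the function names, the dictionary keys, the clamping of scores to [0, 1], and the use of `>=` at the threshold are all assumptions. Only the weights (0.4 / 0.4 / 0.2) and the 0.6 threshold come from this README.

```python
from typing import Dict

# Weights and threshold as documented in this README.
# The key names are hypothetical labels for the three engines.
WEIGHTS: Dict[str, float] = {"isolation_forest": 0.4, "lstm": 0.4, "rules": 0.2}
ALERT_THRESHOLD = 0.6


def ensemble_score(scores: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Return the weighted average of per-engine scores.

    Each score is clamped to [0, 1]; a missing engine counts as 0.0.
    Dividing by the weight total keeps the result in [0, 1] even if
    the weights do not sum exactly to 1.
    """
    total_weight = sum(weights.values())
    weighted = sum(
        weight * min(max(scores.get(name, 0.0), 0.0), 1.0)
        for name, weight in weights.items()
    )
    return weighted / total_weight


def is_alert(scores: Dict[str, float],
             threshold: float = ALERT_THRESHOLD) -> bool:
    """Flag an alert when the fused score reaches the threshold (assumed >=)."""
    return ensemble_score(scores) >= threshold
```

With these weights, a strong signal from a single ML engine is not enough on its own: a score of 0.9 from Isolation Forest contributes only 0.36, so at least one other engine has to agree before the 0.6 threshold is reached.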
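### Detection Engines
| Engine | Type | Input | Technique |
|--------|------|-------|-----------|
| **Rule-based** | Heuristic | Raw packets | Traffic spikes, ICMP floods, port scans, payload patterns (SQLi, XSS, shell) |
| **Isolation Forest** | Unsupervised ML | 18 per-IP features | Statistical anomaly detection on feature vectors |
| **LSTM Autoencoder** | Deep learning | Sliding windows | Sequence anomaly detection based on reconstruction error |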
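To make the rule-based row concrete, here is a minimal sketch of one such heuristic, a port-scan check that counts distinct destination ports per source IP. The function name, the `(src_ip, dst_port)` input shape, and the threshold of 20 ports are assumptions for illustration; the project's actual rules and thresholds live in its configuration and may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical threshold: distinct destination ports per source IP
# within one observation window.
PORT_SCAN_THRESHOLD = 20


def find_port_scanners(packets: Iterable[Tuple[str, int]],
                       threshold: int = PORT_SCAN_THRESHOLD) -> List[str]:
    """Return source IPs that contacted at least `threshold` distinct ports.

    `packets` yields (src_ip, dst_port) pairs already parsed from captured
    traffic; the parsing step itself is out of scope for this sketch.
    """
    ports_by_src: Dict[str, Set[int]] = defaultdict(set)
    for src_ip, dst_port in packets:
        ports_by_src[src_ip].add(dst_port)
    return sorted(ip for ip, ports in ports_by_src.items()
                  if len(ports) >= threshold)
```

A host that sends many packets to a single port (for example, a busy HTTPS client) is not flagged, because the rule counts distinct ports rather than packets.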
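### Feature Extraction (18 features per IP)
- **Statistical**: packet count, byte volume, packet size mean/standard deviation
- **Temporal**: inter-arrival time statistics, burst detection
- **Protocol**: TCP/UDP/ICMP ratios, SYN/FIN/RST flag counts
- **Port**: number of unique destination ports, port-scan indicators
- **Payload**: average payload size, entropy, pattern-match score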
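The sketch below shows how a handful of these per-IP features could be computed with the standard library. It covers only seven of the 18 features, and the feature names, the tuple layout of a packet, and the choice of population standard deviation are assumptions rather than the behavior of the project's `feature_extractor.py`.

```python
import math
import statistics
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical packet record: (timestamp_seconds, size_bytes, dst_port, payload)
Packet = Tuple[float, int, int, bytes]


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


def extract_features(packets: List[Packet]) -> Dict[str, float]:
    """Compute a small subset of per-IP features for one source IP."""
    sizes = [size for _, size, _, _ in packets]
    times = sorted(ts for ts, _, _, _ in packets)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    entropies = [shannon_entropy(payload) for _, _, _, payload in packets]
    return {
        "packet_count": float(len(packets)),
        "total_bytes": float(sum(sizes)),
        "mean_packet_size": statistics.fmean(sizes) if sizes else 0.0,
        "std_packet_size": statistics.pstdev(sizes) if len(sizes) > 1 else 0.0,
        "mean_inter_arrival": statistics.fmean(gaps) if gaps else 0.0,
        "unique_dst_ports": float(len({port for _, _, port, _ in packets})),
        "mean_payload_entropy": statistics.fmean(entropies) if entropies else 0.0,
    }
```

High payload entropy (close to 8 bits per byte) usually indicates encrypted or compressed content, while a large `unique_dst_ports` value combined with small packets is a typical port-scan signature.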
## Quick Start
```
# Install
git clone https://github.com/RobertoDeLaCamara/CognitiveNetworkAnomalyDetector.git
# git clone names the directory after the repository
cd CognitiveNetworkAnomalyDetector
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# Generate synthetic training data
python scripts/generate_synthetic_data.py
# Train the models
python scripts/train_model.py --from-file data/training/synthetic_baseline.csv --version 1
python scripts/train_lstm_model.py --from-file data/training/synthetic_baseline.csv
# Run detection (packet capture requires root)
sudo venv/bin/python main.py
# Dashboard (optional)
./run_dashboard.sh # --> http://localhost:8501
```
For Docker, the full set of installation options, and remote MLflow setup, see [SETUP.md](docs/SETUP.md).
## Project Structure
```
cognitive-anomaly-detector/
├── src/
│   ├── core/  # Infrastructure (DB, Queue, Logging)
│   ├── detection/  # Triple detection engine + ensemble scorer
│   │   ├── rule_engine.py  # Pattern matching, threshold rules
│   │   ├── ml_detector.py  # Isolation Forest wrapper
│   │   ├── lstm_detector.py  # LSTM Autoencoder wrapper
│   │   └── ensemble.py  # Weighted confidence fusion
│   ├── ml/  # Feature extraction, model training, detectors
│   │   ├── feature_extractor.py  # 18-feature pipeline
│   │   ├── model_trainer.py  # IF training with MLflow tracking
│   │   └── lstm_trainer.py  # LSTM Autoencoder training
│   ├── config/  # Rule thresholds and ML settings
│   └── dashboard/  # Dashboard-specific logic
├── scripts/
│   ├── generate_synthetic_data.py  # Synthetic training data generation
│   ├── train_model.py  # Isolation Forest training
│   ├── train_lstm_model.py  # LSTM Autoencoder training
│   └── auto_train_lstm.sh  # Scheduled retraining
├── tests/  # Organized by component (core, detection, ml)
├── models/  # Trained model files (.pkl, .pt)
├── data/  # Training data
├── main.py  # Detection entry point (sudo required)
├── dashboard.py  # Streamlit dashboard
├── docker-compose.yml
├── Jenkinsfile  # CI pipeline
└── requirements.txt
```
## Configuration
| Variable | Description | Default |
|----------|-------------|---------|
| `CAPTURE_INTERFACE` | Network interface used by Scapy | `eth0` |
| `ENSEMBLE_IF_WEIGHT` | Isolation Forest weight | `0.4` |
| `ENSEMBLE_LSTM_WEIGHT` | LSTM Autoencoder weight | `0.4` |
| `ENSEMBLE_RULES_WEIGHT` | Rule engine weight | `0.2` |
| `ALERT_THRESHOLD` | Confidence threshold for raising an alert | `0.6` |
| `MLFLOW_TRACKING_URI` | MLflow server URL | `None` (local) |
| `MLFLOW_EXPERIMENT_NAME` | Experiment name | `anomaly-detection` |
| `DB_PATH` | SQLite database path | `data/alerts.db` |

See [CONFIGURATION.md](docs/CONFIGURATION.md) for all options.
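As an illustration of how these variables could be consumed, the sketch below reads them from the environment and falls back to the defaults in the table. The `load_config` function, its dictionary keys, and the check that the three weights sum to 1 are assumptions, not the project's actual configuration code.

```python
import os
from typing import Dict, Mapping, Optional, Union

ConfigValue = Union[str, float, None]


def load_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, ConfigValue]:
    """Read settings from `env` (default: os.environ) with README defaults."""
    env = os.environ if env is None else env
    config: Dict[str, ConfigValue] = {
        "capture_interface": env.get("CAPTURE_INTERFACE", "eth0"),
        "if_weight": float(env.get("ENSEMBLE_IF_WEIGHT", "0.4")),
        "lstm_weight": float(env.get("ENSEMBLE_LSTM_WEIGHT", "0.4")),
        "rules_weight": float(env.get("ENSEMBLE_RULES_WEIGHT", "0.2")),
        "alert_threshold": float(env.get("ALERT_THRESHOLD", "0.6")),
        # An unset or empty URI means local tracking.
        "mlflow_tracking_uri": env.get("MLFLOW_TRACKING_URI") or None,
        "mlflow_experiment_name": env.get("MLFLOW_EXPERIMENT_NAME",
                                          "anomaly-detection"),
        "db_path": env.get("DB_PATH", "data/alerts.db"),
    }
    # Assumed sanity check: the three ensemble weights should sum to 1.
    total = config["if_weight"] + config["lstm_weight"] + config["rules_weight"]
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"ensemble weights sum to {total}, expected 1.0")
    return config
```

Passing an explicit mapping instead of relying on `os.environ` makes the loader easy to exercise in tests, for example `load_config({"CAPTURE_INTERFACE": "wlan0"})`.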
## Training Options
```
# From synthetic data (fast, no sudo needed)
python scripts/train_model.py --from-file data/training/synthetic_baseline.csv --version 1
# From live traffic
sudo venv/bin/python scripts/train_model.py --duration 60 --version 1
# LSTM Autoencoder
python scripts/train_lstm_model.py --from-file data/training/synthetic_baseline.csv
# Disable MLflow for a run
python scripts/train_model.py --from-file data.csv --no-mlflow
```
## Testing
```
pytest tests/ -v
pytest tests/ --cov=src --cov-report=term-missing
```
## CI/CD
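Jenkins multibranch pipeline (Gitea SCM source):
- **Build** the Docker image
- **Lint** + security scan
- **Test** (with coverage)
- **SonarQube** analysis
## Documentation
- [SETUP.md](docs/SETUP.md) -- installation, Docker, remote MLflow setup
- [CONFIGURATION.md](docs/CONFIGURATION.md) -- all configuration options and environment variables
- [DASHBOARD.md](docs/DASHBOARD.md) -- Streamlit dashboard guide
- [SECURITY.md](docs/SECURITY.md) -- security notes and applied hardening measures
## Requirements
- Python 3.8+
- root/sudo privileges for packet capture (`main.py`, live-traffic training)
- Optional: remote MLflow server + MinIO for experiment tracking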