RobertoDeLaCamara/CognitiveNetworkAnomalyDetector

GitHub: RobertoDeLaCamara/CognitiveNetworkAnomalyDetector

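A three-engine network anomaly detection system that combines a rule engine, Isolation Forest and an LSTM autoencoder, with real-time Scapy packet capture, 18-dimensional feature extraction and alert visualization.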


# Cognitive Anomaly Detector

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/) [![License: AGPL v3](https://img.shields.io/badge/License-AGPL%20v3-blue.svg)](LICENSE)

Network anomaly detection built on a three-engine ensemble: Isolation Forest (40%) + LSTM Autoencoder/PyTorch (40%) + rule engine (20%). Supports Scapy packet capture, 18-dimensional per-IP feature extraction, MLflow tracking and a Streamlit dashboard.

## Architecture

```
[Scapy capture] --> [Packet queue] --> [Feature extraction (18 features/IP)]
                                                         |
                                     +-------------------+-------------------+
                                Rule-based       Isolation Forest    LSTM Autoencoder
                                   (20%)               (40%)               (40%)
                                     +-------------------+-------------------+
                                                         |
                                         Ensemble scorer (threshold: 0.6)
                                                         |
                                                 [Alert / SQLite]
```

### Detection engines

| Engine | Type | Input | Technique |
|--------|------|-------|-----------|
| **Rule-based** | Heuristic | Raw packets | Traffic spikes, ICMP floods, port scans, payload patterns (SQLi, XSS, shell) |
| **Isolation Forest** | Unsupervised ML | 18-dimensional per-IP features | Statistical anomaly detection on feature vectors |
| **LSTM Autoencoder** | Deep learning | Sliding windows | Sequence anomaly detection based on reconstruction error |

### Feature extraction (18 features per IP)

- **Statistical**: packet count, byte volume, mean/standard deviation of packet size
- **Temporal**: inter-arrival time statistics, burst detection
- **Protocol**: TCP/UDP/ICMP ratios, SYN/FIN/RST flag counts
- **Port**: number of unique destination ports, port-scan indicators
- **Payload**: mean payload size, entropy, pattern-match score

## Quick start

```
# Install
git clone https://github.com/RobertoDeLaCamara/CognitiveNetworkAnomalyDetector.git cognitive-anomaly-detector
cd cognitive-anomaly-detector
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Generate synthetic training data
python scripts/generate_synthetic_data.py

# Train the models
python scripts/train_model.py --from-file data/training/synthetic_baseline.csv --version 1
python scripts/train_lstm_model.py --from-file data/training/synthetic_baseline.csv

# Run detection (packet capture requires root)
sudo venv/bin/python main.py

# Dashboard (optional)
./run_dashboard.sh  # --> http://localhost:8501
```

For Docker, all installation options and remote MLflow setup, see [SETUP.md](docs/SETUP.md).

## Project structure

```
cognitive-anomaly-detector/
├── src/
│   ├── core/                         # Infrastructure (DB, Queue, Logging)
│   ├── detection/                    # Triple detection engine + ensemble scorer
│   │   ├── rule_engine.py            # Pattern matching, threshold rules
│   │   ├── ml_detector.py            # Isolation Forest wrapper
│   │   ├── lstm_detector.py          # LSTM Autoencoder wrapper
│   │   └── ensemble.py               # Weighted confidence fusion
│   ├── ml/                           # Feature extraction, model training, detectors
│   │   ├── feature_extractor.py      # 18-feature pipeline
│   │   ├── model_trainer.py          # IF training with MLflow tracking
│   │   └── lstm_trainer.py           # LSTM Autoencoder training
│   ├── config/                       # Rule thresholds and ML settings
│   └── dashboard/                    # Dashboard-specific logic
├── scripts/
│   ├── generate_synthetic_data.py    # Synthetic training data generation
│   ├── train_model.py                # Isolation Forest training
│   ├── train_lstm_model.py           # LSTM Autoencoder training
│   └── auto_train_lstm.sh            # Scheduled retraining
├── tests/                            # Organized by component (core, detection, ml)
├── models/                           # Trained model files (.pkl, .pt)
├── data/                             # Training data
├── main.py                           # Detection entry point (sudo required)
├── dashboard.py                      # Streamlit dashboard
├── docker-compose.yml
├── Jenkinsfile                       # CI pipeline
└── requirements.txt
```

## Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `CAPTURE_INTERFACE` | Network interface used by Scapy | `eth0` |
| `ENSEMBLE_IF_WEIGHT` | Isolation Forest weight | `0.4` |
| `ENSEMBLE_LSTM_WEIGHT` | LSTM Autoencoder weight | `0.4` |
| `ENSEMBLE_RULES_WEIGHT` | Rule engine weight | `0.2` |
| `ALERT_THRESHOLD` | Confidence threshold for raising an alert | `0.6` |
| `MLFLOW_TRACKING_URI` | MLflow server URL | `None` (local) |
| `MLFLOW_EXPERIMENT_NAME` | Experiment name | `anomaly-detection` |
| `DB_PATH` | SQLite database path | `data/alerts.db` |

See [CONFIGURATION.md](docs/CONFIGURATION.md) for all options.

## Training options

```
# From synthetic data (fast, no sudo required)
python scripts/train_model.py --from-file data/training/synthetic_baseline.csv --version 1

# From live traffic
sudo venv/bin/python scripts/train_model.py --duration 60 --version 1

# LSTM Autoencoder
python scripts/train_lstm_model.py --from-file data/training/synthetic_baseline.csv

# Disable MLflow for a run
python scripts/train_model.py --from-file data.csv --no-mlflow
```

## Testing

```
pytest tests/ -v
pytest tests/ --cov=src --cov-report=term-missing
```

## CI/CD

Jenkins multibranch pipeline (Gitea as the SCM source):

- **Build** the Docker image
- **Lint** + security scan
- **Test** (with coverage)
- **SonarQube** analysis

## Documentation

- [SETUP.md](docs/SETUP.md) -- installation, Docker, remote MLflow setup
- [CONFIGURATION.md](docs/CONFIGURATION.md) -- all configuration options and environment variables
- [DASHBOARD.md](docs/DASHBOARD.md) -- Streamlit dashboard guide
- [SECURITY.md](docs/SECURITY.md) -- security notes and the hardening measures applied

## Requirements

- Python 3.8+
- Root/sudo privileges for packet capture (`main.py`, live-traffic training)
- Optional: a remote MLflow server + MinIO for experiment tracking
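A small, dependency-free sketch can illustrate the per-IP feature categories listed under Feature extraction. It groups simplified packet records by source IP and computes 14 representative features. Burst detection, the port-scan indicator and the pattern-match score are left out. `PacketRecord`, `extract_features` and every feature name are hypothetical. The project's actual 18-feature pipeline (`src/ml/feature_extractor.py`) may define, name and order its features differently.

```python
import math
import statistics
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class PacketRecord:
    """Hypothetical, simplified stand-in for a captured Scapy packet."""

    src_ip: str
    timestamp: float               # capture time in seconds
    size: int                      # packet length in bytes
    protocol: str                  # "TCP", "UDP" or "ICMP"
    dst_port: Optional[int] = None
    tcp_flags: str = ""            # Scapy-style flag letters, e.g. "S", "SA", "FA"
    payload: bytes = b""


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for empty input, at most 8.0."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def extract_features(packets: Iterable[PacketRecord]) -> Dict[str, Dict[str, float]]:
    """Group packets by source IP and compute a subset of per-IP features."""
    by_ip: Dict[str, List[PacketRecord]] = defaultdict(list)
    for pkt in packets:
        by_ip[pkt.src_ip].append(pkt)

    features: Dict[str, Dict[str, float]] = {}
    for ip, pkts in by_ip.items():
        pkts = sorted(pkts, key=lambda p: p.timestamp)
        n = len(pkts)
        sizes = [p.size for p in pkts]
        # Inter-arrival times between consecutive packets from the same IP.
        gaps = [b.timestamp - a.timestamp for a, b in zip(pkts, pkts[1:])]
        protocols = Counter(p.protocol for p in pkts)
        payload = b"".join(p.payload for p in pkts)
        features[ip] = {
            # Statistical
            "packet_count": n,
            "total_bytes": sum(sizes),
            "mean_size": statistics.fmean(sizes),
            "std_size": statistics.pstdev(sizes),
            # Temporal
            "mean_iat": statistics.fmean(gaps) if gaps else 0.0,
            # Protocol (a flag counts if its bit is set, so SYN-ACK counts as SYN)
            "tcp_ratio": protocols["TCP"] / n,
            "udp_ratio": protocols["UDP"] / n,
            "icmp_ratio": protocols["ICMP"] / n,
            "syn_count": sum("S" in p.tcp_flags for p in pkts),
            "fin_count": sum("F" in p.tcp_flags for p in pkts),
            "rst_count": sum("R" in p.tcp_flags for p in pkts),
            # Port
            "unique_dst_ports": len({p.dst_port for p in pkts if p.dst_port is not None}),
            # Payload
            "mean_payload_size": len(payload) / n,
            "payload_entropy": shannon_entropy(payload),
        }
    return features
```

Before a dictionary like this could be passed to an Isolation Forest, it would still have to be turned into a fixed-order numeric vector, and usually scaled as well.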

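The port-scan, ICMP-flood and payload checks of the rule-based engine can be sketched as threshold tests combined with regular expressions. The thresholds, signature patterns and function names below are hypothetical placeholders. The README says the real thresholds live in `src/config/` but does not give their values. To keep the sketch short, the checks take counts already aggregated over one time window instead of raw packets, and traffic-spike detection is omitted.

```python
import re
from typing import List

# Hypothetical thresholds; the project's real values are in src/config/.
PORT_SCAN_MIN_UNIQUE_PORTS = 20
ICMP_FLOOD_MIN_PACKETS = 100

# Illustrative signatures only: real SQLi/XSS/shell detection needs far
# broader rules than these three expressions.
PAYLOAD_PATTERNS = {
    "sqli": re.compile(rb"\bunion\s+select\b|'\s*or\s*'1'\s*=\s*'1", re.IGNORECASE),
    "xss": re.compile(rb"<script\b", re.IGNORECASE),
    "shell": re.compile(rb"/bin/(?:ba)?sh\b|;\s*(?:wget|curl)\s", re.IGNORECASE),
}


def rule_alerts(unique_dst_ports: int, icmp_packets: int, payload: bytes) -> List[str]:
    """Return the names of the heuristic rules triggered within one time window."""
    hits: List[str] = []
    if unique_dst_ports >= PORT_SCAN_MIN_UNIQUE_PORTS:
        hits.append("port_scan")
    if icmp_packets >= ICMP_FLOOD_MIN_PACKETS:
        hits.append("icmp_flood")
    for name, pattern in PAYLOAD_PATTERNS.items():
        if pattern.search(payload):
            hits.append(name)
    return hits


def rule_score(hits: List[str]) -> float:
    # Assumption: any triggered rule maps to full confidence for this engine.
    return 1.0 if hits else 0.0
```

Treating any hit as a rule score of 1.0 is one simple way to feed the 20% rule weight in the ensemble. The project may grade rule confidence differently.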
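The ensemble step from the Architecture section can be sketched as a weighted average of per-engine scores, compared against the alert threshold. The sketch makes two assumptions: each engine already emits an anomaly score normalized to [0, 1], and a score at or above `ALERT_THRESHOLD` raises an alert. `EnsembleConfig`, `ensemble_score` and `is_alert` are hypothetical names, not the API of `src/detection/ensemble.py`. Defaults follow the Configuration table and can be overridden through the same environment variables.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnsembleConfig:
    """Ensemble weights and alert threshold (defaults from the Configuration table)."""

    if_weight: float = 0.4
    lstm_weight: float = 0.4
    rules_weight: float = 0.2
    threshold: float = 0.6

    @classmethod
    def from_env(cls) -> "EnsembleConfig":
        # Environment variables override the documented defaults.
        def read(name: str, default: float) -> float:
            return float(os.environ.get(name, default))

        return cls(
            if_weight=read("ENSEMBLE_IF_WEIGHT", 0.4),
            lstm_weight=read("ENSEMBLE_LSTM_WEIGHT", 0.4),
            rules_weight=read("ENSEMBLE_RULES_WEIGHT", 0.2),
            threshold=read("ALERT_THRESHOLD", 0.6),
        )


def ensemble_score(if_score: float, lstm_score: float, rule_score: float,
                   cfg: EnsembleConfig) -> float:
    """Weighted average of per-engine anomaly scores, each assumed to lie in [0, 1]."""
    for name, value in (("if", if_score), ("lstm", lstm_score), ("rules", rule_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score {value} is outside [0, 1]")
    weights = (cfg.if_weight, cfg.lstm_weight, cfg.rules_weight)
    if any(w < 0 for w in weights):
        raise ValueError("ensemble weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one ensemble weight must be positive")
    weighted = (cfg.if_weight * if_score
                + cfg.lstm_weight * lstm_score
                + cfg.rules_weight * rule_score)
    # Dividing by the weight sum keeps the result in [0, 1] even when
    # custom weights do not add up to exactly 1.
    return weighted / total


def is_alert(score: float, cfg: EnsembleConfig) -> bool:
    # Assumption: a score at or above the threshold raises an alert.
    return score >= cfg.threshold


if __name__ == "__main__":
    cfg = EnsembleConfig.from_env()
    score = ensemble_score(if_score=0.9, lstm_score=0.7, rule_score=0.0, cfg=cfg)
    print(f"score={score:.2f} alert={is_alert(score, cfg)}")
```

Consider an example with the default weights. Scores of 0.9 (Isolation Forest), 0.7 (LSTM) and 0.0 (rules) give 0.4 × 0.9 + 0.4 × 0.7 + 0.2 × 0.0 = 0.64. That crosses the 0.6 threshold even though no rule fired.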