srin2705/Context-aware-cyber-threat-forecasting-with-multi-modal-intelligence-and-proactive-defense


A hybrid pipeline that combines XGBoost, LSTM and an adaptive Markov chain for context-aware, real-time cyber threat forecasting with tiered alerts.


# 🛡️ Context-Aware Cyber Threat Forecasting

## 🔍 What It Does

Most intrusion detection systems are reactive. This project is built for **proactive forecasting**.

Given a time window of network traffic, the system will:

1. **Classify** each network flow in real time (Normal / DoS / DDoS / Reconnaissance)
2. **Forecast** which threat class is *likely* to appear next
3. **Quantify uncertainty** using Monte Carlo Dropout
4. **Weigh contextual signals** (time of day, device type, geolocation, threat history)
5. **Trigger tiered alerts** (🔴 HIGH / 🟠 MEDIUM / 🟢 LOW) based on a risk score

## 🏗️ Architecture

```
Network Traffic
      │
      ▼
┌─────────────────────┐
│  XGBoost Classifier │  ← 32 flow features, calibrated probabilities
│  (Isotonic Calib.)  │    Accuracy: 99.89% | F1: 0.9993
└────────┬────────────┘
         │  Probability sequences (window = 10 steps)
         ▼
┌─────────────────────┐
│  LSTM Forecaster    │  ← Temporal sequence learning
│  (MC Dropout ×30)   │    Val Accuracy: 94.8% | Uncertainty-aware
└────────┬────────────┘
         │  Posterior distribution over next state
         ▼
┌──────────────────────────────┐
│  Adaptive Markov v3          │  ← Blends empirical transitions +
│  + Escalation Prior          │    cyber kill-chain domain priors
│  + Context Engine            │    + 5 real-time context signals
└────────┬─────────────────────┘
         │
         ▼
  🎯 Next-State Forecast + Risk Score + Alert Level
```

## 📊 Model Performance

| Model | Metric | Score |
|---|---|---|
| XGBoost (calibrated) | Test Accuracy | **99.89%** |
| XGBoost | Macro F1 (5-fold CV) | **0.9993 ± 0.0002** |
| LSTM | Validation Accuracy | **94.8%** |
| Full Pipeline | Forecast Accuracy (Normal scenario) | **100%** |

**Dataset:** Bot-IoT: 3,668,522 flows × 46 features (DDoS, DoS, Reconnaissance, Normal)

**Class imbalance handled with:** SMOTE-based balanced sampling

## 🚀 Demo Scenarios

The live demo (`demo1.py`) ships with 5 pre-built scenarios:

| Scenario | Description |
|---|---|
| 🟢 A - All Normal | Baseline healthy traffic |
| 🟠 B - Slow Escalation | Recon gradually escalating to DoS |
| 🔴 C - Sudden DDoS | Abrupt large-scale attack |
| 🔵 D - Stealth Recon | Low-and-slow reconnaissance scan |
| 🟣 E - APT Simulation | Multi-stage advanced persistent threat |

## 🗂️ Repository Structure

```
├── demo1.py                            # Interactive demo (4 modes)
├── CyberThreat_FYP_Final_Clean.ipynb   # Full training notebook
├── fyp_saved_models/
│   ├── xgb_calibrated.pkl              # Trained XGBoost + isotonic calibration
│   ├── lstm_model.keras                # Trained LSTM forecaster
│   ├── scaler.pkl                      # Feature scaler
│   ├── label_encoder.pkl               # Class label encoder
│   ├── class_names.json                # [DDoS, DoS, Normal, Reconnaissance]
│   └── feature_cols.json               # 32 selected flow features
├── viz_01_raw_distribution.png         # Class distribution (raw)
├── viz_05_correlation_heatmap.png      # Feature correlation heatmap
├── viz_09_xgb_confusion.png            # XGBoost confusion matrix
├── viz_12_lstm_training.png            # LSTM training curves
├── viz_14_threat_forecast.png          # Markov forecast output
├── dashboard_*.png                     # Live dashboard screenshots
└── classification_basis.svg            # Architecture diagrams
```

## ⚡ Quick Start

```
# Clone the repository
git clone https://github.com/srin2705/Context-aware-cyber-threat-forecasting-with-multi-modal-intelligence-and-proactive-defense.git
cd Context-aware-cyber-threat-forecasting-with-multi-modal-intelligence-and-proactive-defense

# Install dependencies
pip install -r requirements.txt

# Run the interactive demo
python demo1.py
```

**Demo modes:**

```
1 → Scenario Sweep  (all 5 pre-built scenarios)
2 → Real Samples    (draws from actual Bot-IoT data)
3 → Interactive     (enter your own feature values)
4 → Stress Test     (edge cases & uncertainty analysis)
```

## 🧰 Tech Stack

| Category | Libraries |
|---|---|
| ML / Classification | `XGBoost`, `scikit-learn` (isotonic calibration), `imbalanced-learn` (SMOTE) |
| Deep Learning | `TensorFlow / Keras` (LSTM, MC Dropout) |
| Data Processing | `NumPy`, `Pandas` |
| Visualisation | `Matplotlib`, `Seaborn` |
| Serialisation | `joblib` |

## 🧠 Key Design Decisions

**Why XGBoost → LSTM (rather than end-to-end)?**
XGBoost produces a calibrated *probability vector* for each network flow. The LSTM forecasts by learning patterns in sequences of these probability vectors, which gives it a richer temporal signal than the raw features.

**Why Adaptive Markov v3?**
A purely neural forecast ignores domain knowledge. The Markov layer blends three signals: empirical state transitions, a cyber kill-chain escalation prior (Normal → Recon → DoS → DDoS) and real-time context signals.

**Why MC Dropout?**
In security, uncertainty quantification is critical: a confidently wrong prediction is more dangerous than an uncertain correct one. Running 30 stochastic forward passes yields an uncertainty estimate for every prediction.

## 📸 Visualisations

XGBoost Confusion Matrix

LSTM Training Curves

Threat Forecast Output

Dashboard — Normal Traffic

Dashboard — DDoS Attack

Dashboard — APT Simulation
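The XGBoost → LSTM hand-off described under *Key Design Decisions* turns per-flow probability vectors into fixed-length sequences (window = 10 steps, per the architecture diagram). Below is a minimal NumPy sketch of that windowing, assuming the flows are already in time order and that the LSTM is trained to predict the class of the flow that immediately follows each window. The function name `make_sequences` is hypothetical and not taken from the training notebook.

```python
import numpy as np

WINDOW = 10  # window length from the architecture diagram

def make_sequences(probs: np.ndarray, labels: np.ndarray, window: int = WINDOW):
    """Build (window of probability vectors, next label) training pairs.

    probs:  (n_flows, n_classes) calibrated class probabilities, in time order
    labels: (n_flows,) integer class labels, in the same order
    Returns X with shape (n_flows - window, window, n_classes) and y with
    shape (n_flows - window,), where y[i] is the label of the flow that
    immediately follows window X[i].
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    n = len(probs) - window
    if n <= 0:
        raise ValueError("need more flows than the window length")
    # Row i of idx selects flows i, i+1, ..., i+window-1
    idx = np.arange(window)[None, :] + np.arange(n)[:, None]
    return probs[idx], labels[window:]

# Usage with random stand-in probabilities (not real Bot-IoT output)
rng = np.random.default_rng(0)
raw = rng.random((25, 4))
probs = raw / raw.sum(axis=1, keepdims=True)
labels = probs.argmax(axis=1)
X, y = make_sequences(probs, labels)
print(X.shape, y.shape)  # (15, 10, 4) (15,)
```

Fancy indexing with a precomputed index matrix avoids a Python loop, and the resulting `(samples, timesteps, features)` layout is the shape Keras LSTM layers expect.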
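The MC Dropout idea from *Why MC Dropout?* is to keep dropout active at inference, run several stochastic forward passes, and read uncertainty off the spread of the outputs. In Keras this is commonly done by calling a model with `training=True`. The sketch below shows the same technique in plain NumPy so it runs without TensorFlow. It is a toy under stated assumptions: a random two-layer network stands in for the project's LSTM, and the dropout rate of 0.3 and the weights are made up. Only the 30 passes come from the README.

```python
import numpy as np

N_PASSES = 30  # stochastic forward passes, as stated in the README

def dropout_forward(x, w1, w2, rate, rng):
    """One forward pass of a toy 2-layer net with dropout left ON at inference."""
    h = np.maximum(0.0, x @ w1)                    # ReLU hidden layer
    keep = rng.random(h.shape) >= rate             # fresh dropout mask on every call
    h = h * keep / (1.0 - rate)                    # inverted-dropout rescaling
    logits = h @ w2
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)       # softmax over classes

def mc_dropout_predict(x, w1, w2, rate=0.3, n_passes=N_PASSES, seed=0):
    """Average n_passes stochastic passes; report spread and predictive entropy."""
    rng = np.random.default_rng(seed)
    samples = np.stack([dropout_forward(x, w1, w2, rate, rng) for _ in range(n_passes)])
    mean = samples.mean(axis=0)                    # predictive distribution
    std = samples.std(axis=0)                      # per-class disagreement between passes
    p = np.clip(mean, 1e-12, 1.0)                  # guard against log(0)
    entropy = -(mean * np.log(p)).sum(axis=-1)     # in nats; max is log(n_classes)
    return mean, std, entropy

# Usage with random toy weights (4 input probabilities -> 4 classes)
rng = np.random.default_rng(42)
w1 = rng.normal(size=(4, 16))
w2 = rng.normal(size=(16, 4))
x = np.array([[0.9, 0.05, 0.03, 0.02]])
mean, std, entropy = mc_dropout_predict(x, w1, w2)
```

A high entropy or a large `std` flags a forecast that should not be trusted on its own, which is the "uncertain correct vs. confidently wrong" trade-off the design decision describes.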
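*Why Adaptive Markov v3?* says the Markov layer blends empirical transitions with a kill-chain escalation prior (Normal → Recon → DoS → DDoS). The README does not say how the blend is computed, so the following is only a sketch under assumptions: a Laplace-smoothed empirical transition matrix, a hypothetical prior that mostly stays in the current state and sometimes escalates one stage, and a linear mixture with the LSTM posterior using made-up weights `(0.5, 0.3, 0.2)`. Class order follows `class_names.json`; all function names are hypothetical.

```python
import numpy as np

CLASSES = ["DDoS", "DoS", "Normal", "Reconnaissance"]  # order in class_names.json
IDX = {name: i for i, name in enumerate(CLASSES)}
KILL_CHAIN = ["Normal", "Reconnaissance", "DoS", "DDoS"]  # escalation order from the README

def empirical_transitions(labels, alpha=1.0):
    """Row-normalised transition counts with add-alpha (Laplace) smoothing."""
    n = len(CLASSES)
    counts = np.full((n, n), alpha)
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def escalation_prior(stay=0.6, step_up=0.3):
    """Hypothetical kill-chain prior: mostly stay, sometimes escalate one stage."""
    n = len(CLASSES)
    P = np.zeros((n, n))
    for k, name in enumerate(KILL_CHAIN):
        i = IDX[name]
        if k + 1 < len(KILL_CHAIN):
            P[i, i] = stay
            P[i, IDX[KILL_CHAIN[k + 1]]] = step_up
        else:
            P[i, i] = stay + step_up        # DDoS is the last stage of the chain
        rest = P[i] == 0                    # spread leftover mass over other states
        P[i, rest] = (1.0 - P[i].sum()) / rest.sum()
    return P

def blend_forecast(current, lstm_posterior, P_emp, P_prior, weights=(0.5, 0.3, 0.2)):
    """Linear mixture of the LSTM posterior and the two Markov rows for `current`."""
    w_lstm, w_emp, w_prior = weights
    p = w_lstm * lstm_posterior + w_emp * P_emp[current] + w_prior * P_prior[current]
    return p / p.sum()

# Usage: a short escalating history and a made-up LSTM posterior
history = [IDX[c] for c in ["Normal", "Normal", "Reconnaissance",
                            "Reconnaissance", "DoS", "DoS", "DDoS"]]
P_emp = empirical_transitions(history)
P_prior = escalation_prior()
lstm_post = np.array([0.05, 0.15, 0.20, 0.60])
forecast = blend_forecast(IDX["Reconnaissance"], lstm_post, P_emp, P_prior)
print(CLASSES[int(forecast.argmax())])  # Reconnaissance
```

Smoothing keeps unseen transitions from getting zero probability, and the prior supplies sensible escalation behaviour when the observed history is too short to estimate transitions reliably.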
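The final stage turns the next-state forecast into a risk score and a 🔴 HIGH / 🟠 MEDIUM / 🟢 LOW alert, weighted by the four context signals listed in *What It Does* (time of day, device type, geolocation, threat history). The README does not give the scoring formula, so everything numeric below is a hypothetical assumption: per-class severities, context multipliers, and the 0.6 / 0.3 alert thresholds. The risk is computed here as the expected severity of the next state, scaled by a context multiplier and capped at 1.

```python
from dataclasses import dataclass
import numpy as np

CLASSES = ["DDoS", "DoS", "Normal", "Reconnaissance"]  # order in class_names.json
SEVERITY = np.array([1.0, 0.8, 0.0, 0.4])               # hypothetical per-class severity
DEVICE_WEIGHT = {"server": 1.2, "iot": 1.1, "workstation": 1.0}  # hypothetical

@dataclass
class Context:
    hour: int               # local hour of day, 0-23
    device_type: str        # e.g. "server", "iot", "workstation"
    geo_flagged: bool       # source geolocation is on a watch-list
    recent_incidents: int   # threat history: incidents seen recently

def context_multiplier(ctx: Context) -> float:
    """Scale risk up for riskier contexts; every factor here is an assumption."""
    m = 1.0
    if ctx.hour < 6 or ctx.hour >= 22:               # off-hours traffic
        m *= 1.15
    m *= DEVICE_WEIGHT.get(ctx.device_type, 1.0)
    if ctx.geo_flagged:
        m *= 1.2
    m *= 1.0 + 0.05 * min(ctx.recent_incidents, 4)   # capped history boost
    return m

def risk_and_alert(forecast, ctx, high=0.6, medium=0.3):
    """Expected severity of the next state, context-weighted, mapped to a tier."""
    risk = min(1.0, float(np.dot(forecast, SEVERITY)) * context_multiplier(ctx))
    if risk >= high:
        return risk, "HIGH"
    if risk >= medium:
        return risk, "MEDIUM"
    return risk, "LOW"

# Usage: the same forecast raises a HIGH alert at night on an IoT device
forecast = np.array([0.085, 0.235, 0.16, 0.52])
night_iot = Context(hour=3, device_type="iot", geo_flagged=False, recent_incidents=2)
office = Context(hour=14, device_type="workstation", geo_flagged=False, recent_incidents=0)
print(risk_and_alert(forecast, night_iot)[1])  # HIGH
print(risk_and_alert(forecast, office)[1])     # MEDIUM
```

Keeping the context engine as a multiplier on a model-derived score means the forecast itself is unchanged; context only decides how urgently it is escalated.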
## 📄 Requirements

```
tensorflow>=2.12
xgboost>=1.7
scikit-learn>=1.2
imbalanced-learn>=0.10
numpy>=1.23
pandas>=1.5
matplotlib>=3.6
seaborn>=0.12
joblib>=1.2
```

## 👤 Author

Sriram S, BCA (AI & ML specialisation)

[![LinkedIn](https://img.shields.io/badge/LinkedIn-blue?logo=linkedin)](https://linkedin.com/in/YOUR_PROFILE) [![GitHub](https://img.shields.io/badge/GitHub-black?logo=github)](https://github.com/YOUR_USERNAME)

## ⭐ If This Project Helped You

Please give it a star; it helps more people in cybersecurity and ML discover this work!