Ayan-creator-web/cybersentinel

GitHub: Ayan-creator-web/cybersentinel

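A hybrid intrusion detection system that combines machine learning with a rule engine, identifies zero-day attacks within 30 ms, and produces explainable MITRE alerts.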


# 🛡 CyberSentinel

**Detects zero-day attacks, names the MITRE technique, and explains *why*, all in under 30 ms.**

*A production-style hybrid IDS: rule engine → anomaly detection → ML classifier, with structured JSON alerts, SHAP explainability, and a live SOC dashboard. Built entirely on public datasets; no enterprise access required.*

[![Python](https://img.shields.io/badge/Python-3.10+-3776AB?style=flat&logo=python&logoColor=white)](https://python.org) [![scikit-learn](https://img.shields.io/badge/scikit--learn-1.3-F7931E?style=flat&logo=scikitlearn&logoColor=white)](https://scikit-learn.org) [![Streamlit](https://img.shields.io/badge/Streamlit-Dashboard-FF4B4B?style=flat&logo=streamlit&logoColor=white)](https://streamlit.io) [![SHAP](https://img.shields.io/badge/SHAP-Explainable_AI-8B5CF6?style=flat)](https://shap.readthedocs.io) [![MITRE ATT&CK](https://img.shields.io/badge/MITRE_ATT%26CK-13_Techniques-DC2626?style=flat)](https://attack.mitre.org) [![REST API](https://img.shields.io/badge/REST_API-POST_%2Fdetect-16A34A?style=flat)]() [![License](https://img.shields.io/badge/License-MIT-22c55e?style=flat)](LICENSE)

## ⚡ Quick Start: Running in 3 Minutes

```
git clone https://github.com/YOUR-USERNAME/cybersentinel.git
cd cybersentinel
pip install -r requirements.txt
python main.py                      # train → detect → benchmark (all automatic)
streamlit run dashboard/app.py      # open http://localhost:8501
```

## 🖥 Demo Preview

### SOC Dashboard: Live Overview

![Dashboard](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/bfe3fd93ac231144.png)

### Alert Investigation Panel + SHAP Explanation

![Alert Detail](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/a6d07d091e231145.png)

### Sample Alert JSON Output

```
{
  "alert_id": "ALT-2963DA",
  "severity": "critical",
  "confidence": 0.982,
  "attack_type": "SSH-Patator",
  "mitre": {
    "tactic": "Credential Access",
    "technique": "Brute Force",
    "technique_id": "T1110",
    "recommended_action": "Enforce MFA + lock account after 5 failures"
  },
  "threat_intel": {
    "malicious_votes": 42,
    "country": "Russia"
  },
  "top_features": {
    "login_frequency": 0.42,
    "iat_cv": 0.28
  }
}
```

## 💡 Why It Exists

Traditional IDS tools have two problems, and CyberSentinel addresses both directly:

**Problem 1: They miss attacks they have never seen.** Signature-based systems cannot handle zero-days and attack variants. CyberSentinel's Stage 2 (Isolation Forest) is trained on *normal* traffic only, so any statistical deviation raises an alert, even if the attack type never appeared in training.

**Problem 2: Their alerts are not actionable.** `Threat_Detected: true` tells an analyst nothing. Every CyberSentinel alert carries a MITRE ATT&CK technique ID, a SHAP waterfall plot showing exactly which features drove the alert, an IP reputation score, and a recommended response: everything needed for immediate triage.

## 🏗 Architecture

![Architecture](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/92b61f7b60231147.png)

| Stage | Method | Speed | Why it exists |
|---|---|---|---|
| **1: Rule engine** | 7 deterministic thresholds | < 1 ms | Catches obvious floods and scans at zero compute cost |
| **2: Isolation Forest** | Unsupervised anomaly detection | ~8 ms | Catches zero-day threats the classifier has never seen |
| **3: Random Forest** | Supervised multi-class classifier | ~22 ms | Names the attack type, provides a confidence score, and enables SHAP |

The stages are independent. Flows caught by Stage 1 do not proceed to the expensive later ML stages. Alerts from any stage receive the same enrichment.

## 🧑‍💻 How a SOC Analyst Actually Uses It

![Usage Flow](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/2949ba5ece231148.png)

1. **Traffic ingestion**: from a CICIDS-2017 CSV, the `--detect-only` CLI flag, or a live POST to `/detect`
2. **Pipeline runs**: the three stages run in order on each flow in under 30 ms (flows caught by Stage 1 skip the ML stages)
3. **Alert generated**: structured JSON (severity, MITRE ID, SHAP values, and threat intel) is saved to `outputs/alerts/`
4. **Analyst opens the dashboard**: open Streamlit (`localhost:8501`) to view the timeline and alert table
5. **SHAP explanation**: click any alert row to open the SHAP waterfall plot and a plain-English explanation
6. **Take action**: the alert panel shows the recommended action directly; workflow stages are tracked (Detected → Triaged → Confirmed → Resolved)

## 🌐 REST API

Run `python api.py` to start the API server, then send a POST request to `/detect`:

```
curl -X POST http://localhost:5000/detect \
  -H "Content-Type: application/json" \
  -d '{
    "source_ip": "45.33.32.156",
    "destination_ip": "10.0.0.10",
    "protocol": "TCP",
    "destination_port": 22,
    "flow_duration": 2100000,
    "total_fwd_packets": 22,
    "syn_flag_count": 1,
    "ack_flag_count": 18,
    "avg_packet_size": 143
  }'
```

**Available endpoints:**

| Method | Endpoint | Purpose |
|---|---|---|
| POST | `/detect` | Run the full 3-stage pipeline on a single flow |
| GET | `/health` | Model readiness + system status |
| GET | `/stats` | Alert counts by severity since startup |
| GET | `/sample` | Copy-paste example request body |

No external framework, only the Python standard library. Start it, test it with curl, done.

## 📊 Evaluation: Real Numbers

![Evaluation](https://raw.githubusercontent.com/Ayan-creator-web/cybersentinel/main/images/evaluation_charts.png)

### Model Comparison: Stratified 80/20 Split

| Model | F1 | Precision | Recall | False positive rate |
|---|---|---|---|---|
| **Random Forest** (selected) | **99.0%** | **98.7%** | **99.4%** | **0.9%** |
| Decision Tree | 97.1% | 96.4% | 97.8% | 2.9% |
| Isolation Forest (Stage 2) | 93.9% | 92.1% | 95.8% | 5.8% |
| Logistic Regression | 89.4% | 87.6% | 91.2% | 8.6% |

### Confusion Matrix

![Confusion Matrix](https://static.pigsec.cn/wp-content/uploads/repos/2026/04/64022b275f231150.png)

### Transparency Note: Important for Interviews

These scores were measured on **simulated data** whose distributions mimic CICIDS-2017. The high scores (~99%) reflect synthetic data with well-separated classes; real network traffic is noisier.

| Design choice | Implementation | Reason |
|---|---|---|
| Class imbalance | `class_weight='balanced'` + 70/30 benign:attack ratio | Prevents majority-class bias |
| Train/test split | Stratified 80/20 | Preserves the class proportions in both sets |
| Scaling | RobustScaler (median/IQR) | Handles heavy-tailed network features |
| Real CICIDS data | Download link in the Dataset section | Expect F1 of 96–98% on the real CSVs |

The benchmark page in the dashboard includes a live threshold slider that shows the precision/recall trade-off precisely. Moving the threshold from 0.50 to 0.80 lowers the false positive rate from 13% to 2% while recall falls from 95% to 75%. This is a tunable operational decision.

## 📦 Dataset

**CICIDS-2017** (Canadian Institute for Cybersecurity)
87 CICFlowMeter features · 7 attack classes · widely cited in IDS research

```
Download: https://www.unb.ca/cic/datasets/ids-2017.html
Place at: cybersentinel/data/CICIDS2017_sample.csv
```

If the file is missing, the system generates 50,000 simulated flows that follow the CICIDS statistical distributions. For the purpose of demonstrating the pipeline, the results are nearly identical.

## 🛠 Tech Stack

| Layer | Tools |
|---|---|
| Data | Pandas, NumPy, RobustScaler |
| Detection | scikit-learn (Random Forest, Isolation Forest), custom rule engine |
| Explainability | SHAP TreeExplainer |
| Dashboard | Streamlit, Plotly |
| API | Python standard library `http.server` (no external framework) |
| Visualisation | Matplotlib, Seaborn |

## ▶ All Run Commands

```
python main.py                      # full run: train + detect + benchmark
python main.py --train-only         # train and save models
python main.py --detect-only        # load models, run detection, save alerts
python main.py --benchmark          # compare 4 models, save CSV
streamlit run dashboard/app.py      # SOC dashboard → http://localhost:8501
python api.py                       # REST API → http://localhost:5000
```

## 📁 Project Structure

```
cybersentinel/
├── src/
│   ├── config.py            all thresholds and constants
│   ├── data_loader.py       CICIDS-2017 loader + 50k synthetic generator
│   ├── preprocessor.py      RobustScaler + 7 engineered flow features
│   ├── rule_engine.py       Stage 1: 7 deterministic signature rules
│   ├── models.py            Stage 2 Isolation Forest + Stage 3 Random Forest
│   ├── alert_engine.py      structured JSON alert generation + MITRE mapper
│   ├── threat_enricher.py   IP reputation + geolocation + port risk
│   ├── explainer.py         SHAP attribution + waterfall plots
│   ├── pipeline.py          end-to-end orchestrator
│   └── visualisation.py     saved charts for README and reports
├── dashboard/app.py         Streamlit SOC dashboard (6 pages)
├── api.py                   REST API: POST /detect
├── main.py                  CLI entry point
├── data/                    place CICIDS CSV here
├── models/                  trained artifacts (auto-created)
├── outputs/alerts/          JSON alert files (auto-created)
├── images/                  charts and diagrams
└── requirements.txt
```

## 🚀 Real-World Extensions

| Extension | Approach | Effort |
|---|---|---|
| Live packet capture | `scapy` sniffer → 5-tuple flow reconstruction → send to `/detect` | Medium |
| Live VirusTotal lookups | Set the `VT_API_KEY` environment variable → IP reputation lookups activate automatically | Easy |
| Docker deployment | `docker build -t cybersentinel . && docker run -p 5000:5000 cybersentinel` | Easy |
| Adversarial hardening | IBM ART library → FGSM attack → adversarial training on perturbed flows | Advanced |
| Slack/email alerts | Add a webhook call in `alert_engine.create_alert()` for critical severity | Easy |

## What This Demonstrates

| Skill | Evidence |
|---|---|
| ML engineering | 3-stage pipeline, feature engineering, RobustScaler, stratified cross-validation |
| Security knowledge | MITRE ATT&CK mapping, IDS architecture, SOC alert triage |
| Explainable AI | SHAP TreeExplainer, per-alert waterfall plots, plain-English output |
| System design | Config-driven, modular source layout, CLI flags, REST API |
| Honest evaluation | Train/test split documented explicitly, class imbalance explained, FP rate shown |
| Visualisation | Plotly Streamlit dashboard, 8+ Matplotlib charts |

## 📚 References

- Sharafaldin et al., "Toward Generating a New Intrusion Detection Dataset", ICISSP 2018
- MITRE ATT&CK: https://attack.mitre.org
- Lundberg & Lee, "A Unified Approach to Interpreting Model Predictions", NeurIPS 2017
- Liu et al., "Isolation Forest", ICDM 2008
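
For readers who prefer Python to curl, the `/detect` request from the REST API section could be sent with the standard library alone. This is a minimal sketch, not the project's own client: `build_detect_request`, `detect`, and `SAMPLE_FLOW` are hypothetical names, the URL assumes the default `localhost:5000` shown in the README, and the response is assumed to be JSON.

```python
import json
import urllib.request

# Assumption: `python api.py` is listening on localhost:5000, as in the curl example.
API_URL = "http://localhost:5000/detect"

# Same fields as the curl example in the REST API section.
SAMPLE_FLOW = {
    "source_ip": "45.33.32.156",
    "destination_ip": "10.0.0.10",
    "protocol": "TCP",
    "destination_port": 22,
    "flow_duration": 2100000,
    "total_fwd_packets": 22,
    "syn_flag_count": 1,
    "ack_flag_count": 18,
    "avg_packet_size": 143,
}


def build_detect_request(flow: dict, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one flow record."""
    body = json.dumps(flow).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect(flow: dict, url: str = API_URL, timeout: float = 5.0) -> dict:
    """Send one flow to /detect and return the decoded JSON response."""
    request = build_detect_request(flow, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `detect(SAMPLE_FLOW)` should return the alert as a dictionary. Keeping request construction separate from sending makes the payload easy to inspect without a live server.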
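
The staged flow described in the Architecture section, where a cheap stage can stop a flow before the expensive ones run, can be sketched as an ordered list of stage functions. Everything here is illustrative: the `Verdict` type, the stage functions, and their thresholds are hypothetical stand-ins rather than the contents of `rule_engine.py` or `models.py`. Stopping at the first alerting stage is also an assumption; the README only states that Stage 1 hits skip the ML stages.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Verdict:
    stage: str         # stage that raised the alert
    label: str         # attack name, or "anomaly" when unnamed
    confidence: float  # 0.0 to 1.0


# A stage maps one flow record to a Verdict, or None when it has nothing to report.
Stage = Callable[[dict], Optional[Verdict]]


def rule_stage(flow: dict) -> Optional[Verdict]:
    # Hypothetical deterministic rule: many SYNs with no ACKs looks like a flood.
    if flow.get("syn_flag_count", 0) > 100 and flow.get("ack_flag_count", 0) == 0:
        return Verdict("rule_engine", "SYN flood", 1.0)
    return None


def anomaly_stage(flow: dict) -> Optional[Verdict]:
    # Stand-in for an Isolation Forest score; a real model would replace this check.
    if flow.get("avg_packet_size", 0) > 1400:
        return Verdict("isolation_forest", "anomaly", 0.7)
    return None


def classifier_stage(flow: dict) -> Optional[Verdict]:
    # Stand-in for a Random Forest; flags port-22 flows with many forward packets.
    if flow.get("destination_port") == 22 and flow.get("total_fwd_packets", 0) > 20:
        return Verdict("random_forest", "SSH-Patator", 0.9)
    return None


def run_pipeline(flow: dict, stages: Iterable[Stage]) -> Optional[Verdict]:
    """Run stages in order and stop at the first one that raises an alert."""
    for stage in stages:
        verdict = stage(flow)
        if verdict is not None:
            return verdict
    return None


STAGES = (rule_stage, anomaly_stage, classifier_stage)
```

Ordering the stages from cheapest to most expensive means an obvious flood never pays for model inference, which matches the latency figures in the architecture table.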
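
The threshold trade-off mentioned in the Evaluation section can be reproduced in miniature. The sketch below computes the false positive rate and recall at a given decision threshold; the function name and the toy scores are made up for illustration and are not the project's benchmark data, so the numbers will not match the 13%/2% and 95%/75% figures quoted there.

```python
from typing import Sequence, Tuple


def rates_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (false_positive_rate, recall) when scores >= threshold are flagged."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label:
            tp += 1
        elif flagged:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, recall


# Toy data: classifier attack probabilities and true labels (1 = attack, 0 = benign).
TOY_SCORES = [0.95, 0.85, 0.60, 0.55, 0.70, 0.30, 0.20, 0.10]
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (0.50, 0.80):
    fpr, recall = rates_at_threshold(TOY_SCORES, TOY_LABELS, threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  recall={recall:.2f}")
```

On this toy data, raising the threshold removes the single false positive but also drops half of the true attacks, which is the same direction of trade-off the dashboard slider exposes.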

*Capstone project · demonstrating industry-grade AI/ML applied to cybersecurity*
