Not-muzzyy/mini-siem-ai


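A lightweight, machine-learning-based security information and event management (SIEM) system that integrates SHAP explainability and LLM threat reasoning, covering the full security operations workflow from log analysis to PDF report generation.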


# 🛡️ Mini-SIEM: AI-Powered Security Operations Platform

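A production-grade security information and event management system that combines ML attack classification, SHAP explainability, LLM threat reasoning, automated PDF incident reports, and a real-time Streamlit dashboard.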

## 🤯 What Is a SIEM?

A **SIEM (Security Information and Event Management system)** is the central nervous system of enterprise cybersecurity. It ingests network logs, detects attacks in real time, explains its reasoning, and generates reports for security analysts.

Commercial SIEM products (IBM QRadar, Splunk, Microsoft Sentinel) cost **₹50L–₹5Cr** per year. This project builds a fully functional mini version in open-source Python! 🔥

## 🏗️ Architecture

```
Raw Network Logs (CSV)
          │
          ▼
① Feature Engineering
   network_feature_engineering.py
   9 features per IP per 5-min window
          │
          ▼
② ML Attack Classifier
   attack_classification_pipeline.py
   RandomForest + GridSearchCV
   → Attack type + confidence %
          │
          ├──────────────────────────────┐
          ▼                              ▼
③ Risk Scoring Engine            ④ SHAP Explainability
   risk_scoring_engine.py           shap_explainability.py
   → 0-100 risk score               → Why did AI decide this?
   → Low/Medium/High/Critical       → Top feature drivers
          │                              │
          └──────────────┬───────────────┘
                         ▼
⑤ LLM Threat Reasoning
   llm_threat_reasoning.py
   GPT-4o-mini
   → Narrative threat assessment
   → Business impact + response steps
          │
          ▼
⑥ PDF Incident Report
   incident_response_report.py
   Auto-generated professional PDF
   (built from scratch, no external PDF libs!)
          │
          ▼
⑦ Streamlit Dashboard
   streamlit_dashboard.py
   Live metrics, SHAP charts, risk gauge, AI reports
```

## ✨ Features

| Module | What it does |
|--------|--------------|
| 🔧 **Feature Engineering** | Extracts 9 ML features from raw logs: port entropy, SYN ratio, C2 beacon timing, and more |
| 🤖 **Attack Classifier** | Detects 6 attack types, with hyperparameter tuning via GridSearchCV |
| 🔍 **SHAP Explainability** | Real TreeExplainer that explains why the model flagged each event |
| ⚠️ **Risk Scoring** | Weighted 4-signal composite score (0-100) with category mapping |
| 🧠 **LLM Reasoning** | GPT-4o-mini generates narrative threat assessments, with retry + backoff |
| 📄 **PDF Reports** | Professional incident-response PDFs built from raw bytes, with zero dependencies |
| 📊 **Dashboard** | Streamlit UI with confusion matrix, SHAP charts, risk gauge, and AI reports |

## 🎯 Detectable Attack Types

| Attack | Description | Key signals |
|--------|-------------|-------------|
| ✅ `benign` | Normal traffic | Balanced, regular behavior |
| 🔑 `brute_force` | Password guessing | High failed_login_ratio |
| 🔭 `scan` | Port reconnaissance | High port_entropy, unique_ports |
| 💥 `ddos` | Flood attack | Very high connection_rate, high syn_flag_ratio |
| 🕹️ `c2` | Malware beaconing | inter_request_time_std near zero |
| 📤 `data_exfiltration` | Data theft | Very high avg_packet_size |

## 📦 Installation

```
# Clone the repository
git clone https://github.com/Not-muzzyy/try.git
cd try

# Install dependencies
pip install -r requirements.txt

# Launch the dashboard
streamlit run streamlit_dashboard.py
```

## 🚀 Quick Start

1. Run `streamlit run streamlit_dashboard.py`
2. Enable **"Use sample dataset"** in the sidebar (*included in the repo*)
3. Click **"Run SIEM Analysis"**
4. Explore all 4 tabs: Metrics, Confusion Matrix, SHAP, Risk + AI Report

## 🗂️ Project Structure

```
mini-siem/
│
├── streamlit_dashboard.py             → Main UI — run this
├── network_feature_engineering.py     → Raw logs → 9 ML features
├── attack_classification_pipeline.py  → ML training + inference
├── shap_explainability.py             → SHAP TreeExplainer integration
├── risk_scoring_engine.py             → 4-signal weighted risk score
├── llm_threat_reasoning.py            → GPT-4o-mini threat analysis
├── incident_response_report.py        → Raw PDF byte generation
├── mini_siem_design.md                → Full architecture document
│
├── data/
│   └── sample_intrusion_dataset.csv   → 2000-row labeled dataset
│
├── artifacts/                         → Saved ML models (auto-created)
├── logs/                              → Log storage
└── requirements.txt
```

## 🛠️ Tech Stack

| Technology | Purpose |
|------------|---------|
| Python 3.10+ | Core language |
| Streamlit | Interactive dashboard |
| Scikit-learn | RandomForest + GridSearchCV |
| SHAP | TreeExplainer explainability |
| Pandas / NumPy | Data processing |
| Matplotlib | Charts and visualization |
| OpenAI API (stdlib urllib) | LLM threat reasoning |
| Raw PDF bytes | Incident report generation |

## 🧠 Core Engineering Highlights

**Zero external PDF dependencies:** `incident_response_report.py` generates PDFs by writing raw PDF-specification bytes directly in Python. No ReportLab, no FPDF.

**Multi-version SHAP compatibility:** `shap_explainability.py` handles all three SHAP output formats (list, 2D array, 3D array) so that version upgrades do not break it.

**Exponential-backoff retries:** `llm_threat_reasoning.py` uses the standard-library `urllib` with exponential backoff, so the `requests` library is not needed.

**Time-windowed features:** `network_feature_engineering.py` aggregates logs by source IP into 5-minute windows and computes Shannon entropy for port-scan detection.

**Configurable risk fusion:** `risk_scoring_engine.py` uses validated weighted scoring: the weights must sum to exactly 1.0 (within a floating-point tolerance of 1e-9).

## 📊 Sample Results

```
Classification Report (sample dataset):

              precision   recall   f1-score
benign           0.97      0.98      0.97
brute_force      0.99      0.98      0.98
c2               0.96      0.94      0.95
data_exfil       0.95      0.96      0.95
ddos             0.99      0.99      0.99
scan             0.98      0.99      0.98

Weighted F1: 0.978
```

## 🔮 Roadmap

- [ ] Deploy to Streamlit Cloud (public demo)
- [ ] Real-time log streaming via Kafka/WebSocket
- [ ] MITRE ATT&CK framework mapping
- [ ] Multi-tenant support with RBAC
- [ ] Email/Slack alert notifications
- [ ] Expand to 15+ attack types

## 👨‍💻 About the Author

**Mohammed Muzamil C**
Final-year BCA student | Cybersecurity & Machine Learning
Nandi Institute of Management & Science College, Ballari
Vijayanagara Sri Krishnadevaraya University

[![LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue?style=flat&logo=linkedin)](https://linkedin.com/in/muzzammilc7)
[![GitHub](https://img.shields.io/badge/GitHub-Follow-black?style=flat&logo=github)](https://github.com/Not-muzzyy)

## 📄 License

MIT License. Open source and free to use.
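
The time-windowed port-entropy feature attributed to `network_feature_engineering.py` can be illustrated with a minimal sketch. This is not the repository's code: the function names, the `(timestamp, src_ip, dst_port)` event layout, and the use of plain standard-library Python instead of Pandas are all assumptions made for illustration. It groups events by source IP into 5-minute windows and computes the Shannon entropy (in bits) of the destination ports seen in each window.

```python
import math
from collections import Counter, defaultdict

WINDOW_SECONDS = 300  # 5-minute window, matching the README's description


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over distinct values of p * log2(1/p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def port_entropy_by_window(events):
    """Map (src_ip, window_start) -> port entropy.

    `events` is an iterable of (timestamp_seconds, src_ip, dst_port)
    tuples; this layout is a hypothetical stand-in for the real log schema.
    """
    ports = defaultdict(list)
    for ts, src_ip, dst_port in events:
        # Floor the timestamp to the start of its 5-minute window.
        window_start = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
        ports[(src_ip, window_start)].append(dst_port)
    return {key: shannon_entropy(seen) for key, seen in ports.items()}


# Illustrative data: one host keeps hitting port 443, another probes four ports.
events = [
    (0, "10.0.0.5", 443), (10, "10.0.0.5", 443),
    (20, "10.0.0.5", 443), (30, "10.0.0.5", 443),
    (5, "10.0.0.9", 22), (6, "10.0.0.9", 80),
    (7, "10.0.0.9", 443), (8, "10.0.0.9", 8080),
    (310, "10.0.0.9", 22),  # falls into the next window
]
result = port_entropy_by_window(events)
```

A host that touches a single port in a window scores 0 bits, while one that spreads its connections evenly over four ports scores 2 bits, which is why a high `port_entropy` is listed as a key signal for `scan`.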

⭐ If this project helped you, please give the repo a star!
