strawberrykyuu/CyberSentinel-AI

GitHub: strawberrykyuu/CyberSentinel-AI

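A real-time cybersecurity threat detection system built on a multi-agent collaborative architecture. It combines three anomaly detection models with computer-vision malware analysis, and provides end-to-end visual monitoring and automated response through a Streamlit dashboard.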


# 🛡️ CyberSentinel AI: Agentic Cybersecurity System

## 📖 Project overview

CyberSentinel AI is a fully agentic cybersecurity pipeline built in pure Python. It processes a continuous stream of security log events, scores each event with three complementary anomaly detection models, routes suspicious events through a decision layer, analyses file threats in depth with a computer vision model (inspired by the original **bytes-cv.ipynb** notebook), and executes automated responses. All of this is visible in a real-time Streamlit dashboard.

This is **not** a simple machine learning pipeline. Every stage is wrapped in an autonomous agent with a single, well-defined responsibility. Agents communicate by passing structured Python dictionaries: no message queues, no external services, and everything runs locally.

## 🏗️ Architecture

```
┌─────────────┐   raw events   ┌──────────────────┐
│  Simulator  │ ─────────────► │ MonitoringAgent  │  validate / enrich
└─────────────┘                └────────┬─────────┘
                                        │ cleaned events
                                        ▼
                               ┌──────────────────┐
                               │  DetectionAgent  │  IF + Z-Score + UBA
                               └────────┬─────────┘
                                        │ anomaly_score + is_anomaly
                                        ▼
                               ┌──────────────────┐
                               │  DecisionAgent   │  severity + routing
                               └──┬─────────┬─────┘
                     file threat  │         │  other threats
                                  ▼         ▼
                    ┌──────────────┐   ┌──────────────────┐
                    │ MalwareAgent │   │  ResponseAgent   │
                    │  (CV model)  │──►│ block / isolate  │
                    └──────────────┘   └────────┬─────────┘
                                                │ finalised events
                                                ▼
                                       ┌──────────────────┐
                                       │   Streamlit UI   │
                                       └──────────────────┘
```

### Agent responsibilities

| Agent | File | Role |
|-------|------|------|
| **Monitoring** | `agents/monitoring_agent.py` | Type checking, validation, de-duplication, tagging of known-bad IPs |
| **Detection** | `agents/detection_agent.py` | Runs the three models and fuses their scores into a single `anomaly_score` |
| **Decision** | `agents/decision_agent.py` | Assigns severity, chooses an action, routes file threats |
| **Malware** | `agents/malware_agent.py` | Parses `.bytes` into an RGB image → extracts texture features → assigns a class label |
| **Response** | `agents/response_agent.py` | Executes log / throttle / block / isolate actions |
| **Orchestrator** | `orchestrator/main_orchestrator.py` | Wires the agents together and drives the tick loop |

## 📂 Folder structure

```
seminar_project/
├── agents/
│   ├── monitoring_agent.py
│   ├── detection_agent.py
│   ├── decision_agent.py
│   ├── malware_agent.py
│   └── response_agent.py
├── models/
│   ├── isolation_forest.py
│   ├── zscore.py
│   ├── uba.py
│   └── malware_cv_model.py
├── orchestrator/
│   └── main_orchestrator.py
├── data/
│   ├── simulator.py
│   └── cybersecurity_logs.csv   ← place Kaggle dataset here
├── ui/
│   └── app.py
├── utils/
│   └── helpers.py
├── logs/                        ← auto-created
├── config.py
├── requirements.txt
└── README.md
```

## ⚙️ How the models work

### 1. Isolation Forest (`models/isolation_forest.py`)

Randomly partitions the feature space. Points that can be isolated with fewer splits are treated as anomalies. The model is trained on the first 50 events (warm-up phase) and then scores every subsequent event. Fusion weight: **40%**.

### 2. Z-Score detector (`models/zscore.py`)

Maintains a rolling window (50 events by default) for each numeric feature. An event is flagged if at least one of its features deviates from the rolling mean by more than 3 standard deviations. Because the window rolls, the detector adapts to behavioural drift. Fusion weight: **35%**.

### 3. UBA: User Behaviour Analytics (`models/uba.py`)

Builds a per-user behavioural profile: number of failed logins, number of distinct source IPs, and the frequency of each event type. Flags behaviour that deviates from the user's own past baseline. Fusion weight: **25%**.

### 4. Malware computer vision model (`models/malware_cv_model.py`)

Derived directly from the notebook (`bytes-cv.ipynb`):

```
.bytes hexdump file
        │
        ▼
parse_hexdump()        — offset + hex bytes → uint8 array
        │
        ▼
bytes_to_image()       — pad/truncate → reshape to (64, 64, 3)
        │
        ▼
_texture_features()    — entropy, zero_frac, edge_density, …
        │
        ▼
_heuristic_classify()  — rule-based → label + confidence
```

The heuristic classifier runs without a GPU and without pretrained weights:

| Condition | Predicted label |
|-----------|-----------------|
| `entropy > 0.85` | ransomware (encrypted payload) |
| `zero_frac > 0.40` | benign (padded binary) |
| `edge_density > 0.15` | worm (dense code) |
| `high_frac > 0.35` | trojan |
| `entropy < 0.40` | adware (string-heavy) |
| otherwise | spyware |

To replace the heuristics with a trained model, place a real sklearn or ONNX checkpoint at `models/malware_cnn.pkl`.

## 🚀 Installation

### 1. Clone/download the project

```
git clone
cd seminar_project
```

### 2. Create a virtual environment (recommended)

```
python -m venv venv

# Linux / macOS
source venv/bin/activate

# Windows
venv\Scripts\activate
```

### 3. Install dependencies

```
pip install -r requirements.txt
```

### 4. (Optional) Download the Kaggle dataset

Dataset: **Synthetic Cybersecurity Logs for Anomaly Detection**

URL: https://www.kaggle.com/datasets/fcwebdev/synthetic-cybersecurity-logs-for-anomaly-detection

Place the downloaded CSV at:

```
seminar_project/data/cybersecurity_logs.csv
```

**Without the dataset**, the system automatically switches to *synthetic mode* and generates realistic random events; every feature still works.

### 5. (Optional) Add malware byte samples

Put `.bytes` hexdump files (in the format of the Kaggle Microsoft Malware Classification dataset) into:

```
seminar_project/data/malware_bytes/
```

Without these files, the Malware agent uses deterministic synthetic byte images.

## ▶️ How to run

### Streamlit dashboard (recommended)

```
# From the seminar_project/ directory:
streamlit run ui/app.py
```

Open **http://localhost:8501** in your browser, then click **▶ Start** in the sidebar.

### CLI batch mode

```
# From the seminar_project/ directory:
python -c "
from orchestrator.main_orchestrator import MainOrchestrator
orc = MainOrchestrator()
orc.run(max_batches=10)
print(orc.stats)
"
```

## 🔄 Data flow walkthrough

```
1. EventSimulator.next_batch()
   → list of raw event dicts with source_ip, dest_ip, user, event_type,
     raw_features {bytes_sent, bytes_received, duration_sec, …}

2. MonitoringAgent.process()
   → validates fields, coerces types, flags known-bad IPs, deduplicates
   → output: the cleaned list (duplicates removed)

3. DetectionAgent.process()
   → IF model scores each event (0–1)
   → Z-Score model scores each event (0–1)
   → UBA model scores each event (0–1)
   → fusion: 0.40*IF + 0.35*Z + 0.25*UBA → anomaly_score
   → adds: anomaly_score, is_anomaly, model_scores{}

4. DecisionAgent.process()
   → maps score to severity (low/medium/high/critical)
   → maps severity to action (log/alert/block/isolate)
   → flags file-related high-severity events as is_file_threat=True
   → returns (file_threats, other_threats)

5. MalwareAgent.process()  [file_threats only]
   → loads or synthesises a .bytes sample
   → converts bytes → 64×64 RGB image
   → extracts texture features (entropy, edge density, …)
   → classifies → malware_label, malware_confidence, malware_is_threat
   → escalates severity to "critical" if malware confirmed

6. ResponseAgent.process()  [all events]
   → executes action: add to block-list, throttle list, isolation list
   → annotates: response_action, response_message, is_blocked, is_isolated

7. Streamlit UI
   → reads finalised events from orchestrator.history
   → renders KPI cards, trend charts, severity bar, pie chart, live table
```

## 📊 Dashboard charts explained

| Chart | What it shows |
|-------|---------------|
| **Anomaly Score Trend** | Mean anomaly score per batch over time. The red dashed line marks the detection threshold; spikes indicate attack bursts. |
| **Alerts per Batch** | Number of anomalous events detected in each simulation tick, colour-coded from green (low) through orange to red (high). |
| **Normal vs Anomalous Pie** | Cumulative split of all events since start-up. Ideally mostly green. |
| **Events by Severity** | Bar chart of low/medium/high/critical event counts, indicating the overall threat level. |
| **Malware Type Distribution** | When malware events are present, shows the mix of ransomware / trojan / worm etc. detected by the computer vision model. |
| **Live Event Table** | The 200 most recent events. Row colours: red = critical, orange = high, yellow = medium. |

## 🗃️ Dataset notes

### Primary dataset (required for CSV mode)

**Synthetic Cybersecurity Logs for Anomaly Detection**
https://www.kaggle.com/datasets/fcwebdev/synthetic-cybersecurity-logs-for-anomaly-detection

Contains labelled network log events with fields such as source/destination IP, bytes transferred, duration, and an anomaly label.

### Additional datasets (if you want to extend the system)

| Need | Kaggle dataset |
|------|----------------|
| Real malware byte samples | [Microsoft Malware Classification Challenge](https://www.kaggle.com/c/malware-classification): `.bytes` hexdump files in exactly the format the notebook and MalwareAgent expect |
| Network intrusion | [KDD Cup 1999](https://www.kaggle.com/datasets/galaxyh/kdd-cup-1999-data): adds 41 feature columns for richer detection |
| User login behaviour | [User Behaviour Anomaly Detection](https://www.kaggle.com/datasets/taha7ussein007/userauthenticationdataset): improves the UBA model |

## 🔧 Configuration reference (`config.py`)

| Parameter | Default | Description |
|-----------|---------|-------------|
| `SIMULATION_BATCH_SIZE` | 20 | Events per tick |
| `SIMULATION_INTERVAL_SEC` | 2.0 | Seconds between UI refreshes |
| `IF_N_ESTIMATORS` | 100 | Number of Isolation Forest trees |
| `IF_CONTAMINATION` | 0.05 | Expected fraction of anomalies |
| `ZSCORE_THRESHOLD` | 3.0 | Standard deviations that trigger a flag |
| `ZSCORE_ROLLING_WINDOW` | 50 | Z-Score rolling window size (events) |
| `UBA_MAX_FAILED_LOGINS` | 5 | Failed-login trigger threshold per user |
| `UBA_MAX_DISTINCT_IPS` | 3 | Distinct-source-IP trigger threshold per user |
| `ANOMALY_SCORE_THRESHOLD` | 0.50 | Fused-score cut-off for marking an event anomalous |
| `HIGH_SEVERITY_THRESHOLD` | 0.75 | Fused-score cut-off for high severity |
| `CRITICAL_SEVERITY_THRESHOLD` | 0.90 | Fused-score cut-off for critical severity |
| `MALWARE_IMAGE_SIZE` | (64, 64) | Width × height of the byte image |
| `LOG_TO_FILE` | True | Enable logging to file |

## 💡 Sample output (CLI)

```
Batch 1/10 | events= 18 | anomalies= 1 | blocked=0
Batch 2/10 | events= 19 | anomalies= 2 | blocked=1
Batch 3/10 | events= 20 | anomalies= 0 | blocked=1
Batch 4/10 | events= 18 | anomalies= 3 | blocked=2
...
```

Log file (`logs/system.log`):

```
[2024-07-15 10:23:01] INFO agents.monitoring_agent — MonitoringAgent: 20 in → 19 forwarded
[2024-07-15 10:23:01] INFO agents.detection_agent — DetectionAgent: 19 events, 2 anomalies
[2024-07-15 10:23:01] WARNING agents.response_agent — BLOCK: 10.0.0.99 | user=user_007 | severity=high
```

## 🛠️ Troubleshooting

**`ModuleNotFoundError: No module named 'streamlit'`**
→ Run `pip install -r requirements.txt` inside your virtual environment.

**`streamlit run ui/app.py` reports "command not found"**
→ Make sure your virtual environment is activated: `source venv/bin/activate` (Linux/macOS) or `venv\Scripts\activate` (Windows).

**Charts are empty / no events appear**
→ Press the **▶ Start** button in the sidebar. Charts only populate after the first batch has been processed.

**`FileNotFoundError` for the dataset CSV**
→ The system automatically falls back to synthetic mode. No action is needed unless you specifically want to test with the real dataset.

**Isolation Forest shows a `cold-start` warning in the logs**
→ This is expected. The Isolation Forest model needs 50 events before it is fully fitted, so scores in the first few batches are provisional.

**Every event shows `malware_label = —`**
→ Only `file_access` / `data_exfiltration` events with high anomaly scores are routed to the Malware agent. Run more batches and such events will appear.

## 👤 Author

Gaurika Nawani
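The `MonitoringAgent` duties listed in the agent table and in step 2 of the data flow walkthrough (type checking, validation, de-duplication, tagging known-bad IPs) fit naturally into a single pass over a batch. The sketch below is a hypothetical illustration, not the code in `agents/monitoring_agent.py`: the `KNOWN_BAD_IPS` set, the `known_bad_ip` output key and the de-duplication key are assumptions, while the field names come from the `raw_features` example in step 1.

```python
# Hypothetical example values; the project's real list and key names may differ.
KNOWN_BAD_IPS = {"10.0.0.99"}
REQUIRED_FIELDS = ("source_ip", "dest_ip", "user", "event_type")
NUMERIC_FIELDS = ("bytes_sent", "bytes_received", "duration_sec")


def clean_batch(raw_events: list[dict]) -> list[dict]:
    """Validate, coerce, de-duplicate and tag a batch of raw event dicts."""
    seen: set[tuple] = set()
    cleaned: list[dict] = []
    for raw in raw_events:
        # Validation: drop events with a missing or empty required field.
        if any(not raw.get(field) for field in REQUIRED_FIELDS):
            continue
        # Type coercion: numeric features must convert to float (missing -> 0.0).
        features = raw.get("raw_features") or {}
        try:
            coerced = {name: float(features.get(name, 0.0)) for name in NUMERIC_FIELDS}
        except (TypeError, ValueError):
            continue
        # De-duplication on identity fields plus coerced features (hypothetical key).
        key = tuple(raw[f] for f in REQUIRED_FIELDS) + tuple(coerced.values())
        if key in seen:
            continue
        seen.add(key)
        # Enrichment: tag events whose source IP is on the known-bad list.
        cleaned.append({**raw, "raw_features": coerced,
                        "known_bad_ip": raw["source_ip"] in KNOWN_BAD_IPS})
    return cleaned
```

Dropping invalid or duplicate events at this stage is what produces log lines such as `MonitoringAgent: 20 in → 19 forwarded`, so the downstream models never score the same event twice.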
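The rolling Z-Score detector described in section 2 of *How the models work* can be sketched with one bounded `deque` per feature, using the documented defaults `ZSCORE_ROLLING_WINDOW = 50` and `ZSCORE_THRESHOLD = 3.0`. This is a hypothetical illustration rather than `models/zscore.py`; in particular, the mapping from the largest |z| to a 0–1 score (3σ maps to 0.5, saturating at 6σ) is an assumption chosen so that the flag boundary lines up with `ANOMALY_SCORE_THRESHOLD`.

```python
import math
from collections import deque


class RollingZScore:
    """Per-feature rolling window; flags an event when any |z| exceeds the threshold."""

    def __init__(self, window: int = 50, threshold: float = 3.0):
        self.window = window
        self.threshold = threshold
        self.history: dict[str, deque] = {}

    def score(self, features: dict[str, float]) -> tuple[float, bool]:
        max_abs_z = 0.0
        for name, value in features.items():
            buf = self.history.setdefault(name, deque(maxlen=self.window))
            if len(buf) >= 2:
                mean = sum(buf) / len(buf)
                std = math.sqrt(sum((x - mean) ** 2 for x in buf) / len(buf))
                if std > 0:
                    max_abs_z = max(max_abs_z, abs(value - mean) / std)
            # Score against past values first, then let the value join the window.
            buf.append(value)
        flagged = max_abs_z > self.threshold
        # Assumption: |z| == threshold maps to 0.5; the score saturates at 2x threshold.
        score = min(max_abs_z / (2 * self.threshold), 1.0)
        return score, flagged
```

Scoring each value against the window before appending it keeps an outlier from inflating its own baseline, and because the `deque` drops its oldest entry once full, old behaviour ages out. That bounded memory is what lets this kind of detector adapt to drift.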
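The per-user baseline from section 3 (UBA) can be sketched as a few counters keyed by user, with `UBA_MAX_FAILED_LOGINS` (5) and `UBA_MAX_DISTINCT_IPS` (3) as trigger thresholds. Everything beyond those two thresholds is hypothetical: the `login_failed` event-type name, the event-type rarity term, and the weights that combine the signals into a 0–1 score were chosen only to illustrate comparing a user against their own history.

```python
from collections import Counter, defaultdict


class UserBaseline:
    """Hypothetical per-user profile: failed logins, distinct IPs, event-type counts."""

    def __init__(self, max_failed_logins: int = 5, max_distinct_ips: int = 3):
        self.max_failed_logins = max_failed_logins
        self.max_distinct_ips = max_distinct_ips
        self.failed_logins: Counter = Counter()
        self.source_ips: defaultdict = defaultdict(set)
        self.event_types: defaultdict = defaultdict(Counter)

    def score(self, event: dict) -> float:
        user, event_type = event["user"], event["event_type"]
        # Rarity of this event type for this user, measured before updating the profile.
        seen = self.event_types[user]
        total = sum(seen.values())
        rarity = 1.0 - seen[event_type] / total if total else 0.0

        # Update the user's profile with the new event.
        seen[event_type] += 1
        self.source_ips[user].add(event["source_ip"])
        if event_type == "login_failed":  # hypothetical event-type name
            self.failed_logins[user] += 1

        # Assumption: each threshold rule adds 0.4; rarity adds up to 0.2.
        rules = [
            self.failed_logins[user] > self.max_failed_logins,
            len(self.source_ips[user]) > self.max_distinct_ips,
        ]
        return min(1.0, 0.4 * sum(rules) + 0.2 * rarity)
```

Because every counter is keyed by user, a burst of failed logins for one account raises only that account's score; another user with no history still starts at 0.0.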
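Steps 3 and 4 of the data flow walkthrough (score fusion, then severity and action) can be sketched using the documented fusion weights and the `config.py` defaults. Two details are assumptions not stated in this README: that `medium` severity starts at `ANOMALY_SCORE_THRESHOLD`, and that the four severities map in order onto `log` / `alert` / `block` / `isolate`. The `model_scores` keys and the `severity` / `action` output keys are also hypothetical; the project's `DetectionAgent` and `DecisionAgent` may organise this differently.

```python
# Fusion weights documented in the README: IF 40%, Z-Score 35%, UBA 25%.
WEIGHTS = {"if": 0.40, "zscore": 0.35, "uba": 0.25}

# Defaults from the config.py reference table.
ANOMALY_SCORE_THRESHOLD = 0.50
HIGH_SEVERITY_THRESHOLD = 0.75
CRITICAL_SEVERITY_THRESHOLD = 0.90

# Assumption: severities map one-to-one, in order, onto the four actions.
ACTION_FOR_SEVERITY = {"low": "log", "medium": "alert", "high": "block", "critical": "isolate"}


def fuse_scores(model_scores: dict[str, float]) -> float:
    """Weighted sum of per-model scores, each expected to lie in [0, 1]."""
    return sum(weight * model_scores[name] for name, weight in WEIGHTS.items())


def severity_for(score: float) -> str:
    """Map a fused score to a severity (assumption: 'medium' starts at the anomaly threshold)."""
    if score >= CRITICAL_SEVERITY_THRESHOLD:
        return "critical"
    if score >= HIGH_SEVERITY_THRESHOLD:
        return "high"
    if score >= ANOMALY_SCORE_THRESHOLD:
        return "medium"
    return "low"


def annotate(event: dict) -> dict:
    """Return a copy of the event with fused score, anomaly flag, severity and action."""
    score = fuse_scores(event["model_scores"])
    severity = severity_for(score)
    return {
        **event,
        "anomaly_score": round(score, 4),
        "is_anomaly": score >= ANOMALY_SCORE_THRESHOLD,
        "severity": severity,                      # hypothetical key name
        "action": ACTION_FOR_SEVERITY[severity],   # hypothetical key name
    }
```

For example, per-model scores of 0.9 (IF), 0.8 (Z-Score) and 0.5 (UBA) fuse to 0.36 + 0.28 + 0.125 = 0.765. That clears `HIGH_SEVERITY_THRESHOLD` but not `CRITICAL_SEVERITY_THRESHOLD`, so under these assumptions the event is `high` and receives a `block` action.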
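The first two stages of the malware pipeline in section 4, `parse_hexdump()` and `bytes_to_image()`, can be sketched with NumPy. Each line of a Microsoft Malware Classification `.bytes` file is an address followed by hex byte tokens, with `??` marking unreadable bytes. Mapping `??` to 0 is an assumption here, and these are illustrative stand-ins rather than the functions in `models/malware_cv_model.py`.

```python
import numpy as np


def parse_hexdump(text: str) -> np.ndarray:
    """Parse 'OFFSET b0 b1 ...' lines into a flat uint8 array.

    Assumption: '??' (unreadable byte) tokens are mapped to 0.
    """
    values = []
    for line in text.splitlines():
        for token in line.split()[1:]:  # the first token is the address offset
            values.append(0 if token == "??" else int(token, 16))
    return np.asarray(values, dtype=np.uint8)


def bytes_to_image(data: np.ndarray, size: tuple[int, int] = (64, 64)) -> np.ndarray:
    """Zero-pad or truncate to width*height*3 bytes and reshape to (height, width, 3)."""
    width, height = size
    n = width * height * 3
    flat = np.zeros(n, dtype=np.uint8)
    flat[: min(n, data.size)] = data[:n]
    return flat.reshape(height, width, 3)
```

With the default `MALWARE_IMAGE_SIZE` of (64, 64), every sample in this sketch becomes exactly 64 × 64 × 3 = 12,288 bytes: shorter files are zero-padded and longer ones are truncated, so only the first 12 KB of a large binary would reach the classifier.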
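The last two stages, `_texture_features()` and `_heuristic_classify()`, can be sketched in plain Python. The thresholds come from the heuristic table in section 4, but the feature definitions are assumptions: `entropy` is Shannon entropy divided by its 8-bit maximum, `high_frac` is the share of bytes ≥ 0x80, and `edge_density` is the share of neighbouring bytes within one 192-byte image row (64 pixels × 3 channels) that differ by more than 64. The rules are assumed to be checked top to bottom, and the confidence value that the project also returns is omitted.

```python
import math
from collections import Counter


def texture_features(data: bytes, row_bytes: int = 64 * 3) -> dict[str, float]:
    """Hypothetical feature definitions; the project's exact formulas may differ."""
    n = len(data)
    if n == 0:
        return {"entropy": 0.0, "zero_frac": 1.0, "high_frac": 0.0, "edge_density": 0.0}
    counts = Counter(data)
    # Shannon entropy in bits, normalised by the 8-bit maximum to land in [0, 1].
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) / 8.0
    zero_frac = counts[0] / n
    high_frac = sum(c for b, c in counts.items() if b >= 0x80) / n
    # Edge density: neighbouring bytes in the same image row that differ by > 64.
    pairs = edges = 0
    for i in range(n - 1):
        if (i + 1) % row_bytes == 0:
            continue  # do not compare across row boundaries
        pairs += 1
        if abs(data[i] - data[i + 1]) > 64:
            edges += 1
    return {"entropy": entropy, "zero_frac": zero_frac, "high_frac": high_frac,
            "edge_density": edges / pairs if pairs else 0.0}


def heuristic_classify(f: dict[str, float]) -> str:
    """Apply the README's rule table, assumed to be evaluated top to bottom."""
    if f["entropy"] > 0.85:
        return "ransomware"
    if f["zero_frac"] > 0.40:
        return "benign"
    if f["edge_density"] > 0.15:
        return "worm"
    if f["high_frac"] > 0.35:
        return "trojan"
    if f["entropy"] < 0.40:
        return "adware"
    return "spyware"
```

For instance, `bytes(range(256)) * 16` contains every byte value equally often, so its normalised entropy is exactly 1.0 and it is labelled `ransomware`, while 4 KB of zeros is labelled `benign`.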

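Tags: Agentic AI, AI agents, Apex, DAST, reverse DNS lookup, HTTP tools, Kubernetes, PyRIT, Python, SOAR, Streamlit, Z-Score anomaly analysis, multi-agent systems, security data dashboard, security log analysis, real-time threat detection, anomaly detection, malware analysis, plugin system, in-depth file threat analysis, no backdoor, wireless security, machine learning, user behaviour analytics, cybersecurity, cybersecurity auditing, automated response, computer vision, access control, reverse engineering tools, privacy protection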