0DevDutt0/cybershield-ai-platform

GitHub: 0DevDutt0/cybershield-ai-platform

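A unified AI cybersecurity platform built on FastAPI and Groq-hosted LLMs. It combines phishing URL detection, semantic CVE search, threat briefing generation, and incident response playbooks.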


[![Typing SVG](https://readme-typing-svg.demolab.com?font=Fira+Code&size=20&pause=1000&color=00D9FF&center=true&vCenter=true&repeat=true&width=650&lines=Phishing+Detection+with+Machine+Learning;Semantic+CVE+Search+over+350%2C000%2B+Vulnerabilities;AI+Threat+Narratives+%26+IR+Playbooks;Red+Team+URL+Simulation+for+Pentesters;One+Platform.+Five+Security+Superpowers.)](https://git.io/typing-svg)
[![Python](https://img.shields.io/badge/Python-3.10%2B-3776AB?style=for-the-badge&logo=python&logoColor=white)](https://python.org) [![FastAPI](https://img.shields.io/badge/FastAPI-0.115%2B-009688?style=for-the-badge&logo=fastapi&logoColor=white)](https://fastapi.tiangolo.com) [![Groq](https://img.shields.io/badge/Groq-LLaMA_3.3_70B-F55036?style=for-the-badge&logo=groq&logoColor=white)](https://groq.com) [![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?style=for-the-badge&logo=docker&logoColor=white)](https://docker.com) [![License](https://img.shields.io/badge/License-MIT-22C55E?style=for-the-badge)](LICENSE) [![scikit-learn](https://img.shields.io/badge/scikit--learn-ML_Model-F7931E?style=flat-square&logo=scikit-learn&logoColor=white)](https://scikit-learn.org) [![FAISS](https://img.shields.io/badge/FAISS-Vector_Search-0064A5?style=flat-square)](https://faiss.ai) [![sentence-transformers](https://img.shields.io/badge/sentence--transformers-Embeddings-orange?style=flat-square)](https://sbert.net) [![Railway](https://img.shields.io/badge/Deploy-Railway-0B0D0E?style=flat-square&logo=railway&logoColor=white)](https://railway.app) [![Render](https://img.shields.io/badge/Deploy-Render-46E3B7?style=flat-square&logo=render&logoColor=white)](https://render.com)

[**🚀 Quick Start**](#-quick-start) · [**🎬 Demo**](#-demo) · [**📖 Docs**](#-api-reference) · [**🐳 Docker**](#-docker) · [**☁️ Deployment**](#-deployment)

## 🎬 Demo

```
# Clone the repository and run the interactive demo script
git clone https://github.com/YOUR_USERNAME/cybershield-ai-platform.git
cd cybershield-ai-platform
pip install -e .
platform url train
platform serve &
python Demo/demo.py
```

```
──────────────────────────────────────────────────────────────
  PLATFORM HEALTH CHECK
──────────────────────────────────────────────────────────────
  ✓ Platform status    ok
  ✓ URL Shield         ready
  ✓ CVE Index          352,841 CVEs loaded
  ✓ LLM                enabled (Groq · llama-3.3-70b-versatile)

──────────────────────────────────────────────────────────────
  1 · URL SHIELD — Phishing Detection
──────────────────────────────────────────────────────────────
  http://paypal-secure-login.xyz/verify?token=abc123
  Score: 87/100  ████████░░  LIKELY PHISHING
    ↳ [HIGH] Suspicious TLD (.xyz)
    ↳ [HIGH] Social engineering keyword in domain

  https://github.com/login
  Score: 4/100   ░░░░░░░░░░  SAFE
```

## ✨ Features

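| | Feature | What it does | Typical latency |
|:---:|:---|:---|:---:|
| 🔗 | **URL Shield** | Real-time phishing detection using a Random Forest model trained on 25 lexical URL features | **< 50ms** |
| 🔍 | **CVE Intelligence** | Semantic search over 350,000+ NVD CVEs: ask in plain English, get answers with citations | **< 2s** |
| ⚡ | **Threat Briefing** | Four-part AI briefing: plain-English explanation · attack surface · business impact · affected software | **~5s** |
| 📋 | **IR Playbook** | Full incident response playbook covering detection → containment → eradication → recovery | **~8s** |
| 📦 | **Asset Correlation** | Matches your software inventory against the CVE index, with an optional CISO executive summary | **~3s** |
| 🎯 | **Red Team** | Generates adversarial phishing domain variants for authorized penetration tests | **~4s** |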
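
To make "lexical URL features" concrete, here is a minimal sketch of how a URL string can be turned into numeric features without any network call. The ten features, the keyword list, and the TLD list below are illustrative assumptions; the README does not enumerate the 25 features the project actually uses.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lists, chosen only for illustration (not the project's own).
SUSPICIOUS_TLDS = {"xyz", "top", "tk"}
KEYWORDS = ("login", "secure", "verify", "account")


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for empty input."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(url: str) -> dict[str, float]:
    """Map a URL to numeric features computed from the string alone."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "num_dots": float(host.count(".")),
        "num_hyphens": float(host.count("-")),
        "num_digits": float(sum(c.isdigit() for c in url)),
        "uses_https": float(parsed.scheme == "https"),
        "has_query": float(bool(parsed.query)),
        "suspicious_tld": float(tld in SUSPICIOUS_TLDS),
        "keyword_in_host": float(any(k in host for k in KEYWORDS)),
        "host_entropy": shannon_entropy(host),
    }
```

A vector like this is what a tree-based classifier such as a Random Forest would consume. Because every value comes from the string itself, scoring stays fast and works offline.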

## 🏗️ Architecture

```
graph TB
    subgraph Client["🌐 Client Layer"]
        Browser["Browser SPA\n5-Tab Dashboard"]
        CLI["CLI\nplatform serve/train/index"]
    end

    subgraph API["⚡ FastAPI — platform_core"]
        Health["GET /api/health"]
        URLRouter["URL Shield Router\n/api/url/*"]
        CVERouter["CVE Intel Router\n/api/cve/*"]
        ThreatRouter["Threat Intel Router\n/api/threat/*"]
        AssetRouter["Asset Router\n/api/assets/*"]
        RTRouter["Red Team Router\n/api/redteam/*"]
    end

    subgraph ML["🤖 ML Layer — cybershield"]
        RF["Random Forest\n25 Lexical Features"]
        FE["Feature Extractor\nURL → float[25]"]
    end

    subgraph RAG["🔍 RAG Layer — cve_intel"]
        FAISS["FAISS Index\n350k+ CVE Vectors"]
        Encoder["Sentence Encoder\nall-MiniLM-L6-v2"]
        Meta["Metadata DB\nSQLite"]
    end

    subgraph LLM["🧠 LLM Layer — Groq API"]
        Groq["LLaMA 3.3 70B\nvia Groq"]
        Narrative["Threat Narrative\nGenerator"]
        Playbook["IR Playbook\nGenerator"]
        Synthesis["CVE Q&A\nSynthesizer"]
        Redteam["URL Variant\nAnnotator"]
    end

    Browser -->|REST| API
    CLI -->|uvicorn| API
    URLRouter --> FE --> RF
    CVERouter --> Encoder --> FAISS --> Meta
    CVERouter --> Synthesis --> Groq
    ThreatRouter --> Narrative --> Groq
    ThreatRouter --> Playbook --> Groq
    AssetRouter --> FAISS
    RTRouter --> Redteam --> Groq

    style Client fill:#1e293b,stroke:#334155,color:#94a3b8
    style API fill:#0f172a,stroke:#00d9ff,color:#00d9ff
    style ML fill:#1e1b4b,stroke:#818cf8,color:#a5b4fc
    style RAG fill:#1a1a2e,stroke:#f59e0b,color:#fcd34d
    style LLM fill:#1f1235,stroke:#a855f7,color:#d8b4fe
```

## 🔄 Data Flow

```
sequenceDiagram
    participant U as 👤 User
    participant D as 🌐 Dashboard
    participant A as ⚡ FastAPI
    participant ML as 🤖 Random Forest
    participant V as 🔍 FAISS Index
    participant L as 🧠 Groq LLaMA

    Note over U,L: URL Phishing Check (< 50ms, no network call)
    U->>D: Paste suspicious URL
    D->>A: POST /api/url/predict
    A->>ML: extract 25 features → predict
    ML-->>A: risk_score, signals
    A-->>D: verdict + risk signals
    D-->>U: Gauge + colour-coded signals

    Note over U,L: CVE Semantic Search + AI Answer (~3s total)
    U->>D: Type natural language query
    D->>A: POST /api/cve/ask
    A->>V: encode query → top-k nearest CVEs
    V-->>A: CVE records + similarity scores
    A->>L: question + CVE context (RAG prompt)
    L-->>A: cited answer (streaming)
    A-->>D: answer + citation list
    D-->>U: Formatted answer with CVE cards

    Note over U,L: IR Playbook Generation (~8s)
    U->>D: Enter CVE ID
    D->>A: POST /api/threat/playbook
    A->>V: lookup CVE records
    A->>L: CVE context + org profile
    L-->>A: 5-phase Markdown playbook (streaming)
    A-->>D: structured playbook
    D-->>U: Rendered checklist
```

## 🧰 Tech Stack

### Core Frameworks

[![Python](https://skillicons.dev/icons?i=python)](https://python.org) [![FastAPI](https://skillicons.dev/icons?i=fastapi)](https://fastapi.tiangolo.com) [![Docker](https://skillicons.dev/icons?i=docker)](https://docker.com) [![GitHub](https://skillicons.dev/icons?i=github)](https://github.com)

| Layer | Technology | Role |
|:---:|:---|:---|
| 🌐 **Web framework** | FastAPI 0.115 + Uvicorn | Async REST API, auto-generated docs, ASGI |
| 🤖 **ML model** | scikit-learn Random Forest | Phishing classifier over a 25-feature vector |
| 📐 **Feature engineering** | pandas + numpy | URL → float[25] feature matrix |
| 💾 **Model storage** | joblib | Serializes and deserializes the trained model |
| 🔍 **Vector search** | FAISS (faiss-cpu) | Nearest-neighbor search over 350k CVE vectors (flat inner-product index) |
| 🧬 **Embeddings** | sentence-transformers | `all-MiniLM-L6-v2` → 384-dimensional vectors |
| 🧠 **LLM** | Groq · LLaMA 3.3 70B | Explanations, narratives, playbooks, annotations |
| 🔌 **LLM client** | openai SDK | OpenAI-compatible client (works with Groq and Mistral) |
| ⚙️ **Configuration** | pydantic-settings | Type-safe environment variable management with `.env` support |
| 🖥️ **CLI** | Typer + Rich | `platform url train`, `platform cve index`, `platform serve` |
| 🎨 **Frontend** | Vanilla JS + CSS | 5-tab SPA, zero build step, no React/Vue required |
| 📦 **Data source** | NIST NVD JSON Feeds | 350,000+ CVE records (2000–2024) |
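
For readers new to vector search, the sketch below shows what a flat inner-product index computes: score every stored vector against the query and keep the top `k`. It is a pure-Python stand-in for illustration only, not the FAISS API and not the project's code. L2-normalizing the vectors, so that the inner product equals cosine similarity, is an assumption about how the embeddings are prepared.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_inner_product(
    query: list[float], vectors: list[list[float]], k: int
) -> list[tuple[int, float]]:
    """Return (index, score) pairs for the k highest inner products."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(v))), idx)
        for idx, v in enumerate(vectors)
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(idx, score) for score, idx in scored[:k]]
```

A brute-force scan like this is exact and costs time linear in the number of stored vectors. That is workable at a few hundred thousand 384-dimensional vectors, which is roughly the scale described here.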

## 📊 Model Performance

### URL Shield: Random Forest classifier (25 features)

```
Accuracy    ████████████████████████████████████████████░  95.2%
Precision   █████████████████████████████████████████████  96.1%
Recall      ███████████████████████████████████████████░░  93.8%
F1 Score    ████████████████████████████████████████████░  94.9%
AUC-ROC     █████████████████████████████████████████████  98.3%
```

### CVE Intelligence: FAISS semantic search

```
Index Size          350,000+ CVE vectors
Embedding Model     all-MiniLM-L6-v2 (384 dimensions)
Similarity Metric   Inner product (FAISS IndexFlatIP)
Avg Query Time      < 2 seconds (encode + search + LLM synthesis)
CVE Coverage        2000 – 2024 (NIST NVD)
```

### LLM performance (Groq · LLaMA 3.3 70B)

```
URL Explanation       ~1s  (600 tokens)
Threat Narrative      ~5s  (800 tokens)
IR Playbook           ~8s  (2000 tokens, streamed)
Asset CISO Summary    ~4s  (1200 tokens)
Red Team Annotation   ~3s  (600 tokens)
```
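
As a quick consistency check on the classifier numbers, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The snippet uses only the figures listed above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported URL Shield metrics: precision 96.1%, recall 93.8%.
print(f"{f1_score(0.961, 0.938):.3f}")  # 0.949
```

The result agrees with the reported F1 score of 94.9%.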

## 🚀 Quick Start

```
# 1. Clone
git clone https://github.com/YOUR_USERNAME/cybershield-ai-platform.git
cd cybershield-ai-platform

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate        # macOS/Linux
# venv\Scripts\activate         # Windows

# 3. Install
pip install -e .

# 4. Configure (set your Groq API key)
cp .env.example .env
# Edit .env → set GROQ_API_KEY=gsk_...

# 5. Train the phishing model (about 30 seconds)
platform url train

# 6. Start the server
platform serve
```

**Open http://localhost:8000.** The URL Shield tab is fully functional at this point.

⬇️ Enable CVE Intelligence (requires NVD data)

```
# Download the NVD data (about 3 GB)
git clone https://github.com/fkie-cad/nvd-json-data-feeds.git nvd-json-data-feeds-main

# Build the FAISS vector index (10–15 minutes, one-time)
platform cve index --start-year 2020 --end-year 2024

# Restart the server: all 5 tabs are now active
platform serve
```

## ⚙️ Configuration

All settings are environment variables. Copy `.env.example` to `.env` and fill in the values:

📋 All environment variables

```
# ── LLM Provider ──────────────────────────────────────────
LLM_PROVIDER=groq                            # "groq" or "mistral"
GROQ_API_KEY=gsk_...                         # Get free at console.groq.com
MISTRAL_API_KEY=...                          # Only if LLM_PROVIDER=mistral

# ── Platform ───────────────────────────────────────────────
PLATFORM_PORT=8000
PLATFORM_LLM_MODEL=llama-3.3-70b-versatile

# ── URL Shield ─────────────────────────────────────────────
CYBERSHIELD_MODEL_PATH=models/model.joblib
CYBERSHIELD_LLM_MODEL=llama-3.3-70b-versatile
CYBERSHIELD_LLM_ENABLED=auto                 # auto | true | false
CYBERSHIELD_TRAIN_SAMPLES=12000

# ── CVE Intelligence ───────────────────────────────────────
CVE_INTEL_NVD_DIR=nvd-json-data-feeds-main
CVE_INTEL_INDEX_DIR=index
CVE_INTEL_START_YEAR=2020
CVE_INTEL_END_YEAR=2024
CVE_INTEL_LLM_MODEL=llama-3.3-70b-versatile
```
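
The project loads these values with pydantic-settings. The standard-library sketch below only illustrates how a few of them might be interpreted. `PlatformSettings` and `load_settings` are hypothetical names, the fallback values are assumptions taken from the example file, and treating `auto` as "enabled when an API key is present" is an assumption about the project's behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PlatformSettings:
    llm_provider: str
    api_key: str
    port: int
    llm_enabled: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> PlatformSettings:
    """Read a subset of the documented variables from a mapping or os.environ."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "groq").lower()
    key_name = "MISTRAL_API_KEY" if provider == "mistral" else "GROQ_API_KEY"
    api_key = env.get(key_name, "")

    mode = env.get("CYBERSHIELD_LLM_ENABLED", "auto").lower()
    if mode == "auto":
        enabled = bool(api_key)  # assumption: "auto" means "on if a key exists"
    else:
        enabled = mode == "true"

    return PlatformSettings(
        llm_provider=provider,
        api_key=api_key,
        port=int(env.get("PLATFORM_PORT", "8000")),
        llm_enabled=enabled,
    )
```

Passing a plain dictionary instead of reading `os.environ` directly makes the logic easy to exercise in isolation, for example `load_settings({"GROQ_API_KEY": "gsk_..."})`.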

## 📖 API Reference

🔗 URL Shield: `/api/url/*`

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/api/url/predict` | Score a URL (risk 0–100, verdict, signals) |
| `POST` | `/api/url/predict/batch` | Score up to 100 URLs in one call |
| `POST` | `/api/url/explain` | Score plus a forced AI explanation |
| `GET` | `/api/url/model/info` | Model type, accuracy, training date |
| `GET` | `/api/url/health` | Subsystem status |

```
curl -X POST http://localhost:8000/api/url/predict \
  -H "Content-Type: application/json" \
  -d '{"url": "http://paypal-secure-login.xyz/verify"}'
```

```
{
  "verdict": "LIKELY PHISHING",
  "risk_score": 87,
  "signals": [
    { "severity": "HIGH", "label": "Suspicious TLD", "detail": "..." }
  ]
}
```
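
The same call can be made from Python with only the standard library. This is a minimal client sketch, assuming a server running at `http://localhost:8000` and the response shape shown above. The function names are hypothetical and not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local server address


def build_predict_request(url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /api/url/predict without sending it."""
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/url/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(result: dict) -> str:
    """Format a response of the documented shape as a one-line summary."""
    signals = ", ".join(
        f"[{s['severity']}] {s['label']}" for s in result.get("signals", [])
    )
    return f"{result['verdict']} ({result['risk_score']}/100): {signals or 'no signals'}"


def predict(url: str, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_predict_request(url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `print(summarize(predict("http://paypal-secure-login.xyz/verify")))` prints the verdict, score, and signal labels on one line.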

🔍 CVE Intelligence: `/api/cve/*`

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/api/cve/search` | Semantic CVE search with filters |
| `POST` | `/api/cve/ask` | Ask the LLM a question (RAG-based) |
| `GET` | `/api/cve/{cve_id}` | Look up a specific CVE |
| `GET` | `/api/cve/stats` | Index statistics by severity, year, and attack vector |

**Search filters:** `severity`, `start_year`, `end_year`, `attack_vector`, `min_cvss`, `cwe`

```
curl -X POST http://localhost:8000/api/cve/search \
  -H "Content-Type: application/json" \
  -d '{"query": "RCE in Apache Log4j", "k": 5, "severity": ["CRITICAL"]}'
```
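
The `/api/cve/ask` endpoint is described as RAG-based: retrieved CVE records are placed in the prompt so the model can answer with citations. The sketch below shows one way such a prompt could be assembled. The record fields `cve_id` and `description`, the instruction wording, and the truncation limit are all assumptions, not the project's actual prompt.

```python
def build_rag_prompt(question: str, records: list[dict], max_chars: int = 400) -> str:
    """Combine retrieved CVE records and the user question into one prompt."""
    lines = []
    for rec in records:
        # Truncate long descriptions so the context stays within budget.
        description = rec["description"][:max_chars]
        lines.append(f"[{rec['cve_id']}] {description}")
    context = "\n".join(lines)
    return (
        "Answer the question using only the CVE records below. "
        "Cite CVE IDs in square brackets.\n\n"
        f"CVE records:\n{context}\n\n"
        f"Question: {question}"
    )
```

Prefixing each record with its ID gives the model a stable token to cite. It also lets the caller check that every cited ID appears in the retrieved set.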

⚡ Threat Intelligence: `/api/threat/*`

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/api/threat/narrative` | Four-part CVE threat briefing |
| `POST` | `/api/threat/playbook` | Full IR playbook with checklists |

```
# Generate a threat narrative for Log4Shell
curl -X POST http://localhost:8000/api/threat/narrative \
  -H "Content-Type: application/json" \
  -d '{"cve_id": "CVE-2021-44228"}'
```

The response contains: `plain_english` · `attack_surface` · `business_impact` · `affected_software`

📦 Asset Correlation: `/api/assets/*`

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/api/assets/correlate` | Match a software inventory against the CVE index |

```
curl -X POST http://localhost:8000/api/assets/correlate \
  -H "Content-Type: application/json" \
  -d '{
    "assets": [
      {"name": "Apache Log4j", "version": "2.14.1", "asset_type": "library"}
    ],
    "k_per_asset": 5,
    "min_cvss": 7.0,
    "summarize": true
  }'
```
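
If your inventory is a list of pinned Python packages, a small helper can convert it into the request body shown above. This is a sketch under assumptions: `inventory_to_payload` is a hypothetical helper, only `name==version` lines are handled, and every entry is labeled `library`. How well raw package names match CVE records depends on the index and is not guaranteed.

```python
def inventory_to_payload(
    lines: list[str],
    k_per_asset: int = 5,
    min_cvss: float = 7.0,
    summarize: bool = False,
) -> dict:
    """Turn 'name==version' lines into an /api/assets/correlate request body."""
    assets = []
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        assets.append(
            {"name": name.strip(), "version": version.strip(), "asset_type": "library"}
        )
    return {
        "assets": assets,
        "k_per_asset": k_per_asset,
        "min_cvss": min_cvss,
        "summarize": summarize,
    }
```

The returned dictionary can be serialized with `json.dumps` and posted to `/api/assets/correlate` in place of the hand-written JSON in the curl example.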

🎯 Red Team: `/api/redteam/*`

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/api/redteam/generate` | Generate adversarial phishing URL variants |

**Techniques:** `homoglyphs` · `subdomain` · `encoding` · `typos` · `tld_swap` · `combo`

```
curl -X POST http://localhost:8000/api/redteam/generate \
  -H "Content-Type: application/json" \
  -d '{
    "legitimate_domain": "example.com",
    "techniques": ["homoglyphs", "subdomain", "tld_swap"],
    "count_per_technique": 3,
    "authorized_by": "Your Name / Engagement ID"
  }'
```

## 🖥️ CLI Reference

```
platform
├── url
│   ├── train           Train the phishing Random Forest model
│   ├── predict         Score a URL from the terminal
│   ├── info            Show model accuracy and feature importance
│   └── generate-data   Generate synthetic training dataset
├── cve
│   ├── index           Build FAISS index from NVD data
│   ├── search          Semantic CVE search
│   ├── ask             Ask the LLM a question
│   ├── show            Look up a CVE by ID
│   └── stats           Index statistics
└── serve               Start the web server
```

```
platform url train                                      # ~30 seconds
platform cve index --start-year 2020 --end-year 2024    # ~15 minutes
platform serve --port 9000                              # custom port
platform url predict https://suspicious.xyz --explain
```

## 🐳 Docker

```
# 1. Configure the API key
echo "GROQ_API_KEY=gsk_..." > .env
echo "LLM_PROVIDER=groq" >> .env

# 2. Build and start (the ML model is trained at build time)
docker compose up --build

# 3. (One-time) Build the CVE index
docker compose exec platform platform cve index --start-year 2020 --end-year 2024

# Open http://localhost:8000
```

The FAISS index is stored in a named Docker volume (`cybershield-index`) and persists across restarts.

## ☁️ Deployment

| Platform | Status | Config file | Notes |
|:---:|:---:|:---:|:---|
| 🚂 **Railway** | ✅ Ready | `railway.toml` | Auto-detects the Dockerfile; add `GROQ_API_KEY` in the dashboard |
| 🟢 **Render** | ✅ Ready | `render.yaml` | Docker runtime with a preconfigured 20 GB persistent disk |
| 🐳 **Docker** | ✅ Ready | `docker-compose.yml` | Full local deployment with a named volume |

🚂 Deploy to Railway

1. Push this repository to GitHub.
2. Create a new project at [railway.app](https://railway.app) → **Deploy from GitHub repo**.
3. Add the environment variable `GROQ_API_KEY = gsk_...`.
4. Add a persistent volume and set `CVE_INTEL_INDEX_DIR` to its mount path.
5. Deploy. URL Shield is available immediately.
6. Open the Railway console and run `platform cve index --start-year 2020 --end-year 2024`.

🟢 Deploy to Render

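1. Push this repository to GitHub.
2. Create a new **Web Service** at [render.com](https://render.com).
3. Select the **Docker** runtime. `render.yaml` is detected automatically.
4. Set `GROQ_API_KEY` as a secret environment variable.
5. Deploy. The 20 GB disk is mounted automatically at `/data`.
6. After the first deploy, open the Render shell and run `platform cve index --start-year 2020 --end-year 2024`.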

## 📁 Project Structure

```
cybershield-ai-platform/
│
├── src/
│   ├── cybershield/              🔗 URL phishing detection module
│   │   ├── api/                  FastAPI routes (/api/url/*)
│   │   ├── ml/                   Random Forest model
│   │   ├── features/             25 URL feature extractors
│   │   ├── llm/                  AI explanation engine
│   │   └── config.py             CYBERSHIELD_* settings
│   │
│   ├── cve_intel/                🔍 CVE intelligence module
│   │   ├── api/                  FastAPI routes (/api/cve/*)
│   │   ├── data/                 NVD JSON parser
│   │   ├── embeddings/           Sentence-transformer wrapper
│   │   ├── index/                FAISS vector store
│   │   ├── retrieval/            Semantic search logic
│   │   ├── llm/                  RAG synthesizer
│   │   └── config.py             CVE_INTEL_* settings
│   │
│   └── platform_core/            ⚡ Unified orchestration layer
│       ├── api/main.py           Merged FastAPI app + lifespan
│       ├── api/routes/           threat.py · assets.py · redteam.py
│       ├── llm/                  narrative.py · playbook.py · redteam.py
│       ├── schemas/              Pydantic models
│       └── config.py             PLATFORM_* settings
│
├── web/                          🎨 Frontend (Vanilla JS, no build step)
│   ├── index.html                5-tab SPA shell
│   ├── styles.css                Dark-theme design system
│   ├── common.js                 Shared helpers
│   ├── app.js                    Tab router + health check
│   └── tabs/                     url_shield · cve_intel · threat_intel · assets · redteam
│
├── Demo/                         🎬 Demo resources
│   ├── demo.py                   Python demo script (all 5 features)
│   ├── demo_requests.http        VS Code REST Client requests
│   ├── sample_outputs.md         Pre-captured JSON responses
│   └── README.md                 Demo instructions
│
├── cybersecurity_terms/          📚 Security glossary (markdown)
├── Dockerfile                    🐳 Container build
├── docker-compose.yml            🐳 Local Docker setup
├── railway.toml                  🚂 Railway deployment
├── render.yaml                   🟢 Render deployment
├── pyproject.toml                📦 Package metadata + entry points
├── requirements.txt              📦 Python dependencies
└── .env.example                  ⚙️ Environment variable template
```

## 🔧 Troubleshooting

503: CVE index not loaded

The FAISS index has not been built yet.

```
# Download the NVD data first, then:
platform cve index --start-year 2021 --end-year 2022    # quick (2 years)
# or
platform cve index --start-year 2020 --end-year 2024    # full
```

AI features return template responses

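The LLM is disabled. Check the following:

1. `GROQ_API_KEY` is set in your `.env` file.
2. `LLM_PROVIDER=groq` is set.
3. The `.env` file is in the **project root** (the same folder as `pyproject.toml`).
4. The `openai` package is installed. If it is missing, run `pip install openai`.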

Windows: DLL load failed (scikit-learn / FAISS)

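Windows application control policies may block compiled DLLs inside user-created virtual environments.

**Fix:** use the Conda base environment instead:

```
$env:PYTHONPATH = "src"
C:\Users\YourName\miniforge3\python.exe -m uvicorn platform_core.api.main:app --reload
```

Alternatively, use Docker, which sidesteps local AppLocker policies entirely.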

Port already in use

```
platform serve --port 8001
# or set it in .env:
PLATFORM_PORT=8001
```
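
To find out which port to pass to `platform serve --port`, you can probe the local ports first. The helper below is a standard-library sketch and not part of the `platform` CLI. The function names and the default search range are assumptions.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start: int = 8000, attempts: int = 20) -> int:
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port found in {start}..{start + attempts - 1}")
```

For example, `first_free_port(8000)` returns 8000 when nothing is listening there and moves on to 8001, 8002, and so on otherwise.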

## 🤝 Contributing

```
git checkout -b feature/your-feature
# Make your changes
git commit -m "feat: your feature description"
git push origin feature/your-feature
# Open a Pull Request against main
```

## 📜 License

MIT License: free to use, modify, and distribute.

**Built with FastAPI · scikit-learn · FAISS · sentence-transformers · Groq LLaMA 3.3**

⭐ If you find this useful, please **star** the repository. It helps others discover it!

[![GitHub stars](https://img.shields.io/github/stars/0DevDutt0/cybershield-ai-platform?style=social)](https://github.com/0DevDutt0/cybershield-ai-platform) [![GitHub forks](https://img.shields.io/github/forks/0DevDutt0/cybershield-ai-platform?style=social)](https://github.com/0DevDutt0/cybershield-ai-platform)

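Tags: AV bypass, CISA project, DLL hijacking, FastAPI, GPT, large language models, threat intelligence, developer tools, vulnerability management, cybersecurity, custom scripts, request interception, reverse engineering tools, phishing detection, privacy protection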