fatyma31/CyberThreat-Nlp-Intelligence-System

GitHub: fatyma31/CyberThreat-Nlp-Intelligence-System

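A cyber threat intelligence analysis system built on DistilBERT and NLP techniques. It automatically classifies text into six categories (benign plus five cyber threat types) and generates visual reports.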


# 🛡️ CyberGuard AI: Cyber Threat Intelligence System

An **AI- and NLP-powered cyber threat intelligence system** built with DistilBERT, Streamlit, and Python. It detects and classifies raw text into 6 categories (benign plus 5 threat types). It supports explainable AI, named entity recognition (NER), keyword extraction, and PDF reports.

## 🚀 Quick Start

### 1. Install dependencies

```
cd cyber_threat_intel
pip install -r requirements.txt
```

### 2. Launch the app (no training needed; uses the rule-based engine directly)

```
streamlit run app.py
```

### 3. (Optional) Train DistilBERT for higher accuracy

```
python train.py
streamlit run app.py   # now uses the fine-tuned model
```

## 📁 Project Structure

```
cyber_threat_intel/
├── app.py                  # Main Streamlit dashboard
├── train.py                # DistilBERT training pipeline
├── requirements.txt
│
├── config/
│   └── settings.py         # Global config, constants, paths
│
├── data/
│   └── data_generator.py   # Synthetic dataset generator
│
├── nlp/
│   └── preprocessor.py     # Text cleaning, NER, keyword extraction
│
├── models/
│   ├── classifier.py       # DistilBERT fine-tune + inference
│   ├── rule_based.py       # Keyword-frequency fallback
│   └── saved/              # Saved model weights (after training)
│
├── intelligence/
│   ├── engine.py           # Threat analysis orchestration
│   └── report_generator.py # PDF / text report generator
│
├── ui/
│   └── components.py       # Reusable Streamlit UI components
│
├── reports/                # Auto-saved PDF/text reports
└── assets/                 # Static assets
```

## 🎯 Detected Threat Types

| Threat | Severity | Description |
|---|---|---|
| ✅ Benign | None | Normal, safe communication |
| 🎣 Phishing | High | Credential theft, social engineering |
| 🦠 Malware | Critical | Trojans, RATs, spyware, botnets |
| 💰 Ransomware | Critical | File encryption, extortion |
| 💥 DDoS | High | Volumetric / application-layer flood attacks |
| 💉 SQL Injection | High | Database query manipulation attacks |

## 🧠 System Architecture

```
Input Text
    │
    ▼
NLP Preprocessor ──► Text Cleaning, Tokenization, Lemmatization
    │
    ├──► Named Entity Recognition (IPs, URLs, Emails, CVEs, Hashes)
    │
    ├──► Keyword Extraction (per threat category)
    │
    ▼
Classifier
    ├── DistilBERT (if trained model exists)
    └── Rule-Based Fallback (always available)
    │
    ▼
Threat Intel Engine ──► Severity + Risk Score + XAI Reasoning
    │
    ▼
Streamlit Dashboard ──► Charts, Entity Tags, Recommendations
    │
    ▼
PDF Report Generator
```

## ⚡ Features

- 🔍 **Real-time analysis**: results appear immediately after input/submit
- 🧠 **DistilBERT**: Transformer-based multi-class classification
- 🎯 **Confidence scores**: probability distribution across all 6 classes
- 🔎 **NER**: extracts IPs, URLs, email addresses, CVEs, and file hashes
- 🏷️ **Keyword extraction**: keywords mapped to threat categories
- 🧩 **Explainable AI**: human-readable reasoning for every prediction
- 📊 **Interactive charts**: gauges, donut charts, and bar charts via Plotly
- 📄 **PDF reports**: downloadable threat intelligence reports
- 🕒 **Analysis history**: records every analysis in the session
- ⚡ **Rule-based fallback**: runs without a GPU or training

## 📦 Requirements

- Python 3.9+
- At least 4 GB RAM (8 GB recommended for training)
- GPU optional (CPU inference supported)
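The architecture lists IPs, URLs, emails, CVEs, and hashes as the entities the NER step extracts. The README does not say how `nlp/preprocessor.py` does this. Below is a minimal regex-based sketch of that kind of indicator extraction, assuming plain pattern matching. The patterns and the `extract_entities` name are hypothetical and may differ from the project's actual implementation.

```python
import re

# Hypothetical indicator patterns, for illustration only; the project's
# nlp/preprocessor.py may use different rules or a model-based NER.
ENTITY_PATTERNS = {
    # IPv4 address with each octet limited to 0-255
    "ip": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    # Standalone hex digests: SHA-256 (64), SHA-1 (40) or MD5 (32) characters
    "hash": re.compile(r"\b(?:[A-Fa-f0-9]{64}|[A-Fa-f0-9]{40}|[A-Fa-f0-9]{32})\b"),
}


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return unique matches per entity type, in order of first appearance."""
    entities: dict[str, list[str]] = {}
    for label, pattern in ENTITY_PATTERNS.items():
        unique: list[str] = []
        for match in pattern.findall(text):
            if match not in unique:
                unique.append(match)
        entities[label] = unique
    return entities


if __name__ == "__main__":
    sample = (
        "Beacon to 192.168.1.10 via http://evil.example.com/payload from "
        "admin@evil.example.com exploiting CVE-2021-44228, dropper md5 "
        "d41d8cd98f00b204e9800998ecf8427e"
    )
    for kind, values in extract_entities(sample).items():
        print(kind, values)
```

This sketch does not handle defanged indicators such as `hxxp://` or `evil[.]com`, which are common in threat reports. The URL pattern also keeps sentence-final punctuation, so a real extractor would likely normalize both cases first.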
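The rule-based fallback is described only as keyword-frequency classification that produces confidence scores across the 6 classes. The following sketch shows one plausible way to do this. It assumes hand-written keyword sets per threat class and uses normalized hit counts as the probability distribution. `THREAT_KEYWORDS` and `classify` are hypothetical names, and the keyword lists are illustrative. Neither is taken from `models/rule_based.py`.

```python
import re
from collections import Counter

# Hypothetical keyword sets, for illustration only; class names mirror the
# threat table, but the real models/rule_based.py may use other terms/weights.
THREAT_KEYWORDS = {
    "Phishing": {"password", "login", "verify", "account", "credentials", "click"},
    "Malware": {"trojan", "rat", "spyware", "botnet", "payload", "backdoor"},
    "Ransomware": {"encrypted", "ransom", "bitcoin", "decrypt", "decryption"},
    "DDoS": {"flood", "traffic", "botnet", "syn", "amplification", "requests"},
    "SQL Injection": {"sql", "union", "select", "query", "injection", "database"},
}
BENIGN = "Benign"


def classify(text: str) -> tuple[str, dict[str, float]]:
    """Return (predicted label, probability per class) from keyword frequencies."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each threat class by how often its keywords occur in the text.
    scores = {
        label: sum(tokens[word] for word in words)
        for label, words in THREAT_KEYWORDS.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No threat keywords at all: treat the text as benign.
        probs = {BENIGN: 1.0, **{label: 0.0 for label in THREAT_KEYWORDS}}
        return BENIGN, probs
    # Normalize hit counts into a distribution over the 6 classes.
    probs = {BENIGN: 0.0, **{label: s / total for label, s in scores.items()}}
    best = max(probs, key=probs.get)
    return best, probs


if __name__ == "__main__":
    label, probs = classify(
        "Your files have been encrypted. Send bitcoin to receive the decryption key."
    )
    print(label, probs)
```

With this scheme, a single keyword hit drives the Benign probability to zero. A practical fallback would probably require a minimum score before flagging a threat, and would weight rarer, more specific terms more heavily.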

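Tags: Apex, CISA project, DDoS attacks, DistilBERT, Kubernetes, NLP, Python, Streamlit, artificial intelligence, keyword extraction, ransomware detection, explainable AI, named entity recognition, threat intelligence, threat reports, subdomain enumeration, real-time analysis, developer tools, text classification, no backdoor, machine learning, deep learning, user-mode hook bypass, system security, syscall monitoring, cybersecurity, cybersecurity LLMs, access control, reverse-engineering tools, phishing detection, privacy protection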