adityashirsatrao007/sentinelx-threat-intel


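SentinelX is a real-time threat detection backend built on AI and NLP that automatically identifies and scores phishing, scam, and social engineering attacks in communication content.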


# 🛡️ SentinelX: AI-Powered Real-Time Threat Detection Platform

![SentinelX](https://img.shields.io/badge/SentinelX-v1.0.0-blueviolet?style=for-the-badge) ![FastAPI](https://img.shields.io/badge/FastAPI-0.115-009688?style=for-the-badge&logo=fastapi) ![Python](https://img.shields.io/badge/Python-3.11-3776AB?style=for-the-badge&logo=python) ![PostgreSQL](https://img.shields.io/badge/PostgreSQL-16-336791?style=for-the-badge&logo=postgresql) ![Redis](https://img.shields.io/badge/Redis-7-DC382D?style=for-the-badge&logo=redis) ![Docker](https://img.shields.io/badge/Docker-Compose-2496ED?style=for-the-badge&logo=docker)

**Production-grade AI cybersecurity backend for real-time detection of phishing, scams, and social engineering.**

[API Docs](#api-documentation) • [Architecture](#architecture) • [Quick Start](#quick-start) • [Docker](#docker-deployment)

## Overview

SentinelX is a modular, AI-ready backend platform that continuously monitors communication streams (email, SMS, messaging, and phone calls) to detect malicious intent before credentials are stolen or systems are compromised.

### Core Features

| Feature | Technology |
|---------|------------|
| NLP phishing detection | HuggingFace DistilBERT / zero-shot classification |
| Behavioral analysis | Rule-based social engineering pattern engine |
| URL threat scoring | Regex heuristics + optional VirusTotal API |
| Speech-to-text | OpenAI Whisper (tiny through large models) |
| Risk scoring | Weighted multi-signal composite engine |
| Async processing | Celery + Redis task queue |
| Authentication | JWT + bcrypt + RBAC |
| Database | PostgreSQL + SQLAlchemy + Alembic |
| Containerization | Docker + Docker Compose |

## Architecture

```
SentinelX Backend
│
├── app/
│   ├── api/
│   │   ├── routes/                # FastAPI route handlers
│   │   │   ├── auth.py            # POST /auth/register, /login, GET /me
│   │   │   ├── analyze.py         # POST /analyze/{email,sms,call}, /transcribe/audio
│   │   │   ├── alerts.py          # GET /alerts, POST /alerts/{id}/acknowledge
│   │   │   └── dashboard.py       # GET /dashboard/{stats,threats,trends}
│   │   ├── dependencies/          # Auth dependency injectors
│   │   └── middleware/            # Request logging, tracing
│   │
│   ├── core/
│   │   ├── config.py              # Pydantic Settings (env vars)
│   │   ├── security.py            # JWT + bcrypt
│   │   └── logging.py             # Structured JSON logging
│   │
│   ├── database/
│   │   ├── base.py                # SQLAlchemy DeclarativeBase
│   │   ├── session.py             # Engine + session factory
│   │   └── models/                # User, Threat, Alert, AuditLog
│   │
│   ├── schemas/                   # Pydantic request/response models
│   │
│   ├── services/                  # Business logic orchestration
│   │   ├── email_service.py
│   │   ├── sms_service.py
│   │   ├── call_service.py
│   │   ├── alert_service.py
│   │   ├── risk_service.py
│   │   └── dashboard_service.py
│   │
│   ├── ml/                        # AI/ML inference layer
│   │   ├── phishing_model.py      # HuggingFace zero-shot classifier
│   │   ├── sms_model.py           # SMS-specific scam detector
│   │   ├── url_detector.py        # URL threat analysis
│   │   ├── behavior_model.py      # Social engineering pattern engine
│   │   ├── whisper_service.py     # Speech-to-text
│   │   └── risk_engine.py         # Composite risk scorer
│   │
│   ├── workers/
│   │   └── celery_worker.py       # Async task definitions
│   │
│   └── main.py                    # FastAPI app entry point
│
├── alembic/                       # Database migrations
├── Dockerfile
├── requirements.txt
└── .env.example
```

### Risk Scoring Formula

```
RiskScore = 0.35 × NLPScore + 0.25 × BehaviorScore + 0.20 × URLScore + 0.20 × ReputationScore
```

The weights sum to 1.0, so as long as each component score is on a 0-100 scale, the composite score is too.

| Score Range | Threat Level |
|-------------|--------------|
| 0-30 | 🟢 Low |
| 31-60 | 🟡 Medium |
| 61-85 | 🟠 High |
| 86-100 | 🔴 Critical |

## Quick Start

### Prerequisites

- Docker ≥ 24.0 + Docker Compose ≥ 2.0
- Or: Python 3.11+, PostgreSQL 16, Redis 7

### 1. Clone and Configure

```
git clone https://github.com/your-org/SentinelX.git
cd SentinelX

# Create the environment file
cp backend/.env.example backend/.env

# IMPORTANT: generate a secure JWT secret key
python -c "import secrets; print(secrets.token_hex(32))"
# Paste the output into backend/.env as SECRET_KEY
```

### 2. Edit `backend/.env`

```
SECRET_KEY=your-generated-256-bit-key-here
DATABASE_URL=postgresql://sentinelx:sentinelx_pass@postgres:5432/sentinelx_db
REDIS_URL=redis://redis:6379/0
```

## Docker Deployment

### Start All Services

```
docker-compose up --build
```

This starts:

- **PostgreSQL** on port 5432
- **Redis** on port 6379
- **FastAPI backend** on port 8000
- **Celery worker** (email, SMS, and call queues)

### Access the API

| Interface | URL |
|-----------|-----|
| Swagger UI | http://localhost:8000/docs |
| ReDoc | http://localhost:8000/redoc |
| Health check | http://localhost:8000/health |

### Start with Celery Flower Monitoring

```
docker-compose --profile monitoring up --build
# Flower dashboard at http://localhost:5555
```

### Useful Commands

```
# View backend logs
docker-compose logs -f backend

# View Celery worker logs
docker-compose logs -f celery_worker

# Run database migrations
docker-compose exec backend alembic upgrade head

# Stop all services
docker-compose down

# Stop and remove volumes (clean slate)
docker-compose down -v
```

## Local Development (Without Docker)

```
# 1. Create a virtual environment
cd backend
python -m venv .venv
source .venv/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Set environment variables
cp .env.example .env
# Edit .env with your local DATABASE_URL and REDIS_URL

# 4. Run database migrations
alembic upgrade head

# 5. Start the API server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# 6. In a separate terminal (with the virtual environment activated), start the Celery worker
celery -A app.workers.celery_worker.celery_app worker --loglevel=info
```

## API Documentation

All endpoints are prefixed with `/api/v1`.

### 🔐 Authentication

| Method | Endpoint | Description | Auth Required |
|--------|----------|-------------|---------------|
| `POST` | `/api/v1/auth/register` | Create a new user | No |
| `POST` | `/api/v1/auth/login` | Obtain a JWT access token | No |
| `GET` | `/api/v1/auth/me` | Get the current user's profile | Yes |

#### Register

```
curl -X POST http://localhost:8000/api/v1/auth/register \
  -H "Content-Type: application/json" \
  -d '{
    "name": "John Doe",
    "email": "john@example.com",
    "password": "SecurePass123",
    "role": "operator"
  }'
```

#### Login

```
curl -X POST http://localhost:8000/api/v1/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "john@example.com", "password": "SecurePass123"}'
```

Response:

```
{
  "access_token": "eyJhbGci...",
  "token_type": "bearer",
  "expires_in": 3600
}
```

### 🔍 Threat Analysis

All analysis endpoints require an `Authorization: Bearer <token>` header.

#### Analyze Email

```
curl -X POST http://localhost:8000/api/v1/analyze/email \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "sender": "security@paypa1-alert.xyz",
    "subject": "URGENT: Your account has been suspended",
    "body": "Click here immediately to verify your account and avoid permanent suspension."
  }'
```

Response:

```
{
  "threat_id": "550e8400-e29b-41d4-a716-446655440000",
  "threat_detected": true,
  "risk_score": 87.4,
  "threat_level": "CRITICAL",
  "confidence": 0.92,
  "classification_label": "phishing",
  "reasons": [
    "NLP classified as 'phishing' (score: 91.0)",
    "Urgency manipulation tactics detected (2 indicators)",
    "Authority or brand impersonation detected (1 indicator)",
    "Suspicious TLD (.xyz)"
  ],
  "extracted_urls": [],
  "nlp_score": 91.0,
  "behavior_score": 78.5,
  "url_score": 25.0,
  "reputation_score": 60.0,
  "processing_mode": "sync"
}
```

#### Analyze SMS

```
curl -X POST http://localhost:8000/api/v1/analyze/sms \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "sender": "+91XXXXXXXXXX",
    "message": "Congratulations! You won a free iPhone. Claim now: https://bit.ly/abc123"
  }'
```

#### Analyze Call Transcript

```
curl -X POST http://localhost:8000/api/v1/analyze/call \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript": "Hello this is IRS. You owe taxes. Pay now or you will be arrested.",
    "caller_id": "+18005551234",
    "duration_seconds": 120
  }'
```

#### Transcribe Audio File (Whisper)

```
curl -X POST http://localhost:8000/api/v1/transcribe/audio \
  -H "Authorization: Bearer <token>" \
  -F "file=@/path/to/call_recording.mp3"
```

#### Async Processing

Pass `"async_processing": true` to any analysis endpoint to queue the job through Celery:

```
{
  "sender": "...",
  "subject": "...",
  "body": "...",
  "async_processing": true
}
```

The response includes a `task_id` that can be used to poll for the result through the Celery result backend.

### 🚨 Alerts

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/api/v1/alerts` | List alerts (paginated) |
| `POST` | `/api/v1/alerts/{id}/acknowledge` | Acknowledge an alert |

```
# Get unacknowledged alerts
curl "http://localhost:8000/api/v1/alerts?unacknowledged_only=true" \
  -H "Authorization: Bearer <token>"

# Acknowledge an alert
curl -X POST http://localhost:8000/api/v1/alerts/550e8400.../acknowledge \
  -H "Authorization: Bearer <token>"
```

### 📊 Dashboard

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/api/v1/dashboard/stats` | KPI statistics |
| `GET` | `/api/v1/dashboard/threats` | Recent threats |
| `GET` | `/api/v1/dashboard/trends` | Daily trends (last N days) |

```
# Get KPI statistics
curl http://localhost:8000/api/v1/dashboard/stats \
  -H "Authorization: Bearer <token>"

# Get 14-day trends
curl "http://localhost:8000/api/v1/dashboard/trends?days=14" \
  -H "Authorization: Bearer <token>"
```

## Database Migrations (Alembic)

```
# Generate a new migration after changing models
alembic revision --autogenerate -m "add_new_field"

# Apply all pending migrations
alembic upgrade head

# Roll back one migration
alembic downgrade -1

# Show the current migration state
alembic current
```

## ML Models

### NLP Classifier (HuggingFace)

By default, the system uses **DistilBERT** in zero-shot classification mode. Classification labels:

- `safe`: no threat detected
- `phishing`: phishing attempt
- `scam`: general scam
- `credential_theft`: targets credentials
- `malicious_link`: contains a malicious URL
- `impersonation`: identity impersonation

**Fallback**: if HuggingFace is unavailable, the system automatically falls back to regex keyword heuristics.

### Whisper Model

Configure the model size in `.env` with `WHISPER_MODEL_SIZE`:

| Model | VRAM | Speed (relative to `large`) | Accuracy |
|-------|------|-----------------------------|----------|
| `tiny` | ~1 GB | ~32x | Good |
| `base` | ~1 GB | ~16x | Better |
| `small` | ~2 GB | ~6x | Very good |
| `medium` | ~5 GB | ~2x | Excellent |
| `large` | ~10 GB | 1x | Best |

## Environment Variable Reference

| Variable | Default | Description |
|----------|---------|-------------|
| `SECRET_KEY` | (none) | **Required.** 256-bit JWT signing key |
| `DATABASE_URL` | (none) | PostgreSQL connection string |
| `REDIS_URL` | (none) | Redis connection string |
| `NLP_MODEL_NAME` | `distilbert-base-uncased` | HuggingFace model name |
| `WHISPER_MODEL_SIZE` | `base` | Whisper model variant |
| `ALERT_TRIGGER_THRESHOLD` | `61` | Minimum risk score that generates an alert (the lower bound of the High band) |
| `VIRUSTOTAL_API_KEY` | (none) | Optional VirusTotal API key |
| `LOG_LEVEL` | `INFO` | Logging verbosity |

## Security Considerations

- **JWT secret**: always generate a unique `SECRET_KEY` for each environment
- **Password hashing**: bcrypt with a cost factor of 12 (2^12 rounds)
- **Rate limiting**: 100 requests per minute per IP (configurable)
- **CORS**: restricted to configured origins
- **Non-root Docker**: the container runs as the `sentinelx` user (UID 1001)
- **Input validation**: strict Pydantic v2 validation on all inputs
- **Content truncation**: email and SMS bodies are stored as 2,000-character excerpts

## Future Roadmap

- [ ] Kafka integration for real-time stream processing
- [ ] WebSocket push notifications for real-time alerts
- [ ] Multi-language support (multilingual Whisper + multilingual NLP)
- [ ] Deepfake voice detection pipeline
- [ ] MITRE ATT&CK framework mapping
- [ ] SOC integrations (Splunk, Elastic SIEM)
- [ ] AI voice agent detection
- [ ] Real-time telecom integration (Twilio, Vonage)

## License

MIT License. See [LICENSE](LICENSE) for details.
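The behavioral analysis engine listed under Core Features is described only as rule-based, but the sample email response shows the shape of its output (reasons such as "Urgency manipulation tactics detected (2 indicators)"). The Python sketch below illustrates one way to produce reasons of that shape. The pattern lists, the tactic names beyond the two that appear in the sample, and the 20-points-per-indicator weight are all assumptions, not the rules in `app/ml/behavior_model.py`.

```python
import re

# Hypothetical indicator patterns, grouped by manipulation tactic.
TACTICS: dict[str, list[str]] = {
    "Urgency manipulation tactics": [
        r"\burgent\b", r"\bimmediately\b", r"\bact now\b", r"\bpay now\b", r"\bwithin 24 hours\b",
    ],
    "Authority or brand impersonation": [
        r"\birs\b", r"\bpaypal\b", r"\bsecurity team\b", r"\byour bank\b",
    ],
    "Threats or intimidation": [r"\barrest(?:ed)?\b", r"\blegal action\b", r"\bpenalt(?:y|ies)\b"],
    "Too-good-to-be-true rewards": [r"\bcongratulations\b", r"\byou(?:'ve)? won\b", r"\bfree\b"],
}
POINTS_PER_INDICATOR = 20  # assumed weight per matched indicator


def analyze_behavior(text: str) -> tuple[float, list[str]]:
    """Count matched indicators per tactic and return (score in 0-100, reasons)."""
    lowered = text.lower()
    total, reasons = 0, []
    for tactic, patterns in TACTICS.items():
        hits = sum(1 for pattern in patterns if re.search(pattern, lowered))
        if hits:
            total += hits
            noun = "indicator" if hits == 1 else "indicators"
            reasons.append(f"{tactic} detected ({hits} {noun})")
    # Cap at 100 so the result can feed the 0-100 composite risk formula.
    return float(min(100, total * POINTS_PER_INDICATOR)), reasons
```

Run against the sample email subject and body, this sketch finds the two urgency indicators ("urgent", "immediately") for a score of 40; the IRS call transcript triggers one urgency, one authority, and one threat indicator for a score of 60.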
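The risk scoring formula and threat-level table in the Architecture section can be expressed as a small Python function. This is a minimal sketch, not the code in `app/ml/risk_engine.py`: it assumes each component score is already on a 0-100 scale, assigns fractional scores that fall between the documented integer bands (for example 30.5) to the higher band, and uses the `ALERT_TRIGGER_THRESHOLD` default of 61 in a hypothetical `should_alert` helper. Plugging in the component scores from the sample email response (91.0, 78.5, 25.0, 60.0) gives about 68.5, which maps to HIGH rather than the 87.4 / CRITICAL shown in that sample, so the real engine presumably applies adjustments not described in this README, or the sample is illustrative.

```python
from dataclasses import dataclass

# Weights from the documented formula; they sum to 1.0, keeping the result on a 0-100 scale.
WEIGHTS = {"nlp": 0.35, "behavior": 0.25, "url": 0.20, "reputation": 0.20}

# Assumed default, mirroring ALERT_TRIGGER_THRESHOLD in the environment variable table.
ALERT_TRIGGER_THRESHOLD = 61.0


@dataclass
class Signals:
    nlp: float
    behavior: float
    url: float
    reputation: float


def risk_score(s: Signals) -> float:
    """Weighted sum of the four component scores, clamped to [0, 100]."""
    raw = (
        WEIGHTS["nlp"] * s.nlp
        + WEIGHTS["behavior"] * s.behavior
        + WEIGHTS["url"] * s.url
        + WEIGHTS["reputation"] * s.reputation
    )
    return max(0.0, min(100.0, raw))


def threat_level(score: float) -> str:
    """Map a score onto the documented bands; fractional gaps go to the higher band."""
    if score <= 30:
        return "LOW"
    if score <= 60:
        return "MEDIUM"
    if score <= 85:
        return "HIGH"
    return "CRITICAL"


def should_alert(score: float, threshold: float = ALERT_TRIGGER_THRESHOLD) -> bool:
    # Hypothetical helper: raise an alert at or above the configured threshold.
    return score >= threshold


if __name__ == "__main__":
    sample = Signals(nlp=91.0, behavior=78.5, url=25.0, reputation=60.0)
    score = risk_score(sample)
    print(round(score, 1), threat_level(score), should_alert(score))
```

Because the alert threshold coincides with the lower bound of the High band, every HIGH or CRITICAL result raises an alert under the default configuration.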
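The NLP Classifier section says the system falls back to regex keyword heuristics when HuggingFace is unavailable. One way to structure that is sketched below: it tries the `transformers` zero-shot pipeline and drops to a keyword scorer if the import, model load, or inference fails. The keyword lists and the 30-points-per-hit scaling are hypothetical. One caveat worth checking: the Hugging Face zero-shot pipeline relies on a model fine-tuned for natural language inference (such as `facebook/bart-large-mnli`, used as this sketch's default); plain `distilbert-base-uncased` has no entailment head, so an NLI checkpoint is probably needed for meaningful zero-shot scores.

```python
import re

LABELS = ["safe", "phishing", "scam", "credential_theft", "malicious_link", "impersonation"]

# Hypothetical keyword heuristics, used only when the transformer model cannot be used.
KEYWORDS: dict[str, list[str]] = {
    "credential_theft": [r"\bpassword\b", r"\bverify your account\b", r"\blog ?in\b"],
    "phishing": [r"\bclick here\b", r"\baccount (?:has been )?suspended\b"],
    "scam": [r"\byou(?:'ve)? won\b", r"\bprize\b", r"\bgift card\b"],
    "malicious_link": [r"https?://(?:bit\.ly|tinyurl\.com)/"],
    "impersonation": [r"\birs\b", r"\bsupport team\b"],
}

_pipelines: dict = {}  # zero-shot pipelines cached per model name


def keyword_classify(text: str) -> dict:
    """Fallback: pick the label with the most keyword hits, or 'safe' if nothing matches."""
    lowered = text.lower()
    hits = {label: sum(1 for p in pats if re.search(p, lowered)) for label, pats in KEYWORDS.items()}
    label, count = max(hits.items(), key=lambda kv: kv[1])
    if count == 0:
        return {"label": "safe", "score": 0.0, "mode": "heuristic"}
    return {"label": label, "score": float(min(100, 30 * count)), "mode": "heuristic"}


def classify(text: str, model_name: str = "facebook/bart-large-mnli") -> dict:
    """Try zero-shot classification; fall back to keywords on any load or inference error."""
    try:
        if model_name not in _pipelines:
            from transformers import pipeline  # optional heavy dependency

            _pipelines[model_name] = pipeline("zero-shot-classification", model=model_name)
        result = _pipelines[model_name](text, candidate_labels=LABELS)
        # Labels come back sorted by score, highest first.
        return {"label": result["labels"][0], "score": 100.0 * result["scores"][0], "mode": "model"}
    except Exception:
        return keyword_classify(text)
```

Caching the pipeline matters because loading a transformer model takes far longer than classifying a single message.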
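Security Considerations lists a limit of 100 requests per minute per IP. The README does not say how this is implemented, so the following is only a sketch of a sliding-window limiter keyed by client IP, with an injectable clock so it can be tested. State lives in process memory here; a deployment with several API workers would need shared state (for example in the Redis instance the stack already runs), which this sketch does not attempt.

```python
import time
from collections import defaultdict, deque
from collections.abc import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (e.g. client IP)."""

    def __init__(self, limit: int = 100, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A sliding window avoids the burst that a fixed per-minute counter permits at the boundary between two minutes, at the cost of storing one timestamp per request.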

Built with ❤️ for cybersecurity operators worldwide.
