dhami200/Fraude-Escalate
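A multi-layer AI fraud detection system built on FastAPI. A tiered pipeline of heuristic rules, machine learning, and a large language model automatically identifies and risk-rates fraud signals such as phishing, credential theft, and financial scams in multimodal input.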
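# 🛡️ Fraud Escalate: Multi-Layer Fraud Detection System
A multi-layer, AI-driven fraud detection backend built with **FastAPI**. It analyzes text (and media) input through a structured pipeline of heuristic rules, machine learning, and LLM-based reasoning to detect phishing, credential theft, financial scams, and other fraud signals.
## 🚀 Features
- **Multimodal input support**: text, PDF, DOCX, images, **audio** (transcribed with Whisper), and video
- **Audio-to-text**: automatic transcription with OpenAI's Whisper model (supports MP3, WAV, FLAC, AAC, OGG, M4A, WMA)
- **Privacy first**: strips PII (emails, Aadhaar, PAN, IFSC, bank card numbers, OTPs) before any analysis
- **Rule-based heuristics**: fast, interpretable fraud signal detection
- **Smart ML routing**: invokes the ML model only when the heuristic score is ambiguous
- **LLM explanations**: Mistral (via Ollama) generates a human-readable explanation for each decision
- **Risk engine**: combines all signals into a final risk score and decision (ALLOW / MONITOR / REVIEW / BLOCK)
## 🏗️ Project Structure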
```
Fraud Escalate/
└── backend/
    ├── main.py                       # FastAPI app & pipeline orchestration
    ├── requirements.txt
    │
    ├── input_layer/                  # Step 1: Input handling
    │   ├── input_handler.py          # Routes text/file to correct reader
    │   ├── file_segregator.py        # Detects input type (text/image/audio/video)
    │   └── file_readers.py           # Readers for TXT, PDF, DOCX
    │
    ├── layer0_privacy/               # Step 2: Privacy & feature extraction
    │   ├── pii_removal.py            # Strips email, phone, Aadhaar, PAN, IFSC, card, OTP
    │   ├── normalization.py          # Lowercasing, whitespace cleanup
    │   ├── feature_extraction.py     # Extracts boolean features (urgency, sensitive keywords)
    │   └── hashing.py                # Input fingerprinting
    │
    ├── layer1_heuristics/            # Step 3: Rule-based detection
    │   ├── heuristic_engine.py       # Aggregates all rule scores
    │   ├── phishing_rules.py         # Phishing keyword patterns
    │   ├── credential_rules.py       # Credential theft patterns
    │   ├── url_rules.py              # Suspicious/long URL detection
    │   ├── intent_rules.py           # Financial intent signals
    │   └── urgency_rules.py          # Urgency language detection
    │
    ├── layer2_ml/                    # Step 4: ML classification
    │   ├── ml_engine.py              # TF-IDF + classifier inference + feature boosting
    │   ├── model_loader.py           # Loads saved vectorizer & model
    │   ├── text_model.py             # Model training/evaluation helpers
    │   └── train.py                  # Training entry point
    │
    ├── layer3_llm/                   # Step 6: LLM explanation
    │   ├── llm_engine.py             # Orchestrates prompt → Ollama → explanation
    │   ├── prompt_builder.py         # Builds structured prompt from all layer outputs
    │   └── ollama_client.py          # HTTP client for local Ollama (Mistral)
    │
    ├── risk_engine/                  # Step 5: Final decision
    │   ├── decision_engine.py        # Maps scores to ALLOW/MONITOR/REVIEW/BLOCK
    │   └── scoring.py                # Dynamic weighted scoring (heuristics + ML)
    │
    └── datasets/
        └── phishing_data.csv         # Training dataset
```
## ⚙️ Analysis Pipeline
```
Input
│
▼
[Input Layer] — Detect type, read content (text/PDF/DOCX/image/audio/video)
│
▼
[Layer 0 — Privacy] — Remove PII, normalize text, extract features
│
▼
[Layer 1 — Heuristics] — Score against phishing, credential, URL, intent, urgency rules
│
├─ Score ≥ 120 → BLOCK immediately (skip ML)
├─ Score ≤ 20 → ALLOW immediately (skip ML)
└─ Score 21–119 → continue ↓
│
▼
[Layer 2 — ML] — TF-IDF vectorization + classifier + feature boosting
│
▼
[Risk Engine] — Dynamic weighted score → ALLOW / MONITOR / REVIEW / BLOCK
│
▼
[Layer 3 — LLM] — Mistral generates a plain-English explanation
│
▼
Final JSON Response
```
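The early-exit routing between Layer 1 and Layer 2 can be sketched as follows. This is a minimal illustration based only on the thresholds shown in the diagram. The function name `route_after_heuristics` and the returned labels are hypothetical and are not taken from the repository.

```python
def route_after_heuristics(heuristic_score: float) -> str:
    """Pick the next pipeline step from the Layer 1 heuristic score.

    The thresholds (120 and 20) come from the pipeline diagram; the
    function name and return labels are assumptions for this sketch.
    """
    if heuristic_score >= 120:
        return "BLOCK"   # strong signal: block immediately, skip ML
    if heuristic_score <= 20:
        return "ALLOW"   # weak signal: allow immediately, skip ML
    return "RUN_ML"      # ambiguous range (21-119): continue to Layer 2


for score in (10, 90, 150):
    print(score, route_after_heuristics(score))
```

Only scores in the ambiguous middle band reach the ML model, which keeps the common clear-cut cases cheap.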
### Decision Thresholds
| Risk Score | Decision |
| -------- | ----------- |
| 0 – 30 | ✅ ALLOW |
| 31 – 60 | 👁️ MONITOR |
| 61 – 80 | 🔍 REVIEW |
| 81 – 100 | 🚫 BLOCK |
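A score-to-decision mapping along these lines would reproduce the table. The table lists integer bands, while the risk score can be fractional, so this sketch assumes each upper bound is inclusive (for example, 30.5 falls into MONITOR). The function name `decide` is hypothetical.

```python
def decide(risk_score: float) -> str:
    """Map a 0-100 risk score to a decision label.

    Band edges follow the thresholds table; treating each upper bound as
    inclusive is an assumption, since the table only lists integers.
    """
    if not 0 <= risk_score <= 100:
        raise ValueError("risk_score must be between 0 and 100")
    if risk_score <= 30:
        return "ALLOW"
    if risk_score <= 60:
        return "MONITOR"
    if risk_score <= 80:
        return "REVIEW"
    return "BLOCK"


print(decide(88.5))  # BLOCK
```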
## 🔌 API Endpoints
| Method | Endpoint | Description |
| ----- | --------------- | ---------------------------------------- |
| GET | `/` | Health check |
| GET | `/health` | Status check |
| POST | `/analyze` | Analyze text input |
| POST | `/analyze-file` | Upload and analyze a file (audio, PDF, DOCX, TXT) |
### Example Request (Text)
```
curl -X POST "http://127.0.0.1:8000/analyze?text=Your+account+has+been+compromised+click+here+to+verify+your+OTP"
```
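The same text request can be issued from Python using only the standard library. This is a sketch that assumes the server is running locally at `http://127.0.0.1:8000` and that `/analyze` reads the `text` query parameter, as the curl command suggests. The helper names `build_analyze_url` and `analyze_text` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed local server address


def build_analyze_url(text: str, base_url: str = BASE_URL) -> str:
    """Return the /analyze URL with `text` URL-encoded as a query parameter."""
    query = urllib.parse.urlencode({"text": text})
    return f"{base_url}/analyze?{query}"


def analyze_text(text: str) -> dict:
    """POST the text to /analyze and return the decoded JSON response."""
    request = urllib.request.Request(build_analyze_url(text), method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    print(build_analyze_url("Your account has been compromised"))
    # Uncomment once the API server is running:
    # print(analyze_text("Your account has been compromised"))
```

Building the query string with `urlencode` avoids hand-escaping spaces and punctuation in the message text.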
### Example Request (Audio File)
```
curl -X POST "http://127.0.0.1:8000/analyze-file" \
-F "file=@suspicious_call.mp3"
```
**Python example:**
```
import requests

# Upload an audio file to /analyze-file as multipart form data.
with open("audio_message.mp3", "rb") as f:
    files = {"file": ("audio.mp3", f, "audio/mpeg")}
    response = requests.post("http://127.0.0.1:8000/analyze-file", files=files)

print(response.json())
```
### Example Response
```
{
"input": { "type": "text", "content": "...", "metadata": { "timestamp": "..." } },
"layer0": { "clean_text": "...", "features": { "has_urgent_words": true, ... } },
"layer1": { "heuristic_score": 90, "flags": ["phishing", "urgency", "credential_theft"] },
"layer2": { "ml_probability": 0.87, "ml_prediction": "fraud", "confidence": 0.74 },
"final": { "risk_score": 88.5, "decision": "BLOCK", "reason": "Combined heuristic + ML analysis" },
"layer3": { "explanation": "This message exhibits strong phishing indicators..." }
}
```
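A client might condense such a response into a single log line. The sketch below uses only the field names visible in the example response. The helper name `summarize` is hypothetical, and treating `layer2` as optional is an assumption, based on the pipeline skipping ML for clear-cut heuristic scores.

```python
def summarize(result: dict) -> str:
    """Build a one-line summary from an /analyze response dictionary."""
    final = result.get("final", {})
    flags = result.get("layer1", {}).get("flags", [])
    ml = result.get("layer2") or {}  # assumed to be absent when ML is skipped
    parts = [
        f"decision={final.get('decision', 'UNKNOWN')}",
        f"risk_score={final.get('risk_score')}",
        f"flags={','.join(flags) or 'none'}",
    ]
    if "ml_probability" in ml:
        parts.append(f"ml_probability={ml['ml_probability']}")
    return " ".join(parts)


sample = {
    "layer1": {"heuristic_score": 90, "flags": ["phishing", "urgency", "credential_theft"]},
    "layer2": {"ml_probability": 0.87, "ml_prediction": "fraud", "confidence": 0.74},
    "final": {"risk_score": 88.5, "decision": "BLOCK"},
}
print(summarize(sample))
```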
## 🧰 Tech Stack
| Category | Library / Tool |
| ------------- | -------------------------------------------- |
| API framework | FastAPI, Uvicorn |
| ML / NLP | scikit-learn, XGBoost, Transformers, PyTorch |
| LLM | Ollama (Mistral, local) |
| Audio-to-text | OpenAI Whisper |
| Privacy / PII | Presidio Analyzer & Anonymizer, regex |
| File parsing | pdfplumber, python-docx |
| Audio | librosa, ffmpeg |
| URL analysis | tldextract |
| Data | pandas, numpy |
## 🛠️ Installation and Running
### 1. Install dependencies
```
pip install -r requirements.txt
```
**Note:** If you plan to use audio transcription, **ffmpeg** must also be installed on your system:
- **Windows**: download from [ffmpeg.org](https://ffmpeg.org/download.html)
- **Mac**: `brew install ffmpeg`
- **Linux**: `sudo apt install ffmpeg`
### 2. Start Ollama with Mistral (for the LLM layer)
```
ollama run mistral
```
### 3. Start the API server
```
cd backend
uvicorn main:app --reload
```
The API will be available at `http://127.0.0.1:8000`.
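## 📊 PII Stripped Before Analysis
The privacy layer strips the following before any ML or LLM processing:
- Email addresses
- Indian phone numbers (10 digits, starting with 6–9)
- Aadhaar numbers (12 digits)
- PAN card numbers
- IFSC codes
- Bank account numbers (9–18 digits)
- Credit/debit card numbers
- OTPs (4–6 digits)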
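A regex-based pass over these categories could look like the sketch below. It is an illustration, not the repository's `pii_removal.py`: the patterns, the placeholder tokens, and the function name `strip_pii` are all assumptions, and the project also lists Presidio, which this sketch does not use. Patterns run from most to least specific, because the digit-based categories overlap (a 12-digit number also fits the 9–18 digit account range).

```python
import re

# (placeholder, pattern) pairs, ordered from most to least specific so that
# broad digit patterns do not consume more structured identifiers first.
PII_PATTERNS = [
    ("[EMAIL]", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("[PAN]", re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")),          # 5 letters, 4 digits, 1 letter
    ("[IFSC]", re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b")),          # 4 letters, a zero, 6 alphanumerics
    ("[CARD]", re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")),        # 16 digits, optional separators
    ("[AADHAAR]", re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}\b")),  # 12 digits, optional separators
    ("[PHONE]", re.compile(r"\b[6-9]\d{9}\b")),                   # 10 digits starting with 6-9
    ("[ACCOUNT]", re.compile(r"\b\d{9,18}\b")),                   # 9-18 digit run
    ("[OTP]", re.compile(r"\b\d{4,6}\b")),                        # 4-6 digit run
]


def strip_pii(text: str) -> str:
    """Replace each recognized PII span with a placeholder token."""
    for placeholder, pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


print(strip_pii("Call 9876543210 or mail a.b@example.com, OTP 482913"))
```

The short-digit OTP pattern will also mask unrelated 4–6 digit numbers such as years or amounts, which trades some signal for privacy.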
## 📁 Heuristic Score Mapping
| Flag | Score |
| -------------------------------------- | ------------- |
| `strong_phishing` | +50 |
| `credential_theft` | +50 |
| `suspicious_url` | +40 |
| `phishing` | +30 |
| `financial_intent` | +20 |
| `urgency` | +20 |
| `long_url` | +10 |
| **urgency + credential_theft (combo)** | **+20 bonus** |
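The table can be read as an additive scoring rule. The sketch below assumes each flag counts at most once and that the combo bonus is added on top of the two individual scores. Neither detail is confirmed by the source, and the names `FLAG_SCORES`, `COMBO_BONUS`, and `heuristic_score` are hypothetical.

```python
FLAG_SCORES = {
    "strong_phishing": 50,
    "credential_theft": 50,
    "suspicious_url": 40,
    "phishing": 30,
    "financial_intent": 20,
    "urgency": 20,
    "long_url": 10,
}
COMBO_BONUS = 20  # applied when urgency and credential_theft co-occur


def heuristic_score(flags: list[str]) -> int:
    """Sum the per-flag scores, counting each flag once, plus the combo bonus."""
    unique = set(flags)
    unknown = unique - set(FLAG_SCORES)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    score = sum(FLAG_SCORES[flag] for flag in unique)
    if {"urgency", "credential_theft"} <= unique:
        score += COMBO_BONUS
    return score


print(heuristic_score(["urgency", "credential_theft"]))  # 90
```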
## 📄 License
MIT