dhami200/Fraude-Escalate

GitHub: dhami200/Fraude-Escalate

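A multi-layer AI fraud detection system built on FastAPI. A tiered pipeline of heuristic rules, machine learning, and a large language model automatically identifies fraud signals such as phishing, credential theft, and financial scams in multimodal input, and assigns each input a risk rating.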

Stars: 0 | Forks: 0

# 🛡️ Fraud Escalate: Multi-Layer Fraud Detection System

A multi-layer, AI-driven fraud detection backend built with **FastAPI**. It analyzes text (and media) input through a structured pipeline of heuristic rules, machine learning, and LLM-based reasoning to detect phishing, credential theft, financial scams, and other fraud signals.

## 🚀 Features

- **Multimodal input support**: text, PDF, DOCX, images, **audio** (transcribed with Whisper), and video
- **Audio-to-text**: automatic transcription with OpenAI's Whisper model (supports MP3, WAV, FLAC, AAC, OGG, M4A, WMA)
- **Privacy first**: strips PII (email, Aadhaar, PAN, IFSC, bank card numbers, OTP) before any analysis
- **Rule-based heuristics**: fast, interpretable fraud signal detection
- **Smart ML routing**: calls the ML model only when the heuristic score is ambiguous
- **LLM explanations**: Mistral (via Ollama) generates a human-readable explanation for each decision
- **Risk engine**: combines all signals into a final risk score and decision (ALLOW / MONITOR / REVIEW / BLOCK)

## 🏗️ Project Structure

```
Fraud Escalate/
└── backend/
    ├── main.py                      # FastAPI app & pipeline orchestration
    ├── requirements.txt
    │
    ├── input_layer/                 # Step 1: Input handling
    │   ├── input_handler.py         # Routes text/file to correct reader
    │   ├── file_segregator.py       # Detects input type (text/image/audio/video)
    │   └── file_readers.py          # Readers for TXT, PDF, DOCX
    │
    ├── layer0_privacy/              # Step 2: Privacy & feature extraction
    │   ├── pii_removal.py           # Strips email, phone, Aadhaar, PAN, IFSC, card, OTP
    │   ├── normalization.py         # Lowercasing, whitespace cleanup
    │   ├── feature_extraction.py    # Extracts boolean features (urgency, sensitive keywords)
    │   └── hashing.py               # Input fingerprinting
    │
    ├── layer1_heuristics/           # Step 3: Rule-based detection
    │   ├── heuristic_engine.py      # Aggregates all rule scores
    │   ├── phishing_rules.py        # Phishing keyword patterns
    │   ├── credential_rules.py      # Credential theft patterns
    │   ├── url_rules.py             # Suspicious/long URL detection
    │   ├── intent_rules.py          # Financial intent signals
    │   └── urgency_rules.py         # Urgency language detection
    │
    ├── layer2_ml/                   # Step 4: ML classification
    │   ├── ml_engine.py             # TF-IDF + classifier inference + feature boosting
    │   ├── model_loader.py          # Loads saved vectorizer & model
    │   ├── text_model.py            # Model training/evaluation helpers
    │   └── train.py                 # Training entry point
    │
    ├── layer3_llm/                  # Step 6: LLM explanation
    │   ├── llm_engine.py            # Orchestrates prompt → Ollama → explanation
    │   ├── prompt_builder.py        # Builds structured prompt from all layer outputs
    │   └── ollama_client.py         # HTTP client for local Ollama (Mistral)
    │
    ├── risk_engine/                 # Step 5: Final decision
    │   ├── decision_engine.py       # Maps scores to ALLOW/MONITOR/REVIEW/BLOCK
    │   └── scoring.py               # Dynamic weighted scoring (heuristics + ML)
    │
    └── datasets/
        └── phishing_data.csv        # Training dataset
```

## ⚙️ Analysis Pipeline

```
Input
  │
  ▼
[Input Layer] - Detect type, read content (text/PDF/DOCX/image/audio/video)
  │
  ▼
[Layer 0 - Privacy] - Remove PII, normalize text, extract features
  │
  ▼
[Layer 1 - Heuristics] - Score against phishing, credential, URL, intent, urgency rules
  │
  ├─ Score ≥ 120   → BLOCK immediately (skip ML)
  ├─ Score ≤ 20    → ALLOW immediately (skip ML)
  └─ Score 21–119  → continue ↓
  │
  ▼
[Layer 2 - ML] - TF-IDF vectorization + classifier + feature boosting
  │
  ▼
[Risk Engine] - Dynamic weighted score → ALLOW / MONITOR / REVIEW / BLOCK
  │
  ▼
[Layer 3 - LLM] - Mistral generates a plain-English explanation
  │
  ▼
Final JSON Response
```

### Decision Thresholds

| Risk score | Decision    |
| ---------- | ----------- |
| 0 – 30     | ✅ ALLOW    |
| 31 – 60    | 👁️ MONITOR  |
| 61 – 80    | 🔍 REVIEW   |
| 81 – 100   | 🚫 BLOCK    |

## 🔌 API Endpoints

| Method | Endpoint        | Description                                      |
| ------ | --------------- | ------------------------------------------------ |
| GET    | `/`             | Health check                                     |
| GET    | `/health`       | Status check                                     |
| POST   | `/analyze`      | Analyze text input                               |
| POST   | `/analyze-file` | Upload and analyze a file (audio, PDF, DOCX, TXT) |

### Example Request (Text)

```
curl -X POST "http://127.0.0.1:8000/analyze?text=Your+account+has+been+compromised+click+here+to+verify+your+OTP"
```

### Example Request (Audio File)

```
curl -X POST "http://127.0.0.1:8000/analyze-file" \
  -F "file=@suspicious_call.mp3"
```

**Python example:**

```
import requests

# Upload a local audio file as multipart form data under the "file" field
with open("audio_message.mp3", "rb") as f:
    files = {"file": ("audio.mp3", f, "audio/mpeg")}
    response = requests.post("http://127.0.0.1:8000/analyze-file", files=files)

print(response.json())
```

### Example Response

```
{
  "input": {
    "type": "text",
    "content": "...",
    "metadata": { "timestamp": "..." }
  },
  "layer0": {
    "clean_text": "...",
    "features": { "has_urgent_words": true, ... }
  },
  "layer1": {
    "heuristic_score": 90,
    "flags": ["phishing", "urgency", "credential_theft"]
  },
  "layer2": {
    "ml_probability": 0.87,
    "ml_prediction": "fraud",
    "confidence": 0.74
  },
  "final": {
    "risk_score": 88.5,
    "decision": "BLOCK",
    "reason": "Combined heuristic + ML analysis"
  },
  "layer3": {
    "explanation": "This message exhibits strong phishing indicators..."
  }
}
```

## 🧰 Tech Stack

| Category      | Libraries / Tools                            |
| ------------- | -------------------------------------------- |
| API framework | FastAPI, Uvicorn                             |
| ML / NLP      | scikit-learn, XGBoost, Transformers, PyTorch |
| LLM           | Ollama (Mistral, local)                      |
| Audio-to-text | OpenAI Whisper                               |
| Privacy / PII | Presidio Analyzer & Anonymizer, regex        |
| File parsing  | pdfplumber, python-docx                      |
| Audio         | librosa, ffmpeg                              |
| URL analysis  | tldextract                                   |
| Data          | pandas, numpy                                |

## 🛠️ Installation & Running

### 1. Install dependencies

```
pip install -r requirements.txt
```

**Note:** If you plan to use audio transcription, **ffmpeg** must also be installed on your system:

- **Windows**: download from [ffmpeg.org](https://ffmpeg.org/download.html)
- **Mac**: `brew install ffmpeg`
- **Linux**: `sudo apt install ffmpeg`

### 2. Start Ollama with Mistral (for the LLM layer)

```
ollama run mistral
```

### 3. Start the API server

```
cd backend
uvicorn main:app --reload
```

The API will be available at `http://127.0.0.1:8000`.

## 📊 PII Stripped Before Analysis

The privacy layer strips the following before any ML or LLM processing:

- Email addresses
- Indian phone numbers (10 digits, starting with 6–9)
- Aadhaar numbers (12 digits)
- PAN card numbers
- IFSC codes
- Bank account numbers (9–18 digits)
- Credit/debit card numbers
- OTPs (4–6 digits)

## 📁 Heuristic Score Map

| Flag                                   | Score         |
| -------------------------------------- | ------------- |
| `strong_phishing`                      | +50           |
| `credential_theft`                     | +50           |
| `suspicious_url`                       | +40           |
| `phishing`                             | +30           |
| `financial_intent`                     | +20           |
| `urgency`                              | +20           |
| `long_url`                             | +10           |
| **urgency + credential_theft (combo)** | **+20 bonus** |

## 📄 License

MIT
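
## 🧪 Sketch: Heuristic Scoring and Routing

The heuristic score map, the Layer 1 routing thresholds, and the decision thresholds documented above can be combined into a small worked example. This is a minimal sketch under stated assumptions, not the repository's code: the names `FLAG_SCORES`, `heuristic_score`, `route_after_heuristics`, and `decide` are hypothetical, each flag is assumed to count once, unknown flags are assumed to score 0, and fractional risk scores that fall between two documented bands are assumed to round up into the higher band. The actual aggregation in `heuristic_engine.py` and `scoring.py` may differ.

```python
# Illustrative sketch only. All names here are hypothetical and are not
# taken from the repository; the numbers come from the README tables.

FLAG_SCORES = {
    "strong_phishing": 50,
    "credential_theft": 50,
    "suspicious_url": 40,
    "phishing": 30,
    "financial_intent": 20,
    "urgency": 20,
    "long_url": 10,
}
COMBO_BONUS = 20  # applied when urgency and credential_theft occur together


def heuristic_score(flags):
    """Sum the per-flag scores, counting each flag once (assumption)."""
    unique = set(flags)
    score = sum(FLAG_SCORES.get(flag, 0) for flag in unique)
    if {"urgency", "credential_theft"} <= unique:
        score += COMBO_BONUS
    return score


def route_after_heuristics(score):
    """Apply the Layer 1 short-circuit thresholds from the pipeline diagram."""
    if score >= 120:
        return "BLOCK"  # skip ML
    if score <= 20:
        return "ALLOW"  # skip ML
    return "ML"         # ambiguous range 21-119: continue to Layer 2


def decide(risk_score):
    """Map a 0-100 risk score to a decision using the documented bands.

    Upper bounds are treated as inclusive, so a value such as 30.5 falls
    into MONITOR (assumption; the README only lists integer bands).
    """
    if risk_score <= 30:
        return "ALLOW"
    if risk_score <= 60:
        return "MONITOR"
    if risk_score <= 80:
        return "REVIEW"
    return "BLOCK"
```

Under these assumptions, `heuristic_score(["urgency", "credential_theft"])` is 20 + 50 + 20 = 90, which `route_after_heuristics` sends on to the ML layer, and `decide(88.5)` returns `"BLOCK"`, in line with the `final` block of the example response.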
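Tags: AI anti-fraud, AI risk mitigation, Apex, API key scanning, AV bypass, FastAPI, LLM (large language model), LLM evaluation, Mistral, Naabu, NLP, OCR, Ollama, PII redaction, Python, Whisper, cloud computing, artificial intelligence, content safety, credential scanning, credential theft protection, backend development, multimodal processing, real-time alerts, file parsing, text analysis, no backdoors, machine learning, fraud detection, deepfake detection, deep learning, user-mode hook bypass, system call monitoring, cybersecurity, phishing defense, rule engine, reverse-engineering tools, financial anti-fraud, privacy protection, audio-to-text, risk control system, risk control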