CrystalPrime/argus-fraud-detection

GitHub: CrystalPrime/argus-fraud-detection

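A multi-layer AI fraud detection platform built on the IEEE-CIS dataset. It combines anomaly detection, a rule engine, and local RAG-based explainability, and serves real-time transaction risk assessment through FastAPI.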

Stars: 0 | Forks: 0

# Argus: AI-Powered Fraud Detection Platform

A multi-layer fraud and anomaly detection platform built on the IEEE-CIS Fraud Detection dataset. It combines statistical anomaly detection, a configurable rule engine, context-aware scoring, RAG-based explainability, and multi-agent orchestration, all served through a FastAPI REST API.

## Quick Start

```
pip install -r requirements.txt
```

Place the dataset files in `data/raw/`:

```
data/raw/
├── train_transaction.csv
└── train_identity.csv
```

## Pipeline

```
# Step 1: data loading, profiling, and feature engineering
python pipeline_day1.py --full

# Step 2: train the detection engine
python pipeline_day2.py

# Step 3: context adjustment and rule engine
python pipeline_day3.py

# Step 4: RAG pipeline and multi-agent orchestration (Ollama must be running)
python pipeline_day4.py --llm gemma3:2b --embed nomic-embed-text

# Step 5: start the API server
uvicorn src.api.main:app --reload --port 8000
```

Swagger UI: `http://localhost:8000/docs`

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/score` | POST | Compute the full anomaly score |
| `/explain` | POST | Full explainability report |
| `/rules/evaluate` | POST | Evaluate the rule engine |
| `/rules/list` | GET | List all configured rules |
| `/rag/query` | POST | Query the fraud policy knowledge base |

### Example request: `/score`

```
curl -X POST http://localhost:8000/score \
  -H "Content-Type: application/json" \
  -d '{
    "TransactionID": 12345,
    "TransactionAmt": 3500.0,
    "tx_hour": 3,
    "is_night": 1,
    "is_weekend": 1,
    "is_business_hours": 0,
    "entity_tx_count": 1,
    "velocity_hourly": 9,
    "tx_amt_is_round": 1,
    "addr1_missing": 1,
    "card4_risk": 1.0,
    "email_domain_match": 0,
    "anomaly_score": 0.72
  }'
```

### Example response

```
{
  "transaction_id": 12345,
  "anomaly_score": 0.72,
  "context_score": 1.0,
  "rule_score": 1.0,
  "final_score": 1.0,
  "risk_level": "critical",
  "risk_label": "KRİTİK",
  "recommended_action": "OTOMATIK_BLOKE: İşlem durduruldu, fraud ekibine iletildi"
}
```

In this example, `risk_label` and `recommended_action` are returned in Turkish: "KRİTİK" means "critical", and the action reads "AUTO_BLOCK: transaction stopped, forwarded to the fraud team".

## Project Structure

```
argus-fraud-detection/
├── src/
│   ├── data/                 # Steps 1-2: Data loading, quality analysis, schema intelligence
│   ├── features/             # Step 3: Feature engineering (30 features)
│   ├── detection/            # Step 4: Multi-layer anomaly detection engine
│   │   ├── base.py
│   │   ├── column_detector.py
│   │   ├── multivariate_detector.py
│   │   ├── entity_detector.py
│   │   ├── temporal_detector.py
│   │   └── engine.py
│   ├── scoring/              # Step 5: Weighted score aggregation
│   ├── context/              # Step 6: Context adjustment engine (8 rules)
│   ├── rules/                # Step 7: Configurable rule engine (YAML/JSON)
│   ├── rag/                  # Step 8: RAG pipeline (ChromaDB + Ollama)
│   ├── agents/               # Step 9: Multi-agent orchestration
│   └── api/                  # Step 10: FastAPI
│       └── routers/          # /score /explain /rules /rag
├── configs/
│   ├── rules.yaml            # 10 fraud rules (YAML)
│   └── rules.json            # 10 fraud rules (JSON)
├── knowledge_base/
│   └── fraud_policy.md       # RAG knowledge base (fraud policies)
├── docs/
│   └── technical_doc.md      # Detailed technical documentation
├── pipeline_day1.py
├── pipeline_day2.py
├── pipeline_day3.py
├── pipeline_day4.py
└── requirements.txt
```

## Architecture

### Anomaly detection layers

| Layer | Method | Weight |
|---|---|---|
| Column | IQR + Z-score hybrid | 0.25 |
| Multivariate | Isolation Forest + LOF | 0.40 |
| Entity | Z-score against the entity's own baseline | 0.20 |
| Temporal | Hourly / daily profile | 0.15 |

### Context adjustment rules

| Context | Effect | Rationale |
|---|---|---|
| Night hours (23:00–06:00) | +20% / +30% | High-risk time window |
| Outside business hours | +15% | Limited bank intervention capacity |
| Weekend + high amount | +12% | Reduced oversight on weekends |
| Trusted entity | −30% | Reduces false positives |
| High velocity (≥5/hour) | +25% | Card-testing attack pattern |
| Round amount (≥$100) | +15% | Money-laundering indicator |
| New entity | +20% | High uncertainty on a first transaction |
| Geographic risk | +18% | Missing address field |

### Risk levels

| Score | Level | Action |
|---|---|---|
| 0.00 – 0.35 | Low | Auto-approve |
| 0.35 – 0.60 | Medium | Additional verification |
| 0.60 – 0.80 | High | Manual review |
| 0.80 – 1.00 | Critical | Auto-block |

### RAG Pipeline

```
fraud_policy.md → chunking (500 chars) → nomic-embed-text
    → ChromaDB (cosine similarity) → top-3 chunks → Ollama LLM → explanation
```

### Multi-Agent Flow

```
DataAgent → ScoringAgent → ContextAgent → RuleAgent → RAGAgent → ExplainAgent
```

Each agent runs independently on a shared `FraudDetectionState` TypedDict. Failures degrade gracefully.

## Key Design Decisions

**Isolation Forest + LOF combination:** Isolation Forest is good at finding global anomalies in high-dimensional space; LOF captures outliers based on local density. Together they cover complementary failure modes.

**Weighted aggregation over voting:** Fixed weights based on domain knowledge outperform equal-weight voting because the layers are not equally reliable. Entity anomalies are meaningless for a new user, whereas the Multivariate layer always applies.

**Local LLM (Ollama):** Fraud data contains sensitive financial information, and sending it to an external API creates compliance risk. All inference runs locally.

**YAML + JSON rule support:** This gives different teams flexibility: data scientists prefer YAML, while backend integrations prefer JSON. Both formats load into the same engine through `from_file()`, which auto-detects the format.

## Requirements

- Python 3.11+
- Ollama running locally (`ollama serve`)
- Ollama models: `gemma3:2b` or `qwen2.5:3b`, plus `nomic-embed-text`
- 8 GB+ RAM (for processing 100,000 rows)

## Documentation

Full architecture and design decisions: [`docs/technical_doc.md`](docs/technical_doc.md)
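
To make the layer weights and the risk-level thresholds from the Architecture section concrete, here is a minimal sketch of weighted aggregation followed by a risk-level lookup. It is not the repository's code: `LAYER_WEIGHTS`, `RISK_BANDS`, `aggregate`, and `risk_level` are hypothetical names. Two behaviors are assumptions the README does not specify: renormalizing the weights when a layer is missing, and placing a score that sits exactly on a boundary (such as 0.35) in the higher band.

```python
# Layer weights copied from the anomaly detection layers table.
LAYER_WEIGHTS = {
    "column": 0.25,
    "multivariate": 0.40,
    "entity": 0.20,
    "temporal": 0.15,
}

# (upper bound, level, action) copied from the risk levels table.
# Assumption: a score equal to an upper bound belongs to the next band.
RISK_BANDS = [
    (0.35, "low", "auto-approve"),
    (0.60, "medium", "additional verification"),
    (0.80, "high", "manual review"),
]


def aggregate(layer_scores: dict[str, float]) -> float:
    """Weighted average of per-layer scores, each expected in [0, 1].

    Assumption: layers absent from `layer_scores` are skipped and the
    remaining weights are renormalized so they still sum to 1.
    """
    present = {k: v for k, v in layer_scores.items() if k in LAYER_WEIGHTS}
    if not present:
        raise ValueError("no known layer scores supplied")
    total_weight = sum(LAYER_WEIGHTS[k] for k in present)
    score = sum(LAYER_WEIGHTS[k] * v for k, v in present.items()) / total_weight
    # Clamp to [0, 1] to guard against out-of-range inputs.
    return min(max(score, 0.0), 1.0)


def risk_level(score: float) -> tuple[str, str]:
    """Map a final score to (level, action) using the table thresholds."""
    for upper, level, action in RISK_BANDS:
        if score < upper:
            return level, action
    return "critical", "auto-block"


# 0.25*0.2 + 0.40*0.9 + 0.20*0.5 + 0.15*0.4 = 0.57
score = aggregate(
    {"column": 0.2, "multivariate": 0.9, "entity": 0.5, "temporal": 0.4}
)
print(f"{score:.2f}", risk_level(score))
```

With these inputs the weighted score is 0.57, which falls in the Medium band. Renormalizing over the layers that are present is one way to handle the new-user case described under Key Design Decisions, where the Entity layer has no baseline to compare against.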

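Tags: AI risk mitigation, Apex, AV bypass, DLL hijacking, FastAPI, Homebrew installation, cloud computing, large language models, anomaly detection, machine learning, retrieval-augmented generation, fraud detection, rule engine, reverse-engineering tools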