CrystalPrime/argus-fraud-detection

# Argus: AI-Powered Fraud Detection Platform

Multi-layer fraud and anomaly detection platform built on the IEEE-CIS Fraud Detection dataset. It combines statistical anomaly detection, a configurable rule engine, context-aware scoring, RAG-based explainability, and multi-agent orchestration, all served through a FastAPI REST API.

## Quick Start

```bash
pip install -r requirements.txt
```

Place the dataset files in `data/raw/`:

```
data/raw/
├── train_transaction.csv
└── train_identity.csv
```

## Pipeline

```bash
# Step 1: Data loading, profiling & feature engineering
python pipeline_day1.py --full

# Step 2: Train the detection engine
python pipeline_day2.py

# Step 3: Context adjustment & rule engine
python pipeline_day3.py

# Step 4: RAG pipeline & multi-agent orchestration (Ollama must be running)
python pipeline_day4.py --llm gemma3:2b --embed nomic-embed-text

# Step 5: Start the API server
uvicorn src.api.main:app --reload --port 8000
```

Swagger UI: `http://localhost:8000/docs`

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/score` | POST | Compute full anomaly score |
| `/explain` | POST | Full explainability report |
| `/rules/evaluate` | POST | Rule engine evaluation |
| `/rules/list` | GET | List all configured rules |
| `/rag/query` | POST | Query fraud policy knowledge base |

### Example Request: `/score`

```bash
curl -X POST http://localhost:8000/score \
  -H "Content-Type: application/json" \
  -d '{
    "TransactionID": 12345,
    "TransactionAmt": 3500.0,
    "tx_hour": 3,
    "is_night": 1,
    "is_weekend": 1,
    "is_business_hours": 0,
    "entity_tx_count": 1,
    "velocity_hourly": 9,
    "tx_amt_is_round": 1,
    "addr1_missing": 1,
    "card4_risk": 1.0,
    "email_domain_match": 0,
    "anomaly_score": 0.72
  }'
```

### Example Response

```json
{
  "transaction_id": 12345,
  "anomaly_score": 0.72,
  "context_score": 1.0,
  "rule_score": 1.0,
  "final_score": 1.0,
  "risk_level": "critical",
  "risk_label": "KRİTİK",
  "recommended_action": "OTOMATIK_BLOKE: İşlem durduruldu, fraud ekibine iletildi"
}
```

The `risk_label` and `recommended_action` values are returned in Turkish. `KRİTİK` means "critical", and the action reads "AUTO_BLOCK: transaction stopped and forwarded to the fraud team".

## Project Structure

```
argus-fraud-detection/
├── src/
│   ├── data/                # Steps 1-2: Data loading, quality analysis, schema intelligence
│   ├── features/            # Step 3: Feature engineering (30 features)
│   ├── detection/           # Step 4: Multi-layer anomaly detection engine
│   │   ├── base.py
│   │   ├── column_detector.py
│   │   ├── multivariate_detector.py
│   │   ├── entity_detector.py
│   │   ├── temporal_detector.py
│   │   └── engine.py
│   ├── scoring/             # Step 5: Weighted score aggregation
│   ├── context/             # Step 6: Context adjustment engine (8 rules)
│   ├── rules/               # Step 7: Configurable rule engine (YAML/JSON)
│   ├── rag/                 # Step 8: RAG pipeline (ChromaDB + Ollama)
│   ├── agents/              # Step 9: Multi-agent orchestration
│   └── api/                 # Step 10: FastAPI
│       └── routers/         # /score /explain /rules /rag
├── configs/
│   ├── rules.yaml           # 10 fraud rules (YAML)
│   └── rules.json           # 10 fraud rules (JSON)
├── knowledge_base/
│   └── fraud_policy.md      # RAG knowledge base (fraud policies)
├── docs/
│   └── technical_doc.md     # Detailed technical documentation
├── pipeline_day1.py
├── pipeline_day2.py
├── pipeline_day3.py
├── pipeline_day4.py
└── requirements.txt
```

## Architecture

### Anomaly Detection Layers

| Layer | Method | Weight |
|---|---|---|
| Column | IQR + Z-score hybrid | 0.25 |
| Multivariate | Isolation Forest + LOF | 0.40 |
| Entity | Personal baseline z-score | 0.20 |
| Temporal | Hourly / daily profiling | 0.15 |

### Context Adjustment Rules

| Context | Effect | Rationale |
|---|---|---|
| Night window (23:00–06:00) | +20% / +30% | High-risk time window |
| Outside business hours | +15% | Low bank intervention capacity |
| Weekend + high amount | +12% | Reduced oversight on weekends |
| Trusted entity | −30% | False positive reduction |
| High velocity (≥5/hour) | +25% | Card testing attack pattern |
| Round amount (≥$100) | +15% | Money laundering indicator |
| New entity | +20% | High uncertainty on first transaction |
| Geographic risk | +18% | Missing address fields |

### Risk Levels

| Score | Level | Action |
|---|---|---|
| 0.00 – 0.35 | Low | Auto-approve |
| 0.35 – 0.60 | Medium | Extra verification |
| 0.60 – 0.80 | High | Manual review |
| 0.80 – 1.00 | Critical | Auto-block |

### RAG Pipeline

```
fraud_policy.md → chunking (500 chars) → nomic-embed-text → ChromaDB (cosine similarity) → top-3 chunks → Ollama LLM → explanation
```

### Multi-Agent Flow

```
DataAgent → ScoringAgent → ContextAgent → RuleAgent → RAGAgent → ExplainAgent
```

Each agent operates independently and exchanges data through a shared `FraudDetectionState` TypedDict. Agents fall back gracefully on errors.

## Key Design Decisions

**Isolation Forest + LOF combination:** Isolation Forest excels at global anomalies in high-dimensional space; LOF captures local density-based outliers. Together they cover complementary failure modes.

**Weighted aggregation over voting:** Fixed weights based on domain knowledge outperform equal voting because layer reliability varies: the entity anomaly score is meaningless for new users, while the multivariate layer always applies.

**Local LLM (Ollama):** Fraud data contains sensitive financial information, and sending it to external APIs creates compliance risk. All inference stays local.

## Requirements

- Python 3.11+
- Ollama running locally (`ollama serve`)
- Ollama models: `gemma3:2b` or `qwen2.5:3b`, and `nomic-embed-text`
- 8 GB+ RAM (for 100K row processing)

## Documentation

Full architecture and design decisions: [`docs/technical_doc.md`](docs/technical_doc.md)
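
## Example: Score Aggregation and Risk Levels

To make the Anomaly Detection Layers and Risk Levels tables concrete, here is a minimal sketch of how four per-layer scores could be combined and mapped to a risk level. It is an illustration, not the code in `src/scoring/`: the names `LAYER_WEIGHTS`, `RISK_BANDS`, `aggregate`, and `risk_level` are hypothetical, the plain weighted sum is an assumption about how the weights are applied, and the sketch assumes a score exactly on a band boundary falls into the higher band. Context adjustments and rule scores are left out.

```python
from typing import Mapping

# Layer weights copied from the "Anomaly Detection Layers" table.
LAYER_WEIGHTS = {
    "column": 0.25,
    "multivariate": 0.40,
    "entity": 0.20,
    "temporal": 0.15,
}

# (exclusive upper bound, level, action) from the "Risk Levels" table.
# Assumption: a score exactly on a boundary belongs to the higher band.
RISK_BANDS = [
    (0.35, "low", "Auto-approve"),
    (0.60, "medium", "Extra verification"),
    (0.80, "high", "Manual review"),
]


def aggregate(layer_scores: Mapping[str, float]) -> float:
    """Weighted sum of per-layer scores, clamped to [0, 1]."""
    total = sum(LAYER_WEIGHTS[name] * layer_scores[name] for name in LAYER_WEIGHTS)
    return min(max(total, 0.0), 1.0)


def risk_level(score: float) -> tuple[str, str]:
    """Return (level, action) for a score in [0, 1]."""
    for upper, level, action in RISK_BANDS:
        if score < upper:
            return level, action
    return "critical", "Auto-block"


# Made-up layer scores, used only to exercise the functions.
scores = {"column": 0.2, "multivariate": 0.9, "entity": 0.5, "temporal": 0.4}
final = aggregate(scores)
print(round(final, 2), risk_level(final))  # 0.57 ('medium', 'Extra verification')
```

Because the multivariate layer carries the largest weight (0.40), a high Isolation Forest + LOF score moves the result more than any other single layer, which matches the rationale given under Key Design Decisions.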