CrystalPrime/argus-fraud-detection
# Argus — AI-Powered Fraud Detection Platform
Multi-layer fraud and anomaly detection platform built on the IEEE-CIS Fraud Detection dataset. Combines statistical anomaly detection, a configurable rule engine, context-aware scoring, RAG-based explainability, and multi-agent orchestration — all served via a FastAPI REST API.
## Quick Start
```bash
pip install -r requirements.txt
```
Place the dataset files in `data/raw/`:
```
data/raw/
├── train_transaction.csv
└── train_identity.csv
```
## Pipeline
```bash
# Step 1: Data loading, profiling & feature engineering
python pipeline_day1.py --full

# Step 2: Train the detection engine
python pipeline_day2.py

# Step 3: Context adjustment & rule engine
python pipeline_day3.py

# Step 4: RAG pipeline & multi-agent orchestration (Ollama must be running)
python pipeline_day4.py --llm gemma3:2b --embed nomic-embed-text

# Step 5: Start the API server
uvicorn src.api.main:app --reload --port 8000
```
Swagger UI: `http://localhost:8000/docs`
## API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/score` | POST | Compute full anomaly score |
| `/explain` | POST | Full explainability report |
| `/rules/evaluate` | POST | Rule engine evaluation |
| `/rules/list` | GET | List all configured rules |
| `/rag/query` | POST | Query fraud policy knowledge base |
### Example Request — `/score`
```bash
curl -X POST http://localhost:8000/score \
  -H "Content-Type: application/json" \
  -d '{
    "TransactionID": 12345,
    "TransactionAmt": 3500.0,
    "tx_hour": 3,
    "is_night": 1,
    "is_weekend": 1,
    "is_business_hours": 0,
    "entity_tx_count": 1,
    "velocity_hourly": 9,
    "tx_amt_is_round": 1,
    "addr1_missing": 1,
    "card4_risk": 1.0,
    "email_domain_match": 0,
    "anomaly_score": 0.72
  }'
```
### Example Response
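```json
{
  "transaction_id": 12345,
  "anomaly_score": 0.72,
  "context_score": 1.0,
  "rule_score": 1.0,
  "final_score": 1.0,
  "risk_level": "critical",
  "risk_label": "KRİTİK",
  "recommended_action": "OTOMATIK_BLOKE: İşlem durduruldu, fraud ekibine iletildi"
}
```

The `risk_label` and `recommended_action` values are returned in Turkish. `KRİTİK` means "critical", and the action string translates to "AUTO_BLOCK: transaction stopped, forwarded to the fraud team".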
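For callers that prefer Python to curl, the `/score` request can be built with the standard library alone. This is a minimal sketch: the helper names `build_score_request` and `score_transaction` are hypothetical and not part of the project, and the URL assumes the server from Step 5 is listening on port 8000.

```python
import json
import urllib.request

# Assumes the API server from Step 5 is running locally on port 8000.
SCORE_URL = "http://localhost:8000/score"


def build_score_request(payload: dict, url: str = SCORE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /score endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_transaction(payload: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_score_request(payload), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `score_transaction({...})` with the same fields as the curl example should return a dictionary shaped like the example response, provided the server is up.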
## Project Structure
```
argus-fraud-detection/
├── src/
│   ├── data/                  # Steps 1-2: Data loading, quality analysis, schema intelligence
│   ├── features/              # Step 3: Feature engineering (30 features)
│   ├── detection/             # Step 4: Multi-layer anomaly detection engine
│   │   ├── base.py
│   │   ├── column_detector.py
│   │   ├── multivariate_detector.py
│   │   ├── entity_detector.py
│   │   ├── temporal_detector.py
│   │   └── engine.py
│   ├── scoring/               # Step 5: Weighted score aggregation
│   ├── context/               # Step 6: Context adjustment engine (8 rules)
│   ├── rules/                 # Step 7: Configurable rule engine (YAML/JSON)
│   ├── rag/                   # Step 8: RAG pipeline (ChromaDB + Ollama)
│   ├── agents/                # Step 9: Multi-agent orchestration
│   └── api/                   # Step 10: FastAPI
│       └── routers/           # /score /explain /rules /rag
├── configs/
│   ├── rules.yaml             # 10 fraud rules (YAML)
│   └── rules.json             # 10 fraud rules (JSON)
├── knowledge_base/
│   └── fraud_policy.md        # RAG knowledge base (fraud policies)
├── docs/
│   └── technical_doc.md       # Detailed technical documentation
├── pipeline_day1.py
├── pipeline_day2.py
├── pipeline_day3.py
├── pipeline_day4.py
└── requirements.txt
```
## Architecture
### Anomaly Detection Layers
| Layer | Method | Weight |
|---|---|---|
| Column | IQR + Z-score hybrid | 0.25 |
| Multivariate | Isolation Forest + LOF | 0.40 |
| Entity | Personal baseline z-score | 0.20 |
| Temporal | Hourly / daily profiling | 0.15 |
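The weights in the table can be read as a weighted average over the four layer scores. The sketch below illustrates that reading. It assumes each layer emits a score in [0, 1], and the renormalization used when a layer is missing is an assumption made for this example, not something the project documents.

```python
# Layer weights taken from the table above; they sum to 1.0.
LAYER_WEIGHTS = {
    "column": 0.25,
    "multivariate": 0.40,
    "entity": 0.20,
    "temporal": 0.15,
}


def aggregate_scores(layer_scores: dict[str, float]) -> float:
    """Weighted average of per-layer scores, each assumed to lie in [0, 1].

    Assumption: a layer absent from `layer_scores` (for example the entity
    layer for a first-time user) is skipped and the remaining weights are
    renormalized. The project may handle this case differently.
    """
    present = {name: w for name, w in LAYER_WEIGHTS.items() if name in layer_scores}
    if not present:
        raise ValueError("no known layer scores supplied")
    total_weight = sum(present.values())
    weighted = sum(layer_scores[name] * w for name, w in present.items())
    return weighted / total_weight


# Illustrative layer scores, not real model output.
example = {"column": 0.9, "multivariate": 0.8, "entity": 0.5, "temporal": 0.6}
combined = aggregate_scores(example)  # 0.225 + 0.32 + 0.10 + 0.09 = 0.735
```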
### Context Adjustment Rules
| Context | Effect | Rationale |
|---|---|---|
| Night window (23:00–06:00) | +20% / +30% | High-risk time window |
| Outside business hours | +15% | Low bank intervention capacity |
| Weekend + high amount | +12% | Reduced oversight on weekends |
| Trusted entity | −30% | False positive reduction |
| High velocity (≥5/hour) | +25% | Card testing attack pattern |
| Round amount (≥$100) | +15% | Money laundering indicator |
| New entity | +20% | High uncertainty on first transaction |
| Geographic risk | +18% | Missing address fields |
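One way to picture how these rules could combine is shown below. Several details are assumptions made for the sketch: the adjustments compound multiplicatively, the result is clamped to [0, 1], the night rule uses only the +20% value, `HIGH_AMOUNT` and the `is_trusted` flag are hypothetical, and a "new entity" is taken to mean `entity_tx_count == 1`. The other field names come from the `/score` example request.

```python
HIGH_AMOUNT = 1000.0  # hypothetical threshold for "weekend + high amount"


def context_multiplier(tx: dict) -> float:
    """Compound the percentage effects from the table into one factor."""
    amount = tx.get("TransactionAmt", 0.0)
    factor = 1.0
    if tx.get("is_night"):
        factor *= 1.20  # night window (the table also lists +30%)
    if not tx.get("is_business_hours", 1):
        factor *= 1.15  # outside business hours
    if tx.get("is_weekend") and amount >= HIGH_AMOUNT:
        factor *= 1.12  # weekend + high amount
    if tx.get("is_trusted"):
        factor *= 0.70  # trusted entity (hypothetical flag)
    if tx.get("velocity_hourly", 0) >= 5:
        factor *= 1.25  # high velocity
    if tx.get("tx_amt_is_round") and amount >= 100:
        factor *= 1.15  # round amount
    if tx.get("entity_tx_count") == 1:
        factor *= 1.20  # new entity (assumed definition)
    if tx.get("addr1_missing"):
        factor *= 1.18  # geographic risk
    return factor


def adjust_score(anomaly_score: float, tx: dict) -> float:
    """Apply the context factor and clamp the result to [0, 1]."""
    return max(0.0, min(1.0, anomaly_score * context_multiplier(tx)))
```

Under these assumptions a transaction that triggers every risk-raising rule saturates at 1.0, while a trusted entity during business hours has its score reduced by 30%.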
### Risk Levels
| Score | Level | Action |
|---|---|---|
| 0.00 – 0.35 | Low | Auto-approve |
| 0.35 – 0.60 | Medium | Extra verification |
| 0.60 – 0.80 | High | Manual review |
| 0.80 – 1.00 | Critical | Auto-block |
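The bands in the table share their boundary values, so a lookup has to pick a side. The sketch below assumes each band includes its lower bound, which means a score of exactly 0.35 is treated as Medium. That choice, and the `classify` helper itself, are illustrative assumptions.

```python
from bisect import bisect_right

# Upper edges of the Low, Medium and High bands from the table.
THRESHOLDS = [0.35, 0.60, 0.80]
LEVELS = [
    ("low", "Auto-approve"),
    ("medium", "Extra verification"),
    ("high", "Manual review"),
    ("critical", "Auto-block"),
]


def classify(score: float) -> tuple[str, str]:
    """Map a final score in [0, 1] to a (risk level, action) pair.

    Assumption: a score equal to a threshold falls into the higher band.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return LEVELS[bisect_right(THRESHOLDS, score)]
```

For instance, `classify(0.72)` lands in the High band and `classify(1.0)` in the Critical band.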
### RAG Pipeline
```
fraud_policy.md → chunking (500 chars) → nomic-embed-text
  → ChromaDB (cosine similarity) → top-3 chunks → Ollama LLM → explanation
```
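The retrieval half of this flow can be illustrated without ChromaDB or Ollama. In the sketch below, `toy_embed` is a bag-of-words stand-in for `nomic-embed-text`, and the chunker uses non-overlapping 500-character windows. Both are assumptions, since the README does not say whether chunks overlap, and none of these helper names come from the project.

```python
import math
from collections import Counter


def chunk_text(text: str, size: int = 500) -> list[str]:
    """Split text into consecutive, non-overlapping windows of `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (not a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, toy_embed(c)), reverse=True)
    return ranked[:k]
```

In the real pipeline the selected chunks would then be passed to the LLM as context for the explanation. That step is omitted here.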
### Multi-Agent Flow
```
DataAgent → ScoringAgent → ContextAgent → RuleAgent → RAGAgent → ExplainAgent
```
Each agent operates independently and exchanges data with the others through a shared `FraudDetectionState` TypedDict. If an agent raises an error, the flow falls back gracefully instead of aborting.
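A shared-state pipeline with graceful fallback might look roughly like the sketch below. The fields of `FraudDetectionState`, the two stand-in agents, and the `run_pipeline` helper are all hypothetical. Only the state type's name comes from the project.

```python
from typing import Callable, TypedDict


class FraudDetectionState(TypedDict, total=False):
    # Field names are hypothetical; the real TypedDict may differ.
    transaction: dict
    anomaly_score: float
    errors: list[str]


Agent = Callable[[FraudDetectionState], FraudDetectionState]


def scoring_agent(state: FraudDetectionState) -> FraudDetectionState:
    """Stand-in agent: copy the anomaly score out of the transaction."""
    return {**state, "anomaly_score": state["transaction"]["anomaly_score"]}


def failing_agent(state: FraudDetectionState) -> FraudDetectionState:
    """Stand-in agent that simulates an unavailable dependency."""
    raise RuntimeError("LLM backend unavailable")


def run_pipeline(agents: list[Agent], state: FraudDetectionState) -> FraudDetectionState:
    """Run agents in order; on failure, record the error and keep going."""
    for agent in agents:
        try:
            state = agent(state)
        except Exception as exc:
            errors = state.get("errors", []) + [f"{agent.__name__}: {exc}"]
            state = {**state, "errors": errors}
    return state
```

Because a failing agent only appends to `errors`, the fields written by earlier agents remain available to later ones.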
## Key Design Decisions
**Isolation Forest + LOF combination:** Isolation Forest excels at global anomalies in high-dimensional space; LOF captures local density-based outliers. Together they cover complementary failure modes.
**Weighted aggregation over voting:** Fixed weights based on domain knowledge are preferred to equal voting because layer reliability varies: the entity layer has no personal baseline to compare against for new users, while the multivariate layer always applies.
**Local LLM (Ollama):** Fraud data contains sensitive financial information. Sending it to external APIs creates compliance risk. All inference stays local.
## Requirements
- Python 3.11+
- Ollama running locally (`ollama serve`)
- Ollama models: `gemma3:2b` or `qwen2.5:3b`, and `nomic-embed-text`
- 8 GB+ RAM (for 100K row processing)
## Documentation
Full architecture and design decisions: [`docs/technical_doc.md`](docs/technical_doc.md)