gowtham27rajendran-commits/7-fraud-detection
# Real-Time Fraud Detection System
A streaming fraud detection system that scores each transaction in under 100 ms using an ensemble of rule-based filters and ML models, with per-transaction explanations via SHAP.
## Architecture

    Transaction Event (Kafka)
            ↓
    Rule Engine (fast, deterministic filters)
            ↓ (passes rules)
    Feature Extraction (Redis feature store)
            ↓
    ML Scorer (XGBoost ensemble)
            ↓
    SHAP Explainer (why flagged?)
            ↓
    Decision: Allow / Flag / Block
            ↓
    Kafka output topic → Case Management System

## Decision Layers
| Layer | Latency | Coverage |
|---|---|---|
| Rule engine | <1ms | Known patterns (stolen card BINs, impossible velocity) |
| ML model | 10–50ms | Unknown patterns, complex feature interactions |
| Human review | async | High-value, ambiguous cases |
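
As a minimal sketch of the first layer, the rule engine can be a handful of cheap predicates evaluated before any model call. Everything here is an assumption made for illustration: the `Txn` fields, the blocklist contents, the 900 km/h cap, and reading "impossible velocity" as implied travel speed between consecutive transactions. It is not the repository's actual rule set.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values, for illustration only.
BLOCKED_BINS = {"400000", "510000"}   # first six digits of known-bad card ranges
MAX_PLAUSIBLE_SPEED_KMH = 900.0       # roughly airliner cruising speed


@dataclass
class Txn:
    card_number: str
    km_from_last_txn: float       # distance to the card's previous transaction
    hours_since_last_txn: float   # elapsed time since that transaction


def rule_verdict(txn: Txn) -> Optional[str]:
    """Return a block reason if a deterministic rule fires, else None."""
    if txn.card_number[:6] in BLOCKED_BINS:
        return "blocked_bin"
    # Zero elapsed time is skipped here to avoid dividing by zero.
    if txn.hours_since_last_txn > 0:
        speed_kmh = txn.km_from_last_txn / txn.hours_since_last_txn
        if speed_kmh > MAX_PLAUSIBLE_SPEED_KMH:
            return "impossible_velocity"
    return None  # no rule fired; hand the transaction to the ML scorer
```

A transaction for which `rule_verdict` returns `None` would continue to feature extraction; any other return value short-circuits the pipeline.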
## Key Design Decisions
| Decision | Choice | Reason |
|---|---|---|
| Rules before ML | Yes | Rules catch obvious fraud cheaply; saves ML compute |
| Model type | XGBoost ensemble | Strong accuracy on tabular data; tree models get fast, exact SHAP values |
| Threshold | Configurable per merchant | High-risk merchants need lower threshold |
| Explainability | SHAP per-transaction | Supports EU rules on automated decision-making (GDPR Art. 22) |
| False positive budget | Max 0.5% | Higher FP = legitimate customers declined = revenue loss |
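
The per-merchant threshold row could look roughly like the sketch below, which maps a model score to the three outcomes from the architecture diagram. The category names and threshold values are hypothetical placeholders, not the project's configuration.

```python
from typing import Dict, Tuple

# (flag_threshold, block_threshold) per merchant risk category.
# All values are assumptions chosen for illustration.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "default": (0.70, 0.95),
    "high_risk": (0.50, 0.85),  # lower thresholds flag more transactions
}


def decide(score: float, merchant_category: str) -> str:
    """Map a fraud score in [0, 1] to Allow / Flag / Block."""
    flag_at, block_at = THRESHOLDS.get(merchant_category, THRESHOLDS["default"])
    if score >= block_at:
        return "Block"
    if score >= flag_at:
        return "Flag"
    return "Allow"
```

With these placeholder values, a score of 0.6 is allowed for a default merchant but flagged for a high-risk one, which is the behavior the table describes.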
## Features Used
- Velocity: txn_count_1h, txn_count_24h per card
- Amount anomaly: amount vs user's historical mean/std
- Geography: distance from last transaction, new country flag
- Time: hour of day, day of week (fraud tends to cluster in off-hours, e.g. around 3am)
- Merchant: first-time merchant, merchant risk category
- Device: new device fingerprint, IP reputation score
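
Three of the features above (velocity, amount anomaly, distance from last transaction) can be sketched in plain Python. This is an in-memory illustration with hypothetical function names; in the system described here the inputs would come from the Redis feature store rather than Python lists.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def txn_count(timestamps: Sequence[float], now: float, window_s: float) -> int:
    """Transactions in the window (now - window_s, now], e.g. window_s=3600 for txn_count_1h."""
    return sum(1 for t in timestamps if now - window_s < t <= now)


def amount_zscore(amount: float, history: Sequence[float]) -> float:
    """Standard score of `amount` against the user's historical amounts."""
    if len(history) < 2:
        return 0.0  # not enough history to estimate a spread
    std = pstdev(history)
    return 0.0 if std == 0 else (amount - mean(history)) / std


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))
```

Returning 0.0 for users with little or no history is one possible cold-start choice; a separate "insufficient history" flag would be another.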
## Running Locally

    docker-compose up -d kafka redis
    pip install -r requirements.txt
    python app/streaming/consumer.py
    python app/streaming/producer.py  # in a second terminal: simulate transaction stream

## Interview Talking Points
**"How do you handle class imbalance? (fraud is 0.1% of transactions)"**
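Two levers in training: oversample the minority class with SMOTE (on the training split only, so synthetic points never leak into validation), or reweight positives with XGBoost's `scale_pos_weight` parameter. Evaluate with F1 or precision-recall AUC rather than accuracy: at a 0.1% fraud rate, always predicting "not fraud" scores 99.9% accuracy while catching nothing.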
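
The accuracy trap can be checked with a few lines of arithmetic. The sketch below uses synthetic labels at a 0.1% fraud rate and hypothetical helper names; it computes the negative-to-positive ratio that the XGBoost documentation suggests as a typical starting value for `scale_pos_weight`, and scores a baseline that never predicts fraud.

```python
from typing import Sequence, Tuple


def scale_pos_weight(labels: Sequence[int]) -> float:
    """Ratio of negatives to positives (labels are 0 = legit, 1 = fraud)."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one fraud example")
    return (len(labels) - positives) / positives


def accuracy_and_recall(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Accuracy over all rows and recall over the fraud rows."""
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    true_pos = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    positives = sum(labels)
    recall = true_pos / positives if positives else 0.0
    return correct / len(labels), recall


# Synthetic data: 1 fraud case in 1000 transactions.
labels = [1] + [0] * 999
always_legit = [0] * len(labels)   # baseline that never predicts fraud

accuracy, recall = accuracy_and_recall(labels, always_legit)
weight = scale_pos_weight(labels)
print(f"accuracy={accuracy:.3f} recall={recall:.1f} scale_pos_weight={weight:.0f}")
```

The baseline looks excellent on accuracy and useless on recall, which is why the metrics named above are the ones worth optimizing.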
**"What's your false positive vs false negative trade-off?"**
A false positive declines a legitimate transaction: the customer is frustrated and may churn. A false negative approves fraud: a direct financial loss. We tune the threshold per merchant based on their risk tolerance. Fraud-prone merchants (e.g. crypto exchanges) use a lower threshold, so more transactions are flagged.
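
One way to pick a threshold is to work backwards from a false positive budget. The sketch below assumes the 0.5% budget is measured as a share of legitimate transactions in a validation set, and that a transaction is flagged when its score is strictly greater than the threshold; the function name is hypothetical, and it would be run once per merchant on that merchant's scores.

```python
import math
from typing import Sequence


def threshold_for_fp_budget(legit_scores: Sequence[float], fp_budget: float = 0.005) -> float:
    """Lowest threshold t such that flagging `score > t` stays within the budget.

    `legit_scores` are model scores for known-legitimate validation transactions.
    """
    if not legit_scores:
        raise ValueError("need at least one legitimate score")
    ranked = sorted(legit_scores, reverse=True)
    # Maximum number of legitimate transactions we may flag.
    # The small epsilon guards against float rounding just below an integer.
    allowed = math.floor(fp_budget * len(ranked) + 1e-9)
    if allowed >= len(ranked):
        return -math.inf  # budget permits flagging everything
    # Scores strictly above the (allowed + 1)-th highest number at most `allowed`.
    return ranked[allowed]


# Synthetic legitimate scores spread evenly over [0, 1).
scores = [i / 1000 for i in range(1000)]
t = threshold_for_fp_budget(scores, fp_budget=0.005)
flagged = sum(1 for s in scores if s > t)
```

Tied scores can only reduce the number flagged, so the budget is never exceeded on the validation set; whether it holds in production depends on the score distribution staying stable.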
**"How do you detect concept drift in fraud patterns?"**
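Fraudsters adapt, so both inputs and outputs are monitored. Feature distributions are compared weekly against a reference window using a two-sample KS test. The model score distribution is tracked as well: if the average fraud score drops, the model may be stale. Retrain at least monthly, or sooner when drift is detected.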
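
A weekly drift check on one feature could be sketched as follows. The statistic is the standard two-sample Kolmogorov-Smirnov distance, written in pure Python so it runs without dependencies; `scipy.stats.ks_2samp` computes the same statistic along with a p-value. The function names and the choice of `alpha` are assumptions, and the critical value uses the usual large-sample approximation.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of samples `a` and `b`."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    largest_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def drifted(reference: Sequence[float], current: Sequence[float], alpha: float = 0.05) -> bool:
    """True if the KS statistic exceeds the large-sample critical value at level alpha."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Synthetic example: last month's feature values vs. a shifted current week.
reference = [i / 100 for i in range(100)]
shifted = [x + 0.5 for x in reference]
```

Here `drifted(reference, reference)` is false and `drifted(reference, shifted)` is true. With many features checked every week, some false alarms are expected at `alpha = 0.05`, so a retraining trigger would likely want a stricter level or a multiple-testing correction.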