gowtham27rajendran-commits/7-fraud-detection


# Real-Time Fraud Detection System

A streaming fraud detection system that scores transactions in <100ms using an ensemble of rule-based filters and ML models, with full explainability via SHAP.

## Architecture

    Transaction Event (Kafka)
            ↓
    Rule Engine (fast, deterministic filters)
            ↓ (passes rules)
    Feature Extraction (Redis feature store)
            ↓
    ML Scorer (XGBoost ensemble)
            ↓
    SHAP Explainer (why flagged?)
            ↓
    Decision: Allow / Flag / Block
            ↓
    Kafka output topic → Case Management System

## Decision Layers

| Layer | Latency | Coverage |
|---|---|---|
| Rule engine | <1ms | Known patterns (stolen card BINs, impossible velocity) |
| ML model | 10–50ms | Unknown patterns, complex feature interactions |
| Human review | async | High-value, ambiguous cases |

## Key Design Decisions

| Decision | Choice | Reason |
|---|---|---|
| Rules before ML | Yes | Rules catch obvious fraud cheaply; saves ML compute |
| Model type | XGBoost ensemble | Best calibration on tabular data, SHAP compatible |
| Threshold | Configurable per merchant | High-risk merchants need lower threshold |
| Explainability | SHAP per-transaction | Regulatory requirement in EU (GDPR Art. 22) |
| False positive budget | Max 0.5% | Higher FP = legitimate customers declined = revenue loss |

## Features Used

- Velocity: `txn_count_1h`, `txn_count_24h` per card
- Amount anomaly: amount vs user's historical mean/std
- Geography: distance from last transaction, new country flag
- Time: hour of day, day of week (fraud peaks at 3am)
- Merchant: first-time merchant, merchant risk category
- Device: new device fingerprint, IP reputation score

## Running Locally

    docker-compose up -d kafka redis
    pip install -r requirements.txt
    python app/streaming/consumer.py
    python app/streaming/producer.py  # simulate transaction stream

## Interview Talking Points

**"How do you handle class imbalance? (fraud is 0.1% of transactions)"**

SMOTE for oversampling minority class in training. XGBoost `scale_pos_weight` parameter.
Optimize for F1 / precision-recall AUC, not accuracy: 99.9% accuracy means nothing if you just predict "not fraud" always.

**"What's your false positive vs false negative trade-off?"**

- False positive: decline a legitimate transaction → customer angry, may churn.
- False negative: approve fraud → financial loss.

We tune the threshold per merchant based on their tolerance. Fraud-prone merchants (crypto exchanges) use a lower threshold.

**"How do you detect concept drift in fraud patterns?"**

Fraudsters adapt. Monitor feature distributions weekly using a KS-test. Monitor the model score distribution: if the average fraud score drops, the model may be stale. Retrain at least monthly, or sooner when drift detection triggers.
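
The weekly KS-test check on feature distributions could be sketched roughly as follows. This is a minimal, dependency-free illustration and not the repository's actual monitoring code: the function names `ks_statistic` and `has_drifted`, the `alpha=0.05` default, and the sample values are assumptions made for the example. It computes the two-sample Kolmogorov-Smirnov statistic and compares it to the standard large-sample critical value. In practice, `scipy.stats.ks_2samp` would likely be used instead.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is always attained at one of the observed values.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def has_drifted(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(current)
    # Large-sample approximation: c(alpha) * sqrt((n + m) / (n * m)),
    # with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Hypothetical txn_count_24h samples (synthetic values for illustration only).
last_week = [float(x) for x in range(100)]
this_week = [x + 50.0 for x in last_week]  # the whole distribution shifted upward

print(has_drifted(last_week, last_week))  # False: identical samples, statistic is 0
print(has_drifted(last_week, this_week))  # True: statistic is 0.5, critical value is about 0.19
```

Running a check like this per feature, and once more on the model's output scores, gives a simple signal for triggering a retrain between scheduled monthly runs. With many features tested each week, some false alarms are expected at `alpha=0.05`, so a stricter threshold or a multiple-testing correction may be worth considering.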