# 💳 Credit Card Fraud Detection — End-to-End ML Pipeline










## 📌 Overview
This project presents a **production-quality Credit Card Fraud Detection system** built using Machine Learning and Explainable AI techniques.
The notebook covers:
- 📊 Advanced Exploratory Data Analysis (EDA)
- 🛠️ Feature Engineering
- ⚖️ Class Imbalance Handling (SMOTE + Class Weights)
- 🤖 Benchmarking of Multiple ML Models
- 📈 Threshold Optimization
- 🔍 SHAP Explainability
- 💼 Business Impact Analysis
## 🚀 Features
### ✅ Exploratory Data Analysis
- Fraud distribution analysis
- Transaction amount analysis
- Temporal fraud pattern analysis
- Correlation heatmaps
- Pairplots
- PCA feature behavior visualization
### ✅ Feature Engineering
- Cyclic hour encoding
- Log amount transformation
- Z-score normalization
- Interaction features
- Statistical aggregation features
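
As a rough illustration of the first three items, the sketch below derives a cyclic hour encoding, a log-transformed amount, and z-score parameters using only the standard library. It is not the notebook's code: the function names `engineer_features` and `fit_zscore` are hypothetical, and it assumes the dataset's `Time` column counts seconds since the first transaction, so the derived hour is relative to that starting point rather than to a known clock time.

```python
import math

SECONDS_PER_HOUR = 3600


def engineer_features(time_seconds, amount):
    """Return cyclic hour and log-amount features for one transaction."""
    # Assumption: `time_seconds` is the offset from the first transaction,
    # so "hour" is an hour-of-day relative to that starting point.
    hour = (time_seconds // SECONDS_PER_HOUR) % 24
    angle = 2 * math.pi * hour / 24
    return {
        # sin/cos place hour 23 next to hour 0 instead of 23 units away.
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
        # log1p compresses the long right tail and is defined for amount 0.
        "log_amount": math.log1p(amount),
    }


def fit_zscore(train_values):
    """Return (mean, std) estimated from training values only."""
    mean = sum(train_values) / len(train_values)
    variance = sum((v - mean) ** 2 for v in train_values) / len(train_values)
    return mean, math.sqrt(variance)


features = engineer_features(time_seconds=6 * SECONDS_PER_HOUR, amount=149.62)
mean, std = fit_zscore([1.0, 2.0, 3.0])
print(features, (2.5 - mean) / std)
```

Estimating the mean and standard deviation on the training split and reusing them on the test split keeps test-set statistics from leaking into the features.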
### ✅ Imbalanced Learning
- SMOTE oversampling
- `class_weight='balanced'`
- `scale_pos_weight` for boosting models
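
To make the two weighting options concrete, the sketch below works out the numbers implied by the class counts of the full dataset (284,807 transactions, 492 of them fraud). The helper names are hypothetical. The `balanced` formula is the one scikit-learn documents, `n_samples / (n_classes * class_count)`, and the negative-to-positive ratio is a common starting value for `scale_pos_weight`. In practice both would be computed from the training split rather than the full dataset.

```python
N_TOTAL = 284_807  # transactions in the dataset
N_FRAUD = 492      # fraud cases
N_LEGIT = N_TOTAL - N_FRAUD


def balanced_class_weights(n_negative, n_positive):
    """Per-class weights: n_samples / (n_classes * class_count)."""
    n_samples = n_negative + n_positive
    return {
        0: n_samples / (2 * n_negative),
        1: n_samples / (2 * n_positive),
    }


def negative_to_positive_ratio(n_negative, n_positive):
    """Ratio often used as a starting value for scale_pos_weight."""
    return n_negative / n_positive


weights = balanced_class_weights(N_LEGIT, N_FRAUD)
ratio = negative_to_positive_ratio(N_LEGIT, N_FRAUD)

print(f"class weights: legit={weights[0]:.4f}, fraud={weights[1]:.2f}")
print(f"negative/positive ratio: {ratio:.1f}")  # about 577.9
```

Both schemes make each fraud case count several hundred times more than a legitimate one during training, without creating synthetic rows as SMOTE does.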
### ✅ Machine Learning Models
- Logistic Regression
- Random Forest
- XGBoost
- LightGBM
- XGBoost + SMOTE
### ✅ Explainable AI
- SHAP Feature Importance
- SHAP Dependence Plot
- SHAP Waterfall Visualization
### ✅ Business Analytics
- Fraud savings estimation
- False positive investigation cost
- Threshold-based savings optimization
- Risk tier classification
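
A minimal sketch of how these four pieces could fit together is shown below. It is a simplification under stated assumptions, not the notebook's method: every function name, the flat `review_cost` per false alarm, the tier cut-offs (0.3 and 0.7), and the toy data are hypothetical. It also assumes the full amount of a flagged fraud is recovered and that reviewing a true fraud costs nothing.

```python
def net_savings(labels, scores, amounts, threshold, review_cost):
    """Fraud amount caught minus the cost of investigating false alarms."""
    saved = 0.0
    false_alarms = 0
    for label, score, amount in zip(labels, scores, amounts):
        if score >= threshold:      # transaction is flagged for review
            if label == 1:
                saved += amount     # assumed fully recovered
            else:
                false_alarms += 1   # legitimate transaction investigated
    return saved - review_cost * false_alarms


def best_threshold(labels, scores, amounts, review_cost, candidates):
    """Pick the candidate threshold with the highest net savings."""
    return max(
        candidates,
        key=lambda t: net_savings(labels, scores, amounts, t, review_cost),
    )


def risk_tier(score, medium=0.3, high=0.7):
    """Map a fraud score to a coarse tier (cut-offs are illustrative)."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


# Toy data: 1 = fraud, 0 = legitimate.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.2, 0.1]
amounts = [200.0, 50.0, 120.0, 30.0, 10.0]

threshold = best_threshold(labels, scores, amounts, 25.0, [0.15, 0.3, 0.5])
print(threshold, net_savings(labels, scores, amounts, threshold, 25.0))
print([risk_tier(s) for s in scores])
```

On the toy data, lowering the threshold from 0.5 to 0.3 catches the second fraud without adding a false alarm, while lowering it further only adds review cost. That trade-off is what the savings-based threshold search is looking for.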
# 📂 Dataset
Dataset used:
[Kaggle — Credit Card Fraud Detection](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud)
- 284,807 transactions
- 492 fraud cases
- Highly imbalanced dataset (0.17% fraud)
# 🧠 Tech Stack
## Languages
- Python
## Libraries
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
- XGBoost
- LightGBM
- SHAP
- Imbalanced-learn
- Joblib
# 📊 Project Workflow
```
Data Collection
      ↓
Data Cleaning
      ↓
EDA & Visualization
      ↓
Feature Engineering
      ↓
Train/Test Split
      ↓
Scaling + SMOTE
      ↓
Model Training
      ↓
Evaluation
      ↓
Threshold Optimization
      ↓
SHAP Explainability
      ↓
Business Impact Analysis
```
# 📈 Evaluation Metrics
Because the dataset is highly imbalanced, accuracy alone is misleading, so the following metrics are prioritized:
- Average Precision (AUPRC)
- ROC-AUC
- F1 Score
- Precision
- Recall
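
The sketch below illustrates why these metrics matter here, using only the standard library. The function names and toy scores are hypothetical. The notebook presumably relies on scikit-learn's metric functions (for example `average_precision_score`) rather than hand-written ones. The `average_precision` helper assumes there are no tied scores.

```python
def precision_recall_f1(labels, predictions):
    """Precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(labels, scores):
    """Mean precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Accuracy of a model that labels every transaction as legitimate,
# using the dataset's class counts: high accuracy, zero fraud caught.
baseline_accuracy = (284_807 - 492) / 284_807
print(f"all-legitimate accuracy: {baseline_accuracy:.4%}")

labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(precision_recall_f1(labels, predictions))
print(average_precision(labels, scores))
```

A model that never flags anything scores above 99.8% accuracy on this dataset while its recall is zero, which is why precision, recall, and the precision-recall curve carry more weight than accuracy.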
# 🏆 Key Insights
- Fraud transactions are extremely rare (0.17%)
- Night-time transactions show a higher fraud rate
- Fraudulent transactions often involve mid-range amounts
- PCA features V14, V17, V12, V10 are highly discriminative
- Threshold tuning significantly improves fraud detection performance
# 📷 Visualizations
The notebook generates:
- Class Distribution Charts
- Amount Distribution Analysis
- Temporal Fraud Analysis
- Correlation Heatmaps
- Pairplots
- Precision-Recall Curves
- ROC Curves
- SHAP Explainability Charts
- Business Impact Dashboards
# ⚙️ Installation
## Clone Repository
```bash
git clone https://github.com/someshvermagithub/credit-card-fraud-detection-ml-pipeline.git
cd credit-card-fraud-detection-ml-pipeline
```
## Install Dependencies
```bash
pip install -r requirements.txt
```
# ▶️ Run Notebook
```bash
jupyter notebook
```
Then open `Credit_Card_Fraud_Detection_Somesh_Verma.ipynb`.
# 📁 Project Structure
```
credit-card-fraud-detection-ml-pipeline/
│
├── Credit_Card_Fraud_Detection_Somesh_Verma.ipynb
├── creditcard.csv
├── best_fraud_model.pkl
├── fraud_scaler.pkl
├── flagged_transactions.csv
├── requirements.txt
├── README.md
│
└── visualizations/
    ├── viz_01_class_distribution.png
    ├── viz_02_amount_analysis.png
    ├── viz_03_temporal_analysis.png
    ├── viz_04_feature_analysis.png
    ├── viz_05_correlation.png
    ├── viz_06_pairplot.png
    ├── viz_07_model_evaluation.png
    ├── viz_08_threshold_tuning.png
    ├── viz_09_shap_analysis.png
    ├── viz_10_shap_waterfall.png
    └── viz_11_business_impact.png
```
# 📌 Future Improvements
- Real-time fraud detection API
- Streamlit dashboard deployment
- Deep learning models (Autoencoders, LSTMs)
- Online learning for live transaction streams
- Docker containerization
- Cloud deployment (AWS/GCP/Azure)
# 👨‍💻 Author
## Somesh Verma
- Machine Learning & Data Science Enthusiast
- Interested in Explainable AI, Fraud Detection & Intelligent Systems
# ⭐ If you like this project
Give this repository a ⭐ on GitHub!