someshvermagithub/credit-card-fraud-detection-ml-pipeline

# 💳 Credit Card Fraud Detection: End-to-End ML Pipeline

![Python](https://img.shields.io/badge/Python-3.10-blue?style=for-the-badge&logo=python)
![Machine Learning](https://img.shields.io/badge/Machine-Learning-orange?style=for-the-badge&logo=scikitlearn)
![Scikit-Learn](https://img.shields.io/badge/Scikit--Learn-F7931E?style=for-the-badge&logo=scikitlearn&logoColor=white)
![XGBoost](https://img.shields.io/badge/XGBoost-ML-success?style=for-the-badge)
![LightGBM](https://img.shields.io/badge/LightGBM-GradientBoosting-green?style=for-the-badge)
![SHAP](https://img.shields.io/badge/Explainable-AI-purple?style=for-the-badge)
![Fraud Detection](https://img.shields.io/badge/Fraud-Detection-red?style=for-the-badge)
![Data Science](https://img.shields.io/badge/Data-Science-black?style=for-the-badge)
![License](https://img.shields.io/badge/License-MIT-yellow?style=for-the-badge)
![Status](https://img.shields.io/badge/Status-Completed-brightgreen?style=for-the-badge)

## 📌 Overview

This project presents a **production-quality Credit Card Fraud Detection system** built using Machine Learning and Explainable AI techniques.
The notebook covers:

- 📊 Advanced Exploratory Data Analysis (EDA)
- 🛠️ Feature Engineering
- ⚖️ Imbalanced Learning Handling (SMOTE + Class Weights)
- 🤖 Multiple ML Models Benchmarking
- 📈 Threshold Optimization
- 🔍 SHAP Explainability
- 💼 Business Impact Analysis

## 🚀 Features

### ✅ Exploratory Data Analysis

- Fraud distribution analysis
- Transaction amount analysis
- Temporal fraud pattern analysis
- Correlation heatmaps
- Pairplots
- PCA feature behavior visualization

### ✅ Feature Engineering

- Cyclic hour encoding
- Log amount transformation
- Z-score normalization
- Interaction features
- Statistical aggregation features

### ✅ Imbalanced Learning

- SMOTE oversampling
- `class_weight='balanced'`
- `scale_pos_weight` for boosting models

### ✅ Machine Learning Models

- Logistic Regression
- Random Forest
- XGBoost
- LightGBM
- XGBoost + SMOTE

### ✅ Explainable AI

- SHAP Feature Importance
- SHAP Dependence Plot
- SHAP Waterfall Visualization

### ✅ Business Analytics

- Fraud savings estimation
- False positive investigation cost
- Threshold-based savings optimization
- Risk tier classification

# 📂 Dataset

Dataset used: [Kaggle: Credit Card Fraud Detection](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud)

- 284,807 transactions
- 492 fraud cases
- Highly imbalanced dataset (0.17% fraud)

# 🧠 Tech Stack

## Languages

- Python

## Libraries

- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
- XGBoost
- LightGBM
- SHAP
- Imbalanced-learn
- Joblib

# 📊 Project Workflow

    Data Collection
          ↓
    Data Cleaning
          ↓
    EDA & Visualization
          ↓
    Feature Engineering
          ↓
    Train/Test Split
          ↓
    Scaling + SMOTE
          ↓
    Model Training
          ↓
    Evaluation
          ↓
    Threshold Optimization
          ↓
    SHAP Explainability
          ↓
    Business Impact Analysis

# 📈 Evaluation Metrics

Since fraud detection is highly imbalanced, the following metrics are prioritized:

- Average Precision (AUPRC)
- ROC-AUC
- F1 Score
- Precision
- Recall

# 🏆 Key Insights

- Fraud transactions are extremely rare (0.17%)
- Night-time transactions have higher fraud probability
- Fraudsters often use medium transaction amounts
- PCA features V14, V17, V12, V10 are highly discriminative
- Threshold tuning significantly improves fraud detection performance

# 📷 Visualizations

The notebook generates:

- Class Distribution Charts
- Amount Distribution Analysis
- Temporal Fraud Analysis
- Correlation Heatmaps
- Pairplots
- Precision-Recall Curves
- ROC Curves
- SHAP Explainability Charts
- Business Impact Dashboards

# ⚙️ Installation

## Clone Repository

    git clone https://github.com/someshvermagithub/credit-card-fraud-detection-ml-pipeline.git
    cd credit-card-fraud-detection-ml-pipeline

## Install Dependencies

    pip install -r requirements.txt

# ▶️ Run Notebook

    jupyter notebook

Open: `Credit_Card_Fraud_Detection_Somesh_Verma.ipynb`

# 📁 Project Structure

    credit-card-fraud-detection-ml-pipeline/
    │
    ├── Credit_Card_Fraud_Detection_Somesh_Verma.ipynb
    ├── creditcard.csv
    ├── best_fraud_model.pkl
    ├── fraud_scaler.pkl
    ├── flagged_transactions.csv
    ├── requirements.txt
    ├── README.md
    │
    ├── visualizations/
    │   ├── viz_01_class_distribution.png
    │   ├── viz_02_amount_analysis.png
    │   ├── viz_03_temporal_analysis.png
    │   ├── viz_04_feature_analysis.png
    │   ├── viz_05_correlation.png
    │   ├── viz_06_pairplot.png
    │   ├── viz_07_model_evaluation.png
    │   ├── viz_08_threshold_tuning.png
    │   ├── viz_09_shap_analysis.png
    │   ├── viz_10_shap_waterfall.png
    │   └── viz_11_business_impact.png

# 📌 Future Improvements

- Real-time fraud detection API
- Streamlit dashboard deployment
- Deep learning models (Autoencoders, LSTMs)
- Online learning for live transaction streams
- Docker containerization
- Cloud deployment (AWS/GCP/Azure)

# 👨‍💻 Author

## Somesh Verma

- Machine Learning & Data Science Enthusiast
- Interested in Explainable AI, Fraud Detection & Intelligent Systems

# ⭐ If you like this project

Give this repository a ⭐ on GitHub!
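
# 🧪 Illustrative Sketch: Threshold Optimization

The Threshold Optimization step listed in the workflow can be pictured with a small, dependency-free sketch. This is not the notebook's code: the function names (`precision_recall_f1`, `best_f1_threshold`), the toy labels and scores, and the choice of F1 as the objective are all assumptions made for illustration. The notebook may well optimize a different target, such as estimated savings.

```python
from typing import List, Optional, Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Precision, recall and F1 when flagging every score >= threshold as fraud."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_f1_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    candidates: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Scan candidate thresholds and return (threshold, f1) with the highest F1.

    Hypothetical helper: by default every distinct score is tried as a cut-off.
    """
    if candidates is None:
        candidates = sorted(set(scores))
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        _, _, f1 = precision_recall_f1(y_true, scores, t)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy data (made up): 1 = fraud, 0 = legitimate, scores = predicted fraud probability.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.02, 0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]

threshold, f1 = best_f1_threshold(y_true, scores)
_, _, f1_default = precision_recall_f1(y_true, scores, 0.5)
print(f"best threshold={threshold:.2f}  F1={f1:.3f}  (F1 at 0.5: {f1_default:.3f})")
```

On this toy data the default 0.5 cut-off misses the fraud case scored 0.40, so lowering the threshold raises F1. With a real model, the scores would typically come from `predict_proba` on a held-out validation split, and scikit-learn's `precision_recall_curve` can replace the manual scan.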