do7a-mo/Digital-Payment-Fraud-Detection

GitHub: do7a-mo/Digital-Payment-Fraud-Detection

A digital payment fraud detection model built with XGBoost and feature engineering that identifies fraudulent transactions in imbalanced datasets.

Stars: 0 | Forks: 0

# Digital Payment Fraud Detection Using Machine Learning

## Overview

This project detects fraud in digital payment transactions using machine learning and advanced feature engineering.

It applies data preprocessing, feature engineering, dimensionality reduction, and XGBoost classification to identify suspicious financial transactions with high accuracy.

## Project Goals

- Detect fraudulent transactions
- Handle an imbalanced dataset
- Build meaningful risk-related features
- Compare a baseline model with a PCA-based model
- Evaluate fraud detection performance with multiple metrics

## Dataset

Dataset used:

- Digital Payment Fraud Detection benchmark dataset

Target column:

- `is_fraud`
  - 0 → legitimate transaction
  - 1 → fraudulent transaction

The dataset contains:

- Transaction information
- Device information
- Risk scores
- Spending behavior
- Geolocation features
- Merchant risk indicators

## Technologies Used

- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- XGBoost

## Project Workflow

### 1. Data Loading

The dataset is loaded and inspected for:

- Shape
- Missing values
- Fraud distribution
- Feature types

### 2. Data Cleaning

Several unnecessary columns were removed, including:

- Transaction ID
- Customer ID
- Merchant ID
- Transaction timestamp

Categorical features were converted to the `category` dtype.

### 3. Feature Engineering

New, meaningful features were created to improve fraud detection performance.

The engineered features are:

#### Amount-to-Average Ratio

```
transaction_amount / avg_monthly_spend
```

#### Failed Transaction Risk

```
failed_txn_count_24h * ip_risk_score
```

#### Distance-Amount Interaction

```
geo_distance_from_last_txn * transaction_amount
```

#### Total Risk Score

```
merchant_risk_score + ip_risk_score + post_auth_risk_score
```

### 4. Exploratory Data Analysis (EDA)

The project includes:

- Fraud distribution analysis
- Correlation analysis
- Feature importance visualization
- Performance comparison charts

### 5. Data Splitting

Because the dataset is highly imbalanced, it was split with a stratified train-test split, which preserves the fraud distribution in both subsets.

```
train_test_split(
    stratify=y
)
```

### 6. Feature Scaling

Numerical features were standardized using:

- `StandardScaler`

### 7. Machine Learning Model

#### XGBoost Classifier

The main model used in the project is:

- `XGBClassifier`

The model was tuned through the following hyperparameters:

- `learning_rate`
- `max_depth`
- `min_child_weight`
- `subsample`
- `colsample_bytree`
- `scale_pos_weight`

Special handling was added for the imbalanced fraud classes.

### 8. Dimensionality Reduction

#### PCA (Principal Component Analysis)

PCA was applied to reduce dimensionality while preserving variance.

Two approaches were tested:

- PCA with 95% variance preservation
- PCA with a fixed 10 components

### 9. Model Evaluation

The models were evaluated using:

- Accuracy
- Precision
- Recall
- F1-Score
- ROC-AUC Score
- Confusion Matrix

## Results

### Baseline Model (Without PCA)

- Accuracy: 99.99%
- Fraud Recall: 99.90%
- Fraud F1-Score: 99.64%
- AUC Score: 1.00

### PCA Model

- Accuracy: 97.46%
- Fraud Recall: 64.99%
- Fraud F1-Score: 45.41%
- AUC Score: 0.94

## Key Findings

- The baseline XGBoost model achieved extremely high fraud detection performance.
- PCA reduced model performance significantly: fraud recall fell from 99.90% to 64.99% and fraud F1-score from 99.64% to 45.41%.
- Feature engineering played a major role in improving results.
- Handling class imbalance was critical for fraud recall.

## Visualizations Included

- Correlation Heatmaps
- Feature Importance Charts
- Model Performance Comparison
- Confusion Matrix

## Future Improvements

- Hyperparameter tuning using GridSearchCV
- Deep learning approaches
- Real-time fraud detection system
- Model deployment using Streamlit or Flask
- SHAP explainability integration
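As a rough illustration of the Feature Engineering and Data Splitting steps described in this README, the sketch below builds the four engineered features with pandas and then performs a stratified split. It is not the repository's actual code. The output column names (`amount_to_avg_ratio`, `failed_txn_risk`, `distance_amount_interaction`, `total_risk_score`), the helper `add_risk_features`, the toy data, the zero-spend handling, and the `test_size` and `random_state` values are all assumptions made for illustration. The `scale_pos_weight` calculation uses the negatives-to-positives ratio, a commonly suggested starting point for XGBoost; the README does not state which value the project used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def add_risk_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the four engineered features (hypothetical names)."""
    out = df.copy()
    # Assumption: a zero average spend is treated as missing to avoid dividing by zero.
    avg_spend = out["avg_monthly_spend"].replace(0, np.nan)
    out["amount_to_avg_ratio"] = out["transaction_amount"] / avg_spend
    out["failed_txn_risk"] = out["failed_txn_count_24h"] * out["ip_risk_score"]
    out["distance_amount_interaction"] = (
        out["geo_distance_from_last_txn"] * out["transaction_amount"]
    )
    out["total_risk_score"] = (
        out["merchant_risk_score"]
        + out["ip_risk_score"]
        + out["post_auth_risk_score"]
    )
    return out


# Toy data with made-up values: 10 transactions, 2 of them fraudulent.
rng = np.random.default_rng(0)
n = 10
toy = pd.DataFrame({
    "transaction_amount": rng.uniform(5, 500, n),
    "avg_monthly_spend": rng.uniform(100, 2000, n),
    "failed_txn_count_24h": rng.integers(0, 5, n),
    "ip_risk_score": rng.uniform(0, 1, n),
    "geo_distance_from_last_txn": rng.uniform(0, 300, n),
    "merchant_risk_score": rng.uniform(0, 1, n),
    "post_auth_risk_score": rng.uniform(0, 1, n),
    "is_fraud": [0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
})

features = add_risk_features(toy)
X = features.drop(columns="is_fraud")
y = features["is_fraud"]

# stratify=y keeps the fraud ratio the same in the train and test subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42
)

# Heuristic starting value for XGBoost's scale_pos_weight:
# number of negative samples divided by number of positive samples.
scale_pos_weight = float((y_train == 0).sum() / (y_train == 1).sum())
print(scale_pos_weight)
```

The resulting `scale_pos_weight` could then be passed to `XGBClassifier(scale_pos_weight=...)`. Computing it from the training split only keeps information from the test set out of model configuration.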
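Tags: Apex, PCA dimensionality reduction, Python, XGBoost, imbalanced classification, code examples, credit scoring, anti-fraud, open-source security, anomaly detection, digital payments, data analysis, data science, data preprocessing, no backdoor, machine learning, model evaluation, fraud detection, feature engineering, resource verification, reverse-engineering tools, fintech, financial risk control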