DataScienceAIpath/mlops-template


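An end-to-end MLOps template that demonstrates the complete machine learning engineering loop: synthetic data, model training, MLflow tracking, a FastAPI inference service, Prometheus monitoring, and drift detection.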


# End-to-End MLOps Template

![Python](https://img.shields.io/badge/Python-3.11-blue?logo=python) ![License](https://img.shields.io/badge/license-MIT-green) ![MLflow](https://img.shields.io/badge/MLflow-2.17-blue) ![FastAPI](https://img.shields.io/badge/FastAPI-0.115-teal)

## Problem

Most ML projects stop at the notebook. This template shows the full path from raw data to a monitorable, deployable inference service, with experiment tracking, model versioning, scheduled retraining, and drift alerting built in from day one.

## Architecture

```mermaid
graph TD
    subgraph Data Layer
        GEN[Data Generator\nSynthetic Churn CSV] --> VAL[Pydantic Validator]
        VAL --> FE[Feature Engineering\n+4 derived features]
    end

    subgraph Training Pipeline
        FE --> SPLIT[Train/Test Split]
        SPLIT --> LR[Logistic Regression]
        SPLIT --> RF[Random Forest ✓]
        SPLIT --> GB[Gradient Boosting]
        LR & RF & GB --> MLFLOW[(MLflow\nExperiment Tracker)]
        MLFLOW --> REG[(Model Registry\n.joblib on disk)]
    end

    subgraph Inference Server
        REG --> LOADER[Model Loader]
        LOADER --> API[FastAPI]
        API --> PRED[POST /predict]
        API --> BATCH[POST /predict/batch]
        API --> DRIFT_EP[GET /drift]
        API --> PROM[GET /metrics]
    end

    subgraph Observability
        PROM --> PROMETHEUS[(Prometheus)]
        PROMETHEUS --> GRAFANA[Grafana Dashboard]
        BATCH --> DRIFT[KS-Test Drift Detector]
        DRIFT --> ALERT[⚠️ Drift Alert]
    end

    subgraph CI/CD
        GH[GitHub Actions]
        GH -->|on push| CICD[lint + test]
        GH -->|weekly cron| RETRAIN[Scheduled Retraining]
        RETRAIN --> REG
    end
```

## Demo

```
$ make generate && make train
Generated 5000 rows → ./data/raw/churn_data.csv | Churn rate: 26.4%
Training random_forest…
AUC=0.8821 F1=0.6134 Acc=0.8140
Model saved to ./models/churn_classifier.joblib

$ make serve
# → http://localhost:8000/docs
# → http://localhost:8000/metrics (Prometheus)

$ make mlflow-ui
# → http://localhost:5000 (MLflow experiment tracker)
```

## Quick Start

```
git clone https://github.com/DataScienceAIpath/mlops-template
cd mlops-template
pip install -e ".[dev]"
cp .env.example .env

make generate    # create synthetic dataset
make train       # train + log to MLflow
make train-all   # compare all 3 model types
make mlflow-ui   # open MLflow at http://localhost:5000
make serve       # inference API at http://localhost:8000/docs
make test        # full test suite
```

### Full Stack with Docker

```
docker-compose up --build
# API: http://localhost:8000
# MLflow: http://localhost:5000
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
```

## Tech Stack

- **scikit-learn**: Logistic Regression, Random Forest, Gradient Boosting
- **MLflow**: experiment tracking, model logging, artifact storage
- **FastAPI + Uvicorn**: async inference server
- **Prometheus Client**: prediction counters, latency histograms, drift metrics
- **Grafana**: dashboard configuration (JSON files in `monitoring/grafana/`)
- **SciPy (KS test)**: statistical drift detection
- **Pydantic**: input validation at the API boundary
- **Docker + docker-compose**: multi-service stack (API, MLflow, Prometheus, Grafana)
- **GitHub Actions**: CI lint + test, plus a scheduled weekly retraining workflow
- **pytest**: 20 tests covering data, features, training, drift, and serving

## Key Files

| Path | Purpose |
|------|---------|
| `src/data/generate.py` | Synthetic churn data with realistic feature correlations |
| `src/data/validate.py` | Pydantic `CustomerFeatures` for input validation |
| `src/features/engineering.py` | 4 derived features: charge ratio, long-tenure flag, engagement, high value |
| `src/training/train.py` | Full pipeline: split → train → evaluate → log to MLflow → save |
| `src/training/evaluate.py` | Accuracy, AUC-ROC, F1, precision, recall |
| `src/serving/app.py` | FastAPI: `/predict`, `/predict/batch`, `/metrics`, `/drift` |
| `src/monitoring/metrics.py` | Prometheus counters, histograms, and other metrics |
| `src/drift/detector.py` | KS-test drift detection against the training baseline |
| `monitoring/prometheus.yml` | Prometheus scrape configuration |
| `monitoring/grafana/dashboard.json` | Grafana dashboard panels |
| `.github/workflows/train.yml` | Scheduled retraining (weekly) + manual trigger |

## Scaling to Production

| Component | Development (this repo) | Production |
|-----------|-------------------------|------------|
| Model registry | `./models/*.joblib` | MLflow Model Registry + S3 |
| Experiment tracking | Local `./mlruns` | MLflow Tracking Server (RDS backend) |
| Training scheduler | GitHub Actions cron | Apache Airflow / Kubeflow Pipelines |
| Monitoring | Prometheus + Grafana | Grafana Cloud / Datadog |
| Feature store | In-memory pipeline | Feast / Vertex AI Feature Store |
| Drift detection | KS test | Evidently AI / WhyLogs |

## License

MIT © Maniswaroop M
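
## Appendix: KS-Test Drift Check Sketch

The README names a KS test against the training baseline as the drift detection method but does not show how the comparison works. The sketch below illustrates the idea for a single numeric feature. It is not the code in `src/drift/detector.py`: the function names, the `alpha` threshold, and the sample data are all hypothetical, and it computes the two-sample KS statistic by hand with the standard library so that it runs without dependencies. The repository lists SciPy for this step, where `scipy.stats.ks_2samp` returns the same statistic together with a p-value.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is always reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def has_drifted(baseline, current, alpha=0.05):
    """Flag drift when the statistic exceeds the large-sample critical value."""
    n, m = len(baseline), len(current)
    # Standard asymptotic threshold: c(alpha) * sqrt((n + m) / (n * m)).
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(baseline, current) > critical


# Hypothetical data: one feature at training time and in two later batches.
rng = random.Random(0)
baseline = [rng.gauss(50.0, 10.0) for _ in range(2000)]
same = [rng.gauss(50.0, 10.0) for _ in range(500)]      # same distribution
shifted = [rng.gauss(65.0, 10.0) for _ in range(500)]   # mean moved by 1.5 sigma

print("same distribution flagged:", has_drifted(baseline, same))
print("shifted distribution flagged:", has_drifted(baseline, shifted))
```

A batch from the same distribution is still flagged at a rate of roughly `alpha`, so a production setup would typically require repeated or multi-feature evidence before raising an alert.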

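Tags: Apex, AV bypass, Docker, FastAPI, MLOps, security rule engine, security defense assessment, data drift detection, machine learning, model deployment, custom request headers, request interception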