DataScienceAIpath/mlops-template
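An end-to-end MLOps template that demonstrates the full machine learning engineering loop: synthetic data, model training, MLflow tracking, a FastAPI inference service, Prometheus monitoring, and drift detection.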
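# End-to-End MLOps Template

## Problem
Most ML projects never get past the notebook. This template shows the full path from raw data to a monitorable, deployable inference service, with experiment tracking, model versioning, scheduled retraining, and drift alerts built in from day one.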
## Architecture
```mermaid
graph TD
subgraph Data Layer
GEN[Data Generator\nSynthetic Churn CSV] --> VAL[Pydantic Validator]
VAL --> FE[Feature Engineering\n+4 derived features]
end
subgraph Training Pipeline
FE --> SPLIT[Train/Test Split]
SPLIT --> LR[Logistic Regression]
SPLIT --> RF[Random Forest ✓]
SPLIT --> GB[Gradient Boosting]
LR & RF & GB --> MLFLOW[(MLflow\nExperiment Tracker)]
MLFLOW --> REG[(Model Registry\n.joblib on disk)]
end
subgraph Inference Server
REG --> LOADER[Model Loader]
LOADER --> API[FastAPI]
API --> PRED[POST /predict]
API --> BATCH[POST /predict/batch]
API --> DRIFT_EP[GET /drift]
API --> PROM[GET /metrics]
end
subgraph Observability
PROM --> PROMETHEUS[(Prometheus)]
PROMETHEUS --> GRAFANA[Grafana Dashboard]
BATCH --> DRIFT[KS-Test Drift Detector]
DRIFT --> ALERT[⚠️ Drift Alert]
end
subgraph CI/CD
GH[GitHub Actions]
GH -->|on push| CICD[lint + test]
GH -->|weekly cron| RETRAIN[Scheduled Retraining]
RETRAIN --> REG
end
```
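
The training stage in the diagram fits three candidate classifiers and keeps one of them. Below is a minimal sketch of that comparison. It assumes the winner is chosen by test-set AUC (the README does not state the selection criterion), uses `make_classification` as a stand-in for the churn CSV, and the function name `compare_models` is hypothetical rather than taken from the repository.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def compare_models(X, y, seed=42):
    """Fit three candidate classifiers and return (best_name, auc_by_name)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        # AUC is computed from the predicted probability of the positive class.
        scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    # Assumed selection rule: keep the model with the highest test AUC.
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic stand-in data; the repository generates its own churn dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
best_name, auc_scores = compare_models(X, y)
print(best_name, auc_scores)
```

In the real pipeline each run would also be logged to MLflow and the selected model written to disk as a `.joblib` file, as the diagram shows.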
## Demo
```
$ make generate && make train
Generated 5000 rows → ./data/raw/churn_data.csv | Churn rate: 26.4%
Training random_forest…
AUC=0.8821 F1=0.6134 Acc=0.8140
Model saved to ./models/churn_classifier.joblib
$ make serve
# → http://localhost:8000/docs
# → http://localhost:8000/metrics (Prometheus)
$ make mlflow-ui
# → http://localhost:5000 (MLflow experiment tracker)
```
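
The `AUC`, `F1`, and `Acc` figures in the training log are standard binary-classification metrics. As a rough sketch of how such a line could be produced with scikit-learn, assuming a 0.5 decision threshold (the repository's `evaluate.py` may differ; the `evaluate` helper here is hypothetical):

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score


def evaluate(y_true, y_prob, threshold=0.5):
    """Return AUC, F1 and accuracy for binary labels and positive-class scores."""
    # Hard predictions are only needed for F1 and accuracy.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    return {
        "auc": roc_auc_score(y_true, y_prob),  # threshold-free ranking quality
        "f1": f1_score(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
    }


# Four toy examples: two negatives followed by two positives.
metrics = evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print("AUC={auc:.4f} F1={f1:.4f} Acc={accuracy:.4f}".format(**metrics))
```

On this toy input the script prints `AUC=0.7500 F1=0.6667 Acc=0.7500`: three of the four positive/negative pairs are ranked correctly, and at the 0.5 threshold one of the two positives is missed.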
## Quick Start
```
git clone https://github.com/DataScienceAIpath/mlops-template
cd mlops-template
pip install -e ".[dev]"
cp .env.example .env
make generate # create synthetic dataset
make train # train + log to MLflow
make train-all # compare all 3 model types
make mlflow-ui # open MLflow at http://localhost:5000
make serve # inference API at http://localhost:8000/docs
make test # full test suite
```
### Full Stack with Docker
```
docker-compose up --build
# API: http://localhost:8000
# MLflow: http://localhost:5000
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/admin)
```
## Tech Stack
- **scikit-learn**: Logistic Regression, Random Forest, Gradient Boosting
- **MLflow**: experiment tracking, model logging, artifact storage
- **FastAPI + Uvicorn**: async inference server
- **Prometheus Client**: prediction counters, latency histograms, drift metrics
- **Grafana**: dashboard configuration (JSON files under `monitoring/grafana/`)
- **SciPy (KS test)**: statistical drift detection
- **Pydantic**: input validation at the API boundary
- **Docker + docker-compose**: multi-service stack (API, MLflow, Prometheus, Grafana)
- **GitHub Actions**: CI lint + test, plus a weekly scheduled retraining workflow
- **pytest**: 20 tests covering data, features, training, drift, and serving
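
Drift detection with the KS test compares each feature's live distribution against the distribution seen at training time. The sketch below illustrates the idea with `scipy.stats.ks_2samp`. The feature names, the `detect_drift` helper, and the `alpha=0.05` cutoff are assumptions for illustration, not the repository's actual detector.

```python
import numpy as np
from scipy.stats import ks_2samp


def detect_drift(baseline, current, alpha=0.05):
    """Run a two-sample KS test per feature; flag features with p-value < alpha."""
    report = {}
    for feature, reference in baseline.items():
        result = ks_2samp(reference, current[feature])
        report[feature] = {
            "statistic": float(result.statistic),
            "p_value": float(result.pvalue),
            "drifted": bool(result.pvalue < alpha),
        }
    return report


rng = np.random.default_rng(0)
# Hypothetical training-time baseline for two numeric features.
baseline = {
    "monthly_charges": rng.normal(70, 20, 1000),
    "tenure_months": rng.normal(24, 6, 1000),
}
current = {
    # Mean shifted by one standard deviation, so this feature should be flagged.
    "monthly_charges": rng.normal(90, 20, 1000),
    # Identical to the baseline on purpose, so this feature should not be flagged.
    "tenure_months": baseline["tenure_months"].copy(),
}
report = detect_drift(baseline, current)
print({name: r["drifted"] for name, r in report.items()})
```

Because the test runs once per feature, a production setup would probably want a multiple-comparison correction or a stricter `alpha` to keep false alarms down.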
## Key Files
| Path | Purpose |
|------|------|
| `src/data/generate.py` | Synthetic churn data with realistic feature correlations |
| `src/data/validate.py` | Pydantic `CustomerFeatures` model for input validation |
| `src/features/engineering.py` | 4 derived features: charge ratio, long-tenure flag, engagement, high value |
| `src/training/train.py` | Full pipeline: split → train → evaluate → log to MLflow → save |
| `src/training/evaluate.py` | Accuracy, AUC-ROC, F1, precision, recall |
| `src/serving/app.py` | FastAPI: `/predict`, `/predict/batch`, `/metrics`, `/drift` |
| `src/monitoring/metrics.py` | Prometheus counters, histograms, and gauges |
| `src/drift/detector.py` | KS-test drift detection against the training baseline |
| `monitoring/prometheus.yml` | Prometheus scrape configuration |
| `monitoring/grafana/dashboard.json` | Grafana dashboard panels |
| `.github/workflows/train.yml` | Scheduled retraining (weekly) + manual trigger |
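
The table names four derived features (charge ratio, long-tenure flag, engagement, high value) without defining them. The following is one plausible sketch of such a step in pandas. Every column name, formula, and threshold here is an assumption made for illustration; the actual definitions live in `src/features/engineering.py`.

```python
import pandas as pd


def add_derived_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with four illustrative derived columns appended."""
    out = df.copy()
    # Charge ratio: monthly charge relative to lifetime spend (+1 avoids division by zero).
    out["charge_ratio"] = out["monthly_charges"] / (out["total_charges"] + 1)
    # Long-tenure flag: 1 for customers with at least 24 months of tenure (assumed cutoff).
    out["is_long_tenure"] = (out["tenure_months"] >= 24).astype(int)
    # Engagement: logins per support ticket, with +1 smoothing in the denominator.
    out["engagement_score"] = out["logins_30d"] / (out["support_tickets_30d"] + 1)
    # High-value flag: 1 when the monthly charge reaches an assumed threshold of 80.
    out["is_high_value"] = (out["monthly_charges"] >= 80).astype(int)
    return out


# Two hypothetical customers: a new high-spend one and a long-standing low-spend one.
raw = pd.DataFrame(
    {
        "tenure_months": [2, 36],
        "monthly_charges": [90.0, 40.0],
        "total_charges": [179.0, 1439.0],
        "logins_30d": [3, 20],
        "support_tickets_30d": [2, 0],
    }
)
features = add_derived_features(raw)
print(features[["charge_ratio", "is_long_tenure", "engagement_score", "is_high_value"]])
```

Keeping the transformation a pure function of a DataFrame lets the same code run in training and in the inference path, which avoids training/serving skew.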
## Scaling to Production
| Component | Development (this repo) | Production |
|-----------|----------------|------------|
| Model registry | `./models/*.joblib` | MLflow Model Registry + S3 |
| Experiment tracking | Local `./mlruns` | MLflow Tracking Server (RDS backend) |
| Training scheduler | GitHub Actions cron | Apache Airflow / Kubeflow Pipelines |
| Monitoring | Prometheus + Grafana | Grafana Cloud / Datadog |
| Feature store | In-memory pipeline | Feast / Vertex AI Feature Store |
| Drift detection | KS test | Evidently AI / WhyLogs |
## License
MIT © Maniswaroop M