ml-guard/ml-guard

GitHub: ml-guard/ml-guard

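ML Guard is a security scanner that detects malicious code, leaked secrets, and vulnerable dependencies in machine learning pipelines, and generates compliance reports.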

Stars: 0 | Forks: 0

[![PyPI](https://img.shields.io/pypi/v/mlsupplychain?logo=pypi&logoColor=white)](https://pypi.org/project/mlsupplychain/) [![Python](https://img.shields.io/pypi/pyversions/mlsupplychain?logo=python&logoColor=white)](https://pypi.org/project/mlsupplychain/) [![Downloads](https://img.shields.io/pypi/dm/mlsupplychain?logo=pypi&logoColor=white)](https://pypi.org/project/mlsupplychain/) [![License](https://img.shields.io/github/license/ml-guard/ml-guard)](LICENSE) [![CI](https://img.shields.io/github/actions/workflow/status/ml-guard/ml-guard/ci.yml?branch=main&logo=github)](https://github.com/ml-guard/ml-guard/actions/workflows/ci.yml) [![GitHub Marketplace](https://img.shields.io/badge/Marketplace-ML%20Guard%20Security%20Scan-blue?logo=github)](https://github.com/marketplace/actions/ml-guard-security-scan)

ML Guard scans the artifacts your team ships (model weights, configs, dependency manifests, notebooks) and flags problems before they reach production: malicious pickle code, executables embedded in safetensors files, ONNX models with custom plugins, leaked API keys, vulnerable PyPI dependencies, and malicious packages. It runs offline. It emits SARIF for native GitHub code scanning, CycloneDX SBOMs for audits, and PDF compliance reports mapped to the **EU AI Act, NIST AI RMF, ISO 27001, and SOC 2**.

## Status

`v0.1.0`: first public release. All five scanners and the compliance report generator are production-ready; 152 tests cover every code path.

| Scanner       | Status     | Detects                                                               |
| ------------- | ---------- | --------------------------------------------------------------------- |
| `pickle`      | ✓ shipped  | RCE globals, suspicious modules, PyTorch ZIP files, proto≥4           |
| `safetensors` | ✓ shipped  | Trailing payloads, malformed offsets, embedded URIs                   |
| `onnx`        | ✓ shipped  | Custom-domain ops, suspicious `external_data`, shell scripts          |
| `secrets`     | ✓ shipped  | AWS/GitHub/OpenAI keys, JWTs, PEM keys, generic high-entropy strings  |
| `cve`         | ✓ shipped  | Cross-checks `requirements.txt` against OSV (offline database)        |

## Installation

```
pip install mlsupplychain
```

The wheel bundles a curated mini OSV database covering roughly 150 popular ML packages, so `pip install mlsupplychain && ml-guard scan` finds real vulnerabilities **out of the box**, with no configuration. For full CVE coverage of all of PyPI:

```
wget https://osv-vulnerabilities.storage.googleapis.com/PyPI/all.zip
ml-guard cve-update all.zip
```

## Quick start

```
ml-guard scan ./my-project
```

```
ML Guard — scan report
========================================
Files scanned: 5
Time: 0.04s
Summary: 6 critical, 12 high, 21 medium, 3 low

✗ CRITICAL model.pkl [offset 0x2a1]
  Dangerous global imported: os.system (known RCE primitive)
✗ CRITICAL requirements.txt [package ascii2text==1.0]
  Malicious package detected (advisory MAL-2022-7421).
✗ CRITICAL requirements.txt [package transformers==4.30.0]
  CVE-2023-6730: Deserialization of Untrusted Data vulnerability
! HIGH .env [line 1]
  GitHub Personal Access Token detected
  snippet: ghp_…6789 (len=40)
...
```

The exit code is 1 if any finding meets the `--fail-on` threshold (default: `critical`).

## CI integration

```
- uses: ml-guard/scan-action@v1
  with:
    path: ./models
    fail-on: critical
    format: sarif
    output: ml-guard.sarif
- uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: ml-guard.sarif
```

SARIF results appear in your repository under **Security → Code scanning**.

## Compliance reports

ML Guard generates machine-readable evidence for four standards:

| Standard      | ID            | Coverage                                                                                      |
| ------------- | ------------- | --------------------------------------------------------------------------------------------- |
| EU AI Act     | `eu-ai-act`   | Articles 9, 10, 11, 12, 13, 15: risk management, record-keeping, technical documentation, cybersecurity |
| NIST AI RMF   | `nist-ai-rmf` | MEASURE 2.7, 2.10; MANAGE 4.1                                                                 |
| ISO/IEC 27001 | `iso-27001`   | Annex A: 5.23, 5.34, 8.4, 8.7, 8.8, 8.25, 8.28                                                |
| SOC 2         | `soc2`        | Common Criteria: CC6.1, 6.6, 6.7, 6.8, 7.1, 7.2                                               |

Generate a PDF report for auditors:

```
ml-guard compliance ./models --standard iso-27001 --output report.pdf
```

The PDF contains a verdict, per-control evidence (with file and line references), a full findings appendix, and a SHA-256 integrity checksum.

**Important:** these reports are *machine-readable technical evidence*, not compliance attestations. Determining regulatory compliance requires an assessment by qualified people (notified bodies, data protection officers, CPA firms).
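Because the CI example writes a standard SARIF file, the same file can also be summarized locally or fed into a custom dashboard. The sketch below is not part of ML Guard: it assumes only the generic SARIF 2.1.0 layout (`runs[].results[]`, each with `level` and `ruleId`). Which `ruleId` values and levels ML Guard actually emits is not documented here, and the script name `summarize_sarif.py` is hypothetical.

```python
"""Summarize a SARIF 2.1.0 file by result level and rule ID (illustrative sketch)."""
import json
import sys
from collections import Counter
from pathlib import Path


def summarize_sarif(sarif: dict) -> Counter:
    """Count results per (level, ruleId) across every run in a SARIF log.

    Simplification: a result with no explicit `level` is counted as
    "warning". Full SARIF 2.1.0 semantics also consult the rule's
    defaultConfiguration, which this sketch ignores.
    """
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        # `results` may be missing or null in a SARIF run; treat both as empty.
        for result in run.get("results") or []:
            level = result.get("level", "warning")
            rule_id = result.get("ruleId", "<no ruleId>")
            counts[(level, rule_id)] += 1
    return counts


def summarize_file(path: str) -> Counter:
    """Load a SARIF file from disk and summarize it."""
    return summarize_sarif(json.loads(Path(path).read_text(encoding="utf-8")))


# Only print when invoked with a file argument, e.g. `python summarize_sarif.py ml-guard.sarif`.
if __name__ == "__main__" and len(sys.argv) > 1:
    for (level, rule_id), n in sorted(summarize_file(sys.argv[1]).items()):
        print(f"{level:8} {rule_id}: {n}")
```

Counting by `(level, ruleId)` rather than by file keeps the output stable across runs, which makes it easy to diff two scans of the same models directory.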
## SBOM

```
ml-guard sbom ./models -o ml-bom.json
```

Produces a CycloneDX 1.5 JSON file that lists every artifact (with its SHA-256 hash) and every dependency-manifest entry, and encodes findings as `vulnerabilities` with correct `bom-ref` links. It imports directly into Dependency-Track, DefectDojo, sbom-utility, and similar tools.

## Configuration

Drop a `.ml-guard.yml` file in the project root:

```
fail_on: high            # CI-only override (default: critical)
include:
  - 'models/*.pkl'
  - 'configs/*.yaml'
exclude:
  - 'tests/fixtures/**'
scanners:
  - pickle
  - secrets
rules:
  pickle-unusual-module:
    severity: low        # downgrade
  secret-stripe-test:
    disabled: true       # silence entirely
```

CLI flags always override the config file; the config file supplies defaults.

## Output formats

| Format  | Flag             | Use case                                        |
| ------- | ---------------- | ----------------------------------------------- |
| `text`  | `--format text`  | Human reading (default, colored)                |
| `json`  | `--format json`  | Scripts, custom dashboards                      |
| `sarif` | `--format sarif` | GitHub code scanning, GitLab SAST, IDE plugins  |

## Why pickle is the top priority

`pickle.load()`, and `torch.load()` unless it is restricted with `weights_only=True`, execute arbitrary Python code by design. A 200-byte `.pkl` file can plant a reverse shell the moment a data scientist loads it. ML Guard statically analyzes the pickle bytecode (it **never executes it**) and flags every callable the pickle would resolve, before any deserialization takes place. See `docs/pickle-threat-model.md` for the full attack surface.

## Architecture

```
ml_guard/
├── findings.py          # Finding/Severity dataclasses
├── runner.py            # walks paths, dispatches scanners
├── cli.py               # click entrypoint
├── config.py            # .ml-guard.yml loader
├── compliance.py        # EU AI Act / NIST AI RMF / ISO 27001 / SOC 2
├── sbom.py              # CycloneDX 1.5 generator
├── cve_db.py            # SQLite OSV index
├── _pdf.py              # in-tree PDF 1.4 writer (no reportlab dep)
├── _protobuf.py         # in-tree protobuf reader (no onnx dep)
├── data/
│   └── osv-mini.sqlite  # bundled mini OSV DB (~530 KB compressed)
├── scanners/
│   ├── pickle_scanner.py
│   ├── safetensors_scanner.py
│   ├── onnx_scanner.py
│   ├── secret_scanner.py
│   └── cve_scanner.py
└── output/
    ├── text.py
    ├── json_fmt.py
    └── sarif.py
rust_engine/             # optional native acceleration via PyO3
```

The Rust engine is an **optional install** via `pip install mlsupplychain[native]`. Without it, every scanner runs in pure Python with the same correctness guarantees; it is only slower on multi-gigabyte artifacts.

## Documentation

- [`docs/rules.md`](docs/rules.md): full catalog of rules, severity levels, and override examples.
- [`docs/pickle-threat-model.md`](docs/pickle-threat-model.md): what we cover and what we don't, with the attack patterns explained.
- [`docs/cve-database.md`](docs/cve-database.md): OSV update workflow.
- [`docs/performance.md`](docs/performance.md): real benchmark numbers.
- [`docs/releasing.md`](docs/releasing.md): for maintainers.

## Contributing

See [`CONTRIBUTING.md`](CONTRIBUTING.md). Security policy: [`SECURITY.md`](SECURITY.md).

## License

Apache 2.0. See [`LICENSE`](LICENSE).

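Tags: EU AI Act compliance, GitHub integration, ISO 27001 compliance, ML pipeline security, NIST AI RMF compliance, PDF compliance reports, SARIF reports, SBOM generation, SOC 2 compliance, dependency vulnerability scanning, credential leak detection, visual interface, compliance scanning, graph exploration, security scanning, secret leak prevention, timing injection, machine learning security, offline scanning tool, software supply chain security, remote method invocation, reverse-engineering tools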