Yelp/detect-secrets

GitHub: Yelp/detect-secrets

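Yelp's open-source, enterprise-grade tool for scanning code for secrets. It uses a baseline mechanism to stay backwards compatible and helps keep sensitive information such as API keys out of the codebase.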

Stars: 4431 | Forks: 535

[![Build Status](https://github.com/Yelp/detect-secrets/actions/workflows/ci.yml/badge.svg)](https://github.com/Yelp/detect-secrets/actions/workflows/ci.yml?query=branch%3Amaster++)
[![PyPI version](https://badge.fury.io/py/detect-secrets.svg)](https://badge.fury.io/py/detect-secrets)
[![Homebrew](https://img.shields.io/badge/dynamic/json.svg?url=https://formulae.brew.sh/api/formula/detect-secrets.json&query=$.versions.stable&label=homebrew)](https://formulae.brew.sh/formula/detect-secrets)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-ff69b4.svg)](https://github.com/Yelp/detect-secrets/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22+)
[![AMF](https://img.shields.io/badge/Donate-Charity-orange.svg)](https://www.againstmalaria.com/donation.aspx)

# detect-secrets

## About

`detect-secrets` is an aptly named module for (surprise, surprise) **detecting secrets** within a code base.

However, unlike other similar packages that solely focus on finding secrets, this package is designed with the enterprise client in mind: providing a **backwards compatible**, systematic means of:

1. Preventing new secrets from entering the code base,
2. Detecting if such preventions are explicitly bypassed, and
3. Providing a checklist of secrets to roll, and migrate off to a more secure storage.

This way, you create a [separation of concern](https://en.wikipedia.org/wiki/Separation_of_concerns): accepting that there may *currently* be secrets hiding in your large repository (this is what we refer to as a _baseline_), but preventing this issue from getting any larger, without dealing with the potentially gargantuan effort of moving existing secrets away.

It does this by running periodic diff outputs against heuristically crafted regex statements, to identify whether any *new* secret has been committed. This way, it avoids the overhead of digging through all git history, as well as the need to scan the entire repository every time.

For a look at recent changes, please see [CHANGELOG.md](CHANGELOG.md).

If you are looking to contribute, please see [CONTRIBUTING.md](CONTRIBUTING.md).

For more detailed documentation, check out our other [documentation](docs/).

## Examples

### Quickstart:

Create a baseline of potential secrets currently found in your git repository.

```
$ detect-secrets scan > .secrets.baseline
```

or, to run it from a different directory:

```
$ detect-secrets -C /path/to/directory scan > /path/to/directory/.secrets.baseline
```

**Scanning non-git tracked files:**

```
$ detect-secrets scan test_data/ --all-files > .secrets.baseline
```

### Adding New Secrets to Baseline:

This will rescan your codebase, and:

1. Update/upgrade your baseline to be compatible with the latest version,
2. Add any new secrets it finds to your baseline,
3. Remove any secrets no longer in your codebase

This will also preserve any labelled secrets you have.

```
$ detect-secrets scan --baseline .secrets.baseline
```

For baselines older than version 0.9, just recreate it.

### Alerting off newly added secrets:

**Scanning Staged Files Only:**

```
$ git diff --staged --name-only -z | xargs -0 detect-secrets-hook --baseline .secrets.baseline
```

**Scanning All Tracked Files:**

```
$ git ls-files -z | xargs -0 detect-secrets-hook --baseline .secrets.baseline
```

### Viewing All Enabled Plugins:

```
$ detect-secrets scan --list-all-plugins
ArtifactoryDetector
AWSKeyDetector
AzureStorageKeyDetector
BasicAuthDetector
CloudantDetector
DiscordBotTokenDetector
GitHubTokenDetector
GitLabTokenDetector
Base64HighEntropyString
HexHighEntropyString
IbmCloudIamDetector
IbmCosHmacDetector
IPPublicDetector
JwtTokenDetector
KeywordDetector
MailchimpDetector
NpmDetector
OpenAIDetector
PrivateKeyDetector
PypiTokenDetector
SendGridDetector
SlackDetector
SoftlayerDetector
SquareOAuthDetector
StripeDetector
TelegramBotTokenDetector
TwilioKeyDetector
```

### Disabling Plugins:

```
$ detect-secrets scan --disable-plugin KeywordDetector --disable-plugin AWSKeyDetector
```

If you want to **only** run a specific plugin, you can do:

```
$ detect-secrets scan --list-all-plugins | \
    grep -v 'BasicAuthDetector' | \
    sed "s#^#--disable-plugin #g" | \
    xargs detect-secrets scan test_data
```

### Auditing a Baseline:

This is an optional step to label the results in your baseline. It can be used to narrow down your checklist of secrets to migrate, or to better configure your plugins to improve their signal-to-noise ratio.

```
$ detect-secrets audit .secrets.baseline
```

### Usage in Other Python Scripts

**Basic Use:**

```
import json

from detect_secrets import SecretsCollection
from detect_secrets.settings import default_settings

secrets = SecretsCollection()
with default_settings():
    secrets.scan_file('test_data/config.ini')

print(json.dumps(secrets.json(), indent=2))
```

**More Advanced Configuration:**

```
from detect_secrets import SecretsCollection
from detect_secrets.settings import transient_settings

secrets = SecretsCollection()
with transient_settings({
    # Only run scans with only these plugins.
    # This format is the same as the one that is saved in the generated baseline.
    'plugins_used': [
        # Example of configuring a built-in plugin
        {
            'name': 'Base64HighEntropyString',
            'limit': 5.0,
        },

        # Example of using a custom plugin
        {
            'name': 'HippoDetector',
            'path': 'file:///Users/aaronloo/Documents/github/detect-secrets/testing/plugins.py',
        },
    ],

    # We can also specify whichever additional filters we want.
    # This is an example of using the function `is_identified_by_ML_model` within the
    # local file `./private-filters/example.py`.
    'filters_used': [
        {
            'path': 'file://private-filters/example.py::is_identified_by_ML_model',
        },
    ]
}) as settings:
    # If we want to make any further adjustments to the created settings object (e.g.
    # disabling default filters), we can do so as such.
    settings.disable_filters(
        'detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign',
        'detect_secrets.filters.heuristic.is_likely_id_string',
    )

    secrets.scan_file('test_data/config.ini')
```

## Installation

```
$ pip install detect-secrets
✨🍰✨
```

Install via [brew](https://brew.sh/):

```
$ brew install detect-secrets
```

## Usage

`detect-secrets` comes with three different tools, and there is often confusion around which one to use. Use this handy checklist to help you decide:

1. Do you want to add secrets to your baseline? If so, use **`detect-secrets scan`**.
2. Do you want to alert off new secrets not in the baseline? If so, use **`detect-secrets-hook`**.
3. Are you analyzing the baseline itself? If so, use **`detect-secrets audit`**.

### Adding Secrets to Baseline

```
$ detect-secrets scan --help
usage: detect-secrets scan [-h] [--string [STRING]] [--only-allowlisted]
                           [--all-files] [--baseline FILENAME]
                           [--force-use-all-plugins] [--slim]
                           [--list-all-plugins] [-p PLUGIN]
                           [--base64-limit [BASE64_LIMIT]]
                           [--hex-limit [HEX_LIMIT]]
                           [--disable-plugin DISABLE_PLUGIN]
                           [-n | --only-verified]
                           [--exclude-lines EXCLUDE_LINES]
                           [--exclude-files EXCLUDE_FILES]
                           [--exclude-secrets EXCLUDE_SECRETS]
                           [--word-list WORD_LIST_FILE] [-f FILTER]
                           [--disable-filter DISABLE_FILTER]
                           [path [path ...]]

Scans a repository for secrets in code. The generated output is compatible
with `detect-secrets-hook --baseline`.

positional arguments:
  path                  Scans the entire codebase and outputs a snapshot of
                        currently identified secrets.

optional arguments:
  -h, --help            show this help message and exit
  --string [STRING]     Scans an individual string, and displays configured
                        plugins' verdict.
  --only-allowlisted    Only scans the lines that are flagged with `allowlist
                        secret`. This helps verify that individual exceptions
                        are indeed non-secrets.

scan options:
  --all-files           Scan all files recursively (as compared to only
                        scanning git tracked files).
  --baseline FILENAME   If provided, will update existing baseline by
                        importing settings from it.
  --force-use-all-plugins
                        If a baseline is provided, detect-secrets will default
                        to loading the plugins specified by that baseline.
                        However, this may also mean it doesn't perform the
                        scan with the latest plugins. If this flag is
                        provided, it will always use the latest plugins
  --slim                Slim baselines are created with the intention of
                        minimizing differences between commits. However, they
                        are not compatible with the `audit` functionality, and
                        slim baselines will need to be remade to be audited.

plugin options:
  Configure settings for each secret scanning ruleset. By default, all
  plugins are enabled unless explicitly disabled.

  --list-all-plugins    Lists all plugins that will be used for the scan.
  -p PLUGIN, --plugin PLUGIN
                        Specify path to custom secret detector plugin.
  --base64-limit [BASE64_LIMIT]
                        Sets the entropy limit for high entropy strings. Value
                        must be between 0.0 and 8.0, defaults to 4.5.
  --hex-limit [HEX_LIMIT]
                        Sets the entropy limit for high entropy strings. Value
                        must be between 0.0 and 8.0, defaults to 3.0.
  --disable-plugin DISABLE_PLUGIN
                        Plugin class names to disable. e.g.
                        Base64HighEntropyString

filter options:
  Configure settings for filtering out secrets after they are flagged by the
  engine.

  -n, --no-verify       Disables additional verification of secrets via
                        network call.
  --only-verified       Only flags secrets that can be verified.
  --exclude-lines EXCLUDE_LINES
                        If lines match this regex, it will be ignored.
  --exclude-files EXCLUDE_FILES
                        If filenames match this regex, it will be ignored.
  --exclude-secrets EXCLUDE_SECRETS
                        If secrets match this regex, it will be ignored.
  --word-list WORD_LIST_FILE
                        Text file with a list of words, if a secret contains a
                        word in the list we ignore it.
  -f FILTER, --filter FILTER
                        Specify path to custom filter. May be a python module
                        path (e.g.
                        detect_secrets.filters.common.is_invalid_file) or a
                        local file path (e.g.
                        file://path/to/file.py::function_name).
  --disable-filter DISABLE_FILTER
                        Specify filter to disable. e.g.
                        detect_secrets.filters.common.is_invalid_file
```

### Blocking Secrets not in Baseline

```
$ detect-secrets-hook --help
usage: detect-secrets-hook [-h] [-v] [--version] [--baseline FILENAME]
                           [--list-all-plugins] [-p PLUGIN]
                           [--base64-limit [BASE64_LIMIT]]
                           [--hex-limit [HEX_LIMIT]]
                           [--disable-plugin DISABLE_PLUGIN]
                           [-n | --only-verified]
                           [--exclude-lines EXCLUDE_LINES]
                           [--exclude-files EXCLUDE_FILES]
                           [--exclude-secrets EXCLUDE_SECRETS]
                           [--word-list WORD_LIST_FILE] [-f FILTER]
                           [--disable-filter DISABLE_FILTER]
                           [filenames [filenames ...]]

positional arguments:
  filenames             Filenames to check.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose mode.
  --version             Display version information.
  --json                Print detect-secrets-hook output as JSON
  --baseline FILENAME   Explicitly ignore secrets through a baseline generated
                        by `detect-secrets scan`

plugin options:
  Configure settings for each secret scanning ruleset. By default, all
  plugins are enabled unless explicitly disabled.

  --list-all-plugins    Lists all plugins that will be used for the scan.
  -p PLUGIN, --plugin PLUGIN
                        Specify path to custom secret detector plugin.
  --base64-limit [BASE64_LIMIT]
                        Sets the entropy limit for high entropy strings. Value
                        must be between 0.0 and 8.0, defaults to 4.5.
  --hex-limit [HEX_LIMIT]
                        Sets the entropy limit for high entropy strings. Value
                        must be between 0.0 and 8.0, defaults to 3.0.
  --disable-plugin DISABLE_PLUGIN
                        Plugin class names to disable. e.g.
                        Base64HighEntropyString

filter options:
  Configure settings for filtering out secrets after they are flagged by the
  engine.

  -n, --no-verify       Disables additional verification of secrets via
                        network call.
  --only-verified       Only flags secrets that can be verified.
  --exclude-lines EXCLUDE_LINES
                        If lines match this regex, it will be ignored.
  --exclude-files EXCLUDE_FILES
                        If filenames match this regex, it will be ignored.
  --exclude-secrets EXCLUDE_SECRETS
                        If secrets match this regex, it will be ignored.
  -f FILTER, --filter FILTER
                        Specify path to custom filter. May be a python module
                        path (e.g.
                        detect_secrets.filters.common.is_invalid_file) or a
                        local file path (e.g.
                        file://path/to/file.py::function_name).
  --disable-filter DISABLE_FILTER
                        Specify filter to disable. e.g.
                        detect_secrets.filters.common.is_invalid_file
```

We recommend setting this up as a pre-commit hook. One way to do this is by using the [pre-commit](https://github.com/pre-commit/pre-commit) framework:

```
# .pre-commit-config.yaml
repos:
-   repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
    -   id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
        exclude: package.lock.json
```

#### Inline Allowlisting

There are times when we want to exclude a false positive from blocking a commit, without creating a baseline to do so. You can do so by adding a comment as such:

```
secret = "hunter2"      # pragma: allowlist secret
```

or

```
//  pragma: allowlist nextline secret
const secret = "hunter2";
```

### Auditing Secrets in Baseline

```
$ detect-secrets audit --help
usage: detect-secrets audit [-h] [--diff] [--stats]
                      [--report] [--only-real | --only-false]
                      [--json]
                      filename [filename ...]

Auditing a baseline allows analysts to label results, and optimize plugins for
the highest signal-to-noise ratio for their environment.

positional arguments:
  filename      Audit a given baseline file to distinguish the difference
                between false and true positives.

optional arguments:
  -h, --help    show this help message and exit
  --diff        Allows the comparison of two baseline files, in order to
                effectively distinguish the difference between various plugin
                configurations.
  --stats       Displays the results of an interactive auditing session which
                have been saved to a baseline file.
  --report      Displays a report with the secrets detected

reporting:
  Display a summary with all the findings and the made decisions.
  To be used with the report mode (--report).

  --only-real   Only includes real secrets in the report
  --only-false  Only includes false positives in the report

analytics:
  Quantify the success of your plugins based on the labelled results in your
  baseline. To be used with the statistics mode (--stats).

  --json        Outputs results in a machine-readable format.
```

## Configuration

This tool operates through a system of **plugins** and **filters**.

- **Plugins** find secrets in code
- **Filters** ignore false positives to increase scanning precision

You can adjust both to suit your precision/recall needs.

### Plugins

There are three different strategies we employ to try and find secrets in code:

1. Regex-based Rules

   These are the most common type of plugin, and work well with well-structured secrets. These secrets can optionally be [verified](docs/plugins.md#Verified-Secrets), which increases scanning precision. However, solely depending on these may negatively affect the recall of your scan.

2. Entropy Detector

   This searches for "secret-looking" strings through a variety of heuristic approaches. This is great for non-structured secrets, but may require tuning to adjust the scanning precision.

3. Keyword Detector

   This ignores the secret value, and searches for variable names that are often associated with assigning secrets with hard-coded values. This is great for "non-secret-looking" strings (e.g. le3tc0de passwords), but may require tuning filters to adjust the scanning precision.

Want to find a secret that we don't currently catch? You can also (easily) develop your own plugin, and use it with the engine! For more information, check out the [plugin documentation](docs/plugins.md#Using-Your-Own-Plugin).

### Filters

`detect-secrets` comes with several different in-built filters that may suit your needs.

#### --exclude-lines

Sometimes, you want to be able to globally allow certain lines in your scan, if they match a specific pattern. You can specify a regex rule as such:

```
$ detect-secrets scan --exclude-lines 'password = (blah|fake)'
```

Or you can specify multiple regex rules as such:

```
$ detect-secrets scan --exclude-lines 'password = blah' --exclude-lines 'password = fake'
```

#### --exclude-files

Sometimes, you want to be able to ignore certain files in your scan. You can specify a regex pattern to do so, and if the filename meets this regex pattern, it will not be scanned:

```
$ detect-secrets scan --exclude-files '.*\.signature$'
```

Or you can specify multiple regex patterns as such:

```
$ detect-secrets scan --exclude-files '.*\.signature$' --exclude-files '.*/i18n/.*'
```

#### --exclude-secrets

Sometimes, you want to be able to ignore certain secret values in your scan. You can specify a regex rule as such:

```
$ detect-secrets scan --exclude-secrets '(fakesecret|\${.*})'
```

Or you can specify multiple regex rules as such:

```
$ detect-secrets scan --exclude-secrets 'fakesecret' --exclude-secrets '\${.*}'
```

#### Inline Allowlisting

Sometimes, you want to apply an exclusion to a specific line, rather than globally excluding it. You can do so with inline allowlisting as such:

```
API_KEY = 'this-will-ordinarily-be-detected-by-a-plugin'    # pragma: allowlist secret
```

These comments are supported in multiple languages. e.g.

```
const GoogleCredentialPassword = "something-secret-here";     //  pragma: allowlist secret
```

You can also use:

```
# pragma: allowlist nextline secret
API_KEY = 'WillAlsoBeIgnored'
```

This may be a convenient way for you to ignore secrets, without needing to regenerate the entire baseline again. If you need to explicitly search for these allowlisted secrets, you can also do:

```
$ detect-secrets scan --only-allowlisted
```

Want to write more custom logic to filter out false positives? Check out how to do this in our [filters documentation](docs/filters.md#Using-Your-Own-Filters).

## Extensions

### Wordlist

The `--exclude-secrets` flag allows you to specify regex rules to exclude secret values. However, if you want to specify a large list of words instead, you can use the `--word-list` flag.

To use this feature, be sure to install the `pyahocorasick` package, or simply use:

```
$ pip install detect-secrets[word_list]
```

Then, you can use it as such:

```
$ cat wordlist.txt
not-a-real-secret
$ cat sample.ini
password = not-a-real-secret

# Will show results
$ detect-secrets scan sample.ini

# No results found
$ detect-secrets scan --word-list wordlist.txt
```

### Gibberish Detector

The Gibberish Detector is a simple ML model that attempts to determine whether a secret value is actually gibberish, with the assumption that **real** secret values are not word-like.

To use this feature, be sure to install the `gibberish-detector` package, or use:

```
$ pip install detect-secrets[gibberish]
```

Check out the [gibberish-detector](https://github.com/domanchi/gibberish-detector) package for more information on how to train the model. A pre-trained model (seeded by processing RFCs) will be included for easy use.

You can also specify your own model as such:

```
$ detect-secrets scan --gibberish-model custom.model
```

This is not a default plugin, given that this will ignore secrets such as `password`.

## Caveats

This is not meant to be a sure-fire solution to prevent secrets from entering the codebase. Only proper developer education can truly do that. This pre-commit hook merely implements several heuristics to try and prevent obvious cases of committing secrets.

**Things that won't be prevented:**

- Multi-line secrets
- Default passwords that don't trigger the `KeywordDetector` (e.g. `login = "hunter2"`)

## FAQ

### General

- **"Did not detect git repository." warning encountered, even though I'm in a git repo.**

  Check to see whether your `git` version is >= 1.8.5. If not, please upgrade it, then try again. [More details here](https://github.com/Yelp/detect-secrets/issues/220).

### Windows

- **`detect-secrets audit` displays "Not a valid baseline file!" after creating baseline.**

  Ensure the file encoding of your baseline file is UTF-8. [More details here](https://github.com/Yelp/detect-secrets/issues/272#issuecomment-619187136).
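For the Windows encoding issue, one possible repair is to rewrite the baseline as UTF-8 with a short script. The sketch below is not part of `detect-secrets`; the helper name `reencode_baseline_as_utf8` is hypothetical, and it assumes the baseline was written as UTF-16 with a byte-order mark (which shell redirection on Windows can produce) or as UTF-8 with a BOM.

```python
import codecs
import json
from pathlib import Path


def reencode_baseline_as_utf8(path):
    """Rewrite a JSON baseline file as BOM-less UTF-8 and return the parsed data.

    Hypothetical helper for illustration; it only uses the standard library.
    """
    raw = Path(path).read_bytes()

    # A UTF-16 byte-order mark suggests the file came from a UTF-16 redirect.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")      # the utf-16 codec consumes the BOM
    else:
        text = raw.decode("utf-8-sig")   # UTF-8, with or without a BOM

    # Parse first so that a file which is not valid JSON fails loudly
    # before anything is overwritten.
    data = json.loads(text)

    # Write bytes directly so line endings are preserved exactly.
    Path(path).write_bytes(text.encode("utf-8"))
    return data


if __name__ == "__main__":
    baseline = reencode_baseline_as_utf8(".secrets.baseline")
    print(f"Re-encoded baseline with {len(baseline.get('results', {}))} file entries")
```

After running it, `detect-secrets audit .secrets.baseline` can be retried. If the file uses some other encoding, the decode step raises an error and the file is left untouched.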