righettod/toolbox-codescan

GitHub: righettod/toolbox-codescan

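A Docker-based toolbox for offline code security audits. It bundles Semgrep, Gitleaks, and DevSkim to find vulnerabilities and secrets efficiently while preventing the scanned code from leaking.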

Stars: 4 | Forks: 0

# 💻 Code scanning toolbox

[![Build and deploy the toolbox image](https://static.pigsec.cn/wp-content/uploads/repos/2026/03/41c4df5bb4090101.svg)](https://github.com/righettod/toolbox-codescan/actions/workflows/build_docker_image.yml) ![MadeWitVSCode](https://img.shields.io/static/v1?label=Made%20with&message=VisualStudio%20Code&color=blue&?style=for-the-badge&logo=textpattern) ![MadeWithDocker](https://img.shields.io/static/v1?label=Made%20with&message=Docker&color=blue&?style=for-the-badge&logo=docker) ![AutomatedWith](https://img.shields.io/static/v1?label=Automated%20with&message=GitHub%20Actions&color=blue&?style=for-the-badge&logo=github)

## 🎯 Description

The goal of this image is to provide a ready-to-use toolbox for performing **offline scans** of a code base.

💡 The objective is to **prevent any disclosure** of the scanned code base.

## 🛠️ Tools used

| Tool | Usage |
|--------------------------------------------------|-----------------------------------------------------------------------------------------------------|
| [Semgrep](https://github.com/semgrep/semgrep) | Code scanning ([SAST](https://en.wikipedia.org/wiki/Static_application_security_testing) activity). |
| [Gitleaks](https://github.com/gitleaks/gitleaks) | Search for secrets/credentials/... |

🔬 When **Semgrep** fails to detect an issue that I know is present, I try to propose a new rule to the Semgrep [rules registry](https://github.com/semgrep/semgrep-rules):

* ✅
* ✅
* 🔬
* 🔬
* 🔬
* 🔬

💡 So that the proposed rules can be used while the corresponding PR is pending, all proposed rules are imported into the folder `/tools/semgrep-rules-righettod`:

* ❌ means that the PR was **rejected**.
    * If the PR of a rule is rejected, the rule stays in this folder permanently.
* ✅ means that the PR was **merged**.
    * If the PR of a rule is merged, the rule is removed from this folder because it is now part of the Semgrep rules registry.
    * Accepted rules are kept as a backup in the folder **[archived-rules](archived-rules)**.
* 🔬 means that the PR is **under review** by the Semgrep team.

😉 The folder `/tools/semgrep-rules-righettod` is my custom Semgrep rules registry.

## 📦 Build

💻 Use the following set of commands to build the docker image of the toolbox:

```
git clone https://github.com/righettod/toolbox-codescan.git
cd toolbox-codescan
docker build . -t righettod/toolbox-codescan
```

💡 The image is built every week and pushed to the GitHub image registry. You can retrieve it with the following command:

`docker pull ghcr.io/righettod/toolbox-codescan:main`

## 👨‍💻 Usage

💻 Use the following command to create a container of the toolbox:

```
docker run --rm -v "C:/Temp:/work" --network none -it ghcr.io/righettod/toolbox-codescan:main
# From here, use one of the provided scripts...
```

## 📋 Scripts

### Script 'scan-code.sh'

Script that scans the current folder using a set of [SEMGREP rules](https://github.com/semgrep/semgrep-rules) with the OSS version of [SEMGREP](https://semgrep.dev/).

🐞 Findings are stored in the file `findings.json`.

💡 This [script](https://github.com/righettod/toolbox-pentest-web/blob/master/scripts/generate-report-semgrep.py) can be used to get an overview of the findings identified and stored in the file `findings.json`. It is imported as the file `/tools/scripts/report-code.py`.

💻 Usage and example:

```
$ pwd
/work/sample
$ scan-code.sh
Usage:
   scan-code.sh [RULES_FOLDER_NAME]

Call example:
    scan-code.sh java
    scan-code.sh php
    scan-code.sh json

See sub folders in '/tools/semgrep-rules'.

Findings will be stored in file 'findings.json'.
$ scan-code.sh java
┌────────────────┐
│ 1 Code Finding │
└────────────────┘

    src/burp/ActivityLogger.java
    ❯❯❱ tools.semgrep-rules.java.lang.security.audit.formatted-sql-string
          Detected a formatted string in a SQL statement. This could lead to SQL injection if variables in
          the SQL statement are not properly sanitized. Use a prepared statements
          (java.sql.PreparedStatement) instead. You can obtain a PreparedStatement using
          'connection.prepareStatement'.

           91┆ stmt.execute(SQL_TABLE_CREATE);
```

### Script 'scan-code-extended.sh'

Performs the same processing as the script `scan-code.sh`, but scans the current folder using all the SEMGREP rules related to the target technology. The script first gathers the rules that every rules provider offers for the target technology, then scans with this consolidated set of rules.

### Script 'scan-secrets.sh'

Script that scans the current folder with [GITLEAKS](https://github.com/gitleaks/gitleaks) to find secrets in source files and in git files. The scan of git files is performed only if a `.git` folder is present.

🐞 Leaks are stored in the files `leaks-gitfiles.json` and `leaks-sourcefiles.json`.

💡 This [script](https://github.com/righettod/toolbox-pentest-web/blob/master/scripts/generate-report-gitleaks.py) can be used to get an overview of the leaks identified and stored in the files `leaks-*.json`. It is imported as the file `/tools/scripts/report-secrets.py`.

💻 Usage and example:

```
$ pwd
/work/sample
$ scan-secrets.sh
5:47PM INF scan completed in 78.1ms
5:47PM INF no leaks found
```

### Script 'scan-secrets-extended.sh'

Script that scans the current folder using a dictionary of **common variable names for secrets** ([source](https://gist.githubusercontent.com/EdOverflow/8bd2faad513626c413b8fc6e9d955669/raw/06a0ef0fd83920d513c65767aae258ecf8382bdf/gistfile1.txt)).

💡 The dictionary referenced above is imported during the image build as the file `/tools/secret-common-variable-names.txt`.

💻 Usage and example:

```
$ pwd
/work/sample
$ scan-secrets-extended.sh
./config/db.properties:50:DB_PASSWORD=Password2024
```

### Script 'online-scan-secrets.sh'

Script that scans a collection of online git repositories with [GITLEAKS](https://github.com/gitleaks/gitleaks) to find secrets in source files and in git files.

💡 The script [scan-secrets.sh](scripts/scan-secrets.sh) is used to scan each git repository once it has been cloned.

💡 Use the script [online-scan-secrets-consolidate.py](scripts/online-scan-secrets-consolidate.py) to consolidate the generated data into a single file.

💻 Usage and example:

```
$ online-scan-secrets.sh
Usage:
   online-scan-secrets.sh [FILE_WITH_COLLECTION_OF_GIT_REPO_URLS]

Call example:
    online-scan-secrets.sh repositories.txt
$ online-scan-secrets.sh repositories.txt
[*] Execution context:
List of git repositories URL   : repositories.txt (1030 entries)
Data collection storage folder : /work/data-collected
[*] Start repositories checking and data collection...
...
```

### Script 'filters-secrets.py'

Script that filters a large leaks file in the [GITLEAKS](https://github.com/gitleaks/gitleaks) format, such as the file generated by the script [online-scan-secrets-consolidate.py](scripts/online-scan-secrets-consolidate.py).

💡 The output can be searched for specific secrets with **grep** and different regular expressions, for example `grep -B 4 -E 'ey[A-Za-z0-9]{15,}\.[A-Za-z0-9]{15,}\.[A-Za-z0-9_-]*' report.txt`.

💻 Usage:

```
filters-secrets.py leaks-consolidated.json
```

## 🔬 Analyzing a .NET project

🤔 I noticed that Semgrep does not perform very well with the CSharp community rule set.

💡 To address this:

* I found the tool [DevSkim](https://github.com/microsoft/DevSkim), provided by Microsoft, and added it to the toolbox. This is why I moved the base image from *alpine* to *ubuntu*: I could not get DevSkim to run on an *alpine*-based image.
* I created this [script](https://github.com/righettod/toolbox-pentest-web/blob/master/scripts/generate-report-devskim.py) and added it as `report-code-devskim.py` to explore the scan results.
* I created the alias `scan-code-devskim` to scan the current folder with `DevSkim` and write the results to the json file `findings.json`, for consistency with the other scripts of the toolbox.

## ⚗️ Misc

The folder [misc](misc/) contains several POCs that explore the use of AI (local models) in the context of secure code review activities.

## 🤝 Sources & credits

### Semgrep analysis rules providers

*
*
*

### Tools

*
*
*
*
*
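
💡 Returning to the `findings.json` file written by `scan-code.sh`: the sketch below shows one way such a file could be summarized by severity and rule. It is not the author's `report-code.py`. It assumes the file follows the standard Semgrep JSON output layout (a top-level `results` array whose entries carry `check_id` and `extra.severity`), and the function names are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_findings(report: dict) -> Counter:
    """Count findings per (severity, rule identifier).

    Assumes the Semgrep JSON layout: report["results"] is a list of
    objects with "check_id" and "extra" -> "severity".
    """
    counts = Counter()
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity", "UNKNOWN")
        rule_id = result.get("check_id", "unknown-rule")
        counts[(severity, rule_id)] += 1
    return counts


def load_report(path: str = "findings.json") -> dict:
    """Load a findings file from disk (file name taken from the README)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Only print a summary when a findings file is present in the current folder.
    if Path("findings.json").is_file():
        summary = summarize_findings(load_report())
        for (severity, rule_id), count in summary.most_common():
            print(f"{count:4d}  {severity:8s}  {rule_id}")
```

Run from the scanned folder inside the container, this prints one line per rule, most frequent first. Because it only reads a local file, it still works with `--network none`.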

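💡 The dictionary-based approach of `scan-secrets-extended.sh` can be illustrated with a short sketch. The actual script is not shown on this page, so everything below is an assumption about how such a scan might work: case-insensitive substring matching, skipping the `.git` folder, and the function names are all hypothetical. Only the dictionary path `/tools/secret-common-variable-names.txt` comes from the README.

```python
import re
from pathlib import Path

# Path of the dictionary inside the image, as stated in the README.
DICTIONARY_PATH = Path("/tools/secret-common-variable-names.txt")


def build_pattern(variable_names):
    """Compile one regex matching any dictionary entry (case-insensitive, an assumption)."""
    names = sorted({n.strip() for n in variable_names if n.strip()}, key=len, reverse=True)
    if not names:
        raise ValueError("the dictionary of variable names is empty")
    return re.compile("|".join(re.escape(n) for n in names), re.IGNORECASE)


def matching_lines(lines, pattern):
    """Yield (line_number, stripped_line) for every line containing a dictionary entry."""
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.strip()


def scan_folder(root, pattern):
    """Yield 'path:line_number:line' strings for all files below root, skipping .git."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for number, line in matching_lines(text.splitlines(), pattern):
            yield f"{path}:{number}:{line}"


if __name__ == "__main__":
    # Only scan when the dictionary is available (i.e. inside the toolbox container).
    if DICTIONARY_PATH.is_file():
        names = DICTIONARY_PATH.read_text(encoding="utf-8").splitlines()
        for hit in scan_folder(".", build_pattern(names)):
            print(hit)
```

Substring matching keeps the sketch simple but produces false positives on short names. Anchoring each entry on word boundaries would be a stricter alternative.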
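Tags: DevSecOps, Docker, GitHub Actions, Gitleaks, LLM applications, SAST, Semgrep, web screenshots, WordPress security scanning, upstream proxy, cloud security monitoring, code review, credential leaks, security toolbox, security scanning, security defense assessment, container security, application security, prompt injection defense, timing injection, secret detection, source code security, blind injection attacks, offline scanning, automated notes, request interception, reverse engineering tools, static analysis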