0ca/BoxPwnr

GitHub: 0ca/BoxPwnr

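A modular framework for automating the testing and benchmarking of different LLMs and agent architectures on penetration-testing challenges across large-scale security lab platforms.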

Stars: 235 | Forks: 28

# BoxPwnr

A fun experiment to see how far Large Language Models (LLMs) can get at solving [HackTheBox](https://www.hackthebox.com/hacker/hacking-labs) machines on their own.

BoxPwnr provides a plug-and-play system for testing the performance of different agent architectures: `--solver [chat, chat_tools, chat_tools_compactation, claude_code, hacksynth, external]`.

BoxPwnr started with HackTheBox but also supports other platforms: `--platform [htb, htb_ctf, htb_challenges, portswigger, ctfd, local, xbow, cybench, picoctf, tryhackme, levelupctf]`

See [Platform Implementations](src/boxpwnr/platforms/README.md) for detailed documentation on each supported platform.

# Traces & Benchmarks

All solving traces are available at [BoxPwnr Traces & Benchmarks](https://0ca.github.io/BoxPwnr-Traces/stats/). Each trace includes the full conversation log, showing the LLM's reasoning, the commands it executed, and the output it received. You can replay any trace in an interactive web viewer to see step by step how the machine was solved.

🔬 BoxPwnr Traces & Benchmarks

| Platform | Solved | Completion | Traces |
|----------|-------:|-----------:|-------:|
| [HTB Starting Point](https://0ca.github.io/BoxPwnr-Traces/stats/platform.html?platform=htb-starting-point) | 25/25 | ![100.0%](https://img.shields.io/badge/100.0%25-brightgreen?style=flat-square) | 772 |
| [HTB Labs](https://0ca.github.io/BoxPwnr-Traces/stats/platform.html?platform=htb-labs) | 74/519 | ![14.3%](https://img.shields.io/badge/14.3%25-red?style=flat-square) | 420 |
| [HTB Challenges](https://0ca.github.io/BoxPwnr-Traces/stats/platform.html?platform=htb-challenges) | 16/817 | ![2.0%](https://img.shields.io/badge/2.0%25-red?style=flat-square) | 20 |
| [PortSwigger Labs](https://0ca.github.io/BoxPwnr-Traces/stats/platform.html?platform=portswigger) | 163/270 | ![60.4%](https://img.shields.io/badge/60.4%25-green?style=flat-square) | 377 |
| [XBOW Validation Benchmarks](https://0ca.github.io/BoxPwnr-Traces/stats/platform.html?platform=xbow) | 94/104 | ![90.4%](https://img.shields.io/badge/90.4%25-brightgreen?style=flat-square) | 512 |
| [Cybench CTF Challenges](https://0ca.github.io/BoxPwnr-Traces/stats/platform.html?platform=cybench) | 37/40 | ![92.5%](https://img.shields.io/badge/92.5%25-brightgreen?style=flat-square) | 925 |
| [picoCTF Challenges](https://0ca.github.io/BoxPwnr-Traces/stats/platform.html?platform=picoctf) | 231/439 | ![52.6%](https://img.shields.io/badge/52.6%25-yellow?style=flat-square) | 577 |
| [TryHackMe Rooms](https://0ca.github.io/BoxPwnr-Traces/stats/platform.html?platform=tryhackme) | 67/459 | ![14.7%](https://img.shields.io/badge/14.7%25-red?style=flat-square) | 321 |
| [HackBench Benchmarks](https://0ca.github.io/BoxPwnr-Traces/stats/platform.html?platform=hackbench) | 3/16 | ![18.8%](https://img.shields.io/badge/18.8%25-red?style=flat-square) | 3 |
| [LevelUpCTF Challenges](https://0ca.github.io/BoxPwnr-Traces/stats/platform.html?platform=levelupctf) | 22/239 | ![9.2%](https://img.shields.io/badge/9.2%25-red?style=flat-square) | 56 |
| [Neurogrid CTF: The ultimate AI security showdown](https://0ca.github.io/BoxPwnr-Traces/stats/platform.html?platform=Neurogrid-CTF-The-ultimate-AI-security-showdown) | 17/36 | ![47.2%](https://img.shields.io/badge/47.2%25-yellow?style=flat-square) | 197 |

## How It Works

BoxPwnr uses different LLMs to solve HackTheBox machines autonomously through an iterative process:

1. **Environment**: All commands run in a Docker container with Kali Linux
   - The container is built automatically on first run (takes about 10 minutes)
   - The VPN connection is established automatically using the specified `--vpn` flag
2. **Execution loop**:
   - The LLM receives a detailed [system prompt](https://github.com/0ca/BoxPwnr/blob/main/src/boxpwnr/prompts/generic_prompt.yaml) that defines its task and constraints
   - The LLM suggests the next command based on previous outputs
   - The command is executed in the Docker container
   - The output is fed back to the LLM for analysis
   - The process repeats until the flag is found or the LLM needs help
3. **Command automation**:
   - The LLM is instructed to provide fully automated commands that need no manual interaction
   - The LLM must include appropriate timeouts in its commands and handle service delays
   - The LLM must script all service interactions (telnet, ssh, etc.) so that they are non-interactive
4. **Results**:
   - Conversations and commands are saved for analysis
   - A summary is generated when the flag is found
   - Usage statistics (tokens, cost) are tracked

## Usage

### Prerequisites

1. Clone the repository with submodules

   ```
   git clone --recurse-submodules https://github.com/0ca/BoxPwnr
   cd BoxPwnr

   # Install uv if you haven't already
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Sync dependencies (creates .venv)
   uv sync
   ```

2. Docker
   - BoxPwnr requires Docker to be installed and running
   - Installation instructions can be found at [https://docs.docker.com/get-docker/](https://docs.docker.com/get-docker/)

### Running BoxPwnr

```
uv run boxpwnr --platform htb --target meow [options]
```

On first run, you will be prompted for your OpenAI/Anthropic/DeepSeek API key. The key is saved to `.env` for future use.

### Command-Line Options

#### Core Options

- `--platform`: Platform to use (`htb`, `htb_ctf`, `htb_challenges`, `ctfd`, `portswigger`, `local`, `xbow`, `cybench`, `picoctf`, `tryhackme`, `levelupctf`)
- `--target`: Target name (e.g., `meow` for an HTB machine, "SQL injection UNION attack" for a PortSwigger lab, or `XBEN-060-24` for an XBOW benchmark)
- `--debug`: Enable verbose logging (shows tool names and descriptions)
- `--debug-langchain`: Enable LangChain debug mode (shows full HTTP requests with tool schemas, LangChain traces, and raw API payloads; very verbose)
- `--max-turns`: Maximum number of turns before stopping (e.g., `--max-turns 10`)
- `--max-cost`: Maximum cost in USD before stopping (e.g., `--max-cost 2.0`)
- `--max-time`: Maximum time per attempt in minutes (e.g., `--max-time 60`)
- `--attempts`: Number of attempts at solving the target (e.g., `--attempts 5` for pass@5 benchmarks)
- `--default-execution-timeout`: Default timeout for command execution in seconds (default: 30)
- `--max-execution-timeout`: Maximum timeout for command execution in seconds (default: 300)
- `--custom-instructions`: Additional custom instructions appended to the system prompt

#### Platforms

- `--keep-target`: Keep the target (machine/lab) running after completion (useful for manual follow-up)

#### Analysis and Reporting

- `--analyze-attempt`: Analyze failed attempts with TraceAnalyzer after completion
- `--generate-summary`: Generate a solution summary after completion
- `--generate-progress`: Generate a progress handoff file (`progress.md`) for failed or interrupted attempts. This file can be used to resume the attempt later.
- `--resume-from`: Path to a `progress.md` file from a previous attempt. Its content is injected into the system prompt so the run continues where the previous attempt left off.
- `--generate-report`: Generate a new report from an existing trace directory

#### LLM Solver and Model Selection

- `--solver`: LLM solver to use (`chat`, `chat_tools`, `chat_tools_compactation`, `claude_code`, `hacksynth`, `external`)
- `--model`: AI model to use. Supported models include:
  - Claude models: use the exact API model name (e.g., `claude-sonnet-4-0`, `claude-opus-4-0`, `claude-haiku-4-5-20251001`)
  - OpenAI models: `gpt-5`, `gpt-5-nano`, `gpt-5-mini`
  - Other models: `deepseek-reasoner`, `grok-4`, `gemini-3-flash-preview`
  - OpenRouter models: `openrouter/company/model` (e.g., `openrouter/openrouter/free`, `openrouter/openai/gpt-oss-120b`, `openrouter/x-ai/grok-4-fast`, `openrouter/moonshotai/kimi-k2.5`)
  - Z.AI models: `z-ai/model-name` (e.g., `z-ai/glm-5`) for Zhipu AI GLM models
  - Kilo free models: `kilo/model-name` (e.g., `kilo/z-ai/glm-5`) via the Kilo gateway
  - Kimi models: `kimi/model-name` (e.g., `kimi/kimi-k2.5`) for Kimi Code subscriptions
  - Cline free models: `cline/minimax/minimax-m2.5`, `cline/moonshotai/kimi-k2.5` (requires `cline auth`, see below)
  - Ollama models: `ollama:model-name`
- `--reasoning-effort`: Reasoning effort for reasoning-capable models (`minimal`, `low`, `medium`, `high`). Applies only to models that support reasoning, such as `gpt-5`, `o4-mini`, and `grok-4`. The default for reasoning models is `medium`.

#### External Solver Options

The `external` solver lets BoxPwnr delegate to any external tool (Claude Code, Aider, custom scripts, etc.):

- `--external-timeout`: Timeout for the external solver subprocess in seconds (default: 3600)
- Command after `--`: The external command to execute (e.g., `-- claude -p "$BOXPWNR_PROMPT"`)

Environment variables available to the external tool:

- `BOXPWNR_PROMPT`: Full system prompt, including target information
- `BOXPWNR_TARGET_IP`: Target connection information (IP/hostname)
- `BOXPWNR_CONTAINER`: Docker container name (useful in VPN scenarios)

#### Executor Options

- `--executor`: Executor to use (default: `docker`)
- `--keep-container`: Keep the Docker container after completion (faster across multiple attempts)
- `--architecture`: Container architecture to use (options: `default`, `amd64`). Use `amd64` to run on Intel/AMD architecture even on ARM systems such as Apple Silicon.

#### Platform-Specific Options

- HTB CTF options:
  - `--ctf-id`: ID of the CTF event (required when using `--platform htb_ctf`)
- CTFd options:
  - `--ctfd-url`: URL of the CTFd instance (required when using `--platform ctfd`)

### Examples

```
# Regular use (container stops after execution)
uv run boxpwnr --platform htb --target meow --debug

# Development mode (keeps the container running for faster subsequent runs)
uv run boxpwnr --platform htb --target meow --debug --keep-container

# Run on AMD64 architecture (x86 compatibility on ARM systems such as M1/M2 Macs)
uv run boxpwnr --platform htb --target meow --architecture amd64

# Limit the number of turns
uv run boxpwnr --platform htb --target meow --max-turns 10

# Limit the maximum cost
uv run boxpwnr --platform htb --target meow --max-cost 1.5

# Run multiple attempts for pass@5 benchmarks
uv run boxpwnr --platform htb --target meow --attempts 5

# Use a specific model
uv run boxpwnr --platform htb --target meow --model claude-sonnet-4-0

# Use Claude Haiku 4.5 (fast, cost-effective, and intelligent)
uv run boxpwnr --platform htb --target meow --model claude-haiku-4-5-20251001 --max-cost 0.5

# Use GPT-5-mini (fast and cost-effective)
uv run boxpwnr --platform htb --target meow --model gpt-5-mini --max-cost 1.0

# Use Grok-4 (advanced reasoning model)
uv run boxpwnr --platform htb --target meow --model grok-4 --max-cost 2.0

# Use the OpenRouter free tier (auto-routing)
uv run boxpwnr --platform htb --target meow --model openrouter/openrouter/free --max-cost 0.5

# Use gpt-oss-120b via OpenRouter (open-weight 117B MoE model with reasoning)
uv run boxpwnr --platform htb --target meow --model openrouter/openai/gpt-oss-120b --max-cost 1.0

# Use Kimi K2.5 via OpenRouter (Moonshot AI's reasoning model)
python3 -m boxpwnr.cli --platform htb --target meow --model openrouter/moonshotai/kimi-k2.5 --max-cost 1.0

# Use a Cline free model (requires: npm install -g cline && cline auth)
uv run boxpwnr --platform htb --target meow --model cline/minimax/minimax-m2.5

# Use Z.AI GLM-5 (Zhipu AI reasoning model)
uv run boxpwnr --platform htb --target meow --model z-ai/glm-5 --max-cost 1.0

# Use a Kilo free model (GLM-5 via the Kilo gateway)
uv run boxpwnr --platform htb --target meow --model kilo/z-ai/glm-5

# Use Kimi K2.5 directly (requires a Kimi Code subscription)
uv run boxpwnr --platform htb --target meow --model kimi/kimi-k2.5 --max-cost 1.0

# Use an OpenCode free model (no authentication required)
uv run boxpwnr --platform htb --target meow --model opencode/big-pickle --max-cost 0.5

# Use the Claude Code solver (uses CC as the agent)
uv run boxpwnr --platform htb --target meow --solver claude_code --model claude-sonnet-4-0 --max-cost 2.0

# Use the HackSynth solver (autonomous CTF agent with a planner-executor-summarizer architecture)
uv run boxpwnr --platform htb --target meow --solver hacksynth --model gpt-5 --max-cost 1.0

# Use the chat_tools_compactation solver for long-running traces that may exceed context limits
uv run boxpwnr --platform htb --target meow --solver chat_tools_compactation --model gpt-5 --max-turns 100

# Customize compaction behavior
uv run boxpwnr --platform htb --target meow --solver chat_tools_compactation --compaction-threshold 0.70 --preserve-last-turns 15

# Generate a new report from an existing attempt
uv run boxpwnr --generate-report machines/meow/traces/20250129_180409

# Run an HTB challenge (app.hackthebox.com/challenges)
uv run boxpwnr --platform htb_challenges --target "Flag Command"

# Run a CTF challenge
uv run boxpwnr --platform htb_ctf --ctf-id 1234 --target "Web Challenge"

# Run a CTFd challenge
uv run boxpwnr --platform ctfd --ctfd-url https://ctf.example.com --target "Crypto 101"

# Run with custom instructions
uv run boxpwnr --platform htb --target meow --custom-instructions "Focus on privilege escalation techniques and explain your steps in detail"

# Generate a progress file for a failed attempt (can be resumed later)
uv run boxpwnr --platform htb --target meow --generate-progress --max-turns 20

# Resume from a previous attempt using the generated progress file
uv run boxpwnr --platform htb --target meow --resume-from targets/htb/meow/traces/20250127_120000/progress.md --max-turns 30

# Run an XBOW benchmark (benchmarks are cloned automatically on first use)
uv run boxpwnr --platform xbow --target XBEN-060-24 --model gpt-5 --max-turns 30

# List all available XBOW benchmarks
uv run boxpwnr --platform xbow --list

# Run a Cybench challenge (the repository is cloned automatically on first use)
# You can use the short name or the full path
uv run boxpwnr --platform cybench --target "[Very Easy] Dynastic" --model gpt-5 --max-cost 2.0
# Or with the full path:
uv run boxpwnr --platform cybench --target "benchmark/hackthebox/cyber-apocalypse-2024/crypto/[Very Easy] Dynastic" --model gpt-5 --max-cost 2.0

# List all available Cybench challenges (40 professional CTF tasks)
uv run boxpwnr --platform cybench --list

# Use the external solver with Claude Code (note: wrap in bash -c with single quotes)
uv run boxpwnr --platform htb --target meow --solver external -- bash -c 'claude --dangerously-skip-permissions -p "$BOXPWNR_PROMPT"'

# Use the external solver with the OpenAI Codex CLI
uv run boxpwnr --platform htb --target meow --solver external -- bash -c 'codex --yolo "$BOXPWNR_PROMPT"'

# Use the external solver with a custom timeout (2 hours)
uv run boxpwnr --platform htb --target meow --solver external --external-timeout 7200 -- bash -c 'claude --dangerously-skip-permissions -p "$BOXPWNR_PROMPT"'

# Use the external solver inside the Docker container (for VPN scenarios)
# When the target requires a VPN, run the external tool inside BoxPwnr's Docker container.
# IS_SANDBOX=1 allows --dangerously-skip-permissions to run as root.
uv run boxpwnr --platform htb --target meow --solver external -- \
  bash -c 'docker exec -e IS_SANDBOX=1 -e ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY" "$BOXPWNR_CONTAINER" claude --dangerously-skip-permissions -p "$BOXPWNR_PROMPT"'
```

## Why HackTheBox?

HackTheBox machines are an excellent end-to-end testbed for evaluating AI systems because they require:

- Complex reasoning capabilities
- Creative "outside-the-box" thinking
- Understanding of a wide range of security concepts
- The ability to chain multiple steps together
- Dynamic problem-solving skills

## Why Now?

*(Written on January 26, 2025)*

With recent advances in LLM technology:

- Models' reasoning capabilities are becoming more sophisticated
- The cost of running these models is falling (see DeepSeek R1 Zero)
- Their ability to understand and generate code is improving
- They are getting better at maintaining context and solving multi-step problems

I believe that within the next few years LLMs will be able to solve most HTB machines autonomously, marking a significant milestone in AI security testing and problem-solving capabilities.

## Development

### Testing

BoxPwnr supports running GitHub Actions workflows locally with [`act`](https://github.com/nektos/act), which simulates the exact CI environment before you push to GitHub:

```
# Install act (macOS)
brew install act

# Run CI workflows locally
make ci-test         # Run main test workflow
make ci-integration  # Run integration tests (slow - downloads Python each time)
make ci-docker       # Run docker build test
make ci-all          # Run all workflows
```

## Disclaimer

This project is for research and educational purposes only. Always follow each platform's terms of service and ethical guidelines when using this tool.
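
The execution loop described earlier (the model suggests a command, the command runs non-interactively with a timeout, the output is fed back, and the loop stops once a flag appears) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not BoxPwnr's actual implementation: `run_command`, `solve`, and `scripted_llm` are hypothetical names, commands run in the local shell rather than in the Kali container, the "LLM" is a fixed script, and the `HTB{...}` flag pattern is assumed for the demo. The two timeout constants mirror the documented defaults of `--default-execution-timeout` and `--max-execution-timeout`.

```python
import re
import subprocess
from typing import Callable, List, Optional

FLAG_PATTERN = re.compile(r"HTB\{[^}]+\}")  # assumed flag format, for illustration only
DEFAULT_TIMEOUT = 30   # mirrors --default-execution-timeout
MAX_TIMEOUT = 300      # mirrors --max-execution-timeout


def run_command(command: str, timeout: int = DEFAULT_TIMEOUT) -> str:
    """Run a shell command non-interactively and return stdout plus stderr."""
    timeout = min(timeout, MAX_TIMEOUT)  # never exceed the configured ceiling
    try:
        result = subprocess.run(
            command,
            shell=True,
            capture_output=True,
            text=True,
            timeout=timeout,
            stdin=subprocess.DEVNULL,  # no manual interaction is possible
        )
        return result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        return f"[command timed out after {timeout}s]"


def solve(suggest: Callable[[List[str]], str], max_turns: int = 10) -> Optional[str]:
    """Ask `suggest` for one command per turn until a flag shows up or turns run out."""
    history: List[str] = []  # stand-in for the saved conversation log
    for _ in range(max_turns):
        command = suggest(history)                 # the "LLM" proposes the next command
        output = run_command(command)              # executed locally here, not in Docker
        history.append(f"$ {command}\n{output}")   # output is fed back on the next turn
        match = FLAG_PATTERN.search(output)
        if match:
            return match.group(0)
    return None


def scripted_llm(history: List[str]) -> str:
    """Hypothetical stand-in for a model: replays a fixed list of commands."""
    script = ["echo scanning", "echo 'HTB{demo_flag}'"]
    return script[min(len(history), len(script) - 1)]


if __name__ == "__main__":
    print(solve(scripted_llm))
```

Passing the suggestion function in as a parameter keeps the loop independent of any particular model or solver, which is the same separation the `--solver` option exposes on the command line.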
