ESI-Bench/ESI-Bench

GitHub: ESI-Bench/ESI-Bench

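ESI-Bench is a comprehensive benchmark for embodied spatial intelligence built on the OmniGibson simulator. It evaluates an agent's closed-loop ability to complete spatial reasoning tasks through active perception and physical interaction.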


# ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

Yining Hong*¹ Jiageng Liu*² Han Yin¹ Manling Li³ Leonidas Guibas¹ Fei-Fei Li¹ Jiajun Wu¹ Yejin Choi¹

¹Stanford University ²UCLA ³Northwestern University

ESI-Bench Teaser

## Overview

Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations and reason about how those observations change as a result of their actions. Rather than passively processing what is *seen*, they actively uncover what is *unseen*: occluded structure, dynamics, containment, and functionality that passive sensing alone cannot resolve.

**ESI-Bench** moves beyond earlier notions of spatial intelligence that assume perfect observations, and recasts the observer as an actor. Built on [OmniGibson](https://behavior.stanford.edu/omnigibson/), it is a comprehensive benchmark for embodied spatial intelligence that spans **10 task categories** and **29 subcategories** and is grounded in Spelke's core knowledge systems. Agents must decide which capabilities to deploy (perception, locomotion, and manipulation) and how to sequence them in order to actively accumulate task-relevant evidence.

### Key Findings

- **Active exploration substantially outperforms its passive counterparts**, and agents spontaneously discover emergent spatial strategies without explicit instruction.
- **Passive multi-view input adds noise rather than useful signal**, even though it consumes far more images.
- **Most failures stem from action blindness**: poor action choices lead to poor observations, which in turn trigger cascading errors.
- **Explicit 3D grounding stabilizes reasoning on depth-sensitive tasks**, but imperfect reconstruction proves more harmful than 2D baselines.
- **Models exhibit a metacognitive gap**: humans seek out disconfirming views and revise their beliefs when contradictions arise, whereas models commit prematurely with high confidence regardless of evidence quality.

## Repository Structure

```
esi-bench/
├── dataset/
│   └── json_clean/                  # Task question JSONs
│       ├── Action Sequencing/
│       ├── Cognitive Mapping/
│       ├── Enumerative Perception/
│       ├── Metric Comparison/
│       ├── Perceptual Grounding/
│       ├── Physical Dynamics/
│       ├── Physical Structure/
│       ├── Spatial Relations/
│       ├── Specular Reflection/
│       └── Temporal Understanding/
├── src/
│   ├── active_explore/              # Active exploration runner
│   │   ├── main.py
│   │   └── tasks/                   # Per-task modules
│   └── dataset_generation/          # Dataset construction scripts
│       └── (see Dataset Generation section below)
├── outputs/                         # Results and step images (git-ignored)
└── README.md
```

## Active Exploration

The active exploration module loads an OmniGibson scene, captures step images, calls a GPT or Gemini model, and writes `answer.json`.

### Environment Setup

**Note: as with [BEHAVIOR-1K](https://github.com/StanfordVL/BEHAVIOR-1K), the simulator only works properly on GPUs from the 40 series or earlier (e.g. 20 / 30 / 40 series). On a 50-series or Blackwell GPU, rendering quality will be very poor.**

Use the existing `behavior` conda environment:

```
source ~/miniconda3/etc/profile.d/conda.sh
conda activate behavior
```

Set the API key for the provider you use:

```
export OPENAI_API_KEY=...
export GEMINI_API_KEY=...
```

Make sure that the conda environment and your local machine setup include OmniGibson and the BEHAVIOR-1K assets.

#### Special OmniGibson Setup

Before running ESI-Bench, remove the walls from the maps that OmniGibson generates. In your local OmniGibson source tree, edit `./asset_pipeline/b1k_pipeline/usd_conversion/make_maps.py` so that `NEEDED_STRUCTURE_CATEGORIES` contains only the floor categories:

```
WALL_CATEGORIES = ["walls", "rail_fence"]
FLOOR_CATEGORIES = ["floors", "driveway", "lawn"]
DOOR_CATEGORIES = ["door", "sliding_door", "garage_door", "gate"]
IGNORE_CATEGORIES = ["carpet"]

# NEEDED_STRUCTURE_CATEGORIES = FLOOR_CATEGORIES + WALL_CATEGORIES
NEEDED_STRUCTURE_CATEGORIES = FLOOR_CATEGORIES
```

See [issue #1](https://github.com/ESI-Bench/ESI-Bench/issues/1).

### Running the Explorer

Run from the repository root:

```
python src/main.py \
  --task counting \
  --metadata "dataset/json_clean/Enumerative Perception/Spatial Segmentation/Merom_0_int/living_room_0/q_000.json" \
  --provider gemini \
  --model gemini-3.1-pro-preview \
  --max-steps 30 \
  --min-steps 1 \
  --threshold 0.9 \
  --results-root outputs/results \
  --step-image-root outputs/steps \
  --overwrite
```

For GPT:

```
python src/main.py \
  --task cognitivemap \
  --metadata "dataset/json_clean/Cognitive Mapping.json" \
  --question-index 0 \
  --provider gpt \
  --model gpt-5 \
  --max-steps 30 \
  --min-steps 1 \
  --threshold 0.9 \
  --results-root outputs/results \
  --step-image-root outputs/steps \
  --overwrite
```

`--metadata` can be either a single canonical question JSON under `dataset/json_clean` or a top-level task summary JSON that contains `json_paths` (for example `dataset/json_clean/Cognitive Mapping.json`). Use `--question-index` to select an entry from the summary list. For the mapping from each subtask to its `--task` name, summary JSON, and example `--question-index`, see [`docs/run_tasks.md`](docs/run_tasks.md).

### Task Names

`--task` names are the module names under `src/active_explore/tasks`:

```
action, angle_confusion, cognitivemap, counting, deformable, distance, line,
mirror, multiagent, occlusion, pour, size, slope, stacking, storage, touching,
transparent, triangle, unobserved_changes
```

The input JSON directories follow the categories in the ESI-Bench table:

```
Action Sequencing, Cognitive Mapping, Enumerative Perception, Metric Comparison,
Perceptual Grounding, Physical Dynamics, Physical Structure, Spatial Relations,
Specular Reflection, Temporal Understanding
```

### Output Format

The runner writes:

- `answer.json` under `--results-root`
- `step_*.png` under `--step-image-root`

## Dataset Generation

The dataset construction scripts for all task categories live under [`src/dataset_generation/`](src/dataset_generation/). Each task folder contains a Python script and a matching bash runner. To generate data, activate the `behavior` environment and run the bash script for the task you want:

```
source ~/miniconda3/etc/profile.d/conda.sh
conda activate behavior

# Example: generate occlusion data
bash src/dataset_generation/task_hallucination/batch_occlusion_yining.sh

# Example: generate slope/stacking data
bash src/dataset_generation/task_physics/batch_slope.sh
bash src/dataset_generation/task_physics/batch_stack.sh
```

Before running any script that calls a model, set your API key:

```
export OPENAI_API_KEY=...
export GEMINI_API_KEY=...
```

The task folders and their scripts are:

| Folder | Scripts |
|---|---|
| `task_action_sequencing` | `batch_action` |
| `task_capacity` | `batch_pour`, `batch_storage`, `batch_storage_multi`, `batch_water` |
| `task_cognitive_map` | `batch_cognitivemap_connect`, `batch_cognitivemap_merge`, `batch_cognitivemap_plan`, `batch_cognitivemap_region` |
| `task_comparison` | `batch_distance`, `batch_size`, `batch_size_robot` |
| `task_confusing_relation` | `batch_equilateral`, `batch_isosceles`, `batch_randomtriangle`, `batch_line`, `batch_line_positive`, `batch_touching`, `batch_touching_false`, `batch_touching_real` |
| `task_counting` | `batch_counting_merge` |
| `task_deformable` | `batch_deformable` |
| `task_hallucination` | `batch_angle_confusion`, `batch_angle_confusion_yining`, `batch_dependency`, `batch_occlusion`, `batch_occlusion_yining`, `batch_transparent`, `batch_transparent_false` |
| `task_mirror` | `batch_mirror_correspondence`, `batch_mirror_distance`, `batch_mirror_merge`, `batch_mirror_object_reality` |
| `task_multi_agent` | `batch_multi_agent` |
| `task_physics` | `batch_slope`, `batch_stack` |
| `task_unobserved_changes` | `batch_unobserved_changes` |

## Citation

If you find ESI-Bench useful in your research, please cite:

```
@inproceedings{hong2026esibench,
  title  = {{ESI-Bench}: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop},
  author = {Hong, Yining and Liu, Jiageng and Yin, Han and Li, Manling and Guibas, Leonidas and Li, Fei-Fei and Wu, Jiajun and Choi, Yejin},
  year   = {2026}
}
```

We also build on BEHAVIOR-1K and OmniGibson. Please cite them as well:

```
@inproceedings{li2023behavior1k,
  title     = {{BEHAVIOR-1K}: A Benchmark for Embodied {AI} with 1,000 Everyday Activities and Realistic Simulation},
  author    = {Li, Chengshu and Zhang, Ruohan and Wong, Josiah and Gokmen, Cem and Srivastava, Sanjana and Mart{\'i}n-Mart{\'i}n, Roberto and Wang, Chen and Levine, Gabrael and Lingelbach, Michael and Sun, Jiankai and Anvari, Mona and Hwang, Minjune and Sharma, Manasi and Aydin, Arman and Bansal, Dhruva and Hunter, Samuel and Kim, Kyu-Young and Lou, Alan and Matthews, Caleb R and Villa-Renteria, Ivan and Tang, Jerry Huayang and Tang, Claire and Xia, Fei and Savarese, Silvio and Gweon, Hyowon and Liu, Karen and Wu, Jiajun and Fei-Fei, Li},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  series    = {Proceedings of Machine Learning Research},
  volume    = {205},
  pages     = {80--93},
  publisher = {PMLR},
  year      = {2023}
}

@inproceedings{li2022omnigibson,
  title     = {{OmniGibson}: A Platform for Accelerating Embodied {AI} Research Built upon {NVIDIA}'s Omniverse Engine},
  author    = {Li, Chengshu and Gokmen, Cem and Lingelbach, Michael and Srivastava, Sanjana and Mart{\'i}n-Mart{\'i}n, Roberto and Ber, Daniel and Shen, William and Hirose, Noriaki and Zhang, Ruohan and Liu, Karen and Gweon, Hyowon and Savarese, Silvio and Fei-Fei, Li and Wu, Jiajun},
  booktitle = {Proceedings of The 6th Conference on Robot Learning},
  year      = {2022}
}
```

## License

This project is licensed under the MIT License. See [LICENSE](LICENSE) for details.
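
As a supplement to the run commands shown earlier, the following is a minimal sketch of how one might generate one command per question from a task summary JSON. It is not part of the repository. It uses only the flags that appear in the commands above, and it assumes that the summary JSON holds a top-level list under `json_paths`. The helper names `build_commands` and `to_shell` are hypothetical.

```python
import json
import shlex
from pathlib import Path


def build_commands(summary_json, task, provider, model, indices=None):
    """Return one argument list per selected question in a summary JSON.

    Assumption: the summary file has a top-level "json_paths" list, and
    --question-index selects an entry from that list by position.
    """
    summary = json.loads(Path(summary_json).read_text(encoding="utf-8"))
    n_questions = len(summary["json_paths"])
    if indices is None:
        indices = range(n_questions)

    commands = []
    for i in indices:
        if not 0 <= i < n_questions:
            raise IndexError(f"question index {i} is outside 0..{n_questions - 1}")
        commands.append([
            "python", "src/main.py",  # entry point as written in the run commands
            "--task", task,
            "--metadata", str(summary_json),
            "--question-index", str(i),
            "--provider", provider,
            "--model", model,
            "--max-steps", "30",
            "--min-steps", "1",
            "--threshold", "0.9",
            "--results-root", "outputs/results",
            "--step-image-root", "outputs/steps",
            "--overwrite",
        ])
    return commands


def to_shell(argv):
    """Render an argument list as one shell-safe line (quotes paths with spaces)."""
    return shlex.join(argv)
```

Called from the repository root as `build_commands("dataset/json_clean/Cognitive Mapping.json", "cognitivemap", "gpt", "gpt-5")`, this yields argument lists that can be passed to `subprocess.run` or printed with `to_shell` for review. Keeping the arguments as a list avoids quoting problems with directory names that contain spaces.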

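The runner's outputs described earlier (`answer.json` under `--results-root` and `step_*.png` under `--step-image-root`) can be gathered after a batch of runs with a short script. This is a sketch under stated assumptions: the directory nesting beneath the two roots is not documented here, so the search is recursive, and `answer.json` is assumed to be valid JSON with an unspecified schema. The function name `summarize_outputs` is hypothetical.

```python
import json
from pathlib import Path


def summarize_outputs(results_root, step_image_root):
    """Collect every answer.json and step_*.png below the two output roots.

    Returns (answers, step_images): answers maps each answer.json path,
    relative to results_root, to its parsed content; step_images is a
    sorted list of image paths.
    """
    results_root = Path(results_root)
    step_image_root = Path(step_image_root)

    answers = {}
    # Recursive search, because the layout below the root is an assumption.
    for path in sorted(results_root.rglob("answer.json")):
        with path.open(encoding="utf-8") as f:
            answers[path.relative_to(results_root).as_posix()] = json.load(f)

    step_images = sorted(step_image_root.rglob("step_*.png"))
    return answers, step_images
```

With the roots used in the commands above, `summarize_outputs("outputs/results", "outputs/steps")` returns the parsed answers and the list of step images, which is enough to confirm that every run produced an answer before scoring.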
Built on OmniGibson · Stanford University · UCLA · Northwestern University

