GitHub: NVlabs/Eagle
# 🦅 Eagle: Frontier Vision-Language Models with Data-Centric Strategies
## Updates
- [2026/05] 🔥 Release [LocateAnything](./Embodied/) — A generalist vision-language grounding model based on Eagle.
- [2025/12] 🎉 A native resolution variant of the Eagle model is adopted as the VLM backbone of [GR00T-N1.6](https://huggingface.co/collections/nvidia/gr00t-n16). Check out the [tech blog](https://research.nvidia.com/labs/gear/gr00t-n1_6/) for more details.
- [2025/10] 🔥 Release Eagle 2.5 [source code](https://github.com/NVlabs/EAGLE/tree/main/Eagle2_5).
- [2025/09] 🔥 Eagle 2.5 is accepted to [NeurIPS 2025](https://neurips.cc/Conferences/2025).
- [2025/09] 🎉 Eagle 2 is supported in [Torch-TRT](https://github.com/pytorch/TensorRT/tree/main/tools/llm).
- [2025/07] 🎉 Release Eagle 2.5 [model](https://huggingface.co/nvidia/Eagle2.5-8B).
- [2025/06] 🔥 Eagle 2.5 is adopted as the VLM backbone of [GR00T-N1.5](https://huggingface.co/nvidia/GR00T-N1.5-3B). Check out the [tech blog](https://research.nvidia.com/labs/gear/gr00t-n1_5/) for more details.
- [2025/04] 🎉 Release Eagle 2.5 [tech report](https://arxiv.org/abs/2504.15271).
- [2025/03] 🔥 Eagle 2 is adopted as the VLM backbone (System-2) of [GR00T-N1](https://youtu.be/m1CH-mgpdYg?si=WSDroL32HHE9xYSN). Check out the [GTC launch](https://www.youtube.com/watch?v=_waPvOwL9Z8&t=6935s) and [white paper](https://arxiv.org/abs/2503.14734) for more details.
- [2025/01] 🎉 Release Eagle 2 [tech report](http://arxiv.org/abs/2501.14818) and [models](https://huggingface.co/collections/nvidia/eagle).
- [2025/01] 🎉 [Eagle](./Eagle/README.md) is accepted as an [ICLR 2025](https://iclr.cc) Spotlight.
- [2024/08] 🎉 Release [Eagle](./Eagle/README.md).
## Resources
### 🌟 Get Started
- 📚 [Getting started with LocateAnything](./Embodied/README.md)
- 📚 [Getting started with Eagle 2.5](./Eagle2_5/document/0.onboarding.md)
- 📚 [README for Eagle](./Eagle/README.md)
## The Eagle VLM Family
**Eagle** is a family of frontier vision-language models (VLMs) from NVIDIA that explore data-centric strategies across general-purpose multimodal understanding, long-context reasoning, and embodied applications.
Beyond advancing multimodal foundation model research, Eagle has also served as a research and development platform that supports multiple flagship NVIDIA efforts across enterprise intelligence and Physical AI, including but not limited to [Llama-Nemotron Nano VLM](https://huggingface.co/blog/nvidia/llama-nemotron-nano-vl), [Nemotron VLMs](https://www.nvidia.com/en-us/ai-data-science/foundation-models/nemotron/) ([V2 Nano VL](https://arxiv.org/abs/2511.03929) / [V3 Nano Omni](https://arxiv.org/abs/2604.24954)), [NeMo Retriever](https://developer.nvidia.com/nemo-retriever) ([Llama Nemoretriever Colembed](https://arxiv.org/abs/2507.05513)), [NVIDIA Isaac GR00T](https://developer.nvidia.com/isaac/gr00t) [N1](https://developer.nvidia.com/blog/accelerate-generalist-humanoid-robot-development-with-nvidia-isaac-gr00t-n1/) ([Tech Report](https://arxiv.org/abs/2503.14734)) / [N1.5](https://research.nvidia.com/labs/gear/gr00t-n1_5/) / [N1.6](https://developer.nvidia.com/blog/building-generalist-humanoid-capabilities-with-nvidia-isaac-gr00t-n1-6-using-a-sim-to-real-workflow/), and [Cosmos](https://www.nvidia.com/en-us/ai/cosmos/).
| Model | Features | Summary |
|:--|:--|:--|
| **[LocateAnything](./Embodied/)** | Generalist grounding, detection, and pointing | Fast, high-quality vision-language grounding with Parallel Box Decoding |
| **[Eagle 2.5](./Eagle2_5/)** | Frontier VLM with SOTA image & video understanding | Framework & data strategy for long-context multimodal understanding |
| **[Eagle 2](./Eagle/README.md)** | Frontier VLM with SOTA image understanding | Exploring the post-training data strategies for frontier VLMs |
| **[Eagle](./Eagle/README.md)** | VLMs with mixture-of-encoders | Exploring the design space for vision-centric VLMs |
## Features and Capabilities
### Generalist Grounding & Understanding
**Dense Object Detection**

LocateAnything performs diverse localization tasks under a unified VLM: document understanding, GUI grounding, dense object detection, and OCR.

**Fast Decoding Speed**

Parallel Box Decoding (PBD) vs. Quantized Coordinate Decoding: PBD predicts each bounding box atomically in a single forward pass for substantially faster throughput.
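The throughput argument can be illustrated with a toy step count. This is only a sketch under stated assumptions, not the LocateAnything implementation: it assumes quantized coordinate decoding emits one token per coordinate (four per box, one decoding step each) and that coordinates are normalized to `[0, 1]` and binned into 1000 buckets. The names `Box`, `quantize`, `quantized_decoding_steps`, and `parallel_box_decoding_steps` are hypothetical.

```python
from dataclasses import dataclass

# Assumption for illustration: quantized coordinate decoding emits one token
# per coordinate (x1, y1, x2, y2), and each token costs one decoding step.
COORD_TOKENS_PER_BOX = 4


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float


def quantize(value: float, num_bins: int = 1000) -> int:
    """Map a normalized coordinate in [0, 1] to a discrete bin index.

    The bin count is an assumed value, chosen only for this example.
    """
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * num_bins), num_bins - 1)


def quantized_decoding_steps(boxes: list[Box]) -> int:
    """Decoding steps when every coordinate is its own autoregressive token."""
    return len(boxes) * COORD_TOKENS_PER_BOX


def parallel_box_decoding_steps(boxes: list[Box]) -> int:
    """Decoding steps when each box is emitted atomically in one forward pass."""
    return len(boxes)


boxes = [
    Box(0.1, 0.2, 0.4, 0.6),
    Box(0.5, 0.5, 0.9, 0.95),
    Box(0.0, 0.0, 1.0, 1.0),
]
print(quantized_decoding_steps(boxes))           # 12
print(parallel_box_decoding_steps(boxes))        # 3
print([quantize(v) for v in (0.0, 0.25, 1.0)])   # [0, 250, 999]
```

Under these assumptions the step count drops by the number of coordinate tokens per box. Real speedups depend on the model and serving stack, so treat the ratio as intuition rather than a benchmark.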
### Long-Context Multimodal Understanding & Reasoning
**User Prompt:** Analyze the video and divide it into distinct sections. For each section, create a title using a few words or phrases, and a detailed caption describing the content of the section, and indicate when it begins in seconds. Multiple sections are separated by line breaks.

**VLM Captioning:**

- **1.05 seconds, Introduction**
  detailed caption: The clip begins with a majestic scene of a warrior standing atop a mountaintop, surrounded by mystical energy. The warrior is adorned in ornate, decorated armor with intricate patterns and elaborate designs. The background features a misty, ethereal landscape with mountains in the distance. The warrior then climbs a large stone adorned with intricate engravings. The stone is massive, set amidst a mystical and serene environment with other towering rocks nearby. As the warrior ascends, the camera zooms in to reveal the detailed engravings on the stone, showcasing its ancient and mystical nature. The scene transitions to the warrior reaching the top of the stone and spreading his arms wide, standing victorious. Suddenly, the focus shifts to a mystical figure in flowing robes, who appears to be a sage or a mystical character, standing amidst the rocky landscape. This figure is illuminated by an otherworldly light, suggesting his power and wisdom. The clip concludes with this mystical character appearing calm and serene, hinting at a connection to the warrior and the stone, underscoring themes of power, wisdom, and mystical journey.
- **5.99 seconds, Explaining game genre**
  detailed caption: The clip begins with a character dressed in ornate armor, moving stealthily through a forest and up a mountain. The camera focuses on the intricate details of the armor, which has elaborate patterns and is adorned with gold accents. As the character ascends, a large, mystical landscape with towering trees and distant mountains comes into view. The character is then seen standing on the mountaintop, with an ethereal glow surrounding them. The camera shifts to show a wide view of the sky with clouds, creating a dramatic backdrop. Next, the character engages in combat, wielding a large weapon amidst an army dressed similarly, with a focus on their coordinated attack. The action intensifies as the character fights against a large, stone statue, which has a menacing expression and rough, textured surface. The clip continues with another character in white robes, who seems to be casting spells or invoking some form of power. The environment transitions to a snowy battlefield where the main character battles against another warrior, engaging in dynamic combat moves. The scene is filled with dramatic lighting effects, showing the two warriors clashing amidst a snowy landscape with large statues looming in the background. The clip concludes with the two characters continuing their intense battle, with the main character executing elaborate and powerful strikes.
- **517.10 seconds, Discussing controls**
  detailed caption: The clip begins with two warriors locked in combat, using long, decorated staffs to strike each other in an icy landscape. The scene is intense, with the warriors demonstrating precise and powerful movements. As they clash, the camera shifts to focus on a mystical figure wearing ornate armor and a crown, who appears to be engaging in a magical ritual or challenge. The warrior in white robes seems to be performing a spell, summoning glowing lights and energy. The camera captures the intricate details of the armor, showing dragon and lion motifs intricately etched into the metal. As the scene progresses, the warrior in white robes is seen riding a large, mystical beast, which attacks the main warrior, creating a swirling vortex of energy. The background features a mountainous landscape, adding to the epic and fantastical atmosphere. The clip concludes with the warrior in white robes standing triumphantly, holding his staff aloft, as the spell they cast manifests before them, demonstrating their power and mastery over the mystical forces.
- **614.53 seconds, Talking about story & characters**
  detailed caption: The clip begins with a wide shot of a vast forest with tall trees, and mountains visible in the distance partially covered by mist. The sky is bright and clear, suggesting a peaceful morning. The scene then transitions to a close-up of a mystical, swirling black rock with intricate patterns upon its surface. A hand, presumably belonging to a character named Wukong, reaches out and touches the rock, causing it to shift and reveal a hidden passage. As the hand touches the rock, a transformation appears to begin, symbolizing awakening or revelation. Next, the scene shifts to a different location where Wukong stands on a rocky terrain with trees and a mountainous backdrop, suggesting a connection to nature and ancient wisdom. Wukong is dressed in simple, flowing robes, and with an air of contemplation, gazes off into the distance. The clip transitions to another scene where Wukong is now seen in a more rugged outfit, equipped with a large staff, indicating a readiness for adventure or confrontation. The clip captures the essence of a journey filled with mystery, self-discovery, and the pursuit of enlightenment. Throughout the clip, text appears in Chinese characters, providing context and narrative to the unfolding story, including the names 'Wukong' (in both the original script and pinyin) and 'Peng Lao Dang,' adding depth to the characters and their world.
- **698.69 seconds, Describing visuals & sound**
  detailed caption: The clip begins with an ancient stone structure partially submerged in the ocean, its smooth, curved surface hinting at its historical significance. As the camera pans out, large waves crash against the structure, creating a powerful and dramatic atmosphere. The water is a deep blue, and the waves crash with great force, splashing white foam around. In the next scene, the setting shifts to a lush forest at dusk, where dense foliage is illuminated by golden light filtering through the trees. The scene captures the tranquility and beauty of nature, with the play of light and shadow creating a mystical ambiance. The forest then transforms into a rocky mountainous terrain, where a waterfall cascades down a cliff. The water is clear, and the rocks are rugged, emphasizing the raw power of nature. Finally, the clip returns to the mystical realm, where a large, ancient stone stands in a serene landscape. This stone is covered in intricate carvings and stands as a testament to the enduring legacy of the place. The camera lingers on the details of the carvings, highlighting the artistry and craftsmanship of the ancient civilization. The clip concludes with a close-up of the stone, capturing the fine details and the texture of the carvings, leaving a lasting impression of the historical and natural beauty depicted throughout the clip.
- **738.62 seconds, Closing**
  detailed caption: The clip begins with a character with white hair and a beard, wearing a loose robe and a beaded necklace, extending their hand towards a large, swirling rock on a rocky mountain. The rock features intricate designs that appear to be swirling patterns. The action proceeds as the character touches the rock, and the rock starts to glow with a mystical light, suggesting a magical transformation or unlocking of power. As the character's hand makes contact with the rock, the surroundings become more luminous, highlighting the magical ambiance. The scene transitions to the character walking away from the rock, moving down a series of rocky steps into a valley. As the character descends, more characters dressed in similar attire come into view, implying a gathering or a quest. The video captures the mystical atmosphere of the mountainous landscape, emphasizing the character's interaction with the rock and the mystical surroundings.
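Raw captioning output in this shape can be turned into structured chapters for downstream use. The following is a minimal sketch, assuming each section starts with `<seconds> seconds, <title>` followed by `detailed caption:` and the caption text. That pattern is inferred from the sample above and is not a documented output contract. `Section`, `SECTION_RE`, and `parse_sections` are hypothetical names, and the short `raw_output` string is made-up test input, not model output.

```python
import re
from dataclasses import dataclass

# Assumed section shape: "<seconds> seconds, <title>", then "detailed caption:",
# then the caption, which runs until the next "<seconds> seconds," or the end.
SECTION_RE = re.compile(
    r"(?P<start>\d+(?:\.\d+)?) seconds,\s*(?P<title>.+?)\s*detailed caption:\s*"
    r"(?P<caption>.*?)(?=\s*\d+(?:\.\d+)? seconds,|\Z)",
    re.DOTALL,
)


@dataclass(frozen=True)
class Section:
    start_seconds: float
    title: str
    caption: str


def parse_sections(text: str) -> list[Section]:
    """Split captioning output into (start time, title, caption) records."""
    return [
        Section(float(m["start"]), m["title"].strip(), m["caption"].strip())
        for m in SECTION_RE.finditer(text)
    ]


# Made-up input that follows the assumed shape.
raw_output = (
    "1.05 seconds, Introduction\n"
    "detailed caption: A warrior stands on a mountaintop.\n"
    "5.99 seconds, Explaining game genre\n"
    "detailed caption: The character fights a stone statue."
)

for section in parse_sections(raw_output):
    print(f"{section.start_seconds:>8.2f}s  {section.title}")
```

One limitation of this sketch: a caption that itself contains text like `10 seconds,` would be split early, so production parsing would need a stricter delimiter.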
### Applications across Virtual and Physical Worlds
**Document & General VQA**

**Question:** Based on this graph, how did the population of Puerto Rico change over the last decade?

**Answer:** The graph illustrates the population growth of each state in the United States between 2010 and 2020. The majority of states experienced an increase, with only two experiencing a decline. Puerto Rico is among these two states. The number of people living in Puerto Rico decreased by approximately 365,000 over the last decade. This represents a decrease of about 12%.
**3D Perception & Spatial Intelligence**

Examples of 3D object detection in the wild by LocateAnything3D.

**Smart City & Metropolis**

An example of zero-shot ultra-dense pedestrian detection in the wild at a road crossing in Shibuya, Tokyo, one of the busiest areas in the world.