MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
- Reuben Tan, Microsoft
The video introduces MindJourney, a framework that improves the spatial reasoning of Vision-Language Models (VLMs), which excel at interpreting single images but struggle to infer the underlying three-dimensional world. Given a spatial reasoning question, MindJourney lets the VLM “imagine” moving through the scene: the VLM proposes trajectories in a simulated imagination space, and a world model generates novel views along those paths, expanding the available observations beyond the single input image. With this richer 3D context, the VLM can answer questions that were previously difficult for it.
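The loop described above can be sketched roughly as follows. This is a minimal illustration, not MindJourney's actual implementation: the function `imagine_and_answer`, the action names, and the toy `propose`, `render`, and `answer` callables are all hypothetical stand-ins for the VLM and the world model, and views are represented as strings rather than images.

```python
from typing import Callable, List, Sequence, Tuple

Action = str  # hypothetical action vocabulary, e.g. "forward", "turn_left"
View = str    # stand-in for an image; a real system would use pixel data


def imagine_and_answer(
    question: str,
    start_view: View,
    propose: Callable[[str, View], List[List[Action]]],
    render: Callable[[View, Action], View],
    answer: Callable[[str, Sequence[View]], str],
) -> Tuple[str, List[View]]:
    """Gather imagined views along proposed trajectories, then answer."""
    views = [start_view]
    # The VLM (here: `propose`) suggests trajectories for this question.
    for trajectory in propose(question, start_view):
        current = start_view  # every trajectory starts at the input image
        for action in trajectory:
            # The world model (here: `render`) produces the next novel view.
            current = render(current, action)
            views.append(current)
    # The VLM answers using the original image plus all imagined views.
    return answer(question, views), views


# Toy stand-ins (assumptions) so the sketch runs end to end.
def toy_propose(question: str, view: View) -> List[List[Action]]:
    return [["forward", "turn_left"], ["turn_right"]]


def toy_render(view: View, action: Action) -> View:
    return f"{view}>{action}"


def toy_answer(question: str, views: Sequence[View]) -> str:
    return f"answered using {len(views)} views"


result, views = imagine_and_answer(
    "Is the chair to the left of the table?",
    "img0",
    toy_propose,
    toy_render,
    toy_answer,
)
print(result)
```

In this sketch the single input image grows into four observations (the original plus three imagined views), which is the sense in which the world model expands what the VLM can look at before answering.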
Reuben Tan
Senior Researcher
Watch next
Magma: A foundation model for multimodal AI Agents
- Jianwei Yang
Panel: AI Frontiers
- Ashley Llorens
- Sébastien Bubeck
- Ahmed Awadallah