Structured 3D Latents for Scalable and Versatile 3D Generation

Jianfeng Xiang; Zelong Lv; Sicheng Xu; Yu Deng; Ruicheng Wang; Bowen Zhang; Dong Chen; Xin Tong; Jiaolong Yang

CVPR 2025 | May 2025

We introduce a novel 3D generation method for versatile and high-quality 3D asset creation. The cornerstone is a unified Structured LATent (SLAT) representation which allows decoding to different output formats, such as Radiance Fields, 3D Gaussians, and meshes. This is achieved by integrating a sparsely-populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model, comprehensively capturing both structural (geometry) and textural (appearance) information while maintaining flexibility during decoding. We employ rectified flow transformers tailored for SLAT as our 3D generation models and train models with up to 2 billion parameters on a large 3D asset dataset of 500K diverse objects. Our model generates high-quality results with text or image conditions, significantly surpassing existing methods, including recent ones at similar scales. We showcase flexible output format selection and local 3D editing capabilities which were not offered by previous models. Code, model, and data will be released.
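As a rough illustration of what "a sparsely-populated 3D grid with dense visual features" can mean in practice, the sketch below stores one feature vector per occupied voxel and leaves empty space unrepresented. It is not the authors' implementation: the function name `build_sparse_latent`, the dictionary layout, the default resolution, and the simple averaging rule are all assumptions made for illustration, and the input features stand in for whatever a vision foundation model would provide.

```python
from collections import defaultdict


def build_sparse_latent(points, features, resolution=64):
    """Average per-point features into the voxels the points fall in.

    Hypothetical helper, not part of any released code.
    points:   list of (x, y, z) tuples, assumed to lie in [0, 1].
    features: list of equal-length lists of floats, one per point.
    Returns a dict {(i, j, k): mean_feature} containing only occupied voxels.
    """
    if len(points) != len(features):
        raise ValueError("points and features must have the same length")

    sums = {}
    counts = defaultdict(int)
    for point, feat in zip(points, features):
        # Clamp so a coordinate of exactly 1.0 still lands in the last voxel.
        idx = tuple(min(int(c * resolution), resolution - 1) for c in point)
        if idx not in sums:
            sums[idx] = [0.0] * len(feat)
        for d, value in enumerate(feat):
            sums[idx][d] += value
        counts[idx] += 1

    # Only voxels that received at least one point appear in the result.
    return {idx: [s / counts[idx] for s in vec] for idx, vec in sums.items()}


if __name__ == "__main__":
    pts = [(0.10, 0.10, 0.10), (0.11, 0.10, 0.10), (0.90, 0.90, 0.90)]
    feats = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
    latent = build_sparse_latent(pts, feats, resolution=4)
    for voxel, vec in sorted(latent.items()):
        print(voxel, vec)
```

Keeping only occupied voxels is the design point here: memory grows with the number of active cells rather than with the cube of the grid resolution, and each cell can carry an appearance feature alongside its position.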

Publication Downloads

TRELLIS

August 1, 2025

TRELLIS is a large 3D asset generation model that creates high-quality 3D assets from simple text or image inputs. Using a unified latent space (SLAT), it delivers detailed, textured 3D models in formats like meshes, radiance fields, and 3D Gaussians. Its flexibility, editing capabilities, and superior quality enable faster, more adaptable workflows in gaming, virtual worlds, industrial design, and beyond.
