FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors

Abstract

Text-driven object insertion in 3D scenes is an emerging task that enables intuitive scene editing through natural language. Despite its potential, existing 2D editing-based methods often rely on spatial priors such as 2D masks or 3D bounding boxes, and they struggle to keep the inserted object consistent. These constraints hinder flexibility and scalability in real-world applications.

In this paper, we propose FreeInsert, a novel framework that leverages foundation models (MLLMs, LGM, and a diffusion model) to disentangle object generation from spatial placement, enabling unsupervised and flexible object insertion in 3D scenes without spatial priors. FreeInsert begins with an MLLM-based parser that extracts structured semantics (object types, spatial relationships, and attachment regions) from user instructions. These semantics guide both the reconstruction of the inserted object for 3D consistency and the learning of its degrees of freedom. We first leverage the spatial reasoning capabilities of MLLMs to initialize the object's pose and scale. To further enhance natural integration with the scene, a hierarchical spatially-aware stage then refines the object's placement, incorporating both the spatial semantics and the priors inferred by the MLLM. Finally, the object's appearance is enhanced using an inserted-object image to improve visual fidelity.
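
To make the two disentangled pieces concrete, the following is a minimal illustrative sketch, not the FreeInsert implementation. It assumes the parser returns a JSON reply with hypothetical field names (`object_type`, `spatial_relation`, `attachment_region`), and it simplifies the object's degrees of freedom to a uniform scale, a yaw rotation about the up axis, and a translation. The names `InsertionSemantics`, `Placement`, `parse_semantics`, and `place_points` are introduced here for illustration only.

```python
import json
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class InsertionSemantics:
    """Hypothetical schema for the parser output (field names are assumptions)."""
    object_type: str
    spatial_relation: str
    attachment_region: str


@dataclass
class Placement:
    """Simplified degrees of freedom (an assumption): scale, yaw, translation."""
    scale: float
    yaw: float  # rotation about the z (up) axis, in radians
    translation: Point


def parse_semantics(mllm_reply: str) -> InsertionSemantics:
    """Parse an assumed JSON reply from the MLLM into structured semantics."""
    data = json.loads(mllm_reply)
    return InsertionSemantics(
        object_type=data["object_type"],
        spatial_relation=data["spatial_relation"],
        attachment_region=data["attachment_region"],
    )


def place_points(points: List[Point], placement: Placement) -> List[Point]:
    """Apply x' = scale * R_z(yaw) * x + t to each 3D point (e.g. Gaussian centers)."""
    cos_y, sin_y = math.cos(placement.yaw), math.sin(placement.yaw)
    tx, ty, tz = placement.translation
    placed = []
    for x, y, z in points:
        placed.append((
            placement.scale * (cos_y * x - sin_y * y) + tx,
            placement.scale * (sin_y * x + cos_y * y) + ty,
            placement.scale * z + tz,
        ))
    return placed


# Example: an instruction such as "put a teddy bear on the sofa" might be
# parsed into the reply below (the reply text itself is made up).
reply = '{"object_type": "teddy bear", "spatial_relation": "on", "attachment_region": "sofa"}'
semantics = parse_semantics(reply)
initial = Placement(scale=2.0, yaw=math.pi / 2, translation=(0.0, 0.0, 1.0))
moved = place_points([(1.0, 0.0, 0.0)], initial)
```

Keeping the semantics and the placement in separate structures mirrors the disentanglement described above: the object can be generated once from `object_type`, while only the few placement parameters need to be initialized and then refined.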

Experimental results demonstrate that FreeInsert enables semantically coherent, spatially precise, and visually realistic 3D insertions, without requiring any spatial priors, offering a user-friendly and flexible editing experience.

Pipeline

BibTeX

@inproceedings{li2025freeinsert,
  author    = {Li, Chenxi and Wang, Weijie and Li, Qiang and Lepri, Bruno and Sebe, Nicu and Nie, Weizhi},
  title     = {FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors},
  booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia},
  year      = {2025},
}