Mobrief

Research · Hugging Face Daily Papers

LAMP: Lift Image-Editing as General 3D Priors for Open-world Manipulation

Human-like generalization in open-world settings remains a fundamental challenge for robotic manipulation.

Apr 09, 2026 17:14 UTC · ~4 min read · Technical Source
  • Existing learning-based methods, including reinforcement learning, imitation learning, and vision-language-action models (VLAs), often struggle with novel tasks and unseen environments.
  • Another promising direction is to explore generalizable representations that capture fine-grained spatial and geometric relations for open-world manipulation.
  • While large language models (LLMs) and vision-language models (VLMs) provide strong semantic reasoning based on language or annotated 2D representations, their limited 3D awareness restricts their applicability to fine-grained manipulation.

Context

Existing learning-based methods, including reinforcement learning, imitation learning, and vision-language-action models (VLAs), often struggle with novel tasks and unseen environments. Another promising direction is to explore generalizable representations that capture fine-grained spatial and geometric relations for open-world manipulation. While large language models (LLMs) and vision-language models (VLMs) provide strong semantic reasoning based on language or annotated 2D representations, their limited 3D awareness restricts their applicability to fine-grained manipulation. To address this, the authors propose LAMP, which lifts image-editing as 3D priors to extract inter-object 3D transformations as continuous, geometry-aware representations. Their key insight is that image-editing inherently encodes rich 2D spatial cues, and lifting these implicit cues into 3D transformations provides fine-grained and accurate guidance for open-world manipulation. Extensive experiments demonstrate that LAMP delivers precise 3D transformations and achieves strong zero-shot generalization in open-world manipulation. Project page:…
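The summary does not include the paper's implementation, but the core step it describes — turning a before/after image pair into a 3D transformation — can be sketched in a minimal, illustrative way. Assuming object point clouds can be lifted from the original and the edited image (e.g., via a depth estimate and an object mask, both assumptions not specified here), the rigid inter-object transformation between them can be recovered with the standard Kabsch/SVD algorithm:

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Recover rotation R and translation t such that dst ≈ src @ R.T + t
    (Kabsch algorithm via SVD); src and dst are (N, 3) corresponding points."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Reflection correction keeps R a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Demo with synthetic data: points "lifted" from the original image,
# moved by a known 30° rotation about z plus a translation
# (standing in for the edit), then recovered.
rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.3])
moved = pts @ R_true.T + t_true
R_est, t_est = estimate_rigid_transform(pts, moved)
```

This is only a sketch of the geometric idea under the stated assumptions; the paper's actual pipeline for producing the edited views and lifting them to 3D is not described in the excerpt.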
