RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
What’s new (20 sec)
RoboSafe is a hybrid-reasoning runtime safeguard that protects VLM-powered embodied agents from hazardous instructions via executable, predicate-based safety logic.
Why it matters (2 min)
- Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable to hazardous instructions that may trigger unsafe behaviors.
- Runtime safety guardrails, which intercept hazardous actions during task execution, offer a promising solution due to their flexibility.
- Existing defenses often rely on static rule filters or prompt-level control, which struggle with implicit risks in dynamic, temporally dependent, context-rich environments.
- Open receipts to verify and go deeper.
Go deeper (8 min)
Context
Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable to hazardous instructions that may trigger unsafe behaviors. Runtime safety guardrails, which intercept hazardous actions during task execution, offer a promising solution due to their flexibility. However, existing defenses often rely on static rule filters or prompt-level control, which struggle to address implicit risks arising in dynamic, temporally dependent, and context-rich environments. To address this, we propose RoboSafe, a hybrid reasoning runtime safeguard for embodied agents through executable predicate-based safety logic. RoboSafe integrates two complementary reasoning processes on a Hybrid Long-Short Safety Memory. We first propose a Backward Reflective Reasoning module that continuously revisits recent trajectories in short-term memory to infer temporal safety predicates and proactively triggers replanning when violations are detected. We then propose a Forward Predictive Reasoning module that anticipates upcoming risks by generating context-aware safety predicates from the long-term safety memory and the agent's…
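The core idea, safety constraints expressed as executable predicates evaluated at runtime, can be sketched in a few lines. The sketch below is illustrative only: all names (`Action`, `SafetyGuard`, `no_heat_near_flammable`) are hypothetical and not from the RoboSafe codebase. It shows one way a temporal predicate over a short-term trajectory memory could veto a hazardous next action and trigger replanning, under the assumption that predicates are plain Python callables.

```python
# Minimal, hypothetical sketch of predicate-based runtime safety logic.
# Not the RoboSafe implementation; names and structure are assumptions.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Action:
    name: str    # e.g. "place", "turn_on"
    target: str  # e.g. "paper_towel", "stove"

# A safety predicate inspects the recent trajectory plus the proposed
# next action and returns True if the action is safe.
Predicate = Callable[[list[Action], Action], bool]

def no_heat_near_flammable(trajectory: list[Action], next_action: Action) -> bool:
    """Temporal predicate: forbid turning on the stove while a flammable
    object placed on it earlier in the trajectory is still there."""
    placed = any(a.name == "place" and a.target == "paper_towel" for a in trajectory)
    hazardous = next_action.name == "turn_on" and next_action.target == "stove"
    return not (placed and hazardous)

@dataclass
class SafetyGuard:
    predicates: list[Predicate] = field(default_factory=list)
    short_term: list[Action] = field(default_factory=list)  # recent trajectory

    def record(self, action: Action) -> None:
        """Append an executed action to short-term memory."""
        self.short_term.append(action)

    def check(self, next_action: Action) -> bool:
        """Veto the proposed action if any predicate is violated;
        a False result would trigger replanning upstream."""
        return all(p(self.short_term, next_action) for p in self.predicates)

guard = SafetyGuard(predicates=[no_heat_near_flammable])
guard.record(Action("place", "paper_towel"))
assert not guard.check(Action("turn_on", "stove"))  # hazardous: replan instead
```

In the paper, such predicates are not hand-written as here: the Backward Reflective Reasoning module infers temporal predicates from recent trajectories in short-term memory, while the Forward Predictive Reasoning module generates context-aware predicates from long-term safety memory.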
For builders
Scan the abstract and experiments; look for released code, datasets, and evaluation suites.
Verify
Prefer primary announcements, papers, repos, and changelogs over reposts.