ServiceNow recherche

Multi-modal Learning

Representing Positional Information in Generative World Models for Object Manipulation
The ability to predict outcomes of interactions between embodied agents and objects is paramount in the robotic setting. While …
VCR: Visual Caption Restoration
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially …
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks
The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though …
Multimodal foundation world models for generalist embodied agents
Learning generalist embodied agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement …
InCoRo: In-Context Learning for Robotics Control with Feedback Loops
One of the challenges in robotics is to enable robotic units with the reasoning capability that would be robust enough to execute …
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) have become integral in modern image rendering and graphic design applications due to their infinite …
Are Diffusion Models Vision-And-Language Reasoners?
Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, …
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. …
FigGen: Text to Scientific Figure Generation
The generative modeling landscape has experienced tremendous growth in recent years, particularly in generating natural images and art. …
Haptics-based Curiosity for Sparse-reward Tasks
Robots in many real-world settings have access to force/torque sensors in their gripper and tactile sensing is often necessary in tasks …