ServiceNow Research
Multi-modal Learning
VCR: Visual Caption Restoration
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially …
Tianyu Zhang, Suyuchen Wang, Lu Li, Ge Zhang, Perouz Taslakian, Sai Rajeswar Mudumba, Jie Fu, Bang Liu, Yoshua Bengio
Workshop at the Conference on Neural Information Processing Systems (NeurIPS), 2024.
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks
The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though …
Léo Boisvert, Megh Thakkar, Maxime Gasse, Massimo Caccia, Thibault Le Sellier De Chezelles, Quentin Cappart, Nicolas Chapados, Alexandre Lacoste, Alexandre Drouin
NeurIPS Datasets and Benchmarks Track (NeurIPS Datasets), 2024.
Multimodal foundation world models for generalist embodied agents
Learning generalist embodied agents, able to solve multitudes of tasks in different domains, is a long-standing problem. Reinforcement …
Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Aaron Courville, Sai Rajeswar Mudumba
Workshop at the International Conference on Machine Learning (ICML), 2024.
InCoRo: In-Context Learning for Robotics Control with Feedback Loops
One of the challenges in robotics is to enable robotic units with the reasoning capability that would be robust enough to execute …
Jiaquiang Ye Zhu, Carla Gomez, David Vazquez, Michal Drozdzal
arXiv, 2024.
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) have become integral in modern image rendering and graphic design applications due to their infinite …
Juan A. Rodriguez, Shubham Agarwal, Abhay Puri, Issam H. Laradji, Sai Rajeswar Mudumba, Pau Rodriguez, David Vazquez, Christopher Pal, Marco Pedersoli
arXiv, 2024.
Are Diffusion Models Vision-And-Language Reasoners?
Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, …
Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy
Conference on Neural Information Processing Systems (NeurIPS), 2023.
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. …
Oscar Manas, Pau Rodriguez, Saba Ahmadi, Aida Nematzadeh, Yash Goyal, Aishwarya Agrawal
European Chapter of the Association for Computational Linguistics (EACL), 2023.
FigGen: Text to Scientific Figure Generation
The generative modeling landscape has experienced tremendous growth in recent years, particularly in generating natural images and art. …
Juan A. Rodriguez, David Vazquez, Issam H. Laradji, Marco Pedersoli, Pau Rodriguez
International Conference on Learning Representations (ICLR), 2023.
Haptics-based Curiosity for Sparse-reward Tasks
Robots in many real-world settings have access to force/torque sensors in their gripper and tactile sensing is often necessary in tasks …
Sai Rajeswar Mudumba, Cyril Ibrahim, Nitin Surya, Florian Golemo, David Vazquez, Aaron Courville, Pedro O. Pinheiro
Conference on Robot Learning (CoRL), 2022.
Adaptive Cross-Modal Few-shot Learning
Metric-based meta-learning techniques have successfully been applied to few-shot classification problems. In this paper, we propose to …
Chen Xing, Negar Rostamzadeh, Boris N. Oreshkin, Pedro O. Pinheiro
Conference on Neural Information Processing Systems (NeurIPS), 2019.