ServiceNow AI Research
Multi-modal Learning
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation …
Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar Mudumba, David Vazquez, Christopher Pal, Marco Pedersoli
AAAI Demos, 2025.
PDF
Citation
Video
BigDocs: A Permissively-Licensed Dataset for Training Vision-Language Models on Document and Code Tasks
Vision and language models that can accurately understand both images and text are crucial for deeper document understanding. These …
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, Francois Savard, Amirhossein Abaskohi, Ahmed Masry, Shravan Nayak, Mahsa Massoud, Rabiul Awal, Pierre-André Noël, Mats L. Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Ying Zhang, Sathwik Tejaswi Madhusudhan, João Monteiro, Krishnamurthy (Dj) Dvijotham, Torsten Scholak, Nicolas Chapados, Sean Hughes, Tamer Özsu, Aishwarya Agrawal, Marco Pedersoli, Christopher Pal, Perouz Taslakian, David Vazquez, Issam H. Laradji, Spandana Gella, Sai Rajeswar Mudumba
Workshop at the Neural Information Processing Systems (NeurIPS), 2024.
PDF
Citation
Code
Video
Multimodal foundation world models for generalist embodied agents
Learning generalist agents, able to solve multitudes of tasks in different domains, is a long-standing problem. Reinforcement learning …
Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Aaron Courville, Sai Rajeswar Mudumba
Neural Information Processing Systems (NeurIPS), 2024.
PDF
Citation
Code
Representing Positional Information in Generative World Models for Object Manipulation
The ability to predict outcomes of interactions between embodied agents and objects is paramount in the robotic setting. While …
Stefano Ferraro, Pietro Mazzaglia, Tim Verbelen, Sai Rajeswar Mudumba
Workshop at the Neural Information Processing Systems (NeurIPS), 2024.
PDF
Citation
VCR: Visual Caption Restoration
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially …
Tianyu Zhang, Suyuchen Wang, Lu Li, Ge Zhang, Perouz Taslakian, Sai Rajeswar Mudumba, Jie Fu, Bang Liu, Yoshua Bengio
Workshop at the Neural Information Processing Systems (NeurIPS), 2024.
PDF
Citation
Code
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks
The ability of large language models (LLMs) to mimic human-like intelligence has led to a surge in LLM-based autonomous agents. Though …
Léo Boisvert, Megh Thakkar, Maxime Gasse, Massimo Caccia, Thibault Le Sellier De Chezelles, Quentin Cappart, Nicolas Chapados, Alexandre Lacoste, Alexandre Drouin
NeurIPS Datasets and Benchmarks Track (NeurIPS Datasets), 2024.
PDF
Citation
Code
Video
Multimodal foundation world models for generalist embodied agents
Learning generalist embodied agents, able to solve multitudes of tasks in different domains, is a long-standing problem. Reinforcement …
Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Aaron Courville, Sai Rajeswar Mudumba
Workshop at the International Conference on Machine Learning (ICML), 2024.
PDF
Citation
Code
InCoRo: In-Context Learning for Robotics Control with Feedback Loops
One of the challenges in robotics is to enable robotic units with the reasoning capability that would be robust enough to execute …
Jiaquiang Ye Zhu, Carla Gomez, David Vazquez, Michal Drozdzal
ArXiv, 2024.
PDF
Citation
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) have become integral in modern image rendering and graphic design applications due to their infinite …
Juan A. Rodriguez, Shubham Agarwal, Abhay Puri, Issam H. Laradji, Sai Rajeswar Mudumba, Pau Rodriguez, David Vazquez, Christopher Pal, Marco Pedersoli
ArXiv, 2024.
PDF
Citation
Are Diffusion Models Vision-And-Language Reasoners?
Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, …
Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy
Conference on Neural Information Processing Systems (NeurIPS), 2023.
PDF
Citation
Code