Multi-Modal Learning

InCoRo: In-Context Learning for Robotics Control with Feedback Loops

One of the challenges in robotics is to enable robotic units with the reasoning capability that would be robust enough to execute …

Jiaqiang Ye Zhu, Carla Gomez, David Vazquez, Michal Drozdzal

ArXiv, 2024.

StarVector: Generating Scalable Vector Graphics Code from Images and Text

Scalable Vector Graphics (SVGs) have become integral in modern image rendering and graphic design applications due to their infinite …

Juan A. Rodriguez, Shubham Agarwal, Abhay Puri, Issam H. Laradji, Sai Rajeswar Mudumba, Pau Rodriguez, David Vazquez, Christopher Pal, Marco Pedersoli

ArXiv, 2024.

Are Diffusion Models Vision-And-Language Reasoners?

Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, …

Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy

Conference on Neural Information Processing Systems (NeurIPS), 2023.

MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting

Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. …

Oscar Manas, Pau Rodriguez, Saba Ahmadi, Aida Nematzadeh, Yash Goyal, Aishwarya Agrawal

European Chapter of the Association for Computational Linguistics (EACL), 2023.

FigGen: Text to Scientific Figure Generation

The generative modeling landscape has experienced tremendous growth in recent years, particularly in generating natural images and art. …

Juan A. Rodriguez, David Vazquez, Issam H. Laradji, Marco Pedersoli, Pau Rodriguez

International Conference on Learning Representations (ICLR), 2023.

Haptics-based Curiosity for Sparse-reward Tasks

Robots in many real-world settings have access to force/torque sensors in their gripper and tactile sensing is often necessary in tasks …

Sai Rajeswar Mudumba, Cyril Ibrahim, Nitin Surya, Florian Golemo, David Vazquez, Aaron Courville, Pedro O. Pinheiro

Conference on Robot Learning (CoRL), 2022.

Adaptive Cross-Modal Few-shot Learning

Metric-based meta-learning techniques have successfully been applied to few-shot classification problems. In this paper, we propose to …

Chen Xing, Negar Rostamzadeh, Boris N. Oreshkin, Pedro O. Pinheiro

Conference on Neural Information Processing Systems (NeurIPS), 2019.

Neural Multisensory Scene Inference

For embodied agents to infer representations of the underlying 3D physical world they inhabit, they should efficiently combine …

Jae Hyun Lim, Pedro O. Pinheiro, Negar Rostamzadeh, Christopher Pal, Sungjin Ahn

Conference on Neural Information Processing Systems (NeurIPS), 2019.

Integrating Vision and Language in Social Networks for Identifying Visual Patterns of Personality Traits

Social media, as a major platform for communication and information exchange, is a rich repository of the opinions and sentiments of …

Pau Rodriguez, Jordi Gonzalez, Josep M. Gonfaus, Xavier Roca

International Journal of Social Science and Humanity (IJSSH), 2019.

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and …

Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio

International Conference on Learning Representations (ICLR), 2019.