ServiceNow AI Research
Tags
Multi-modal Learning
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Understanding diverse web data and automating web development presents an exciting challenge for agentic multimodal models. While …
Rabiul Awal, Mahsa Massoud, Zichao Li, Aarash Feizi, Suyuchen Wang, Christopher Pal, Aishwarya Agrawal, David Vazquez, Perouz Taslakian, Spandana Gella, Sai Rajeswar Mudumba
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025.
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
We present WebMMU, a multilingual benchmark that evaluates three core web tasks: (1) website visual question answering, (2) code …
Sai Rajeswar Mudumba, Christopher Pal, Perouz Taslakian, Spandana Gella, Rabiul Awal, Aarash Feizi, Mahsa Massoud, Zichao Li, Siva Reddy, David Vazquez, Suyuchen Wang
NOW AI, 2025.
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning
Charts are essential to data analysis, transforming raw data into clear visual representations that support human decision-making. …
Ahmed Masry, Abhay Puri, Masoud Hashemi, Juan A. Rodriguez, Megh Thakkar, Khyati Mahajan, Vikas Yadav, Sathwik Tejaswi Madhusudhan, Alexandre Piche, Dzmitry Bahdanau, Christopher Pal, David Vazquez, Enamul Hoque Prince, Perouz Taslakian, Sai Rajeswar Mudumba, Spandana Gella
Conference on Language Modeling (COLM), 2025.
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Developing autonomous agents that can navigate diverse Graphical User Interfaces (GUIs) and solve complex tasks is essential for …
Shravan Nayak, Xiangru Jian, Kevin Lin, Juan A. Rodriguez, Motek Kalsi, Nicolas Chapados, Tamer Özsu, Aishwarya Agrawal, David Vazquez, Christopher Pal, Perouz Taslakian, Spandana Gella, Sai Rajeswar Mudumba
International Conference on Machine Learning (ICML), 2025.
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation …
Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar Mudumba, David Vazquez, Christopher Pal, Marco Pedersoli
Computer Vision and Pattern Recognition (CVPR), 2025.
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models …
Ahmed Masry, Juan A. Rodriguez, Tianyu Zhang, Suyuchen Wang, Chao Wang, Aarash Feizi, Akshay Kalkunte, Abhay Puri, Xiangru Jian, Pierre-André Noël, Sathwik Madhusudhan, Marco Pedersoli, Bang Liu, Nicolas Chapados, Yoshua Bengio, Enamul Hoque Prince, Christopher Pal, Issam H. Laradji, David Vazquez, Perouz Taslakian, Spandana Gella, Sai Rajeswar Mudumba
Workshop at the International Conference on Learning Representations (ICLR), 2025.
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding …
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, Francois Savard, Ahmed Masry, Shravan Nayak, Rabiul Awal, Mahsa Massoud, Amirhossein Abaskohi, Zichao Li, Suyuchen Wang, Pierre-André Noël, Mats L. Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Sara Shanian, Ying Zhang, Sathwik Tejaswi Madhusudhan, João Monteiro, Krishnamurthy (Dj) Dvijotham, Torsten Scholak, Nicolas Chapados, Sepideh Kharaghani, Sean Hughes, Tamer Özsu, Siva Reddy, Marco Pedersoli, Yoshua Bengio, Christopher Pal, Issam H. Laradji, Spandana Gella, Perouz Taslakian, David Vazquez, Sai Rajeswar Mudumba
International Conference on Learning Representations (ICLR), 2025.
VCR: Visual Caption Restoration
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially …
Tianyu Zhang, Suyuchen Wang, Lu Li, Ge Zhang, Perouz Taslakian, Sai Rajeswar Mudumba, Jie Fu, Bang Liu, Yoshua Bengio
International Conference on Learning Representations (ICLR), 2025.
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation …
Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar Mudumba, David Vazquez, Christopher Pal, Marco Pedersoli
AAAI Demos, 2025.
BigDocs: A Permissively-Licensed Dataset for Training Vision-Language Models on Document and Code Tasks
Vision and language models that can accurately understand both images and text are crucial for deeper document understanding. These …
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, Francois Savard, Amirhossein Abaskohi, Ahmed Masry, Shravan Nayak, Mahsa Massoud, Rabiul Awal, Pierre-André Noël, Mats L. Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Ying Zhang, Sathwik Tejaswi Madhusudhan, João Monteiro, Krishnamurthy (Dj) Dvijotham, Torsten Scholak, Nicolas Chapados, Sean Hughes, Tamer Özsu, Aishwarya Agrawal, Marco Pedersoli, Christopher Pal, Perouz Taslakian, David Vazquez, Issam H. Laradji, Spandana Gella, Sai Rajeswar Mudumba
Workshop at the Conference on Neural Information Processing Systems (NeurIPS), 2024.