ServiceNow AI Research
Tags
Multi-modal Learning
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models …
Ahmed Masry, Juan A. Rodriguez, Tianyu Zhang, Suyuchen Wang, Chao Wang, Aarash Feizi, Akshay Kalkunte, Abhay Puri, Xiangru Jian, Pierre-André Noël, Sathwik Madhusudhan, Marco Pedersoli, Bang Liu, Nicolas Chapados, Yoshua Bengio, Enamul Hoque Prince, Christopher Pal, Issam H. Laradji, David Vazquez, Perouz Taslakian, Spandana Gella, Sai Rajeswar Mudumba
Neural Information Processing Systems (NeurIPS), 2025.
Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Scalable Vector Graphics (SVG) offer a powerful format for representing visual designs as interpretable code. Recent advances in …
Juan A. Rodriguez, Haotian Zhang, Abhay Puri, Rishav Pramanik, Aarash Feizi, Pascal Wichmann, Arnab Mondal, Mohammad Reza Samsami, Rabiul Awal, Perouz Taslakian, Spandana Gella, Sai Rajeswar Mudumba, David Vazquez, Christopher Pal, Marco Pedersoli
Neural Information Processing Systems (NeurIPS), 2025.
The Promise of RL for Autoregressive Image Editing
While image generation techniques are now capable of producing high quality images that respect prompts which span multiple sentences, …
Saba Ahmadi, Rabiul Awal, Ankur Sikarwar, Amirhossein Kazemnejad, Ge Ya Luo, Juan A. Rodriguez, Sai Rajeswar Mudumba, Siva Reddy, Christopher Pal, Benno Krojer, Aishwarya Agrawal
Neural Information Processing Systems (NeurIPS), 2025.
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models …
Perouz Taslakian, Sai Rajeswar Mudumba, Spandana Gella, Ahmed Masry, Tianyu Zhang, Juan A. Rodriguez, Chao Wang, Abhay Puri, Xiangru Jian, Pierre-André Noël, Issam H. Laradji
NOW AI, 2025.
BigCharts-R1: Enhanced Chart Reasoning With Visual Reinforcement Finetuning
Chart understanding is critical to ServiceNow for data analysis and for reasoning over visualizations, such as interpreting trends, identifying …
Sai Rajeswar Mudumba, Perouz Taslakian, Ahmed Masry, David Vazquez, Christopher Pal, Abhay Puri, Megh Thakkar, Masoud Hashemi, Khyati Mahajan, Spandana Gella
NOW AI, 2025.
ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval
Retrieval-augmented generation has proven practical when models require specialized knowledge or access to the latest data. However, …
Ahmed Masry, Megh Thakkar, Patrice Béchard, Sathwik Madhusudhan, Rabiul Awal, Shambhavi Mishra, Akshay Kalkunte, Enamul Hoque Prince, Spandana Gella, Torsten Scholak, Sai Rajeswar Mudumba
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025.
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Understanding diverse web data and automating web development presents an exciting challenge for agentic multimodal models. While …
Rabiul Awal, Mahsa Massoud, Zichao Li, Aarash Feizi, Suyuchen Wang, Christopher Pal, Aishwarya Agrawal, David Vazquez, Perouz Taslakian, Spandana Gella, Sai Rajeswar Mudumba
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025.
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning
Charts are essential to data analysis, transforming raw data into clear visual representations that support human decision-making. …
Ahmed Masry, Abhay Puri, Masoud Hashemi, Juan A. Rodriguez, Megh Thakkar, Khyati Mahajan, Vikas Yadav, Sathwik Tejaswi Madhusudhan, Alexandre Piche, Dzmitry Bahdanau, Christopher Pal, David Vazquez, Enamul Hoque Prince, Perouz Taslakian, Sai Rajeswar Mudumba, Spandana Gella
Conference on Language Modeling (COLM), 2025.
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Developing autonomous agents that can navigate diverse Graphical User Interfaces (GUIs) and solve complex tasks is essential for …
Shravan Nayak, Xiangru Jian, Kevin Lin, Juan A. Rodriguez, Motek Kalsi, Nicolas Chapados, Tamer Özsu, Aishwarya Agrawal, David Vazquez, Christopher Pal, Perouz Taslakian, Spandana Gella, Sai Rajeswar Mudumba
International Conference on Machine Learning (ICML), 2025.
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation …
Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar Mudumba, David Vazquez, Christopher Pal, Marco Pedersoli
Computer Vision and Pattern Recognition (CVPR), 2025.