ServiceNow AI Research

Multi-modal Learning

Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Scalable Vector Graphics (SVG) offer a powerful format for representing visual designs as interpretable code. Recent advances in …
The Promise of RL for Autoregressive Image Editing
While image generation techniques are now capable of producing high-quality images that respect prompts spanning multiple sentences, …
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models …
BigCharts-R1: Enhanced Chart Reasoning With Visual Reinforcement Finetuning
Chart understanding is critical to ServiceNow for data analysis and for reasoning over visualizations, such as interpreting trends, identifying …
ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval
Retrieval-augmented generation has proven practical when models require specialized knowledge or access to the latest data. However, …
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Understanding diverse web data and automating web development present an exciting challenge for agentic multimodal models. While …
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Developing autonomous agents that can navigate diverse Graphical User Interfaces (GUIs) and solve complex tasks is essential for …
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) are vital for modern image rendering due to their scalability and versatility. Previous SVG generation …