ServiceNow AI Research

Multi-modal Learning

Grounding Computer Use Agents on Human Demonstrations
Building reliable computer-use agents requires grounding: accurately connecting natural language instructions to the correct on-screen …
StarFlow: Generating Structured Workflow Outputs From Sketch Images
Workflows are a fundamental component of automation in enterprise platforms, enabling the orchestration of tasks, data processing, and …
Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Scalable Vector Graphics (SVG) offer a powerful format for representing visual designs as interpretable code. Recent advances in …
The Promise of RL for Autoregressive Image Editing
While image generation techniques are now capable of producing high-quality images that respect prompts that span multiple sentences, …
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
Aligning visual features with language embeddings is a key challenge in vision-language models (VLMs). The performance of such models …
BigCharts-R1: Enhanced Chart Reasoning With Visual Reinforcement Finetuning
Chart understanding is critical to ServiceNow for data analysis and reasoning over visualizations, such as interpreting trends, identifying …
ColMate: Contrastive Late Interaction and Masked Text for Multimodal Document Retrieval
Retrieval-augmented generation has proven practical when models require specialized knowledge or access to the latest data. However, …
StarVLM ReRank: Better UI Grounding via Enhanced Visual Input and Element Position Perception
UI grounding is a fundamental task for enterprise workflow automation. This task maps natural language instructions to precise pixel …