ServiceNow AI Research

Multi-modal Learning

MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. …
FigGen: Text to Scientific Figure Generation
The generative modeling landscape has experienced tremendous growth in recent years, particularly in generating natural images and art. …
Haptics-based Curiosity for Sparse-reward Tasks
Robots in many real-world settings have access to force/torque sensors in their gripper, and tactile sensing is often necessary in tasks …
Adaptive Cross-Modal Few-shot Learning
Metric-based meta-learning techniques have successfully been applied to few-shot classification problems. In this paper, we propose to …
Neural Multisensory Scene Inference
For embodied agents to infer representations of the underlying 3D physical world they inhabit, they should efficiently combine …
Integrating Vision and Language in Social Networks for Identifying Visual Patterns of Personality Traits
Social media, as a major platform for communication and information exchange, is a rich repository of the opinions and sentiments of …
BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning
Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and …