ServiceNow Research

Multi-modal Learning

MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. …
Haptics-based Curiosity for Sparse-reward Tasks
Robots in many real-world settings have access to force/torque sensors in their gripper, and tactile sensing is often necessary in tasks …
Adaptive Cross-Modal Few-shot Learning
Metric-based meta-learning techniques have successfully been applied to few-shot classification problems. In this paper, we propose to …
Neural Multisensory Scene Inference
For embodied agents to infer representations of the underlying 3D physical world they inhabit, they should efficiently combine …
BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning
Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and …