ServiceNow recherche

Computer Vision

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
While numerous recent benchmarks focus on evaluating generic Vision-Language Models (VLMs), they fall short in addressing the unique …
Deep Learning in Ultrasound localization Microscopy: Applications and perspectives
Ultrasound Localization Microscopy (ULM) is a novel super-resolution imaging technique that can image the vasculature in vivo at depth …
Pruning Sparse Tensor Neural Networks Enables Deep Learning for 3D Ultrasound Localization Microscopy
Ultrasound Localization Microscopy (ULM) is a non-invasive technique that allows for the imaging of micro-vessels in vivo, at depth and …
VCR: Visual Caption Restoration
We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially …
Few-shot Learning for Sign Language Recognition with Embedding Propagation
Sign language is a primary channel for the deaf and hard-hearing to communicate. Sign language consists of many signs with different …
FM2DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering
Multimodal multihop question answering is a complex task that requires reasoning over multiple sources of information, such as images …
Pruning Sparse Tensor Neural Networks Enables Deep Learning for 3D Ultrasound Localization Microscopy
Ultrasound Localization Microscopy (ULM) is a non-invasive technique that allows for the imaging of micro-vessels in vivo, at depth and …
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Scalable Vector Graphics (SVGs) have become integral in modern image rendering and graphic design applications due to their infinite …
Egocentric Planning for Scalable Embodied Task Achievement
Embodied agents face significant challenges when tasked with performing actions in diverse environments, particularly in generalizing …