Computer Vision

While numerous recent benchmarks focus on evaluating generic Vision-Language Models (VLMs), they fall short in addressing the unique …

International Conference on Computer Vision (ICCV), 2025.

Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding …

International Conference on Learning Representations (ICLR), 2025.

Ultrasound Localization Microscopy (ULM) is a novel super-resolution imaging technique that can image the vasculature in vivo at depth …

Brice Rauby, Paul Xing, Maxime Gasse, Jean Provost

IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control (IEEE TUFFC), 2025.

Ultrasound Localization Microscopy (ULM) is a non-invasive technique that allows for the imaging of micro-vessels in vivo, at depth and …

Brice Rauby, Paul Xing, Jonathan Porée, Maxime Gasse, Jean Provost

IEEE Transactions on Image Processing (IEEE TIP), 2025.

Nafath, 2024.

We introduce Visual Caption Restoration (VCR), a novel vision-language task that challenges models to accurately restore partially …

Workshop at the Conference on Neural Information Processing Systems (NeurIPS), 2024.

Multimodal multihop question answering is a complex task that requires reasoning over multiple sources of information, such as images …

arXiv, 2024.

Ultrasound Localization Microscopy (ULM) is a non-invasive technique that allows for the imaging of micro-vessels in vivo, at depth and …

Brice Rauby, Paul Xing, Jonathan Porée, Maxime Gasse, Jean Provost

arXiv, 2024.

Scalable Vector Graphics (SVGs) have become integral in modern image rendering and graphic design applications due to their infinite …

arXiv, 2024.

Recent progress in self-supervision shows that pre-training large neural networks on vast amounts of unsupervised data can lead to …

NeurIPS Datasets and Benchmarks Track (NeurIPS Datasets), 2023.