ServiceNow Research

Benchmark

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
While numerous recent benchmarks focus on evaluating generic Vision-Language Models (VLMs), they fall short in addressing the unique …
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Understanding diverse web data and automating web development presents an exciting challenge for agentic AI. While existing benchmarks …
The Landscape of Causal Discovery Data: Grounding Causal Discovery in Real-World Applications
Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many …
MMTEB: Massive Multilingual Text Embedding Benchmark
Text embeddings are typically evaluated on a narrow set of tasks, limited in terms of languages, domains, and task types. To circumvent …
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision
This paper presents EarthView, a comprehensive dataset specifically designed for self-supervision on remote sensing data, intended to …
Context is Key: A Benchmark for Forecasting with Essential Textual Information
Forecasting is a critical task in decision making across various domains. While numerical data provides a foundation, it often lacks …
Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation
The practical utility of causality in decision-making is widely recognized, with causal discovery and inference being inherently …
Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed but …