ServiceNow Research

Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts

Abstract

Multivariate probabilistic time series forecasts are commonly evaluated via proper scoring rules, i.e., functions that are minimal in expectation for the ground-truth distribution. However, this property is not sufficient to guarantee good discrimination in the non-asymptotic regime. In this paper, we provide the first systematic finite-sample study of proper scoring rules for time series forecasting evaluation. Through a power analysis, we identify the "region of reliability" of a scoring rule, i.e., the set of conditions (in terms of problem dimensionality and Monte Carlo approximation quality) where a practitioner can rely on that rule to identify forecasting errors. We carry out our analysis on a comprehensive synthetic benchmark, specifically designed to test several key discrepancies between ground-truth and forecast distributions, and we gauge the generalizability of our findings to real-world tasks with an application to an electricity production problem. Our results reveal critical shortcomings in the evaluation of multivariate probabilistic forecasts as commonly performed in the literature.
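To make the setting concrete, the sketch below (not the paper's benchmark code) estimates one commonly used multivariate proper scoring rule, the energy score, from a finite set of Monte Carlo forecast samples, and then runs a toy discrimination check in the spirit of the paper's power analysis: a forecast matching the ground truth is scored against a mean-shifted one over repeated trials. The dimensionality `d`, the sample count `m`, the shift size, and the choice of a biased all-pairs estimator are illustrative assumptions, not values or choices taken from the paper.

```python
import numpy as np

def energy_score(forecast_samples: np.ndarray, observation: np.ndarray) -> float:
    """Monte Carlo estimate of the energy score (lower is better).

    forecast_samples: array of shape (m, d), m samples drawn from the forecast.
    observation: array of shape (d,), the realized ground-truth vector.
    """
    # First term: average distance between forecast samples and the observation.
    term1 = np.mean(np.linalg.norm(forecast_samples - observation, axis=1))
    # Second term: half the average pairwise distance between forecast samples.
    # All pairs (including i == j) are used here, which gives a slightly biased
    # estimator; an unbiased variant would exclude the diagonal.
    diffs = forecast_samples[:, None, :] - forecast_samples[None, :, :]
    term2 = 0.5 * np.mean(np.linalg.norm(diffs, axis=-1))
    return term1 - term2

# Toy finite-sample discrimination check (illustrative settings only):
# how often does the score prefer the correct forecast over a mean-shifted one?
rng = np.random.default_rng(0)
d, m, n_trials = 16, 100, 200            # dimensionality, forecast samples, repetitions
wins = 0
for _ in range(n_trials):
    y = rng.standard_normal(d)                     # ground-truth realization
    good = rng.standard_normal((m, d))             # forecast matching the true distribution
    biased = rng.standard_normal((m, d)) + 0.3     # forecast with a systematic mean shift
    wins += energy_score(good, y) < energy_score(biased, y)
print(f"correct forecast preferred in {wins / n_trials:.0%} of trials")
```

Repeating this check while varying `d` and `m` traces out, for this one error type, the kind of reliability boundary the paper maps systematically across many scoring rules and discrepancy types.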

Publication
International Conference on Machine Learning (ICML)
Étienne Marcotte
Applied Research Scientist

Applied Research Scientist at Human Decision Support, based in Montreal, QC, Canada.

Valentina Zantedeschi
Research Scientist

Research Scientist at Human Decision Support, based in Montreal, QC, Canada.

Alexandre Drouin
Research Lead

Research Lead at Human Decision Support, based in Montreal, QC, Canada.

Nicolas Chapados
VP of Research

VP of Research at Research Management, based in Montreal, QC, Canada.