Leveraging Activation Patterns to Define Classifiers Able to Detect and Reject Anomalies

João Monteiro, Pau Rodriguez, Pierre-André Noël, Issam H. Laradji, David Vazquez

September 2022

Abstract

In this work, we introduce models that perform comparably with state-of-the-art alternatives in terms of prediction accuracy while offering simple mechanisms to detect anomalous inputs of various kinds. We do so by defining predictors whose predictions depend directly on verifiable properties of intermediate features. Concretely, we introduce Total Activation Classifiers (TAC): a component that can be added to any pre-trained classifier. Given data, TAC decides on an output class depending on which set of features “fire up” more strongly. In doing so, we associate a different set of features with each class in the label set. At testing time, one can directly verify whether there is a set of features whose activation is greater than that of the remaining features, and decide to reject otherwise. TAC slices and reduces the activations of a stack of layers after a forward pass on given data. Concatenating the results of the slice/reduce steps across the depth of the model yields a vector that we refer to as the activation profile of the observed data. In addition, TACs are assigned a list of class codes, which are used to define models observed to improve over state-of-the-art anomaly detection scores.
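The slice/reduce and rejection steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `activation_profile` and `classify_or_reject`, the use of a sum as the reduction, the binary class codes, the Euclidean distance, and the `max_distance` threshold are all assumptions made for illustration.

```python
import numpy as np


def activation_profile(layer_activations, num_slices):
    """Slice each layer's features into num_slices chunks, sum each chunk,
    and concatenate the sums across layers (reduction by sum is an assumption)."""
    parts = []
    for feats in layer_activations:
        chunks = np.array_split(np.asarray(feats, dtype=float), num_slices)
        parts.append(np.array([chunk.sum() for chunk in chunks]))
    return np.concatenate(parts)


def classify_or_reject(profile, class_codes, max_distance):
    """Return the index of the closest class code, or None (reject) when even
    the closest code is farther than max_distance (hypothetical threshold)."""
    dists = np.linalg.norm(class_codes - profile, axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] <= max_distance else None


# Toy setup: 2 layers, 4 features per layer, 2 slices per layer, 2 classes.
# Each hypothetical class code marks which slice should "fire" in each layer.
class_codes = np.array([
    [1.0, 0.0, 1.0, 0.0],  # class 0: first slice active in both layers
    [0.0, 1.0, 0.0, 1.0],  # class 1: second slice active in both layers
])

# An input whose first slice dominates in both layers.
typical = [[0.9, 0.1, 0.0, 0.0], [0.5, 0.5, 0.1, 0.0]]
typical_profile = activation_profile(typical, num_slices=2)
typical_pred = classify_or_reject(typical_profile, class_codes, max_distance=0.5)

# An input with no dominant slice: no set of features stands out.
flat = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
flat_profile = activation_profile(flat, num_slices=2)
flat_pred = classify_or_reject(flat_profile, class_codes, max_distance=0.5)
```

In this toy setup the first input lies close to the code of class 0 and is accepted, while the second input is equally far from both codes and is rejected. In practice the threshold would have to be tuned on held-out data.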

Type

Workshop

Publication

Montreal AI Symposium (MAIS)

Machine Learning