Flaky Performances when Pre-Training on Relational Databases with a Plan for Future Characterization Efforts

Shengchao Liu, David Vazquez, Jian Tang, Pierre-André Noël

July 2022

Abstract

We explore the downstream task performance of graph neural network (GNN) self-supervised learning (SSL) methods trained on subgraphs extracted from relational databases (RDBs). Intuitively, this joint use of SSL and GNNs allows us to leverage more of the available data, which could translate to better results. However, while we observe positive transfer in some cases, others show systematic performance degradation, some of it spectacular. We hypothesize a mechanism that could explain this behaviour and draft a plan for future work to test it by characterizing how much relevant information different strategies can (theoretically and/or empirically) extract from (synthetic and/or real) RDBs.

Type

Workshop

Publication

Workshop at the International Conference on Machine Learning (ICML)