Continual learning (CL) aims to design algorithms that can learn from non-stationary streams of stationary tasks without forgetting. Modularity is an appealing solution for this: given a meaningful decomposition of knowledge into modules, new tasks are learned through recombination of modules (compositionality), addition of modules, and/or updates to the existing modules. Forgetting can be mitigated through sparsity of module updates; transfer can be achieved by updating only the relevant modules. Achieving modularity presents a number of challenges, among others: (a) how to decompose a task into re-usable modules: while related to sub-task discovery [3; 11], this holds the promise of reducing sample complexity through positive transfer and improving OOD generalization ability [6]; (b) how to route samples through a set of modules when the routing mechanism can itself be subject to forgetting; (c) how to prune and add new modules to ensure enough capacity for learning new tasks.
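To make the ingredients above concrete, the following is a minimal, hypothetical sketch (not an implementation from any cited work) of a modular layer: a bank of linear modules, a learned router that mixes module outputs per sample (challenge (b)), and an `add_module` operation that grows capacity for a new task (challenge (c)). All names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


class ModularLayer:
    """Illustrative modular layer: a bank of linear modules plus a
    router that softly selects which modules process each sample.
    This is a hypothetical sketch, not the method of the paper."""

    def __init__(self, in_dim, out_dim, n_modules):
        self.modules = [rng.normal(0, 0.1, (in_dim, out_dim))
                        for _ in range(n_modules)]
        # Router maps each input to a score per module.
        self.router = rng.normal(0, 0.1, (in_dim, n_modules))

    def forward(self, x):
        # Per-sample routing weights: softmax over module scores.
        logits = x @ self.router
        logits -= logits.max(axis=1, keepdims=True)
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)
        # Compositional output: weighted mixture of module outputs.
        outs = np.stack([x @ W for W in self.modules], axis=1)  # (B, M, D)
        return (weights[:, :, None] * outs).sum(axis=1)        # (B, D)

    def add_module(self):
        # Capacity growth for a new task: append a fresh module and
        # extend the router with a new score column; existing module
        # weights are left untouched (sparse updates mitigate forgetting).
        in_dim, out_dim = self.modules[0].shape
        self.modules.append(rng.normal(0, 0.1, (in_dim, out_dim)))
        self.router = np.concatenate(
            [self.router, rng.normal(0, 0.1, (in_dim, 1))], axis=1)


layer = ModularLayer(in_dim=4, out_dim=3, n_modules=2)
x = rng.normal(size=(5, 4))
y = layer.forward(x)
print(y.shape)             # (5, 3)
layer.add_module()
print(layer.router.shape)  # (4, 3): router now scores three modules
```

Note that because the router itself is trained, it is as prone to forgetting as the modules are, which is exactly challenge (b); a soft mixture is used here purely for simplicity, whereas sparse or top-k routing would make module updates sparser.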