Parallelism

The advent of the transformer has sparked rapid growth in the size of language models, far outpacing hardware improvements. (Dense) …

Joel Lamy Poirier

ArXiv, 2024.