ServiceNow Research
Tags
Parallelism
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
The advent of the transformer has sparked rapid growth in the size of language models, far outpacing hardware improvements. (Dense) …
Joel Lamy Poirier
ArXiv, 2024.