ServiceNow Research
Parallelism
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
The advent of the transformer has sparked rapid growth in the size of language models, far outpacing hardware improvements. (Dense) …
Joel Lamy Poirier
ArXiv, 2024.