PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation

Rafael Pardinas, Ehsan Kamalloo, Alexandre Piche, Dzmitry Bahdanau

January 2026

Type

Journal article

Publication

Transactions on Machine Learning Research (TMLR)

Reinforcement Learning

Rafael Pardinas

Applied Research Scientist

Applied Research Scientist at Model Readiness, based in London, UK (remote).