Off-policy Learning

Target networks are at the core of recent success in Reinforcement Learning. They stabilize the training by using old parameters to …

Transactions on Machine Learning Research (TMLR), 2023.