Bernstein Conference 2022: Exploring Distribution Parameterizations for Distributional Continuous Control

September 29, 2022 jannis

Felix Grün¹,², Muhammed Saif-ur-Rehman¹, Ioannis Iossifidis¹

¹ Department for Computer Science, Ruhr-West University of Applied Sciences, Lützowstraße 5, 46236 Bottrop, Germany
² Institut für Neuroinformatik, Ruhr-University Bochum, Universitätsstraße 150, 44801 Bochum, Germany


The relation between the activity of dopaminergic neurons and the temporal difference error in Reinforcement Learning (RL) problems [1] is well known in both machine learning and neuroscience. More recently, distributional RL has inspired a successful search for evidence of an equivalent neural mechanism [2]. Distributional RL methods aim to make better use of the agent's available interactions with the environment. They do this by learning the probability distribution of the reward expected in the future, whereas non-distributional agents usually learn only the expectation of that distribution, the value. Distributional algorithms often outperform comparable non-distributional methods in terms of learning speed and final performance (usually benchmarked on the Arcade Learning Environment, ALE [3]). Increasingly sample-efficient distributional RL algorithms for the discrete action domain have been developed over time; they differ primarily in how they parameterize their approximations of value distributions. We transfer three of the best-known and most successful of these to the continuous action domain by extending two powerful actor-critic algorithms with distributional critics. The parameterizations are all based on the quantile regression approach [4] and differ crucially in how the quantiles to be predicted are selected. We investigate whether the relative performance of the methods in the discrete action domain carries over to the continuous case. To that end, we compare them empirically on the PyBullet implementations of a set of MuJoCo continuous control tasks.
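As a rough illustration of the quantile regression idea [4] referred to above, the following NumPy sketch computes a quantile Huber loss for a single state-action pair and shows two possible ways of choosing the quantile fractions: fixed midpoints and uniform random sampling. The function names, the Huber threshold `kappa`, and the choice of these two selection schemes are assumptions made for illustration; they are not taken from the poster and do not represent the authors' implementation.

```python
import numpy as np


def midpoint_fractions(n):
    """Fixed quantile fractions tau_i = (2i + 1) / (2n) for i = 0..n-1 (illustrative choice)."""
    return (2 * np.arange(n) + 1) / (2 * n)


def sampled_fractions(n, rng):
    """Quantile fractions drawn uniformly at random from [0, 1) (illustrative choice)."""
    return rng.uniform(0.0, 1.0, size=n)


def quantile_huber_loss(pred, target, taus, kappa=1.0):
    """Quantile Huber loss between predicted quantiles and target samples.

    pred:   shape (N,), predicted quantile values of the return distribution
    target: shape (M,), samples of the target return distribution
    taus:   shape (N,), quantile fraction associated with each prediction
    kappa:  Huber threshold (hypothetical default)
    """
    # Pairwise errors u[i, j] = target[j] - pred[i], shape (N, M).
    u = target[None, :] - pred[:, None]
    abs_u = np.abs(u)
    # Huber loss: quadratic near zero, linear beyond kappa.
    huber = np.where(abs_u <= kappa, 0.5 * u**2, kappa * (abs_u - 0.5 * kappa))
    # Asymmetric weight |tau - 1{u < 0}| pushes each prediction toward its quantile.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    # Average over target samples, sum over predicted quantiles.
    return float((weight * huber / kappa).mean(axis=1).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    taus = midpoint_fractions(4)
    pred = np.zeros(4)
    target = rng.normal(size=32)
    print("fixed fractions:", taus)
    print("loss:", quantile_huber_loss(pred, target, taus))
```

With a high fraction such as 0.9, underestimating the target is penalized more heavily than overestimating it by the same amount. That asymmetry is what drives each output toward its assigned quantile, so a scheme that selects the fractions differently changes which parts of the return distribution the critic represents.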

Acknowledgements

This work is supported by the Ministry of Economics, Innovation, Digitization and Energy of the State of North Rhine-Westphalia and the European Union, grants GE-2-2-023A (REXO) and IT-2-2-023 (VAFES).
