Investigation of the Interplay of Model-Based and Model-Free Learning Using Reinforcement Learning
- Felix Grün (1, 2)
- Muhammad Saif-ur-Rehman (2)
- Ioannis Iossifidis (2)
- (1) Faculty of Electrical Engineering and Information Technology, Ruhr-University Bochum, Universitätsstraße 150, 44801 Bochum, Germany
- (2) Institute for Computer Science, Ruhr West University of Applied Sciences, Duisburger Straße 100, 45479 Mülheim an der Ruhr, Germany
The reward prediction error hypothesis of dopamine states that the activity of dopaminergic neurons in certain brain regions correlates with the reward prediction error, which corresponds to the temporal-difference (TD) error often used as a learning signal in model-free reinforcement learning (RL). This suggests that some form of reinforcement learning is implemented in animal and human brains when learning a task. On the other hand, it is clear that humans are capable of building an internal model of a task, or environment, and using it for planning, especially in sequential tasks. In RL, these two approaches, model-driven and reward-driven, are known as model-based and model-free RL, respectively. The two systems were previously thought to exist in parallel, with some higher-level process choosing which to use. A decade ago, research suggested that both could be used concurrently, with a subject-specific weight assigned to each [1]. Still, the prevalent belief appeared to be that model-free learning is the default mechanism, replaced or assisted by model-based planning only when the task demands it, i.e., when higher rewards justify the additional cognitive effort. Recently, Feher da Silva et al. [2] questioned this belief, presenting data and analyses indicating that model-based learning may be used on its own and can even be computationally more efficient. We take an RL perspective, consider different ways to combine model-based and model-free approaches, both for modeling behavior and for performance, and discuss how to further study this interplay in human behavioral experiments.
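To make the two quantities concrete, the following minimal sketch shows a tabular TD(0) update, whose error term delta is the reward prediction error referred to above, and a weighted mixture of model-based and model-free action values with a subject-specific weight w, in the spirit of the hybrid model of [1]. This is our illustration, not code from either cited study; the parameter values (gamma, alpha, w), the tabular setup, and the helper names are assumptions.

```python
import numpy as np

gamma = 0.9   # discount factor (assumed value, for illustration)
alpha = 0.1   # learning rate (assumed value)
w = 0.5       # subject-specific model-based weight in [0, 1] (assumed value)

n_states, n_actions = 5, 2
Q_mf = np.zeros((n_states, n_actions))  # model-free action values

def td_update(s, a, r, s_next):
    """One model-free TD(0) update; delta is the reward prediction error."""
    delta = r + gamma * Q_mf[s_next].max() - Q_mf[s, a]
    Q_mf[s, a] += alpha * delta
    return delta

def hybrid_values(Q_mb, s):
    """Mix model-based and model-free values with weight w, as in the
    hybrid account of [1]; Q_mb would come from planning over a learned
    model of the task (hypothetical input here)."""
    return w * Q_mb[s] + (1.0 - w) * Q_mf[s]

# Toy usage: one TD update, then the mixed values for state 0.
Q_mb = np.random.rand(n_states, n_actions)  # placeholder planning output
delta = td_update(s=0, a=1, r=1.0, s_next=2)
print(delta, hybrid_values(Q_mb, 0))
```

In this framing, w = 1 corresponds to purely model-based choice and w = 0 to purely model-free choice, so the interplay discussed above reduces to how, and whether, such a weight is set.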
References
- [1] Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P. & Dolan, R. J. Model-Based Influences on Humans' Choices and Striatal Prediction Errors. Neuron 69, 1204–1215 (2011). doi:10.1016/j.neuron.2011.02.027
- [2] Feher da Silva, C., Lombardi, G., Edelson, M. & Hare, T. A. Rethinking model-based and model-free influences on mental effort and striatal prediction errors. Nature Human Behaviour 7, 956–969 (2023). doi:10.1038/s41562-023-01573-1