On learning history-based policies for controlling Markov decision processes

Published: 19 Jun 2023, Last Modified: 09 Jul 2023 · Frontiers4LCD
Abstract: Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural networks or history-based state abstraction, perform better than their memory-less counterparts, because function approximation in Markov decision processes (MDPs) can be viewed as inducing a partially observable MDP (POMDP). However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. We then use this framework to design a practical RL algorithm, and we numerically evaluate its effectiveness on a set of continuous control tasks.
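To make the notion of a history-based feature abstraction concrete, the following is a minimal PyTorch sketch of the general idea the abstract describes: a recurrent encoder maps the state-action history to a fixed-size feature, and a memory-less policy head acts on that feature. This is an illustration only, not the paper's specific algorithm; the class name, network sizes, and Gaussian policy head are all illustrative assumptions.

```python
# Illustrative sketch of a history-based policy (NOT the paper's algorithm):
# a GRU summarises the history (s_0, a_0, ..., s_t) into an abstract feature
# z_t, and a memory-less Gaussian head maps z_t to a continuous action.
import torch
import torch.nn as nn

class HistoryPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, feature_dim: int = 64):
        super().__init__()
        # History-based feature abstraction: histories -> fixed-size features.
        self.encoder = nn.GRU(input_size=state_dim + action_dim,
                              hidden_size=feature_dim, batch_first=True)
        # Memory-less policy head acting on the abstract feature.
        self.mean = nn.Linear(feature_dim, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, states, prev_actions):
        # states: (batch, T, state_dim); prev_actions: (batch, T, action_dim)
        history = torch.cat([states, prev_actions], dim=-1)
        features, _ = self.encoder(history)   # (batch, T, feature_dim)
        z_t = features[:, -1]                 # feature of the full history
        return torch.distributions.Normal(self.mean(z_t), self.log_std.exp())

# Usage: sample an action given a 5-step history (previous actions
# zero-padded at t=0).
policy = HistoryPolicy(state_dim=3, action_dim=1)
s = torch.randn(1, 5, 3)
a = torch.randn(1, 5, 1)
action = policy(s, a).sample()
```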
Keywords: Markov Decision Processes, Reinforcement Learning, State Abstraction
Submission Number: 111