Title: Information Theory in Reinforcement Learning
Abstract: In reinforcement learning, a Partially Observable Markov Decision Process (POMDP) is a model of an agent interacting with its environment through observations and actions. The agent has to choose actions that maximize an external reward it receives at each step. One reason this problem is hard in general is the large size of the sufficient statistic of the observable history for the world state.
By framing the problem in an information-theoretic setting, we gain a number of benefits: a description of "typical" agents and, in particular, an understanding of how evolution has solved the problem; insight into the information metabolism of an intelligent agent as a solution to a sequential information-bottleneck problem; and the ability to apply information-theoretic methods to the problem, which provide new and, in some cases, more efficient solutions.
In this talk I will give some background on the general POMDP setting and its challenges, extend it to the information-theoretic setting, and show an example of information-theoretic methods applied to reinforcement learning.
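
For readers unfamiliar with the sufficient statistic mentioned in the abstract: in the standard POMDP formulation it is usually taken to be the belief state, the posterior distribution over world states given the history, updated as b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The following is a minimal sketch of that textbook update and is not taken from the talk; the function name, the array layouts T[a][s][s'] and O[a][s'][o], and the two-state model are hypothetical and chosen only for illustration.

```python
def belief_update(belief, T, O, action, observation):
    """One Bayes-filter step for a discrete POMDP.

    belief: list of probabilities over states, b(s)
    T[a][s][s2]: assumed layout for the transition probability P(s2 | s, a)
    O[a][s2][o]: assumed layout for the observation probability P(o | s2, a)
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [sum(belief[s] * T[action][s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correct: weight each next state by the likelihood of the observation.
    unnormalized = [O[action][s2][observation] * predicted[s2]
                    for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical model: two states, one action, two observations.
T = [[[0.9, 0.1],
      [0.2, 0.8]]]
O = [[[0.8, 0.2],
      [0.3, 0.7]]]
b0 = [0.5, 0.5]
b1 = belief_update(b0, T, O, action=0, observation=1)
```

The belief is a point in a simplex whose dimension grows with the number of world states, which is one way to read the "large size" the abstract refers to.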