
Tuesday, May 29, 2012

Roy Fox: May 22nd and 29th

Title: Information Theory in Reinforcement Learning

Abstract: In reinforcement learning, a Partially Observable Markov Decision Process (POMDP) is a model of an agent interacting with its environment through observations and actions. The agent must choose actions that maximize the external reward it receives at each step. One aspect of the general hardness of this problem is the large size of the sufficient statistic that the observable history carries about the world state.
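
To make the sufficient statistic concrete: in a POMDP it is the belief state, the posterior over world states given the observable history, updated recursively by Bayes' rule. Below is a minimal sketch of that update; the two-state world and all numbers are invented for illustration, not taken from the talk.

    import numpy as np

    # Hypothetical two-state, two-action POMDP.
    # T[a][s, s'] = P(s' | s, a);  O[a][s', o] = P(o | s', a).
    T = {0: np.array([[0.9, 0.1], [0.2, 0.8]]),
         1: np.array([[0.5, 0.5], [0.5, 0.5]])}
    O = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
         1: np.array([[0.6, 0.4], [0.4, 0.6]])}

    def belief_update(b, a, o):
        """Bayes-filter update: the belief b is the sufficient statistic
        of the observable history for the world state."""
        predicted = b @ T[a]                   # predict next-state distribution
        unnormalized = predicted * O[a][:, o]  # weight by observation likelihood
        return unnormalized / unnormalized.sum()

    b = np.array([0.5, 0.5])          # uniform prior over world states
    b = belief_update(b, a=0, o=1)    # posterior after acting and observing

The hardness alluded to above shows up here: the belief is a point in a continuous simplex over world states, so this statistic grows with the size of the state space rather than with that of the observation space.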

By framing the problem in an information-theoretic setting, we gain a number of benefits: a description of "typical" agents, and in particular an understanding of how evolution has solved the problem; insight into the information metabolism of an intelligent agent as a solution to a sequential information-bottleneck problem; and the ability to apply information-theoretic methods to the problem, yielding new and, in some cases, more efficient solutions.
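
For orientation, the standard (non-sequential) information-bottleneck objective behind this framing can be written as follows, with M the agent's memory or internal state, H the observable history it compresses, and V the reward-relevant quantity it must predict (variable names chosen here for this setting, not taken from the talk):

    \min_{p(m \mid h)} \; I(H; M) \;-\; \beta \, I(M; V)

The trade-off parameter \beta sets the "metabolic" budget: how many bits about the history the agent retains per bit of predictive value.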

In this talk I will give some background on the general POMDP setting and its challenges, extend it to the information-theoretic setting, and show an example of information-theoretic methods applied to reinforcement learning.

Thursday, October 21, 2010

Micky Vidne: October 27th. s(MC)^2 or Hesitant Particle Filter.

In my talk I will describe a recent extension of the Sequential Monte Carlo (SMC) method. SMC methods (particle filters) are commonly used to estimate a latent dynamical process from sequential noise-contaminated observations. They are extremely powerful but suffer from sample impoverishment, a situation in which very few distinct particles represent the distribution of interest. I will describe our attempt to circumvent this fundamental problem by adding an extra MCMC step to the SMC algorithm, and I will illustrate the usefulness of this approach with a toy neuroscience example.
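
A minimal sketch of the general idea, in the resample-move style where a Metropolis rejuvenation step follows each resampling; the linear-Gaussian toy model and all parameters are invented for illustration and are not the specific algorithm of the talk:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy model: x_t = 0.9 x_{t-1} + N(0, 1),  y_t = x_t + N(0, 0.5^2).
    def log_lik(x, y):
        return -0.5 * ((y - x) / 0.5) ** 2

    def smc_with_mcmc_move(ys, n=500, n_mcmc=5):
        x = rng.normal(0.0, 1.0, n)
        for y in ys:
            parent = x
            x = 0.9 * parent + rng.normal(0.0, 1.0, n)   # propagate
            logw = log_lik(x, y)
            w = np.exp(logw - logw.max()); w /= w.sum()
            idx = rng.choice(n, size=n, p=w)             # resample
            x, parent = x[idx], parent[idx]

            # MCMC rejuvenation: Metropolis moves that leave
            # p(x_t | parent, y_t) invariant, restoring the particle
            # diversity lost to resampling.
            def log_target(z):
                return log_lik(z, y) - 0.5 * (z - 0.9 * parent) ** 2
            for _ in range(n_mcmc):
                prop = x + rng.normal(0.0, 0.3, n)
                accept = (np.log(rng.uniform(size=n))
                          < log_target(prop) - log_target(x))
                x = np.where(accept, prop, x)
        return x

    ys = [0.2, 0.5, 1.1, 0.7]              # made-up observations
    particles = smc_with_mcmc_move(ys)

After resampling, many particles are exact copies of one another; the Metropolis moves spread these duplicates out without changing the distribution they represent.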

Thursday, July 1, 2010

Carl Smith: July 8

We have developed a simple model neuron for inference on noisy spike trains. In particular, we intend to use this model to quantify, in a computationally tractable way, the information loss due to spike-time jitter. I will introduce the model, and in particular its favorable scaling properties. I'll show some results from inference on synthetic data. Lastly, I'll describe an efficient scheme we devised for inference with a particular class of priors on the stimulus space, which could be interesting outside the context of this model.
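
As a toy illustration of the kind of quantity in question (an invented example, not the talk's model): a binary stimulus shifts the mean of a single spike time, jitter widens the conditional spike-time densities, and the mutual information between stimulus and spike time drops accordingly.

    import numpy as np
    from scipy.stats import norm

    def mutual_info(sigma, mu0=0.0, mu1=1.0):
        """I(S;T) in bits for an equiprobable binary stimulus S that sets
        the mean of a Gaussian spike time T with width sigma."""
        t, dt = np.linspace(-5.0, 6.0, 4001, retstep=True)
        p0 = norm.pdf(t, mu0, sigma)         # p(t | s=0)
        p1 = norm.pdf(t, mu1, sigma)         # p(t | s=1)
        mix = 0.5 * (p0 + p1)                # marginal p(t)
        kl = lambda p: np.sum(p * np.log2(np.maximum(p, 1e-300)
                                          / np.maximum(mix, 1e-300))) * dt
        return 0.5 * (kl(p0) + kl(p1))

    sigma_spike, sigma_jitter = 0.3, 0.4
    print(mutual_info(sigma_spike))                          # without jitter
    print(mutual_info(np.hypot(sigma_spike, sigma_jitter)))  # jitter added

Independent Gaussian jitter adds in quadrature here, which is what np.hypot computes; the information loss is the difference between the two printed values.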