I've been reading about Church, a new programming language developed in a prize-winning MIT doctoral thesis and designed to make computers better at modeling probability distributions.
The idea is that simulation is cheap (given a probability distribution, generate an example outcome) but inference is expensive (given a set of data, what is the most likely distribution that could have generated it?). Inference is essentially a Bayesian task, and it's what we want to do to understand, say, which borrowers are likeliest to default, or where terrorists are likely to strike again. It's also a necessary building block of AI. The problem is that the space of probability distributions that could explain the data is very big -- infinitely big in reality, of course, but still exponentially big after discretizing. Also, while the cost of evaluating a composed function f(g(x)) is just the cost of f plus the cost of g, composing two conditional probability distributions B|A and C|B requires marginalizing over the intermediate variable:
P(C|A) = Σ_B P(C, B|A) = Σ_B P(C|B) P(B|A)

and when B is high-dimensional, the time to evaluate that sum grows exponentially in the number of intermediate variables rather than linearly.
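To make that concrete, here's a toy sketch (my own Python illustration, not anything from the thesis) of composing two discrete conditional distributions by explicit summation; each extra intermediate variable multiplies the number of terms in the sum:

```python
# Composing P(B|A) and P(C|B) by exhaustive summation:
# P(C|A) = sum over B of P(C|B) * P(B|A).
# The sum visits every value of B, so with n intermediate variables
# of k values each, the cost is k**n terms.

def compose(p_b_given_a, p_c_given_b):
    """Return P(C|A) as a dict-of-dicts, summing over all values of B."""
    p_c_given_a = {}
    for a, dist_b in p_b_given_a.items():
        p_c_given_a[a] = {}
        for b, p_b in dist_b.items():
            for c, p_c in p_c_given_b[b].items():
                p_c_given_a[a][c] = p_c_given_a[a].get(c, 0.0) + p_b * p_c
    return p_c_given_a

# Example: a chain A -> B -> C, all binary.
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}
print(compose(p_b_given_a, p_c_given_b))
# ≈ {0: {0: 0.76, 1: 0.24}, 1: {0: 0.52, 1: 0.48}}
```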
Church is an attempt to solve this problem. (Apparently it's a practical attempt, because the founders have already started a company, Navia Systems, using this structure to build probabilistic computers.) The idea is, instead of describing a probability distribution as a deterministic procedure that evaluates the probabilities of different events, to represent it as a probabilistic procedure for generating samples from it. That is, a random variable is actually a random variable. This means that repeating a computation will not give the same result each time, because evaluating a random variable doesn't give the same result each time. The computational advantage is that random variables can be composed without summing over all possible values of the intermediate ones.
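Here is the same A -> B -> C chain in the sampling style (again my own Python sketch, not actual Church code): composing samplers means one draw of B followed by one draw of C, with no sum over B's values at all:

```python
import random

def sample_b_given(a):
    # same numbers as the table above: B usually follows A
    if a == 0:
        return 0 if random.random() < 0.9 else 1
    return 1 if random.random() < 0.7 else 0

def sample_c_given(b):
    if b == 0:
        return 0 if random.random() < 0.8 else 1
    return 1 if random.random() < 0.6 else 0

def sample_c_given_a(a):
    # composition is just one draw of B followed by one draw of C,
    # regardless of how many values B could take
    return sample_c_given(sample_b_given(a))

# estimate P(C = 0 | A = 0) by forward simulation
samples = [sample_c_given_a(0) for _ in range(100_000)]
print(sum(s == 0 for s in samples) / len(samples))  # ≈ 0.76, matching the sum above
```

Each forward sample costs a constant number of draws; the exponential sum only reappears if you insist on exact probabilities rather than Monte Carlo estimates.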
Church is based on Lisp. At the lowest level, it replaces Boolean logic gates with stochastic digital circuits. These circuits are wired together to form Markov chains (the probabilistic counterpart of finite state machines). At the top level, it's possible to define probabilistic procedures for generating samples from recursively defined distributions.
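The canonical example of a recursively defined distribution is the geometric. In Church's Scheme-like syntax it looks something like (define (geometric p) (if (flip p) 1 (+ 1 (geometric p)))); here is the same idea sketched in Python for readers who, like me, are rusty on Lisp:

```python
import random

def flip(p=0.5):
    # a single biased coin flip: True with probability p
    return random.random() < p

def geometric(p):
    # count flips until the first success -- the distribution is
    # defined by the sampling procedure itself, not by a table of
    # probabilities over its (infinitely many) outcomes
    return 1 if flip(p) else 1 + geometric(p)

print([geometric(0.5) for _ in range(10)])
```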
When I saw this paper, I thought it might make an interesting top-level post -- unfortunately, I'm not the one to do it. I don't know enough computer science; it's been ages since I've touched Lisp, and lambda calculus is new to me. So this is an open call for volunteers -- any brave Bayesians want to blog about a brand new computer language?
(As an aside, I think we need more technical posts so we don't spend all our time hissing at each other; how would people feel about seeing summaries of recent research in the neuro/cognitive science/AI-ish cluster?)
Current universal inference methods are very limited, so the main advantages of using probabilistic programming languages are (1) the conceptual clarity you get by separating the generative model from the inference procedure, and (2) the ability to write down complex nonparametric models and immediately be able to do inference, even if it's inefficient. Writing a full model-plus-inference implementation in Matlab, say, takes much longer and is more confusing and less flexible.
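To unpack (1), here's a minimal sketch (my own toy Python, not any particular language's API) of separating the model from a generic inference procedure; the inference routine -- crude rejection sampling, universal but inefficient -- never needs to know anything about the model's internals:

```python
import random

def model():
    # generative model: is the coin trick? if so it's biased; flip it twice
    trick = random.random() < 0.1
    p_heads = 0.9 if trick else 0.5
    flips = [random.random() < p_heads for _ in range(2)]
    return trick, flips

def rejection_query(model, condition, n=100_000):
    # generic inference: run the model forward, keep only the runs
    # whose simulated data satisfy the observed condition
    accepted = [latent for latent, data in (model() for _ in range(n))
                if condition(data)]
    return sum(accepted) / len(accepted)

# P(trick | both flips came up heads)
print(rejection_query(model, lambda flips: all(flips)))  # ≈ 0.26
```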
That said, some techniques that were developed for particular classes of problems have useful analogs in the setting of probabilistic programs. The gradient-based methods you mention have been generalized to work on any probabilistic program with continuous parameters.
Interesting -- I suppose that does seem somewhat useful, for discussion purposes at the very least. I am curious how a gradient-based method can work without continuous parameters; that is counterintuitive to me. Can you throw out some keywords? Keywords for what I was talking about: the Metropolis-adjusted Langevin algorithm (MALA), Stochastic Newton, and any MCMC method with 'Hessian' in the name.
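For readers who want to see what the first of those keywords amounts to, here's a bare-bones MALA step (my own 1-D sketch with a standard normal target; real implementations handle arbitrary differentiable log-densities): the proposal drifts along the gradient of the log-density, and a Metropolis-Hastings correction keeps the chain targeting the right distribution:

```python
import math
import random

def log_pi(x):
    return -0.5 * x * x          # log-density of N(0,1), up to a constant

def grad_log_pi(x):
    return -x

def mala_step(x, eps):
    def mean(y):
        # mean of the Langevin proposal started from y
        return y + 0.5 * eps**2 * grad_log_pi(y)
    x_new = mean(x) + eps * random.gauss(0.0, 1.0)
    # log q(x | x_new) - log q(x_new | x): corrects for proposal asymmetry
    log_q_ratio = (-(x - mean(x_new))**2 + (x_new - mean(x))**2) / (2 * eps**2)
    log_accept = log_pi(x_new) - log_pi(x) + log_q_ratio
    return x_new if math.log(random.random()) < log_accept else x

x, eps, samples = 0.0, 0.5, []
for _ in range(50_000):
    x = mala_step(x, eps)
    samples.append(x)
print(sum(samples) / len(samples))  # ≈ 0.0: the chain targets N(0,1)
```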