Comment author: andreas 25 October 2010 04:33:47AM 3 points [-]

Now suppose you are playing against another timeless decision theory agent. Clearly, the best strategy is to be that actor which defects no matter what. If both agents do this, the worst possible result for both of them occurs.

Which shows that defection was not the best strategy in this situation.
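For concreteness, here is the standard textbook payoff matrix (the usual T=5, R=3, P=1, S=0 numbers, not anything specific from the thread). Two agents running the same decision procedure can only land on the diagonal, and on the diagonal mutual cooperation beats mutual defection:

```python
# Standard prisoner's dilemma payoffs; payoff[(my_move, their_move)]
# is my reward. "C" = cooperate, "D" = defect.
payoff = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I am exploited
    ("D", "C"): 5,  # I exploit
    ("D", "D"): 1,  # mutual defection
}

# Two agents running the same deterministic decision procedure always
# make the same move, so only the diagonal outcomes are reachable.
for move in ("C", "D"):
    print(move, payoff[(move, move)])
```

Defection still dominates against a fixed opponent, which is why the comparison only changes once both moves are known to be correlated.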

Comment author: Daniel_Burfoot 24 October 2010 09:34:54PM 1 point [-]

you usually code both model and inference algorithm and make use of problem-specific approximations to Bayesian inference. Probabilistic programs separate model from inference by using universal inference algorithms.

Naively, separation of model from inference algorithm seems like a terrible idea. People use problem-specific approximation algorithms because if they don't exploit specific features of the model, inference will be completely intractable. Often this consideration is so important that people will use obviously sub-optimal models, if those models support fast inference.

Comment author: andreas 24 October 2010 11:03:10PM 0 points [-]

Yes, deriving mechanisms that take complex models and turn them into something tractable is mostly an open problem.

Comment author: jsalvatier 24 October 2010 02:00:35AM 1 point [-]

Interesting, I suppose that does seem somewhat useful, for discussion purposes at the very least. I am curious how a gradient-based method can work without continuous parameters: that is counterintuitive to me. Can you throw out some keywords? Keywords for what I was talking about: Metropolis-adjusted Langevin algorithm (MALA), Stochastic Newton, and any MCMC method with 'Hessian' in the name.

Comment author: andreas 24 October 2010 02:10:33AM 1 point [-]

They don't work without continuous parameters. If you have a probabilistic program that includes both discrete and continuous parameters, you can use gradient methods to generate MH proposals for your continuous parameters. I don't think there are any publications that discuss this yet.
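To make the keyword concrete, here is a minimal one-parameter MALA step, with a standard normal as a stand-in target (purely illustrative; in the mixed discrete/continuous setting described above, a step like this would serve as the MH proposal for the continuous block):

```python
import math
import random

def logp(x):
    # Log-density of the target; here a standard normal, for illustration.
    return -0.5 * x * x

def grad_logp(x):
    # Gradient of the log-density; for the normal above it is -x.
    return -x

def mala_step(x, eps=0.5):
    """One Metropolis-adjusted Langevin step: drift along the gradient,
    add Gaussian noise, then Metropolis-correct for the discretization."""
    mean = x + 0.5 * eps * grad_logp(x)
    prop = random.gauss(mean, math.sqrt(eps))
    # The proposal is asymmetric, so both q(prop|x) and q(x|prop) appear
    # in the acceptance ratio.
    back_mean = prop + 0.5 * eps * grad_logp(prop)
    log_q_fwd = -(prop - mean) ** 2 / (2 * eps)
    log_q_back = -(x - back_mean) ** 2 / (2 * eps)
    log_alpha = logp(prop) + log_q_back - logp(x) - log_q_fwd
    return prop if math.log(random.random()) < log_alpha else x

random.seed(0)
x, samples = 0.0, []
for _ in range(5000):
    x = mala_step(x)
    samples.append(x)
mean = sum(samples) / len(samples)  # should be near 0 for this target
```

The gradient pulls proposals toward high-density regions, which is the whole advantage over a blind random walk; the accept/reject step then corrects for the discretization error.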

Comment author: [deleted] 23 October 2010 09:30:42PM *  3 points [-]

In part 2, I sort of glossed over the technical stuff, but I was not talking about making up political dimensions like "more anti-war" and rating the answers to poll questions by hand. That is way too arbitrary for my taste. I'm talking about plain old dimensionality reduction. I had something like PCA in mind (if we were careful we might use a different method, but this is just illustrative.)

If you don't know about Principal Component Analysis, it's an important notion worth learning; here are a few starting points:

Wiki

Tutorial with a practical example

Another intro

The principle is that if you decide a priori what the "coordinates" are, you might pick wrong, and fail to explain the variability in the data very well. Amazon.com doesn't have a pre-set category called "horror" and recommend horror movies to you based on the fact that you've watched other horror movies. Amazon gauges "similarity" based on coordinates that arise naturally from the data (and that may not easily correspond to any property with an English name.)
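A minimal sketch of what "coordinates that arise naturally from the data" means, on a made-up 2-D dataset (pure Python so it stays self-contained; real use would reach for a linear-algebra package). The points lie mostly along the direction (1, 2), and PCA recovers that direction without it being named in advance:

```python
import math

# Toy 2-D data lying almost along the line y = 2x, plus small noise.
pts = [(x, 2 * x + n) for x, n in
       zip([-2, -1, 0, 1, 2], [0.1, -0.1, 0.0, 0.1, -0.1])]

n = len(pts)
mx = sum(p[0] for p in pts) / n
my = sum(p[1] for p in pts) / n
# Entries of the 2x2 covariance matrix.
sxx = sum((p[0] - mx) ** 2 for p in pts) / n
syy = sum((p[1] - my) ** 2 for p in pts) / n
sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n

# Largest eigenvalue of the covariance matrix, via the quadratic formula.
tr, det = sxx + syy, sxx * syy - sxy * sxy
lam = tr / 2 + math.sqrt(tr * tr / 4 - det)

# The corresponding eigenvector is the first principal component.
v = (sxy, lam - sxx)
norm = math.hypot(*v)
pc1 = (v[0] / norm, v[1] / norm)  # close to (1, 2) / sqrt(5)
```

The first principal component is the single axis that explains the most variance; nobody had to decide beforehand that "the (1, 2) direction" was a meaningful coordinate.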

Maybe I'll write a top-level post explaining this sometime, if it isn't common knowledge.

In response to comment by [deleted] on Three kinds of political similarity
Comment author: andreas 23 October 2010 09:42:24PM 0 points [-]

A PDF of the Nature intro is here.

Comment author: RichardKennaway 23 October 2010 08:54:56PM 0 points [-]

Writing a full model+inference implementation in Matlab, say, takes you much longer, is more confusing and less flexible.

Defining and implementing a whole programming language is way more work than writing a library in an existing language. A library, after all, is a language, but one you don't have to write a parser, interpreter, or compiler for.

Comment author: andreas 23 October 2010 09:06:34PM 1 point [-]

I was comparing the two choices faced by people who want to do inference in nontrivial models. You can either write the model in an existing probabilistic programming language and get inefficient inference for free, or you can write model+inference in something like Matlab. In the latter case, you may be able to use libraries if your model is similar enough to existing models, but for many interesting models this is not the case.

Comment author: jsalvatier 23 October 2010 06:22:39PM *  3 points [-]

Are the inference methods really advanced enough to make this kind of thing practical? I know that in statistics, MCMC methods have long been specialized enough that in order to use them for even relatively simple problems (let's say a 20-parameter linear model with non-normal errors) you have to know what you're doing. I have been interested in this field because recently there have been promising algorithms that make inference much easier for problems with continuous variables (in particular, gradient/Hessian-based algorithms analogous to Newton methods in optimization). Probabilistic programs seem like a much more difficult category to work with, likely to get very slow very fast. Am I missing something? What do people hope to accomplish with such a language?

Comment author: andreas 23 October 2010 08:44:33PM *  2 points [-]

Current universal inference methods are very limited, so the main advantages of using probabilistic programming languages are (1) the conceptual clarity you get by separating generative model and inference and (2) the ability to write down complex nonparametric models and immediately be able to do inference, even if it's inefficient. Writing a full model+inference implementation in Matlab, say, takes you much longer, is more confusing and less flexible.

That said, some techniques that were developed for particular classes of problems have a useful analog in the setting of programs. The gradient-based methods you mention have been generalized to work on any probabilistic program with continuous parameters.

Comment author: JoshuaZ 23 October 2010 04:17:03AM 1 point [-]

Regarding the parenthetical, summaries of that sort would be good.

I'm in general pessimistic that we're ever going to have a really good approach to handling these sorts of probability calculations, because they have some versions that are hard. For example, SAT, which is NP-complete, can sort of be thought of as trying to get a probability distribution where each event is assigned a probability of 0 or 1. This doesn't resemble anything rigorous; it's more just a general intuition. However, the fact that they think Church might be used for practical applications does make this look good. Also, I may be too focused on a computational-complexity approach to things rather than a more general "this makes things easier to program and understand" approach.

Comment author: andreas 23 October 2010 08:20:26PM 3 points [-]

Probabilistic inference in general is NP-hard, but it is not clear that (1) this property holds for the kinds of problems people are interested in and, even if it does, that (2) approximate probabilistic inference is hard for this class of problems. For example, if you believe this paper, probabilistic inference without extreme conditional probabilities is easy.
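The SAT intuition from the grandparent comment can be made concrete: give each assignment "probability" 1 if it satisfies the formula and 0 otherwise, and satisfiability becomes a MAP-inference query. A brute-force illustration on a made-up 3-variable formula:

```python
from itertools import product

# A tiny CNF formula: (x1 or not x2) and (x2 or x3) and (not x1 or not x3).
# Literal k means "variable |k| is True" if k > 0, "False" if k < 0.
clauses = [(1, -2), (2, 3), (-1, -3)]

def satisfied(clause, assignment):
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def score(assignment):
    # Unnormalized "probability": 1 for satisfying assignments, 0 otherwise.
    return int(all(satisfied(c, assignment) for c in clauses))

# MAP inference by exhaustive enumeration; feasible only because the
# formula is tiny. The formula is satisfiable iff the best score is 1.
best = max(
    (dict(zip((1, 2, 3), bits)) for bits in product([False, True], repeat=3)),
    key=score,
)
```

This is exactly why exact inference is NP-hard in general; the paper linked above is about showing that the pathological cases (extreme conditional probabilities) need not arise in problems people actually care about.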

Comment author: pdf23ds 23 October 2010 04:17:04AM *  1 point [-]

I've been doing audio-only with a $40 dictaphone from Wal-Mart that fits in my pocket. It averages 150-200 MB a day. I generate a hash of each file and timestamp it, so the files are more likely to be useful if I ever need them as proof of something.
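The hash-and-timestamp step might look something like this (sha256 plus a local timestamp; the filename is a placeholder, and a local timestamp only helps as proof if the resulting log is itself kept somewhere tamper-evident, e.g. emailed to a third party):

```python
import hashlib
import json
import time
from pathlib import Path

def fingerprint(path):
    """Hash a recording and note when the hash was taken, so the file's
    contents can later be shown not to have changed since that time."""
    data = Path(path).read_bytes()
    return {
        "file": str(path),
        "sha256": hashlib.sha256(data).hexdigest(),
        "hashed_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }

# Example with a throwaway file; a real run would point at the day's
# recording instead.
Path("demo.wav").write_bytes(b"not really audio")
print(json.dumps(fingerprint("demo.wav"), indent=2))
```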

The thing that prompted me to start doing this was frequent arguments with close ones that often came down to "you said this", "no I didn't" type of stuff. It's oddly very reassuring to have the recording. (FTR, I've used it for that purpose more or less once, although I find it useful for recording therapy sessions too.)

Comment author: andreas 23 October 2010 05:13:36AM *  2 points [-]

Combine this with speech-to-text transcription software and you get a searchable archive of your recorded interactions!

ETA: In theory. In practice, dictation software algorithms are probably not up to the task of turning noisy speech from different people into text with any reasonable accuracy.

Comment author: andreas 23 October 2010 12:32:57AM *  15 points [-]

The key idea behind Church and similar languages is that they allow us to express and formally reason about a large class of probabilistic models, many of which cannot be formalized in any concise way as Bayes nets.

Bayes nets express generative models, i.e. processes that generate data. To infer the states of hidden variables from observations, you condition the Bayes net and compute a distribution on the hidden variable settings using Bayesian inference or some approximation thereof. A particularly popular class of approximations is the class of sampling algorithms, e.g. Markov Chain Monte Carlo methods (MCMC) and importance sampling.

Probabilistic programs express a larger class of models, but very similar approximate inference algorithms can be used to condition a program on observations and to infer the states of hidden variables. In both machine learning and cognitive science, when you are doing Bayesian inference with some model that expresses your prior belief, you usually code both model and inference algorithm and make use of problem-specific approximations to Bayesian inference. Probabilistic programs separate model from inference by using universal inference algorithms.
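The "universal inference" idea can be sketched in a few lines: write the model as an ordinary sampling program, then condition it with a model-agnostic algorithm such as rejection sampling. The model below is a made-up two-flip coin example, and rejection sampling is hopelessly inefficient for anything real, but the point is that the inference code never looks inside the model:

```python
import random

def model():
    # Generative model: draw a coin's unknown weight from a discrete
    # prior, then flip that coin twice.
    weight = random.choice([0.2, 0.5, 0.8])
    flips = [random.random() < weight for _ in range(2)]
    return weight, flips

def rejection_infer(model, condition, n=20000):
    """Universal (model-agnostic) inference: run the program repeatedly
    and keep only the runs consistent with the observation."""
    kept = [w for w, f in (model() for _ in range(n)) if condition(f)]
    return {w: kept.count(w) / len(kept) for w in sorted(set(kept))}

random.seed(1)
# Condition on having observed two heads; the posterior should favor
# the heavy coin (weight 0.8).
posterior = rejection_infer(model, lambda flips: all(flips))
```

Swapping in a different model requires no change to `rejection_infer`; that separation is what the universal inference algorithms in Church-like languages provide, with MCMC in place of naive rejection.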

If you are interested in this set of ideas in the context of cognitive science, I recommend this interactive tutorial.

Church is based on Lisp. At the lowest level, it replaces Boolean gates with stochastic digital circuits. These circuits are wired together to form Markov chains (the probabilistic counterpart of finite state machines.) At the top, it's possible to define probabilistic procedures for generating samples from recursively defined distributions.

This confuses Church as a language for expressing generative models with ideas on how to implement such a language in hardware. There are three different ideas here:

  • Church as a language for generative models with sampling-based semantics
  • MCMC as an approximate inference method for such models (that can be implemented on traditional von Neumann architectures)
  • Machine architectures that are well-suited for MCMC

So this is an open call for volunteers -- any brave Bayesians want to blog about a brand new computer language?

I'll write an exposition within the next few weeks if people are interested.

Comment author: andreas 08 October 2010 02:49:11AM 1 point [-]

The notion of abstract state machines may be useful for a formalization of operational equivalence of computations.
