MattG2 comments on Open thread, Sep. 19 - Sep. 25, 2016 - Less Wrong

2 Post author: DataPacRat 19 September 2016 06:34PM




Comment author: MattG2 22 September 2016 07:17:29PM * 1 point

> You want some sort of adaptive or sequential design (right?), so the optimal design not being terribly helpful is not surprising: they're more intended for fixed up-front designing of experiments.

So after looking at the problem I'm actually working on, I realize an adaptive/sequential design isn't really what I'm after.

What I really want is a fractional factorial model that takes a prior (and minimizes regret, trading off information learned against cumulative score). It seems like the goal of a multi-armed bandit is to do exactly that, but I only want to do it once, assuming a fixed prior which doesn't update over time.

Do you think your Monte Carlo Bayesian experimental design is the best way to do this, or can I use some of the insights from Thompson sampling to make this process a bit less computationally expensive (which is important for my particular use case)?
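For readers unfamiliar with the Thompson sampling mentioned above, here is a minimal sketch of the standard Beta-Bernoulli version. It is not code from this thread: the function name `thompson_sampling`, the success rates, and the uniform Beta(1, 1) priors are all assumptions chosen for illustration. Note that it updates its beliefs after every pull, which is exactly the part the question proposes to drop.

```python
import random

def thompson_sampling(true_rates, prior, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_rates: hypothetical success probability of each arm, unknown to the agent.
    prior: list of (alpha, beta) Beta parameters, one pair per arm.
    Returns the final (alpha, beta) pairs and the total reward collected.
    """
    rng = random.Random(seed)
    posterior = [list(p) for p in prior]
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its current posterior.
        draws = [rng.betavariate(a, b) for a, b in posterior]
        # Play the arm whose sampled rate is highest.
        arm = max(range(len(draws)), key=draws.__getitem__)
        reward = 1 if rng.random() < true_rates[arm] else 0
        # Conjugate update: a success increments alpha, a failure increments beta.
        posterior[arm][0] += reward
        posterior[arm][1] += 1 - reward
        total_reward += reward
    return posterior, total_reward

# Hypothetical example: three arms (e.g. three textbooks) with made-up rates.
posterior, total_reward = thompson_sampling(
    true_rates=[0.2, 0.5, 0.8], prior=[(1, 1)] * 3, n_rounds=2000
)
```

Each round costs only one Beta draw per arm, which is why this is usually much cheaper than simulating whole experiments, but the saving comes from learning as you go.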

Comment author: gwern 23 September 2016 04:34:44PM 1 point

> but I only want to do it once, assuming a fixed prior which doesn't update over time.

I still don't understand what you're trying to do. If you're trying to maximize test scores by picking textbooks, and this is done many times, you want a multi-armed bandit to help you find the best textbook over the many students exposed to different combinations. If you are throwing out the information from each batch and assuming the interventions are totally different each time, then your decision is made before you do any learning, and your optimal choice is simply whatever your prior says. The value of information lies in the subsequent decisions it affects; since you're not updating your prior, the information can't change any decisions after the first one and is worthless.
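A toy illustration of that last point, as a sketch under assumed numbers rather than anything from the actual problem: two arms with Beta priors, a hypothetical helper `choose`, and some made-up batch results. When the prior is not updated, the data never enters the decision, so the chosen arm is the same with or without it.

```python
def choose(prior, data=None, update=True):
    """Pick the arm with the highest expected success rate.

    prior: list of (alpha, beta) Beta parameters, one pair per arm.
    data: optional list of (successes, failures) observed per arm.
    update: if False, the observations are ignored and the prior is used as-is.
    """
    beliefs = []
    for i, (a, b) in enumerate(prior):
        if update and data is not None:
            s, f = data[i]
            a, b = a + s, b + f  # conjugate Beta-Bernoulli update
        beliefs.append(a / (a + b))  # mean of the Beta distribution
    return max(range(len(beliefs)), key=beliefs.__getitem__)

# Hypothetical numbers: the prior slightly favors arm 1,
# but the observed batch strongly favors arm 0.
prior = [(2, 2), (3, 2)]
data = [(9, 1), (1, 9)]

with_learning = choose(prior, data, update=True)      # decision can change
without_learning = choose(prior, data, update=False)  # always the prior's pick
prior_only = choose(prior)                            # same as without_learning
```

Whatever the batch shows, `without_learning` equals `prior_only`, so collecting the batch cannot improve any later decision.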

> Do you think your Monte Carlo Bayesian experimental design is the best way to do this, or can I use some of the insights from Thompson sampling to make this process a bit less computationally expensive (which is important for my particular use case)?

Dunno. Simulation is the most general way of tackling the problem and will work for just about anything, but it can be extremely computationally expensive. There are many special cases which can reuse computations or have closed-form solutions, but they must be considered on a case-by-case basis.
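To make the simulation versus closed-form contrast concrete, here is a sketch on a deliberately simple stand-in problem, not the one discussed in this thread. Assume one risky option with a Beta(a, b) prior on its success rate, a safe option with a known rate `baseline`, and the chance to observe n trials of the risky option before choosing. The expected value of sample information can then be computed exactly by summing over the beta-binomial distribution of outcomes, or approximated by simulation. All names and numbers are illustrative assumptions.

```python
import math
import random

def log_beta(a, b):
    """Log of the Beta function, via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(k successes in n trials) when the rate has a Beta(a, b) prior."""
    return math.comb(n, k) * math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))

def evsi_exact(a, b, n, baseline):
    """Exact expected gain from seeing n trials before choosing."""
    value_now = max(baseline, a / (a + b))
    value_after = sum(
        beta_binomial_pmf(k, n, a, b) * max(baseline, (a + k) / (a + b + n))
        for k in range(n + 1)
    )
    return value_after - value_now

def evsi_monte_carlo(a, b, n, baseline, n_sims=100_000, seed=0):
    """Same quantity, estimated by simulating experiments drawn from the prior."""
    rng = random.Random(seed)
    value_now = max(baseline, a / (a + b))
    total = 0.0
    for _ in range(n_sims):
        theta = rng.betavariate(a, b)                    # a possible true rate
        k = sum(rng.random() < theta for _ in range(n))  # simulated successes
        total += max(baseline, (a + k) / (a + b + n))    # best posterior choice
    return total / n_sims - value_now
```

The exact version needs n + 1 terms; the simulated version needs n_sims times n random draws to get a noisy estimate of the same number. The simulation generalizes to messier models where no such sum is available, which is the trade-off described above.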