Open thread, Sep. 19 - Sep. 25, 2016

DataPacRat

If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

4. Unflag the two options "Notify me of new top level comments on this article" and "Make this post available under..." before submitting.

You want some sort of adaptive or sequential design (right?), so the optimal design not being terribly helpful is not surprising: they're more intended for fixed up-front designing of experiments. They also tend to be oriented towards overall information or reduction of variance, which doesn't necessarily correspond to your loss function. Having priors affects the optimal design somewhat: usually, you can spend fewer datapoints on the variables with prior information. For a Bayesian experimental design, you can simulate a set of parameters from your priors, simulate drawing n datapoints with a particular experimental design, fit the model, compute your loss or your entropy/variance, and record the loss alongside the design. Repeat many times, then pick the design with the best average loss.
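A minimal sketch of that simulation loop, under assumptions that are mine rather than anything stated in the thread: two hypothetical learning materials, normal priors on their effects (one tight, one vague), known noise, and squared estimation error as the loss. A "design" here is just how many of the 20 datapoints each material gets.

```python
import random

# All numbers below are illustrative assumptions, not from the thread.
PRIORS = [(0.0, 0.5), (0.0, 3.0)]  # (prior mean, prior sd): tight for material 0, vague for material 1
NOISE_SD = 2.0                     # assumed known sd of a single test score
N_TOTAL = 20                       # datapoints to split between the two materials


def posterior_mean(prior_mean, prior_sd, observations):
    """Conjugate normal-normal update with known noise sd."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(observations) / NOISE_SD ** 2
    weighted_sum = prior_prec * prior_mean + sum(observations) / NOISE_SD ** 2
    return weighted_sum / (prior_prec + data_prec)


def simulate_loss(design, rng):
    """One simulated experiment; design[i] is the number of datapoints for material i."""
    loss = 0.0
    for (mean, sd), n in zip(PRIORS, design):
        true_effect = rng.gauss(mean, sd)                       # draw parameters from the prior
        obs = [rng.gauss(true_effect, NOISE_SD) for _ in range(n)]  # simulate the data
        estimate = posterior_mean(mean, sd, obs)                # fit the model
        loss += (estimate - true_effect) ** 2                   # squared-error loss
    return loss


def expected_loss(design, n_sims=2000, seed=0):
    """Average loss of a design over many simulated experiments (fixed seed for reproducibility)."""
    rng = random.Random(seed)
    return sum(simulate_loss(design, rng) for _ in range(n_sims)) / n_sims


# Candidate designs: every way to split N_TOTAL datapoints between the two materials.
designs = [(k, N_TOTAL - k) for k in range(N_TOTAL + 1)]
best = min(designs, key=expected_loss)
print("best split (material 0, material 1):", best)
```

With these made-up priors the winning split should put most datapoints on the vaguely known material, which is the "spend fewer datapoints where you have prior information" effect. Swapping in your own loss function only means changing the last line of `simulate_loss`.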

If you are running the learning material experiment indefinitely and want to maximize cumulative test scores, then it's a multi-armed bandit and so Thompson sampling on a factorial Bayesian model will work well & handle your 3 desiderata: you set your informative priors on each learning material, model as a linear model (with interactions?), and Thompson sample from the model+data.
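Here is a rough sketch of what that could look like, assuming NumPy is available. Everything concrete in it is an assumption for illustration: three hypothetical materials that are each included or not, a main-effects-only linear model (no interactions), Gaussian priors, known noise, and a made-up `TRUE_BETA` that stands in for the real students.

```python
import itertools
import numpy as np

# Each arm is one combination of 3 hypothetical materials: [intercept, m1, m2, m3].
ARMS = np.array([(1,) + combo for combo in itertools.product([0, 1], repeat=3)], dtype=float)

TRUE_BETA = np.array([50.0, 6.0, -4.0, 3.0])  # simulation-only ground truth (unknown in practice)
NOISE_SD = 5.0                                 # assumed known noise sd of a test score

# Informative priors: we believe material 1 helps; the other two are centered on zero.
PRIOR_MEAN = np.array([50.0, 3.0, 0.0, 0.0])
PRIOR_PREC = np.diag(1.0 / np.array([10.0, 3.0, 3.0, 3.0]) ** 2)


def run_thompson(n_rounds=2000, seed=0):
    """Thompson sampling with a conjugate Bayesian linear model; returns pull counts per arm."""
    rng = np.random.default_rng(seed)
    post_prec = PRIOR_PREC.copy()           # posterior precision matrix
    post_shift = PRIOR_PREC @ PRIOR_MEAN    # precision-weighted mean
    pulls = np.zeros(len(ARMS), dtype=int)
    for _ in range(n_rounds):
        cov = np.linalg.inv(post_prec)
        cov = (cov + cov.T) / 2             # guard against tiny numerical asymmetry
        mean = cov @ post_shift
        beta_draw = rng.multivariate_normal(mean, cov)  # one posterior sample of the coefficients
        arm = int(np.argmax(ARMS @ beta_draw))          # act as if the sample were the truth
        x = ARMS[arm]
        score = x @ TRUE_BETA + rng.normal(0.0, NOISE_SD)  # simulated test score
        post_prec += np.outer(x, x) / NOISE_SD ** 2        # conjugate update
        post_shift += x * score / NOISE_SD ** 2
        pulls[arm] += 1
    return pulls


pulls = run_thompson()
print("pulls per arm:", pulls)
print("truly best arm:", int(np.argmax(ARMS @ TRUE_BETA)))
```

Adding interactions would mean appending product columns to `ARMS` and widening the prior to match; the sampling loop itself would not change.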

If you want to find what set of learning materials is optimal as fast as possible by the end of your experiment, then that's the 'best-arm identification' multi-armed bandit problem. You can do a kind of Thompson sampling there too, best-arm Thompson sampling:

http://imagine.enpc.fr/publications/papers/COLT10.pdf
https://www.escholar.manchester.ac.uk/api/datastream?publicationPid=uk-ac-man-scw:227658&datastreamId=FULL-TEXT.PDF
http://nowak.ece.wisc.edu/bestArmSurvey.pdf
http://arxiv.org/pdf/1407.4443v1.pdf
https://papers.nips.cc/paper/4478-multi-bandit-best-arm-identification.pdf

One version goes: with the full posteriors, find the action A with the best expected loss; for all the other actions B..Z, Thompson sample their possible value; take the action with the best loss out of A..Z. This explores the other arms in proportion to their remaining chance of being the best arm, better than A, while firming up the estimate of A's value.
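The "one version" above could be sketched roughly as follows. This is my reading of the description, not code from any of the linked papers: arms are independent with normal posteriors and known noise, the leader A is scored by its posterior mean, every other arm by a single posterior draw, and rewards are maximized (flip the sign if you think in terms of loss). `NormalArm`, `choose_arm`, and `run` are hypothetical names.

```python
import random

NOISE_SD = 1.0  # assumed known reward noise


class NormalArm:
    """Normal posterior over one arm's mean reward (conjugate, known noise)."""

    def __init__(self, prior_mean, prior_sd):
        self.mean = prior_mean
        self.prec = 1.0 / prior_sd ** 2

    def update(self, reward):
        data_prec = 1.0 / NOISE_SD ** 2
        self.mean = (self.prec * self.mean + data_prec * reward) / (self.prec + data_prec)
        self.prec += data_prec

    def sample(self, rng):
        return rng.gauss(self.mean, self.prec ** -0.5)


def choose_arm(arms, rng):
    """Leader keeps its expected value; all other arms get a Thompson draw."""
    indices = range(len(arms))
    leader = max(indices, key=lambda i: arms[i].mean)
    values = [arms[i].mean if i == leader else arms[i].sample(rng) for i in indices]
    return max(indices, key=lambda i: values[i])


def run(true_means, n_rounds=600, seed=0):
    """Simulate the procedure and return the arm believed best at the end."""
    rng = random.Random(seed)
    arms = [NormalArm(0.0, 2.0) for _ in true_means]  # same vague prior on every arm
    for _ in range(n_rounds):
        i = choose_arm(arms, rng)
        arms[i].update(rng.gauss(true_means[i], NOISE_SD))
    return max(range(len(arms)), key=lambda i: arms[i].mean)


print("identified best arm:", run([0.0, 0.5, 2.0]))
```

An arm other than the leader only gets pulled when its draw beats the leader's mean, so arms that are clearly worse fade out quickly while close contenders keep getting tested.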

You want some sort of adaptive or sequential design (right?), so the optimal design not being terribly helpful is not surprising: they're more intended for fixed up-front designing of experiments.

So after looking at the problem I'm actually working on, I realize an adaptive/sequential design isn't really what I'm after.

What I really want is a fractional factorial model that takes a prior (and minimizes regret between information learned and cumulative score). It seems like the goal of multi-armed bandit is to do exactly that, but I only want to do it ...