We're interested in improving human rationality. Many of our techniques for improving human rationality take time. In real-time situations, you can lose by making the wrong decision, or by making the "right" decision too slowly. Most of us do not have inflexible-schedule, high-stakes decisions to make, though. How often does real-time decision making really come up?

Suppose you are making a fairly long-ranged decision. Call this decision 1. While analyzing decision 1, you come to a natural pause. At this pause you need to decide whether to analyze further, or to act on your best-so-far analysis. Call this decision 2. Note that decision 2 is made under tighter time pressure than decision 1. This scenario argues that decision-making is recursive, and so if there are any time bounds, then many decisions will need to be made at very tight time bounds.

A second, "covert" goal of this post is to provide a definitely-not-paradoxical problem for people to practice their Bayseian reasoning on. Here is a concrete model of real-time decisionmaking, motivated by medical-drama television shows, where the team diagnoses and treats a patient over the course of each episode. Diagnosing and treating a patient who is dying of an unknown disease is a colorful example of real-time decisionmaking.

To play this game, you need a coin, two six-sided dice, a deck of cards, and a helper to manipulate these objects. The manipulator sets up the game by flipping a coin. If heads (tails) the patient is suffering from an exotic fungus (allergy). Then the manipulator prepares a deck by removing all of the clubs (diamonds) so that the deck is a red-biased (black-biased) random-color generator. Finally, the manipulator determines the patients starting health by rolling the dice and summing them. All of this is done secretly.

Play proceeds in turns. At the beginning of each turn, the manipulator flips a coin to determine whether test results are available. If test results are available, the manipulator draws a card from the deck and reports its color. A red (black) card gives you suggestive evidence that the patient is suffering from a fungus (allergy). You choose whether to treat a fungus, allergy, or wait. If you treat correctly, the manipulator leaves the patient's health where it is (they're improving, but on a longer timescale). If you wait, the manipulator reduces the patient's health by one. If you treat incorrectly, the manipulator reduces the patient's health by two.

Play ends when you treat the patient for the same disease for six consecutive turns or when the patient reaches zero health.

Here is some Python code simulating a simplistic strategy. What Bayesian strategy yields the best results? Is there a concise description of this strategy? 

The model can be made more complicated. The space of possible actions is small. There is no choice of what to investigate next. In the real world, there are likely to be diminishing returns to further tests or further analysis. There could be uncertainty about how much time pressure there is. There could be uncertainty about how much information future tests will reveal. Every complication will make the task of computing the best strategy more difficult.

We need fast approximations to rationality (even quite bad approximations, if they're fast enough), as well as procedures that spend time in order to purchase a better result.

New Comment
18 comments, sorted by Click to highlight new comments since: Today at 4:03 AM

I think in general, the only way to solve these problems optimally is to build a full decision tree, and then apply backward induction. The reason is that at every decision point, the expected utility of a choice depends on what you do after that, so you need to first figure out what is the optimal strategy for the subtree rooted at that choice.

Sometimes, there are mathematical shortcuts that you can use to solve the problem faster. (Since the decision tree is exponential in size, doing backward induction may be infeasible.) I don't know if there is one here, but the decision tree seems to be small enough (at most size 9^12) that it can be solved by brute force using a computer. Apparently there are existing tools for doing this (which I found using Google) but I'm not really familiar with them.

I don't think the decision tree is really exponential growth because the order of operations doesn't matter. It only matters how many red and black cards you've gotten, not their order; it only matters how many treatments, not their order. Different orders involved different beliefs in the middle, but they don't involve different beliefs at the end, do they?

Yes, I think you're right. Because the order of observations doesn't influence posterior probabilities in this game, we can simplify the backward induction computation to take O(n^5) time instead of exponential. (n^5 because 5 variables, each less than n, are sufficient to describe a history: number of red and black cards, number of each type of treatment). This is still too long to do by hand, but easy with a computer.

I think three variables are sufficient: The excess of red over black cards, and the amount of damage done (by the combination of waiting and side effects) to each of (fungus-sufferers, allergy-sufferers).

According to my present implementation (which likely has bugs), the total number of triples that one needs to investigate before obtaining proof in the form of "if they weren't X, they''d be dead by now" is 906.

I crossposted the particular problem to a forum I frequent:

http://s6.zetaboards.com/EmpireLost/topic/8580222/

The conclusion in the thread is that you do best by treating based on the majority of the results at the time, but just sticking with any treatment you've already given the patient three times, on the theory that if it hasn't killed him yet it's probably right.

Unfortunately we reached this conclusion through Monte Carlo simulation, so there aren't any general arguments we can take away from it.

I love your rephrasing - awesome.

We can take away at least one general argument: Monte Carlo simulation is frequently a fast (that is, in human time) way to figure out decent strategies to stochastic decision problems.

At each step it's pretty easy to calculate the probability that the patient has each disease, and the probability that the patient will die if untreated.

You can get a .5 chance of saving the patient if you simply choose to treat for one disease and stick with that treatment. Let me arbitrarily say that if I choose this option I will treat for fungus.

I think a simple one move look ahead can give the optimum strategy: compare the chance of the patient dying on this turn to the chance of waiting saving the patient's life. It doesn't do any good to wait for more evidence if the extra evidence won't change your treatment anyway. So the chance of waiting saving patient is the confidence that you will have correctly identified the disease after waiting an extra turn minus .5 if changes your treatment and zero otherwise.

For example, if I have 1 red and one black card already and have not yet treated the patient and 4 turns have gone by, the chance of the patient dying next turn is I think 4/36. The the chance of waiting saving the patient's life is .5 .5 (x - .5) where the number x which I am too lazy to calculate is the probability the patient has an allergy given two black cards and one red one. (The first .5 is because there's only a coin flip chance of getting test results, the second because if I draw a red card it won't change my mind how to treat the patient).

I'd like to code your proposed strategy up and measure it by simulations, but I can't quite figure out what it is. Does your strategy start with default-treating someone for fungus, or with waiting? Does it switch to treating an allergy on the first suggestive evidence for allergy, or does it wait for conclusive proof?

By the way, you only need to treat him for the same disease six times to be guaranteed that it's helping. There's no need to prolong the game for the full twelve turns.

Yes, you're correct. Sorry about that.

Sounds fun! Easy to implement on a computer too. I wonder if players would discover the best strategy simply by practicing (and not by reasoning about the game).

And I wonder if doctors discover the best strategy the same way.

And I wonder if the 'best strategy' found by doctors in this way is a different thing entirely to the strategy that optimises patient outcomes. The best thing about learning by practising is that you can learn successfully without having to admit what your utility function looks like!

I don't see anything in this game that illustrates the need to decide whether to calculate more or go with what currently looks like the best decision. Waiting for more evidence is not the same as taking more time to calculate.

You're entirely correct; however, I couldn't figure out how to model taking more time to calculate except as waiting for more evidence. Do you have a model in mind?

OK, so you obviously wait on the first turn, because that can't possibly kill him. And if you get two cards of the same color on the first two turns, you treat him for that disease until he's cured or he dies, no matter what any future cards turn up. The interesting question is what if they alternate?

EDIT: THIS IS WRONG, DO NOT BELIEVE IT

[This comment is no longer endorsed by its author]Reply

Can you explain your reasoning about two cards of the same color? I don't follow.

That's probably because I'm wrong ^^;

I had the numbers working out to be that by the time you would get two cards worth of contradictory evidence for the original treatment, the fact that the original treatment hadn't killed him yet was enough evidence to keep going. But, my counting was off by a day. Whoops!