(This is the second post in a planned sequence.)
Let’s say you’re building an artificial intelligence named Bob. You’d like Bob to sally forth and win many utilons on your behalf. How should you build him? More specifically, should you build Bob to have a world-model in which there are many different actions he “could” take, each of which “would” give him particular expected results? (Note that e.g. evolution, rivers, and thermostats do not have explicit “could”/“would”/“should” models in this sense -- and while evolution, rivers, and thermostats are all varying degrees of stupid, they all still accomplish specific sorts of world-changes. One might imagine more powerful agents that also simply take useful actions, without claimed “could”s and “woulds”.)
My aim in this post is simply to draw attention to “could”, “would”, and “should”, as concepts folk intuition fails to understand, but that seem nevertheless to do something important for real-world agents. If we want to build Bob, we may well need to figure out what the concepts “could” and “would” can do for him.*
Introducing Could/Would/Should agents:
Let a Could/Would/Should Algorithm, or CSA for short, be any algorithm that chooses its actions by considering a list of alternatives, estimating the payoff it “would” get “if” it took each given action, and choosing the action from which it expects the highest payoff.
That is: let us say that to specify a CSA, we need to specify:
- A list of alternatives a_1, a_2, ..., a_n that are primitively labeled as actions it “could” take;
- For each alternative a_1 through a_n, an expected payoff U(a_i) that is labeled as what “would” happen if the CSA takes that alternative.
To be a CSA, the algorithm must then search through the payoffs for each action, and then trigger the agent to actually take the action a_i for which the labeled U(a_i) is maximal (a minimal sketch of this structure follows below).
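To make the structure concrete, here is a minimal sketch in Python. It is purely illustrative: the post specifies only the could/would/should shape, so the names below (csa_choose, alternatives, payoff_estimate) are hypothetical, not an implementation the author endorses.

```python
def csa_choose(alternatives, payoff_estimate):
    """Pick the alternative whose labeled 'would' -- its expected payoff -- is maximal.

    alternatives    -- the options primitively labeled as actions the agent "could" take
    payoff_estimate -- a function a_i -> U(a_i): the labeled "would" for each option
    """
    best_action, best_payoff = None, float("-inf")
    for a in alternatives:           # "could": the primitively labeled options
        u = payoff_estimate(a)       # "would": the labeled expected payoff U(a)
        if u > best_payoff:
            best_action, best_payoff = a, u
    return best_action               # "should": trigger the payoff-maximizing action

# Nothing here constrains the labels; as the next paragraph notes, any made-up list of
# "alternatives" and any made-up payoff labeling yields a CSA by this definition.
print(csa_choose(["a1", "a2", "a3"], lambda a: {"a1": 2.0, "a2": 7.0, "a3": 5.0}[a]))  # -> "a2"
```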
Note that we can, by this definition of “CSA”, create a CSA around any made-up list of “alternative actions” and of corresponding “expected payoffs”.
The puzzle is that CSAs are common enough to suggest that they’re useful -- but it isn’t clear why CSAs are useful, or quite which kinds of CSAs are useful in which ways. To spell out the puzzle:
Puzzle piece 1: CSAs are common. Humans, some (though far from all) other animals, and many human-created decision-making programs (game-playing programs, scheduling software, etc.), have CSA-like structure. That is, we consider “alternatives” and act out the alternative from which we “expect” the highest payoff (at least to a first approximation). The ubiquity of approximate CSAs suggests that CSAs are in some sense useful.
Puzzle piece 2: The naïve realist model of CSAs’ nature and usefulness doesn’t work as an explanation.
That is: many people find CSAs’ usefulness unsurprising, because they imagine a Physically Irreducible Choice Point, where an agent faces Real Options; by thinking hard, and choosing the Option that looks best, naïve realists figure that you can get the best-looking option (instead of one of those other options, that you Really Could have gotten).
But CSAs, like other agents, are deterministic physical systems. Each CSA executes a single sequence of physical movements, some of which we consider “examining alternatives”, and some of which we consider “taking an action”. It isn’t clear why or in what sense such systems do better than deterministic systems built in some other way.
Puzzle piece 3: Real CSAs are presumably not built from arbitrarily labeled “coulds” and “woulds” -- presumably, the “woulds” that humans and others use, when considering e.g. which chess move to make, have useful properties. But it isn’t clear what those properties are, or how to build an algorithm to compute “woulds” with the desired properties.
Puzzle piece 4: On their face, all calculations of counterfactual payoffs (“woulds”) involve asking questions about impossible worlds. It is not clear how to interpret such questions.
Determinism notwithstanding, it is tempting to interpret CSAs’ “woulds” -- our U(a_i)s above -- as calculating what “really would” happen, if they “were” somehow able to take each given action.
But if agent X will (deterministically) choose action a_1, then when he asks what would happen “if” he takes alternative action a_2, he’s asking what would happen if something impossible happens.
If X is to calculate the payoff “if he takes action a_2” as part of a causal world-model, he’ll need to choose some particular meaning of “if he takes action a_2” – some meaning that allows him to combine a model of himself taking action a_2 with the rest of his current picture of the world, without allowing predictions like “if I take action a_2, then the laws of physics will have been broken”.
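One candidate meaning, offered here purely as an illustration and not as the post’s own proposal, is a causal-surgery (“do”-style) counterfactual: keep the agent’s current beliefs about everything upstream of the action, set the action node by fiat (severing it from its usual causes, so no “the laws of physics will have been broken” prediction can be derived), and recompute only the downstream consequences. The toy world, action names, and payoffs below are hypothetical.

```python
def would_payoff(world, action):
    """A 'do'-style reading of U(action): surgery on a toy causal model."""
    # Upstream / exogenous facts are kept exactly as currently believed...
    raining = world["raining"]
    # ...the action node is set by fiat, with its usual causes ignored, so the
    # model never has to explain how the "impossible" action came about...
    carrying_umbrella = (action == "take_umbrella")
    # ...and only the consequences downstream of the action are recomputed.
    stayed_dry = (not raining) or carrying_umbrella
    return (10 if stayed_dry else 0) - (1 if carrying_umbrella else 0)

world = {"raining": True}
actions = ["take_umbrella", "go_without"]
print(max(actions, key=lambda a: would_payoff(world, a)))  # -> "take_umbrella"
```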
We are left with several questions:
- Just what are humans, and other common CSAs, calculating when we imagine what “would” happen “if” we took actions we won’t take?
- In what sense, and in what environments, are such “would” calculations useful? Or, if “would” calculations are not useful in any reasonable sense, how did CSAs come to be so common?
- Is there more than one natural way to calculate these counterfactual “would”s? If so, what are the alternatives, and which alternative works best?
*A draft-reader suggested to me that this question is poorly motivated: what other kinds of agents could there be, besides “could”/“would”/“should” agents? Also, how could modeling the world in terms of “could” and “would” not be useful to the agent?
My impression is that there is a sort of gap in philosophical wariness here that is a bit difficult to bridge, but that one must bridge if one is to think well about AI design. I’ll try an analogy. In my experience, beginning math students simply expect their nice-sounding procedures to work. For example, they expect to be able to add fractions straight across. When you tell them they can’t, they demand to know why they can’t, as though most nice-sounding theorems are true, and if you want to claim that one isn’t, the burden of proof is on you. It is only after students gain considerable mathematical sophistication (or experience getting burned by expectations that don’t pan out) that they place the burden of proof on the theorems, assume theorems false or unusable until proven true, and try to actively construct and prove their mathematical worlds.
Reaching toward AI theory is similar. If you don’t understand how to reduce a concept -- how to build circuits that compute that concept, and what exact positive results will follow from that concept and will be absent in agents which don’t implement it -- you need to keep analyzing. You need to be suspicious of anything you can’t derive for yourself, from scratch. Otherwise, even if there is something of the sort that is useful in the specific context of your head (e.g., some sort of “could”s and “would”s that do you good), your attempt to re-create something similar-looking in an AI may well lose the usefulness. You get cargo cult could/woulds.
Thanks to Z M Davis for the above gorgeous diagram.
Warning: grouchiness follows.
Actually, I made the same criticism of that category, except in more detail. Was that acausal, or am I just more worthy of reviewing your drafts?
And your response in the footnote looks like little more than, "don't worry, you'll get it some day, like schoolkids and fractions". Not helpful.
Excuse me, isn't this just the classical "rational agent" model that research has long since refuted? For one thing, many actions people perform are trivially impossible to interpret this way (in the sense of your diagram), given reaction times and known computational properties of the brain. That is, the brain doesn't have enough time to form enough distinct substates isomorphic to several human-like responses, then evaluate them, then compare the evaluations.
For another, there's the whole heuristics and biases literature, repeated ad infinitum on OB/LW.
Finally, even when humans do believe they're evaluating several choices looking for the best payoff (per some multivariable utility function), what really happens is that they pick one quickly based on "gut instinct" -- meaning some heuristic, good or bad -- and then bend all conscious evaluation to favor it. In at least some laboratory settings, this is shown explicitly: the researchers can predict what the subject will do, and then the subject gives some plausible-sounding rationalization for why they did it.
(And if you say, "using heuristics is a kind of evaluation of alternatives", then you're again stretching the boundaries of the concept of a CSA wide enough to be unhelpful.)
There are indeed cases where people do truly consider the alternatives and make sure they are computing the actual consequences and the actual congruence with their actual values, but this is an art people have to genuinely work towards; it is not characteristic of general human action.
In any case, all of the above assumes a distinction I'm not convinced you've made. To count as a CSA, is it necessary that you be physically able to extract the alternatives under consideration ("Silas considered making his post polite, but assigned it low utility")? Because the technology certainly doesn't exist to do that on humans. Or is it only necessary that it be possible in principle? If the latter, you run back into the problem of the laws of physics being embedded in all parts of the universe:
I observe a pebble. Therefore, I know the laws of the universe. Therefore, I can compute arbitrary counterfactuals. Therefore, I compute a zero pebble-utility for everything the pebble "pebble-could" do, except follow the laws of physics.
Therefore, there is no "not-CSA" option.
If it is possible in principle to physically extract the alternatives/utility assignments, etc., wouldn't that be sufficient to ground the CSA/non-CSA distinction, without running afoul of either current technological limitations or the pebble-as-CSA problem? (Granted, we might not always know whether a given agent is really a CSA or not, but that doesn't seem to obviate the distinction itself.)