Well, one story is that humans and brains are irrational, and then you don't need a utility function or any other specific description of how it works. Just figure out what's really there and model it.
The other story is that we're hoping to make a Friendly AI that might make rational decisions to help people get what they want in some sense. The only way I can see to do that is to model people as though they actually want something, which seems to imply having a utility function that says what they want more and what they want less. Yes, it's not true; people aren't that rational. But if an FAI or anyone else is going to help you get what you want, it has to model you as wanting something (and as making mistakes when you don't behave as though you want something).
So it comes down to this question: If I model you as using some parallel decision theory, and I want to help you get what you want, how do I extract "what you want" from the model without first somehow converting that model to one that has a utility function?
That's suggestion five on the list:
Make sure that each CSA above the lowest level actually has "could", "should", and "would" labels on the nodes in its problem space, and make sure that those labels, their values, and the problem space itself can be reduced to the managing of the CSAs on the level below.
Figuring out exactly how our preferences, i.e., our utility function, emerge from the managing of our subagents, and how our problem space emerges from the managing of other CSAs, is my main motivation for suggesting the construction of a parallel decision theory.
Make sure that each CSA above the lowest level actually has "could", "should", and "would" labels on the nodes in its problem space, and make sure that those labels, their values, and the problem space itself can be reduced to the managing of the CSAs on the level below.
That statement would be much more useful if you gave a specific example. I don't see how labels on the nodes are supposed to influence the final result.
There's a general principle here that I wish I could state well. It's something like "general ideas are easy, specific workable proposals are hard, and you're probably wasting people's time if you're only describing a solution to the easy parts of the problem".
One cause of this is that anyone who can solve the hard part of the problem can probably already guess the easy part, so they don't benefit much from you saying it. Another cause is that the solutions to the hard parts of the problem tend to have awkward aspects to them that are best dealt with by modifying the easy part, so a solution to just the easy part is sure to be unworkable in ways that can't be seen if that's all you have.
I have this issue with your original post, and most of the FAI work that's out there.
Well, you can see how labeling some nodes as "can" and some as "can't" could be useful, I'm sure. The "would" labels tell the agent what it can do from a given node, i.e., which nodes are connected to this node and how much payoff there is for choosing each of them. The "should" labels are calculated from the "woulds" and the utility function, i.e., a should label's value tells the agent whether or not to take that action.
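To make that concrete, here's a minimal sketch in Python of one way such labels might be attached to nodes. The toy problem space, payoff numbers, and names are made up for illustration; they're not part of the linked post:

```python
# A hypothetical node in a CSA's problem space, carrying "could", "would",
# and "should" labels as described above.

class Node:
    def __init__(self, name, successors):
        self.name = name
        # "could": which nodes are reachable from here
        self.could = list(successors.keys())
        # "would": the payoff attached to moving to each reachable node
        self.would = dict(successors)
        # "should": filled in once we compare payoffs under the utility function
        self.should = None

def label_should(node, utility):
    """Mark the reachable node whose payoff the utility function ranks highest."""
    best = max(node.could, key=lambda n: utility(node.would[n]))
    node.should = {n: (n == best) for n in node.could}
    return node.should

# Example with made-up payoffs and an identity utility function.
start = Node("offered_cigarette", {"smoke": 2.0, "decline": 10.0})
print(label_should(start, utility=lambda payoff: payoff))
# {'smoke': False, 'decline': True}
```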
I'm not trying to solve the specific parts of the problem in this post; I'm trying to pose the problem so that we can work out the specific parts. That's why it is called "Towards a New Decision Theory for Parallel Agents" rather than "A New Decision Theory for Parallel Agents". That's also why I am trying to assemble a team of people to help me work out the more specific parts of the problem. What you quoted was a suggestion I gave to any team that might pursue the goal of an axiomatic parallel decision theory independently; any such team, I would imagine, would understand what I meant by it, especially if they looked at the link right above it:
http://lesswrong.com/lw/174/decision_theory_why_we_need_to_reduce_could_would/ .
Observations relevant to the general case where the managing algorithm can be implicit or explicit in the system:
As Eliezer (IIRC) likes to point out, correlation doesn't happen sans shared causal origin. Correlation of utility functions is no different.
A (shared) cause doesn't need to be temporally antecedent to its effects; it can be timeless, as seen in e.g. convergent biological evolution, aliens in unconnected Hubble volumes coming up with the same laws of physics, aliens in unconnected Hubble volumes getting the same results from their arithmetic calculators when they punch in the same numbers, etc. You can sometimes usefully spin this as teleology, as evolutionary biologists tend to do.
This doesn't necessitate drawing a special causal arrow from the future to the past. (/checks TDT document again.) Though I'm still not seeing why you can't use standard Bayes nets to reason about logical causes, also informally known as teleology or, more generally, timeless conditioning. In that case calling an arrow "logical" would just be a matter of convenience. Perhaps that's always been the case and I'd been wrongly assuming everyone was thinking something silly. Or I'm still thinking something silly because I'm missing an important point about the Markov condition or something. Anyway, the point is, you can have causal arrows coming in from any direction, including the physical future. Just not the causal future, whatever that means. Determining which is which is a matter for theologians more than AI researchers.
So! Knowing that we have "parallel agents" that ended up with the same utility function, and knowing that those utility functions take a few bits to specify, we can infer that they have one or more common causes. For the sake of simplicity we can reify those common causes into a "creator". XDT (exceptionless decision theory) becomes relevant. We can look for scenarios/conditions where XDT is invariant under labeling different parts of its causal history (remember, that's not necessarily only its temporal history!) as "creator".
I have to leave for an appointment, but anyway, that could be a place to start importing relevant intuitions from.
I think you've brought up an excellent problem, but I have difficulty understanding what you concretely propose to study, and why it would be a good avenue of attack on these questions.
The fact that none of the decision-theory heavyweights have jumped in yet leads me to suspect that the proposed programme is either poorly communicated or the result of confused thought.
The sum of utility functions is also a utility function. Since humans can be inconsistent, our modules don't interact independently; rather, they can amplify one another and do similarly messy things. Once you allow that whole range of interactions, you're screwed: there's already no solution to many multiplayer games even when things are independent!
A useful example is smokers who are quitting. Some part of their brain that can do complicated predictions doesn't want its body to smoke. This part of the brain wants to avoid death, i.e., it will avoid death if it can, and it knows that choosing the possible outcome of smoking puts its body at high risk of death.
I'm not even sure if this exact part of the brain wants to avoid death; maybe it's more complicated, and we have to introduce a third agent for that specific value. (Note that some people say death might be a good thing, even while trying to avoid it at all costs; this shows that the reasoning part of the brain doesn't automatically try to avoid death.)
But there is some part that wants nicotine, and as such wants the body to have a cig, and some part that doesn't want to die, and as such wants the body to avoid smoking; that's all that's required.
Well, the connection between "I want to live" and "I'm probably going to die earlier if I smoke" has to be drawn somewhere, and if that requires an extra agent, things get more complicated. The brain is a mess anyway; I just wanted to remark that it might be even messier than it appears on a second look.
I think you may be partitioning things that need not necessarily be partitioned, and it's important to note that. In the nicotine example (or the "lock the refrigerator door" example in the cited material), this is not necessarily a competition between the wants of different agents. The apparent dichotomy can also be resolved by internal states together with utility discount factors.
To be specific, revisit the nicotine problem. When a person decides to quit, they may not be suffering any discomfort, so the utility of smoking at that moment is small. Instead, the eventual utility of a longer life wins out and the agent decides to stop smoking. However, once discomfort sets in, it combines with the action of smoking, because smoking will relieve the discomfort. The individual still assigns utility to not dying sooner (which would favor the "don't smoke" action), but the death outcome will happen much later. Even though death is far worse than the current discomfort (assuming a "normal" agent), so long as the utilities also operate under a temporal discount factor, the disutility of death, because it lies much further in the future, may be discounted down to the point where it is smaller than the utility of smoking to relieve the current discomfort.
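To put toy numbers on that (assuming, purely for illustration, exponential discounting; none of the values below are meant to be realistic):

```python
# A toy illustration, not a model of real smokers: a single utility function
# plus exponential temporal discounting can flip the smoke/don't-smoke choice
# once craving discomfort sets in. All numbers are made up.

def discounted(value, steps_until, gamma=0.95):
    return value * (gamma ** steps_until)

RELIEF_FROM_SMOKING = 5.0        # immediate relief of craving discomfort
COST_OF_EARLIER_DEATH = 1000.0   # far larger, but far in the future
STEPS_UNTIL_DEATH_COST = 400     # the delayed consequence

def utility_of_smoking(craving_active):
    relief = RELIEF_FROM_SMOKING if craving_active else 0.0
    return relief - discounted(COST_OF_EARLIER_DEATH, STEPS_UNTIL_DEATH_COST)

def utility_of_abstaining():
    return 0.0

# With no craving, abstaining wins; with a craving, the heavily discounted
# death cost is outweighed by immediate relief.
for craving in (False, True):
    choice = "smoke" if utility_of_smoking(craving) > utility_of_abstaining() else "abstain"
    print(craving, round(utility_of_smoking(craving), 6), choice)
```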
At no point have we needed to postulate separate competing agents with different wants, and this seeming contradiction is still perfectly resolved with a single utility function. In fact, wildly different agent behavior can arise from mere changes in the discount factor in reinforcement learning (RL) agents, where discount and reward functions are central to the design of the algorithm.
Now, which answer to the question is true? Is the smoke/don't-smoke contradiction a result of competing agents, or of discount factors and internal states? I suppose it could be either one, but it's important not to assume that these examples directly indicate competing agents with different desires; otherwise you may lose yourself looking for something that isn't there.
Of course, even if we assume that there are competing agents with different desires, it seems to me this can still be reduced, at least mathematically, to a single utility function. All it means is that you apply weights to the utilities of the different agents, and then standard reasoning mechanisms are employed.
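As a minimal sketch of that reduction (the subagent names, weights, and utilities below are made up for illustration, not a claim about real brains):

```python
# The weighted sum of subagent utilities is itself just another utility
# function over outcomes.

subagent_utilities = {
    "nicotine_seeker": {"smoke": 8.0, "decline": 0.0},
    "long_term_planner": {"smoke": -6.0, "decline": 4.0},
}
weights = {"nicotine_seeker": 0.3, "long_term_planner": 0.7}

def aggregate_utility(outcome):
    return sum(w * subagent_utilities[name].get(outcome, 0.0)
               for name, w in weights.items())

# Standard maximization over the aggregate picks 'decline' under these weights.
print(max(["smoke", "decline"], key=aggregate_utility))
```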
I hope that the considerations above are enough to convince reductionists that we should develop a parallel decision theory if we want to reduce decision making to computing.
Because of time taken for light to travel from one side of the agent's brain to the other? Probably insignificant except for a few very rapid real-time decisions.
Decision algorithms are usually highly parallelizable anyway - e.g. see chess or go.
A recent post, Consistently Inconsistent, raises some problems with the unitary view of the mind/brain and presents the modular view of the mind as an alternative hypothesis. The parallel/modular view of the brain not only deals better with the apparently hypocritical and contradictory ways our desires, behaviors, and beliefs seem to work, but also makes many successful empirical predictions, as well as postdictions. Much of that work can be found in Dennett's 1991 book Consciousness Explained, which details both the empirical evidence against the unitary view and the intuition failures involved in retaining a unitary view after being presented with that evidence.
The aim of this post is not to present further evidence in favor of the parallel view, nor to hammer any more nails into the unitary view's coffin; the scientific and philosophical communities have done well enough in both departments to discard the intuitive hypothesis that there is some executive of the mind keeping things orderly. The dilemma I wish to raise is a question: "How should we update our decision theories to deal with independent, and sometimes inconsistent, desires and beliefs being had by one agent?"
If we model one agent's desires with one utility function, and this function orders the outcomes the agent can reach on one real axis, then it seems like we might be falling back into the intuitive view that there is some "me" in there with one definitive list of preferences. The picture given to us by Marvin Minsky and Dennett involves a bunch of individually dumb agents, each with a unique set of specialized abilities and desires, interacting in such a way as to produce one smart agent with a diverse set of abilities and desires, but the smart agent only appears when viewed from the right level of description. For convenience, we will call those dumb, specialized agents "subagents", and the smart, diverse agent that emerges from their interaction "the smart agent". When one considers what it would be useful for a seeing neural unit to want to do, and contrasts it with what it would be useful for a get-that-food neural unit to want to do, e.g., examine that prey longer vs. charge that prey, turn head vs. keep running forward, stay attentive vs. eat that food, etc., it becomes clear that cleverly managing which unit gets how much control, and when, is an essential part of the decision-making process of the whole. Decision theory, as far as I can tell, does not model any part of that managing process; instead we treat the smart agent as having its own set of desires, and don't discuss how the subagents' goals are being managed to produce that global set of desires.
It is possible that the many subagents in a brain, when they operate in concert, act isomorphically to an agent with one utility function and a unique problem space. A trivial example of such an agent might have only two subagents, "A" and "B", and possible outcomes O1 through On. We can plot the utilities that each subagent assigns to these outcomes in the positive quadrant of a two-dimensional Cartesian plane, A's assigned utilities being represented by position along X, and B's utilities by position along Y. The method by which these subagents are managed to produce behavior might just be: go for the possible outcome furthest from (0,0); in which case, the utility function of the whole agent, U(Ox), would just be the distance from (0,0) to (A's U(Ox), B's U(Ox)).
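A sketch of that trivial example in code (the outcomes and utility numbers are arbitrary):

```python
# The managing rule "go for the outcome furthest from (0, 0)" induces a single
# utility function for the whole agent: the Euclidean norm of the two
# subagents' utilities.

import math

outcomes = {
    # outcome: (A's utility, B's utility)
    "O1": (3.0, 1.0),
    "O2": (1.0, 4.0),
    "O3": (2.0, 2.0),
}

def whole_agent_utility(outcome):
    a, b = outcomes[outcome]
    return math.hypot(a, b)   # distance from (0, 0)

chosen = max(outcomes, key=whole_agent_utility)
print(chosen, round(whole_agent_utility(chosen), 3))  # O2: sqrt(1 + 16) is largest
```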
An agent that manages its subagents so as to be isomorphic to one utility function on one problem space is certainly mathematically describable, but also implausible. It is unlikely that the actual physical-neural subagents in a brain deal with the same problem spaces, i.e., they each have their own unique set of O1 through On. It is not as if all the subagents are playing the same game but each has a unique goal within that game; they each have their own unique set of legal moves too. This makes it problematic to model the global utility function of the smart agent as assigning one real number to every member of a set of possible outcomes, since there is no one set of possible outcomes for the smart agent as a whole. Each subagent has its own search space, with its own format of representation for that problem space. The problem space and utility function of the smart agent are implicit in the interactions of the subagents; they emerge from the interactions of agents on a lower level; the smart agent's utility function and problem space are never explicitly written down.
A useful example is smokers who are quitting. Some part of their brain that can do complicated predictions doesn't want its body to smoke. This part of the brain wants to avoid death, i.e., it will avoid death if it can, and it knows that choosing the possible outcome of smoking puts its body at high risk of death. Another part of their brain wants nicotine, and knows that choosing the move of smoking gets it nicotine. The nicotine-craving subagent doesn't want to die, but it also doesn't want to stay alive; these outcomes aren't in the domain of the nicotine subagent's utility function at all. The part of the brain responsible for predicting its body's death if it continues to smoke, in a parallel manner, probably isn't significantly rewarded by nicotine. If a cigarette is around and offered to the smart agent, these subagents must compete for control of the relevant parts of the body, e.g., the nicotine subagent might set off a global craving, while the predict-the-future subagent might set off a vocal response saying "no thanks, I'm quitting." The overall desire of the smart agent to smoke or not smoke is just the result of this competition. Similar examples can be made with different desires, like the desire to overeat and the desire to look slim, or the desire to stay seated and the desire to eat a warm meal.
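To make the competition concrete, here is a rough sketch in which the two subagents have different outcome domains. The "largest stake wins control" rule used here is a stand-in chosen for illustration, not a claim about how brains actually arbitrate, and all names and numbers are invented:

```python
# Two subagents whose utility functions are defined over different outcome
# sets: "death" simply isn't in the nicotine subagent's domain.

nicotine_subagent = {"smoke": 6.0, "decline": 0.0}                   # knows nothing of death
planner_subagent = {"early_death": -100.0, "normal_lifespan": 50.0}  # knows nothing of nicotine

# How each subagent maps the actions currently on the table into its own space.
projections = {
    "nicotine_subagent": {"smoke": "smoke", "decline": "decline"},
    "planner_subagent": {"smoke": "early_death", "decline": "normal_lifespan"},
}
subagents = {"nicotine_subagent": nicotine_subagent, "planner_subagent": planner_subagent}

def stake(name, actions):
    """How much this subagent cares about the difference between the actions."""
    values = [subagents[name][projections[name][a]] for a in actions]
    return max(values) - min(values)

def winner(actions=("smoke", "decline")):
    """The subagent with the largest stake seizes control and picks its favorite."""
    name = max(subagents, key=lambda n: stake(n, actions))
    best = max(actions, key=lambda a: subagents[name][projections[name][a]])
    return name, best

print(winner())  # ('planner_subagent', 'decline') with these made-up numbers
```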
We may call the algorithm that settles these internal power struggles the "managing algorithm", and we may call a decision theory that models managing algorithms a "parallel decision theory". It's not the business of decision theorists to discover the specifics of the human managing process; that's the business of empirical science. But certain features of the human managing algorithm can be reasonably decided on. It is very unlikely, for example, that our managing algorithm is utilitarian, i.e., the smart agent doesn't do whatever gets the highest net utility for its subagents. Some subagents are more powerful than others, having a higher prior chance of success than their competitors, and others are correspondingly weak. The question of what counts as one subagent in the brain is another empirical question that is not the business of decision theorists either, but anything we do consider a subagent in a parallel theory must solve its problem in the form of a CSA, i.e., it must internally represent its outcomes, know which outcomes it can get to from whatever outcome it is at, and assign a utility to each outcome. There are likely many neural units in the brain that fit that description. Many of them probably contain as parts sub-subagents that also fit this description, but eventually, if you divide the parts enough, you get to neurons, which are not CSAs and thus not subagents.
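Here is a minimal sketch of those CSA requirements, together with a managing algorithm that is deliberately not utilitarian: each subagent's vote is scaled by a prior "strength", so the winning outcome need not maximize the subagents' net utility. All names, strengths, and utilities are hypothetical:

```python
# A subagent in the sense above: it represents its outcomes, knows which ones
# it can reach, and assigns each a utility.

class CSA:
    def __init__(self, name, reachable, utility, strength):
        self.name = name
        self.reachable = reachable   # outcomes it can get to, in its own problem space
        self.utility = utility       # outcome -> real number
        self.strength = strength     # prior chance of winning control

    def preferred(self):
        return max(self.reachable, key=lambda o: self.utility[o])

def manage(subagents):
    """Strength-weighted arbitration over the subagents' preferred outcomes."""
    votes = {}
    for sa in subagents:
        votes[sa.preferred()] = votes.get(sa.preferred(), 0.0) + sa.strength
    return max(votes, key=votes.get)

crave = CSA("crave", ["smoke", "wait"], {"smoke": 6.0, "wait": 0.0}, strength=0.4)
plan = CSA("plan", ["smoke", "wait"], {"smoke": -3.0, "wait": 2.0}, strength=0.6)

# 'wait' wins: the stronger subagent prevails even though 'smoke' has the
# higher net utility across subagents (6 - 3 = 3 vs. 0 + 2 = 2).
print(manage([crave, plan]))
```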
If we want to understand how we make decisions, we should try to model a CSA which is made out of more specialized sub-CSAs competing and agreeing, which are made out of further specialized sub-sub-CSAs competing and agreeing, which are made out of, etc., which are made out of non-CSA algorithms. If we don't understand that, we don't understand how brains make decisions.
I hope that the considerations above are enough to convince reductionists that we should develop a parallel decision theory if we want to reduce decision making to computing. I would like to add an axiomatic parallel decision theory to the LW arsenal, but I know that that is not a one-man/woman job. So, if you think you might be of help in that endeavor, and are willing to devote yourself to some degree, please contact me at hastwoarms@gmail.com. Any team we assemble will likely not meet in person often, and will hopefully meet frequently on some private forum. We will need decision theorists, general mathematicians, people intimately familiar with the modular theory of mind, and people familiar with neural modeling. What follows are some suggestions for any team or individual that might pursue that goal independently: