
Alerus comments on Towards a New Decision Theory for Parallel Agents - Less Wrong

Post author: potato 07 August 2011 11:39PM



Comment author: Alerus 25 December 2011 04:03:00PM 0 points

I think you may be partitioning things that need not be partitioned, and it's important to note that. In the nicotine example (or the "lock the refrigerator door" example in the cited material), this is not necessarily a competition between the wants of different agents. The apparent dichotomy can also be resolved by internal states together with utility discount factors.

To be specific, revisit the nicotine problem. When a person decides to quit, they may not be suffering any discomfort, so the utility of smoking at that moment is small. The eventual utility of a longer life therefore wins out, and the agent decides to stop smoking. Once discomfort sets in, however, it combines with the action of smoking, because smoking will relieve the discomfort. The individual still assigns utility to not dying sooner (which favors the "don't smoke" action), but the death outcome lies much further in the future. Even though death is far worse than the current discomfort (assuming a "normal" agent ;), so long as the utilities are subject to a temporal discount factor, the discounted utility of avoiding death may shrink below the utility of the smoking action that relieves the current discomfort, simply because the death outcome is discounted so heavily for happening much further in the future.
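A toy calculation makes the point concrete. All the numbers below (the discount factor, the utilities, the delay) are made up purely to illustrate the mechanism:

```python
# Toy illustration of temporal discounting in the smoking example.
# All utilities and the discount factor are illustrative assumptions.

gamma = 0.95            # per-step temporal discount factor
relief_utility = 10     # immediate utility of relieving nicotine discomfort
death_penalty = -10000  # utility of the "die sooner" outcome
delay_steps = 400       # how far in the future that outcome lies

# Discounted value of the far-future penalty at decision time:
discounted_penalty = (gamma ** delay_steps) * death_penalty

print(discounted_penalty)                        # tiny in magnitude
print(relief_utility + discounted_penalty > 0)   # smoking wins now
```

Even with a penalty three orders of magnitude larger than the immediate relief, the discounting over 400 steps shrinks it to near zero, so the single utility function prefers smoking once the discomfort is present.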

At no point have we needed to postulate separate competing agents with different wants, and the apparent contradiction is still perfectly resolved with a single utility function. In fact, wildly different agent behavior can be produced by mere changes in the discount factor of a reinforcement learning (RL) agent, where the discount and reward functions are central to the design of the algorithm.
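Here is a minimal sketch of that last claim: the same rewards, two different discount factors, opposite preferred actions. The rewards and delays are hypothetical:

```python
# Sketch: identical reward structure, different discount factors,
# opposite behavior. Numbers are illustrative, not from any real task.

def value(reward, delay, gamma):
    """Discounted value of a reward received `delay` steps from now."""
    return (gamma ** delay) * reward

small_now = value(1, 0, 0.5)              # 1.0 under either discount
big_later_myopic = value(10, 10, 0.5)     # 10 * 0.5**10  ~ 0.01
big_later_patient = value(10, 10, 0.99)   # 10 * 0.99**10 ~ 9.04

print(small_now > big_later_myopic)            # myopic agent grabs the small reward
print(value(1, 0, 0.99) < big_later_patient)   # patient agent waits for the big one
```

Nothing about the agent's "wants" changed between the two cases; only the discount factor did.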

Now, which answer to the question is true? Is the smoke/don't-smoke contradiction the result of competing agents, or of discount factors and internal states? I suppose it could be either, but it's important not to assume that these examples directly indicate competing agents with different desires; otherwise you may lose yourself looking for something that isn't there.

Of course, even if we assume that there are competing agents with different desires, it seems to me this can still, at least mathematically, be reduced to a single utility function. All it means is that you apply weights to the utilities of the different agents, and then standard reasoning mechanisms are employed.
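For concreteness, a weighted-sum reduction might look like this. The two "sub-agents", their utilities, and the weights are all hypothetical:

```python
# Sketch of reducing several sub-agent utility functions to one
# via a fixed weighted sum. Sub-agents, utilities, and weights are
# illustrative assumptions, not anyone's actual model.

def combined_utility(action, subagent_utilities, weights):
    """Single utility function: weighted sum over sub-agents' utilities."""
    return sum(w * u(action) for u, w in zip(subagent_utilities, weights))

# Two hypothetical sub-agents: one craves relief now, one values health.
craving = {"smoke": 10, "abstain": -5}
health  = {"smoke": -8, "abstain": 6}

utilities = [craving.get, health.get]
weights = [0.3, 0.7]

for action in ("smoke", "abstain"):
    print(action, combined_utility(action, utilities, weights))
# With these weights "abstain" scores higher; shift enough weight toward
# the craving sub-agent and "smoke" wins. Either way, the agent maximizes
# one scalar utility function, and standard reasoning applies unchanged.
```

The competing-agents story and the single-utility story then differ only in how the weights are interpreted, not in the decision machinery.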