Contrived infinite-torture scenarios: July 2010

PlaidX

This is our monthly thread for collecting arbitrarily contrived scenarios in which somebody gets tortured for 3^^^^^3 years, or an infinite number of people experience an infinite amount of sorrow, or a baby gets eaten by a shark, etc. and which might be handy to link to in one of our discussions. As everyone knows, this is the most rational and non-obnoxious way to think about incentives and disincentives.

Please post all infinite-torture scenarios separately, so that they can be voted up/down separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
No more than 5 infinite-torture scenarios per person per monthly thread, please.

Please post all infinite-torture scenarios separately, so that they can be voted up/down separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
No more than 5 infinite-torture scenarios per person per monthly thread, please.

To me a reasonable utility function has to have a degree of self-consistency. A reasonable utility function wouldn't value both doing and undoing the same action simultaneously.

If an entity is using a utility function to determine its actions, then for every action the entity can perform, its utility function must be able to determine a utility value which then determines whether the entity does the action or not. If the utility function does not return a value, then the entity still has to act or not act, so the entity still has a utility function for that action (non-action).

The purpose of a utility function is to inform the entity so it seeks to perform actions that result in greater utility. A utility function that is self-contradictory defeats the whole purpose of a utility function. While an arbitrary utility function can in principle occur, an intelligent entity with a self-contradictory utility function would achieve greater utility by modifying its utility function until it was less self-contradictory.

It is probably not possible to have a utility function that is both complete (in that it returns a utility for each action the entity can perform) and consistent (that it returns a single value for the utility of each action the entity can perform) except for very simple entities. An entity complex enough to instantiate arithmetic is complex enough to invoke Gödel's theorem. An entity can substitute a random choice when its utility function does not return a value, but that will result in sub-optimal results.

In the example that FAWS used, a utility function that seeks to annoy me as much as possible, is inconsistent with the entity being an omnipotent AI that can simulate something as complex as me, an entity which can instantiate arithmetic. The only annoyance the AI has caused me is a -1 karma, which to me is less than a single dust mote in the eye.

While an arbitrary utility function can in principle occur, an intelligent entity with a self-contradictory utility function would achieve greater utility by modifying its utility function until it was less self-contradictory.

To the extent humans have utility functions (e.g. derived from their behavior), they are often contradictory, yet few humans try to change their utility functions (in any of several applicable senses of the word) to resolve such contradictions.

This is because human utility functions generally place negative value on changing your ... (read more)

2FAWS16y

I said as many times, not as much as possible. The AI might value that particular kind and degree of annoyance uniquely, say as a failed FAI that was programmed to maximize rich, not strongly negative human experience according to some screwed up definition of rich experiences, and according to this definition your state of mind between reading and replying to that message scores best, so the AI spends as many computational resources as possible on simulating you reacting to that message. Or perhaps it was supposed to value telling the truth to humans, there is a complicated formula for evaluating the value of each statement, due to human error it values telling the truth without being believed higher (the programmer thought non-obvious truths are more valuable), and simulating you reacting to that statement is the most efficient way to make a high scoring true statement that will not be believed. Or it could value something else entirely that's just not obvious to a human. There should be an infinite number of non-contradictory utility functions valuing doing what it supposedly did, even though the prior for most of them is pretty low (and only a small fraction of them should value still simulating you now, so by now you can be even more sure the original statement was wrong than you could be then for reasons unrelated to your deduction)

31

Contrived infinite-torture scenarios: July 2010

31

31

31

Contrived infinite-torture scenarios: July 2010

31

31