Clippy comments on Working hurts less than procrastinating, we fear the twinge of starting - Less Wrong

142 Post author: Eliezer_Yudkowsky 02 January 2011 12:15AM


Comments (138)


Comment author: Will_Sawin 03 January 2011 07:12:11PM 2 points [-]

If they don't know that they are irrational in this manner:

"I'll give you tools when you need them / money when you work if you pay me now"

"OK, I'll work tomorrow, so that's a good deal"

"You never worked, so I got free money."

If they know they are irrational:

"I'll act as a commitment mechanism. Sign this contract saying you'll pay me if you don't work."

"This benefits me. OK."

<next day>

"I'll relax your commitment for you so you don't have to work. You still have to pay me some, though."

"This benefits me, I really don't want to work right now."

There is ALWAYS a way.
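The two-step pump above can be sketched numerically. This is a minimal illustration with made-up numbers (the effort cost, wage, fees, and the hyperbolic discount factor 1/(1+kd) are all my assumptions, not from the thread); it shows why the agent happily signs the contract each day and happily buys its way out the next.

```python
# Sketch of the commitment-then-release pump, with illustrative numbers.
# Assumed setup: working costs effort E immediately and pays wage W the
# next day; the agent discounts a payoff d days away by 1 / (1 + k*d)
# (hyperbolic), which produces the daily preference reversal the pump needs.

E, W, k = 60.0, 100.0, 1.0      # effort now, wage a day later, discount rate

def disc(d):
    """Hyperbolic discount factor for a payoff d days away."""
    return 1.0 / (1.0 + k * d)

# From today's point of view:
work_today    = -E * disc(0) + W * disc(1)   # -60 + 50    = -10
work_tomorrow = -E * disc(1) + W * disc(2)   # -30 + 33.3  ~ +3.3

# The reversal: "work tomorrow" beats "work today" -- and tomorrow the
# agent will feel exactly the same way, forever.
assert work_tomorrow > 0 > work_today

# The pump. The agent knows it procrastinates, so it pays CONTRACT today
# for a binding penalty P if it fails to work tomorrow. Tomorrow, the
# pumper sells it a release for FEE, after which the agent... signs again.
CONTRACT, P, FEE = 1.0, 20.0, 5.0

pumper_profit = 0.0
for day in range(3):
    # Signing looks good today: the penalty makes tomorrow-self work,
    # capturing the ~ +3.3 the agent currently assigns to working tomorrow.
    assert work_tomorrow - CONTRACT > 0
    pumper_profit += CONTRACT

    # Tomorrow, working is worth -10, paying the penalty and deferring is
    # worth ~ -16.7, and paying FEE to defer is worth ~ -1.7 -- so the
    # agent buys the release and the cycle restarts.
    assert work_tomorrow - FEE > work_today > work_tomorrow - P
    pumper_profit += FEE

print(pumper_profit)   # 18.0 after three cycles
```

Each cycle transfers CONTRACT + FEE to the pumper while the agent never works, matching the dialogue above.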

Comment author: Perplexed 03 January 2011 08:07:34PM 2 points [-]

That exploit works against a hyperbolic discounter who today wants to work tomorrow, but tomorrow doesn't want to work today.

It doesn't work against Clippy's example of an exponential discounter who doesn't want to work today and knows that tomorrow he still won't want to work today, but still claims to want to work someday, even though he can't say when.

Our agent cannot reason from "I want to work someday" to "There exists a day in the finitely distant future when I will want to work". He is missing some kind of reverse induction axiom. We agree that there is something wrong with this agent's thinking.

But I don't see how to exploit that flaw.
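The time-consistency that blocks the exploit can be made concrete. In this sketch (my numbers and setup, not from the thread), an exponential discounter values working on day t, seen from day s, as DELTA**(t-s) * (-E + W*DELTA); the common factor means every vantage point ranks the candidate work-days identically, so there is no preference reversal for a commitment contract to monetize.

```python
# Why the pump fails on the exponential discounter: with per-day discount
# factor DELTA, the value (seen from day s) of working on day t is
# DELTA**(t - s) * (-E + W * DELTA). The ranking of work-days is the same
# from every day, so preferences never reverse. Numbers are illustrative.

E, W, DELTA = 60.0, 100.0, 0.5   # effort now, wage a day later, discount

def value(work_day, seen_from):
    """Exponentially discounted value of working on work_day."""
    return DELTA ** (work_day - seen_from) * (-E + W * DELTA)

# Seen from day 0 and seen from day 1, the comparison of day-1 work
# against day-2 work comes out the same: no reversal to exploit.
from_day0 = value(1, 0) > value(2, 0)
from_day1 = value(1, 1) > value(2, 1)
assert from_day0 == from_day1

# With these numbers -E + W*DELTA is negative, so every work-day is worth
# less than never working (value 0): the agent consistently defers, yet
# "someday" never arrives -- Clippy's example.
assert value(1, 0) < 0 and value(2, 0) < 0
```

Note that deferring shrinks a negative value toward zero but never past it, which is exactly the "wants to work someday, but can't say when" pathology, and why there is no reversal for a money pump to grab.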

Comment author: Clippy 03 January 2011 09:00:42PM *  1 point [-]

It doesn't work against Clippy's example of an exponential discounter who doesn't want to work today and knows that tomorrow he still won't want to work today, but still claims to want to work someday, even though he can't say when.

Almost. It depends on the agent's computational abilities. From the criteria I specified, it is unclear whether the agent realizes that its decision theory will output the same action every day (i.e., whether it recognizes the symmetry between today and tomorrow under its current decision theory).

If you assume the agent correctly infers that its current decision theory will lead it to perpetually defer work, then it will recognize that the outcome is suboptimal and search for a better decision theory. However, if the agent is unable to reach sufficient (correct) logical certainty about tomorrow's action, then it is vulnerable to the money pump that User:Will_Sawin described.

I was working from the assumption that the agent is able to recognize the symmetry with future actions, and so did not consider the money pump that User:Will_Sawin described. Such an agent is still exploitable in theory, because (under my assumptions about how such an agent could fail) it will sometimes conclude that it ought to work and sometimes that it ought not, with the money-pumper profiting from the (statistically) predictable shifts.

Even so, that would require that the agent I specified use one more predicate in its decision theory -- some source of randomness.