You are sitting in your bedroom, dorm, tent, sidewalk, wherever you are staying right now, reading these words, when a dark void appears in your peripheral vision. A figure walks out of the void. They look a bit different, but also familiar. Surprise! It’s you from the future! And you look good! The two yous perform a variety of steps to prove that this Future You either really is You, or else someone with an insane amount of embarrassingly private information about you that they somehow copied onto a very convincing replica.

You ask how the future is, and if AGI has come yet.

Future You nods, smiling.

Since you exist in the future with an AGI, you reason that alignment must have somehow been achieved, and you ask how. The You from the future winks, and tosses you a single sheet of paper outlining the main reasoning behind the solution.

And on that paper, you see…

Quick! Write down what that paper contains. Your best bet right now, given that we already solved alignment.

You still there? Great! Well, surprise! That wasn’t the actual plan to solve alignment! The future You laughs.

“Oh!” Future You says, “I can’t believe that you actually believed that! Please tell me I wasn’t that naive back then. Don’t you see how that could obviously go wrong?”

You feel the heat going to your cheeks. Future You is being so mean! Such an arrogant bully. You look back at the paper, determined to prove you weren’t that naive in the past, and that you can see how plans can go wrong.

“Well,” Future You says after a few minutes, grinning smugly, “Do you see now why this plan simply won’t work in the real world?”

What do you say?

Unfortunately, Future You hasn’t had enough fun with you yet. They keep handing you false plan after false plan, each time promising that this one is the actual plan, really, before bursting out laughing and forcing you to reason about why it would have failed. Repeat this exercise as many times as you want, until you are out of ideas or have run out of scenarios for how a plan could go wrong. Try to reason thoroughly enough to slap the grin off Future You’s smug face.

 

This is a more fun variation of Murphyjitsu.
