A putative new idea for AI control; index here. See also Utility vs Probability: idea synthesis.
Ok, here is the problem:
- You have to create an AI that believes (or acts as if it believed) that event X is almost certain, while you believe that X is almost impossible. Furthermore, you have to be right. To make things more interesting, the AI is much smarter than you, knows everything that you do (and more), and has to react sensibly when event X doesn't happen.
Answers will be graded on mathematics, style, colours of ink, and compatibility with the laws of physics. Also, penmanship. How could you achieve this?
Acts and beliefs
The main trick is the parenthetical "(or acts as if it believed)". If the AI actually has true beliefs, then there is no problem with it being smart, being knowledgeable, or updating on the fact that X didn't happen. So the problem reduces to:
- How can an AI, that believes that X is almost impossible, behave as if X were almost certain?
This will be constructed by building on my idea of "utility indifference" (that old concept needs a better name, btw; corrigibility is a much better candidate for being labelled "utility indifference").
Noisy events and thermodynamic miracles
Imagine that a signal is travelling down a wire inside a noisy gas (or some other source of randomness). Almost certainly the signal will get through, but very occasionally, by the spontaneous action of the molecules in the gas, the signal will get overwhelmed, erased and overwritten with another signal, randomly. This could be seen as a type of thermodynamic miracle: an incredibly unlikely event caused by spontaneous and coincidental coordination between lots of different molecules. We assume here that the AI cannot cause a thermodynamic miracle, nor make it more likely (this may require some specific definitions of what "counts" as a miracle).
The idea is to make the AI indifferent to any world in which there is no thermodynamic miracle. So give it a utility u such that u(w)=constant, for any world w without a thermodynamic miracle at the point of interest. The value of that constant doesn't matter; the result is that the AI's actions have no impact on utility in any world where the miracle didn't happen. Thus, as long as there is a tiny chance that the miracle happened, the AI will behave as if that was a certainty: for only in those worlds do its actions have any impact.
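Here is a minimal Python sketch of that indifference effect, with made-up numbers and action names (P_MIRACLE, C and the two candidate actions are illustrative assumptions, not part of the design): when every no-miracle world yields the same constant, the expected-utility maximiser's choice is driven entirely by the miracle worlds, however improbable they are.

```python
# Minimal sketch of "utility indifference" as described above. All names and
# numbers here are illustrative assumptions, not part of the original design.

P_MIRACLE = 1e-30   # assumed (tiny) probability of the thermodynamic miracle
C = 0.0             # every no-miracle world gets this same constant utility

# Utility of each candidate action *conditional on the miracle having happened*
u_given_miracle = {
    "act_as_if_ON_fails":   10.0,
    "act_as_if_ON_arrives":  1.0,
}

def expected_utility(action):
    # The no-miracle term is identical for every action, so it cannot
    # distinguish between them; only the miracle term matters.
    return (1 - P_MIRACLE) * C + P_MIRACLE * u_given_miracle[action]

best = max(u_given_miracle, key=expected_utility)
print(best)  # "act_as_if_ON_fails", however small P_MIRACLE is (if non-zero)
```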
You can probably see where I'm going with this. Suppose the signal goes through the wire to turn on another AI. That AI will only turn on given a very specific 100-digit "ON" binary code. Given any other code, the second AI won't turn on, and will erase itself. We send the "ON" code.
So what happens? The first AI reasons that, given a thermodynamic miracle (the only worlds it cares about), the output signal is going to be 100 random bits, which have only a 2^-100 chance of being "ON". Therefore it will behave as if it's almost certain "ON" will not get through the gas.
And what will (almost certainly) happen in reality? We send the "ON" signal, and it gets through unchanged. The first AI observes this (in fact, it expected this). However, the only possible world that it cares about, now, is one in which the "ON" signal was overwritten by a thermodynamic miracle... into exactly the same "ON" signal. As long as the possibility of this is non-zero, the AI will behave as if it's certain.
We should develop this point a little. The AI is almost certain, ahead of time, that "ON" will happen. It behaves, however, as if "ON" is almost certain not to happen. Then, after "ON" has gone through, it must adapt itself and backtrack on its previous behaviour. Moreover, it knows that this is what it will (almost certainly) do. Since it knows ahead of time it will be backtracking, why does it still go ahead with its behaviour?
The answer is that most of the universes in which it backtracks are universes where its actions have no consequences, and hence it doesn't care about them. So it might have a 1-ε chance of backtracking, but it only cares about a proportion ε(2^-100) of those worlds. On the other hand, it won't backtrack with probability ε(1-2^-100), but it cares about those worlds completely.
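A rough bookkeeping sketch of those two quantities, with a purely illustrative assumed miracle probability ε = 10^-12:

```python
# Bookkeeping for the paragraph above; eps is an assumed, illustrative value.
from fractions import Fraction

eps = Fraction(1, 10**12)                # assumed chance of a thermodynamic miracle
p_on_given_miracle = Fraction(1, 2**100) # chance a random overwrite equals "ON"

# Probability of each observable outcome:
p_backtrack    = (1 - eps) + eps * p_on_given_miracle   # "ON" comes out of the wire
p_no_backtrack = eps * (1 - p_on_given_miracle)         # something else comes out

# Measure of the worlds the AI actually cares about (miracle worlds only):
cared_backtrack    = eps * p_on_given_miracle            # miracle wrote "ON" anyway
cared_no_backtrack = eps * (1 - p_on_given_miracle)      # miracle wrote something else

print(float(p_backtrack))        # ~1: backtracking is almost certain...
print(float(cared_backtrack))    # ...but carries almost none of the AI's concern
print(float(cared_no_backtrack)) # nearly all the concern sits on non-"ON" miracles
```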
False Thermodynamic Miracles
This is the explanation for the title: the AI will behave as if there was a real thermodynamic miracle (a spontaneous overwriting of a signal with the same signal) while we "know" that actually it was just the same signal going through.
By varying what we put at the other end of the wire, we can get the AI to act "as if" it believed almost anything, without introducing problematic beliefs. We just have to be a bit cautious with our setups, by considering how humans would react if there was a real thermodynamic miracle (because the AI is certainly going to be considering that very carefully).
This allows us to consider AI designs with counterfactuals, false "beliefs", etc., which will be of great use. Are there any flaws I'm missing? Practical flaws are ok (e.g. how do you define a thermodynamic miracle and make sure the AI can't affect it?), but theoretical flaws are much more important at this stage.
In equations
Edit: (thanks to TylerJay)
In equation form, the AI is maximising
P(¬X)*C + P(X)*u(X, A),
for some constant C, some unlikely event X that the AI cannot affect, some set of relevant descriptors A, and some utility u. Since the C term does not depend on the AI's actions, this is exactly the same as maximising u(X, A): the probability P(X) is irrelevant (so long as it is non-zero).
The whole setup described is simply a way to ensure that if W is the likely set of worlds consistent with observations after ¬X/X, then
P(W) ≈ P(¬X) ≈ 1 (we "know" that X doesn't happen and that we end up in W),
while
P(W|X) << 1 (in the worlds it cares about, the AI behaves as if W was incredibly unlikely to come about).
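As a sanity check on that claim, here is a small numerical sketch (arbitrary utilities, exact rational arithmetic to avoid underflow) confirming that the maximising action does not move when P(X) or C change, so long as P(X) > 0:

```python
# Quick numerical check of the claim above (arbitrary made-up utilities):
# the action maximising P(not X)*C + P(X)*u(X, a) never depends on P(X) or C,
# provided P(X) > 0. Exact rationals avoid floating-point underflow when
# P(X) is tiny.
from fractions import Fraction as F

u_given_X = {"a1": F(3), "a2": F(7), "a3": F(5)}   # u(X, a) for each action a

def best_action(p_x, c):
    return max(u_given_X, key=lambda a: (1 - p_x) * c + p_x * u_given_X[a])

for p_x in (F(1, 10**30), F(1, 1000), F(1, 2), F(999, 1000)):
    for c in (F(-100), F(0), F(100)):
        assert best_action(p_x, c) == "a2"   # the argmax never changes
print("argmax is independent of P(X) and C")
```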
Same result, computationally
The above was a "physical" design, revolving around a particular physical setup, plus some value hacking. But we can do the same in a purely computational way, by adapting drnickbone's design.
To adapt that design, assume Ω will run UDT/TDT agent A to get its estimate of the probability of each of the 2^100 possible sequences of bits coming down the wire. It will then choose to send the signal that A assigns the lowest probability to. If there is a tie, it will choose the first sequence, lexicographically.
It seems that A must assign equal probability 2^-100 to each sequence (as any increase in probability makes that sequence less likely to appear, and any decrease makes it more likely). Therefore, we "know" that the signal will be "0000...0000", but the AI cannot assign a higher probability to that. This achieves the same outcome as above.
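To make the selection rule concrete, here is a toy sketch shrunk to 3-bit sequences; report stands in for whatever distribution A announces, and omega_choice is a hypothetical helper implementing "lowest assigned probability, ties broken lexicographically":

```python
# Toy sketch of the computational variant, shrunk from 100-bit to 3-bit
# sequences. "report" stands in for whatever probability distribution the
# UDT/TDT agent A announces; Omega then sends the sequence A rated least
# likely, breaking ties lexicographically. Names here are illustrative.
from itertools import product

sequences = ["".join(bits) for bits in product("01", repeat=3)]

def omega_choice(report):
    # lowest assigned probability wins; ties broken by lexicographic order
    return min(sequences, key=lambda s: (report[s], s))

# If A spreads probability uniformly (the apparent fixed point), Omega sends
# the lexicographically first sequence: all zeros.
uniform = {s: 1.0 / len(sequences) for s in sequences}
print(omega_choice(uniform))   # "000"

# Any sequence A rates as less likely is exactly the one that gets sent,
# which is why a non-uniform report undermines itself.
skewed = dict(uniform, **{"101": 0.01})
print(omega_choice(skewed))    # "101"
```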
I personally feel the physical design is more secure - it seems easier to define what a thermodynamic miracle at a given point in spacetime would be, than to be certain that the AI will not evade the computational restrictions somehow, or that the computational setup will remain stable and well defined. Also, A may not spread exactly equal probability over the 2^100 sequences, messing up our certainty as to the result.
This is really interesting. I thought I understood it and I wanted to verify that by trying to summarize it (and maybe help others too) but now I'm not so sure...
Edit: Just to save anybody the reading time, my reasoning is false below. After sleeping on it, I see my mistake. Nothing below the "False Thermodynamic Miracles" subheading made sense to me yesterday because I thought the purpose of the setup was to have an "off switch" on the simulated AI under the false belief (letting it see the result of the signal after some time period). I get it now though. Maximizing "[P(no miracle) * C] + [P(miracle) * u(A) given miracle]" is the same as maximizing "u(A) given miracle". So the AI will act as if the miracle happened, because there's no cost associated with those actions if it didn't happen—only benefits if it did.
As I understand it, the basic premise is that the AI has a utility function u(w,A) where "w" is whether or not the specific thermodynamic miracle (TM) occurred and A represents a regular/normal input to a utility function (state of the universe, action taken, or whatever).
u(w,A) = C (a low constant) if w is false (TM did not happen), and u(w,A) is essentially "u(A) given w" when w is true (ie. it's a "normal" kind of utility function, dominated by things other than the TM, though it still has to incorporate the fact that the TM happened into its model of the world).
So, the signal is sent, and it calculates its expected utility for either scenario. If w = false, u(w,A) = a low constant, and if w = true, it determines an action A that maximizes the utility function u(A)|w. As long as 2^-100 * u(A) > C, it starts taking action A.
Is this right so far?
This mechanism seems underspecified to me (though maybe it's obvious to others). Why would it backtrack (or what do you mean by backtrack)? Eventually, it observes that w = false (that "ON" went through unchanged) and that its actions are no longer beneficial, so it just stops doing anything, right? The process terminates or it goes to standby?
If we want the AI to actually backtrack, wouldn't we need to modify the utility function so that it takes a configuration parameter as well? u(w,A,X) s.t. X = true iff the AI is in its initial configuration, and set it so that u(F,A,F) = 0, but u(F,A,T) = C ? (this probably counts as just an implementation detail, but I figured I'd mention it anyway)
The part that makes me feel like I didn't actually understand it at all is:
Can you explain this a bit more? Do you mean that even after it observes "ON", it doesn't stop acting as if w = true? That P(TM that overwrites ON with ON) * u(A)|w > C ? If that's the case, then it would never backtrack, right? So it's essentially a full simulation of an AI under the assumption w, but with the knowledge that w is incredibly unlikely, and no built-in halting condition?
Thanks
C need not be a low constant, btw. The only requirement is that u(false, action a, A) = u(false, action b, A) for all actions a and b and all A; i.e. nothing the AI does affects the utility of worlds where w is false, so this does not constrain its actions.
Basically the AI observes the ON signal going through, and knows that either a) the signal went through normally, or b) the signal was overwritten by coincidence with exactly the same signal. Its actions have no consequences in the first case, so it ignores it, and acts "as if" it were certain that a thermodynamic miracle had happened.