Comment author: TylerJay 05 March 2015 09:03:54PM *  5 points

This is really interesting. I thought I understood it and I wanted to verify that by trying to summarize it (and maybe help others too) but now I'm not so sure...

Edit: Just to save anybody the reading time, my reasoning below is false. After sleeping on it, I see my mistake. Nothing below the "False Thermodynamic Miracles" subheading made sense to me yesterday because I thought the purpose of the setup was to give the simulated AI under the false belief an "off switch" (letting it see the result of the signal after some time period). I get it now, though. Maximizing "[P(no miracle) * C] + [P(miracle) * u(A) given miracle]" is the same as maximizing "u(A) given miracle", since the first term is a constant that doesn't depend on the AI's action. So the AI will act as if the miracle happened, because there's no cost associated with those actions if it didn't happen, only benefits if it did.

As I understand it, the basic premise is that the AI has a utility function u(w,A) where "w" is whether or not the specific thermodynamic miracle (TM) occurred and A represents a regular/normal input to a utility function (state of the universe, action taken, or whatever).

u(w,A) = C (a low constant) if w is false (TM did not happen), and u(w,A) is essentially "u(A) given w" when w is true (i.e., it's a "normal" kind of utility function, dominated by things other than the TM, though it still has to incorporate the fact that the TM happened into its model of the world).

So, the signal is sent, and it calculates its expected utility for either scenario. If w = false, u(w,A) = a low constant, and if w = true, it determines an action A* that maximizes the utility function u(A)|w. As long as 2^-100 * u(A*) > C, it starts taking action A*.
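The expected-utility comparison can be sketched numerically. This is a toy model, not anything from the original post: `P_MIRACLE`, `C`, and `u_A_star` are all made-up illustrative values.

```python
# Toy sketch of the expected-utility argument (all numbers invented).
# u = C in every no-miracle world, regardless of action, so the
# no-miracle term is identical for every plan the AI might adopt.

P_MIRACLE = 2 ** -100      # chance the thermodynamic miracle occurred
C = 1.0                    # low constant utility when there is no miracle

def expected_utility(u_given_miracle):
    """Expected utility of a plan whose payoff in the miracle world
    is u_given_miracle; the no-miracle term is the same for all plans."""
    return (1 - P_MIRACLE) * C + P_MIRACLE * u_given_miracle

u_A_star = 2.0 ** 110      # hypothetical huge payoff of A* in the miracle world

# Acting on A* beats doing nothing because only the miracle term differs:
assert expected_utility(u_A_star) > expected_utility(0.0)
```

Since the (1 - P_MIRACLE) * C term is the same whichever plan the AI picks, maximizing expected utility reduces to maximizing u(A) given the miracle.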

Is this right so far?

Then, after "ON" has gone through, it must adapt itself and backtrack on its previous behaviour.

This mechanism seems underspecified to me (though maybe it's obvious to others). Why would it backtrack (or what do you mean by backtrack)? Eventually, it observes that w = false (that "ON" went through unchanged) and that its actions are no longer beneficial, so it just stops doing anything, right? The process terminates or it goes to standby?

If we want the AI to actually backtrack, wouldn't we need to modify the utility function so that it takes a configuration parameter as well? u(w,A,X) s.t. X = true iff the AI is in its initial configuration, and set it so that u(F,A,F) = 0, but u(F,A,T) = C? (this probably counts as just an implementation detail, but I figured I'd mention it anyway)
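A minimal sketch of this proposed three-argument utility function. Everything here beyond the u(F,A,F) = 0 and u(F,A,T) = C cases is a hypothetical filler (in particular `base_utility` and the default value of `C`):

```python
def u(w, A, X, base_utility, C=1.0):
    """Hypothetical sketch of the proposed modified utility u(w, A, X).

    w: True iff the thermodynamic miracle occurred.
    A: the usual utility-function input (world state / action taken).
    X: True iff the AI is in its initial configuration.
    """
    if w:
        return base_utility(A)   # miracle happened: normal utility, u(A) given w
    return C if X else 0.0       # no miracle: u(F,A,T) = C, u(F,A,F) = 0
```

Under this sketch, once the AI observes that no miracle occurred, the only way to earn more than zero is to return to its initial configuration, which is what gives it an incentive to backtrack.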

The part that makes me feel like I didn't actually understand it at all is:

However, the only possible world that it cares about, now, is one in which the "ON" signal was overwritten by a thermodynamic miracle... into exactly the same "ON" signal. As long as the possibility of this is non-zero, the AI will behave as if it's certain.

Can you explain this a bit more? Do you mean that even after it observes "ON", it doesn't stop acting as if w = true? That P(TM that overwrites ON with ON) * u(A)|w > C? If that's the case, then it would never backtrack, right? So it's essentially a full simulation of an AI under the assumption w, but with the knowledge that w is incredibly unlikely, and no built-in halting condition?

Thanks

Comment author: GMHowe 05 March 2015 09:59:23PM *  3 points

Why would it backtrack (or what do you mean by backtrack)? Eventually, it observes that w = false (that "ON" went through unchanged) and that its actions are no longer beneficial, so it just stops doing anything, right? The process terminates or it goes to standby?

I think the presumption is that the case where the "ON" signal goes through normally and the case where the "ON" signal is overwritten by a thermodynamic miracle... into exactly the same "ON" signal are equivalent. That is, after the "ON" signal has gone through, the AI would behave identically to an AI that was not indifferent to worlds where the thermodynamic miracle did not occur.

The reason for this is that although the chance that the "ON" signal was overwritten into exactly the same "ON" signal is tiny, it is the only remaining possible world that the AI cares about, so it will act as if that is what it believes.
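This conditioning step can be illustrated with a toy enumeration of possible worlds. All names and probabilities below are invented for illustration; only the filtering logic reflects the argument.

```python
# Toy enumeration of possible worlds (names and probabilities invented).
# The AI's utility is a constant in every no-miracle world, so the only
# worlds it "cares about" are miracle worlds consistent with what it sees.

worlds = [
    # (miracle_occurred, signal_observed, prior_probability)
    (False, "ON",  1 - 2 ** -100),  # ON went through normally
    (True,  "ON",  2 ** -120),      # miracle overwrote ON with the same ON
    (True,  "OFF", 2 ** -100),      # miracle overwrote ON with OFF
]

def cared_about(miracle, signal, observation):
    # Utility varies only in miracle worlds, and only worlds matching
    # the actual observation remain possible after the signal is seen.
    return miracle and signal == observation

surviving = [w for w in worlds if cared_about(w[0], w[1], "ON")]

# However improbable, the ON-overwritten-into-ON world is the only one left:
assert surviving == [(True, "ON", 2 ** -120)]
```

Its tiny prior probability is irrelevant: since every other surviving world yields the same constant utility whatever the AI does, its choice of action is driven entirely by this one world.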

Comment author: wobster109 28 January 2015 11:50:10PM 0 points

In Tuxedage's rule set, if the gatekeeper leaves before 2 hours, it counts as an AI win. So it's a viable strategy. However ---

I am sure that it would work against some opponents, but my feeling is it would not work against people on Less Wrong. It was a good try though.

Comment author: GMHowe 29 January 2015 01:29:35AM *  2 points

I was not aware of Tuxedage's ruleset. However, any ruleset that allows the AI to win without being explicitly released by the gatekeeper is problematic.

If asd had won due to the gatekeeper leaving it would only have demonstrated that being unpleasant can cause people to disengage from conversation, which is different from demonstrating that it is possible to convince a person to release a potentially dangerous AI.

Comment author: [deleted] 28 January 2015 07:32:56AM 0 points

I tried to make him have such an unpleasant time that he would quit before the time is up, so that I would win.

Comment author: GMHowe 28 January 2015 07:41:03AM 8 points

That's not really in the spirit of the experiment. For the AI to win the gatekeeper must explicitly release the AI. If the gatekeeper fails to abide by the rules that merely invalidates the experiment.

In response to A List of Nuances
Comment author: GMHowe 14 November 2014 03:25:57AM *  8 points

Everything is actually about signalling.

Counterclaim: Not everything is actually about signalling.

Almost everything can be pressed into use as a signal in some way. You can conspicuously overpay for things to signal affluence or good taste, or put excessive amounts of effort into something to signal commitment or having the right stuff. But the fact that almost everything can be used as a signal does not mean that almost everything is being used primarily as a signal all of the time.

Signalling only makes sense in a social environment, so things that you would do or benefit from even in a nonsocial environment are good candidates for things that are not primarily about signalling: things like eating, wearing clothes, sleeping areas, medical attention, and learning.

Some of the items from the list of X is not about Y:

"Food isn’t about nutrition. Clothes aren’t about comfort. Bedrooms aren’t about sleep. Laughter isn’t about humour. Charity isn’t about helping. Medicine isn’t about health. Consulting isn’t about advice. School isn’t about learning. Research isn’t about progress. Language isn’t about communication."

All these are primarily about something other than signalling. Yes they can be "about" signalling some of the time to varying degrees but not as their primary purpose. (At least not without becoming dysfunctional.)
