
DanArmak comments on xkcd on the AI box experiment - Less Wrong Discussion

Post author: FiftyTwo, 21 November 2014 08:26AM (15 points)




Comment author: DanArmak, 22 November 2014 06:06:40PM * 2 points

What I don't quite understand is why the following, simpler argument isn't sufficient. It seems to lead to the same results, and it doesn't require acausal trade.

I'm not building just any AI. I want to build an AI that will, by design, reward its builders. As with any other tool I build, I wouldn't build it if I didn't expect it to do certain things and not do others.

Similarly, if you cooperate with Roko's Basilisk, you try to build it because it's the kind of AI that punishes those who didn't try to build it. You know it punishes non-builders, because that's how you're building it. And the reason you're building it is that you fear that if you don't, someone else will, and then the AI will punish you for not building it first.

If you have a valid reason to fear that someone else will build it, and you can't avert that by other means, then it makes sense for you to build it first. Similarly, if you think a likely outcome of an AI race is an AI that helps its builders (and doesn't harm anyone else), then you try to build the first one (and if helping others is part of your utility function, the AI will do that too as part of rewarding you).
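To make the "build it first" step concrete, here's a toy expected-utility sketch in Python. Everything in it (the function names, payoffs, and probabilities) is a placeholder I've made up for illustration; only the shape of the comparison matters.

```python
# A toy expected-utility comparison for the "build it first" reasoning above.
# All payoffs and probabilities are made-up placeholders, not claims about
# the actual scenario; only the structure of the decision matters.

def eu_build_first(u_reward=100, u_build_cost=-200):
    # If you build it, it rewards you by design, but building it is itself
    # costly (resources, risk, the moral cost of creating a punisher).
    return u_reward + u_build_cost

def eu_dont_build(p_other_builds, u_punished=-1000, u_status_quo=0):
    # If you don't build it, you're punished only in the worlds where
    # someone else builds it anyway.
    return p_other_builds * u_punished + (1 - p_other_builds) * u_status_quo

for p in (0.0, 0.05, 0.5):
    print(f"p(someone else builds it)={p:.2f}: "
          f"build first={eu_build_first():+.0f}, "
          f"don't build={eu_dont_build(p):+.0f}")
```

With these placeholder numbers, not building wins as long as the probability that someone else builds it is small, and building first only wins once that probability gets large, which is just the "valid reason to fear" condition above.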

Of course, as with any argument, if you don't accept the premises, the conclusion doesn't follow. And I have no strong reason to think someone else is going to build a torture-everyone-else AI.

What does the acausal trade argument tell us beyond this simple model? Does it tell us to cooperate with the future AI even when we don't think our cooperating will get it built, or that someone else will build it if we don't? Or does it tell us to cooperate quantitatively more, or in a wider range of situations?