gRR comments on How can we ensure that a Friendly AI team will be sane enough? - Less Wrong
Remember, we're describing the situation where the cautionary position is provably correct. So your "greatest temptation ever" is (provably) a temptation to die a horrible death together with everyone else. Anyone smart enough to even start building AI would know and understand this.
Having a provably Friendly AI is not the same thing as having a proof that any other AI is going to do terrible things.
My conditional was "cautionary position is the correct one". I meant, provably correct.
It's like with dreams of true universal objective morality: even if in some sense there is one, some agents are just going to ignore it.
Leaving out the "provably" makes a big difference. If you add "provably" then I think the conditional is so unlikely that I don't know why you'd assume it.
Well, assuming EY's view of intelligence, the "cautionary position" is likely to be a mathematical statement. And then why not prove it? Given several decades? That's a lot of time.
One is talking about a much stronger statement than provability of Friendliness (since one is talking about AI), so even if it is true, proving it, or even formalizing it, is likely to be very hard. Note that this is under the assumption that it is true, and that assumption seems wrong. Assume that one has a Friendliness protocol, and then consider the AI that has the rule "be Friendly but give 5% more weight to the preferences of people that have an even number of letters in their name" or the even subtler "be Friendly, but if you ever conclude with confidence 1-1/(3^^^^3) that 9/11 was done by time-traveling aliens, then destroy humanity". The second will likely act identically to a Friendly AI.
I thought you were merely specifying that the FAI theory was proven to be Friendly. But you're also specifying that any AGI not implementing a proven FAI theory is formally proven to be definitely disastrous. I didn't understand that was what you were suggesting.
Even then there remains a (slightly different) problem. An AGI may be Friendly to someone (presumably its builders) at the expense of someone else. We have no reason to think any outcome an AGI might implement would truly satisfy everyone (see other threads on CEV). So there will still be a rush for the first-mover advantage. The future will belong to the team that gets funding a week before everyone else. These conditions increase the probability that the team that makes it will have made a mistake, introduced a bug, or unintentionally cut some corners.