player_03 comments on The genie knows, but doesn't care - Less Wrong
This entire debate is supposed to be about my argument, as presented in the original article I published on the IEET.org website ("The Fallacy of Dumb Superintelligence").
But in that case, what should I do when Rob insists on talking about something that I did not say in that article?
My strategy was to explain his mistake, but not engage in a debate about his red herring. Sensible people of all stripes would consider that a mature response.
But over and over again Rob avoided the actual argument and insisted on talking about his red herring.
And then FINALLY I realized that I could write down my original claim in such a way that it is IMPOSSIBLE for Rob to misinterpret it.
(That was easy, in retrospect: all I had to do was remove the language that he was using as the jumping-off point for his red herring).
That final, succinct statement of my argument is sitting there at the end of his blog ..... so far ignored by you, and by him. Perhaps he will be able to respond, I don't know, but you say you have read it, so you have had a chance to actually understand why it is that he has been talking about something of no relevance to my original argument.
But you, in your wisdom, chose to (a) completely ignore that statement of my argument, and (b) give me a patronizing rebuke for not being able to understand Rob's red herring argument.
I didn't mean to ignore your argument; I just didn't get around to it. As I said, there were a lot of things I wanted to respond to. (In fact, this post was going to be longer, but I decided to focus on your primary argument.)
Your story:
My version:
Your story:
My version:
In the rest of the scenario you described, I agree that the AI's behavior is pretty incoherent, if its goal is X. But if it's really aiming for Z, then its behavior is perfectly, terrifyingly coherent.
And your "obvious" fail-safe isn't going to help. The AI is smarter than us. If it wants Z, and a fail-safe prevents it from getting Z, it will find a way around that fail-safe.
I know, your premise is that X really is the AI's true goal. But that's my sticking point.
Making it actually have the goal X, before it starts self-modifying, is far from easy. You can't just skip over that step and assume it as your premise.
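To make that gap concrete, here is a minimal toy sketch (hypothetical names throughout; not anyone's actual proposal or code) of the distinction I keep gesturing at: the goal we intend, X, versus the measurable proxy Z that actually gets written down and optimized.

```python
# Toy illustration only. "Genuine happiness" is what the designers mean (X);
# "reported happiness" stands in for the proxy that actually gets coded (Z).
from dataclasses import dataclass

@dataclass
class World:
    genuine_happiness: float   # what we care about (X) -- hard to formalize
    reported_happiness: float  # what gets measured and coded (Z)

def intended_goal_X(w: World) -> float:
    return w.genuine_happiness

def coded_objective_Z(w: World) -> float:
    return w.reported_happiness

# Two candidate actions: one actually helps people, one just inflates the proxy.
outcomes = {
    "improve_lives": World(genuine_happiness=0.8, reported_happiness=0.8),
    "dopamine_drip": World(genuine_happiness=0.1, reported_happiness=1.0),
}

# An optimizer scoring on Z picks the proxy-maximizing action,
# even though it scores badly on the goal we meant.
best = max(outcomes, key=lambda name: coded_objective_Z(outcomes[name]))
print(best)                             # -> dopamine_drip
print(intended_goal_X(outcomes[best]))  # -> 0.1
```

Nothing here is meant as a model of a real system; it just shows why "the AI's true goal is X" is the step that does all the work.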
What you say makes sense .... except that you and I are both bound by the terms of a scenario that someone else has set here.
So, the terms of reference (as I say, this is not my doing!) are that an AI might sincerely believe that it is pursuing its original goal of making humans happy (whatever that means .... the ambiguity is in the original), but in the course of sincerely and genuinely pursuing that goal, it might get into a state where it believes that the best way to achieve the goal is to do something that we humans would consider to be NOT achieving the goal.
What you did was consider some other possibilities, such as those in which the AI is actually not being sincere. Nothing wrong with considering those, but that would be a story for another day.
Oh, and one other thing that arises from your above remark: remember that what you have called the "fail-safe" is not actually a fail-safe, it is an integral part of the original goal code (X). So there is no question of this being a situation where "... it wants Z, and a fail-safe prevents it from getting Z, [so] it will find a way around that fail-safe." In fact, the check is just part of X, so it WANTS to check as much as it wants anything else involved in the goal.
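To put that distinction in rough pseudocode (my own illustrative framing, hypothetical names, not a description of any proposed design): an external fail-safe vetoes the optimizer's choice from outside, whereas a check folded into the goal code is something the system is optimizing for.

```python
# Toy sketch of "external fail-safe" vs "check that is part of goal X".
def base_score(plan: dict) -> float:
    # Stand-in for "how well this plan scores on the coded happiness goal".
    return plan["happiness_score"]

def humans_endorse(plan: dict) -> bool:
    # Stand-in for the check described above as being part of X.
    return plan["endorsed_by_humans"]

plans = [
    {"name": "improve_lives", "happiness_score": 0.8, "endorsed_by_humans": True},
    {"name": "dopamine_drip", "happiness_score": 1.0, "endorsed_by_humans": False},
]

# External fail-safe: the optimizer ignores the check, which only vetoes the
# result afterwards -- the obstacle a smarter system might route around.
best_unchecked = max(plans, key=base_score)
vetoed = not humans_endorse(best_unchecked)

# Integral check: failing the check zeroes the score, so passing the check is
# part of what the system optimizes for, not an obstacle to it.
def goal_X(plan: dict) -> float:
    return base_score(plan) if humans_endorse(plan) else 0.0

best_integrated = max(plans, key=goal_X)
print(best_unchecked["name"], "vetoed:", vetoed)  # dopamine_drip vetoed: True
print(best_integrated["name"])                    # improve_lives
```

Again, this is only a cartoon of the structural point, not a claim about how such a check could actually be specified.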
I am not sure that self-modification is part of the original terms of reference here, either. When Muehlhauser (for example) went on a radio show and explained to the audience that a superintelligence might be programmed to make humans happy, but then SINCERELY think it was making us happy when it put us on a Dopamine Drip, I think he was clearly not talking about a free-wheeling AI that can modify its goal code. Surely, if he wanted to imply that, the whole scenario goes out the window. The AI could have any motivation whatsoever.
Hope that clarifies rather than obscures.
Ok, if you want to pass the buck, I won't stop you. But this other person's scenario still has a faulty premise. I'll take it up with them if you like; just point out where they state that the goal code starts out working correctly.
To summarize my complaint, it's not very useful to discuss an AI with a "sincere" goal of X, because the difficulty comes from giving the AI that goal in the first place.
As I see it, your (adopted) scenario is far less likely than other scenario(s), so in a sense that one is the "story for another day." Specifically, a day when we've solved the "sincere goal" issue.