Warrigal comments on Evaluating the feasibility of SI's plan - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Depending on how you interpret this argument, either I think it's wrong, or I'm proposing that an AI not be made "sufficiently powerful". I think it's analogous to this argument:
There are two possibilities here:
I think I agree that making the AI non-self-modifiable would be pointless if it has complete control over its I/O facilities. But I think an AI should not have complete control over its I/O facilities. If a researcher types in "estimate the probability of the Riemann hypothesis" (in some computer language, of course), that query should go to the AI's belief system directly, rather than informing the AI of the question and allowing it to choose whatever answer it wishes. If this is the case, then it will be impossible for the AI to "lie" about its beliefs, except by somehow sabotaging parts of its belief system.
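To make the idea concrete, here is a minimal sketch of that I/O restriction. Everything here (the `BeliefStore` and `Agent` classes, the `query` function, the proposition name) is my own illustrative assumption, not anything from an actual design:

```python
class BeliefStore:
    """Probabilities the AI maintains internally."""

    def __init__(self):
        self._probs = {}

    def set_probability(self, proposition, p):
        self._probs[proposition] = p

    def probability(self, proposition):
        return self._probs.get(proposition)


class Agent:
    """The AI's cognition: it may *update* beliefs as it thinks,
    but it never gets to choose what a researcher's query returns."""

    def __init__(self, beliefs):
        self.beliefs = beliefs

    def think(self):
        # Cognition writes to the belief store...
        self.beliefs.set_probability("riemann_hypothesis", 0.9)


def query(beliefs, proposition):
    # Trusted I/O path: reads the belief store directly. The agent is
    # never informed of the question and cannot substitute an answer,
    # so it can only "lie" by actually changing its own beliefs.
    return beliefs.probability(proposition)


beliefs = BeliefStore()
agent = Agent(beliefs)
agent.think()
print(query(beliefs, "riemann_hypothesis"))  # prints 0.9
```

The point of the design is that `query` never calls any method on `agent`; the answer path bypasses the AI's decision-making entirely.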
Existing web pages can already convince their human users to run new OS processes supplied by the web page.
Beware of universal statements: it only takes a single counterexample to disprove them. A typical human has a very poor understanding of what computers are and how they work. Most people could probably be easily fooled by a simulated browser. They are already easily fooled by analogous but much less sophisticated things (e.g. phishing scams).
SI researchers are not typical humans. We can train them to tell the difference between the AI's output and trusted programs' output. If need be, we can train them to just not even look at the AI's output at all.
What's the point of writing a program if you never look at its output?
I'm starting to get frustrated, because the things I'm trying to explain seem really simple to me, and yet apparently I'm failing to explain them.
When I say "the AI's output", I do not mean "the AI program's output". The AI program could have many different types of output, some of which are controlled by the AI, and some of which are not. By "the AI's output", I mean those outputs which are controlled by the AI. So the answer to your question is mu: the researchers would look at the program's output.
My above comment contains an example of what I would consider to be "AI program output" but not "AI output":
This is not "AI output", because the AI cannot control it (except by actually changing its own beliefs), but it is "AI program output", because the program that outputs the answer is the same program as the one that performs all the cognition.
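The dichotomy might be sketched like this. It is a hypothetical illustration; the class, method names, and the belief value are my own assumptions:

```python
class AIProgram:
    """One program, two kinds of output channel."""

    def __init__(self):
        # Internal belief state, written by the AI's cognition.
        self.beliefs = {"riemann_hypothesis": 0.9}

    def belief_readout(self, proposition):
        # "AI program output" but NOT "AI output": trusted code reads
        # the belief state directly, so the AI can influence this only
        # by actually changing its own beliefs.
        return self.beliefs[proposition]

    def agent_message(self):
        # "AI output": the content of this channel is chosen by the
        # AI's cognition, so it could say anything, true or false.
        # Researchers would treat it as untrusted, or not look at it.
        return "chosen by the AI"


program = AIProgram()
program.belief_readout("riemann_hypothesis")  # the channel researchers read
program.agent_message()                       # the channel they distrust
```

Both methods belong to the same program, which is why "did you look at the program's output?" gets the answer mu: the researchers look at `belief_readout`, never `agent_message`.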
I can imagine a clear dichotomy between "the AI" and "the AI program", but I don't know if I've done an adequate job of explaining what this dichotomy is. If I haven't, let me know, and I'll try to explain it.
Can you elaborate on what you mean by "control" here? I am not sure we mean the same thing by it because:
If the AI can control its memory (for example, if it can arbitrarily delete things from its memory) then it can control its beliefs.
Yeah, I guess I'm imagining the AI as being very much restricted in what it can do to itself. Arbitrarily deleting stuff from its memory probably wouldn't be possible.