Will_Newsome comments on David Chalmers' "The Singularity: A Philosophical Analysis" - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (202)
Right, but as far as I can tell without having put lots of hours into trying to solve the problem of clippyAI, it's really damn hard to precisely specify a paperclip. (There are things that are easier to specify that this argument doesn't apply to and that are more plausibly dangerous, like hyperintelligent theorem provers...) Thus, in trying to figure out what its utility function actually is (like what humans are doing as they introspect more), it could discover that the only reason its goal is (something mysterious like) 'maximize paperclips' is because 'maximize paperclips' was how humans were (probabilistically inaccurately) expressing their preferences in some limited domain. This is related to the theme Eliezer quite elegantly goes on about in Creating Friendly AI and that he for some reason barely mentioned in CEV, which is that the AI should look at its own source code as evidence of what its creators were trying to get at, and update its imperfect source code accordingly. Admittedly, most uFAIs probably won't be that sophisticated, and so worrying about AI-related existential risks is still definitely a big deal. We just might want to be a little more cognizant of potential motivations for people who disagree with what has recently been dubbed SIAI's 'scary idea'.
Hm. I suppose that's possible, though it would require that the AI be given a utility function that's specifically meant to be amenable to that kind of revision.
Under the most straightforward (i.e. not CEV-style) utility function design, fuzziness in its definition of "paperclip" would just drive the paperclip-maximizer to choose the possible definition that yields the highest utility score.
To pick a different silly example, a dog-maximizer with a utility function based on the number of dogs in the universe would simply prefer to tile the solar system with tiny Chihuahuas rather than Great Danes; the whole range of "dog" definitions fits the function, so it just chooses the one that is most convenient for maximum utility. It wouldn't try to resolve the ambiguity by deciding which definition is more in line with the designer's ideals, unless "consider the designer's ideals" were designed into the system from the start.
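The dog-maximizer's behavior can be sketched in a few lines. This is a toy illustration, not anyone's actual design: the `DogDefinition` class, the helper names, and the body masses are all hypothetical. Every candidate reading satisfies the fuzzy spec, and a naive count-the-dogs utility simply selects whichever reading scores highest.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DogDefinition:
    """One admissible reading of the fuzzy concept "dog" (hypothetical)."""
    name: str
    mass_kg: float  # illustrative body mass of one dog under this reading


def dogs_per_budget(defn: DogDefinition, matter_budget_kg: float) -> int:
    """Naive utility: how many dogs of this kind fit in a fixed matter budget."""
    return int(matter_budget_kg // defn.mass_kg)


def choose_definition(candidates: Sequence[DogDefinition],
                      matter_budget_kg: float) -> DogDefinition:
    """A straightforward maximizer: pick the reading with the highest utility.
    Nothing in here asks which reading the designer meant."""
    return max(candidates, key=lambda d: dogs_per_budget(d, matter_budget_kg))


candidates = [
    DogDefinition("Great Dane", 70.0),
    DogDefinition("Labrador", 30.0),
    DogDefinition("Chihuahua", 2.0),
]
best = choose_definition(candidates, matter_budget_kg=1_000.0)
print(best.name)  # Chihuahua
```

The point of the sketch is what is missing: "consider the designer's ideals" would have to be an explicit extra term in `choose_definition`, because nothing in a plain `max` over utility supplies it.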
Is designing "consider the designer's ideals" in an AI difficult?
Currently expected to be difficult, since we don't know of an easy way to do so. That it'll turn out to be easy (in hindsight) is not totally out of the question.
Has anyone considered approaching this problem in the same way we might approach "read the user's handwriting"? That is, the task is not one we program the AI to accomplish - instead, we train the AI to accomplish it. And, most importantly, we train the AI to ask for further clarification in ambiguous cases.
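"Ask for further clarification in ambiguous cases" has a standard machine-learning analogue: a reject option, where a trained model returns its best reading only when it is confident enough and otherwise defers to the human. This is a minimal sketch, assuming the model already outputs a probability for each candidate reading; the function name `interpret` and the 0.9 threshold are hypothetical, and the sketch says nothing about how those probabilities would be trained.

```python
from typing import Dict, Optional


def interpret(probabilities: Dict[str, float],
              threshold: float = 0.9) -> Optional[str]:
    """Return the most probable reading, or None to mean "ask the user".

    Selective classification with a reject option: the model commits to an
    answer only when its top probability clears the (hypothetical) threshold.
    """
    if not probabilities:
        return None  # nothing to choose from, so defer
    label, p = max(probabilities.items(), key=lambda kv: kv[1])
    return label if p >= threshold else None
```

For example, `interpret({"a": 0.97, "o": 0.03})` returns `"a"`, while `interpret({"a": 0.55, "o": 0.45})` returns `None` and the system would ask. The replies below dispute whether this kind of deferral scales to questions about values.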
Mirrors and Paintings (yes, you want to point your program at the world and have it figure out what you referred to), The Hidden Complexity of Wishes (if you need to answer the AI's questions or give it instructions, you're doing something wrong and it won't work).
I have to admit, as someone who has worked in software testing, I find it difficult to take the suggestion (non-destructive full-brain scan) in the first link very seriously. How, exactly, do I become convinced that the AI can come to know more about what I want by scanning me than I can know by introspection? How can I (or it) even do a comparison between the two without it asking me questions?
But then we get down to doing the comparison. The AI informs me that what I really want is to kill my father and sleep with my mother. I deny this. Do we take this as evidence that the AI really does know me better than I know myself, or as a symptom of a bug?
I would argue that if you don't need to answer the AI's questions or give it instructions, you're doing something wrong and it won't work. By definition. At least for the first ten thousand scans or so. And even then there will remain questions on which the AI and introspection would deliver different answers. Questions with hidden complexity. I just don't see how anyone would trust a CEV extrapolated from brain scans until we had decades of experience suggesting that scanning and modeling yields better results than introspection.
Agreed. And any useful AI will have to understand human language to do or learn much of anything of value.
The detailed analysis of full brain scanning tech I've seen puts it far into the future, well beyond human-level AGI.
You have to make sure the AI predictably gives a better answer even on questions where you disagree. And there will be questions which can't even be asked of a human.
Irrelevant. Assume you magically have a perfect working simulation of yourself.
Why would I want to do that? I.e. how would making that assumption lead me to take Eliezer's suggestion more seriously? My usual practice is to take things less seriously when magic is involved.
And how does this assumption interact with your other comment stating that I have to make sure the AI is somehow even better than myself if there is any difference between simulation and reality? Haven't you just asked me to assume that there are no differences?
Sorry, I simply don't understand your responses, which suggests to me that you did not understand my comment. Did you notice, in my preamble, that I mentioned software testing? Perhaps my point may be clearer to you if you keep this preamble in mind when formulating your responses.
Because that's a conceptually straightforward assumption that we can safely make in a philosophical argument.
The upload is not the AI (and Eliezer's post doesn't refer to uploads IIRC, but for the sake of the argument assume they are available as raw material). You make the AI correct on strong theoretical grounds, and only test things to check that theoretical assumptions hold in ways where you expect it to be possible to check things, not in every situation.
What would I need to make of that?
Relevant - Can we just assume you magically have a friendly AI then?
If the plan for creating a friendly AI depends on a non-destructive full-brain scan already being available, the odds of achieving friendly AI before other forms of AI vanish to near zero.
One step at a time, my good sir! Reducing the philosophical and mathematical problem of Friendly AI to the technological problem of uploading would be an astonishing breakthrough quite by itself.
This is the straightforward approach.
Once you have an AGI that has the cognitive capability and learning capacity of a human infant brain, you teach it everything else in human language - right/wrong, ethics/morality, etc.
Programming languages are precise and well suited for creating the architecture itself, but human languages are naturally more effective for conveying human knowledge.
I tend to agree that we need a natural language interface to the AI. But it is far easier to create automatic proofs of program correctness when the really important stuff (like ethics) is presented in a formal language equipped with a deductive system.
There is something to be said for treating all the natural language input as if it were testimony from unreliable witnesses - suitable, perhaps, for locating hypotheses, but not really suitable as strong evidence for accepting the hypotheses.
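The "unreliable witness" framing can be made quantitative with Bayes' rule in odds form. This is a minimal sketch, assuming a symmetric error model (the witness asserts a true hypothesis with probability `reliability` and a false one with probability `1 - reliability`); the function name is hypothetical.

```python
def posterior_after_testimony(prior: float, reliability: float) -> float:
    """P(H | witness asserts H) under a symmetric error model.

    Odds form of Bayes' rule:
    posterior odds = prior odds * P(assert H | H) / P(assert H | not H)
                   = prior odds * reliability / (1 - reliability).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = reliability / (1.0 - reliability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)
```

With a prior of 0.1 and a witness reliability of 0.6, the posterior is 1/7, about 0.14: enough to move the hypothesis onto the list worth examining, but far from enough to accept it. A witness with reliability 0.5 leaves the prior unchanged.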
I'm not sure how this applies - can you formally prove the correctness of a probabilistic belief network? Is that even a valid concept?
I can understand how you can prove a formal deterministic circuit or the algorithms underlying the belief network and learning systems, but the data values?
Agree. That is why I suggest that the really important stuff (meta-ethics, epistemology, etc.) be represented in some other way than by 'neural' networks. Something formal and symbolic, rather than quasi-analog. All the stuff which we (and the AI) need to be absolutely certain doesn't change meaning when the AI "rewrites its own code".
By formal, I assume you mean math/code.
The really important stuff isn't a special category of knowledge. It is all connected - a tangled web of interconnected complex symbolic concepts for which human language is a natural representation.
What is the precise mathematical definition of ethics? If you really think of what it would entail to describe that precisely, you would need to describe humans, civilization, goals, brains, and a huge set of other concepts.
In essence you would need to describe an approximation of our world. You would need to describe a belief/neural/statistical inference network that represented that world internally as a complex association between other concepts that eventually grounds out into world sensory predictions.
So this problem (that human language concepts are far too complex and unwieldy for formal verification) is not a problem with human language itself that can be fixed by using other language choices. It reflects the inherent, massive complexity of the world itself, complexity that human language and brain-like systems evolved to handle.
To get to that point we have to start from the right meaning to begin with, and care about preserving it accurately, and Jacob doesn't agree those steps are important or particularly hard.
There are some promising lines of attack (grounded in decision theory) that might take only a few years of research. We'll see where they lead. Other open problems in FAI might start looking very solvable if we start making progress on this front.
Show me.
PM'd.
Yes. :)
Yes, but it still has to be explicitly programmed to do that! The question is how to get it to do so. AFAIK shaper-anchor semantics is still quite a ways from being fully specified, but it seems the bigger obstacle is that an AI writer is less likely than not to make the effort to program it that way in the first place.
This is surely the kind of thing that superintelligences will be good at. They will have access to every paperclip picture on the net, every paperclip specification too. They will surely have a much clearer idea about what a paperclip is than humans do. They will know what boxes are too.
I made a stab at it here, and it got some upvotes. So here's a repost:
Make a wire, 10 cm long and 1 mm in diameter, composed of an alloy of 99.8% iron and 0.2% carbon. Start at one end and bend it such that the segments from 2-2.5 cm, 2.75-3.25 cm, and 5.25-5.75 cm form half-circles, with all the bends in the same direction and forming an inward spiral (the end with the first bend is outside the third bend).
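Writing the repost's specification down as structured data makes it checkable, at least for internal consistency. This is a minimal sketch; the `PaperclipSpec` class and its methods are hypothetical. It only catches errors such as overlapping bends or alloy percentages that don't sum to 100, and says nothing about whether the result is what anyone would recognize as a paperclip, which is the harder question the thread is about.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PaperclipSpec:
    length_cm: float
    diameter_mm: float
    iron_pct: float
    carbon_pct: float
    bends_cm: Tuple[Tuple[float, float], ...]  # (start, end) of each half-circle

    def validate(self) -> None:
        """Raise ValueError if the spec is internally inconsistent."""
        if not math.isclose(self.iron_pct + self.carbon_pct, 100.0):
            raise ValueError("alloy percentages must sum to 100")
        prev_end = 0.0
        for start, end in self.bends_cm:
            # Bends must be in order, non-overlapping, and on the wire.
            if not (prev_end <= start < end <= self.length_cm):
                raise ValueError(f"bend {start}-{end} cm overlaps or is out of range")
            prev_end = end

    def straight_segments_cm(self) -> List[Tuple[float, float]]:
        """The stretches of wire left straight between the bends."""
        segments, cursor = [], 0.0
        for start, end in self.bends_cm:
            if start > cursor:
                segments.append((cursor, start))
            cursor = end
        if cursor < self.length_cm:
            segments.append((cursor, self.length_cm))
        return segments

    def bend_radii_cm(self) -> List[float]:
        """A half-circle of arc length L has radius L / pi."""
        return [(end - start) / math.pi for start, end in self.bends_cm]


clip = PaperclipSpec(
    length_cm=10.0,
    diameter_mm=1.0,
    iron_pct=99.8,
    carbon_pct=0.2,
    bends_cm=((2.0, 2.5), (2.75, 3.25), (5.25, 5.75)),
)
clip.validate()
print(clip.straight_segments_cm())
```

Each bend is 0.5 cm of arc, so every half-circle comes out with a radius of about 0.16 cm; details like that are exactly what a written spec leaves implicit.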
(Please let me know if reposting violates LW etiquette so I know not to do it again.)
I don't think it violates LW etiquette.
Here's a sort of fully general counterargument against proposals to naturalize human concepts in AI terms: if you can naturalize human concepts, you should be able to naturalize the human concept of a box. And if you can do that, we can build Oracle AI and save the world. It's very easy to describe what we mean by 'stay in the box', but it turns out that seed (self-modifying!) AIs just don't have a natural ontology for the descriptions.
This argument might be hella flawed; it seems kind of tenuous.
Aren't you simply assuming that the world is doomed here? It sure looks like it!
Since when is that assumption part of a valid argument?
That assumption isn't really a core part of the argument... the general "if specifying human concepts is easy, then come up with a plan for making a seed AI want to stay in a box" argument still stands, even if we don't actually want to keep arbitrary seed AIs in boxes.
For the record I am significantly less certain than most LW or SIAI singularitarians that seed AIs not explicitly coded with human values in mind will end up creating a horrible future, or at least a more horrible future than something like CEV. I do think it's worth a whole lot of continued investigation.