Peterdjones comments on The genie knows, but doesn't care - Less Wrong
Richard: I'll stick with your original example. In your hypothetical, I gather, programmers build a seed AI (a not-yet-superintelligent AGI that will recursively self-modify to become superintelligent after many stages) that includes, among other things, a large block of code I'll call X.
The programmers think of this block of code as an algorithm that will make the seed AI and its descendants maximize human pleasure. But they don't actually know for sure that X will maximize human pleasure; as you note, 'human pleasure' is an unbelievably complex concept, so no human could be expected to actually code it into a machine without making any mistakes. And writing 'this algorithm is supposed to maximize human pleasure' into the source code as a comment is not going to change that. (See the first few paragraphs of Truly Part of You.)
Now, why exactly should we expect the superintelligence that grows out of the seed to value what we really mean by 'pleasure', when all we programmed it to do was X, our probably-failed attempt at summarizing our values? We didn't program it to rewrite its source code to better approximate our True Intentions, or the True Meaning of our in-code comments. And if we did attempt to code it to make either of those self-modifications, that would just produce a new hugely complex block Y which might fail in its own host of ways, given the enormous complexity of what we really mean by 'True Intentions' and 'True Meaning'. So where exactly is the easy, low-hanging fruit that should make us less worried that a superintelligence will (because of mistakes we made in its utility function, not mistakes in its factual understanding of the world) hook us up to dopamine drips? All of this seems crucial to your original point in 'The Fallacy of Dumb Superintelligence':
It seems to me that you've already gone astray in the second paragraph. On any charitable reading (see the New Yorker article), it should be clear that what's being discussed is the gap between the programmer's intended code and the actual code (and therefore actual behaviors) of the AGI. The gap isn't between the AGI's intended behavior and the set of things it's smart enough to figure out how to do. (Nowhere does the article discuss how hard it is for AIs to do things they desire to. What it discusses over and over again is the difficulty of programming AIs to do what we want them to, e.g., Asimov's Three Laws.)
So all the points I make above seem very relevant to your 'Fallacy of Dumb Superintelligence', as originally presented. If you were mixing those two gaps up, though, that might help explain why you spent so much time accusing SIAI/MIRI of making this mistake, even though it's the former gap and not the latter that SIAI/MIRI advocates appeal to.
Maybe it would help if you provided examples of someone actually committing this fallacy, and explained why you think those are examples of the error you mentioned and not of the reasonable fact/value gap I've sketched out here?
Maybe we didn't do it that way. Maybe we did it Loosemore's way, where you code in the high-level sentence and let the AI figure it out. Maybe that would avoid the problem. Maybe Loosemore has solved FAI much more straightforwardly than EY.
Maybe we told it to. Maybe we gave it the low-level expansion of "happy" that we or our seed AI came up with, together with an instruction that it is meant to capture the meaning of the high-level statement, that the HL statement is the Prime Directive, and that if the AI judges that the expansion is wrong, then it should reject the expansion.
Maybe the AI will value getting things right because it is rational.
http://lesswrong.com/lw/rf/ghosts_in_the_machine/
If the AI is too dumb to understand 'make us happy', then why should we expect it to be smart enough to understand 'figure out how to correctly understand "make us happy", and then follow that instruction'? We have to actually code 'correctly understand' into the AI. Otherwise, even when it does have the right understanding, that understanding won't be linked to its utility function.
http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/
So it's impossible to directly or indirectly code in the complex thing called semantics, but possible to directly or indirectly code in the complex thing called morality? What? What is your point? You keep talking as if I am suggesting there is something that can be had for free, without coding. I never even remotely said that.
I know. A Loosemore architecture AI has to treat its directives as directives. I never disputed that. But coding "follow these plain English instructions" isn't obviously harder or more fragile than coding "follow <<long expansion of human preferences>>". And it isn't trivial, and I didn't say it was.
Read the first section of the article you're commenting on. Semantics may turn out to be a harder problem than morality, because the problem of morality may turn out to be a subset of the problem of semantics. Coding a machine to know what the word 'Friendliness' means (and to care about 'Friendliness') is just a more indirect way of coding it to be Friendly, and it's not clear why that added indirection should make an already risky or dangerous project easy or safe. What does indirect indirect normativity get us that indirect normativity doesn't?
Robb, at the point where Peterdjones suddenly shows up, I'm willing to say - with some reluctance - that your endless willingness to explain is being treated as a delicious free meal by trolls. Can you direct them to your blog rather than responding to them here? And we'll try to get you some more prestigious non-troll figure to argue with - maybe Gary Drescher would be interested, he has the obvious credentials in cognitive reductionism but is (I think incorrectly) trying to derive morality from timeless decision theory.
Sure. I'm willing to respond to novel points, but at the stage where half of my responses just consist of links to the very article they're commenting on or an already-referenced Sequence post, I agree the added noise is ceasing to be productive. Fortunately, most of this seems to already have been exorcised into my blog. :)
Agree with Eliezer. Your explanatory skill and patience are mostly wasted on the people you've been arguing with so far, though it may have been good practice for you. I would, however, love to see you try to talk Drescher out of trying to pull moral realism out of TDT/UDT, or try to talk Dalrymple out of his "I'm not partisan enough to prioritize human values over the Darwinian imperative" position, or help Preston Greene persuade mainstream philosophers of "the reliabilist metatheory of rationality" (aka rationality as systematized winning).
Semantics isn't optional. Nothing could qualify as an AGI, let alone a super one, unless it could hack natural language. So Loosemore architectures don't make anything harder, since semantics has to be solved anyway.
It's a problem of sequence. The superintelligence will be able to solve Semantics-in-General, but at that point if it isn't already safe it will be rather late to start working on safety. Tasking the programmers to work on Semantics-in-General makes things harder if it's a more complex or roundabout way of trying to address Indirect Normativity; most of the work on understanding what English-language sentences mean can be relegated to the SI, provided we've already made it safe to make an SI at all.
Then solve semantics in a seed.
PeterDJones, if you wish to converse further with RobbBB, I ask that you do so on RobbBB's blog rather than here.