Tyrrell_McAllister comments on Cryptographic Boxes for Unfriendly AI - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
The probability I assign to achieving a capability state where it is (1) possible to prove a mind Friendly even if it has been constructed by a hostile superintelligence, (2) possible to build a hostile superintelligence, and (3) not possible to build a Friendly AI directly, is very low.
In particular, the sort of proof techniques I currently have in mind - what they prove and what that means - for ensuring Friendliness through a billion self-modifications of something that started out relatively stupid and was built by relatively trusted humans, would not work for verifying the Friendliness of a finished AI handed to you by a hostile superintelligence; it seems to me that the proof techniques required for that would have to be considerably stronger.
To paraphrase Mayor Daley, the proof techniques are there to preserve the Friendly intent of the programmers through the process of constructing the AI and through the AI's self-modification, not to create Friendliness. People hear the word "prove" and assume that this is because you don't trust the programmers, or because you have a psychological need for unobtainable absolute certainty. No, it's because if you don't prove certain things (and have the AI prove certain things before each self-modification) then you can't build a Friendly AI no matter how good your intentions are. The good intentions of the programmers are still necessary, and assumed, beyond the parts that are proved; and doing the proof work doesn't make the whole process absolutely certain, but if you don't strengthen certain parts of the process using logical proof then you are guaranteed to fail. (This failure is knowable to a competent AGI scientist - not with absolute certainty, but with high probability - and therefore it is something of which a number of would-be dabblers in AGI maintain a careful ignorance regardless of how you try to explain it to them, because the techniques that make them enthusiastic don't support that sort of proof. "It is hard to explain something to someone whose job depends on not understanding it.")
Could you elaborate on this? Your mere assertion is enough to make me much less confident than I was when I posted this comment. But I would be interested in a more object-level argument. (The fact that your own approach to building an FAI wouldn't pass through such a stage doesn't seem like enough to drive the probability "very low".)
The FAI theory required to build a proof for (1) would have to be very versatile; you would have to understand Friendliness very well to do it. (2) requires some understanding of the nature of intelligence (especially if we understand what we're building well enough to know it's a superintelligence that needs to be kept in a box). If you understand Friendliness that well, and intelligence that well, then Friendly intelligence should be easy.
We never built space suits for horses, because by the time we figured out how to get to the moon, we also figured out electric rovers.
Eliezer has spent years making the case that FAI is far, far, far more specific than AI. A theory of intelligence adequate to building an AI could still be very far from a theory of Friendliness adequate to building an FAI, couldn't it?
So, suppose that we know how to build an AI, but we're smart enough not to build one until we have a theory of Friendliness. You seem to be saying that, at this point, we should consider the problem of constructing a certifier of Friendliness to be essentially no easier than constructing FAI source code. Why? What is the argument for thinking that FAI is very likely to be one of those problems where certifying a solution is no easier than solving it from scratch?
This doesn't seem analogous at all. It's hard to imagine how we could have developed the technology to get to the moon without having built electric land vehicles along the way. I hope that I'm not indulging in too much hindsight bias when I say that, conditioned on our getting to the moon, our getting to electric-rover technology first was very highly probable. No one had to take special care to make sure that the technologies were developed in that order.
But, if I understand Eliezer's position correctly, we could easily solve the problem of AGI while still being very far from a theory of Friendliness. That is the scenario that he has dedicated his life to avoiding, isn't it?