we upload the philosopher separately and use them as a consultant.
And then we have to ensure the AI follows the consultant (probably doable) and define what querying process is acceptable (very hard).
But your solution (which is close to Paul Christiano's) works whatever the AI is; we just need to be able to upload a human. My point, that we could conceivably create an AI without understanding any of the hard problems, still stands. If you want, I can refine it: allow partial uploads. We can upload brains, but they don't function as stable humans, because we haven't mapped all the fine details we need to. However, we can use these imperfect uploads, plus a bit of evolution, to produce AIs. And here we have no understanding at all of how to control their motivations.
Why are you confident that an AI we do develop will not have these traits? You agree the mindspace is large; you agree we can develop some cognitive abilities without understanding them. If you add that most AI programmers don't take AI risk seriously and will only be testing their AIs in controlled environments, and that the AI will likely be developed for a military or commercial purpose, why would you have high confidence that developers will converge on a safe design?
(EDIT: See below.) I'm afraid that I am now confused. I'm not clear on what you mean by "these traits", so I don't know what you think I am being confident about. You seem to think I'm arguing that AIs will converge on a safe design, and I don't remember saying anything remotely resembling that.
EDIT: I think I figured it out on the second or third attempt. I'm not 100% committed to the proposition that if we make an AI and know how we did so, we can definitely make sure it's fun and friendly, as opposed to fundamentally uncontrollable and unknowable. However, it seems virtually certain to me that we will figure out a significant amount about designing AIs to do what we want in the process of developing them. People who subscribe to various "FOOM" theories about AI coming out of nowhere will probably disagree with this, as is their right, but I don't find any of those theories plausible.
I also hope I didn't give the impression that I thought it was meaningfully possible to create a God-like AI without understanding how to make AI. It's conceivable only in the sense that such a creation story is not a logical contradiction, like a square circle or colourless green ideas sleeping furiously, but that is all. I think it is actually staggeringly unlikely that we will make an AI without either knowing how to make an AI, or knowing how to upload people who can then make an AI and tell us how they did it.