Surely everyone would agree that the first task we want our AI to solve is FAI (even if we are "100%" sure that our plan has no leaks, we would still like our AI to check it while we are still able to shut the AI down). It's easy to imagine an AI lying about its own safety, but many AIs all lying about their safety (including the safety of other AIs!) is much harder to imagine (still possible, but less probable). Only when we are incredibly sure of our FAI solution can we ask the AI to solve other questions for us. Also, those AIs would constantly try to find bad consequences of the main AI's proposals (because they also don't want to risk their lives, and also because we ask them to give us this information). Of course, we also don't give them access to the internet, and we take precautions regarding the people interacting with the AI, etc. (which is well described elsewhere).
Certainly, this overall solution still has its drawbacks (I think every solution will have them), and we have to improve it in many ways. In my opinion, it would be good if we didn't launch AI during the next 1000 years :-) but the problem is terrorist organizations and mad people who would be able to launch it despite our intentions... so we have to launch AI more or less soon anyway (or get rid of all terrorists and mad clever people, which is nearly impossible). So we have to formulate a combination of tricks that is as safe as we can get. I find it counterproductive to throw away everything that is not "100%" safe in search of some magic "100%" super-solution.
Edit: Some people have misunderstood my intentions here. I do not in any way expect this to be the NEXT GREAT IDEA. I just couldn't see anything wrong with this, which almost certainly meant there were gaps in my knowledge. I thought the fastest way to see where I went wrong would be to post my idea here and see what people say. I apologise for any confusion I caused. I'll try to be more clear next time.
(I really can't think of any major problems in this, so I'd be very grateful if you guys could tell me what I've done wrong).
So, a while back I was listening to a discussion about the difficulty of making an FAI. One of the ways suggested to circumvent this was to program an AGI to solve FAI. Someone else pointed out the problems with this. Amongst other things, one would have no idea what the AI would do in pursuit of its primary goal. Furthermore, it would already be a monumental task to program an AI whose primary goal is to solve the FAI problem, though I should think doing this is still easier than solving FAI.
So, I started to think about this for a little while, and I thought 'how could you make this safer?' Well, first off, you don't want an AI that completely outclasses humanity in terms of intellect. If things went Wrong, you'd have little chance of stopping it. So, you want to limit the AI's intellect to genius level, so that if something did go Wrong, the AI would not be unstoppable. It might do quite a bit of damage, but a large group of intelligent people with a lot of resources on hand could stop it.
Therefore, the AI must not be able to modify parts of its source code. You must try to stop an intelligence explosion from taking off. So: limited access to its source code, and a limit on how much computing power it can have on hand. This is problematic, though, because the AI would not be able to solve FAI very quickly. After all, we have a few genius-level people trying to solve FAI, and they're struggling with it, so why should a genius-level computer do any better? Well, an AI would have fewer biases, and could accumulate much more expertise relevant to the task at hand. It would be about as capable of solving FAI as the most capable human could possibly be; perhaps even more so. Essentially, you'd get someone like Turing, von Neumann, Newton and others all rolled into one, working on FAI.
But there's still another problem. If the AI were left working on FAI for, let's say, 20 years, it would have accumulated enough skills to cause major problems if something went wrong. Sure, it would be only as intelligent as Newton, but it would be far more skilled. Humanity fighting against it would be like sending a young Miyamoto Musashi against his future self at his zenith, i.e., completely one-sided.
What must be done, then, is to give the AI a time limit of a few years (or less), after which it is put to sleep. We look at what it accomplished, see what worked and what didn't, boot up a fresh version of the AI with any required modifications, and tell it what the old AI did. Repeat the process for a few years, and we should end up with FAI solved.
After that, we just make an FAI, and wake up the originals, since there's no point in killing them off at this point.
But there are still some problems. First, time. Why try this when we could solve FAI ourselves? Well, I would only try to implement something like this if it were clear that AGI will be solved before FAI is. A backup plan, if you will. Second, what if FAI is just too much for people at our current level? Sure, we have people who are one in ten thousand or better working on this, but what if we need someone who's one in a hundred billion? Someone who represents the peak of human ability? We shouldn't just wait around for them, since some idiot would probably just make an AGI thinking it would love us all anyway.
So, what do you guys think? As a plan, is this reasonable? Or have I just overlooked something completely obvious? I'm not saying that this would be easy in any way, but it would be easier than solving FAI.