Edit: Some people have misunderstood my intentions here. I do not in any way expect this to be the NEXT GREAT IDEA. I just couldn't see anything wrong with this, which almost certainly meant there were gaps in my knowledge. I thought the fastest way to see where I went wrong would be to post my idea here and see what people say. I apologise for any confusion I caused. I'll try to be more clear next time.
(I really can't think of any major problems in this, so I'd be very grateful if you guys could tell me what I've done wrong).
So, a while back I was listening to a discussion about the difficulty of making an FAI. One suggested way to circumvent this was to program an AGI whose goal is to solve FAI. Someone else pointed out the problems with this: among other things, one would have no idea what the AI would do in pursuit of its primary goal. Furthermore, programming an AI whose primary goal is to solve the FAI problem would itself be a monumental task; still easier than solving FAI directly, I should think.
So, I started to think about this for a little while, and I thought 'how could you make this safer?' Well, first off, you don't want an AI that completely outclasses humanity in terms of intellect. If things went Wrong, you'd have little chance of stopping it. So, you want to limit the AI's intellect to genius level, so that if something did go Wrong, the AI would not be unstoppable. It might do quite a bit of damage, but a large group of intelligent people with a lot of resources on their hands could stop it.
Therefore, the AI must be prevented from modifying parts of its source code. You must try to stop an intelligence explosion from taking off. So: limited access to its source code, and a limit on how much computing power it can have on hand. This is problematic, though, because the AI would not be able to solve FAI very quickly. After all, we have a few genius-level people trying to solve FAI, and they're struggling with it, so why should a genius-level computer do any better? Well, an AI would have fewer biases, and could accumulate much more expertise relevant to the task at hand. It would be about as capable at solving FAI as the most capable human could possibly be; perhaps even more so. Essentially, you'd get someone like Turing, Von Neumann, Newton and others all rolled into one working on FAI.
But there's still another problem. The AI, if left working on FAI for 20 years, let's say, would have accumulated enough skills that it could cause major problems if something went wrong. Sure, it would be as intelligent as Newton, but it would be far more skilled. Humanity fighting against it would be like sending a young Miyamoto Musashi against his future self at his zenith, i.e. completely one-sided.
What must be done, then, is to give the AI a time limit of a few years (or less); after that time has passed, it is put to sleep. We look at what it accomplished, see what worked and what didn't, boot up a fresh version of the AI with any required modifications, and tell it what the old AI did. Repeat the process for a few years, and we should end up with FAI solved.
After that, we just make an FAI, and wake up the originals, since there's no point in killing them off at this point.
But there are still some problems. One: time. Why try this when we could solve FAI ourselves? Well, I would only try to implement something like this if it were clear that AGI will be solved before FAI is. A backup plan, if you will. Second: what if FAI is just too much for people at our current level? Sure, we have people who are one in ten thousand and better working on this, but what if we need someone who's one in a hundred billion? Someone who represents the peak of human ability? We shouldn't just wait around for them, since some idiot would probably make an AGI thinking it would love us all anyway.
So, what do you guys think? As a plan, is this reasonable? Or have I just overlooked something completely obvious? I'm not saying that this would be easy in any way, but it would be easier than solving FAI.
Sure, in the sense that an alien UFAI could still arrive the next day and wipe us out, or a large asteroid, or any other low-probability catastrophe. Or the FAI could just honestly fail at its goal, and produce a UFAI by accident.
There is always scope for things going wrong. However, encoding 'solve FAI' turns out to be essentially the same problem as encoding 'FAI', because 'FAI' isn't a fixed thing, it's a complex dynamic. More specifically, FAI is an AI that creates improved successor versions of itself; thus it has 'solve FAI' as part of its description already.
Yes - with near certainty the road to complex AI involves iterative evolutionary development, as in any other engineering field. MIRI seems to want to solve the whole safety issue in pure theory first. Meanwhile, the field of machine learning is advancing rather quickly towards AGI, and in that field progress is driven more by experimental research than pure theory - there is only so much one can do with math on paper.
The risk stems from a few considerations: once we have AGI, superintelligence could follow very shortly thereafter, and thus the first AGI to scale to superintelligence could potentially take over the world and prevent any further experimentation with other designs.
Your particular proposal involves constraints on the intelligence of the AGI - a class of techniques discussed in detail in Bostrom's Superintelligence. The danger there is that any such constraints increase the likelihood that some other, less safe competitor will reach superintelligence first. It would be better to have a design that is intrinsically benevolent/safe and doesn't need such constraints - if such a thing is possible. The tradeoffs are rather complex.
Alright, what I got from your post is that if you know the definition of an FAI and can instruct a computer to design one, you've basically already made one. That is, having a precise definition of the thing massively reduces the difficulty of creating it - e.g. when people ask 'do we have free will?', defining free will greatly reduces the complexity of the problem. Is that correct?