jessicat comments on FAI Research Constraints and AGI Side Effects - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (58)
This model seems quite a bit different from mine, which is that FAI research is about reducing FAI to an AGI problem, and solving AGI takes more work than doing this reduction.
More concretely, consider a proposal such as Paul's reflective automated philosophy method, which might be able to be implemented using epsiodic reinforcement learning. This proposal has problems, and it's not clear that it works -- but if it did, then it would have reduced FAI to a reinforcement learning problem. Presumably, any implementations of this proposal would benefit from any reinforcement learning advances in the AGI field.
Of course, even if we a proposal like this works, it might require better or different AGI capabilities from UFAI projects. I expect this to be true for black-box FAI solutions such as Paul's. This presents additional strategic difficulties. However, I think the post fails to accurately model these difficulties. The right answer here is to get AGI researchers to develop (and not publish anything about) enough AGI capabilities for FAI without running a UFAI in the meantime, even though the capabilities to run it exist.
Assuming that this reflective automated philosophy system doesn't work, it could still be the case that there is a different reduction from FAI to AGI that can be created through armchair technical philosophy. This is often what MIRI's "unbounded solutions" research is about: finding ways you could solve FAI if you had a hypercomputer. Once you find a solution like this, it might be possible to define it in terms of AGI capabilities instead of hypercomputation, and at that point FAI would be reduced to an AGI problem. We haven't put enough work into this problem to know that a reduction couldn't be created in, say, 20 years by 20 highly competent mathematician-philosophers.
In the most pessimistic case (which I don't think is too likely), the task of reducing FAI to an AGI problem is significantly harder than creating AGI. In this case, the model in the post seems to be mostly accurate, except that it neglects the fact that serial advances might be important (so we get diminishing marginal progress towards FAI or AGI per additional researcher in a given year).
[Edited: replaced Gremaining with Fremaining, which is what I originally meant]
Thanks for the comment jessicat! I haven't read those posts yet, will do more research on reducing FAI to an AGI problem.
A few responses & clarifications:
Our framework assumes the FAI research would happen before AGI creation. If we can research how to reduce FAI to an AGI problem in a way that would reliably make a future AGI friendly, then that amount of research would be our variable Fremaining. If that is quite easy to do, then that's fantastic; an AI venture would have an easy time, and the leakage ratio would be low enough to not have to worry about. Additional required capabilities that we'll find out we need would be added to Fremaining.
"I think the post fails to accurately model these difficulties." -> This post doesn't attempt to model the individual challenges to understand how large Fremaining actually is. That's probably a more important question than what we addressed, but one for a different model.
"The right answer here is to get AGI researchers to develop (and not publish anything about) enough AGI capabilities for FAI without running a UFAI in the meantime, even though the capabilities to run it exist." -> This paper definitely advocates for AGI researchers to develop FAI research while not publishing much AGI research. I agree that some internal AGI research will probably be necessary, but hope that it won't be a whole lot. If the tools to create an AGI were figured out, even if they were kept secret by an FAI research group, I would be very scared. Those would be the most important and dangerous secrets of all time, and I doubt they could be kept secret for very long (20 years max?)
"In this case, the model in the post seems to be mostly accurate, except that it neglects the fact that serial advances might be important (so we get diminishing marginal progress towards FAI or AGI per additional researcher in a given year)."
-> This paper purposefully didn't model research effort, but rather, abstract units of research significance. "the numbers of rg and rf don't perfectly correlate with the difficulty to reach them. It may be that we have diminishing marginal returns with our current levels of rg, so similar levels of rf will be easier to reach."
A model that would also take into account the effort required would require a few more assumptions and additional complexity. I prefer to start simple and work from there, so we at least know what people do agree on before adding additional complexity.
Thanks for the detailed response! I do think the framework can still work with my assumptions. The way I would model it would be something like:
Also, it seems I had misinterpreted the part about rg and rf, sorry about that!
Good point.
I guess the most controversial, and hopefully false, assumption of this paper is #3: 'If Gremaining is reached before Fremaining, a UFAI will be created. If after, an FAI will be created.'
This basically is the AI Foom scenario, where the moment an AGI is created, it will either kill us or all or bring about utopia (or both).
If this is not the case, and we have a long time to work with the AGI as it develops to make sure it is friendly, then this model isn't very useful.
If we do assume these assumptions, I would also expect that we will reach Gremaining before Fremaining, or at least that a private organization will end up doing so. However, I am also very skeptical in the power of secrets. I think I find us reaching Fremaining first more likely than a private institution reaching Gremaining first, but hiding it until it later reaches Fremaining, though both may be very slim. If the US military or a similar group with a huge technological and secretive advantage were doing this, there could be more of a chance. This definitely seems like a game of optimizing small probabilities.
Either way, I think we definitely would agree here that the organization developing these secrets can strategically choose projects that deliver the high amounts of FAI research relative to the amount AGI research they will have to keep secretive. Begin with the easy, non-secretive wins and work from there.
We may need the specific technology to create a paperclip maximizer before we make an FAI, but if we plan correctly, we hopefully will be really close to reaching an FAI by that point.
The question is not "if". The questions are "how quickly" and "to what height". An AI capable of self-improving to world-destroying levels within moments is plainly unrealistic. An AI capable of self-improving to dangerous levels (viz: levels where it can start making humans do the dangerous work for it) in the weeks, months, or even years it would take a team of human operators to cross-examine the formally unspecified motivation engines for Friendliness is dangerously realistic.