Assume there is no strong first-mover advantage (intelligence explosion), and even no strong advantage of AGIs over humanity. Even in this case, an FAI makes it possible to stop value drift, provided it is adequately competitive with whatever other agents it coexists with (including humanity, which will change its values over time, not being a cleanly designed agent with a fixed goal definition). If the FAI survives, that guarantees that some nontrivial portion of the world's resources will ultimately go toward producing human value, as opposed to the other things produced by a drifted-away humanity (for example, Hanson's efficiency-obsessed ems) or by random AGIs.
(I expect there is a strong first-mover advantage, but this argument doesn't depend on that assumption.)
I also expect a big first-mover advantage. Assuming that, you aren't answering the question of the post, which is: if someone invents FAI theory but not AGI theory, how can they best induce or convince the eventual first mover on AGI to use that FAI theory? (Suppose including the FAI theory has some negative side effects for the AGI builder, like longer development time or requiring more processing power because the FAI theory presupposes a certain architecture.)
All the recent posts on FAI theory have given me cause to think. There's something in the conversations about it that has always bugged me, but I hadn't found the words for it before now.
It is something like this:
Say that you manage to construct an algorithm for FAI...
Say that you can show that it isn't going to be a dangerous mistake...
And say you do all of this, and popularize it, before AGI is created (or at least, before an AGI goes *FOOM*)...
...
How in the name of Sagan are you actually going to ENFORCE the idea that all AGIs are FAIs?
I mean, if it required some rare material (like nuclear weapons do) or large laboratories (like biological WMDs) or some other resource that you could at least make artificially scarce, you could set up a body that ensures that any AGI created is an FAI.
But if all it takes is the right algorithms, the right code, and enough computing power... even if you design a theory for FAI, how would you keep someone from building UFAI anyway? Between people experimenting with the principles (once known), people making mistakes, and the prospect of actively malicious *humans*... it seems that unless you somehow come up with an internal mechanism that makes FAI better and stronger than any UFAI could be, and the solution turns out to be such that any idiot could see it was better, UFAI is going to exist at some point no matter what.
At that point, it seems like the primary question becomes not "How do we make FAI?" (though that might remain a secondary question) but rather "How do we prevent the creation of, eliminate, or reduce the potential damage from UFAI?" Building FAI might be one thing you do toward that goal, but if UFAI is a highly likely consequence of AGI even *with* an FAI theory, shouldn't the focus be on how to contain a UFAI event?