Have the FAI create James+, who is smarter than me but shares my values, such that after spending a long time living with James+ in a simulation, I agree that he is an improved me. Let James++ be similarly defined with respect to James+. Continue this process until the FAI isn't capable of making further improvements. Next, let this ultimate-James spend a lot of time in the world created by the FAI and have him evaluate it against possible alternatives. Finally, do the same with everyone else alive at the creation of the FAI. If they mostly think that the world created by the FAI is about the best it could do given whatever constraints it faces, then the FAI has achieved an excellent outcome.
I've been returning to my "reduced impact AI" approach, and am currently working on an idea.
What I need are some ideas for features that might distinguish an excellent FAI outcome from a disaster. The more abstract and general the ideas, the better. Anyone got some suggestions? Don't worry about quality at this point; originality is more prized!
I'm looking for something generic that is easy to measure. At a crude level, if the only options were "paperclipper" vs FAI, then we could distinguish those worlds by counting steel content.
So basically, I want some more or less objective measure such that the worlds it picks out contain a higher proportion of good outcomes than the baseline.
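To make that criterion concrete, here is a minimal sketch of what "a higher proportion of good outcomes than the baseline" could mean for a candidate measure. Everything in it is an assumption made up for illustration: the `World` type, the `steel_fraction` feature, the threshold, and the toy list of worlds with their good/bad labels (which, of course, we wouldn't have in practice).

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class World:
    name: str
    steel_fraction: float  # hypothetical easy-to-measure feature
    good: bool             # toy ground-truth label, unknown in reality


def proportion_good(worlds: Iterable[World]) -> float:
    """Fraction of the given worlds that are labelled good."""
    worlds = list(worlds)
    if not worlds:
        raise ValueError("no worlds to evaluate")
    return sum(w.good for w in worlds) / len(worlds)


def lift(worlds: list[World], passes: Callable[[World], bool]) -> float:
    """How much the measure raises the proportion of good outcomes
    among the worlds it selects, relative to the unfiltered baseline."""
    selected = [w for w in worlds if passes(w)]
    return proportion_good(selected) - proportion_good(worlds)


# Made-up example worlds; the numbers carry no empirical weight.
WORLDS = [
    World("paperclipper", 0.90, False),
    World("FAI outcome A", 0.02, True),
    World("FAI outcome B", 0.10, True),
    World("disaster without much steel", 0.01, False),
]


def low_steel(world: World) -> bool:
    """Crude measure: the world passes if steel content is below a threshold."""
    return world.steel_fraction < 0.5


if __name__ == "__main__":
    print("baseline:", proportion_good(WORLDS))
    print("lift from low_steel:", lift(WORLDS, low_steel))
```

In this toy set the steel measure rules out the paperclipper but not the other disaster, so it raises the proportion of good outcomes without guaranteeing one. That is the kind of partial, generic signal I'm asking for.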