I presume the answer you're looking for isn't "fun theory", but I can't tell from OP whether you're looking for distinguishers from our perspective or from an AI's perspective.
I'm looking for something generic that is easy to measure. At a crude level, if the only option were "papercliper" vs FAI, then we could distinguish those worlds by counting steel content.
So basically some more or less objective measure that has a higher proportion of good outcomes than the baseline.
I've been returning to my "reduced impact AI" approach, and currently working on some idea.
What I need is some ideas on features that might distinguish between an excellent FAI outcome, and a disaster. The more abstract and general the ideas, the better. Anyone got some suggestions? Don't worry about quality at this point, originality is more prized!
I'm looking for something generic that is easy to measure. At a crude level, if the only options were "papercliper" vs FAI, then we could distinguish those worlds by counting steel content.
So basically some more or less objective measure that has a higher proportion of good outcomes than the baseline.