Merely higher proportion, and we're not worried about the criterion being reverse-engineered? Give a memory expert a large prime number to memorize and talk about outcomes where it's possible to factor a large composite number that has that prime as a factor. Happy outcomes will have that memory expert still be around in some form.
EDIT: No, I take that back because quantum. Some repaired version of the general idea might still work, though.
we're not worried about the criterion being reverse-engineered?
I'm trying to think about ways that might potentially prevent reverse engineering...
I've been returning to my "reduced impact AI" approach, and currently working on some idea.
What I need is some ideas on features that might distinguish between an excellent FAI outcome, and a disaster. The more abstract and general the ideas, the better. Anyone got some suggestions? Don't worry about quality at this point, originality is more prized!
I'm looking for something generic that is easy to measure. At a crude level, if the only options were "papercliper" vs FAI, then we could distinguish those worlds by counting steel content.
So basically some more or less objective measure that has a higher proportion of good outcomes than the baseline.