Based on a certain SSC post: a Matrix-like scenario where a non-friendly AI decides to spare a few cubic meters of computronium to faithfully simulate humanity, rather than simply getting rid of it. For sentimental reasons, and because human-boxing is ironic.
It has near maximal computational capacity, but that capacity isn't being "used" for anything in particular that is easy to determine.
This is actually a very powerful criterion, in terms of the number of false positives and negatives. Sadly, the false positives it DOES have still far outweigh the genuine positives, and include all the WORST outcomes (aka virtual hells) as well.
I presume the answer you're looking for isn't "fun theory", but I can't tell from OP whether you're looking for distinguishers from our perspective or from an AI's perspective.
I'm looking for something generic that is easy to measure. At a crude level, if the only options were "paperclipper" vs FAI, then we could distinguish those worlds by counting steel content.
So basically some more or less objective measure that has a higher proportion of good outcomes than the baseline.
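As a toy illustration of that "counting steel" idea, here's a minimal sketch; the feature names and thresholds are invented purely to show the shape of such a crude filter, not a real proposal:

```python
# Toy sketch of a crude "objective measure"; every feature name and
# threshold here is made up purely for illustration.

def crude_outcome_filter(world):
    """world: dict of rough material/activity fractions measured in that world."""
    # A paperclipper-ish world: almost everything converted to one material.
    if world.get("steel_fraction", 0.0) > 0.5:
        return "probably bad"
    # A dead world: nothing much happening at all.
    if world.get("biological_activity", 0.0) < 0.01:
        return "probably bad"
    return "passes this crude filter (still includes many bad outcomes)"

print(crude_outcome_filter({"steel_fraction": 0.9, "biological_activity": 0.0}))
print(crude_outcome_filter({"steel_fraction": 0.02, "biological_activity": 0.4}))
```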
Merely higher proportion, and we're not worried about the criterion being reverse-engineered? Give a memory expert a large prime number to memorize and talk about outcomes where it's possible to factor a large composite number that has that prime as a factor. Happy outcomes will have that memory expert still be around in some form.
EDIT: No, I take that back because quantum. Some repaired version of the general idea might still work, though.
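For concreteness, here's a rough sketch of the (retracted) setup, assuming SymPy for prime generation; the key sizes and names are illustrative only:

```python
# Rough sketch of the memorized-prime criterion. A memory expert is given a
# large prime p, and only N = p * q is made public. Outcomes in which someone
# produces a nontrivial factor of N are then weak evidence that the expert's
# memory still exists in some form -- modulo the quantum caveat above
# (Shor's algorithm factors N without any expert).

from sympy import randprime

p = randprime(2**255, 2**256)   # handed to the memory expert, then not stored
q = randprime(2**255, 2**256)   # discarded entirely
N = p * q                       # the only number the criterion ever sees

def criterion_satisfied(claimed_factor: int) -> bool:
    """Accept any nontrivial factor of N, however it was obtained."""
    return 1 < claimed_factor < N and N % claimed_factor == 0

print(criterion_satisfied(p))   # True (the expert remembered it)
print(criterion_satisfied(7))   # almost certainly False
```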
we're not worried about the criterion being reverse-engineered?
I'm trying to think about ways that might potentially prevent reverse engineering...
Smiles, laughter, hugging, the humming or whistling of melodies in a major key, skipping, high-fiving and/or brofisting, loud utterance of "Huzzah" or "Best thing EVER!!!", airborne nanoparticles of cake, streamers, balloons, accordion music? On the assumption that the AI was not explicitly asked to produce these things, of course.
If you're planning to simulate an AI in a universe-in-a-box and examine whether the produced universe is good via some process that doesn't allow the AI to talk to you, the AI is just gonna figure out it's being simulated and pretend to be an FAI so you'll let it loose on the real world. (Note that an AI that pretends to be an FAI maximizes not for friendliness but for apparent friendliness, so this is no pathway to FAI.)
To a first approximation, having a box that contains an AI anywhere in it output even a few bits of info tends to produce whichever bits maximize the AI's IRL utility.
(If you have math-proofs that no paperclipper can figure out that it's in your box, it's just gonna maximize a mix of its apparent-friendliness score and the number of paperclips (whether it's let loose in the box or in the real world), which doesn't cost it much compared to either maximizing paperclips or maximizing apparent friendliness, because of the tails thing.)
Have the FAI create James+, who is smarter than me but shares my values. In a simulation in which I spend a long time living with James+ I agree that he is an improved me. Let James++ be similarly defined with respect to James+. Continue this process until the FAI isn't capable of making improvements. Next, let this ultimate-James spend a lot of time in the world created by the FAI and have him evaluate it compared with possible alternatives. Finally, do the same with everyone else alive at the creation of the FAI and if they mostly think that the world created by the FAI is about the best it could do given whatever constraints it faces, then the FAI has achieved an excellent outcome.
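A heavily hedged pseudocode sketch of that procedure, with placeholder method names standing in for steps no one knows how to implement:

```python
# Pseudocode sketch of the evaluation procedure proposed above. None of these
# methods are implementable today; the names are placeholders for the steps
# described in the comment.

def evaluate_outcome(fai, population, endorsement_threshold=0.5):
    verdicts = []
    for person in population:
        judge = person
        while True:
            improved = fai.create_improved_version(judge)   # smarter, same values
            if improved is None:                             # FAI can't improve further
                break
            if not judge.endorses_after_living_with(improved):
                break                                        # judge rejects the "improvement"
            judge = improved                                 # James -> James+ -> James++ ...
        verdicts.append(judge.rates_world_as_near_best_possible(fai.world))
    # "if they mostly think the world is about the best it could do"
    return sum(verdicts) / len(verdicts) > endorsement_threshold
```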
Unfortunately our brains lack the capacity to think about superior intelligence. As I understood it, you want to describe particular examples of what lies between scenario 0 (human extinction) and scenario 1 (mutual cooperation and a new, better level of everything).
First, there are scenarios where the human race stands on the edge of extinction but somehow manages to fight back and survive; call that the Skynet scenario. Analogously, you can think of a scenario where the emergence of an FAI doesn't do any great harm, but also doesn't provide much new insight and never really gets far beyond human-level intelligence.
First, there are scenarios where the human race stands on the edge of extinction but somehow manages to fight back and survive; call that the Skynet scenario.
Skynet is not a realistic scenario.
After someone goes out and films Clippy: The Movie, will we be also prevented from using Clippy as shorthand for a specific hypothetical AI scenario?
Yeah, I was referring to that Terminator scenario only in the sense that the AI gets ultimate control and uses its power against humans but is defeated; it is not obligatory to include the time-travelling cyborgs.
How do you measure, besides your gut feeling, how realistic these kinds of scenarios are? There is no way to assign probabilities accurately; all we can and should do is imagine as many consequences as possible.
Ultimate control and getting defeated don't mesh well. In Hollywood there's a chance that an AGI that gets ultimate control is afterwards defeated. In the real world, not so much.
How do you measure, besides your gut feeling, how realistic these kinds of scenarios are?
Analysis in multiple different ways. Keeping up with the discourse.
There is no way to assign probabilities accurately
That's debatable. You can always use Bayesian reasoning, but it's not the main issue of this debate.
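A minimal example of the kind of Bayesian update meant here, with made-up numbers just to show the mechanics:

```python
# Revising the probability of a "Skynet-like" scenario after observing some
# evidence E. All numbers here are invented purely to show the arithmetic.

prior = 0.01                 # P(scenario)
p_e_given_s = 0.5            # P(E | scenario)
p_e_given_not_s = 0.05       # P(E | not scenario)

posterior = (p_e_given_s * prior) / (
    p_e_given_s * prior + p_e_given_not_s * (1 - prior)
)
print(posterior)             # ~0.092: the evidence shifts the estimate,
                             # without needing "accurate" probabilities up front
```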
Oh thanks, now I see it: these almost-there cases look somewhat Hollywoodish, as if the villain were obliged to deliver a lingering monologue before actually killing his victim, and thank God the hero appears at the last moment.
Okay, a Skynet that doesn't instantly get rid of humanity is improbable, if it is really superintelligent and if it has that goal.
We can easily imagine many kinds of possible catastrophes, but we are not equally good at producing heavenlike utopian visions; this is evidence only of our lack of imagination.
I think if there was an FAI, and it successfully built utopia, it would have to solve this problem of convincing us that it did (?using something like the Arthur/Merlin protocol?)
edit: I realize that there is a nearly limitless room for trickery here, I am just saying that if I had to define FAI success it would have to be in terms of some sort of convincing game with asymmetric computing power setup, how else would we even do it? Importantly, Merlin is allowed to lie.
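As a toy illustration of the asymmetric shape of such a convincing game (not the actual FAI-evaluation protocol, which no one knows how to write): Merlin is computationally unbounded and allowed to lie, Arthur only needs a cheap check, and a lying Merlin cannot make a false claim pass it.

```python
# Merlin claims to know a factorization of N (expensive to find);
# Arthur verifies it with a cheap multiplication (asymmetric computing power).

def arthur_verify(N, claimed_factors):
    product = 1
    for f in claimed_factors:
        if f <= 1:
            return False            # reject trivial "factors"
        product *= f
    return product == N             # cheap check of an expensive-to-find answer

print(arthur_verify(15, [3, 5]))    # True:  honest Merlin
print(arthur_verify(15, [2, 7]))    # False: lying Merlin gets caught
```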
I've been returning to my "reduced impact AI" approach, and am currently working on an idea.
What I need is some ideas on features that might distinguish between an excellent FAI outcome and a disaster. The more abstract and general the ideas, the better. Anyone got some suggestions? Don't worry about quality at this point; originality is more prized!