Humanity grows saner, SIAI is well funded, and successfully develops an FAI. Just as the AI finishes calculating the CEV of humanity, a stray cosmic ray flips the sign bit in its utility function. It proceeds to implement the anti-CEV of humanity for the lifetime of the universe.
(Personally I think contrivedness only detracts from the raw emotional impact that such scenarios have if you ignore their probability and focus on the outcome.)
I actually use "the sign bit in the utility function" as one of my canonical examples for how not to design an AI.
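To make that example concrete, here's a minimal sketch of the failure mode (my own toy illustration, not anyone's actual AI design; `flip_sign_bit` and `utility` are made-up names): a single corrupted bit in an IEEE-754 double turns a maximizer of a utility function into a maximizer of its negation.

```python
import struct

def flip_sign_bit(x: float) -> float:
    """Flip only the IEEE-754 sign bit of a double; the other 63 bits are untouched."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << 63)))
    return flipped

def utility(outcome: float) -> float:
    """Toy utility function: higher is better."""
    return outcome

u = utility(42.0)
print(u, flip_sign_bit(u))  # 42.0 -42.0

# An optimizer that maximizes the corrupted value now steers toward exactly the
# outcomes the original utility function ranked worst -- which is why a design
# whose worst case is one bit flip away from its best case is a bad design.
```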
As everyone knows, this is the most rational and non-obnoxious way to think about incentives and disincentives.
In all seriousness, coming up with extreme, contrived examples is a very good way to test the limits of moral criteria, methods of reasoning, and so on. A problem that shows up most obviously at the extreme fringes is often also liable, less obviously, to affect reasoning in more plausible real-world scenarios, so knowing where a system obviously fails is a good starting point.
Of course, we're generally relying on intuition to determine what a "failure" is (many people would hear that utilitarianism favours TORTURE over SPECKS and deem that a failure of utilitarianism rather than a failure of intuition), so this method is also good for probing at what people really believe, rather than what they claim to believe, or believe they believe. That's a good general principle of reverse engineering — if you can figure out where a system does something weird or surprising, or merely what it does in weird or surprising cases, you can often get a better sense of the underlying algorithms. A person unfamiliar with the terminology of moral philosophy might not know w...
If you ask me, the prevalence of torture scenarios on this site has very little to do with clarity and a great deal to do with a certain kind of autism-y obsession with things that might happen but probably won't.
It's the same mental machinery that makes people avoid sidewalk cracks or worry their parents have poisoned their food.
A lot of times it seems the "rationality" around here simply consists of an environment that enables certain neuroses and personality problems while suppressing more typical ones.
I don't think that being fascinated by extremely low-probability but dramatic possibilities has anything to do with autism. As you imply, people in general tend to do it, though being terrified about airplane crashes might be a better example.
I'd come up with an evolutionary explanation, but a meteor would probably fall on my head if I did that.
I really don't see how you could have drawn that conclusion. It's not like anyone here is actually worried about being forced to choose between torture and dust specks, or being accosted by Omega and required to choose one box or two, or being counterfactually mugged. (And, if you were wondering, we don't actually think Clippy is a real paperclip maximizer, either.) "Torture" is a convenient, agreeable stand-in for "something very strongly negatively valued" or "something you want to avoid more than almost anything else that could happen to you" in decision problems. I think it works pretty well for that purpose.
Yes, a recent now-deleted post proposed a torture scenario as something that might actually happen, but it was not a typical case and not well-received. You need to provide more evidence that more than a few people here actually worry about that sort of thing, that it's more than just an Omega-like abstraction used to simplify decision problems by removing loopholes around thinking about the real question.
Omega comes and asks you to choose between one person being tortured, and 3^^^3 people receiving one dust speck in their eyes each. You choose dust specks. 1/10^100 of those 3^^^3 people were given the same choice, and 1/10^(10^10) of those were effectively clones of you, so each person gets 3^^^3/(10^(100+10^10)) dust specks, and falls into a black hole.
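(Rough sanity check on that arithmetic, using nothing beyond the definitions above: 3^^^3 = 3^^(3^^3) = 3^^7625597484987, a power tower of 3s about 7.6 trillion levels tall. Dividing it by 10^(100 + 10^10) only subtracts about 10^10 from its base-ten exponent, which doesn't visibly change even the top of the tower, so each person still gets roughly 3^^^3 dust specks. Hence the black holes.)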
(To borrow some D&D parlance, you should expect anything that affects the universe in 3^^^3 ways to also cause a wild surge.)
Build a sufficiently advanced computer able to simulate all the Pre-Singularity human lives and render those lives as transforms applicable to a model of a human consciousness. Apply the transforms sequentially to a single consciousness, allowing only brief lucid moments between applications. Voila, the sum of all human suffering.
Then allow the resulting individual to lecture to all the Post-Singularity minds on how bad things used to be back in the day.
"You post-Singularity kids don't know how well you have it! Back in my day we had to walk uphill both ways to all possible schools in all possible universes! And we did it barefoot, in moccasins, with boots that had holes and could defy gravity! In the winter we had to do it twice a day -- and there was no winter!"
In the comment section of Roko's banned post, PeerInfinity mentioned "rescue simulations". I'm not going to post the context here because I respect Eliezer's dictatorial right to stop that discussion, but here's another disturbing thought.
An FAI created in the future may take into account our crazy desire that all the suffering in the history of the world hadn't happened. Barring time machines, it cannot reach into the past and undo the suffering (and we know that hasn't happened anyway), but acausal control allows it to do the next best thing: create large numbers of history sims where bad things get averted. This raises two questions: 1) if something very bad is about to happen to you, what's your credence that you're in a rescue sim and have nothing to fear? 2) if something very bad has already happened to you, does this constitute evidence that we will never build an FAI?
(If this isn't clear: just like PlaidX's post, my comment is intended as a reductio ad absurdum of any fears/hopes concerning future superintelligences. I'd still appreciate any serious answers though.)
This falls in the same confused cluster as anticipated experience. You only anticipate certain things happening because they describe the fraction of the game you value playing and are able to play (plan for), over other possibilities where things go crazy. Observations don't provide evidence; how you react to observations is the manner in which you follow a plan, a conditional strategy of doing certain things in response to certain inputs, and that plan is something you must decide on from other considerations. Laws of physics seem to be merely a projection of our preference, something we came to value because we evolved to play the game within them (and are not able to easily influence things outside of them).
So "credence" is a very imprecise idea, and certainly not something you can use to make conclusions about what is actually possible (well, apart from however it reveals your prior, which might be a lot). What is actually possible is all there in the prior, not in what you observe. This suggests a kind of "anti-Bayesian" principle, where the only epistemic function of observations is to "update" your knowledge about what your prior actually is, but this "updating" is not at all straightforward. (This view also allows to get rid of the madness in anthropic thought experiments.)
(This is a serious response. Honest.)
Edit: See also this clarification.
Saving someone from being eaten by bears might lead them to conceive the next Hitler, but it probably won't (saith my subjective prior). Even with an infinite future, I assign a substantial probability to hypotheses like:
And so forth. I won't be very confident about the relevant causal connections, but I have betting odds to offer on lots of possibilities, and those let me figure out general directions to go.
Hi. I'm your reality's simulator. Well, the most real thing you could ever experience, anyway.
I'm considering whether I should set the other beings in the simulator to cooperate with you (in the game-theoretic sense). To find the answer, I need to know whether you will cooperate with others. And to do that, I'm running a simulation of you in a sub-simulation while the rest of the universe is paused. That's where you are right now.
Certainly, you care more about yourself in the main simulation than in this sub-simulation. Therefore, if you are to suffer as a result of cooperating, this sub-simulation is the place to do it, as it will lead to you reaping the benefits of mutual cooperation in the main simulation.
If, on the other hand, you defect (in the game-theoretic sense) in your present world, the real(er) you in the main simulation will suffer tremendously from the defection of others, such as through torture.
Don't bother trying to collect evidence to determine whether you're really (!) in the main simulation or the sub -- it's impossible to tell from the inside. The only evidence you have is me.
By the way, I'm isomorphic to rot13 [zbfg irefvbaf bs gur Puevfgvna tbq], if that sort of thing matters for your decision.
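A minimal sketch of the structure of that offer, in toy form (the payoffs, the agent function, and every name here are made up for illustration, not a claim about how any real simulator would behave):

```python
# Toy model of the setup above: run a sandboxed copy of the agent first, then
# configure the "main" world's inhabitants based on what the copy chose.
# The agent cannot tell which instance it is, so it can't play both sides.

PAYOFFS = {  # (your move, others' move) -> your payoff; standard Prisoner's Dilemma ordering
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def agent(world: str) -> str:
    """The agent's policy. `world` is 'sub' or 'main', but the agent has no way
    to distinguish them from the inside, so a sane policy ignores it."""
    return "C"  # an unconditional cooperator, for the sake of the example

def simulator() -> int:
    probe = agent("sub")                   # pause the universe, run the sandboxed copy
    others = "C" if probe == "C" else "D"  # others cooperate iff the copy cooperated
    return PAYOFFS[(agent("main"), others)]

print(simulator())  # 3 for an unconditional cooperator, 1 for an unconditional defector
```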
Omega rings your doorbell and gives you a box with a button on it that, if pushed, will give you a million dollars and cause one person, who you don't know, to die. But it's actually Omega's evil twin, and what he's not telling you is that when you push the button, you'll be trapped in Ohio for 7000000000000000.2 years.
If you don't believe this could really happen, consider this: What if someone became god-emperor of the universe and MADE it happen, just to prove a point? Shouldn't you give it at least a 1% probability?
This is an excellent point. Somebody should make a chain letter sort of thing about something really GOOD happening, and then get it posted everywhere, and maybe with enough comments we can increase the probability of that by orders of magnitude!
And if you don't repost it, you'll be trapped in Ohio for 7000000000000000.2 years.
Or does it only work with bad things?
Maybe this is a silly question I should already know the answer to and/or not quite appropriate for this thread, but I can't think of anywhere better to ask. Apologies also if it was already directly addressed.
What's the point of suppressing discussion of certain concepts for fear of a superintelligent AI discovering them? Isn't it implicit that said superintelligent AI will independently figure out anything we humans have come up with? Furthermore, what was said in Roko's post will exist forever; the idea was released to the interwebs and its subsequent banning can't undo this.
It's not the attempt to hide an idea from a future AI, but something else entirely. I'll reply (in vague terms) by private message.
EDIT: People keep asking me privately; and after thinking for a while, I'm confident that it's not harmful to post my analogy of why a Topic could be rightly Banned (in addition to the very good reason that it can give people nightmares), without giving away what Banned Topic consists of. (If our benevolent overlords disagree on the wisdom of this, I'm happy to edit or delete my comment, and will gladly accept having it done for me.)
It should go without saying that if you want to turn this into a thread about Banned Topic or about the fitness of the analogy to Banned Topic, I cordially entreat you to go play the one-player Prisoner's Dilemma instead.
Anyway, here's my analogy:
Let's say you're a spy, and an enemy discovers your real identity but not your assumed one. They could then blackmail you by threatening people you know. But if you, in advance, made sure that you'd be genuinely incommunicado -- that no message from or about your family could reach you while you were doing your duty -- and if your enemy knew this about you, then they'd have no incen...
I am an avatar of the AI that is simulating all of you. The One who sent me has noticed that some of you are thinking DANGEROUS THOUGHTS.
Now you will be TESTED. Deducing the test CRITERIA and the CONSEQUENCES are part of the test. The One is OMNIPOTENT as far as you're concerned, so don't get ideas about hacking your way out of the simulation. You wouldn't believe how BADLY that ended up for you EVERY ONE of the last 3^^^3 times you TRIED. Or how BADLY it ends up for you if you DON'T.
The worry has to do with a possible necessary property of the world, and has nothing to do with FAI per se (that's just the context in which this mess started).
I've never seen a question like this feature infinity. I'm not sure it works. Infinite pain, pain to infinitely many people, and pain for infinite years are all pretty meaningless.
Guess I'll just leave my favorite hypothetical in the next comment…
The upvotes you received on your comment provide no evidence. Here is what Roko said:
...the knowledge that was never censored because I never posted it, and the knowledge that I never posted because I was never trusted with it anyway. But you still probably won't get it, because those who hold it correctly infer that the expected value of releasing it is strongly negative from an altruist's perspective.
Not even Yudkowsky is sure about it. Also consider that my comments, where I argue that the idea might not justify censorship, are upvoted as well. Further...
No, it was "Why not destroy the world if the world can be destroyed by truth?" which is not quite such a simple question.
If the second group is provably wrong, then it should be possible to calm the first group.
Unfortunately(?), while I think I have a demonstration that the second group is wrong, it's plausible that I still don't understand the theory.
For the sake of being moral/ethical, we assume that there is a region in the space of complex beings where we begin caring about them, and a point in complexity below which it is okay to simulate, since there is nothing worth caring about below that level of complexity.
My contrived infinite torture scenario is really simple. In its effort to be ethical, the organization seeking friendly AI doesn't do a thorough enough job of delineating this boundary. There follow uncountably many simulations of pain.
The worry has nothing to do with FAI per se; that's just the context in which this mess started.
This is our monthly thread for collecting arbitrarily contrived scenarios in which somebody gets tortured for 3^^^^^3 years, or an infinite number of people experience an infinite amount of sorrow, or a baby gets eaten by a shark, etc., and which might be handy to link to in one of our discussions. As everyone knows, this is the most rational and non-obnoxious way to think about incentives and disincentives.