The first time I read Torture vs. Dust Specks, about a year ago, I didn't read a single comment, because I assumed the article was making the point that simply multiplying can sometimes get you the wrong answer to a problem. I seem to have had a different "obvious answer" in mind.

And don't get me wrong: I generally agree with the idea that math can do better than moral intuition in deciding questions of ethics. Take this example from Eliezer’s post Circular Altruism, which made me realize that I had assumed wrong:

Suppose that a disease, or a monster, or a war, or something, is killing people. And suppose you only have enough resources to implement one of the following two options:
1. Save 400 lives, with certainty.
2. Save 500 lives, with 90% probability; save no lives, 10% probability.
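(A quick aside for concreteness: assuming every life counts equally and they simply add up, the expected-value arithmetic behind option 2 is just this short calculation.)

    # Expected lives saved under each option, assuming lives add linearly.
    certain_option = 400          # option 1: save 400 lives for sure
    gamble_option = 0.90 * 500    # option 2: 450 expected lives saved
    print(certain_option, gamble_option)   # 400 vs. 450.0 -- the gamble wins on expectation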

I agree completely that you pick number 2. For me that was just manifestly obvious: of course the math trumps the feeling that you shouldn't gamble with people’s lives…but then we get to torture vs. dust specks, and that just did not compute. So I've read most every argument I could find in favor of torture (there are a great many, and I might have missed something critical), but...while I totally understand the argument (I think), I'm still horrified that people would choose torture over dust specks.

I feel that the way math trumps intuition begins to fall apart when the problem compares trivial individual suffering with massive individual suffering, in a way very much analogous to the way in which Pascal’s Mugging stops working when you make the credibility really low but the threat really high. Like this. Except I find the answer to torture vs. dust specks to be much easier...

 

Let me give some examples to illustrate my point.

Can you imagine Harry killing Hermione because Voldemort threatened to plague all sentient life with one barely noticed dust speck each day for the rest of time? Can you imagine killing your own best friend/significant other/loved one to stop the powers of the Matrix from hitting 3^^^3 sentient beings with nearly inconsequential dust specks? Of course not. No. Snap decision.

Eliezer, would you seriously, given the choice by Alpha, the Alien superintelligence that always carries out its threats, give up all your work, and horribly torture some innocent person, all day for fifty years in the face of the threat of 3^^^3 insignificant dust specks barely inconveniencing sentient beings? Or be tortured for fifty years to avoid the dust specks?

I realize that this is much more personally specific than the original question, but it is someone's loved one, someone's life. And if you wouldn't make the sacrifice, what right do you have to say someone else should make it? I feel as though if you want to argue that torture for fifty years is better than 3^^^3 barely noticeable inconveniences, you had better be willing to make that sacrifice yourself.

And I can’t conceive of anyone actually sacrificing their life, or themselves, to save the world from dust specks. Maybe I'm committing the typical mind fallacy in believing that no one is that ridiculously altruistic, but does anyone want an Artificial Intelligence that will potentially sacrifice them if it will deal with the universe’s dust speck problem, or some equally widespread and trivial equivalent? I most certainly object to the creation of that AI. An AI that sacrifices me to save two others - I wouldn't like that, certainly, but I still think the AI should probably do it if it thinks their lives are of more value. But dust specks, on the other hand....

This example made me immediately think that some sort of rule is needed to limit morality coming from math in the development of any AI program. When the problem involves a sufficiently low level of suffering multiplied by an unreasonably large number, it needs to take some kind of huge penalty, because otherwise an AI would find it vastly preferable that the whole of Earth be blown up rather than that 3^^^3 people suffer a mild slap to the face.

And really, I don’t think we want to create an Artificial Intelligence that would do that.

I’m mainly just concerned that some factor be incorporated into the design of any Artificial Intelligence that prevents it from murdering me and others for trivial but widespread causes. Because that just sounds like the plot of a sci-fi book about how superintelligence could go horribly wrong.


would you seriously, given the choice by Alpha, the Alien superintelligence that always carries out its threats, give up all your work, and horribly torture some innocent person, all day for fifty years in the face of the threat of 3^^^3 insignificant dust specks barely inconveniencing sentient beings? Or be tortured for fifty years to avoid the dust specks?

Likewise, if you were faced with your Option 1: Save 400 Lives or Option 2: Save 500 Lives with 90% probability, would you seriously take option 2 if your loved ones were included in the 400? I wouldn't. Faced with statistical people I'd take option 2 every time. But make Option 1 "save 3 lives, and those three lives are your kids" and Option 2 "save 500 statistical lives with 90% probability," and I don't think I'd hesitate to pick my kids.

In some sense, I'm already doing that. For the cost of raising three kids, I could have saved something like 250 statistical lives. So I don't know that our unwillingness to torture a loved one is a good argument against the math of the dust specks.

I was about to post something similar, although I don't have kids myself.

In some sense, I'm already doing that. For the cost of raising three kids, I could have saved something like 250 statistical lives. So I don't know that our unwillingness to torture a loved one is a good argument against the math of the dust specks.

Huh. That's ... a pretty compelling argument against having kids, actually.

Hmm, the microeconomics of outsourcing child production to countries with cheaper human-manufacturing costs... then we import them once they're university aged? You know you've got a good econ paper going when it could also be part of a dystopia novel plot.

I'm going to make the point I always do in this discussion: I think "dust specks" is a very badly chosen example. Do we really have to make this about literally the least possible discomfort someone can suffer? I think the "least possible" aspect is preventing people from actually multiplying; we are notoriously bad with small numbers, and I think most people are rounding off "dust speck" to "exactly zero".

I propose that the experiment should instead be named "broken fingers versus torture". Would you prefer 3^^^^3 broken thumbs - someone takes a hammer, forcibly puts the thumb on a hard surface, and whacks it quite hard - over fifty years of torture? This, to my mind, at least makes it a question that humans are capable of grasping; a broken thumb does not round off to zero. Or if you feel the thumbs are too much, then at least make it something that's not literally the least possible increment of discomfort; something that doesn't get rounded to nothing. The "least possible" is the least important part of the thought experiment, yet it's the one that causes all the discomfort and angst, through a bog-standard rounding error in human brains. This really ought to be avoidable.

I propose paper cut. Then the number can be much smaller than 3^^^3 and still make people pause.

Violating a coherence theorem always carries with it an appropriate penalty of incoherence. What is your reply to the obvious argument from circular preference?

because otherwise an AI would find it vastly preferable that the whole of Earth be blown up rather than that 3^^^3 people suffer a mild slap to the face.

It would be utterly disastrous to create an AI which would allow someone to be slapped in the face to avoid a 1/3↑↑↑3 probability of destroying the Earth.

Suppose that I would tentatively choose to torture one person to save a googolplex people from dust specks, and that additionally I would choose torture to save only a googol people from a papercut. Do I have circular preferences if I would be much, much more willing to save a googolplex people from dust specks by giving paper cuts to googol people than to save either group from specks or paper cuts by torturing one person?

I can achieve the exact same total utility by giving specks to googolplex people, giving papercuts to a googol people, or torturing one person. If I had to save 3^^^3 people from dust specks I'd give 3^^^3*googol/googolplex people paper cuts instead of torturing anyone. I'd much prefer saving 3^^^3 people from dust specks by subjecting perhaps 2^^^2 people to a relatively troublesome dust speck. So why exactly do I prefer troublesome dust specks over papercuts over torture even if utility is maximized either way? I think that I'm probably doing utilitarianism as more of a maximin calculation; maximizing the minimum individual utility function in some way. I can't maximize total utility in the cases where additional utility for some people must be bought at the cost of negative utility for others; it requires more of a fair exchange between individuals in order to increase total utility.
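A toy sketch of the contrast I'm gesturing at, total utility versus maximin (the per-person numbers are invented, chosen so the totals come out equal):

    # "Sum of disutility" vs. "maximin" aggregation, with made-up magnitudes.
    # Each entry is (disutility per person, number of people affected).
    scenarios = {
        "dust specks": (1, 10**9),
        "paper cuts": (1_000, 10**6),
        "torture": (10**9, 1),
    }

    for name, (per_person, count) in scenarios.items():
        total = per_person * count   # total-utilitarian score: equal in all three rows
        worst = per_person           # maximin score: how badly off is the worst-off person?
        print(f"{name:11s} total={total:.1e} worst-off={worst:.1e}")

A total utilitarian is indifferent between the three rows; a maximin rule ranks the torture row as far and away the worst, which is the behavior I seem to actually endorse.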

2^^^2 is 4, so I'd choose that in a heartbeat. 2^^^3 is the kind of number you were probably thinking about. Though, if we're choosing fair-sounding situations, I'd like to cut one of my fingernails too short to generate a MJ/K of negentropy.

I've got one way of thinking this problem through that seems to fit with what you're saying – though of course, it has its own flaws: represent each person's utility (is that the right word in this case?) such that 0 is the maximum possible utility they can have, then map each individual's utility with x ⟼ -(e^(-x)), so that lots of harm to one person is weighted higher than tiny harms to many people. This is almost certainly a case of forcing the model to say what we want it to say.
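A quick numerical check of what that transform does (my own toy numbers):

    import math

    # Utility x <= 0 (0 = best possible), mapped to -(e^(-x)) as described above.
    def weighted(x):
        return -math.exp(-x)

    # Extra "badness" of a harm h, relative to the unharmed baseline x = 0:
    def badness(h):
        return weighted(0) - weighted(-h)   # equals e**h - 1

    # The same total raw harm (10 units), distributed two different ways:
    concentrated = badness(10)              # one person bears all of it: ~22025
    spread = 10**6 * badness(10 / 10**6)    # a million people bear 1e-5 each: ~10.00005
    print(concentrated, spread)

Under the mapping, the concentrated harm counts for roughly two thousand times more than the same quantity of harm spread thinly, which is the intended effect (and also why it's fair to worry the model is being forced to say what we want it to say).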


Er... wouldn't it be vastly preferable for the AI to /not/ slap people in the face to avoid 1/3↑↑↑3 probability events, for non-multiplication reasons? Building an AGI that acts on 1/3↑↑↑3 probabilities is to make a god that, to outsiders, comes across as both arbitrarily capricious and overwhelmingly interventionist. Even if the AGI's end result as a net effect on its well-defined utility function is positive, I'd wager modern humans or even extrapolated transhumans wouldn't like being slapped in the face lest they consider running the next-next-gen version of the LHC in their AGI's favorite universe. You don't even need Knuth notation for the problem to arise: a 1/10^100^100 or even 1/10^100 event quickly gets to the point.

Even from a practical viewpoint, that seems incredibly prone to oscillation. There's a reason we don't set air conditioners to keep local temperatures to five nines, and 1/3↑↑↑3 is, to fall to understatement, a lot of sensitivity past that.

This is incoherent because qualitative boundaries are naturally incoherent -- why is one 2/10^100 risk worth processing power, but six separate 1/10^100 risks not worth processing, to give the blunt version of the sorites paradox? That's a major failing from a philosophical standpoint, where incoherence is functionally incorrect. But AGI aren't pure philosophy: there are strong secondary benefits to an underinterventionist AGI, and the human neurological bias trends toward underintervention.

Of course, in an AGI situation you have to actually program it, and actually defining slapping one person 50 times versus slapping 100 people once each is programmatically difficult enough.

Edit: never mind; retracting this as off topic and misunderstanding the question.

[This comment is no longer endorsed by its author]

Moved to Discussion, fixed formatting.

This seems to me to be the kind of thing that would belong in discussion, not main.

Beyond that, I'm not sure what you're getting at. Do you think math trumps moral intuition or not? If you think math trumps moral intuitions in some cases but not others, how do you formalize that?

Also, I think "would you really do X? Really?" is a bad argument. People generally suffer from akrasia and it's well known that utilitarianism doesn't accord perfectly with people's evolved moral beliefs and motivations.

Thirdly, there may be practical considerations at play, like the ends not justifying the means (among humans) or morality being short-sighted.

I just want to say thanks to everyone for your comments. I now realize the obvious flaw of incorporating any extremely personal connection into a mathematical morality calculation, because, as BlueSun pointed out, that causes problems at whatever scale of pain is involved:

if you were faced with your Option 1: Save 400 Lives or Option 2: Save 500 Lives with 90% probability, would you seriously take option 2 if your loved ones were included in the 400? I wouldn't. Faced with statistical people I'd take option 2 every time. But make Option 1 "save 3 lives, and those three lives are your kids" and Option 2 "save 500 statistical lives with 90% probability," and I don't think I'd hesitate to pick my kids.

I also learned not to grandstand on morality questions. Sorry about the "would you do it? really?" argument; I won't do that again.

However, I still fall on the side of the dust specks after rethinking the issue, but due to the reasoning that the 3^^^3 individuals would probably be willing to suffer the dust specks to save someone from torture, while the tortured person wouldn't likely be willing to be tortured to save others from dust specks.

I'm somewhat sympathetic to your position, but I'm curious:

1) Which side do you think you'd come down on if it were (3^^^3 / 1 billion) dust specks vs (50 / 1 billion) years (= 1.6 seconds) of torture?

2) How about the same (3^^^3 / 1 billion) dust specks and (50 / 1 billion) years of torture but the dust specks were divided among (3^^^3 / (billion^2)) people so that each received 1 billion dust specks?

EDIT: I think these questions weren't very clear about what I was getting at. Eliezer's argument from Circular Altruism is along the lines of what I was going for, but much more well developed:

But let me ask you this. Suppose you had to choose between one person being tortured for 50 years, and a googol people being tortured for 49 years, 364 days, 23 hours, 59 minutes and 59 seconds. You would choose one person being tortured for 50 years, I do presume; otherwise I give up on you.

And similarly, if you had to choose between a googol people tortured for 49.9999999 years, and a googol-squared people being tortured for 49.9999998 years, you would pick the former.

A googolplex is ten to the googolth power. That's a googol/100 factors of a googol. So we can keep doing this, gradually - very gradually - diminishing the degree of discomfort, and multiplying by a factor of a googol each time, until we choose between a googolplex people getting a dust speck in their eye, and a googolplex/googol people getting two dust specks in their eye.

If you find your preferences are circular here, that makes rather a mockery of moral grandstanding.

Well, torture is highly nonlinear, so utility((50 years / billion) of torture) is much milder than utility(50 years of torture)/billion.
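A toy curve to illustrate (the quadratic is arbitrary; the claim only needs "superlinear"):

    # Toy convex disutility of torture as a function of duration in years.
    def disutility(years):
        return years ** 2

    T, N = 50, 10**9
    print(disutility(T / N))   # ~2.5e-15: the 1.6-second dose is almost nothing
    print(disutility(T) / N)   # 2.5e-06: a billionth of the 50-year total is vastly larger

With any convex curve, utility((50 years / billion) of torture) comes out far milder than utility(50 years of torture)/billion, which is all the nonlinearity claim needs.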

As for #2, you're leaving the LCPW for the original problem. Dustspecks are also nonlinear.

nonlinear

Hmm, going back and reading Circular Altruism, Eliezer's argument really seems to be predicated on linearity, doesn't it?

EDIT: Oops, read it again. You guys are right, it's not.

The main argument is predicated on linearity of probability. Probability is linear. I was pointing out the way the suggested comparison does not satisfy this.

It's not.

I am basing my reasoning on the probable preferences of those involved, so my answer would depend on the feelings of the people to being dust specked/tortured.

I'm not entirely clear what exactly you are asking with number 1: are you just asking 1.6 seconds of torture vs. 3^^^3 / 1 billion dust specks? If so, I'm essentially indifferent, it seems like both are fairly inconsequential as long as the torture only causes pain for the 1.6 seconds.

For number 2, a billion dust specks would probably get to be fairly noticeable in succession, so I'd prefer to get 1.6 seconds of torture over with, because that isn't really enough time for it actually to really be torturous (depending on what exactly that torture was) rather than deal with a constant annoyance.

This recent comment of mine seems relevant here.

The world, it is made of atoms and such. Quarks. Whatever is the lowest level does not matter. Not of people, not of happiness, and not of dust specks. The hypothetical utility function that corresponds to people's well being has to take at its input the data about contents of a region of space (including laws of physics), and process that, and it has to somehow identify happiness and suffering happening in that whole region.

This hypothetical function does not have a property that well being of a region of space is equal to the sum of well being of its parts considered individually, for the obvious reasons that this won't work for very tiny parts, or because you value your head as it is more than you value your head diced into 27 cubes and rearranged randomly like a Rubik's cube - yet you can't tell the difference if you consider those cubes individually.

The dustspecks vs torture reasoning had confused f(dustspeck, dustspeck, dustspeck... n times) with n*f(dustspeck), which would only be valid if the above-mentioned property held.

There's nothing inconsistent about having a utility function which returns the worst suffering it can find inside the region of space; it merely doesn't approximate human morality very well - and neither does a utility function that merely sums well being of people.

But there are other matters that you would regard as quantitatively comparable, yes? In which you are willing to say that one A is worse than one B, but enough Bs can add up to something worse than an A?

There are three counterarguments to "torture over specks" that I take seriously.

1) The harm coming from a speck in the eye is on a different plane to that coming from torture, so simply no amount of specks ever adds up to something as bad as torture.

2) The utilitarian calculation is overridden by the fact that the 3^^^3 people (if they were being moral about it) would not want some other person to undergo the torture, in order to avoid the specks.

3) Altruism is a mistake; you shouldn't choose 50 years of torture for yourself, for any reason.

ETA btw I'm not asserting that any of these counterarguments is correct; just that they might be.

A variant on 1: fifty one years after getting the dust speck, the victims will be as they would have been without it. It gets flat-out forgotten. Probably never even makes it into long-term memory. The physical damage is completely repaired. But the torture victim will likely be mentally broken. If you think of consequences when it's over, specks are preferable. So there is a specific qualitative difference between specks and torture, which supports our intuition to round small things to zero.

Okay, let's account for that and, uhm, double the amount of dust specks. Or just rule out indirect effects by having an untortured "back-up" upload of the victim that you use to repair whatever damage happened, including traumatic memories. The whole point of thought experiments like this is to get you thinking about just the variables that are mentioned to figure out what principles you consider to be relevant. If we find further arguments to support a conclusion, we are not playing properly.

In that case, it depends on if the torture victim would live until the singularity/cryonics, or if they would die of old age and then that trauma would be gone as well.

The problem I have, intuitively, with the torture option is that "one person tortured for 50 years" is not the only consequence. Other consequences include "now we're in a society where I don't have a right not to be tortured." Sure, in theory torture will only happen when it's globally utility-maximizing. Just like in theory my emails will only be read by the government if there's reason to suspect they'll reveal a crime or that someone involved in the correspondence is not a US citizen. I have no reason to believe this kind of promise.

Yes, I know this is analogous to "But how do I know the fat man will really stop the trolley?" And that's why I didn't comment on the main post.

Other consequences include "now we're in a society where I don't have a right not to be tortured."

Suppose the torture is done completely secretly, such that only on the order of 1 person knows about it. Then what you know about your rights is unchanged. And I would submit what your actual rights are is also unchanged: you are already subject to torture by kidnappers, the insane, governments that don't in fact prohibit torture, and even by government agents acting perhaps illegally in your opinion but not in someone else's opinion.

So at least that consequence, that now everybody has to worry a little more about being tortured, I think is not a necessary problem here.

Yes, you can make assumptions that get around this. You then have to assume a lot of things about the competence of everyone involved, etc. You could also assume you have read certain Nazi research papers that prove that a person of this size is guaranteed to safely stop a trolley of that size on such-and-such slope and terrain. My point here is mainly to grump about thought experiments that demand I turn off all the doubt and paranoia etc. that are key to making reasonable decisions in the real world.

My true rejection of torture over dust specks comes from not being a utilitarian at all (and therefore doesn't really belong in this discussion).

A thought experiment is a lot like a regular experiment, in that you need to isolate the variables being tested if you actually care about answering the question. All of those other factors (e.g. societal implications of torture, the physics which might allow you to inconvenience 3^^^3 people, the effects of mass eye irritation on the annual GDP of Peru) are irrelevant to the issue: is an incomprehensibly huge amount of harm which is very widely distributed preferable to a large amount of harm inflicted on one person? And, in either case, why?

In other words, what would your answer be in the Least Convenient Possible World?

Or you can rephrase the thought experiment so that it's about saving a person from torture vs saving people from dustspecks (rather than torturing people), and thus avoid right-based objections.

A more technical way of asking the basic question here is:

Assume that the "badness" of single harms and of aggregations of harms can be expressed in real numbers.

As the number of times a minor harm happens approaches infinity, does the limit of its "badness" also approach infinity? If it does, then there must be some number of paper cuts that is worse than 50 years of torture. If it doesn't, then you have another bullet to bite: there's some level of harm at which you have to draw an infinitely thin bright line between it and the "next worst" harm so that there's a finite amount of the slightly worse harm you'd prefer to an infinite amount of the slightly less bad harm.
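To make the two horns concrete (the numbers are invented):

    TORTURE = 10**15   # "badness" of 50 years of torture, in arbitrary units
    CUT = 1.0          # "badness" of a single paper cut

    # Horn 1: badness adds linearly, so it grows without bound --
    # some finite number of cuts eventually exceeds the torture.
    def linear_badness(n):
        return CUT * n

    print(linear_badness(10**16) > TORTURE)    # True

    # Horn 2: badness of n cuts is bounded, asymptoting below the torture --
    # no number of cuts ever gets there, so a bright line appears somewhere.
    def bounded_badness(n):
        return 10**12 * (1 - (1 - 1e-12) ** n)

    print(bounded_badness(10**16) > TORTURE)   # False, and stays False for every n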

The bullet I prefer to bite is that harm can't be expressed as a single real number. Treating utility as a real thing rather than a convenient simplification of an intractably complex phenomenon makes people go funny in the head.

Seriously, I'm starting to feel the same way about the word "utility" as I do about the word "sin" - that is to say, an observed regularity gets promoted to an ontologically fundamental object, and humans spend lifetimes trying to reason their way around the ensuing ontological crisis.

I'm not sure that the "real number" assumption is key here. Just imagine an arbitrarily finely graded sequence that looks like "dust speck, paper cut, bruise, broken arm, fifty years of torture". If you claim that no number of dust specks is worse than the 50 years of torture, then one of the following also has to be true:

  1. No number of dust specks is worse than a paper cut.

  2. No number of paper cuts is worse than a bruise.

  3. No number of bruises is worse than a broken arm.

  4. No number of broken arms is worse than 50 years of torture.

So far, because the sequence is fairly short, several of these still look fairly plausible. But if you make the sequence longer, then in my opinion all your options begin to look ridiculous.
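Spelling out the chaining step with invented exchange rates:

    # Suppose that, for each adjacent pair in the chain, SOME finite multiple of the
    # lesser harm is worse than one instance of the greater harm (the multipliers
    # below are made up).  Transitivity then multiplies them together.
    steps = ["paper cut", "bruise", "broken arm", "50 years of torture"]
    multipliers = [10**6, 10**6, 10**6, 10**9]   # specks per cut, cuts per bruise, ...

    n = 1
    for harm, m in zip(steps, multipliers):
        n *= m
        print(f"{n:.0e} dust specks are (transitively) worse than one {harm}")
    # Denying the last line means denying that one of the four multipliers exists at all.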

Given that nerve impulses are almost digital and that dust specks probably only activate the touch sense while paper cuts directly activate the pain sense, I'd say that practically humans do divide dust specks into a fundamentally different category than paper cuts. No matter how often I occasionally got a dust speck in my eye it would never feel as painful as a paper cut. On reflection, I might realize that I had spent a lot more time being annoyed by dust specks than paper cuts and make some sort of utilitarian deal regarding wasted time, but there is still some threshold at which the annoyance of a dust speck simply never registers in my brain the same way that a paper cut does. It physically can't register the same way.

My brain basically makes this distinction for me automatically; I wear clothes, and that should register like a whole lot of dust-speck-equivalents touching my skin all the time and I should prefer some lottery where I win papercuts instead of feeling my clothes on my skin. Instead, my brain completely filters out the minor discomfort of wearing clothes. I can't filter out paper cuts, broken arms, or torture.

I understand that "dust specks" is really meant as a stand-in for "the least amount of dis-utility that you can detect and care about", so it may just be that "dust specks" was slightly too small an amount of dis-utility for a lot of people and it created the counter-intuitive feelings. I would never subject one person to a speck of dust if by doing so I could save 3^^^3 people from being hit twice as hard by a stray air molecule, for instance. I don't know how I feel about saving 3^^^3 people from papercuts by torturing someone. It still feels intuitively wrong.

I think that being a finite creature, you have a finite amount of value to assign, and that finite value is assigned in concentric circles about you and your life. The value assigned to the outer circle gets split between the gazillions in the outer circle. For the most part, more people just means each one gets less.

Does anyone feel strongly that one choice is better than the other if the average number of deaths is the same?

(1) 400 out of 500 people certainly die

(2) All 500 live with 20% probability, all die with 80% probability.


Just because the expected values are equal doesn't mean the expected utilities are equal. For example, if you are choosing between a 100% chance of $400 and an 80% chance of $500 but you need $500 to make the rent and not get evicted, it would be stupid not to try for the additional $100 since it has a disproportionately high value in that particular context. That is, while the expected values of the choices are identical, the expected utilities are not. Conversely, if you have zero money, the expected utility of a sure $400 far outweighs risking it all for an additional $100.

Basically you need an external context and a value system that assigns utilities to the possible outcomes to discriminate between the two. And if the expected utilities come back equal then you have no preference.
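The rent example in numbers (a rough sketch with made-up utility functions):

    # Equal expected *values*, different expected *utilities* depending on context.
    def utility_rent_due(money):     # you need $500 to make rent; less is catastrophic
        return 1.0 if money >= 500 else 0.0

    def utility_flat_broke(money):   # diminishing returns: the first dollars matter most
        return min(money, 400) / 400.0

    sure_thing = [(1.0, 400)]          # $400 with certainty
    gamble = [(0.8, 500), (0.2, 0)]    # 80% chance of $500, otherwise nothing

    for label, u in [("rent due", utility_rent_due), ("flat broke", utility_flat_broke)]:
        eu_sure = sum(p * u(x) for p, x in sure_thing)
        eu_gamble = sum(p * u(x) for p, x in gamble)
        print(label, eu_sure, eu_gamble)
    # The expected dollar value is 400 either way; which option has the higher
    # expected utility flips with the context, exactly as described above.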

All types of normative ethics run into trouble when taken to extremes, and utilitarianism is no exception. Whether Eliezer is right that a super-smart intelligence can come up with some new CEV-based ethics that never leads to any sort of repugnant conclusion, or at least minimizes its repugnance according to some acceptable criterion, is an open question.

From my experience dealing with ideas from people who are much smarter than me, I suspect that the issues like dust specks vs torture or the trolley problem result from our constrained and unimaginative thinking and would never arise or even make sense in an ethical framework constructed by an FAI, assuming FAI is possible.

I wonder if there is any right answer. We are talking about moral intuitions, feelings, which come from some presumably algorithmic part of our brain which seems it can be influenced, at least somewhat changed, by our rational minds. When we ask the question of which is worse, 3^^^3 dust specks or 50 years of torture, are we asking what our feeling would be when confronted with a real world in which somehow we could see this happening? As if we could ever see 3^^^3 of anything? Because that seems to be the closest I can get to defining right and wrong.

Yes, the brain is a great tool for running hypotheticals to predict how the real thing might go. So we are running hypotheticals of 3^^^3 dust specks and hypotheticals of 50 years of torture, and we are trying to see how we feel about them?

I think it is not that hard to "break" the built-in machinery for determining how one feels morally about various events. The thing was designed for real world circumstances and honed by evolution to produce behavior that would cause us to cooperate in a way that would ensure our survival, enhance our productivity and reproductivity. When we get to corner cases that clearly the original moral systems could not have been designed for, because NOTHING like them ever happened in our evolution, what can we say about our moral intuition? Even if it would make sense to say there is one right answer as to how our moral intuition "should" work in this corner case, why would we put any stock behind it?

I think in real life, the economic forces of a gigantic civilization of humans would happily torture a small number of individuals for 50 years if it resulted in the avoidance of 3^^^3 dust specks. We build bridges and buildings where part of the cost is a pretty predictable loss of human life, and a predictable amount of that being painful and a predictable amount of crippling, from construction accidents. We drive and fly and expose ourselves to carcinogens. We expose others to carcinogens.

In some important sense, when I drive a car that costs $10,000 instead of one that costs $8000, but I send the extra $2000 to Africa to relieve famine, I am trading some number of African starvations for leather seats and a nice stereo.

What is the intuition I am supposed to be pumping with dust specks? I think it might be that how you feel morally about things that are way beyond the edges of the environment in which your moral intuitions evolved are not meaningfully either moral or immoral.

Givewell does not think you can save one African from starving with $2000. You might be able to save one child from dying of malaria via insecticide-treated mosquito bednets. But this of course will not be the optimal use of $2K even on conventional targets of altruism; well-targeted science research should beat that (where did mosquito nets come from?).

... are we asking what our feeling would be when confronted with a real world in which somehow we could see this happening? As if we could ever see 3^^^3 of anything? Because that seems to be the closest I can get to defining right and wrong.

As a simple improvement, you could approximate the (normatively) "correct answer" as the decision reached as a result of spending a billion years developing moral theory, formal decision-making algorithms and social institutions supporting the research process, in the hypothetical where nothing goes wrong during this project. (Then, whatever you should actually decide in the hypothetical where you are confronted with the problem and won't spend a billion years might be formalized as an attempt to approximate that (explicitly inaccessible) approximation, and would itself be expected to be a worse approximation.)

As a simple improvement, you could approximate the (normatively) "correct answer" as the decision reached as a result of spending a billion years developing moral theory, formal decision-making algorithms and social institutions supporting the research process, in the hypothetical where nothing goes wrong during this project.

Presumably this requires that a billion years developing theory with nothing going wrong during the development would need to converge on a single answer. That is, a million identical starting points like our world would need to end up after that billion years within a very small volume of morality-space, in order for this process to be meaningful.

Considering that the 4 billion years (or so) spent by evolution on this project have produced quite divergent moralities across not only different species, but with quite a bit of variation even within our own species, that convergence seems exceedingly unlikely to me. Do you have any reason to think that there is "an answer" to what would happen after a billion years of good progress?

Supposing another billion years of evolution and natural selection. If the human species splits into multiple species or social intelligences arise among other species, one would (or at least I would, in the absence of a reason not to) expect each species to have a different morality. Further, since it has been evolution that has gotten us to the morality we have, presumably another billion years of evolution would have to be considered as a "nothing going wrong" method. Perhaps some of these evolved systems will address the 3^^^3 dust specks vs torture question better than our current system does, but is there any reason to believe that all of them will get the same answer? Or I should say, any rational reason, as faith in a higher morality that we are a pale echo of would constitute a reason but not a rational one?

That is, a million identical starting points like our world would need to end up after that billion years within a very small volume of morality-space, in order for this process to be meaningful.

(The hypothetical I posited doesn't start with our world, but with an artificial isolated/abstract "research project" whose only goal is to answer that one question.) In any case, determinism is not particularly important as long as the expected quality across the various possible outcomes is high. For example, for a goal of producing a good movie to be meaningful, it is not necessary to demand that only very similar movies can be produced.

Considering that the 4 billion years (or so) spent by evolution

Evolution is irrelevant, not analogous to intelligent minds purposefully designing things.

presumably another billion years of evolution would have to be considered as a "nothing going wrong" method

No, see "value drift", fragility of value.

Nitpicking note: you should fix your formatting; the text changes size/font at various points (without any relation to the content) and it's painful to read.


OK.

Given two people A and B, and given a choice between A breaking ten bones and B breaking nine bones, you prefer A breaking ten bones if you're A (because you have empathy).
Yes?

Similarly, presumably if you're B, you prefer that B break nine bones, again because you have empathy.
Yes?

And in both of those cases you consider that the morally correct choice... an AI that arrives at a different answer is "unfriendly"/immoral.
Yes?

So, OK.
Some additional questions.

  1. If you are neither A nor B and are given that choice, which do you prefer? Which do you consider the morally correct choice?

  2. If you are A, and you have a choice between A breaking ten bones, B breaking nine bones, and letting B make the choice instead, which do you prefer? Which do you consider the morally correct choice? Does it affect your choice if you know that B shares your preferences and has empathy, and will therefore predictably choose ten broken bones for themselves?

Can you imagine Harry killing Hermione because Voldemort threatened to plague all sentient life with one barely noticed dust speck each day for the rest of time?

I think I see here what's wrong with your thinking. "all sentient life", which is what would be plagued with dust specks in this example, is incredibly, inconceivably, astronomically less than 3^^^3. Actually, it's even worse than that. It's incredibly, inconceivably, astronomically less than a googolplex (10^(10^100)). And that is incredibly, inconceivably, astronomically less than 3^^^3. Actually, no. You can add a vast number of levels before the difference between one and the one a step higher up can be accurately described as "incredibly, inconceivably, astronomically less".

From what you have written, it seems you have not internalised this. We are not talking about a trillion dust specks, a quadrillion dust specks, or a quintillion dust specks. There is a difference. A large difference.
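For anyone who wants the notation unpacked, here's Knuth's up-arrow operator as a sketch (don't actually ask it for 3^^^3; the whole point is that the answer doesn't fit in this universe):

    # Knuth up-arrow: one arrow is exponentiation; each extra arrow iterates
    # the operator below it.
    def arrow(a, n, b):
        if n == 1:
            return a ** b
        if b == 0:
            return 1
        return arrow(a, n - 1, arrow(a, n, b - 1))

    print(arrow(3, 1, 3))   # 3^3  = 27
    print(arrow(3, 2, 3))   # 3^^3 = 3^(3^3) = 7625597484987
    # 3^^^3 = 3^^7625597484987: a power tower of 3s about 7.6 trillion levels
    # high, which already dwarfs a googolplex beyond comprehension.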

I'd assume that 3^^^3 people would prefer to have a barely noticeable dust speck in their eye momentarily over seeing me get tortured for 50 years & thus I'd choose the dust specks. If I'm wrong, that is fine, any of those 3^^^3 people can take me to court & sue me for "damages", of which there were none. Maybe appropriate reimbursement would be something like 1/3^^^3 of a cent per person?

Your specific examples of Eliezer, me, or my best (and N best, until 4) friend(s) are contributing to the friendly AI problem, which has a way, way bigger average number of affected beings than a mere 3^^^3. If you somehow remove those considerations (maybe by the entity making the offer increasing someone else's ability to contribute to an equal degree), or instead mention any family member or fifth+ best friend, then my obvious snap judgement is of course I'd choose the torture/death, without hesitation, and am confused and disturbed by how anyone could answer otherwise.

I'm planning on possibly doing much worse things to myself if it becomes necessary, which it might. Or to other people, if I can figure out Eliezer's ethical injunction stuff well enough to safely make exceptions and that information tells me to.


the friendly AI problem, which has a way, way bigger average number of affected beings than a mere 3^^^3.

?????????

Well, assuming they exist, in order to be dustspecked ...

Might wanna do a SAN check there - you just claimed that in real life, FAI would affect a way way bigger number than 3^^^3. If that wasn't a typo, then you don't seem to know what that number is.

[This comment is no longer endorsed by its author]

I agree with you a lot, but would still like to raise a counterpoint. To illustrate the problem with mathematical calculations involving truly big numbers, though: what would you regard as the probability that some contortion of this universe's laws allows for literally infinite computation? I don't give it a particularly high probability at all, but I couldn't in any honesty assign it one anywhere near 1/3^^^3. The naive expected number of minds FAI affects (effects?) doesn't even converge in that case, which at least for me is a little problematic.

Yes, if he had said "I think there is a small-but-reasonable probability that FAI could affect way way more than 3^^^3 people", I wouldn't have had a problem with that (modulo certain things about how big that probability is).

Well, small-but-reasonable times infinite equals infinite. Which is indeed way, way bigger than 3^^^3.

That's what I DID say, "average", and my reasoning is roughly the same as wanderingsoul's, except I don't consider it to be any kind of problem. The omega point, triggered inflation creating child universes, many other things we haven't even thought about... I'd estimate the probability that FAI will find practically infinite computational power at around 10% or so.

And yeah, if I had chosen the wording myself I'd probably have chosen something a bit more humble that I actually can comprehend, like a googolplex, but 3^^^3 is the standard "incomprehensibly large number" used here, and I'm just using it to mean "would be infinite if we could assume transfinite induction".

Ack! Sorry, I must have missed the 'average'. Retracted.


Can you imagine Harry killing Hermione because Voldemort threatened to plague all sentient life with one barely noticed dust speck each day for the rest of time? Can you imagine killing your own best friend/significant other/loved one to stop the powers of the Matrix from hitting 3^^^3 sentient beings with nearly inconsequential dust specks? Of course not. No. Snap decision.

My breaking point would be about 10 septillion people, which is far, far less... no, wait, that's for a single-event dust speck.

What's your definition of all sentient life? Are we talking Earth, observable universe, or what? What's 'the rest of time'?

3^^^3 is so large that claims on this order of magnitude are hard to judge. See Pascal's Muggle for a discussion of this.

given the choice by Alpha, the Alien superintelligence that always carries out its threats

We call that "Omega". Let's use the same term as always.

And if you wouldn't make the sacrifice what right do you have to say someone else should make it?

The obvious response to this is that "should" and "would" are different verbs. There are probably lots of things that I "should" do that I wouldn't do. Knowing the morally proper thing is different from doing the morally proper thing.

Most of your expressed disagreement seems to stem from this: You don't imagine anyone would do this, so you claim you don't think that someone should. Once you've fully understood and internalized that these are different verbs, the confusion disappears.

I’m mainly just concerned that some factor be incorporated into the design of any Artificial Intelligence that prevents it from murdering me and others for trivial but widespread causes.

Yeah, the factor is that we don't actually want the AI to be moral, we want it to follow humanity's Coherent Extrapolated Volition instead.