Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Comment author: benelliott 16 December 2010 05:54:19PM 9 points [-]

The colloquial meaning of "x is impossible" is probably closer to "x has probability <0.1%" than "x has probability 0"

Comment author: CynicalOptimist 18 November 2016 12:15:12AM 0 points [-]

This is good, but I feel like we'd better represent human psychology if we said:

Most people don't make a distinction between the concepts of "x has probability <0.1%" and "x is impossible".

I say this because I think there's an important difference between the times when people have a precise meaning in mind, which they've expressed poorly, and the times when people's actual concepts are vague and fuzzy. (Often, people don't realise how fuzzy their concepts are).

Comment author: someonewrongonthenet 01 October 2013 03:14:40AM *  0 points [-]

I agree, it's quite possible that someone might deconstruct "me" and "life" and "death" and "subjective experience" to the same extent that I have and still value never deleting certain information that is computationally descended from themselves more than all the other things that would be done with the resources that are used to maintain them.

Hell, I might value it to that extent. This isn't something I'm certain about. I'm still exploring this. My default answer is to live forever - I just want to make sure that this is really what I want after consideration and not just a kicking, screaming survival instinct (AKA a first order preference)

Comment author: CynicalOptimist 17 November 2016 08:29:45PM *  0 points [-]

This seems to me like an orthogonal question. (A question that can be entirely extricated and separated from the cryonics question).

You're talking about whether you are a valuable enough individual that you can justify resources being spent on maintaining your existence. That's a question that can be asked just as easily even if you have no concept of cryonics. For instance: if your life depends on getting medical treatment that costs a million dollars, is it worth it? Or should you prefer that the money be spent on saving other lives more efficiently?

(Incidentally, I know that utilitarianism generally favours the second option. But I would never blame anyone for choosing the first option if the money was offered to them.)

I would accept an end to my existence if it allowed everyone else on earth to live for as long as they wished, and experience an existentially fulfilling form of happiness. I wouldn't accept an end to my existence if it allowed one stranger to enjoy an ice cream. There are scenarios where I would think it was worth using resources to maintain my existence, and scenarios where I would accept that the resources should be used differently. I think this is true when we consider cryonics, and equally true if we don't.

The cryonics question is quite different.

For the sake of argument, I'll assume that you're alive and that you intend to keep on living, for at least the next 5 years. I'll assume that if you experienced a life-threatening situation tomorrow, and someone was able to intervene medically and grant you (at least) 5 more years of life, then you would want them to.

There are many different life-threatening scenarios, and many different possible interventions. But for decision making purposes, you could probably group them into "interventions which extend my life in a meaningful way" and interventions that don't. For instance, an intervention that kept your body alive but left you completely brain-dead would probably go in the second category. Coronary bypass surgery would probably go in the first.

The cryonics question here is simply: "If a doctor offered to freeze you, then revive you 50 years later, would you put this in the same category as other 'life-saving' interventions?" Would you consider it an extension of your life, in the same way as a heart transplant would be? And would you value it similarly in your considerations?

And of course, we can ask the same question for a different intervention, where you are frozen, then scanned, then recreated years later in one (or more) simulations.

Comment author: TheOtherDave 01 October 2013 03:46:42AM 0 points [-]

I take it you're assuming that information about my husband, and about my relationship to my husband, isn't in the encyclopedia module along with information about mice and omelettes and your relationship to your wife.

If that's true, then sure, I'd prefer not to lose that information.

Comment author: CynicalOptimist 17 November 2016 08:08:01PM *  0 points [-]

I think I've got a good response for this one.

My non-episodic memory contains the "facts" that Buffy the Vampire Slayer was one of the best television shows that was ever made, and that Pink Floyd aren't an interesting band. My boyfriend's non-episodic memory contains the facts that Buffy was boring, unoriginal, and repetitive (and that Pink Floyd's music is transcendentally good).

Objectively, these are opinions, not facts. But we experience them as facts. If I want to preserve my sense of identity, then I would need to retain the facts that were in my non-episodic memory. More than that, I would also lose my sense of self if I gained contradictory memories. I would need to have my non-episodic memories and not have the facts from my boyfriend's memory.

That's the reason why "off the shelf" doesn't sound suitable in this context.

Comment author: CynicalOptimist 17 November 2016 07:26:48PM 0 points [-]

Very interesting. I'm going to try my hand at a short summary:

Assume that you have a number of different options you can choose, that you want to estimate the value of each option, and that you have to make your best guess as to which option is most valuable. In step one, you generate individual estimates using whatever procedure you think is best. In step two, you make the final decision by choosing the option that had the highest estimate in step one.

The point is: even if you have unbiased procedures for creating the individual estimates in step one (i.e. procedures that are equally likely to overestimate as to underestimate), biases will still be introduced in step two, when you're looking at the list of all the different estimates. Specifically, the biases are that the highest estimate(s) are more likely to be overestimates, and the lowest estimate(s) are more likely to be underestimates.
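To make the two-step summary concrete, here is a minimal simulation sketch. The setup is an assumption chosen for illustration, not something taken from the article: every option has the same true value of 0, and each estimate adds zero-mean Gaussian noise. The function name `simulate_selection_bias` and its parameters are hypothetical.

```python
import random
import statistics


def simulate_selection_bias(n_options=10, noise_sd=1.0, n_trials=20000, seed=0):
    """Average (estimate - true value) for the option chosen in step two."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Assumption: all options are truly worth the same (0.0), so any
        # difference between the estimates is pure noise.
        true_values = [0.0] * n_options
        # Step one: unbiased estimates (the noise has mean zero).
        estimates = [v + rng.gauss(0.0, noise_sd) for v in true_values]
        # Step two: choose the option with the highest estimate.
        best = max(range(n_options), key=lambda i: estimates[i])
        gaps.append(estimates[best] - true_values[best])
    return statistics.mean(gaps)


# With several options the chosen estimate is too high on average;
# with a single option (no selection step) the gap is close to zero.
print(simulate_selection_bias(n_options=10))
print(simulate_selection_bias(n_options=1))
```

Under these assumptions each individual estimate is unbiased, yet the estimate that wins step two overshoots on average, which is the effect described above.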

Comment author: Vladimir_Nesov 16 September 2011 09:35:37AM 15 points [-]

But all you've done after "adjusting" the expected value estimates is produce a new batch of expected value estimates, which just shows that the original expected value estimates were not done very carefully (if there was an improvement), or that you face the same problem all over again...

Am I missing something?

Comment author: CynicalOptimist 17 November 2016 06:28:11PM *  0 points [-]

Well in some circumstances, this kind of reasoning would actually change the decision you make. For example, you might have one option with a high estimate and very high confidence, and another option with an even higher estimate, but lower confidence. After applying the approach described in the article, those two options might end up switching position in the rankings.

BUT: Most of the time, I don't think this approach will make you choose a different option. If all other factors are equal, then you'll probably still pick the option that has the highest expected value. I think that what we learn from this article is more about something else: It's about understanding that the final result will probably be lower than your supposedly "unbiased" estimate. And when you understand that, you can budget accordingly.
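A small worked sketch of how the rankings can switch. It assumes one particular model that I'm choosing for illustration (a normal prior over true values and normally distributed estimation noise), and all the numbers are made up. The helper name `posterior_mean` is hypothetical.

```python
def posterior_mean(estimate, noise_sd, prior_mean=0.0, prior_sd=1.0):
    """Posterior mean of the true value under a normal prior and normal noise.

    The noisier the estimate, the more it is pulled back towards the prior mean.
    """
    weight = prior_sd ** 2 / (prior_sd ** 2 + noise_sd ** 2)
    return prior_mean + weight * (estimate - prior_mean)


# Hypothetical options: A has a lower estimate but high confidence (small noise),
# B has a higher estimate but low confidence (large noise).
option_a = posterior_mean(estimate=2.0, noise_sd=0.5)
option_b = posterior_mean(estimate=3.0, noise_sd=2.0)

print(option_a, option_b)
print("A ranks above B after adjustment:", option_a > option_b)
```

In this sketch both adjusted values end up below the raw estimates, which matches the budgeting point: even when the ranking doesn't change, the result you should expect is lower than the number you started with.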

Comment author: Brickman 28 September 2011 12:15:15PM 1 point [-]

Oh, I understand now. Even if we don't know how it's distributed, if it's the top among 9 choices with the same variance that puts it in the 80th percentile for specialness, and signal and noise contribute to that equally. So it's likely to be in the 80th percentile of noise.

It might have been clearer if you'd instead made the boxes actually contain coins normally distributed about 40 with variance 15 and B=30, and made an alternative of 50/1, since you'd have been holding yourself to more proper unbiased generation of the numbers and still, in all likelihood, come up with a highest-labeled box that contained less than the sure thing. You have to basically divide your distance from the norm by the ratio of specialness you expect to get from signal and noise. The "all 45" thing just makes it feel like a trick.

Comment author: CynicalOptimist 17 November 2016 05:27:56PM 0 points [-]

I think there's some value in that observation that "the all 45 thing makes it feel like a trick". I believe that's a big part of why this feels like a paradox.

If you have a box with the numbers "60" and "20" as described above, then I can see two main ways that you could interpret the numbers:

A: The number of coins in this box was drawn from a probability distribution with a mean of 60, and a range of 20.

B: The number of coins in this box was drawn from an unknown probability distribution. Our best estimate of the number of coins in this box is 60, based on certain information that we have available. We are certain that the actual value is within 20 gold coins of this.

With regards to understanding the example, and understanding how to apply the kind of Bayesian reasoning that the article recommends, it's important to understand that the example was based on B. And in real life, B describes situations that we're far more likely to encounter.

With regards to understanding human psychology, human biases, and why this feels like a paradox, it's important to understand that we instinctively tend towards "A". I don't know if all humans would tend to think in terms of A rather than B, but I suspect the bias applies widely amongst people who've studied any kind of formal probability. "A" is much closer to the kind of questions that would be set as exercises in a probability class.

In response to comment by RobbBB on Magical Categories
Comment author: TheAncientGeek 16 January 2014 12:49:09PM *  0 points [-]

That's not generally true of human-level intelligences. We wouldn't expect a random alien species that happens to be as smart as humans to be very successful at figuring out human morality.

Assuming morality is lots of highly localised, different things... which I don't, particularly. If it is not, then you can figure it out anywhere. If it is, then the problem the aliens have is not that morality is imponderable, but that they don't have access to the right data. They don't know how things are on Earth. However, an AI built on Earth would. So the situation is not analogous. The only disadvantage an AI would have is not having biological drives itself, but it is not clear that an entity needs to have drives in order to understand them. We could expect a SIAI to get incrementally better at maths than us until it surpasses us; we wouldn't worry that it would hit on the wrong maths, because maths is not a set of arbitrary, disconnected facts.

But humans aren't very good at figuring out morality; they can make serious mistakes

An averagely intelligent AI with an average grasp of morality would not be more of a threat than an average human. A smart AI would, all other things being equal, be better at figuring out morality. But all other things are not equal, because you want to create problems by walling off the UF.

(He deliberately picked ones that sound 'stupid' to a human mind, to make the point that human concepts have a huge amount of implicit complexity built in.)

I'm sure they do. That seems to be why progress in AGI, specifically use of natural language, has been achingly slow. But why should moral concepts be so much more difficult than others? An AI smart enough to talk its way out of a box would be able to understand the implicit complexity; an AI too dumb to understand implicit complexity would be boxable. Where is the problem?

Utility functions that change over time are more dangerous than stable ones, because it's harder to predict how a descendant of a seed AI with a heavily modified utility function will behave than it is to predict how a descendant with the same utility function will behave.

Things are not inherently dangerous just because they are unpredictable. If you have some independent reason for thinking something might turn dangerous, then it becomes desirable to predict it.

But superintelligent artificial general intelligences are generally assumed to be good at everything: they are not assumed to develop mysterious blind spots about falconry or mining engineering. Why assume they will develop a blind spot about morality? Oh yes... because you have assumed from the outset that the UF must be walled off from self-improvement... in order to be safe. You are only facing that particular failure mode because of something you decided on in order to be safe.

If we don't solve the problem of Friendly AI ourselves, we won't know what trajectory of self-modification to set the AI on in order for it to increasingly approximate Friendliness

The average person manages to solve the problem of being moral themselves, in a good-enough way. You keep assuming, without explanation, that an AI can't do the same.

We can't tell it to increasingly approximate something that we ourselves cannot formalize and cannot point to clear empirical evidence of.

Why isn't having a formalisation of morality a problem with humans? We know how humans incrementally improve as moral reasoners: it's called the Kohlberg hierarchy.

We don't understand human morality or desire, so we can't design a Morality Test or Wish Test that we know for sure will reward all and only the good or desirable actions.

We don't have perfect morality tests. We do have morality tests. Fail them and you get pilloried in the media or sent to jail.

We can make the AI increasingly approximate something, sure, but how do we know in advance that that something is something we'd like?

Again, you are assuming that morality is something highly local and arbitrary. If it works like arithmetic, that is, if it is an expansion of some basic principles, then we can tell that it is heading in the right direction by identifying that its reasoning is in line with those principles.

Comment author: CynicalOptimist 10 November 2016 12:34:15PM 0 points [-]

I think that RobbBB has already done a great job of responding to this, but I'd like to have a try at it too. I'd like to explore the math/morality analogy a bit more. I think I can make a better comparison.

Math is an enormous field of study. Even if we limited our concept of "math" to drawing graphs of mathematical functions, we would still have an enormous range of different kinds of functions: Hyperbolic, exponential, polynomial, all the trigonometric functions, etc. etc.

Instead of comparing math to morality, I think it's more illustrative to compare math to the wider topic of "value-driven-behaviour".

An intelligent creature could have all sorts of different values. Even within the realm of modern, western, democratic morality we still disagree about whether it is just and proper to execute murderers. We disagree about the extent to which a state is obligated to protect its citizens and provide a safety net. We disagree about the importance of honesty, of freedom vs. safety, freedom of speech vs. protection from hate speech.

If you look at the wider world, and at cultures through history, you'll find a much wider range of moralities. People who thought it was not just permitted, but morally required that they enslave people, restrict the freedoms of their own families, and execute people for religious transgressions.

You might think that these are all better or worse approximations of the "one true morality", and that a superintelligence could work out what that true morality is. But we don't think so. We believe that these are different moralities. Fundamentally, these people have different values.

Then we can step further out, and look at the "insane" value systems that a person could hold. Perhaps we could believe that all people are so flawed that they must be killed. Or we could believe that no one should ever be allowed to die, and so we extend life indefinitely, even for people in agony. Or we might believe everyone should be lobotomised for our own safety.

And then there are the truly inhuman value systems: the paperclip maximisers, the prime pebble sorters, and the baby eaters. The idea is that a superintelligence could comprehend any and all of these. It would be able to optimise for any one of them, and foresee results and possible consequences for all of them. The question is: which one would it actually use?

A superintelligence might be able to understand all of human math and more besides, but we wouldn't build one to simply "do all of maths". We would build it with a particular goal and purpose in mind. For instance (to pick an arbitrary example) we might need it to create graphs of Hyperbolic functions. It's a bad example, I know. But I hope it serves to help make the point.

Likewise, we would want the intelligence to adopt a specific set of values. Perhaps we would want them to be modern, western, democratic liberal values.

I wouldn't expect a superintelligence to start generating Hyperbolic functions, despite the fact that it's smart enough to do so. The AI would have no reason to start doing that particular task. It might be smart enough to work out that that's what we want, of course, but that doesn't mean it'll do it (unless we've already solved the problem of getting it to do "what humans want it to do"). If we want Hyperbolic functions, we'll have to program the machine with enough focus to make it do that.

Likewise, a computer could have any arbitrary utility function, any arbitrary set of values. We can't make sure that a computer has the "right" values unless we know how to clearly define the values we want.

With Hyperbolic functions, it's relatively easy to describe exactly, unambiguously, what we want. But morality is much harder to pin down.

Comment author: TheOtherDave 10 January 2014 06:19:21PM 1 point [-]

Sure, if all I care about is whether I get what I want, and I don't care about whether my wishes are fulfilled safely, then there's no problem.

Comment author: CynicalOptimist 09 November 2016 12:29:20AM 0 points [-]

But if you do care about your wishes being fulfilled safely, then safety will be one of the things that you want, and so you will get it.

So long as your preferences are coherent, stable, and self-consistent then you should be fine. If you care about something that's relevant to the wish then it will be incorporated into the wish. If you don't care about something then it may not be incorporated into the wish, but you shouldn't mind that: because it's something you don't care about.

Unfortunately, people's preferences often aren't coherent and stable. For instance an alcoholic may throw away a bottle of wine because they don't want to be tempted by it. Right now, they don't want their future selves to drink it. And yet they know that their future selves might have different priorities.

Is this the sort of thing you were concerned about?

Comment author: Eric_1 25 November 2007 04:33:23PM 1 point [-]

"Ultimately, most objects, man-made or not are 'black boxes.'"

OK, I see what you're getting at.

Four questions about black boxes:

1) Does the input have to be fully known/observable to constitute a black box? When investigating a population of neurons, we can give stimulus to these cells, but we cannot be sure that we are aware of all the inputs they are receiving. So we effectively do not entirely understand the input being given.

2) Does the output have to be fully known/observable to constitute a black box? When we measure the output of a population of neurons, we also cannot be sure of the totality of information being sent out, due to experimental limitations.

3) If one does not understand a system one uses, does that fact alone make that system a black box? In that case there are absolute black boxes, like the human mind, about which complete information *is not known*, and relative black boxes, like the car or TCP/IP, about which complete information *is not known to the current user*.

4) What degree of understanding is sufficient for something not to be called a black box?

Depending on how we answer these things, it will determine whether black box comes to mean:

1) Anything that is identifiable as a 'part', whose input and output is known but whose intermediate working/processing is not understood.

2) Anything that is identifiable as a 'part' whose input, output and/or processing is not understood.

3) Any 'part' that is not completely understood (i.e. presuming access to all information).

4) Anything that is not understood by the user at the time.

5) Anything that is not FULLY understood by the user at the time.

We will quickly be in the realm where anything and everything on earth is considered to be a black box, if we take the latter definitions. So how can this word/metaphor be most profitably wielded?

Comment author: CynicalOptimist 08 November 2016 11:46:28PM *  0 points [-]

I like this style of reasoning.

Rather than taking some arbitrary definition of black boxes and then arguing about whether they apply, you've recognised that a phrase can be understood in many ways, and we should use the word in whatever way most helps us in this discussion. That's exactly the sort of rationality technique we should be learning.

A different way of thinking about it though, is that we can remove the confusing term altogether. Rather than defining the term "black box", we can try to remember why it was originally used, and look for another way to express the intended concept.

In this case, I'd say the point was: "Sometimes, we will use a tool expecting to get one result, and instead we will get a completely different, unexpected result. Often we can explain these results later. They may even have been predictable in advance, and yet they weren't predicted."

Computer programming is especially prone to this. The computer will faithfully execute the instructions that you gave it, but those instructions might not have the net result that you wanted.
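A tiny illustration of that last point, using made-up data. The variable names are hypothetical; the behaviour shown is standard Python string comparison.

```python
# The readings arrive as text, e.g. from a file or a web form.
readings = ["10", "9", "2"]

# What was written: "give me the largest reading".
# What the computer does: compare strings character by character.
largest_as_written = max(readings)

# What was actually wanted: compare the readings as numbers.
largest_as_intended = max(readings, key=int)

print(largest_as_written, largest_as_intended)
```

The first result is explainable afterwards, and was even predictable in advance, but it isn't what the person asking for "the largest reading" had in mind.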

Comment author: Eric_1 24 November 2007 04:52:21PM 0 points [-]

It seems contradictory to previous experience that humans should develop a technology with "black box" functionality, i.e. whose effects could not be foreseen and accurately controlled by the end-user. Technology has to be designed and it is designed with an effect/result in mind. It is then optimized so that the end user understands how to call forth this effect. So positing an effective equivalent of the mythological figure "Genie" in technological form ignores the optimization-for-use that would take place at each stage of developing an Outcome-Pump. The technology-falling-from-heaven which is the Outcome Pump demands that we reverse engineer the optimization of parameters which would have necessarily taken place if it had in fact developed as human technologies do.

I suppose the human mind has a very complex "ceteris paribus" function which holds all these background parameters equal to their previous values, while not explicitly stating them, and the ironic-wish-fulfillment-Genie idea relates to the fulfillment of a wish while violating an unspoken ceteris paribus rule. Demolishing the building structure violates ceteris paribus more than the movements of a robot-retriever would in moving aside burning material to save the woman. Material displaced from the building should be as nearly equal to the woman's body weight as possible; inducing an explosion is a horrible violation of the objective, if the Pump could just be made to sense the proper (implied) parameters.

If the market forces of supply and demand continue to undergird technological progress (i.e. research and development and manufacturing), then the development of a sophisticated technology not-optimized-for-use is problematic: who pays for the second round of research implementation? Surely not the customer, when you give him an Outcome Pump whose every use could result in the death and destruction of his surrounding environs and family members. Granted this is an aside and maybe impertinent in the context of this discussion.

Comment author: CynicalOptimist 08 November 2016 11:30:09PM *  0 points [-]

"if the Pump could just be made to sense the proper (implied) parameters."

You're right, this would be an essential step. I'd say the main point of the post was to talk about the importance, and especially the difficulty, of achieving this.

Re optimisation for use: remember that this involves a certain amount of trial and error. In the case of dangerous technologies like explosives, firearms, or high speed vehicles, the process can often involve human beings dying, usually in the "error" part of trial and error.

If the technology in question was a super-intelligent AI, smart enough to fool us and engineer whatever outcome best matched its utility function? Then potentially we could find ourselves unable to fix the "error".

Please excuse the cheesy line, but sometimes you can't put the genie back in the bottle.

Re the workings of the human brain? I have to admit that I don't know the meaning of ceteris paribus, but I think that the brain mostly works by pattern recognition. In a "burning house" scenario, people would mostly contemplate the options that they thought were "normal" for the situation, or that they had previously imagined, heard about, or seen on TV.

Generating a lot of different options and then comparing them for expected utility isn't the sort of thing that humans do naturally. It's the sort of behaviour that we have to be trained for, if you want us to apply it.
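For anyone who hasn't seen it spelled out, this is roughly what "generate options, then compare them for expected utility" looks like as a procedure. Everything here is a hypothetical sketch: the option names, probabilities, and utility numbers are invented purely for illustration.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over an option's possible outcomes."""
    return sum(p * u for p, u in outcomes)


def best_option(options):
    """Return the name of the option with the highest expected utility."""
    return max(options, key=lambda name: expected_utility(options[name]))


# Each option maps to a list of (probability, utility) pairs. All made up.
options = {
    "wait for firefighters": [(0.6, 100.0), (0.4, -100.0)],
    "go in yourself": [(0.5, 100.0), (0.5, -200.0)],
    "use a ladder from outside": [(0.8, 100.0), (0.2, -100.0)],
}

for name, outcomes in options.items():
    print(name, expected_utility(outcomes))
print("chosen:", best_option(options))
```

The arithmetic is trivial; the hard part for a human is thinking of the less "normal" options in the first place and putting honest numbers on them.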
