Proper value learning through indifference

16 Stuart_Armstrong 19 June 2014 09:39AM

A putative new idea for AI control; index here.

Many designs for creating AGIs (such as OpenCog) rely on the AGI deducing moral values as it develops. This is a form of value loading (or value learning), in which the AGI updates its values through various methods, generally including feedback from trusted human sources. This is very analogous to how human infants (approximately) integrate the values of their society.

The great challenge of this approach is that it relies upon an AGI which already has an interim system of values, being able and willing to correctly update this system. Generally speaking, humans are unwilling to easily update their values, and we would want our AGIs to be similar: values that are too unstable aren't values at all.

So the aim is to clearly separate the conditions under which values should be kept stable by the AGI, and conditions when they should be allowed to vary. This will generally be done by specifying criteria for the variation ("only when talking with Mr and Mrs Programmer"). But, as always with AGIs, unless we program those criteria perfectly (hint: we won't) the AGI will be motivated to interpret them differently from how we would expect. It will, as a natural consequence of its program, attempt to manipulate the value updating rules according to its current values.

How could it do that? A very powerful AGI could do the time-honoured "take control of your reward channel", by either threatening humans to give it the moral answer it wants, or replacing humans with "humans" (constructs that pass the programmed requirements of being human, according to the AGI's programming, but aren't actually human in practice) willing to give it these answers. A weaker AGI could instead use social manipulation and leading questioning to achieve the morality it desires. Even more subtly, it could tweak its internal architecture and updating process so that it updates values in its preferred direction (even something as simple as choosing the order in which to process evidence). This will be hard to detect, as a smart AGI might have a much clearer impression of how its updating process will play out in practice than its programmers would.

The problems with value loading have been cast into the various "Cake or Death" problems. We have some idea what criteria we need for safe value loading, but as yet we have no candidates for such a system. This post will attempt to construct one.


Why I haven't signed up for cryonics

29 Swimmer963 12 January 2014 05:16AM

(OR)

How I'm now on the fence about whether to sign up for cryonics

I'm not currently signed up for cryonics. In my social circle, that makes me a bit of an oddity. I disagree with Eliezer Yudkowsky; heaven forbid. 

My true rejection is that I don't feel a visceral urge to sign up. When I query my brain on why, what I get is that I don't feel that upset about me personally dying. It would suck, sure. It would suck a lot. But it wouldn't suck infinitely. I've seen a lot of people die. It's sad and wasteful and upsetting, but not like a civilization collapsing. It's neutral from a point of pleasure vs suffering for the dead person, and negative for the family, but they cope with it and find a bit of meaning and move on. 

(I'm desensitized. I have to be, to stay sane in a job where I watch people die on a day to day basis. This is a bias; I'm just not convinced that it's a bias in a negative direction.)

I think the deeper cause behind my rejection may be that I don't have enough to protect. Individuals may be unique, but as an individual, I'm fairly replaceable. All the things I'm currently doing can be, and are being, done by other people. I'm not the sole support person in anyone's life, and if I were, I would be trying really, really hard to fix the situation. Part of me is convinced that wanting to personally survive and thinking that I deserve to is selfish and un-virtuous or something. (EDIT: or that it's non-altruistic to value my life above the amount GiveWell thinks is reasonable to save a life–about $5,000. My revealed preference is that I obviously value my life more than this.)

However, I don't think cryonics is wrong, or bad. It has obvious upsides, like being the only chance an average citizen has right now to do something that might lead to them not permanently dying. I say "average citizen" because people working on biological life extension and immortality research are arguably doing something about not dying. 

When queried, my brain tells me that it's doing an expected-value calculation and the expected value of cryonics to me is too low to justify the costs; it's unlikely to succeed and the only reason some people have positive expected value for it is that they're multiplying that tiny number by the huge, huge number that they place on the value of their own lives. And my number doesn't feel big enough to outweigh those odds at that price.

Putting some numbers in that

If my brain thinks this is a matter of expected-value calculations, I ought to do one. With actual numbers, even if they're made-up, and actual multiplication.

So: my death feels bad, but not infinitely bad. Obvious thing to do: assign a monetary value. Through a variety of helpful thought experiments (how much would I pay to cure a fatal illness if I were the only person in the world with it and research wouldn't help anyone but me and I could otherwise donate the money to EA charities; does the awesomeness of 3 million dewormings outweigh the suckiness of my death; is my death more or less sucky than the destruction of a high-end MRI machine), I've converged on a subjective value for my life of about $1 million. Like, give or take a lot.

Cryonics feels unlikely to work for me. I think the basic principle is sound, but if someone were to tell me that cryonics had been shown to work for a human, I would be surprised. That's not a number, though, so I took the final result of Steve Harris' calculations here (inspired by the Sagan-Drake equation). His optimistic number is a 0.15 chance of success, or 1 in 7; his pessimistic number is 0.0023, or less than 1/400. My brain thinks 15% is too high and 0.23% sounds reasonable, but I'll use his numbers for upper and lower bounds. 

I started out trying to calculate the expected cost by some convoluted method where I was going to estimate my expected chance of dying each year and repeatedly subtract it from one and multiply by the amount I'd pay each year to calculate how much I could expect to pay in total. Benquo pointed out to me that calculations like this are usually done using perpetuities, or PV calculations, so I made one in Excel and plugged in some numbers, approximating the Alcor annual membership fee as $600. Assuming my own discount rate is somewhere between 2% and 5%, I ran two calculations with those numbers. For 2%, the total expected, time-discounted cost would be $30,000; for a 5% discount rate, $12,000.

Excel also lets you do calculations on perpetuities that aren't perpetual, so I plugged in 62 years, the time by which I'll have a 50% chance of dying according to this actuarial table. It didn't change the final results much; $11,417 for a 5% discount rate and $21,000 for the 2% discount rate. 
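The perpetuity and fixed-term figures above can be reproduced with a short sketch. It assumes the standard present-value formulas with payments at the end of each year (what a default spreadsheet PV calculation uses); the function names are made up for illustration and the post itself only gives the results. Under these assumptions the 62-year case comes out at roughly $11,417 for 5% and a little over $21,000 for 2%.

```python
def perpetuity_pv(payment, rate):
    """Present value of a payment made every year forever."""
    return payment / rate


def annuity_pv(payment, rate, years):
    """Present value of a payment made at the end of each year for a fixed number of years."""
    return payment * (1 - (1 + rate) ** -years) / rate


ANNUAL_FEE = 600   # approximate Alcor membership fee used in the post
HORIZON = 62       # years until a 50% chance of having died, per the post

for rate in (0.02, 0.05):
    forever = perpetuity_pv(ANNUAL_FEE, rate)
    capped = annuity_pv(ANNUAL_FEE, rate, HORIZON)
    print(f"discount rate {rate:.0%}: perpetuity ${forever:,.0f}, {HORIZON}-year ${capped:,.0f}")
```

Capping the horizon matters less at higher discount rates, because payments six decades out are already discounted to almost nothing.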

That's not including the life insurance payout you need to pay for the actual freezing. So, life insurance premiums. Benquo's plan is five years of $2200 a year and then nothing from then on, which apparently isn't uncommon among plans for young healthy people. I could probably get something as good or better; I'm younger. So, $11,000 for total life insurance premiums. If I went with permanent annual payment, I could do a perpetuity calculation instead.

In short: around $40,000 total, rounding up.

What's my final number?

There are two numbers I can output. When I started this article, one of them seemed like the obvious end product, so I calculated that. When I went back to finish this article days later, I walked through all the calculations again while writing the actual paragraphs, did what seemed obvious, ended up with a different number, and realized I'd calculated a different thing. So I'm not sure which one is right, although I suspect they're symmetrical. 

If I multiply the value of my life by the success chance of cryonics, I get a number that represents (I think) the monetary value of cryonics to me, given my factual beliefs and values. It would go up if the value of my life to me went up, or if the chances of cryonics succeeding went up. I can compare it directly to the actual cost of cryonics.

I take $1 million and plug in either 0.15 or 0.0023, and I get $150,000 as an upper bound and $2300 as a lower bound, to compare to a total cost somewhere in the ballpark of $40,000.
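As a minimal sketch of this first calculation, using the $1 million life value, the roughly $40,000 total cost, and Steve Harris' 0.15 and 0.0023 success estimates from earlier in the post (the function name is invented for illustration):

```python
def cryonics_value(life_value, p_success):
    """Monetary value of signing up: value of the life times the chance it is saved."""
    return life_value * p_success


LIFE_VALUE = 1_000_000   # subjective value of the author's life, in dollars
TOTAL_COST = 40_000      # rough time-discounted cost of membership plus insurance

for label, p in (("optimistic", 0.15), ("pessimistic", 0.0023)):
    value = cryonics_value(LIFE_VALUE, p)
    verdict = "exceeds" if value > TOTAL_COST else "falls short of"
    print(f"{label}: worth ${value:,.0f}, which {verdict} the ${TOTAL_COST:,} cost")
```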

If I take the price of cryonics and divide it by the chance of success (because if I sign up, I'm optimistically paying for 100 worlds of which I survive in 15, or pessimistically paying for 10,000 worlds in which I survive in 23), I get the total expected cost per my life being saved, which I can compare to the figure I place on the value of my life. It goes down if the cost of cryonics goes down or the chances of success go up. 

I plug in my numbers and get a lower bound of $267,000 and an upper bound of $17 million.
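The second calculation can be sketched the same way, again with invented function names and the post's own inputs. Dividing the cost by the success chance and comparing it to the value of a life is the same inequality as multiplying the value by the success chance and comparing it to the cost, so both versions flip at the same break-even probability, cost divided by value, which is 4% for these numbers.

```python
def cost_per_success(total_cost, p_success):
    """Expected spend per world in which cryonics actually works."""
    return total_cost / p_success


def break_even_probability(total_cost, life_value):
    """Success chance at which the cost per success equals the value of the life."""
    return total_cost / life_value


TOTAL_COST = 40_000      # rough total cost from the post
LIFE_VALUE = 1_000_000   # subjective value of the author's life

for label, p in (("optimistic", 0.15), ("pessimistic", 0.0023)):
    per_life = cost_per_success(TOTAL_COST, p)
    verdict = "worth it" if per_life < LIFE_VALUE else "not worth it"
    print(f"{label}: ${per_life:,.0f} per life saved -> {verdict}")

print(f"break-even success chance: {break_even_probability(TOTAL_COST, LIFE_VALUE):.1%}")
```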

In both those cases, the optimistic success estimates make it seem worthwhile and the pessimistic success estimates don't, and my personal estimate of cryonics succeeding falls closer to pessimism. But it's close. It's a lot closer than I thought it would be. 

Updating somewhat in favour that I'll end up signed up for cryonics. 

Fine-tuning and next steps

I could get better numbers for the value of my life to me. It's kind of squicky to think about, but that's a bad reason. I could ask other people about their numbers and compare what they're accomplishing in their lives to my own life. I could do more thought experiments to better acquaint my brain with how much value $1 million actually is, because scope insensitivity. I could do upper and lower bounds.

I could include the cost of organizations cheaper than Alcor as a lower bound; the info is all here and the calculation wouldn't be too nasty but I have work in 7 hours and need to get to bed. 

I could do my own version of the cryonics success equation, plugging in my own estimates. (Although I suspect this data is less informed and less valuable than what's already there).

I could ask what other people think. Thus, write this post. 

 

Vanilla and chocolate and preference judgements

29 Swimmer963 18 April 2011 10:14PM

Related to: 2-Place and 1-Place Worlds, Offence versus harm minimization.

Note: edited to replace 'value' with 'preference' as suggested by orthonormal.

Imagine you overheard two children having an argument over whether vanilla ice cream was better than chocolate ice cream. To you as an observer, it would be obvious that this kind of argument has no content. The children aren’t disputing anything measurable in the exterior world; they would agree with each other that chocolate ice cream contains elements from cocoa beans, and vanilla contains the extract from vanilla beans. Most adults wouldn’t have this argument at all, because it’s self-evident that if Mary says, truthfully, that she likes vanilla better than chocolate ice cream, and her husband Albert confesses that he prefers chocolate, then both of them are right. There is no contradiction; vanilla and chocolate are both neutral items until they come into contact with human tastebuds and human brains, at which point their positive or negative weighting is a fact about those brains, not about the substances themselves.

I think that this concept generalizes. Imagine that Mary and Albert are having an argument. Mary hates how Albert leaves papers spread across the kitchen table with empty coffee mugs on top. She wishes he would remember to put his clothes in the laundry basket instead of leaving them on the floor. She nags about it. Albert is helplessly baffled at why she thinks it’s such a big deal. He accuses her of being a nitpicker and a perfectionist.[1]

It’s hard to say that both of them are right, if each is hurting the other’s feelings. Again, though, their argument isn’t about anything factual. They both agree that there are papers on the table and clothes on the floor, and that Albert is the one responsible. Where they diverge is the preference they place on this world-state.


Imperfect Levers

6 blogospheroid 17 November 2010 07:12PM

Related to: Lost Purposes, The importance of Goodhart's Law, Homo Hypocritus, SIAI's scary idea, Value Deathism

Summary: Whenever human beings seek to achieve goals far beyond their individual ability, they use leverage of some kind or another. Creating organizations to achieve goals is a very powerful source of leverage. However, due to their nature, organizations are imperfect levers and the primary purpose is often lost. The inertia of present forms and processes dominates beyond its useful period. The present system of the world has many such imperfect organizations in power and any of them developing near-general intelligence without significant redesign of their utility function can be a source of existential risk/values risk.


Human values differ as much as values can differ

13 PhilGoetz 03 May 2010 07:35PM

George Hamilton's autobiography Don't Mind if I Do, and the very similar book by Bob Evans, The Kid Stays in the Picture, give a lot of insight into human nature and values.  For instance: What do people really want?  When people have the money and fame to travel around the world and do anything that they want, what do they do?  And what is it that they value most about the experience afterward?

You may argue that the extremely wealthy and famous don't represent the desires of ordinary humans.  I say the opposite: Non-wealthy, non-famous people, being more constrained by need and by social convention, and having no hope of ever attaining their desires, don't represent, or even allow themselves to acknowledge, the actual desires of humans.

I noticed a pattern in these books:  The men in them value social status primarily as a means to an end, while the women value social status as an end in itself.

