Futarchy and Unfriendly AI

9 jkaufman 03 April 2015 09:45PM

We have a reasonably clear sense of what "good" is, but it's not perfect. Suffering is bad, pleasure is good, more people living enjoyable lives is good, yes, but tradeoffs are hard. How much worse is it to go blind than to lose your leg? [1] How do we compare the death of someone at eighty to the death of someone at twelve? If you wanted to build some automated system that would go from data about the world to a number representing how well it's doing, where you would prefer any world that scored higher to any world scoring lower, that would be very difficult.

Say, however, that you've built a metric that you think matches your values well and you put some powerful optimizer to work maximizing that metric. This optimizer might do many things you think are great, but it might be that the easiest ways to maximize the metric are the ones that pull it apart from your values. Perhaps after it's in place it turns out your metric included many things that only strongly correlated with what you cared about, where the correlation breaks down under maximization.

What confuses me is that the people who warn about this scenario with respect to AI are often the same people in favor of futarchy. They both involve trying to define your values and then setting an indifferent optimizer to work on them. If you think AI would be very dangerous but futarchy would be very good, why?

I also posted this on my blog.


[1] This is a question people working in public health try to answer with Disability Weights for DALYs.

We Haven't Uploaded Worms

89 jkaufman 27 December 2014 11:44AM

In theory you can upload someone's mind onto a computer, allowing them to live forever as a digital form of consciousness, just like in the Johnny Depp film Transcendence.

But it's not just science fiction. Sure, scientists aren't anywhere near close to achieving such a feat with humans (and even if they could, the ethics would be pretty fraught), but now an international team of researchers have managed to do just that with the roundworm Caenorhabditis elegans.
  —Science Alert

Uploading an animal, even one as simple as C. elegans, would be very impressive. Unfortunately, we're not there yet. What the people working on Open Worm have done instead is to build a working robot based on the C. elegans connectome and show that it can do some things the worm can do.

The C. elegans nematode has only 302 neurons, and every individual has the same fixed wiring pattern. We've known this pattern, or connectome, since 1986. [1] In a simple model, each neuron has a threshold and fires if the weighted sum of its inputs exceeds that threshold. This means knowing the connections isn't enough: we also need to know the weights and thresholds. Unfortunately, we haven't figured out a way to read these values off of real worms. Suzuki et al. (2005) [2] ran a genetic algorithm to learn values for these parameters that would give a somewhat realistic worm, and demonstrated various wormlike behaviors in software. The recent stories about the Open Worm project cover them doing something similar in hardware. [3]
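The simple threshold model can be sketched directly; the weights, threshold, and inputs below are invented placeholders, not values from any real connectome:

```python
def neuron_fires(inputs, weights, threshold):
    """Threshold model: a neuron fires if the weighted sum of its
    inputs exceeds its threshold."""
    return sum(i * w for i, w in zip(inputs, weights)) > threshold

# Hypothetical neuron with three presynaptic connections.
inputs = [1, 0, 1]          # which presynaptic neurons are firing
weights = [0.5, -0.3, 0.4]  # unknown for real worms; placeholders here
print(neuron_fires(inputs, weights, 0.7))  # 0.5 + 0.4 = 0.9 > 0.7, so True
```

Knowing the connectome tells you which entries of `weights` are nonzero, but not their values; that gap is what the genetic-algorithm approach tries to fill.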

To see why this isn't enough, consider that nematodes are capable of learning; Sasakura and Mori (2013) [5] provide a reasonable overview. For example, nematodes can learn that a certain temperature indicates food, and then seek out that temperature. They don't do this by growing new neurons or connections, so they must be updating their connection weights. All the existing worm simulations treat weights as fixed, which means they can't learn. They also don't read weights off of any individual worm, which means we can't talk about any specific worm as having been uploaded.

If this doesn't count as uploading a worm, however, what would? Consider an experiment where someone trains one group of worms to respond to a stimulus one way and another group to respond the other way. Both groups are then scanned and simulated on the computer. If the simulated worms responded to simulated stimulus the same way their physical versions had, that would be good progress. Additionally, you would want to demonstrate that similar learning is possible in the simulated environment.

(In a 2011 post I looked at some of this research and what progress with nematodes might tell us about uploading humans. Since then not much has changed in nematode simulation. Moore's law, however, looks to be doing much worse in 2014 than it did in 2011, which makes the prospects for whole brain emulation substantially worse.)

I also posted this on my blog.


[1] The Structure of the Nervous System of the Nematode Caenorhabditis elegans, White et al. (1986).

[2] A Model of Motor Control of the Nematode C. Elegans With Neuronal Circuits, Suzuki et al. (2005).

[3] It looks like instead of learning weights, Busbice just set them all to +1 (excitatory) or -1 (inhibitory). It's not clear to me how they knew which connections were which; my best guess is that they're using the "what happens to work" details from [2]. Their full writeup is [4].

[4] The Robotic Worm, Busbice (2014).

[5] Behavioral Plasticity, Learning, and Memory in C. Elegans, Sasakura and Mori (2013).

Happiness Logging: One Year In

14 jkaufman 09 October 2014 07:24PM

I've been logging my happiness for a year now. [1] My phone notifies me at unpredictable intervals, and I respond with some tags. For example, if it pinged me now, I would enter "6 home bed computer blog". I always have a numeric tag for my current happiness, and then additional tags for where I am, what I'm doing, and who I'm with. So: what's working, what's not?

When I first started rating my happiness on a 1-10 scale I didn't feel like I was very good at it. At the time I thought I might get better with practice, but I think I'm actually getting worse. Instead of genuinely asking "how do I feel right now?" it's hard not to just think "in past situations like this I've put down '6', so I should put down '6' now".

Being honest with myself like this can also make me less happy. Normally if I'm negative about something I try not to dwell on it: I don't think about it, and soon I'm thinking about other things and not so negative. Logging that I'm unhappy makes me own up to being unhappy, which I think doesn't help. Though it's hard to know, because any other sort of measurement would seem to have the same problem.

There's also a sampling issue. I don't have my phone ping me during the night, because I don't want it to wake me up. Before having a kid this worked properly: I'd plug in my phone, which turns off pings, promptly fall asleep, wake up in the morning, and unplug my phone. Now, though, my sleep is generally interrupted several times a night. Time spent waiting to see if the baby falls back asleep on her own, soothing her back to sleep if she doesn't, or lying awake at 4am because it's hard to fall back asleep when you've had seven hours and just spent an hour walking around bouncing the baby: none of this is counted. On the whole these experiences are much less enjoyable than my average; if the baby started sleeping through the night so that none of this was needed anymore I wouldn't see that as a loss at all. Which means my data is biased upward. I'm curious how happiness sampling studies have handled this; people with insomnia would be in a similar situation.

Another sampling issue is that I don't always notice when I get a ping. For the brief period when I was wearing a smartwatch I was consistently noticing all my pings but now I'm back to where I sometimes miss the vibration. I usually fill out these pings retroactively if it's only been a few minutes and I'm confident that I remember how I felt and what I was doing. I haven't been tagging these pings separately, but now that I think of it I'm going to add an "r" tag for retroactive responses.

Responding to pings when other people are around can also be tricky. For a while there were some people who would try to peek at what I was writing, and I wasn't sure whether I should let them see. I ended up deciding that while having all the data eventually end up public was fine, filling it out in the moment needed to be private so I wouldn't be swayed by wanting to signal things to the people around me.

The app I'm using isn't perfect, but it's pretty good. Entering new tags is a little annoying, and every time I back up the pings it forgets my past tags. The manual backup step also cost me some data: this logging data is the only thing on my phone that isn't automatically backed up to the cloud, so when my phone died a few weeks ago I lost the last month of pings, all of September 2014 and some of August. [2] So now there's a gap in the graph.

While I'm not that confident in my numeric reports, I'm much more confident in the other tags that indicate what I'm doing at various times. If I'm on the computer I very reliably tag 'computer', etc. I haven't figured out what to do with this data yet, but it should be interesting for tracking behavior changes over time. One thing I remember doing is switching from wasting time on my computer to wasting it on my phone; let's see what that looked like:

I don't remember why the big drop in computer use at the end of February 2014 happened. I assumed at first it was having a baby, after which I spent a lot of time reading on my phone while she was curled up on me, but that wasn't until a month later. I think this may have been when I realized that I didn't hate the facebook app on my phone after all? I'm not sure. The second drop in both phone- and computer-based timewasting, the temporary one in July 2014, was my being in England: my phone had internet but my computer usually didn't, and there was generally much more interesting stuff going on around me than on my phone.

Overall my experience with logging has made me put less trust in "how happy are you right now" surveys of happiness. Aside from the practical issues like logging unexpected night wake-time, I mostly don't feel like the numbers I'm recording are very meaningful. I would rather spend more time in situations I label higher than lower on average, so there is some signal there, but I don't actually have the introspection to accurately report to myself how I'm feeling.

I also posted this on my blog.


[1] First ping was 2013.10.08 06:31:41, a year ago yesterday.

[2] Well, it was more my fault than that. The phone was partly working and I did a factory reset to see if that would fix it (it didn't) and I forgot to back up pings first.

Persistent Idealism

11 jkaufman 26 August 2014 01:38AM

When I talk to people about earning to give, it's common to hear worries about "backsliding". Yes, you say you're going to go make a lot of money and donate it, but once you're surrounded by rich coworkers spending heavily on cars, clothes, and nights out, will you follow through? Working at a greedy company in a selfishness-promoting culture, you could easily become corrupted and lose your initial values and motivation.

First off, this is a totally reasonable concern. People do change, and we are pulled towards thinking like the people around us. I see two main ways of working against this:

  1. Be public with your giving. Make visible commitments and then list your donations. This means that you can't slowly slip away from giving; either you publish updates saying you're not going to do what you said you would, or you just stop updating and your pages become stale. By making a public promise you've given friends permission to notice that you've stopped and ask "what changed?"
  2. Don't just surround yourself with coworkers. Keep in touch with friends and family. Spend some time with other people in the effective altruism movement. You could throw yourself entirely into your work, maximizing income while sending occasional substantial checks to GiveWell's top picks, but without some ongoing engagement with the community and the research this doesn't seem likely to last.

One implication of the "won't you drift away" objection, however, is often that if instead of going into earning to give you become an activist then you'll remain true to your values. I'm not so sure about this: many people who are really into activism and radical change in their 20s have become much less ambitious and idealistic by their 30s. You can call it "burning out" or "selling out" but decreasing idealism with age is very common. This doesn't mean people earning to give don't have to worry about losing their motivation—in fact it points the opposite way—but this isn't a danger unique to the "go work at something lucrative" approach. Trying honestly to do the most good possible is far from the default in our society, and wherever you are there's going to be pressure to do the easy thing, the normal thing, and stop putting so much effort into altruism.

Conservation of Expected Jury Probability

10 jkaufman 22 August 2014 03:25PM

The New York Times has a calculator to explain how getting on a jury works. They have a slider at the top indicating how likely each of the two lawyers think you are to side with them, and as you answer questions it moves around. For example, if you select that your occupation is "blue collar" then it says "more likely to side with plaintiff" while "white collar" gives "more likely to side with defendant". As you give it more information the pointer labeled "you" slides back and forth, representing the lawyers' ongoing revision of their estimates of you. Let's see what this looks like.

[Screenshots of the slider: the initial estimate, then after selecting "Over 30", then after selecting "Under 30".]

For several other questions, however, the options aren't matched. If your household income is under $50k then it will give you "more likely to side with plaintiff" while if it's over $50k then it will say "no effect on either lawyer". This is not how conservation of expected evidence works: if learning something pushes you in one direction, then learning its opposite has to push you in the other.

Let's try this with some numbers. Say people's leanings are:

income   P(sides with plaintiff)   P(sides with defendant)
>$50k    50%                       50%
<$50k    70%                       30%
Before asking your income, the lawyers' best guess is that you're equally likely to be earning over $50k as under $50k, since $50k is roughly the median. [1] This means they'd guess you're 60% likely to side with the plaintiff: half the people in your position earn over $50k and will be about evenly split, while the other half earn under $50k and favor the plaintiff 70-30; averaging the two cases gives 60%.

So the lawyers' best guess for you is 60%, and then they ask the question. If you answer ">$50k" they update their estimate for you down to 50%; if you answer "<$50k" they update it up to 70%. "No effect on either lawyer" can't be an option here unless the question gives no information at all.
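The toy arithmetic can be checked directly in code; these are just the made-up numbers from the table above, not anything from the actual calculator:

```python
p_high, p_low = 0.5, 0.5         # P(income above / below $50k)
plaintiff_given_high = 0.50
plaintiff_given_low = 0.70

# Prior before asking: the probability-weighted average of the posteriors.
prior = p_high * plaintiff_given_high + p_low * plaintiff_given_low
print(prior)  # 0.6

# Conservation of expected evidence: if ">$50k" really had "no effect"
# (posterior stays at the prior), the "<$50k" posterior would be forced
# back to the prior too, i.e. the question would carry no information.
forced_low = (prior - p_high * prior) / p_low
print(forced_low)  # 0.6
```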


[1] Almost; the median income in the US in 2012 was $51k. (pdf)

Relative and Absolute Benefit

12 jkaufman 18 June 2014 01:56PM

Someone comes to you claiming to have an intervention that dramatically improves life outcomes. They tell you that all people have some level of X, determined by a mixture of genetics and biology, and they show you evidence that their intervention is cheap and effective at increasing X and separately that higher levels of X are correlated with greater life success. You're skeptical, so they show you there's a strong dose response effect, but you're still not happy about the correlational nature of their evidence. So they go off and do a randomized controlled trial, applying their intervention to randomly chosen individuals and comparing their outcomes with people who aren't supplied the intervention. The improvement still shows up, and with a large effect size!

What's missing is evidence that the intervention helps people in an absolute sense, instead of simply by improving their relative social position. For example, say X is height, we're just looking at men, and we're getting them to wear lifts in their shoes. While taller men do earn more, and are generally more successful along various metrics, we don't think this is because being taller makes you smarter, healthier, or more conscientious. If all people became 1" taller it would be very inconvenient but we wouldn't expect this to affect people's life outcomes very much.

Attributes like X also put parents in a strange position. If you're mostly but not completely altruistic you might want more X for your own child but think that campaigns to give X to other people's children are not useful: if X is just about relative position, then for every person you "bring up" this way other people are slightly brought down, and the overall effect nets out to basically nothing.

College degrees, especially in fields that don't directly teach skills in demand by employers, may belong in this category. Employers hire college graduates over high school graduates, and this hiring advantage does remain as you increase college enrollment, but if another 10% of people get English degrees, is everyone better off in aggregate?

Some interventions are pretty clearly not in this category. If an operation saves someone's life or cures them of something painful, they're pretty clearly better off. The difference here is that we have an absolute measurement of well-being, in this case "how healthy are you?", and we can see it remaining constant in the control group. Unfortunately, this isn't always enough: if our intervention were "take $1 each from 10k randomly selected people and give the resulting $10k to one randomly selected person" we would see that the person gaining $10k was better off, but we wouldn't be able to see any harm to the others, because the change in their situation is too small to measure with our tests. Because each additional dollar is less valuable, however, we would expect this transfer to make the group as a whole worse off. So "absolute measures of wellbeing apparently remaining constant in the control group" isn't enough.
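A toy version of that transfer, with log utility standing in for diminishing marginal utility; the log curve and the $30k incomes are assumptions for illustration, not measurements:

```python
import math

def total_utility(incomes):
    # Log utility: each extra dollar is worth less the richer you are.
    return sum(math.log(x) for x in incomes)

incomes = [30_000.0] * 10_000   # 10k people with equal incomes
before = total_utility(incomes)

# Take $1 from everyone, give the collected $10k to person 0.
after = [x - 1 for x in incomes]
after[0] += 10_000
print(total_utility(after) < before)  # True: total money is unchanged,
                                      # total utility goes down
```

The control group's individual losses ($1 each) are invisible to any realistic survey, but under any concave utility curve their sum outweighs the one person's gain.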

How do we get around this? While we can't run an experiment with half the world's people as "treatment" and the other half as "control", one thing we can do is look at isolated groups where we really can apply the intervention to a large fraction of the people. Take the height example. If instead we were to randomly make half the people in a treatment population 1/2" taller, and this treatment population was embedded in a much larger society, the positional losses in the non-treatment group would be too diffuse to measure. But if we limit to one small community with limited churn and apply the treatment to half the people, then if (as I expect) it's entirely a relative benefit we should see the control group do worse on absolute measurements of wellbeing.

Another way to avoid interventions that mostly give positional benefit is to keep mechanisms in mind. Height increase has no plausible mechanism for improving absolute wellbeing, while focused skills training does. This isn't ideal, because you can have non-intuitive mechanisms or miss the main way an intervention leads to your measured outcome, but it can still catch some of these.

What else can we do?

I also posted this on my blog.

Questioning and Respect

20 jkaufman 10 June 2014 10:52AM
A: [Surprising fact]
B: [Question]

When someone has a claim questioned, there are two common responses. One is to treat the question as a challenge, intended as an insult or indicating a lack of trust. If you have this model of interaction you think people should take your word for things, and feel hurt when they don't. Another response is to treat the question as a signal of respect: they take what you're saying seriously and are trying to integrate it into their understanding of the world. If you have this model of interaction then it's the people who smile, nod, and give no indication of their disagreement that are being disrespectful.

Within either of these groups you can just follow the social norm, but it's harder across groups. Recently I was talking to a friend who claimed that in their state income taxes per dollar went down as you earned more. This struck me as really surprising and kind of unlikely: usually it goes the other way around. [1] I'm very much in the latter group described above, while I was pretty sure my friend was in the former. Even though I suspected they would treat it as disrespectful if I asked for details and tried to confirm their claim, it would have felt much more disrespectful for me to just pretend to accept it and move on. What do you do in situations like this?

(Especially given that I think the "disagreement as respect" version builds healthier communities...)


[1] Our tax system does have regressive components, where poor people sometimes pay a higher percentage of their income as tax than richer people, but it's things like high taxes on cigarettes (which rich people don't consume as much), sales taxes (rich people spend less of their income), and a lower capital gains tax rate (poorer people earn way less in capital gains). I tried to clarify to see if this is what my friend meant, but they were clear that they were talking about "report your income to the state, get charged a higher percentage as tax if your income is lower".

I also posted this on my blog.

Cryonics As Untested Medical Procedure

16 jkaufman 17 January 2014 04:36PM

If you're trying to prevent information-theoretic death by preserving the brain it's critical that the information that makes you be "you" actually be preserved. If you could freeze the brain in a way that did keep around the necessary information then some future civilization might be able to recover the person or the memories, but if the information is gone it's gone for good. The problem is, this is an untested medical procedure, and it's not something we should expect to get right flying blind.

In freezing a brain there are obvious things that can go wrong. For example, if you just cool it down to below freezing the water in the cells will turn to sharp little ice crystals, disrupting synapse structure and making a huge mess. We know about this now, though, so since the early 2000s cryonics organizations have used "cryoprotectants" which are able to vitrify the brain tissue and reduce [1] ice crystal formation. Beyond these known problems, however, there are many aspects of the brain structure that might or might not be relevant. Is information stored in the positions of proteins within the cells? Are phosphorylation states significant? What scale of preservation is sufficient?

Our normal approach is to try something, see if it works, fix apparent problems, and try again, each cycle getting us closer to something that does work. With cryonics the "see if it works" step is missing; all we can do is check for known failures. So what we should expect is that the current process will be "good to the best of our knowledge", and then our knowledge of what matters will repeatedly expand and the process will need to be updated.

(Situations where current preservation technology fails to preserve something we know is required are actually kind of nice, because they're as close as we get to cryonics as an experimental science. Those are the cases when the process can actually improve because the feedback loop is temporarily closed.)

Imagine if in the development of In-Vitro Fertilization an inexplicable barrier stopped researchers from continuing any experiments past the "combine egg and sperm" stage. Instead they worked out something they thought was as good as they were going to get, documented it, and started freezing hopefully-fertilized eggs. How likely would it be that later we would be able to take these frozen eggs and complete the process? Much more likely would be that something unknown was wrong with the beginning of the process and these eggs would actually not be usable. Given that the brain is so much larger and more complex than these zygotes I expect the odds in the cryonics case are much worse.

Cryonics depends on a complex medical procedure developed under conditions of minimal feedback. Expectations for success like 80% or even more likely than not seem incredibly optimistic. When you can't test the output of a process because you don't know what counts as correct output it's very unlikely you've got the process right.

(I also posted this on my blog.)


[1] I say "reduce" instead of "eliminate" because as far as I can tell no one has actually taken random samples from a human brain that's been preserved with vitrification. There are ethical reasons why the cryonics organizations would not want to do this, but there being reasons why we don't wish to run a test doesn't mean we can act as if we already know the answer.

Be Skeptical of Correlational Studies

8 jkaufman 20 November 2013 10:19PM

People kept noticing that blood donors were healthier than non-donors. Could giving blood be good for you, perhaps by removing excess iron? Perhaps medieval doctors practicing blood-letting were onto something? Running some studies (1998, 2001) this does seem to be a real correlation, so you see articles like "Men Who Donate Blood May Reduce Risk Of Heart Disease."

While this sounds good, and it's nice when helpful things turn out to be healthy, the evidence is not very strong. When you notice A and B happen together it may be that A causes B, B causes A, or some hidden C causes both A and B. We may have good reasons to believe A might cause B, but it's very hard to rule out a potential C. If instead you intentionally manipulate A and observe what happens to B, you can actually see how much of an effect A has on B.

For example, people observed (2003) that infants fed soy-based formula were more likely to develop peanut allergies, so they recommended that "limiting soy milk or formula in the first 2 years of life may reduce sensitization." Here A is soy formula, B is peanut allergy, and we do see a correlation. When intentionally varying A (2008, n=620), however, B stays constant, which sinks the whole theory. A likely candidate for a third cause, C, was a general predisposition to allergies: those infants were more likely to react to cow's-milk formula and so be given soy-based ones, and they were also more likely to react to peanuts.
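The hidden-cause pattern is easy to reproduce in a toy simulation; every probability below is invented for illustration, not taken from the studies:

```python
import random

random.seed(0)

def simulate_infant():
    # Hidden cause C: a general predisposition to allergies.
    predisposed = random.random() < 0.2
    # C raises both A (soy formula, via reacting to cow's-milk formula)
    # and B (peanut allergy); A has no causal effect on B here.
    soy_formula = random.random() < (0.6 if predisposed else 0.2)
    peanut_allergy = random.random() < (0.3 if predisposed else 0.05)
    return soy_formula, peanut_allergy

infants = [simulate_infant() for _ in range(100_000)]
soy = [b for a, b in infants if a]
no_soy = [b for a, b in infants if not a]
print(sum(soy) / len(soy) > sum(no_soy) / len(no_soy))  # True: a clear
# correlation between A and B, even though A never causes B
```

An observational study on this simulated population would find the soy-peanut correlation every time; only intervening on `soy_formula` directly would reveal that it does nothing.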

To take another example, based on studies (2000, 2008, 2010) finding a higher miscarriage rate among coffee drinkers, pregnant women are advised to cut back their caffeine consumption. But a randomized controlled trial (2007, n=1207) found people randomly assigned to drink regular or decaf coffee were equally likely to have miscarriages. [EDIT: jimrandomh points out that I misread the study and it didn't actually show this. Instead it was too small a study to detect an effect on miscarriage rates.] A potential third cause (2012) here is that lack of morning sickness is associated with miscarriage (2010), and when you're nauseated you're less likely to drink a morning coffee. This doesn't tell us the root of the problem (why would feeling less sick go along with miscarriages?) but it does tell us cutting back on caffeine is probably not helpful.

Which brings us back to blood donation. What if instead of blood donation making you healthy, healthier people are more likely to donate blood? There's substantial screening involved in becoming a blood donor, plus all sorts of cultural and economic factors that could lead to people choosing to donate blood or not, and those might also be associated with health outcomes. This was noted as a potential problem in 2011 but it's hard to test this with a full experiment because assigning people to give blood or not is difficult, you have to wait a long time, and the apparent size of the effect is small.

One approach that can work in places like this is to look for a "natural experiment," some way in which people might already be being divided into appropriate groups. A recent study (2013, n=12,357+50,889) took advantage of the situation where screening tests sometimes give false positives that disqualify people. These are nearly random, and give us a pool of people who are very similar to blood donors but don't quite make it to giving blood. When comparing the health of these disqualified donors to actual donors the health benefits vanish, supporting the "healthy donor hypothesis."

This isn't to say you should never pay attention to correlations. If your tongue starts peeling after eating lots of citric acid you should probably have less in the future, and the discovery (1950) that smoking causes lung cancer was based on an observation of correlations. Negative results are also helpful: if we don't find a correlation between hair color and musical ability then it's unlikely that one causes the other. Even in cases where correlational studies only provide weak evidence, however, they're so much easier than randomized controlled trials that we still should do them if only to find problems to look into more deeply with a more reliable method. But if you see a news report that comes down to "we observed people with bad outcome X had feature Y in common," it's probably not worth trying to avoid Y.

I also posted this on my blog.

Supplementing memory with experience sampling

13 jkaufman 28 October 2013 11:52AM

If you asked me how happy I've been, I'd think back over my recent life and synthesize my memories into a judgement. Since I'm the one experiencing my life you would think this would be accurate, but our memories aren't fair. For example, people who had their hand in 57° water for 60 seconds rated the experience as less pleasant than people who had their hand in the same 57° water for the same 60 seconds, followed by 30 seconds with the water slowly rising to 59°. (Kahneman 1993, pdf) This is the peak-end rule where when we look back at an experience we don't really consider the duration and instead evaluate it based on how it was at its peak and how it ended.

This disagreement between emotion as it is experienced and emotion as it is remembered is called the memory-experience gap, and the peak-end rule is only one of the causes. The problem is, generally we only have access to memories of our emotion, which means if you're given the ice-water choice you'll repeatedly choose the option with more suffering. How can we get around this?

When psychologists want to get at experiential emotion they give people little timers. Every time the timer goes off, the person writes down how happy or sad they are at that moment. This external sampling method lets us use any sort of aggregation we would like, and it's fair in a way our internal methods are not. When I first read about this I thought "neat" and moved on, but recently I realized that with a computer in my pocket I could do this myself. After asking around I ended up with the TagTime Android app, which is the only way I've found to do this that (a) works without an internet connection and (b) has an equal probability of sampling at every moment.
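The "equal probability of sampling at every moment" property comes from drawing the gaps between pings from an exponential distribution, i.e. a Poisson process, which is memoryless: no matter how long it's been since the last ping, the next one is equally likely to arrive at any instant. A minimal sketch of that scheme; the 45-minute mean gap is an assumption here, and the function is mine, not TagTime's code:

```python
import random

def ping_times(mean_gap_minutes, total_minutes):
    """Poisson-process pings: exponential gaps make every moment
    equally likely to be sampled, so pings can't be anticipated."""
    t, times = 0.0, []
    while True:
        t += random.expovariate(1 / mean_gap_minutes)  # memoryless gap
        if t > total_minutes:
            return times
        times.append(t)

random.seed(0)
day = ping_times(mean_gap_minutes=45, total_minutes=24 * 60)
print(len(day))  # roughly 32 on average, varying run to run
```

A fixed-interval timer would fail property (b): you'd learn the schedule and could, consciously or not, adjust behavior around the pings.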

The response screen looks like:

You tap tags to say which ones currently apply. I have them sorted by frequency. To add new tags you turn the phone sideways and type text:

That's a little annoying, but most of the time I'm not entering a new tag.

I have tags for happiness (numbers 0-9, added as I need them), for aspects of activities, and for people I'm with. Every so often I email the data to myself and add it to my full log which backs a graph:

Retrospective happiness still matters; you want to be happy with your life looking back. Because this is our memory, however, we're already aware of it and already optimize for it in our life. Adding sampled data should allow us to adjust that optimization to fix the things that are important but hidden by our biased memories.
