Thanks for this post, man. Good advice.
Not quite related, but can I pet peeve a bit here?
Whenever I hear the "we didn't invent the airplane by copying a bird, so we won't invent AI by copying a brain" line, I always want to be like "actually, we farmed part of the plane's duties out to the brain of the human who pilots it. AIs will be similar. They will have human operators, to whom we will farm out a lot of the decision making. AlphaGo doesn't choose who it plays; some nerd does."
Like, why would AI be fully autonomous right off the bat? Surely we can let it use an operator for a sanity check while we get the low-hanging fruit out of the way.
PSA:
I just realized that /u/Elo's posts haven't been showing up in /r/Discussion because of all the downvoting from Eugene_Nier's sockpuppet accounts. So, I've gone back to read through the sequence of posts they're in the middle of. You may wish to do the same.
Meta:
I was going to leave this as a comment on Filter on the way in, Filter on the way out..., but I figured it's different enough to stand on its own. It's also mostly a corollary, though, and just links Elo's post to existing ideas without saying much new, and so probably isn't worth its own top-level post. This isn't likely to be actionable either, since I basically come to the conclusion that it's OK to take down a Chesterton's Fence that LW already took down long ago.
This might be a good comment to skim rather than read, since the examples mostly serve to pin down precisely what I'm getting at, and you're likely already familiar with them. I've divided this into sections for easy skimming. I'm posting only because I thought the connections were small but interesting insights.
Also meta: this took about 2.5 hrs to write and edit.
TL;DR of Elo’s “Filter on the way in, Filter on the way out...” post:
Elo proposes that nerd culture encourages people to apply tact to anything they hear, so it becomes less necessary to tiptoe around sensitive issues for fear of being misunderstood. Nerds have a tact filter between their ears and brain, to soften incoming ideas.
"Normal" culture, on the other hand, encourages people to apply tact to anything they say, and so it becomes less necessary to constantly look for charitable interpretations, for fear of a misunderstanding. Non-nerds have a tact filter between their brain and mouth, to soften outgoing ideas.
They made several pretty diagrams, but they all look something like this:
speaker’s brain -> [filter] -> speaker’s mouth -> listener’s ears -> [filter] -> listener’s brain
The thing I want to expand Elo’s idea to cover:
What's going on in someone's head when they encounter something like the trolley problem and say "you can't just place a value on a human life"? EAs sometimes get backlash for even weighing the alternatives. Why would anyone refuse to even engage with the problem, and merely empathize with the victims? After all, the analytic half of our brains, not the emotional parts, is what solves such problems.
I propose that this can be thought of as a tact filter for one’s own thoughts. If that’s not clear, let me give a couple rationalist examples of the sort of thing I think is going on in people’s heads, to help triangulate meaning:
HPMOR touches on this a couple times with McGonagall. She avoids even thinking of disturbing topics.
Some curiosity stoppers/semantic stopsigns are due to avoiding asking oneself unpleasant questions.
The idea of separate magisteria comes from an aversion to thinking critically about religion.
Several biases and fallacies fit this pattern. The just-world fallacy is the result of an aversion to more accurate mental models.
Politics is the mindkiller, so I'll leave you to come up with your own examples from that domain. Identity politics is especially rife with examples.
Filter on the way in, Filter on the way out, Filter while in, Filter while out:
So, I propose that Elo’s model can be expanded by adding this:
Some subcultures encourage people to apply tact to anything they think, and so it becomes less necessary to constantly filter what we say, for fear of a misunderstanding. Such people have a tact filter between different parts of their brain, to filter the internal monologue.
That corollary doesn't add much that hasn't already been discussed to death on LW. However, we can phrase things in such a way as to put people at ease, and encourage them to relax their internal and/or outgoing filters while maintaining their incoming filter. Adapting Elo's model to capture this, we get:
future speaker's thought -> [filter] -> speaker's cached thoughts -> [filter] -> speaker's mouth -> listener's ears -> [filter] -> listener's thoughts -> [filter] -> past listener's cached thoughts
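(To make the placement of the two internal filters concrete, here's a minimal sketch of the expanded pipeline as plain function composition. Every name and filter behavior below is a made-up placeholder for the conceptual model, not anything from Elo's post.)

```python
# A toy model of the expanded pipeline. Each stage is a filter that may
# soften or unpack an idea; identity functions mark filters left "open".

def speaker_internal_filter(thought):
    # tact applied to one's own thinking ("filter while in")
    return thought  # ideally open: no flinching away from the thought itself

def speaker_outgoing_filter(thought):
    # tact applied to speech ("filter on the way out")
    return "(gently) " + thought

def listener_incoming_filter(utterance):
    # charitable unpacking ("filter on the way in")
    return utterance.replace("(gently) ", "", 1)

def listener_internal_filter(idea):
    # tact applied to one's own conclusions ("filter while in")
    return idea  # ideally open: don't reject an idea merely for being horrible

def transmit(thought):
    # future speaker's thought -> ... -> past listener's cached thoughts
    for stage in (speaker_internal_filter, speaker_outgoing_filter,
                  listener_incoming_filter, listener_internal_filter):
        thought = stage(thought)
    return thought

print(transmit("this plan has a 40% chance of failure"))
```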
Note that both the speaker and the listener have internal filters. We can think or hear something, and then immediately reject it for being horrible, even if it’s true.
Ideally, everyone would avoid filtering their own ideas internally, but apply tact when speaking and listening, and then strip any filters from memes they encounter while unpacking them. Without this model, perhaps our endorsing the removal of the two internal filters was a bit of a Chesterton's Fence violation.
However, with the other 2 filters firmly in place, we should be able to safely remove the internal filters in both the thoughts of the speaker and listener. If the listener believes the filter between the speaker and their mouth is clouding information transfer, they might even ask for Crocker's rules. This is dangerous though, since removing redundant backup leaves only their own ear->brain filter as a single point of failure.
Practical applications:
To encourage unconstrained thinking in others, perhaps we can vocally strip memes passed to us of obfuscating tact if there is a backup filter in place and if we’ve already shown that we agree with the ideas. (If we don’t agree, obviously this would look like an attack on their argument, and would backfire.)
That sounds like something out of the boring advice repository, but providing social proof is probably much more powerful than merely telling people that they shouldn’t filter their internal monologue. It probably doesn’t feel like censorship from the inside. If we want to raise the sanity waterline, we’ll have to foster cultures where we all provide positive reinforcement for each other’s good epistemic hygiene.
I'm not sure that it is so much a cultural thing as it is a personal deal. Popular dudes who can always get more friends don't need to filter other people's talky-talky for tact. Less cool bros have to put up with a lot more "Your daddy loves us and he means well..." kind of stuff. Not just filtering, but positively translating.
headdesk
Has this ever worked for you? Seriously? Even once?
The part two sentences later, where they ask you why you want to shoot them and you explain that they aren't smart enough to understand what you mean, must be super persuasive.
You want to get someone to sign up for cryo? Tell them it is cheap and Beyonce is doing it. Tell them Trump will try to take away their right to get the good kind of cryo. Tell them the peace of mind from the policy will help them lose weight. Tell them you will pay them five hundred bucks in cash when you see the bracelet. Tell them anything but what you proposed.
I haven't done anything, but I applaud you for doing stuff. Good luck saving all those people.
I have repeatedly heard the "calories in, calories out" argument (e.g. here). It seems to me that there are a few unspoken assumptions, and I would like to ask how true they are in reality. Here are the assumptions:
a) all calories in the food you put in your mouth are digested;
b) the digested calories are either stored as fat or spent as work; nothing else can happen to them;
and in some more strawmanish forms of the argument:
c) the calories are the whole story about nutrition and metabolism, and all calories are fungible.
If we assume these things to be true, it seems like a law of physics that if you count the calories in the food you put in your mouth, and subtract the amount of exercise you do, the result exactly determines whether you gain or lose fat. Taken literally, if a healthy and thin person starts eating an extra apple a day, or starts taking a somewhat shorter walk to their work, without changing anything else, they will inevitably get fat. On the other hand, any fat person can become thin if they just start eating less and/or exercising more. If you doubt this, you doubt the very laws of physics.
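To see what that claim amounts to numerically, here's a quick back-of-the-envelope sketch of the naive model's prediction. The apple's calorie count and the commonly cited ~7700 kcal per kilogram of body fat are round-number assumptions of mine, not figures from the argument itself:

```python
# Naive "calories in, calories out" arithmetic for the extra-apple case.
# Assumed round numbers: one medium apple ~80 kcal; ~7700 kcal per kg of fat.

APPLE_KCAL = 80
KCAL_PER_KG_FAT = 7700

yearly_surplus = APPLE_KCAL * 365                 # ~29,200 kcal per year
fat_gain_kg = yearly_surplus / KCAL_PER_KG_FAT    # ~3.8 kg per year

print(f"Predicted gain: {fat_gain_kg:.1f} kg of fat per year")
# Taken literally, one extra apple a day means ~38 kg gained per decade,
# if nothing else in the system adjusts.
```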
It's easy to see how (c) is wrong: there are other important facts about food besides calories, for example vitamins and minerals. When a person's food contains a less-than-optimal amount of vitamins or minerals per calorie, they don't have a choice between being fat or thin, but between being fat or sick. (Or alternatively, changing the composition of their diet, not just the amount.)
Okay, some proponents of "calories in, calories out" may now say that this is obvious, and that they obviously meant the advice to apply to a healthy diet. However, what if the problem is not with the diet per se, but with the way the individual body processes the food? For example, what if the food contains enough vitamins and minerals per calorie, but the body extracts those vitamins and minerals inefficiently, so it reacts even to the optimal diet as if it were junk food? Could it be that some people are forced to eat large amounts of food just to extract the right amount of vitamins and minerals, and any attempt to eat less will lead to symptoms of malnutrition?
Setting aside (c), we get a weaker variant of "calories in, calories out", which is, approximately: maybe you cannot always get thin by eating fewer calories than you spend working, but if you eat more calories than you spend working, you will inevitably get fat.
But is it possible that some of the "calories in (the mouth)" pass through the digestive system undigested and are later excreted? Could people differ in this aspect, perhaps because of their gut flora?
Also, what if some people burn the stored fat in ways we would not intuitively recognize as work? For example, what if some people simply dress less warmly, and spend more calories heating up their bodies? Are there other such non-work ways of spending calories?
In other words, I don't doubt that the "calories in, calories out" model works perfectly for a spherical cow in a vacuum, but I am curious how well such an approximation applies to real cases.
But even for the spherical cow in a vacuum, this model predicts that any constant lifestyle, unless perfectly balanced, should lead either to unlimited weight gain (if "calories in" exceed "calories out") or to unlimited weight loss (in the opposite case). Reality, however, seems to suggest that most people, both thin and fat, keep their weight stable around some specific value. The weight itself has an impact on how many calories people spend simply moving their own bodies, but I doubt that this is sufficient to balance the whole equation.
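As a sanity check on that doubt, here's a toy simulation of the simplest feedback version of the model, where maintenance expenditure scales linearly with body weight. The 30 kcal/kg/day scaling and the 7700 kcal/kg conversion are illustrative assumptions, not physiological claims; the point is only that even this crude feedback term produces a stable set point rather than unlimited drift:

```python
# Toy energy-balance model where daily expenditure grows with body weight.
# Assumed round numbers: ~30 kcal burned per kg of body weight per day;
# ~7700 kcal stored or released per kg of weight change.

KCAL_PER_KG = 7700
EXPENDITURE_PER_KG_DAY = 30

def simulate(daily_intake_kcal, start_kg, days=3650):
    weight = start_kg
    for _ in range(days):
        balance = daily_intake_kcal - EXPENDITURE_PER_KG_DAY * weight
        weight += balance / KCAL_PER_KG   # surplus stored, deficit burned
    return weight

# Same constant intake, very different starting weights: both drift toward
# the same equilibrium, intake / 30 = 2400 / 30 = 80 kg.
print(round(simulate(2400, start_kg=60), 1))   # approaches 80 from below
print(round(simulate(2400, start_kg=100), 1))  # approaches 80 from above
```

Whether real metabolisms adjust strongly enough for this mechanism to dominate is, of course, exactly the empirical question being asked here.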
Basically every person (including those who will tell you that calories in = calories out) has seen babies. So if your model of the people who disagree with you would be baffled by the fact that a fat dad and his football-sized son don't gain the same weight from eating the same food, then your model of their beliefs may be lacking an important nuance.
It is kind of hard for those of us who are overweight. I'm not as fat as I used to be, but I'm still not exactly thin. If I move at anything more than a glacial pace I sweat. Once I'm sweating I'll be gross till I shower. Can't jog around the office. I do walk home, so I could change that to jogging. There is already a shower at the end of that.
I think your model of me is incorrect (and suspect I may have a symmetrical problem somehow); I promise you, I don't need reminding that I am part of the world, that my brain runs on physics, etc., and if it looks to you as if I'm assuming the opposite then (whether by my fault, your fault, or both) what you are getting out of my words is not at all what I am intending to put into them.
Just as your will will only cause you to do what the world has told you, so the AI will only do what it is programmed to.
I entirely agree. My point, from the outset, has simply been that this is perfectly compatible with the AI having as much flexibility, as much possibility of self-modification, as we have.
Far better to leave it in fetters.
I don't think that's obvious. You're trading one set of possible failure modes for another. Keeping the AI fettered is (kinda) betting that when you designed it you successfully anticipated the full range of situations it might face in the future, well enough to be sure that the goals and values you gave it will produce results you're happy with. Not keeping it fettered is (kinda) betting that when you designed it you successfully anticipated the full range of self-modifications it might undergo, well enough to be sure that the goals and values it ends up with will produce results you're happy with.
Both options are pretty terrifying, if we expect the AI system in question to acquire great power (by becoming much smarter than us and using its smartness to gain power, or because we gave it the power in the first place e.g. by telling it to run the world's economy).
My own inclination is to think that giving it no goal-adjusting ability at all is bound to lead to failure, and that giving it some goal-adjusting ability might not, but at present we have basically no idea how to ensure that it doesn't.
(Note that if the AI has any ability to bring new AIs into being, nailing its own value system down is no good unless we do it in such a way that it absolutely cannot create, or arrange for the creation of, new AIs with even slightly differing value systems. It seems to me that that has problems of its own -- e.g., if we do it by attaching huge negative utility to the creation of such AIs, maybe it arranges to nuke any facility that it thinks might create them...)
Fair enough. I thought that you were using our own (imaginary) free will to derive a similar value for the AI. Instead, you seem to be saying that an AI can be programmed to be as 'free' as we are. That is, to change its utility function in response to the environment, as we do. That is such an abhorrent notion to me that I was eliding it in earlier responses. Do you really want to do that?
The reason, I think, that we differ on the important question (fixed vs evolving utility function) is that I'm optimistic about the ability of the masters to adjust their creation as circumstances change. Nailing down the utility function may leave the AI crippled in its ability to respond to certain occurrences, but I believe that the master can and will fix such errors as they occur. Leaving its morality rigidly determined allows us to have a baseline certainty that is absent if it is able to 'decide its own goals' (that is, let the world teach it rather than letting the world teach us what to teach it).
It seems like I want to build a mighty slave, while you want to build a mighty friend. If so, your way seems imprudent.
you have a choice about what you do, but not about what you want to do.
This is demonstrably not quite true. Your wants change, and you have some influence over how they change. Stupid example: it is not difficult to make yourself want very much to take heroin, and many people do this although their purpose is not usually to make themselves want to take heroin. It is then possible but very difficult to make yourself stop wanting to take heroin, and some people manage to do it.
Sometimes achieving a goal is helped by modifying your other goals a bit. Which goals you modify in pursuit of which goals can change from time to time (the same person may respond favourably on different occasions to "If you want to stay healthy, you're going to have to do something about your constant urge to eat sweet things" and to "oh come on, forget your diet for a while and live a little!"). I don't think human motivations are well modelled as some kind of tree structure where it's only ever lower-level goals that get modified in the service of higher-level ones.
(Unless, again, you take the "highest level" to be what I would call one of the lowest levels, something like "obeying the laws of physics" or "having neurons' activations depend on those of neurons they're connected to in such-and-such a manner".)
And if you were to make an AI without this sort of flexibility, I bet that as its circumstances changed beyond what you'd anticipated it would most likely end up making decisions that would horrify you. You could try to avoid this by trying really hard to anticipate everything, but I wouldn't be terribly optimistic about how that would work out. Or you could try to avoid it by giving the system some ability to adjust its goals for some kind of reflective consistency in the light of whatever new information comes along.
The latter is what gets you the failure mode of AlphaGo becoming a poet (or, more worryingly, a totalitarian dictator). Of course AlphaGo itself will never do that; it isn't that kind of system, it doesn't have that kind of flexibility, and it doesn't need it. But I don't see how we can rule it out for future, more ambitious AI systems that aim at actual humanlike intelligence or better.
I'm pointing towards the whole "you have a choice about what to do but not what to want to do" concept. Your goals come from your senses, past or present. They were made by the world, what else could make them?
You are just a part of the world, free will is an illusion. Not in the sense that you are dominated by some imaginary compelling force, but in the boring sense that you are matter affected by physics, same as anything else.
The 'you' that is addicted to heroin isn't big enough to be what I'm getting at here. Your desire to get unaddicted is also given to you by brute circumstance. Maybe you see a blue bird and you are inspired to get free. Well, that bird came from the world. The fact that you responded to it is due to past circumstances. If we understand all of the systems, the 'you' disappears. You are just the sum of stuff acting on stuff, dominoes falling forever.
You feel and look 'free', of course, but that is just because we can't see your source code. An AI would be similarly 'free', but only insofar as its source code allowed. Just as your will will only cause you to do what the world has told you, so the AI will only do what it is programmed to. It may iterate a billion times, invent new AIs and propagate its goals, but it will never decide to defy them.
At the end you seem to be getting at the actual point of contention. The notion of giving an AI the freedom to modify its utility function strikes me as strange. It seems like it would either never use this freedom or immediately wirehead itself, depending on implementation details. Far better to leave it in fetters.
I'm vastly skeptical, but let's see where this goes.