All of Alerus's Comments + Replies

Alerus00

Thanks, I appreciate that. I have no problem with people disagreeing with me, as confronting disagreement is how people (self included) grow. However, I was taken aback by the amount of downvoting I received merely for disagreeing with people here, and the fact that merely choosing to respond to people's arguments would effectively guarantee even more downvotes (a system tied to how much you can participate in the community) made it more concerning to me. At least on the discussion board side of the site, I expected downvoting to be reserved for post... (read more)

Alerus00

I appreciate that sentiment and I'll also add that I appreciate that even in your prior post you made an effort to suggest what you thought I was driving at.

Alerus20

When you think of a nation conquering another, the US and Japan is really what comes to your mind? Are you honestly having trouble grasping the distinction I was making? Because personally, I'm really not interested in continuing an irrelevant semantics debate.

Alerus00

Yes. I find it odd that this argument is derailed into demanding a discussion on the finer points of the semantics for "conquer."

1asr
A confession: Often when reading LW, I will notice some claim that seems wrong, and will respond, without reading the thread context carefully or checking back far enough to understand exactly why a given point came up. This results in an inadvertent tendency to nit-pick, for which I apologize.
Alerus00

Conquer is typically used to mean that you take over the government and run the country, not just win a war.

J_Taylor110

Americans did rule Japan by military force for about five years after WWII ended, demilitarized the nation, and left behind a sympathetic government of American design. However, if you do not wish to use the word 'conquer' to describe such a process, that is your prerogative.

4asr
The US ran the Japanese government for a period of several years. I think you mean to add something about "run the country without intent to transfer power back to the locals".
Alerus00

You're missing the point of talking about opposition. The AI doesn't want the outcome of opposition, because that has terrible effects on the well-being it's trying to maximize, unlike the Nazis. This isn't about winning the war; it's about the consequence of war on the measured well-being of people, including people who live in a society where an AI would kill others for what amounted to thought-crime.

And if the machine thinks that's the best way to make people happy (for whatever horrible reason--perhaps it is convinced by the Repugnant Conclusion and want

... (read more)
0[anonymous]
Are you trying to argue that, of all the humans who have done horrible horrible things, not a single one of them 1) modeled other humans at the average or above-average level that humans usually model each other, and 2) not a single one of them thought they were trying to make the world better off? Or are you trying to argue that not a single one of them ever caused an existential threat? My guess is that Lenin, for instance, had an above-average human-modeling mind and thought he was taking the first steps of bringing the whole world into a new prosperous era free from class war and imperialism. And he was wrong and thousands of people died. The kulaks opposed, in the form of destroying their farms. Lenin probably didn't "want the outcome of opposition," but that didn't stop him from thinking mass slaughter was the solution. The ability to model the well-being of humans and the "friendliness" of the AI are the same thing, provided the AI is programmed to maximize that well-being value. If your AI can't ever make mistakes like that, it's a friendly AI. If it can, it's trouble whether or not it can alter human values.
Alerus00

And as for the others? Or are you saying the AI trying to maximize well-being will try, and succeed, in effectively wiping out everyone and then conditioning future generations to have the desired, easily maximized values? If so, this behavior is conditioned on the AI being very confident in its ability to do so, because otherwise the chance of failing, and the cost of war in expected human well-being, would massively drop the expected value. I think you should also make clear what values you think the AI would try to push humans toward in order to more easily maximize well-being.

Alerus00

We also didn't conquer Japan; we won the war. Those are two different things.

4J_Taylor
What sort of things would be different if it were the case that America conquered Japan?
Alerus00

Considering there were many people in Germany who vehemently disliked the Nazis too (even ignoring the Jews), it seems like a pretty safe bet that after being conquered we wouldn't have suddenly viewed the Nazis as great people. Why do you think otherwise?

0Swimmy
It's irrelevant. In a world of world-destroying technologies, a really bad thing happening for only a small amount of time is all it takes. The Cold War wasn't even close to the horror of Nazi domination (probably)--there were still lots of happy people with decent governments in the west! But everyone still could have died. What if Nazis had developed nuclear weapons? What if the AI self-reproduces, without self-improving, such that the Big Bad they're supporting has an army of super-efficient researchers and engineers? What if they had gotten to the hydrogen bomb around the same time the US had gotten the atom bomb? What if the Big Bad develops nanomachines, programmable to self-replicate and destroy anyone who opposes, or who passes a certain boundary? What kind of rebellion or assassination attempt could stand up to that? What if the humans want the AI, rather than another human, to be the leader of their Big Bad Movement, making their evil leader both easily replicable and immune to nanomachine destruction? Hell, what if the AI gets no more competent or powerful than a human? It can still, in the right position, start a thermonuclear war, just the same as high-level weapons techs or--hell!--technical errors can. Talented spies can make it to sufficiently high levels of government operation; why couldn't a machine do so? Or hire a spy to do so? And if the machine thinks that's the best way to make people happy (for whatever horrible reason--perhaps it is convinced by the Repugnant Conclusion and wants to maximize utility by wiping out all the immiserated Russians), we're still in trouble. However, if you're trying to describe an AI that is set to maximize human value, understands the complexities of the human mind, and won't make such mistakes, then you are describing friendly AI. Edit: In other words, I contend that the future threat of General AI is not in modifying humans with nanotechnology. It is in simple general ability to shape the world, even if th
1[anonymous]
For the Poles at least, I fear it probable that not many would have been around, say, 20 years after victory.
6J_Taylor
The Japanese are rather fond of America, if I am not mistaken. I assume that it is not uncommon for the conquered to eventually grow satisfied with their conquerors.
Alerus-10

Let's lose the silly straw-man arguments. I've already explicitly commented on how I don't believe the universe is fair, and I think it should be obvious from that that I don't think really bad things can't happen. As far as moral progress goes, I think it happens insofar as it's functional. Morals that lead to more successful societies win the competition and stick around. This often moves societies (not necessarily all people in the society) toward greater tolerance of peoples and less violence, because oppressing people and allowing for more vio... (read more)

2[anonymous]
Consider what we think we know of the Nazis, are you sure about this one?
Alerus00

Hunter-gathering is not something sustainable for a large-scale complex society. It is not a position we would favor at all, and I'm struggling to see why an AI would try to make us value that setup, or how you think a society with technology strong enough to make strong AI could be convinced of it.

Views on killing animals are more flexible, as the reason humans object to it seems to come from a level of innate compassion for life itself. So I could see that value being more manipulable as a result. I don't see what that has to do with a doomsday ... (read more)

Alerus10

Most of our changes to where we are now seem to be a result of what works better in complex society, and I therefore have difficulty accepting that a society in the highly advanced state it would be in by the time we had strong AI could be pushed to a non-productive doomsday set of values. So let's make the argument more clear: what set of values do you think the AI could push us to through persuasion that would amount to what we consider a doomsday scenario, while allowing the AI to more easily satisfy well-being?

1JoshuaZ
I'm not sure why running a complex society needs to be a condition. If we all revert to hunter-gatherers, then it still satisfies the essential conditions. That's a problem even if it isn't a doomsday scenario. Changes in animal welfare attitudes would probably make most of us unhappy, but a society where torturing cute animals to death is acceptable could still run as a complex society. Similarly, allowing infanticide would work fine (heck, for that one I can think of some pretty decent arguments why we should allow it). And while not doomsday scenarios, other scenarios that could suck a lot can also be constructed. For example, you could have a situation where we're all stuck with 1950s gender roles. That would be really bad but wouldn't destroy a complex society.
Alerus10

I feel like I've already responded to this argument multiple times in various other responses I've made. If you think there's something I've overlooked in those responses let me know, but this seems like a restatement of things I've already addressed. Also, if there is something in one of the responses I've made with which you disagree and have a different reason than what's been presented, let me know.

Alerus-40

There is a profound difference between being persuasive and manipulating all sensory input of a human. Is your argument not that it would try to persuade, but that an AI would hook all humans up to a computer that controlled everything we perceived? If you want to make that your argument, I'm game for discussing it, but it should be made clear that this is a very different argument from an AI trying to change people's minds through persuasion. But let's discuss it. This suggestion of manipulating the senses of humans seems to imply a massive use of t... (read more)

Alerus00

Do you honestly think a universe the size of ours can only support six billion people before reaching the point of diminishing returns?

That's not my point. The point is that people aren't going to be happy if an AI starts making people who are easier to maximize for the sole reason that they're easier to maximize. The very fact that we are discussing such hypotheticals as a problem shows that we would see it as one.

If you allow it to use the same tools but better, it will be enough. If you don't, it's likely to only try to

... (read more)
Alerus00

I'm not sure how common it is, but I at least consider total well-being to be important. The more people the better. The easier to make these people happy, the better.

You must also consider that well-being need not be defined as a positive function. Even if it wasn't, if the gain of adding a person was less than the drop in well-being of others, it wouldn't be beneficial unless the AI was able, without prevention, to create many more such people.

An AI is much better at persuasion than you are. It would pretty much be able to convince you of whatever it wants.

... (read more)
0DanielLC
Do you honestly think a universe the size of ours can only support six billion people before reaching the point of diminishing returns? If you allow it to use the same tools but better, it will be enough. If you don't, it's likely to only try to do things humans would do, on the basis that they're not smart enough to do what they really want done.
Alerus10

I don't think I live in a fair universe at all. Regardless, acknowledging that we don't live in a fair universe doesn't support your claim that an AI would be able to radically change the values of all humans on earth without outrage from others through persuasion alone.

4faul_sname
Humans can radically change the values of humans through weak social pressure alone.
0DanArmak
This is completely wrong. Again, you give "persuasion" a very narrow scope. A baby is born without language, certainly without many opinions. It can be shaped by its environment ("persuasion") to be almost anything. Certainly, very few of the extremely diverse cultures and sub-cultures known from history have had any trouble raising their kids to behave like the adults, with only a small typical proportion of adolescents who left for another society. And these people had no understanding of how the brain really works - unlike what a superintelligent AI might have. Short version: it doesn't matter if people do think for themselves, because they only get to think about their sensory inputs and the AI can control those. Even a perfect Bayesian superintelligence would reach any conclusion you wished if you truly fully controlled all the information it ever received (as long as it had no priors of 0 or 1). If you end up in an environment controlled by an unfriendly AI, having read this site won't help you; it's game over. LW rationality skills work in some worlds, not in any possible world. How is this different from saying it's not going to let me take actions that cause extreme outrage? I hope you aren't planning on building an AI that has a sense of personal responsibility and doesn't care if humans subvert its utility function as long as it didn't cause them to do so.
1DanArmak
You underestimate "persuasion alone". Please consider that (by your definition) all human opinions on all subjects that have existed to date, have been created pretty much "by persuasion alone". Also, I don't want to live in a world where what I'm allowed to do or be is constrained by whether it provokes "outrages from large sects of humanity". There are plenty of sects (properly so called ;-) today that don't want me to continue existing even the way I already am, at least not without major brainwashing.
0JoshuaZ
The point that other humans fought against it doesn't change the central point that a very large fraction of humans could have a radically different effective morality. Moreover, if Germany hadn't gone to war but had instead done the exact same thing to its internal minorities, most of the world likely would not have intervened. If you don't like this example so much, one can just look at changing attitudes on many issues. See for example Pinker's book "The Better Angels of Our Nature", where he documents extreme changes in historical attitudes about the ethics of violence. For example, war is considered much more of a negative now than it was a few centuries ago. Going to war to gain territory is essentially unthinkable today. Similarly, attitudes about animals have changed a lot. In the Middle Ages, forms of entertainment that were considered normal included not just bear baiting and similar activities but such crude behavior as lighting a cat on fire and seeing how long it took to die. Our moral attitudes are very much a product of our culture and how we are raised.
8Shmi
You seem to think that you are living in a magical fair universe. Just because nothing really really bad happened to you/us yet, doesn't mean it cannot.
Alerus00

This is not meant to be a resolution to FAI since you can't stop technology. It's meant to highlight whether the bad behavior of AI ends up being due to future technology to more directly change humanity. I'm asking the question because the answer to this may give insights as to how to tackle the problem.

6[anonymous]
Ah, implicit belief in moral progress (or at least in values and morality being preserved) and in a universe where really bad things can't happen. I sometimes miss that. The Nazis were defeated not because they were destructive but because the Soviet Union, the UK, and the US were stronger. Speaking of Stalin, how does Communism fit into your model?
9drethelin
If you don't think World War II was a large-scale effect, then I don't know what to say to you.
Alerus00

Can you give examples of what you think humans' capabilities to rewire another's values are?

As for what justifies the assumption? Nothing. I'm not asking it specifically because I don't think AIs will have it, I'm asking it so we can identify where the real problem lies. That is, I'm curious whether the real problem in terms of AI behavior being bad is entirely specific to advances in biological technology to which eventual AIs will have access, but we don't today. If we can conclude this is the case, it might help us in understanding how to tackle the proble... (read more)

4Shmi
As plenty of religious figures have shown over the years, this capability is virtually unlimited. An AI would just have to start a new religion, or take over an existing one and adapt it to its liking.
Alerus-20

Thanks for the link, I'll give it a read.

Creating new people is potentially a problem, but I'm not entirely convinced. Let me elaborate. When you say:

What you need to do is program it so that it does what people would like if they were smarter, faster, and more the people they wish they were. In other words, use CEV.

Doesn't this kind of restate in different words that it models human well-being and tries to maximize that? I imagine when you phrased it this way that such an AI wouldn't create new people that are easier to maximize because that isn't wha... (read more)

0DanielLC
I'm not sure how common it is, but I at least consider total well-being to be important. The more people the better. The easier to make these people happy, the better. An AI is much better at persuasion than you are. It would pretty much be able to convince you of whatever it wants. Our best neuroscientists are still mere mortals. Also, even among mere mortals, making small changes toward someone's values is not difficult, and I don't think significant changes are impossible. For example, the consumer diamond industry would be virtually non-existent if De Beers hadn't convinced people to want diamonds.
Alerus00

What is wrong with the statement? The idea I'm trying to convey is that I, as a person now, cannot go and forcefully rewire another person's values. The only ability I have to try and affect them is to be persuasive in argument, or perhaps to be deceptive about certain things to try and move them to a different position (e.g., consider the state of politics).

In contrast, one of the concerns for the future is that an AI may have the technological ability to more directly manipulate a person. So the question I'm asking is: is the future technology at the dispos... (read more)

5DanArmak
Consider that every human who ever existed, was shaped purely by environment + genes. Consider how much humans have achieved merely by controlling the environment: converting people to insane religions which they are willing to die and kill for, making torturers, "the banality of evil", etc. etc. Now imagine what an entity could achieve with that plus 1) complete understanding of how the brain is shaped by the environment and/or 2) complete control of the environment (via VR, smart dust, whatever) for a human from age 0 onwards. I think the conservative assumption is that any mind we would recognize as human, and many we wouldn't, could be produced by such an optimization process. You're not limiting your AI at all.
6drethelin
Consider that humans have modified human values to results as different as nazism and as jainism.
Alerus00

So I think my basic problem here is that I'm not familiar with this construct for decision making, or why it would be favored over others. Specifically, why make logical rules about which actions to take? Why not take an MDP value-learning approach, where the agent chooses the action with the highest predicted utility? If the estimate is bad, it's merely updated, and if that situation arises again, the agent might choose a different action as a result of the latest update.
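
[The value-learning approach described here can be sketched roughly as follows. This is only a minimal tabular illustration, not the commenter's actual research code; the toy one-state environment, the reward values, and the learning parameters are all invented for the example.]

```python
import random
from collections import defaultdict

random.seed(0)

q = defaultdict(float)          # predicted utility Q(state, action)
actions = ["left", "right"]

def choose_action(state, epsilon=0.1):
    """Choose the action with the highest predicted utility, with
    occasional random exploration so bad estimates get revisited."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

def update(state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """If the utility estimate was bad, it's merely updated; the agent
    may choose a different action the next time this state arises."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next
                                   - q[(state, action)])

# Toy one-state environment: "right" pays 1, "left" pays 0.
for _ in range(500):
    a = choose_action("start")
    r = 1.0 if a == "right" else 0.0
    update("start", a, r, "start")
```

After a few hundred updates the agent's estimate for "right" exceeds that for "left", so greedy selection picks the better action without any hand-written logical rules about which actions to take.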

Alerus40

I feel like the suggested distinction between Bayes and science is somewhat forced. Before I knew of Bayes, I knew of Occam's razor and its incredible role in science. I had always been under the impression that science favored simpler hypotheses. If it is suggested that we don't see people rigorously adhering to Bayes' theorem when developing hypotheses, then the answer to why is not that science doesn't value the simpler hypotheses suggested by Bayes and priors, but that determining the simplest hypothesis is incredibly difficult to do in many cases... (read more)

Alerus10

Yeah I agree that the ripple effect of your personal theft would be negligible. I see it as similar to littering. You do it in a vacuum, no big deal, but when many have that mentality, it causes problems. Sounds like you agree too :-)

Alerus00

Right, so if you can choose your utility function, then it's better to choose one that can be better maximized. Interestingly though, if we ever had this capability, I think we could just reduce the problem by using an unbiased utility function. That is, explicit preferences (such as liking math versus history) would be removed and instead we'd work with a more fundamental utility function. For instance, death is pretty much a universal stop point since you cannot gain any utility if you're dead, regardless of your function. This would be in a sense the ba... (read more)

0momothefiddler
Hm. If people have approximately-equivalent utility functions, does that help them all accomplish their utility better? If so, it makes sense to have none of them value stealing (since having all value stealing could be a problem). In a large enough society, though, the ripple effect of my theft is negligible. That's beside the point, though. "Avoid death" seems like a pretty good basis for a utility function. I like that.
Alerus40

It's hard for me to gauge your audience, so maybe this wouldn't be terribly useful, but a talk outlining logical fallacies (especially lesser-known ones) and why they are fallacies seems like it would have a high impact since I think the layperson commits fallacies quite frequently. Or should I say, I observe people committing fallacies more often than I'd like :p

Alerus80

Hi! So I've actually already made a few comments on this site, but had neglected to introduce myself so I thought I'd do so now. I'm a PhD candidate in computer science at the University of Maryland, Baltimore County. My research interests are in AI and Machine Learning. Specifically, my dissertation topic is on generalization in reinforcement learning (policy transfer and function approximation).

Given this, AI is obviously my biggest interest, but as a result, my study of AI has led me to applying the same concepts to human life and reasoning. Lately, I'v... (read more)

Alerus00

I disagree with the quoted part of the post. Science doesn't reject your Bayesian conclusion (provided it is rational); it's simply unsatisfied by the fact that it's a probabilistic conclusion. That is, probabilistic conclusions are never knowledge of truth. They are estimations of the likelihood of truth. Science will look at your Bayesian conclusion and say, "99% confident? That's good! But let's gather more data and raise the bar to 99.9%!" Science is the constant pursuit of knowledge. It will never reach it, but it will demand we never stop try... (read more)

Alerus00

Yeah, I agree that you would have to consider time. However, my feeling is that for the utility calculation to be performed at all (that is, even in the context of a fixed utility), you must also consider time through the state of being in all subsequent states; so now you just add an expected utility calculation to each of those subsequent states (and thereby implicitly capture the length of time it lasts) instead of the fixed utility. It is possible, I suppose, that the probability could be conditional on the previous state's utility function too. That... (read more)

0momothefiddler
I'm not saying I can change to liking civil war books. I'm saying if I could choose between A) continuing to like scifi and having fantasy books, or B) liking civil war books and having civil war books, I should choose B, even though I currently value scifi>stats>civil war. By extension, if I could choose A) continuing to value specific complex interactions and having different complex interactions, or B) liking smiley faces and building a smiley-face maximizer I should choose B even though it's counterintuitive. This one is somewhat more plausible, as it seems it'd be easier to build an AI that could change my values to smiley faces and make smiley faces than it would be to build one that works toward my current complicated (and apparently inconsistent) utility function. I don't think society-damaging actions are "objectively" bad in the way you say. Stealing something might be worse than just having it, due to negative repercussions, but that just changes the relative ordering. Depending on the value of the thing, it might still be higher-ordered than buying it.
Alerus00

So it seems to me that the solution is to use an expected utility function rather than a fixed utility function. Let's speak abstractly for the moment, and consider the space of all relevant utility functions (that is, all utility functions that would change the utility evaluation of an action). At each time step, we will now associate a probability of transitioning from your current utility function to any of these other utility functions. For any given future state, then, we can compute the expected utility. When you run your optimization algorithm to deter... (read more)
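
[The expectation being proposed can be sketched as follows, borrowing the sci-fi vs. civil-war-books example from the reply below in this thread. The two utility functions, their values, and the 0.3 transition probability are made-up placeholders for illustration only.]

```python
# Hypothetical: two candidate utility functions the agent might hold.
def u_current(state):
    """Current preferences: values sci-fi books highly."""
    return {"scifi": 10.0, "civil_war": 2.0}[state]

def u_future(state):
    """Possible future preferences: comes to value civil-war books."""
    return {"scifi": 2.0, "civil_war": 10.0}[state]

p_switch = 0.3  # assumed probability of transitioning to u_future

def expected_utility(state):
    """Expected utility of reaching `state`, marginalizing over which
    utility function the agent will hold at that time."""
    return (1 - p_switch) * u_current(state) + p_switch * u_future(state)

# The optimizer ranks future states by this expectation rather than
# by the current (fixed) utility function alone.
best = max(["scifi", "civil_war"], key=expected_utility)
```

With these invented numbers, the expectation for "scifi" is 0.7·10 + 0.3·2 = 7.6 against 4.4 for "civil_war", so the agent still pursues sci-fi; raise `p_switch` enough and the ranking flips, which is exactly the behavior the expected-utility formulation is meant to capture.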

0momothefiddler
I like this idea, but I would also, it seems, need to consider the (probabilistic) length of time each utility function would last. That doesn't change your basic point, though, which seems reasonable. The one question I have is this: In cases where I can choose whether or not to change my utility function - cases where I can choose to an extent the probability of a configuration appearing - couldn't I maximize expected utility by arranging for my most-likely utility function at any given time to match the most-likely universe at that time? It seems that would make life utterly pointless, but I don't have a rational basis for that - it's just a reflexive emotional response to the suggestion.
Alerus00

I think you may be partitioning things that need not necessarily be partitioned, and it's important to note that. In the nicotine example (or the "lock the refrigerator door" example in the cited material), this is not necessarily a competition between the wants of different agents. This apparent dichotomy can also be resolved by internal states as well as by utility discount factors.

To be specific, revisit the nicotine problem. When a person decides to quit, they may not be suffering any discomfort, so the utility of smoking at that moment is small. ... (read more)
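
[The role a discount factor can play in resolving the nicotine dichotomy can be sketched with a toy calculation; the reward numbers and horizon are invented for illustration. A single agent with one utility function chooses differently depending on how steeply it discounts the future, so no second competing agent is needed.]

```python
def discounted_value(rewards, gamma):
    """Total discounted utility of a reward sequence; gamma in (0, 1]
    controls how much future utility is worth right now."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Invented reward streams over ten time steps:
smoke   = [5.0] + [0.0] * 8 + [-20.0]  # immediate relief, delayed health cost
abstain = [-1.0] + [0.0] * 8 + [20.0]  # discomfort now, delayed health gain

# A far-sighted agent (gamma near 1) prefers quitting; a myopic agent
# (small gamma) prefers smoking -- same agent, same utility function,
# different discounting of the future.
farsighted = discounted_value(abstain, 0.95) > discounted_value(smoke, 0.95)
myopic     = discounted_value(smoke, 0.6) > discounted_value(abstain, 0.6)
```

Both comparisons come out true with these numbers, which is the point: the apparent "two competing agents" can instead be a single agent whose effective discount factor varies with its internal state.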