...and no, it's not because of potential political impact on its goals.  Although that's also a thing.

The Politics problem is, at its root, about forming a workable set of rules by which society can operate, which society can agree with.

The Friendliness Problem is, at its root, about forming a workable set of values which are acceptable to society.

Politics as a process (I will use "politics" to refer to the process of politics henceforth) doesn't generate values; values are strictly an input.  Politics converts the values of society into rules intended to maximize those values.  At the same time, it is value-agnostic; it doesn't care what the values are, or where they come from.  Which is to say, provided you solve the Friendliness Problem, its solution provides a valuable input into politics.

Politics is also an intelligence.  Not in the "self aware" sense, or even in the "capable of making good judgments" sense, but in the sense of an optimization process.  We're each nodes in this alien intelligence, and we form what looks, to me, suspiciously like a neural network.

The Friendliness Problem is just as applicable to Politics as it is to any other intelligence.  Indeed, provided we can provably solve the Friendliness Problem, we should be capable of creating Friendly Politics.  Friendliness should, in principle, be equally applicable to both.  Now, there are some issues with this - politics is composed of unpredictable hardware, namely, people.  And it may be that the neural architecture is fundamentally incompatible with Friendliness.  But that is discussing the -output- of the process.  Friendliness is first an input, before it can be an output.

More, we already have various political formations, and can assess their Friendliness levels, merely in terms of the values that went -into- them.

Which is where I think politics offers a pretty strong hint at the possibility that the Friendliness Problem has no resolution:

We can't agree on which political formations are more Friendly.  That's what "Politics is the Mindkiller" is all about; our inability to come to an agreement on political matters.  It's not merely a matter of the rules - which is to say, it's not a matter of the output: We can't even come to an agreement about which values should be used to form the rules.

This is why I think political discussion is valuable here, incidentally.  Less Wrong, by and large, has been avoiding the hard problem of Friendliness, by labeling its primary functional outlet in reality as a mindkiller, not to be discussed.

Either we can agree on what constitutes Friendly Politics, or not.  If we can't, I don't see much hope of arriving at a Friendliness solution more broadly.  Friendly to -whom- becomes the question, if it was ever anything else.  Which suggests a division in types of Friendliness; Strong Friendliness, which is a fully generalized set of human values, and acceptable to just about everyone; and Weak Friendliness, which isn't fully generalized, and perhaps acceptable merely to a plurality.  Weak Friendliness survives the political question.  I do not see that Strong Friendliness can.

(Exemplified: When I imagine a Friendly AI, I imagine a hands-off benefactor who permits people to do anything they wish to which won't result in harm to others.  Why, look, a libertarian/libertine dictator.  Does anybody envisage a Friendly AI which doesn't correspond more or less directly with their own political beliefs?)

97 comments

The Friendliness Problem is, at its root, about forming a workable set of values which are acceptable to society.

No, that's the special bonus round after you solve the real friendliness problem. If that were the real deal, we could just tell an AI to enforce Biblical values or the values of Queen Elizabeth II or the US Constitution or something, and although the results would probably be unpleasant they would be no worse than the many unpleasant states that have existed throughout history.

As opposed to the current problem of having a very high likelihood that the AI will kill everyone in the world.

The Friendliness problem is, at its root, about communicating values to an AI and keeping those values stable. If we tell the AI "do whatever Queen Elizabeth II wants" - which I expect would be a perfectly acceptable society to live in - the Friendliness problem is how to get the AI to properly translate that into statements like "Queen Elizabeth wants a more peaceful world" and not things more like "INCREASE LEVEL OF DOPAMINE IN QUEEN ELIZABETH'S REWARD CENTER TO 3^^^3 MOLES" or "ERROR: QUEEN ELIZABETH NOT AN OBVIOUSLY CLOSED SYSTEM, CONVERT EVERYTH...

5Eugine_Nier
That depends on whether we mean 2013!Queen Elizabeth II or Queen Elizabeth after the resulting power goes to her head.
0OrphanWilde
I don't think you get the same thing from that document that I do. (Incidentally, I disagree with a lot of the design decisions inherent in that document, such as self-modifying AI, which I regard as inherently and uncorrectably dangerous. When you stop expecting the AI to make itself better, the "Keep your ethics stable across iterations" part of the problem goes away.) Either that or I'm misunderstanding you. Because my current understanding of your view of the Friendliness problem has less to do with codifying and programming ethics and more to do with teaching the AI to know exactly what we mean and not to misinterpret what we ask for. (Which I hope you'll forgive me if I call "Magical thinking." That's not necessarily a disparagement; sufficiently advanced technology and all that. I just think it's not feasible in the foreseeable future, and such an AI makes a poor target for us as we exist today.)

I imagine a Friendly AI, I imagine a hands-off benefactor who permits people to do anything they wish to which won't result in harm to others.

Yeah, I like personal freedom, too, but you have to realize that this is massively, massively underspecified. What exactly constitutes "harm", and what specific mechanisms are in place to prevent it? Presumably a punch in the face is "harm"; what about an unexpected pat on the back? What about all other possible forms of physical contact that you don't know how to consider in advance? If loud verbal abuse is harm, what about polite criticism? What about all other possible ways of affecting someone via sound waves that you don't know how to consider in advance? &c., ad infinitum.

Does anybody envisage a Friendly AI which doesn't correspond more or less directly with their own political beliefs?

I'm starting to think this entire idea of "having political beliefs" is crazy. There are all sorts of possible forms of human social organization, which result in various outcomes for the humans involved; how am I supposed to know which one is best for people? From what I know about economics, I can point out some ...

I'm starting to think this entire idea of "having political beliefs" is crazy.

Most of my "political beliefs" is awareness of specific failures in other people's beliefs.

2ikrase
That's fairly common, and rarely realized, I think.
1Viliam_Bur
Fairly common among rational (I don't mean LW-style) people. But I also know people who really believe things, and it's kind of scary.
9Vladimir_Nesov
These examples also only compare things with status quo. Status quo is most likely itself "harm" when compared to many of the alternatives.
7OrphanWilde
There are many more ways to arrange things in a defective manner than an effective one. I'd consider deviations from the status quo to be harmful until proven otherwise.
3torekp
Or in other words: most mutations are harmful.
1Vladimir_Nesov
(Fixed the wording to better match the intended meaning: "compared to the many alternatives" -> "compared to many of the alternatives".)
5RomeoStevens
All formulations of human value are massively underspecified. I agree that expecting humans to know what sorts of things would be good for humans in general is terrible. The problem is that we also can't get an honest report of what people think would be good for them personally because lying is too useful/humans value things hypocritically.
-5whowhowho

Which is where I think politics offers a pretty strong hint to the possibility that the Friendliness Problem has no resolution:

We can't agree on which political formations are more Friendly. That's what "Politics is the Mindkiller" is all about; our inability to come to an agreement on political matters. It's not merely a matter of the rules - which is to say, it's not a matter of the output: We can't even come to an agreement about which values should be used to form the rules.

I'm pretty sure this is a problem with human reasoning abilities, and not a problem with friendliness itself. Or in other words, I think this is only very weak evidence that friendliness is unresolvable.

3Ben Pace
Indeed. If we were perfect bayesians, who had unlimited introspective access, and we STILL couldn't agree after an unconscionable amount of argument and discussion, then we'd have a bigger problem.
5OrphanWilde
Are perfect Bayesians with unlimited introspective access more inclined to agree on matters of first principles? I'm not sure. I've never met one, much less two.
1Plasmon
yes

They will agree on what values they have, and what the best action is relative to those values, but they still might have different values.

1Ben Pace
My point exactly. Only if we are sure agents are best representing themselves can we be sure their values are not the same. If an agent is unsure of zir values, or extrapolates them incorrectly, then there will be disagreement that doesn't imply different values. With seven billion people, none of whom are best representing themselves (they certainly aren't perfect bayesians!), we should expect massive disagreement. This is not an argument for fundamentally different values.
-2OrphanWilde
I disagree with the first statement, but agree with the second. That is, I disagree with a certainty that the problem is with our reasoning abilities, but agree that the evidence is very weak.
0Adele_L
Um, I said I was "pretty sure". Not absolutely certain.
-1OrphanWilde
Upvoted, and I'll consider it fair if you downvote my reply. Sorry about that!
1Adele_L
No worries!
1[anonymous]
I'm amused that you've retracted the post in question after posting this.

There are some analogies between politics and friendliness, but the differences are also worth mentioning.

In politics, you design a system which must be implemented by humans. Many systems fail because of some property of human nature. Whatever rules you give to humans, if they have incentives to act otherwise, they will. Also, humans have limited intelligence and attention, a lot of biases and hypocrisy, and their brains are not designed to work in communities with over 300 members, or to resist all the superstimuli of modern life.

If you construct a friendly AI, you don't have a problem with humans, besides the problem of extracting human values.

5OrphanWilde
I fully agree. I don't think even a perfect Friendliness theorem would suffice in making politics well and truly Friendly. Such an expectation is like expecting Friendly AI to work even while it's being bombarded with ionizing radiation (or whatever) that is randomly flipping bits in its working memory.
2ikrase
Actually it's worse: It's like expecting to build a Friendly AI using a computer with no debugging utilities, an undocumented program interpreter, and a text editor that has a sense of humor. You have to implement it.

Politics is a harder problem than friendliness: politics is implemented with agents. Not only that, but largely self-selected agents who are thus usually not the ideal selections for implementing politics.

Friendliness is implemented (inside an agent) with non-agents you can build to task.

(edited for grammarz)

0OrphanWilde
Friendliness can only be implemented after you've solved the problem of what, exactly, you're implementing.
1Luke_A_Somers
Right, but the point is you don't need to get everyone to agree what's right (there's always going to be someone out there who's going to hate it no matter what you do). You just need it to actually be friendly... and, as hard as that is, at least you don't have to work with only corrupted hardware.
-7OrphanWilde

We can't agree on which political formations are more Friendly.

We also can't agree on, say, the correct theory of quantum gravity. But reality is there and it works in some particular way, which we may or may not be able to discover.

The values of a friendly AI are usually assumed to be an idealization of universal human values. More precisely: when someone makes a decision, it is because their brain performs a particular computation. To the extent that this computation is the product of a specific cognitive architecture universal to our species (and no...

1OrphanWilde
If such an idealization exists, that would of course be preferable. I suspect it doesn't, which may color my position here, but I think it's important to consider the alternatives if there isn't a generalizable ideal; specifically, we should be working from the opposing end, and try to generalize from the specific instances; even if we can't arrive at Strong Friendliness (the fully generalized ideal of human morality), we might still be able to arrive at Weak Friendliness (some generalized ideal that is at least acceptable to a majority of people). Because the alternative for those of us who aren't neurologists, as far as I can tell, is to wait.

That's what "Politics is the Mindkiller" is all about; our inability to come to an agreement on political matters.

In a sense, but most would not agree. I think all would agree that motivated cognition on strongly held values makes for some of the mindkilling.

I agree with what I take as your basic point, that people have different preferences, and Friendliness, political or AI, will be a trade off between them. But, many here don't. In a sense, you and I believe they are mindkilled, but in a different way - structural commitment to an incorre...

The real political question is: should the US government invest money in creating FAI, preventing existential risks, and extending life?

0NancyLebovitz
Why just the US government?
0turchin
Of course, not only the US government, but the governments of all other countries which have the potential to influence AI research and existential risks. For example, North Korea could play an important role in existential risks, as it is said to be developing smallpox bioweapons. In my opinion, we need a global government to address existential risks, and an AI which takes over the world would be a form of global government. I was routinely downvoted for such posts and comments on LW, so it is probably not an appropriate place to discuss these issues.
0NancyLebovitz
Smallpox isn't an existential risk-- existential risks affect the continuation of the human race. So far as I know, the big ones are UFAI and asteroid strike. I don't know of classifications for very serious but smaller risks.
1turchin
Look, common smallpox is not an existential risk, but biological weapons could be if they were specially designed to be one. The simplest way to do it is the simultaneous use of many different pathogens. If we have 10 viruses with 50 per cent mortality each, that would mean a 1000-fold reduction of the human population, and the last million people would be so scattered and unadapted that they could continue on to extinction. North Korea is said to be developing 8 different bioweapons, but with the progress of biotechnology it could be hundreds. But my main idea here was not a classification of existential risks, but to address the idea that preventing them is a question of global politics - or at least it should be if we want to survive.
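For reference, a minimal check of that figure, assuming the ten pathogens kill independently and each spares half the population:

$$(1 - 0.5)^{10} = \tfrac{1}{1024} \approx \tfrac{1}{1000}$$

so ten independent 50-per-cent-mortality agents would indeed leave roughly a thousandth of the population.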
3OrphanWilde
Infectious agents with high mortality rates tend to weed themselves out of the population. There's a sweet spot for infectious diseases: prolific enough to pass themselves on, not so prolific as to kill their hosts before they get the opportunity. Additionally, there's a strong negative feedback to particularly nasty disease in the form of quarantine. A much bigger risk to my mind actually comes from healthcare, which can push that sweet spot further into the "mortal peril" section. Healthcare provokes an arms race with infectious agents; the better we are at treating disease and keeping it from killing people, the more dangerous an infectious agent can be and still successfully propagate.
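A minimal sketch of that trade-off, with invented numbers and functional forms, purely to show why the "sweet spot" sits at intermediate virulence rather than at maximum lethality:

```python
# Toy illustration only: every functional form and constant below is invented
# to show the shape of the argument, not to model any real pathogen.

def expected_spread(virulence: float) -> float:
    """Expected new infections per host; virulence is the fraction of hosts killed quickly."""
    transmission_per_day = 0.5 + virulence        # sicker hosts shed more (invented)
    infectious_days = 10.0 * (1.0 - virulence)    # dead/quarantined hosts stop spreading (invented)
    return transmission_per_day * infectious_days

def expected_spread_with_healthcare(virulence: float) -> float:
    """Same toy model, but treatment slows how fast lethal infections lose their hosts."""
    transmission_per_day = 0.5 + virulence
    infectious_days = 10.0 * (1.0 - virulence / 2.0)  # penalty for lethality is softened
    return transmission_per_day * infectious_days

virulences = [v / 100.0 for v in range(101)]
print("toy sweet spot, no treatment:  ", max(virulences, key=expected_spread))
print("toy sweet spot, with treatment:", max(virulences, key=expected_spread_with_healthcare))
# The second peak sits at a much higher virulence: the "arms race" worry above.
```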

There's a value, call it "weak friendliness", that I view as a prerequisite to politics: it's a function that humans already implement successfully, and is the one that says "I don't want to be wire-headed, drugged into a stupor, victim of a nuclear winter, or see Earth turned into paperclips".

A hands-off AI overlord can prevent all of that, while still letting humanity squabble over gay rights and which religion is correct.

And, well, the whole point of an AI is that it's smarter than us, and thus has a chance of solving harder problems.

2TimS
I'm not sure this is true in any useful sense. Louis XIV probably agrees with me that "I don't want to be wire-headed, drugged into a stupor, victim of a nuclear winter, or see Earth turned into paperclips." But I think it is pretty clear that the Sun King was not implementing my moral preferences, and I am not implementing his. Either one of us is not "weak friendly" or "weak friendly" is barely powerful enough to answer really easy moral questions like "should I commit mass murder for no reason at all?" (Hint: no). If weak friendly morality is really that weak, then I have no confidence that a weak-FAI would be able to make a strong-FAI, or even would want to. In other words, I suspect that what most people mean by weak friendly is highly generalized applause lights that widely diverging values could agree with without any actual agreement on which actions are more moral.
0RomeoStevens
I think a lower bound on weak friendliness is whether or not entities living within the society consider their lives worthwhile. Of course this opens up debate about house elves and such but it's a useful starting point.
2Document
That (along with this semi-recent exchange) reminds me of a stupid idea I had for a group decision process a while back.

* Party A dislikes the status quo. To change it, they declare to the sysop that they would rather die than accept it.
* The sysop accepts this and publicly announces a provisionally scheduled change.
* Party B objects to the change and declares that they'd rather die than accept A's change.
* If neither party backs down, a coin is flipped and the "winner" is asked to kill the loser in order for their preference to be realized; face-to-face to make it as difficult as possible, thereby maximizing the chances of one party or the other backing down.
* If the parties consist of multiple individuals, the estimated weakest-willed person on the majority side has to kill (or convince to forfeit) the weakest person on the minority side; then the next-weakest, until the minority side is eliminated. If they can't or won't, then they're out of the fight, and replaced with the next-weakest person, et cetera until the minority is eliminated or the majority becomes the minority.

Basically, formalized war, only done in the opposite way of the strawman version in A Taste of Armageddon; making actual killing more difficult rather than easier.

A few reasons it's stupid:

* People will tolerate conditions much worse than death (for themselves, or for others unable to self-advocate) rather than violate the taboo against killing or against "threatening" suicide.
* The system may make bad social organizations worse by removing the most socially enlightened and active people first.
* People have values outside themselves, so they'll stay alive and try to work for change rather than dying pointlessly and leaving things to presumably get worse and worse from their perspective.
* Prompting people to kill or die for their values will galvanize them and make reconciliation less likely.
* Real policy questions aren't binary, and how a question is framed or what ord
0Document
Actually, I think I'm now remembering a better (or better-sounding) idea that occurred to me later: rather than something as extreme as deletion, let people "vote" by agreeing to be deinstantiated, giving up the resources that would have been spent instantiating them. It might be essentially the same as death if they stayed that way til the end of the universe, but it wouldn't be as ugly. Maybe they could be periodically awakened if someone wants to try to persuade them to change or withdraw their vote. That would hopefully keep people from voting selfishly or without thorough consideration. On the other hand, it might insulate them from the consequences of poor policies.

Also, how to count votes is still a problem; where would "the resources that would have been spent instantiating them" come from? Is this a socialist world where everyone is entitled to a certain income, and if so, what happens when population outstrips resources? Or, in a laissez-faire world where people can run out of money and be deinstantiated, the idea amounts to plain old selling of votes to the rich, like we have now.

Basically, both my ideas seem to require a eutopia already in place, or at least a genuine 100% monopoly on force. I think that might be my point. Or maybe it's that a simple-sounding, socially acceptable idea like "If someone would rather die than tolerate the status quo, that's bad, and the status quo should be changed" isn't socially acceptable once you actually go into details and/or strip away the human assumptions.
0RomeoStevens
Can this be set up in a round robin fashion with sets of mutually exclusive values such that everyone who is willing to kill for their values kills each other?
0Document
Maybe if the winning side's values mandated their own deaths. But then it would be pointless for the sysop to respond to their threat of suicide to begin with, so I don't know. I'm not sure if there's something you're getting at that I'm not seeing.
0OrphanWilde
"I'm not going to live there. There's no place for me there... any more than there is for you. Malcolm... I'm a monster.What I do is evil. I have no illusions about it, but it must be done. " * The Operative, from Serenity. (On the off-chance that somebody isn't familiar with that quote.)
0RomeoStevens
I'm thinking if you do the matchups correctly you only wind up with one such person at the end, whom all the others secretly precommit to killing. ...maybe this shouldn't be discussed publicly.
0Document
I don't think the system works in the first place without a monopoly on lethal force. You could work within the system by "voting" for his death, but then his friends (if any) get a chance to join in the vote, and their friends, til you pretty much have a new war going. (That's another flaw in the system I could have mentioned.)
0handoflixue
I think the vast majority of the population would agree that genocide and mass murder are bad, same as wire heading and turning the earth into paperclips. A single exception isn't terribly noteworthy - I'm sure there's at least a few pro-wire-heading people out there, and I'm sure at least a few people have gotten enraged enough at humanity to think paperclips would be a better use of the space. If you have a reason to suspect that "mass murder" is a common preference, that's another matter.
1TimS
Mass murder is an easy question. Is the Sun King (who doesn't particularly desire pointless mass murder) more moral than I am? Much harder, and your articulation of "weak Friendliness" seems incapable of even trying to answer. And that doesn't even get into actual moral problems society actually faces every day (i.e. what is the most moral taxation scheme?). If weak-FAI can't solve those types of problems, or even suggest useful directions to look, why should we believe it is a step on the path to strong-FAI?
0handoflixue
That's my point. I'm not sure where the confusion is, here. Why would you call it useless to prevent wireheading, UFAI, and nuclear winter, just because it can't also do your taxes? If it's easier to solve the big problems first, wouldn't we want to do that? And then afterwards we can take our sweet time figuring out abortion and gay marriage and tax codes, because a failure there doesn't end the species.
3TimS
For reasons related to Hidden Complexity of Wishes, I don't think weak-FAI actually is likely to prevent "wireheading, UFAI, and nuclear winter." At best, it prohibits the most obvious implementations of those problems. And it is terribly unlikely to be helpful in creating strong-FAI. And your original claim was that common human preferences already implement weak-FAI preferences. I think that the more likely reason why we haven't had the disasters you reference is that for most of human history, we lacked the capacity to cause those problems. As actual society shows, the hidden complexity of wishes makes implementing social consensus hopeless, much less whatever smaller set of preferences is weak-FAI preferences.
1handoflixue
My basic point was that we shouldn't worry about politics, at least not yet, because politics is a wonderful example of all the hard questions in CEV, and we haven't even worked out the easy questions like how to prevent nuclear winter. My second point was that humans do seem to have a much clearer CEV when it comes to "prevent nuclear winter", even if it's still not unanimous. Implicit in that should have been the idea that CEV is still ridiculously difficult. Just like intelligence, it's something humans seem to have and use despite being unable to program for it. So, then, summarized, I'm saying that we should perhaps work out the easy problems first, before we go throwing ourselves against harder problems like politics.
1TimS
There's not a clear dividing line between "easy" moral questions and hard moral questions. The Cold War, which massively increased the risk of nuclear winter, was a rational expression of Great Power relations between two powers. Until we have mutually acceptable ways of resolving disputes when both parties are rationally protecting their interests, we can't actually solve the easy problems either.
0handoflixue
from you: and from me: So, um, we agree, huzzah? :)
0fubarobfusco
Sure, genocide is bad. That's why the Greens — who are corrupting our precious Blue bodily fluids to exterminate pure-blooded Blues, and stealing Blue jobs so that Blues will die in poverty — must all be killed!
2gwern
We usually call that the 'sysop AI' proposal, I think.
2OrphanWilde
There's a bootstrapping problem inherent to handing AI the friendliness problem to solve. Edit: Unless you're suggesting we use a Weakly Friendly AI to solve the hard problem of Strong Friendliness?
5handoflixue
Your edit pretty much captures my point, yes :) If nothing else, a Weak Friendly AI should eliminate a ton of the trivial distractions like war and famine, and I'd expect that humans have a much more unified volition when we're not constantly worried about scarcity and violence. There's not a lot of current political problems I'd see being relevant in a post-AI, post-scarcity, post-violence world.
2Dre
The problem is that we have to guarantee that the AI doesn't do something really bad while trying to stop these problems; what if it decides it really needs more resources suddenly, or needs to spy on everyone, even briefly? And it seems (to me at least) that stopping it from having bad side effects is pretty close, if not equivalent to, Strong Friendliness.
0handoflixue
I should have made that more clear: I still think Weak-Friendliness is a very difficult problem. My point is simply that we only need an AI that solves the big problems, not an AI that can do our taxes. My second point was that humans seem to already implement weak-friendliness, barring a few historical exceptions, whereas so far we've completely failed at implementing strong-friendliness. I'm using Weak vs Strong here in the sense of Weak being a "SysOP" style AI that just handles catastrophes, whereas Strong is the "ushers in the Singularity" sort that usually gets talked about here, and can do your taxes :)
2OrphanWilde
This... may be an amazing idea. I'm noodling on it.
0[anonymous]
Edit: Completely misread the parent.
0Rukifellth
I know this wasn't the spirit of your post, but I wouldn't refer to war and famine as "trivial distractions".
0Rukifellth
Wait, if you're regarding the elimination of war, famine and disease as consolation prizes for creating an wFAI, what are people expecting from a sFAI?
2Fadeway
God. Either with or without the ability to bend the currently known laws of physics.
2Rukifellth
No, really.
5Richard_Kennaway
Really. That really is what people are expecting of a strong FAI. Compared with us, it will be omniscient, omnipotent, and omnibenevolent. Unlike currently believed-in Gods, there will be no problem of evil because it will remove all evil from the world. It will do what the Epicurean argument demands of any God worthy of the name.
0Rukifellth
Are you telling me that if a wFAI were capable of eliminating war, famine and disease, it wouldn't be developed first?
3Richard_Kennaway
Well, I don't take seriously any of these speculations about God-like vs. merely angel-like creations. They're just a distraction from the task of actually building them, which no-one knows how to do anyway.
0Rukifellth
But still, if a wFAI was capable of eliminating those things, why be picky and try for sFAI?
1RomeoStevens
Because we have no idea how hard it is to specify either. If, along the way, it turns out to be easy to specify wFAI and risky to specify sFAI, then the reasonable course is expected. Doubly so since a wFAI would almost certainly be useful in helping specify a sFAI. Seeing as human values are a minuscule target, it seems probable that specifying wFAI is harder than sFAI though.
0Rukifellth
"Specify"? What do you mean?
0RomeoStevens
specifications a la programming.
0Rukifellth
Why would it be harder? One could tell the wFAI to improve factors that are strongly correlated with human values, such as food stability, resources that cure preventable diseases (such as diarrhea, which, as we know, kills way more people than it should) and security from natural disasters.
0RomeoStevens
Because if you screw up specifying human values, you don't get wFAI; you just die (hopefully).
0Rukifellth
It's not optimizing human values, it's optimizing circumstances that are strongly correlated with human values. It would be a logistics kind of thing.
2RomeoStevens
Have you ever played corrupt a wish?
0Rukifellth
No, but I'm guessing I'm about to.

"I wish for a list of possibilities for sequences of actions, any of whose execution would satisfy the following conditions.

* Within twenty years, for Nigeria to have standards of living such that it would receive the same rating as Finland on [Placeholder UN Scale of People's-Lives-Not-Being-Awful]."

The course of action would be evaluated by a think-tank, until they decided that the course of action was acceptable, and the wFAI was given the go.
0RomeoStevens
The AI optimizes only for that and doesn't generate a list of non-obvious side effects. You implement one of them and something horrible happens to Finland, and/or to countries besides Nigeria.

Or: in order to generate said list I simulate Nigeria millions of times to a resolution such that entities within the simulation pass the Turing test. Most of the simulations involve horrible outcomes for all involved.

Or: I generate such a list including many sequences of actions that lead to a small group being able to take over Nigeria and/or Finland and/or the world (or generate some other power differential that screws up international relations).

Or: in order to execute such an action I need more computing power, and you forgot to specify what are acceptable actions for obtaining it.

Or: the wFAI is much cleverer than a single human thinking about this for 2 minutes and can screw things up in ways that are as opaque to you as human actions are to a dog.

In general, specifying an oracle/tool AI is not safe: http://lesswrong.com/lw/cze/reply_to_holden_on_tool_ai/

Even more generally, our ability to build an AI that is friendly will have nothing to do with our ability to generate clauses in English that sound reasonable.

Part of the problem is the many factors involved in the political issues. People explain things through their own specialty, but lack knowledge of other specialties.

Why do you restrict Strong Friendliness to human values? Is there some value which an intelligence can have that can never be a human value?

0OrphanWilde
Because we're the ones who have to live with the thing, and I don't know, but my inclination is that the answer is "Yes".
0Decius
Implication: A Strongly Friendly (paperclip maximizer) AI is actually a meaningful phrase. (As opposed to all Strongly Friendly AIs being compatible with everyone) Why all human values?

You're making the perfect the enemy of the good.

I'm fine with at least a thorough framework for Weak Friendliness. That's not gonna materialize out of nothing. There are no actual Turing Machines (infinite tapes required), yet it is a useful model and its study yields useful results for real world applications.

Studying Strong Friendliness is a useful activity in finding a heuristic for best-we-can-do friendliness, which is way better than nothing.

Politics as a process doesn't generate values; they're strictly an input,

Politics is partly about choosing goals/values. (E.g., do we value equality or total wealth?) It is also about choosing the means of achieving those goals. And it is also about signaling power. Most of these are not relevant to designing a future Friendly AI.

Yes, a polity is an "optimizer" in some crude sense, optimizing towards a weighted sum of the values of its members with some degree of success. Corporations and economies have also been described as optimizers. But I don't see too much similarity to AI design here.
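To make "a weighted sum of the values of its members" concrete, here is a toy sketch; the members, weights, scores, and candidate policies are all invented for illustration:

```python
# Toy illustration of a polity "optimizing towards a weighted sum of the values
# of its members". Every name, weight, and score below is invented.

member_weights = {"alice": 1.0, "bob": 2.0, "carol": 1.5}

# How much each member values each candidate policy (purely illustrative).
member_scores = {
    "alice": {"policy_a": 0.2, "policy_b": 0.9},
    "bob":   {"policy_a": 0.7, "policy_b": 0.3},
    "carol": {"policy_a": 0.5, "policy_b": 0.6},
}

def polity_score(policy: str) -> float:
    """Weighted sum of member values for a single policy."""
    return sum(member_weights[m] * member_scores[m][policy] for m in member_weights)

# The polity "chooses" whichever policy maximizes the weighted sum.
best_policy = max(member_scores["alice"], key=polity_score)
print(best_policy, polity_score(best_policy))
```

The interesting disagreements, of course, are about where the weights and the scores come from; the sketch simply assumes them.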

2[anonymous]
Deciding what we value isn't relevant to friendliness? Could you explain that to me?
2Larks
The whole point of CEV is that we give the AI an algorithm for educing our values, and let it run. At no point do we try to work them out ourselves.
0[anonymous]
I mentally responded to you and forgot to, you know, actually respond. I'm a bit confused by this and since it was upvoted I'm less sure I get CEV.... It might clear things up to point out that I'm making a distinction between goals or preferences vs. values. CEV could be summarized as "fulfill our ideal rather than actual preferences", yeah? As in, we could be empirically wrong about what would maximize the things we care about, since we can't really be wrong about what to care about. So I imagine the AI needing to be programmed with our values- the meta wants that motivate our current preferences- and it would extrapolate from them to come up with better preferences, or at least it seems that way to me. Or does the AI figure that out too somehow? If so, what does an algorithm that figures out our preferences and our values contain?
3Larks
Ha, yes, I often do that.

The motivation behind CEV also includes the idea we might be wrong about what we care about. Instead, you give your FAI an algorithm for:

* Locating people
* Working out what they care about
* Working out what they would care about if they knew more, etc.
* Combining these preferences

I'm not sure what distinction you're trying to draw between values and preferences (perhaps a moral vs non-moral one?), but I don't think it's relevant to CEV as currently envisioned.
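A deliberately crude sketch of that shape, where every function is a hypothetical placeholder and all the hard steps are left unsolved, might look like this:

```python
# Crude sketch of the pipeline described above: locate people, elicit what they
# care about, extrapolate it, then combine. Every function is a placeholder;
# nothing here resembles an actual proposal for how to do any of the steps.

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Person:
    name: str
    preferences: Dict[str, float]  # current, possibly mistaken, preferences


def locate_people() -> List[Person]:
    # Placeholder: "locating people" is itself a hard problem.
    return [
        Person("alice", {"peace": 0.9, "wealth": 0.4}),
        Person("bob", {"peace": 0.5, "wealth": 0.8}),
    ]


def extrapolate(person: Person) -> Dict[str, float]:
    # Placeholder for "what they would care about if they knew more, etc.";
    # here it just returns the stated preferences unchanged.
    return dict(person.preferences)


def combine(all_prefs: List[Dict[str, float]]) -> Dict[str, float]:
    # Placeholder aggregation: average each preference weight across people.
    combined: Dict[str, float] = {}
    for prefs in all_prefs:
        for outcome, weight in prefs.items():
            combined[outcome] = combined.get(outcome, 0.0) + weight / len(all_prefs)
    return combined


if __name__ == "__main__":
    people = locate_people()
    volition = combine([extrapolate(p) for p in people])
    print(volition)
```

The point is only that the values are educed by the algorithm rather than written out by hand; everything difficult is hidden inside the placeholders.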
1JoshuaFox
Actually, when I said "most" in "most of these are not relevant to designing a future Friendly AI," I was thinking that values are the exception, they are relevant.
0[anonymous]
Oh. Then yeah ok I think I agree.