David Brin suggests that some kind of political system populated with humans and diverse but imperfectly rational and friendly AIs would evolve in a satisfactory direction for humans.

I don't know whether creating an imperfectly rational general AI is any easier, except that limited perceptual and computational resources obviously imply less-than-optimal outcomes; still, why shouldn't we hope for the best outcome achievable given those constraints?  I imagine the question will become more settled before anyone nears unleashing a self-improving superhuman AI.

An imperfectly friendly AI, perfectly rational or not, is a very likely scenario.  Would it be sufficient to create many AIs with diverse individual value systems (demographically representative of humans' values), rather than a single monolithic Friendly AI built on a consensus over all humans' values?

What kind of competitive or political system would make fragmented squabbling AIs safer than an attempt to get the monolithic approach right?  Brin seems to have some hope of improving politics regardless of AI participation, but I'm not sure exactly what his dream is or how to get there (perhaps his "disputation arenas" would work if the participants were rational and altruistically honest).

I found Brin's post flowery and worthless. Michael Anissimov correctly points out the problems with it in the comments. (Is he registered on LW, I wonder?)

People seem to have serious problems grasping the idea that AIs are machines that will stupidly want anything we program them to want, not elf-children that will magically absorb our "kindness" and "understanding". When I first came across Eliezer's writings about FAI, that point seemed absolutely obvious... I guess David Brin hasn't read it; that's about the only possible explanation.

Wait. AIs are neither of the things you say: they're not elf-children, but they're not stupid machines either. They're smart machines, much like humans (but much faster to adapt).

It's very hard to believe that we will be able to understand the source code of a real AI any better than we understand our own or each others'. In that sense, yes - finding a way to include "kindness" and "empathy" as meta-values for the bootstrapping machine is exactly what some of us have in mind.

I agree. I find his optimism refreshing (though probably still naive) when applied to humans, where at least there's some hope of his intuitions working, but I'm not sure that he's thought about AI deeply.

Here is a better post than Brin's on the subject, by Anissimov and Smigrodzki, from 2006, on Accelerating Future.

David Brin suggests that some kind of political system populated with humans and diverse but imperfectly rational and friendly AIs would evolve in a satisfactory direction for humans.

Brin:

a great diversity of mega minds, contending with each other, politely, and under civil rules, but vigorously nonetheless, holding each other to account and ensuring everything is above-board.

Source

These AIs are all friendly, because they have the constraint of not using force upon each other, i.e. you would have to solve the friendliness structure problem and put in some content about not destroying the other AIs.

Brin:

But the real reason to do this is far more pragmatic. If the new AIs feel vested in a civilization that considers them "human" then they may engage in our give and take process of shining light upon delusion. Each others delusions, above all. Reciprocal accountability -- extrapolated to a higher level -- may thus maintain the core innovation of our civilization. It's central and vital insight. And thus, we may find that our new leaders -- our godlike grandchildren -- will still care about us... and keep trying to explain.

Brin wants to create AI that is conditionally altruistic, be nice to it, and hope that it will be nice back, while also keeping more than one such AI around, each with distinct goals, under the imposed constraint that they must not use force upon each other.

Eliezer, this must be painful for you to read.

I'm still trying to figure out what reason the "thus" in "And thus, we may find that our new leaders -- our godlike grandchildren -- will still care about us..." refers to. When you have two "thus" statements in a row like that, "A thus B thus C", it means B justifies C. In this case, "Reciprocal accountability may maintain the core innovation of our civilization" is B, which is somehow supposed to justify AIs caring about us.

I'm still trying to figure out what reason the "thus" in "And thus, we may find that our new leaders -- our godlike grandchildren -- will still care about us..." refers to.

I think the argument is that humans will be instrumentally useful because they will point out machines' irrationalities (!) plus the fact that the machines come with reciprocal altruism.

The word thus does not only mean 'therefore'. It can also mean 'in this way', which I believe is the intended sense here.

I think AIs will always be imperfectly rational as they cannot optimize everything and still make decisions, and besides, optimal outcomes are not the same as perfect outcomes. I do think self-improving AI will tend to self-improve toward perfect rationality to the point where there is no significant difference between optimal and perfect.

As for a system in which AIs that disagree on friendliness are not dangerous: I can't imagine one. If friendliness includes protecting humans from unfriendly AI, a perfectly friendly AI would destroy imperfectly friendly AIs because imperfect friendliness is unfriendly compared to perfect friendliness. Therefore, an imperfectly friendly AI that thought it was perfectly friendly would destroy other AIs it disagreed with. To put it crudely, a "women's lib" AI could not coexist with a "Sharia" AI. And an imperfectly friendly AI that knew it was imperfectly friendly could not trust itself to make friendly decisions, and so would be acting in an unfriendly way if it tried to make decisions for people. Of course, this is all pretty primitive word vomit that I haven't spent much time thinking about, and ignores the reality that if imperfectly rational and friendly humans can get along, AIs should be able to as well.

Interestingly, Brin argues from the question Shall we maintain momentum and fealty to the underlying concepts of the Western Enlightenment? What he fails to consider here is the very genuine possibility that an AGI may be created by a non-Enlightenment culture, e.g. Japan or China. Perhaps a likelier possibility yet is an AGI created by a coalition of cultures, researchers from different countries working together, which could further extend to a multinational corporation that may not reflect any particular culture other than a basically capitalist one. While capitalism is a pretty fundamental Enlightenment ideal, it is only a framework within which other cultures may function very differently. The way a Japanese businessman or CEO considers competition is not the same as how a Russian or an American or a Brazilian views it. How different would a Japanese AGI be from an American one, or an AGI built by the Chinese government vs. one built by a private lab in the EU?

David Brin suggests that some kind of political system populated with humans and diverse but imperfectly rational and friendly AIs would evolve in a satisfactory direction for humans.

If I believed David Brin had the ability to significantly encourage the creation of such a system I would do whatever was in my power to thwart him. The only political moves that could make the outcome one favourable to humanity are those that prevent it.

About disputation arenas, I wish more people were at least thinking like Brin.

There are many projects for hypertext debate and analysis, but I don't know of any which would let analysts break down a hard ambiguous question using the Bayesian formulations of evidence, explaining-away, or transfer learning from related cases through hierarchical priors. And no project lets analysts reduce disagreements about instrumental values to disagreements about parameters or approximations in decision theory. Also, I know of no game-theoretic analysis of participants' incentives to enter evidence selectively or falsely, even though those incentives affect the conditional probabilities of observing that evidence.
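To make the "explaining-away" pattern concrete, here is a minimal Python sketch of the standard two-cause case such a tool would have to represent; the noisy-OR model and all the numbers are hypothetical, chosen only for illustration. Evidence E raises the probability of cause A, but additionally observing the alternative cause B pushes that probability back down.

```python
from itertools import product

P_A = P_B = 0.1                      # priors on two independent candidate causes (assumed)

def p_e(a, b):                       # noisy-OR likelihood of the evidence E (assumed)
    return 1 - (1 - 0.8 * a) * (1 - 0.8 * b) * (1 - 0.05)

def joint(a, b, e):                  # P(A=a, B=b, E=e)
    p = (P_A if a else 1 - P_A) * (P_B if b else 1 - P_B)
    return p * (p_e(a, b) if e else 1 - p_e(a, b))

def posterior_a(**obs):
    """P(A=1 | obs), by brute-force enumeration over the three binary variables."""
    states = [dict(zip("abe", bits)) for bits in product((0, 1), repeat=3)]
    keep = [s for s in states if all(s[k] == v for k, v in obs.items())]
    return sum(joint(**s) for s in keep if s["a"]) / sum(joint(**s) for s in keep)

print(posterior_a())          # ~0.10: the prior on A
print(posterior_a(e=1))       # ~0.42: E observed, A becomes more plausible
print(posterior_a(e=1, b=1))  # ~0.12: B also observed, and A is "explained away"
```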

What counts as an AGI? A system that can solve resource-allocation problems that satisfy its inbuilt values in a dynamic environment?

If one were to take Hayek's view of human marketplaces, we could consider our currently existing markets as greater-than-human intelligences by that standard.

What legal principles enable markets to work well? Theory and practice suggest that the necessary and sufficient conditions are: 1) property ownership and enforcement and 2) a sound currency of value-measurement.

So: how well have we imbued our emergently-superintelligent markets with values consistent with human well-being? I would suggest that we have succeeded to the extent that the conditions above are met. Further, I'd suggest that FAI development should also hinge on those principles.

What kind of competitive or political system would make fragmented squabbling AIs safer than an attempt to get the monolithic approach right?

This is a tremendously important question! (David Brin isn't the first person to raise the idea, BTW. I raised it at the first AGI workshop in 2006, and probably before that on OB. I would be surprised if no-one else had also raised it before that.)

Brin's essay doesn't really touch on any of the important problems with doing so, though.

One of the dangers of trying to implement this is our own horrendously inaccurate understanding of how checks-and-balances work in our own system. Brin's essay, and the ideas of just about every American who speaks on this topic, are fundamentally unsound because they start from the presumption that democracy works, for everything, all the time, everywhere. We've made democracy such an object of reverence that we have never critiqued it. We haven't even collected the data we would need to do so. Even in Iraq, where we urgently need to do so, we still haven't asked the question "Why does democracy not seem to work here? Would something else work better?"

For starters, we can't hope to create an ecology of AIs until we can figure out how to create a government that doesn't immediately decay into a 2-party system. We want more than 2 AIs.

EDIT: Folks, this is a very important point, for your own survival. I strongly encourage you to explain why you down-voted this comment.

What kind of competitive or political system would make fragmented squabbling AIs safer than an attempt to get the monolithic approach right?

This is a tremendously important question!

...and the answer is, "None." It's like asking how you should move your legs to walk faster than a jet plane.

I think the question we should all be asking is, "Is the plane on a treadmill?"

Downvoted for dismissing a question that is tremendously important to Eliezer's own work without giving any evidence; and for claiming certainty.

It would be reasonable to say that you think it might not be possible. It isn't reasonable to claim to know that it's impossible.

I have just stated that it isn't reasonable to dismiss as impossible, without argument, what may be our only chance for survival. I therefore find the immediate surge of downvotes surprising, and would appreciate explanations.

It's like asking how you should move your legs to walk faster than a jet plane.

Easy. Go to the first-class galley. Drop-kick their ceramic plates and bowls so that they shatter. Then tell the inspectors. They'll ground the plane. I can walk faster than 0 mph, can you?

(ETA: Better suggestions not given for fear of being arrested, but you get the picture.)

You forgot that aircraft structural analysts post here.

Oh, and wrong meta-level! Or something...

It's like asking how you should move your legs to walk faster than a jet plane.

I can already walk faster than a jet plane. Jet planes do not walk.

I can already walk faster than a jet plane. Jet planes do not walk.

How can you walk faster than something that doesn't walk walks?

I love this site.

If it can't walk at all, I can certainly walk faster than it.

Would you like some more tea?

If it can't walk at all, I can certainly walk faster than it.

Tonight, we dine in Bayesian hell!

ETA: or wherever Popperians go when they're falsified.

Principle-of-charity interpretation:

"It's like asking how you should move your legs to travel faster than a jet plane."

It's like asking how I should move my legs to travel faster than a jet plane moves its legs? j/k But I took on what was really meant by the question.

Here's a theory I invented this morning as to why we have a 2-party system. I was puzzling over Obama's insistence that health-care reform will not include putting a cap on punitive damages in lawsuits. Of all the things one could try in order to cut costs, that's the only one that's a clear winner, with an instant and huge payoff and no losers except trial lawyers.

Someone on the radio said that the Democratic party wouldn't cap lawsuits because they were too closely-connected with trial lawyers. And this was NPR, not Rush Limbaugh. I hadn't heard that before, but it made a lot of sense. It clicked with something else that's been hanging out in my head waiting for an explanation: How does the Democratic party survive in the US, when Republicans have all the money?

Lawyers go to Congress and make more and more laws that create more legal issues for corporations, requiring them to hire more and more lawyers. (The relationship isn't merely parasitic; it is also cooperative - corporations wouldn't make much money without the rule of law.)

Many of these legal issues are made in the name of populist concepts, like equal opportunity, health care, and environmentalism. Lawyers, accountants, and a legion of bureaucrats make quite a lot of money off of an alleged concern for the poor.

So many contributors to the Democratic party don't care about the poor. They use the poor as an excuse to feed off the rich.

If this theory were partly correct, we would find that a large percentage of major contributors to the Democratic party are people who make money from public-good programs. Going to opensecrets.org, we find that these are the only identifiable top contributors, excluding organizations like "The committee to re-elect Nancy Pelosi":

  • Paloma Partners, a management consultancy firm
  • Doctors Hospital at Renaissance
  • Waters & Kraus, a law firm
  • Simon Property Group, a real estate firm

I'll call that 2/4.

Compare this to the contributors to the Republican party:

  • Rothstein, Rosenfeldt & Adler, a law firm
  • Perry Homes, real estate
  • Crow Holdings, holding corporation
  • Cumberland Resources, mining
  • Chartwell Partners, executive search
  • Northwest Excavating Co, excavating contractors
  • Cintas Corp, uniform makers
  • Amphastar Pharmaceuticals
  • Intellectual Ventures LLC, venture capitalists
  • Curves International, health club
  • Contran Corp, holding corporation
  • Hoffman Partners, unable to determine what they do
  • Reyes Holding, holding corp.
  • Miller, Buckfire & Co, investment bank
  • AT&T Inc
  • TAMKO Building Products

Looks like 1/16.

Now, of course reality is more complicated, and there are many other factors involved. But if this is a major factor in explaining how the Democratic party survives, it means that our government is explained not only by checks-and-balances and game theory and rational cooperation, but also by the ancient prey-predator-parasite dynamic, where the wealthy Republicans are the top predators, the wealthy Democrats are the parasites, and everybody else (most of us) are the prey.

Guess which all of us get to be if we manage to extend this system to include AIs.

Surely most populist regulation that hampers corporations--say, Sarbanes-Oxley--mainly involves paperwork, not lawsuits. And corporate lawyers are Republican, aren't they?

And corporate lawyers are Republican, aren't they?

I have no idea.

Lawyers contribute money to Democrats over Republicans by about 3:1.

That leaves us with two possibilities: (1) Corporate lawyers are D, your theory is largely correct, but references to "trial lawyers" being D are very misleading. (2) Corporate lawyers are not D (I now guess evenly split), but aren't politically active. Some politics (you claim D) benefits them by accident. I lean towards #2.

A crude measure is politicians. According to this source there are 50% more Democratic congressmen who are lawyers than Republican congressmen who are (and 15% more D than R that term). Of course you get the wrong answer if you try to figure out the politics of entertainers by looking at politicians. But even if they aren't representative of lawyers, I suspect that all those R lawyer congressmen are doing things in the interests of lawyers.

BTW, a bit off-topic, but does anybody else wonder if being able to create a friendly AI requires being able to be friendly? Because we don't have a lot of that here.

Are you talking about Friendliness in the technical sense, which in humans would mostly mean not being a sociopath, or are you saying that to build FAI, a human has to be friendly in the garden-variety way (cheerily sociable and open)?

The latter. The connection is tenuous; but let me explain.

If your model of the right way to make friendly AI is a completely top-down one, meaning the computer is your slave and you code it so that it is impossible for it to violate the Three Laws of Robotics, or a more sophisticated scheme that allows it to produce only certain kinds of plans, then the question is irrelevant.

But if your model of the right way to make friendly AI also involves developing a theory of rationality that leads to cooperation, then I would think that the person developing such a theory would also apply some of it in their own life, and be able to work well with others, and talk things out without descending into flame-wars or developing irrational biases against particular people.

Yet we seem to be bad at that on LW. I'm as guilty as anyone.

I think that both working well with others and having rationally decided that cooperation in PD-like problems is right are good things. I don't think they necessarily come together! When I'm friendly to people, this tends to be for reasons that come out of my experience (e.g. "Maybe this person will be my friend, like So-and-so!") or my emotions (e.g. "It's a beautiful day! I just want to grin at everybody I meet and compliment them on their outfits!") or my goals (e.g. "I want my friend to be happy, so I'll bring her a glass of juice and ask her if there's anything else I can do!") rather than the fact that I think in PD situations you should be cooperative.

As for irrational biases against particular people, the easiest way to practically get around them is to hold a belief that people can and will change for the better. That way, if you have decided that someone is a stupid jerk, the next time you see a post of theirs you can check to see if they might have ceased to be a stupid jerk, instead of saying that because they are a stupid jerk the post must be stupid and jerky.

I think that both working well with others and having rationally decided that cooperation in PD-like problems is right are good things. I don't think they necessarily come together!

As an example of how they should but do not come together, we have people here who agree that they should cooperate in extended PD, yet engage each other in an extended PD-like situation in which they have a discussion via comments and each continue to down-vote each other's comments.

Since votes are anonymous, how can you know that this is happening?

I can't know, but I've seen cases where it seems likely. Sometimes I've seen extended exchanges between 2 people, that no one else seems to be reading, where all the comments have a score of -1.

Personally, I don't think the voting system is working very well. It seems to be used to encourage conformity and punish negativity. I have a lot of points myself, but there's a strong negative correlation between the quality of my comments and posts, and the votes they receive. I'd like it if votes were no longer anonymous. I don't usually make a downvote without explaining why in a comment, myself.

Sometimes I've seen extended exchanges between 2 people, that no one else seems to be reading, where all the comments have a score of -1

My hypothesis: you're a poor judge of whether people are reading an exchange. (where would you get that data?)

but there's a strong negative correlation between the quality of my comments and posts, and the votes they receive

My hypothesis: you're a poor judge of the quality of your comments and posts.

My hypothesis: you're a poor judge of whether people are reading an exchange. (where would you get that data?)

So how does your hypothesis explain that these hypothetical other readers consistently read one statement and disagree with it, and then read another statement disagreeing with the first statement, and disagree with that also?

My hypothesis: you're a poor judge of the quality of your comments and posts.

My hypothesis: You didn't bother checking any data before your knee-jerk response, even though it was a button-click away. Honestly, did you?

If I were merely a poor judge, then with a sample size this large the correlation would most likely be low or random, not strongly negative.

But instead of a hypothesis, let's give you some objective data. Would you agree that higher-quality posts should generate more discussion?

Here are posts I have made, followed by their voted score, followed by the number of comments.

  • Media bias, 30, 43
  • Mechanics without wrenches, 23, 71
  • A note on hypotheticals, 18, 17
  • Tell it to someone who doesn't care, 15, 34
  • The Machine Learning Personality Test, 15, 27
  • Aumann voting; or, How to vote when you're ignorant, 10, 31
  • On dollars, utility, and crack cocaine, 8, 97
  • Exterminating life is rational, 7, 216
  • Marketing rationalism, 7, 54
  • Extreme updating: The devil is in the missing details, 6, 16
  • Calibration fail, 5, 36
  • Chomsky on reason and science, 5, 6
  • Is masochism necessary?, 4, 123
  • You can't believe in Bayes, 1, 53
  • Average utilitarianism must be correct?, 1, 111
  • Rationalists lose when others choose, 0, 52
  • Homogeneity vs. heterogeneity, 0, 77

Correlation coefficient = -.23

Linear regression slope = -1.4
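(For anyone who wants to check, a minimal Python sketch that recomputes both figures from the score / comment-count pairs listed above; it uses only the numbers as given, and should roughly reproduce the -.23 and -1.4.)

```python
# Recompute the reported correlation and regression slope from the
# (score, comment count) pairs listed above.
scores   = [30, 23, 18, 15, 15, 10, 8, 7, 7, 6, 5, 5, 4, 1, 1, 0, 0]
comments = [43, 71, 17, 34, 27, 31, 97, 216, 54, 16, 36, 6, 123, 53, 111, 52, 77]

n = len(scores)
mx, my = sum(scores) / n, sum(comments) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(scores, comments))
sxx = sum((x - mx) ** 2 for x in scores)
syy = sum((y - my) ** 2 for y in comments)

r     = sxy / (sxx * syy) ** 0.5   # Pearson correlation, about -0.23
slope = sxy / sxx                  # least-squares slope of comments on score, about -1.4
print(round(r, 2), round(slope, 2))
```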

Would you agree that higher-quality posts should generate more discussion?

No. A good troll can get far more comments than almost any high-quality non-troll post. And you also cannot ignore the difficulty of the post, or how much knowledge it presupposes (and thus how small its potential audience is), or whether the post is on a topic that everybody is an expert in (e.g., politics, male-female relations, religion).

I for one comment far more on Phil's posts when I think they're completely misguided than I do otherwise. Not sure what that says about me, but if others did likewise, we would predict precisely the relationship Phil is observing.

So how does your hypothesis explain that these hypothetical other readers consistently read one statement and disagree with it, and then read another statement disagreeing with the first statement, and disagree with that also?

You're assuming that these hypothetical other readers downvote for disagreement. It's completely possible to read an internet argument and think the entire thing is just stupid/poor quality/not worth wasting time on.

Here are posts I have made, followed by their voted score, followed by the number of comments.

Is your assumption that quality of post is proportional to the amount of discussion under it? (Edit: I see that indeed it is.) That seems like a huge assumption, especially since many long exchanges spin off from nitpicks and tangents. Also, the post of yours that generated the most comments was also really long, and even then a fair chunk of the replies were the descendants of my gendered language nudge.

Exactly. I'd guess (based on the stated justifications for voting that have been uttered in many LW threads) that most people don't vote based on disagreement but on what they want to see more of and what they want to see less of.

I think I've been in one of those exchanges, but I didn't downvote anyone. (I guess someone thought the whole conversation was nonsensical or boring, or something.) Can you give specific examples so that the authors of the thread can confirm or deny your hypothesis?

I think votes tend to get more petty use than might be preferred, the negative effects of which could be partly ameliorated by displaying upvotes and downvotes separately, but even in the absence of that feature I think they do more good than harm.