Crossposted at the Intelligent Agents Forum.

It should be noted that the colloquial "AI hacking a human" can mean three different things:

  1. The AI convinces/tricks/forces the human to do a specific action.
  2. The AI changes the values of the human to prefer certain outcomes.
  3. The AI completely overwhelms human independence, transforming them into a weak subagent of the AI.

Different levels of hacking make different systems vulnerable, and different levels of interaction make different types of hacking more or less likely.
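As an illustrative sketch only (no code appears in the original post), the three levels and the vulnerability claim could be written down as follows. The channel names and everything about "open_ended_conversation" are assumptions of mine; the truth-channel entry mirrors the author's comment further down the thread.

```python
from enum import Enum, auto

class HackLevel(Enum):
    """The three colloquial senses of 'AI hacking a human' listed above."""
    SPECIFIC_ACTION = auto()  # 1. convince/trick/force the human into a specific action
    VALUE_CHANGE = auto()     # 2. change the human's values to prefer certain outcomes
    SUBAGENT = auto()         # 3. overwhelm independence; the human becomes a weak subagent

# Which hack levels a given interaction channel is assumed to be vulnerable to.
# "human_as_truth_channel" mirrors the author's later comment (vulnerable only to
# the third level, given other reasonable precautions); the other entry is a
# purely hypothetical example.
CHANNEL_VULNERABILITIES = {
    "human_as_truth_channel": {HackLevel.SUBAGENT},
    "open_ended_conversation": {HackLevel.SPECIFIC_ACTION,
                                HackLevel.VALUE_CHANGE,
                                HackLevel.SUBAGENT},
}

def vulnerable(channel: str, level: HackLevel) -> bool:
    """True if the channel is assumed vulnerable to the given level of hacking."""
    return level in CHANNEL_VULNERABILITIES.get(channel, set())

print(vulnerable("human_as_truth_channel", HackLevel.VALUE_CHANGE))  # False
print(vulnerable("human_as_truth_channel", HackLevel.SUBAGENT))      # True
```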

Comments (16)

worst case scenario: AI persuades humans to give it half of their income in exchange for totalitarian control and megawars in order to increase its power over more humanimals

ooops, politics and collectivist ideologies have been doing this for ages

That seems amazingly far from a worst case scenario.

I suggest keeping politics out of discussions that are not about politics.

I don't think this is a pertinent or useful suggestion. The point of the reply wasn't to discuss politics, and I think it's a red herring to dismiss it as if it were.

If I may expand on tukabel's response: What is the point of this post? It seems to be some sort of "new" analysis of how AIs could potentially hack humans, but if we get past the "this is new and interesting" presentation, it doesn't seem to offer anything new, unusual, or even really discussion-worthy.

Why is "The AI convinces/tricks/forces the human to do a specific action" something that's remarkable? What does "Different levels of hacking make different systems vulnerable, and different levels of interaction make different types of hacking more or less likely" even mean? It sounds like an overly verbose and convoluted way of saying "People respond differently to different things."

Maybe there's some background discussion that provides context I've been missing here, but this doesn't seem like anything that hasn't already been going on amongst humans for thousands of years, and that history is a relevant and useful thing to draw from.

More succinctly: why are the ways in which an AI can "hack" a human (i.e., affect them) any different from the ways a human can affect a human? If we replace "AI" with "human", it'd be trivial and a bit silly.

What is the point of this post?

The point of this post is to be able to link to it from other, longer posts, so that I can, for instance, claim that using humans as a truth channel (http://lesswrong.com/r/discussion/lw/okd/humans_as_a_truth_channel/) is not vulnerable to the first two types of hacking (given other reasonable precautions), but is vulnerable to the third.

The point of the reply wasn't to discuss politics

I don't think that's at all clear, and looking at tukabel's past comments certainly doesn't give me any confidence that it wasn't.

If I may expand on tukabel's response: What is the point of this post? [...]

I think there's certainly an argument of this sort to be made, I think it's an interesting argument, and I think (as your comment demonstrates) it can be made without getting needlessly political. But tukabel didn't really bother to make it, and certainly didn't bother to avoid making it in a needlessly political way.

Does that really seem like a political post to you, though? It doesn't look like an attempt to discuss politics, types of politics, or who's right and who's wrong; there's no tribalism, nothing regarding contemporary politics, etc. It looks like a pure and simple statement of fact: humans have been coercing other humans into doing specific actions (often empowering themselves in the process) for the whole of human history.

I don't think tukabel's post was very political outside of the statement "An AI doing this is effectively politics, and politics has existed for a long time." I don't think that's considered discussing politics.

Yup, it seems political, because tukabel made particular choices of what specific actions to highlight and what particular sorts of ideologies to suggest might be responsible.

worst case scenario: AI persuades humans to give it half of their income in exchange for totalitarian control and megawars in order to increase its power over more humanimals

In the sort of scenario I would consider "worst case", a 50% tax to fund whatever the AI is doing wouldn't be anywhere on the list of things to worry about. Why mention "give it half their income"? It doesn't even make sense: "give it half of their income in exchange for totalitarian control" -- so humans give the AI half their income, and "in exchange" the AI gets totalitarian control over them? That's not an exchange, it's like saying "I'll pay you £10 in exchange for being obliged to polish your shoes for you". Outside of peculiar sexual fetishes, no one does that.

So why talk about "half of their income"? Because tukabel wants to complain about how taxes are bad, that's why. Politics.

ooops, politics and collectivist ideologies have been doing this for ages

Collectivist ideologies? Well, I guess. But it's not as if it's only "collectivist ideologies" that have persuaded people to hand over their money for dubious benefits. (To take a typical class of counterexamples from quite a long time ago, consider fraudulent businesses in the time of the South Sea Bubble.) Why focus on "collectivist ideologies"? Because tukabel wants to complain about how collectivism is bad and individualism is better, that's why.

That's how it looks to me, anyway. Maybe I'm wrong; maybe I've been oversensitized by seeing any number of other people doing what I think tukabel is doing here, and now see Proselytizing Libertarians under every bed. That would still be an example of why it's worth going out of the way to avoid saying things that are liable to look like politics: because even if you aren't actually intending to slide political preaching into a discussion of something else, it's very easy for it to look as if you are. (This is one of the key points Eliezer made back when he wrote "Politics is the Mind-Killer". Look at the Nixon example he cites; it's no more inflammatory than tukabel's, and that's the point.)

Full disclosure: It is possible that I am extra-sensitive to other annoyances in tukabel's comments because I find his persistent neologizing so grating. No, tukabel, I do not need to see "humans" always replaced with "humanimals" in order to remember that humans are animals, any more than I need "square" replaced with "squarectangle" to remember that squares are rectangles.

Ah, good points.

I did not really know what was meant by "collectivist ideologies" and assumed it to be something along the lines of "ideologies that necessitate a collection of people." Originally, I didn't see any significance in the 50% (to me, it just seemed like an off-the-cuff number), but you put it into some good context.

I concede and retract my original criticism.

Politics and specific examples aside, there is a valid underlying point about how and whether this is different from humans doing these things to each other.

Is an AI all that much different (on this dimension) from a particularly charismatic and persuasive human (or group of humans)?

Is an AI all that much different (on this dimension) from a particularly charismatic and persuasive human (or group of humans)?

For humans we often distinguish persuasion methods by their efficacy. So a quiet rational chat for twenty minutes is perfectly fine, raving to a large mob is dubious because of crowd dynamics, and stopping someone from sleeping while repeating the same thing over and over to them for twenty days and nights is brainwashing.

The risk with an AI is that it would be capable of changing humans in ways similar to the more dubious methods, while only using the "safe" methods.

How much have you explored the REASONS that brainwashing is seen as not cool, while quiet rational-seeming chat is perfectly fine? Are you sure it's only about efficacy?

I worry that there's some underlying principle missing from the conversation, about agentiness and "free will" of humans, which you're trying to preserve without defining. It'd be much stronger to identify the underlying goals and include them as terms in the AI's utility function(s).

Are you sure it's only about efficacy?

No, but I'm pretty sure efficacy plays a role. Look at the (stereotypical) freakout from some conservative parents about their kids attending university; it's not really about the content or the methods, but about the fact that changes in values or beliefs are expected to some degree.

Ok. The obvious followup is "under what conditions is it a bad thing?" Your college example is a good one: are you saying you want to prevent AIs from making changes similar to the ones university makes to students, but perhaps on a larger scale?

Well, there's a formal answer: if an AI can, in condition C, convince any human of belief B for any B, then condition C is not sufficient to constrain the AI's power, and the process is unlikely to be truth-tracking.
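A rough formalisation of that condition, using my own shorthand (the quantified strategy \(\pi\) and the predicates Constrains and TruthTracking are not notation from the discussion):

```latex
% Sketch of the sufficient condition stated above; the symbols \pi (an AI strategy),
% Constrains(C) and TruthTracking(C) are my own shorthand, not the author's.
\[
\bigl(\,\forall B\ \exists \pi:\ \text{the AI, playing } \pi \text{ under condition } C,
      \text{ convinces the human of } B\,\bigr)
\;\Longrightarrow\;
\neg\,\mathrm{Constrains}(C)\ \wedge\ \neg\,\mathrm{TruthTracking}(C)
\]
```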

That's a sufficient condition for C being insufficient, but not a necessary one.

The risk with an AI is that it would be capable of changing humans in ways similar to the more dubious methods, while only using the "safe" methods.

I think what you're saying makes sense, but I'm still on Dagon's side. I'm not convinced this is uniquely an AI thing. It's not like being a computer gives you charisma powers or makes you psychic -- I think that basically comes down to breeding and exposure to toxic waste.

I'm not totally sure it's an AI thing at all. When a lot of people talk about an AI, they seem to act as if they're talking about "a being that can do tons of human things, but better." It's possible it could, but I don't know if we have good evidence to assume AI would work like that.

A lot of parts of being human don't seem to be visible from the outside, and current AI systems get caught in pretty superficial local minima when they try to analyze human behavior. If you think an AI could do the charisma schtick better than mere humans, it seems like you'd also have to assume the AI understands our social feelings better than we understand them.

We don't know what the AI would be optimizing for and we don't know how lumpy the gradient is, so I don't think we have a foothold for solving this problem -- and since finding that kind of foothold is probably an instance of the same intractable problem, I'm not convinced a really smart AI would have an advantage over us at solving us.