
Comment author: Lumifer 21 December 2016 03:59:07PM *  0 points [-]

"What's wrong with that?"

It's a badly formulated question, likely to lead to confusion.

"there is a cluster of behaviors that we do NOT want our AI to have, which seem blackmailish to us"

So, can you specify what this cluster is? Can you list the criteria by which a behaviour would be included in or excluded from this cluster? If you do this, you have defined blackmail.

Comment author: kokotajlod 23 December 2016 06:45:00PM 0 points [-]

"It's a badly formulated question, likely to lead to confusion." Why? That's precisely what I'm denying.

"So, can you specify what this cluster is? Can you list the criteria by which a behaviour would be included in or excluded from this cluster? If you do this, you have defined blackmail."

That's precisely what I (Stuart really) am trying to do! I said so, you even quoted me saying so, and as I interpret him, Stuart said so too in the OP. I don't care about the word blackmail except as a means to an end; I'm trying to come up with criteria by which to separate the bad behaviors from the good.

I'm honestly baffled at this whole conversation. What Stuart is doing seems the opposite of confused to me.

Comment author: 3p1cd3m0n 10 January 2015 10:15:07PM 0 points [-]

"Of course" implies that the answer is obvious. Why is it obvious?

Comment author: kokotajlod 23 December 2016 06:36:06PM 0 points [-]

Dunno what Username was thinking, but here's the answer I had in mind: "Why is it obvious? Because the Problem of Induction has not yet been solved."

Comment author: Dagon 20 December 2016 06:06:21PM 0 points [-]

Now I'm really confused. Are you trying to define words, or trying to understand (and manipulate) behaviors? I'm hearing you say something like "I don't know what blackmail is, but I want to make sure an AI doesn't do it". This must be a misunderstanding on my part.

I guess you might be trying to understand WHY some people don't like blackmail, so you can decide whether you want to guard against it, but even that seems pretty backward.

Comment author: kokotajlod 21 December 2016 03:23:51PM 0 points [-]

You make it sound like those two things are mutually exclusive. They aren't. We are trying to define words so that we can understand and manipulate behavior.

"I don't know what blackmail is, but I want to make sure an AI doesn't do it." Yes, exactly, as long as you interpret it in the way I explained it above.* What's wrong with that? Isn't that exactly what the AI safety project is, in general? "I don't know what bad behaviors are, but I want to make sure the AI doesn't do them."

*"In other words there are a cluster of behaviors that we do NOT want our AI to have, which seem blackmailish to us, and a cluster of behaviors that we DO want it to have, which seem tradeish to us. So we are now trying to draw a line in conceptual space between them so that we can figure out how to program an AI appropriately."

Comment author: Dagon 20 December 2016 12:15:03AM 1 point [-]

"As I understand it, the idea is that we want to design an AI that is difficult or impossible to blackmail, but which makes a good trading partner."

You and Stuart seem to have different goals. You want to understand and prevent some behaviors (in which case, start by tabooing culturally-dense words like "blackmail"). He wants to understand linguistic or legal definitions (so tabooing the word is counterproductive).

Comment author: kokotajlod 20 December 2016 02:43:57PM 0 points [-]

"You want to understand and prevent some behaviors (in which case, start by tabooing culturally-dense words like "blackmail")"

In a sense, that's exactly what Stuart was doing all along. The whole point of this post was to come up with a rigorous definition of blackmail, i.e. to find a way to say what we wanted to say without using the word.

Comment author: Dagon 19 December 2016 03:04:08PM 0 points [-]

Unpack further, please. Are you trying to understand the legal or colloquial implications of the words in the English language, or are you more concerned with clusters of activities that share some additional attributes?

Comment author: kokotajlod 19 December 2016 04:31:22PM 1 point [-]

As I understand it, the idea is that we want to design an AI that is difficult or impossible to blackmail, but which makes a good trading partner.

In other words, there is a cluster of behaviors that we do NOT want our AI to have, which seem blackmailish to us, and a cluster of behaviors that we DO want it to have, which seem tradeish to us. So we are now trying to draw a line in conceptual space between them so that we can figure out how to program an AI appropriately.

Comment author: ike 19 August 2016 10:47:14AM 0 points [-]

"I'd love to understand what you said about re-arranging terms, but I don't. Can you explain in more detail how you get from the first set of hypotheses/choices (which I understand) to the second?"

I just moved the right hand side down by two spaces. The sum still stays the same, but the relative inequality flips.

"As I said earlier, my solution is an argument that in every case there will be an action that strictly dominates all the others."

Why would you think that? I don't really see where you argued for that, could you point me at the part of your comments that said that?

Comment author: kokotajlod 29 October 2016 02:09:42AM 0 points [-]

Oh, OK, I get it now: "But clearly re-arranging terms doesn't change the expected utility, since that's just the sum of all terms." I guess that's what I have to deny. Or rather, I accept it (I agree that EU = infinity for both A and B), but I think that since A is better than B in every possible world, it's better than B simpliciter.

The reshuffling example you give is an example where A is not better than B in every possible world. That's the sort of example that I claim is not realistic, i.e. not the actual situation we find ourselves in. Why? Well, that was what I tried to argue in the OP--that in the actual situation we find ourselves in, the action A that is best in the simplest hypothesis is also better... well, oops, I guess it's not better in every possible world, but it's better in every possible finite set of possible worlds that contains all the worlds simpler than its most complex member.

I'm guessing this won't be too helpful to you since, obviously, you already read the OP. But in that case I'm not sure what else to say. Let me know if you are still interested and I'll try to rephrase things.

Sorry for taking so long to get back to you; I check this forum infrequently.

Comment author: ike 10 August 2016 07:14:11PM *  0 points [-]

The problem there, and the problem with Pascal's Mugging in general, is that outcomes with a tiny amount of probability dominate the decisions. A could be massively worse than B 99.99999% of the time, and still naive utility maximization says to pick A.

One way to fix it is to bound utility. But that has its own problems.

The problem with your solution is that it's not complete in the formal sense: you can only say some things are better than other things if they strictly dominate them, but if neither strictly dominates the other you can't say anything.

I would also claim that your solution doesn't satisfy framing invariants that all decision theories should arguably follow. For example, what about changing the order of the terms? Let's restate each utility as its probability-weighted contribution, so we can move terms around without changing the numbers. E.g. if I say utility 5 at p: .01, that really means you're getting utility 500 in that scenario, so it contributes 5 to the expectation. Now, consider the following utilities:

1<2 p:.5

2<3 p:.5^2

3<4 p:.5^3

n<n+1 p:.5^n

...

etc. So if you're faced with choosing between something that gives you the left side or the right side, choose the right side.

But clearly re-arranging terms doesn't change the expected utility, since that's just the sum of all terms. So the above is equivalent to:

1>0 p:.5

2>0 p:.5^2

3>2 p:.5^3

4>3 p:.5^4

n>n-1 p:.5^n

So your solution is inconsistent if it satisfies the invariant of "moving around expected utility between outcomes doesn't change the best choice".
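
Here is a minimal numeric sketch of that rearrangement (Python, chosen purely for illustration; the variable names are mine, and the terms and probabilities are exactly the ones listed above). It tabulates the probability-weighted contributions in both framings, checks which side wins scenario by scenario, and prints a few partial sums: the second framing's right-hand column is just the first framing's right-hand terms moved down two scenarios, so the full infinite sum is unchanged, yet the side that dominates flips.

```python
# Sketch of the rearrangement above (illustrative; N and the names are mine).
# Each scenario n has probability 0.5**n; the numbers below are the
# probability-weighted contributions ("utility after probabilities"),
# so scenario n's raw payoff would be contribution / 0.5**n.

N = 12  # how many scenarios to tabulate

# Framing 1: the left side contributes n, the right side n + 1, in scenario n.
left = [n for n in range(1, N + 1)]
right = [n + 1 for n in range(1, N + 1)]

# Framing 2: the same right-hand terms, moved down two scenarios, so the
# right side contributes 0, 0, 2, 3, 4, ...  The full infinite sum is
# unchanged; only the pairing of terms with scenarios moves.
right_shifted = [0, 0] + right[:N - 2]

print("framing 1: right side wins every scenario?",
      all(r > l for l, r in zip(left, right)))          # True
print("framing 2: left side wins every scenario?",
      all(l > r for l, r in zip(left, right_shifted)))  # True

# Partial sums grow without bound on both sides, so neither framing assigns
# a finite expected utility; the scenario-by-scenario comparison is doing
# all the work, and it flips under the shift.
for k in (4, 8, 12):
    print(k, sum(left[:k]), sum(right[:k]), sum(right_shifted[:k]))
```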

Comment author: kokotajlod 18 August 2016 03:24:22PM 0 points [-]

Again, thanks for this.

"The problem with your solution is that it's not complete in the formal sense: you can only say some things are better than other things if they strictly dominate them, but if neither strictly dominates the other you can't say anything."

As I said earlier, my solution is an argument that in every case there will be an action that strictly dominates all the others. (Or, weaker: that within the set of the N most probable hypotheses, for any finite N, one action will strictly dominate all the others, and that this action will be the same action that is optimal in the most probable hypothesis.) I don't know if my argument is sound yet, but if it is, it avoids your objection, no?

I'd love to understand what you said about re-arranging terms, but I don't. Can you explain in more detail how you get from the first set of hypotheses/choices (which I understand) to the second?

Comment author: ike 06 August 2016 10:47:01PM 1 point [-]

In your example, how much should you spend to choose A over B? Would you give up an unbounded amount of utility to do so?

Comment author: kokotajlod 10 August 2016 05:37:50PM 0 points [-]

This was helpful, thanks!

As I understand it, you are proposing modifying the example so that choosing A carries a fixed cost, the same in every world: on some H1 through HN that cost makes choosing A give you less utility than choosing B, but thereafter choosing A is still better.

It seems like the math tells us that any price would be worth it, that we should give up an unbounded amount of utility to choose A over B. I agree that this seems like the wrong answer. So I don't think whatever I'm proposing solves this problem.
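
To make that concrete, here is a small sketch (Python, purely illustrative; the function names and cost values are hypothetical, and the payoffs are taken from the toy model quoted further down the thread: under Hn, which has probability 0.5^n, A pays 2*10^(n-1) utils and B pays 5^(n-1)). For any fixed cost attached to A, B comes out ahead only on finitely many of the most probable hypotheses, and A comes out ahead on every hypothesis after that.

```python
# Sketch: attach a fixed cost to choosing A (names and cost values are
# hypothetical; payoffs follow the toy model quoted below in the thread:
# under H_n, with probability 0.5**n, A pays 2 * 10**(n-1) utils and
# B pays 5**(n-1) utils).

def payoff_A(n):
    return 2 * 10 ** (n - 1)

def payoff_B(n):
    return 5 ** (n - 1)

def first_hypothesis_where_A_wins(cost):
    """Smallest n such that A minus its fixed cost beats B under H_n.
    A's lead over B grows monotonically with n, so A also wins under
    every later hypothesis, however large the cost."""
    n = 1
    while payoff_A(n) - cost <= payoff_B(n):
        n += 1
    return n

for cost in (0, 10, 10**6, 10**12):
    print(cost, first_hypothesis_where_A_wins(cost))
# cost 0 -> 1, cost 10 -> 2, cost 10**6 -> 7, cost 10**12 -> 13:
# B is better only on finitely many of the most probable hypotheses.
```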

But that's a different problem than the one I'm considering. (In the problem I'm considering, choosing A is better in every possible world.) Can you think of a way they might be parallel--any way that the "I give up" which I just said above applies to the problem I'm considering too?

Comment author: Val 05 August 2016 09:56:06PM 0 points [-]

Let's be conservative and say the ratio is 1 in a billion.

Why?

Why not 1 in 10? Or 1 in 3^^^^^^^^3?

Choosing an arbitrary probability has a good chance of leading us unknowingly into circular reasoning. I've seen too many cases of, for example, Bayesian reasoning applied to something we have no information about: it went like "assuming the initial probability was x", produced some result after a lot of calculation, and then defended that result as accurate because the Bayesian rule was applied, so it must be infallible.

Comment author: kokotajlod 06 August 2016 02:17:31PM 0 points [-]

It's arbitrary, but that's OK in this context. If I can establish that this works when the ratio is 1 in a billion, or lower, then that's something, even if it doesn't work when the ratio is 1 in 10.

Especially since the whole point is to figure out what happens when all these numbers go to extremes--when the scenarios are extremely improbable, when the payoffs are extremely huge, etc. The cases where the probabilities are 1 in 10 (or arguably even 1 in a billion) are irrelevant.

Comment author: ike 05 August 2016 12:24:21AM *  1 point [-]

See https://arxiv.org/abs/0712.4318 ; you need to formally reply to that.

Comment author: kokotajlod 05 August 2016 05:03:22PM 0 points [-]

Update: The conclusion of that article is that the expected utilities don't converge for any utility function that is bounded below by a computable, unbounded utility function. That might not actually be in conflict with the idea I'm grasping at here.

The idea I'm trying to get at here is that maybe even if EU doesn't converge in the sense of assigning a definite finite value to each action, maybe it nevertheless ranks each action as better or worse than the others, by a certain proportion.

Toy model:

The only hypotheses you consider are H1, H2, H3, ... etc. You assign 0.5 probability to H1, and each HN+1 has half the probability of the previous hypothesis, HN.

There are only two possible actions: A or B. H1 says that A gives you 2 utils and B gives you 1. Each HN+1 says that A gives you 10 times as many utils as it did under the previous hypothesis, HN, and moreover that B gives you 5 times as many utils as it did under the previous hypothesis, HN.

In this toy model, expected utilities do not converge, but rather diverge to infinity, for both A and B.

Yet clearly A is better than B...
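
A minimal sketch of this toy model (Python, with hypothetical names, just to make both claims concrete): A's payoff beats B's under every single hypothesis, while both running expected utilities grow without bound.

```python
# Toy model sketch (hypothetical names): P(H_n) = 0.5**n; under H_n,
# A pays 2 * 10**(n-1) utils and B pays 5**(n-1) utils.

def prob(n):
    return 0.5 ** n

def payoff_A(n):
    return 2 * 10 ** (n - 1)

def payoff_B(n):
    return 5 ** (n - 1)

eu_A = eu_B = 0.0
for n in range(1, 21):
    eu_A += prob(n) * payoff_A(n)     # each term equals 5**(n-1), so the sum diverges
    eu_B += prob(n) * payoff_B(n)     # each term equals 0.5 * 2.5**(n-1), also divergent
    assert payoff_A(n) > payoff_B(n)  # A beats B under every single hypothesis
    if n % 5 == 0:
        print(n, eu_A, eu_B)          # both running expectations keep climbing
```

The assert never fires and both printed expectations keep growing, which is just the divergence-plus-dominance combination described above.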

I suppose one could argue that the expected utility of both A and B is infinite and thus that we don't have a good reason to prefer A to B. But that seems like a problem with our ability to handle infinity, rather than a problem with our utility function or hypothesis space.
