kokotajlod

Sweet. I too will write something not about coronavirus.

"It's a badly formulated question, likely to lead to confusion." Why? That's precisely what I'm denying.

"So, can you specify what this cluster is? Can you list the criteria by which a behaviour would be included in or excluded from this cluster? If you do this, you have defined blackmail."

That's precisely what I (Stuart really) am trying to do! I said so, you even quoted me saying so, and as I interpret him, Stuart said so too in the OP. I don't care about the word blackmail except as a means to an end; I'm trying to come up with criteria by which to separate the bad behaviors from the good.

I'm honestly baffled at this whole conversation. What Stuart is doing seems the opposite of confused to me.

Dunno what Username was thinking, but here's the answer I had in mind: "Why is it obvious? Because the Problem of Induction has not yet been solved."

You make it sound like those two things are mutually exclusive. They aren't. We are trying to define words so that we can understand and manipulate behavior.

"I don't know what blackmail is, but I want to make sure an AI doesn't do it." Yes, exactly, as long as you interpret it in the way I explained it above.* What's wrong with that? Isn't that exactly what the AI safety project is, in general? "I don't know what bad behaviors are, but I want to make sure the AI doesn't do them."

*"In other words there are a cluster of behaviors that we do NOT want our AI to have, which seem blackmailish to us, and a cluster of behaviors that we DO want it to have, which seem tradeish to us. So we are now trying to draw a line in conceptual space between them so that we can figure out how to program an AI appropriately."

"You want to understand and prevent some behaviors (in which case, start by tabooing culturally-dense words like "blackmail")"

In a sense, that's exactly what Stuart was doing all along. The whole point of this post was to come up with a rigorous definition of blackmail, i.e. to find a way to say what we wanted to say without using the word.

As I understand it, the idea is that we want to design an AI that is difficult or impossible to blackmail, but which makes a good trading partner.

In other words there are a cluster of behaviors that we do NOT want our AI to have, which seem blackmailish to us, and a cluster of behaviors that we DO want it to have, which seem tradeish to us. So we are now trying to draw a line in conceptual space between them so that we can figure out how to program an AI appropriately.

OH ok I get it now: "But clearly re-arranging terms doesn't change the expected utility, since that's just the sum of all terms." That's what I guess I have to deny. Or rather, I accept that (I agree that EU = infinity for both A and B) but I think that since A is better than B in every possible world, it's better than B simpliciter.

The reshuffling example you give is an example where A is not better than B in every possible world. That's the sort of example that I claim is not realistic, i.e. not the actual situation we find ourselves in. Why? Well, that was what I tried to argue in the OP--that in the actual situation we find ourselves in, the action A that is best in the simplest hypothesis is also better... well, oops, I guess it's not better in every possible world, but it's better in every possible finite set of possible worlds such that the set contains all the worlds simpler than its simplest member.

I'm guessing this won't be too helpful to you since, obviously, you already read the OP. But in that case I'm not sure what else to say. Let me know if you are still interested and I'll try to rephrase things.

Sorry for taking so long to get back to you; I check this forum infrequently.

Again, thanks for this.

"The problem with your solution is that it's not complete in the formal sense: you can only say some things are better than other things if they strictly dominate them, but if neither strictly dominates the other you can't say anything."

As I said earlier, my solution is an argument that in every case there will be an action that strictly dominates all the others. (Or, weaker: that within the set of all hypotheses of probability less than some finite N, one action will strictly dominate all the others, and that this action will be the same action that is optimal in the most probable hypothesis.) I don't know if my argument is sound yet, but if it is, it avoids your objection, no?
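To make the weaker claim concrete, here is a minimal sketch of what "one action strictly dominates within a finite set of hypotheses" could look like. Everything in it is an illustrative assumption rather than anything from the OP: the function names, the three made-up hypotheses, and the payoff numbers are all hypothetical. The payoffs are chosen so that each term of the expected-utility sum keeps growing for both actions, as in a series that would diverge if continued, while A still beats B in every listed world.

```python
from itertools import accumulate


def strictly_dominates(utils_a, utils_b):
    """True if A pays strictly more than B in every listed world.

    Assumes both lists are the same length and index the same worlds.
    """
    return all(a > b for a, b in zip(utils_a, utils_b))


def prefix_eu_gaps(probs, utils_a, utils_b):
    """Running sum of p_i * (U_A(i) - U_B(i)) over the first 1, 2, ... worlds.

    Each entry is the expected-utility advantage of A over B when only
    that finite prefix of hypotheses is considered.
    """
    terms = (p * (a - b) for p, a, b in zip(probs, utils_a, utils_b))
    return list(accumulate(terms))


# Hypothetical numbers, ordered from most to least probable hypothesis.
probs = [0.5, 0.25, 0.125]
utils_a = [4, 16, 64]   # p * U_A = 2, 4, 8: the terms grow without settling
utils_b = [2, 8, 32]    # p * U_B = 1, 2, 4: likewise

print(strictly_dominates(utils_a, utils_b))
print(prefix_eu_gaps(probs, utils_a, utils_b))
```

On these numbers A's advantage is positive on every prefix, even though neither full sum would converge if the pattern continued. This only illustrates the shape of the claim; it is not an argument that real hypothesis spaces behave this way.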

I'd love to understand what you said about re-arranging terms, but I don't. Can you explain in more detail how you get from the first set of hypotheses/choices (which I understand) to the second?

This was helpful, thanks!

As I understand it, you are proposing modifying the example so that on some H1 through HN, choosing A gives you less utility than choosing B, but then thereafter choosing B is better, because there is some cost you pay which is the same in each world.

It seems like the math tells us that any price would be worth it, that we should give up an unbounded amount of utility to choose A over B. I agree that this seems like the wrong answer. So I don't think whatever I'm proposing solves this problem.

But that's a different problem than the one I'm considering. (In the problem I'm considering, choosing A is better in every possible world.) Can you think of a way they might be parallel--any way that the "I give up" which I just said above applies to the problem I'm considering too?

It's arbitrary, but that's OK in this context. If I can establish that this works when the ratio is 1 in a billion, or lower, then that's something, even if it doesn't work when the ratio is 1 in 10.

Especially since the whole point is to figure out what happens when all these numbers go to extremes--when the scenarios are extremely improbable, when the payoffs are extremely huge, etc. The cases where the probabilities are 1 in 10 (or arguably even 1 in a billion) are irrelevant.
