All of AK's Comments + Replies

In this example, he told you that you were not in one of the places you're not in (the Vulcan Desert). If he always does this, then the probability that you're in the Vulcan Mountain is 1/4; if you had been in the Vulcan Desert, he would have told you that you were not in one of the other three places.

That can't be right -- if the probability of being in the Vulcan Mountain is 1/4 and the probability of being in the Vulcan Desert (per the guard) is 0, then the probability of being on Earth would have to be 3/4.

4query
P(vulcan mountain | you're not in vulcan desert) = 1/3

P(vulcan mountain | guard says "you're not in vulcan desert")
  = P(guard says "you're not in vulcan desert" | vulcan mountain) * P(vulcan mountain) / P(guard says "you're not in vulcan desert")
  = ((1/3) * (1/4)) / ((3/4) * (1/3))
  = 1/3

Whoops, you're right; never mind! There are algorithms that do give different results, such as the one justinpombrio mentions above.
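Here's a quick enumeration sketch of the same calculation, just to convince myself. The only assumptions are the puzzle's: four equally likely locations, and a truthful guard who names one of the three places you're not in, uniformly at random (the two non-Vulcan location names below are placeholders).

```python
from fractions import Fraction

# Enumeration sketch. Assumptions: four equally likely locations, and a truthful
# guard who names one of the three places you are NOT in, uniformly at random.
# The two non-Vulcan location names are placeholders.
locations = ["vulcan mountain", "vulcan desert", "earth city", "earth countryside"]
prior = Fraction(1, 4)

# Joint probability of (true location, guard says "you're not in <named>").
joint = {}
for true_loc in locations:
    for named in locations:
        if named != true_loc:
            joint[(true_loc, named)] = prior * Fraction(1, 3)

statement = "vulcan desert"  # the guard says "you're not in the Vulcan Desert"
evidence = sum(p for (_, named), p in joint.items() if named == statement)
posterior = joint[("vulcan mountain", statement)] / evidence

print(posterior)  # 1/3 -- the guard's statement gives the same answer as plain conditioning
```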

I'm not sure about the first case:

if you don't have a VNM utility function, you risk being mugged by wandering Bayesians

I don't see why this is true. While "VNM utility function => safe from wandering Bayesians", it's not clear to me that "no VNM utility function => vulnerable to wandering Bayesians." I think the vulnerability to wandering Bayesians comes from failing to satisfy Transitivity rather than failing to satisfy Completeness. I have not done the math on that.
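For what it's worth, here is the kind of exploit I have in mind for the transitivity failure -- a toy money-pump sketch, with made-up option names and fees rather than anything from the post:

```python
# Toy money-pump sketch (option names, fee, and loop count are made up).
# An agent with cyclic strict preferences A > B > C > A will pay a small fee
# for every "upgrade", so a wandering Bayesian can walk it in circles.

prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means x is strictly preferred to y

def accepts_trade(current, offered):
    """The agent trades (and pays a one-unit fee) whenever it strictly prefers the offer."""
    return (offered, current) in prefers

holding = "A"
fees_paid = 0

for _ in range(3):                    # three full cycles of offers
    for offered in ["C", "B", "A"]:   # always offer whatever the agent currently prefers
        if accepts_trade(holding, offered):
            holding = offered
            fees_paid += 1

print(holding, fees_paid)  # A 9 -- back to the original option, nine fees poorer
```

On this toy picture, an agent with incomplete but transitive preferences could just refuse any trade it has no preference about, which is why the vulnerability looks to me like a transitivity issue rather than a completeness one.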

But the general point, about approxima...

1johnswentworth
Your intuition about transitivity being the key requirement is a good intuition. Completeness is more of a model foundation; we need completeness in order to even have preferences which can be transitive in the first place.

A failure of completeness would mean that there "aren't preferences" in some region of world-space. In practice, that's probably a failure of the model - if the real system is offered a choice, it's going to do something, even if that something amounts to really weird implied preferences. So when I talk about Dr Malicious pushing us into a region without ordered preferences, that's what I'm talking about. Even if our model contains no preferences in some region, we're still going to have some actual behavior in that region. Unless that behavior implies ordered preferences, it's going to be exploitable.

As for AIs reasoning about universe-states... First, remember that there's no rule saying that the utility must depend on all of the state variables. I don't care about the exact position of every molecule in my ice cream, and that's fine. Your universe can be defined by an infinite-dimensional state vector, and your AI can be indifferent to all but the first five variables. That's fine.

Other than that, the above comments on completeness still apply. Faced with a choice, the AI is going to do something. Unless its behavior implies ordered preferences, it's going to be exploitable, at least when faced with those kinds of choices. And as long as that exploitability is there, Dr Malicious will have an incentive to push the AI into the region where completeness fails. But if the AI has ordered preferences in all scenarios, Dr Malicious won't have any reason to develop peach-ice-cream-destroying nanobots, and we probably just won't need to worry about it.
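As a minimal sketch of the "indifferent to all but the first five variables" point (weights, dimensions, and the random states are all made up):

```python
import numpy as np

# Minimal sketch: a utility function over a high-dimensional state vector that
# only reads the first five coordinates (weights and dimensions are made up).
WEIGHTS = np.array([1.0, 0.5, -2.0, 0.0, 3.0])

def utility(state: np.ndarray) -> float:
    return float(WEIGHTS @ state[:5])  # everything past index 4 is ignored

rng = np.random.default_rng(0)
state_a = rng.normal(size=10_000)
state_b = state_a.copy()
state_b[5:] = rng.normal(size=9_995)  # perturb only the "ice cream molecule" coordinates

assert utility(state_a) == utility(state_b)  # the agent is indifferent to those changes
```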

Thanks for this response. On notation: I want world-states, A, to be specific outcomes rather than random variables. As such, u(A) is a real number, and the expectation of a real number could only be defined as itself: E[u(A)] = u(A) in all cases. I left aside all the discussion of 'lotteries' in the VNM Wikipedia article, though maybe I ought not have done so.
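To spell out what I mean by the expectation of a specific outcome, here is a small sketch with placeholder state names and made-up utilities (not the notation from the post):

```python
from fractions import Fraction

# Made-up utilities for two world-states (placeholder names).
u = {"A": Fraction(3), "B": Fraction(7)}

# A "lottery" is a dict mapping world-states to probabilities; a specific outcome
# is the degenerate lottery putting probability 1 on that state.
def expected_utility(lottery):
    return sum(p * u[state] for state, p in lottery.items())

degenerate = {"A": Fraction(1)}                      # the specific outcome A
assert expected_utility(degenerate) == u["A"]        # E[u(A)] = u(A) for a point mass

mixed = {"A": Fraction(1, 2), "B": Fraction(1, 2)}   # a genuine lottery
assert expected_utility(mixed) == Fraction(5)        # (3 + 7) / 2
```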

I think your first two bullet points are wrong. We can't reasonably interpret ~ as 'the agent's thinking doesn't terminate'. ~ refers to indifference betwee...