Result
It is impossible to verify that an agent is vNM-rational by observing its actions without access to the domain of its utility function.


Motivation
Alphonso and Beatriz both go to the market to buy fruit.
Alphonso prefers grapes to oranges.
He fills his basket with grapes and pays for them.

Beatriz carefully picks through the fruit and purchases some oranges and some grapes.
Callisto arrives with a package of grapes.

"Say, Beatriz, would you like to trade some of your oranges for this package of grapes?" Callisto offers.

"Gladly," Beatriz replies, exchanging some of her oranges for the grapes.

A few moments later, Alphonso notices Beatriz giving Deion some grapes in exchange for some oranges.

"You are acting irrationally, Beatriz!" Alphonso exclaims. "Your unstable preference between oranges and grapes makes it possible for a malicious agent to exploit you and exhaust your entire grocery budget!"

"Ah, but I am acting rationally," Beatriz replies with a smile. "I prefer fruit that is fresh enough to last more than seven days. Thus, I trade away fruit that will spoil before that time."

Explanation
Consider an agent A.
We are interested in verifying whether or not A is vNM-rational.
However, we are only able to observe A's decisions without any access to the domain of A's utility function.

Without this access, it is impossible to distinguish between vNM-irrational choices (i.e. choices that violate one of the axioms of vNM-rationality) and choices that are vNM-rational but made under an unexpected ontology.


In other words, we need to know how A perceives outcomes of the world before we can verify that A's preferences over those outcomes are vNM-rational.
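
The point can be made concrete with a small sketch of the market story. It assumes that each voluntary trade reveals a strict preference for what was received over what was given up, and it invents shelf-life numbers that the story does not supply. The names `trades`, `by_type`, `by_freshness`, `revealed_preferences`, and `has_cycle` are all hypothetical. The same two trades are described under two candidate ontologies, and each description is checked for a preference cycle.

```python
# Each observed trade is (gave, received). An item is (fruit type, days until
# it spoils). The day counts are made up for illustration only.
trades = [
    (("orange", 3), ("grape", 10)),   # with Callisto: gave oranges, got grapes
    (("grape", 4), ("orange", 12)),   # with Deion: gave grapes, got oranges
]

def by_type(item):
    """Alphonso's assumed ontology: an outcome is a kind of fruit."""
    return item[0]

def by_freshness(item):
    """Beatriz's stated ontology: an outcome is whether fruit lasts > 7 days."""
    return "fresh" if item[1] > 7 else "stale"

def revealed_preferences(trades, describe):
    """Return (preferred, dispreferred) label pairs under one ontology.

    Assumption: a voluntary trade reveals a strict preference for the
    received item over the item given up.
    """
    return {(describe(got), describe(gave)) for gave, got in trades}

def has_cycle(prefs):
    """Return True if the strict-preference pairs contain a directed cycle."""
    graph = {}
    for better, worse in prefs:
        graph.setdefault(better, set()).add(worse)

    def reachable(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reachable(nxt, target, seen):
                    return True
        return False

    return any(reachable(node, node, set()) for node in graph)

# Same observations, two descriptions of what the outcomes are.
cyclic_by_type = has_cycle(revealed_preferences(trades, by_type))
cyclic_by_freshness = has_cycle(revealed_preferences(trades, by_freshness))
print(cyclic_by_type, cyclic_by_freshness)
```

Under the fruit-type description the trades form a cycle (grapes over oranges, then oranges over grapes), which is what Alphonso objects to. Under the freshness description both trades point the same way and no cycle appears. Note that the absence of a cycle does not establish vNM-rationality; it only shows that this particular evidence of a violation depends on which ontology the observer assumes.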

5 comments

I was just thinking about this earlier today while re-reading a similar point by Stuart Armstrong.

See also my comment here on non-exploitability.

Hmm. Under what conditions does it matter that an agent is vNM-rational without any more information about its goals? Instrumental rationality is defined in terms of goals, so it's not clear that it's even meaningful to talk about it when the goals aren't fixed, or to talk about knowing it without knowing the goals.

Nitpick: I think the intro example would be clearer if there were explicit numbers of grapes/oranges rather than "some". Nothing is surprising about the original story if Beatriz got more oranges from Deion than she gave up to Callisto. (Or gave away fewer grapes to Deion than she received from Callisto.)

Could you explain the difference (or relationship) between ontology and a utility function? Is there a reason you change between the two? And I thought ontology is more to do with what exists - would "axiology" be a better word?