'a' should use a randomizing device so that he pays 51% of the time and refuses 49% of the time. Omega, aware of this strategy, but presumably unable to hack the randomizing device, achieves the best score by predicting 'pay' 100% of the time.
I am making an assumption here about Omega's cost function - i.e. that Type 1 and Type 2 errors are equally undesirable. So, I agree with cousin_it that the problem is underspecified.
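To make the dependence on the cost function concrete, here is a minimal sketch. It assumes Omega can only choose how often to predict 'pay', independently of the outcome of the randomizing device. The function names and the cost parameters `cost_false_pay` and `cost_false_refuse` are hypothetical and are not part of the problem statement.

```python
def expected_cost(q, p_pay=0.51, cost_false_pay=1.0, cost_false_refuse=1.0):
    """Omega's expected error cost when it predicts 'pay' with probability q.

    Assumption: Omega's prediction is independent of the device's outcome.
    """
    false_pay = q * (1 - p_pay)        # predicted 'pay', but 'a' refused
    false_refuse = (1 - q) * p_pay     # predicted 'refuse', but 'a' paid
    return cost_false_pay * false_pay + cost_false_refuse * false_refuse


def best_prediction_rate(p_pay=0.51, cost_false_pay=1.0,
                         cost_false_refuse=1.0, steps=100):
    """Grid-search the rate of 'pay' predictions that minimizes expected cost."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda q: expected_cost(q, p_pay, cost_false_pay, cost_false_refuse),
    )


# Equal costs: always predicting 'pay' is best, and Omega is wrong 49% of the time.
print(best_prediction_rate())
# If a wrong 'pay' prediction costs twice as much, the best policy flips.
print(best_prediction_rate(cost_false_pay=2.0))
```

With equal costs the expected cost is linear and decreasing in q, so the optimum sits at q = 1. Doubling the cost of a wrong 'pay' prediction moves it to q = 0, which is why the answer depends on the unspecified cost function.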
The constraint P(o=AWARD) = P(a=PAY) that appears in the diagram does not seem to match the problem statement. It is also ambiguous. Are those subjective probabilities? If so, which agent forms them? And, as cousin_it points out, we also need to know either the joint probability P(o=REWARD & a=PAY) or the conditional probability P(o=REWARD | a=PAY).
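A small sketch of why the marginal constraint alone is not enough: the two joint distributions below both satisfy P(o=REWARD) = P(a=PAY), yet they give different conditionals. The value 0.51 is borrowed from the randomizing example purely for illustration, and the outcome labels and helper names are assumptions of this sketch, not taken from the diagram.

```python
def marginals(joint):
    """Return (P(o=REWARD), P(a=PAY)) from a joint table keyed by (o, a)."""
    p_reward = joint[("REWARD", "PAY")] + joint[("REWARD", "REFUSE")]
    p_pay = joint[("REWARD", "PAY")] + joint[("NO_REWARD", "PAY")]
    return p_reward, p_pay


def reward_given_pay(joint):
    """Return the conditional probability P(o=REWARD | a=PAY)."""
    _, p_pay = marginals(joint)
    return joint[("REWARD", "PAY")] / p_pay


# Omega's output tracks the action perfectly.
perfect = {
    ("REWARD", "PAY"): 0.51, ("REWARD", "REFUSE"): 0.0,
    ("NO_REWARD", "PAY"): 0.0, ("NO_REWARD", "REFUSE"): 0.49,
}

# Omega's output is statistically independent of the action.
independent = {
    ("REWARD", "PAY"): 0.51 * 0.51, ("REWARD", "REFUSE"): 0.51 * 0.49,
    ("NO_REWARD", "PAY"): 0.49 * 0.51, ("NO_REWARD", "REFUSE"): 0.49 * 0.49,
}

for name, joint in [("perfect", perfect), ("independent", independent)]:
    print(name, marginals(joint), reward_given_pay(joint))
```

Both tables have the same marginals, but the conditional is 1 in the first and 0.51 in the second, so the constraint as drawn does not pin down how well Omega predicts.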
'a' should use a randomizing device so that he pays 51% of the time and refuses 49% of the time. Omega, aware of this strategy, but presumably unable to hack the randomizing device, achieves the best score by predicting 'pay' 100% of the time.
Apply any of the standard fine print for Omega-based counterfactuals with respect to people who try to game the system with randomization. Depending on the version, that means a payoff of $0, a payoff of 0.51 * $1,000, or an outright punishment for being a nuisance.
This problem is roughly isomorphic to the branch of Transparent Newcomb (version 1, version 2) where box B is empty, but it's simpler.
Here's a diagram: