suppose that we (or Omega, since we're going to assume nigh omniscience) asked the person whether JFK was murdered by Lee Harvey Oswald or not, and if they get it wrong, then they are killed/tortured/dust-specked into oblivion/whatever.
Okay, but what is the utility function Omega is trying to optimize?
Let's say you walk up to Omega, tell it "was JFK murdered by Lee Harvey Oswald or not? And by the way, if you get this wrong, I am going to kill you/torture you/dust-spec you."
Unless we've figured out how to build safe oracles, with very high probability, Omega is not a safe oracle. Via https://arbital.com/p/instrumental_convergence/, even though Omega may or may not care if it gets tortured/dust-speced, we can assume it doesn't want to get killed. So what is it going to do?
Do you think it's going to tell you what it thinks is the true answer? Or do you think it's going to tell you the answer that will minimize the risk of it getting killed?
Subscribe to RSS Feed
= f037147d6e6c911a85753b9abdedda8d)
Challenge accepted.
https://i.imgsafe.org/914f428.png
This looks great and I can see that it should work, but I can't seem to find a formal proof. Can you explain a bit?