So8res comments on New paper from MIRI: "Toward idealized decision theory" - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (22)
I had an idea, and was wondering what its fatal flaw was. For UDT, what happens if, instead of proving theorems of the form "actionx --> utilityx" , it proves theorems of the form "PA+actionx |- utilityx"?
At a first glance, this seems to remove the problem of spurious counterfactuals implying any utility value, but there's probably something big I'm missing.
PA + "A() = x" may be inconsistent (proving, e.g., that 0=1 etc.).
What happens if it only considers the action if it both failed to find "PA+A()=x" inconsistent and found a proof that PA+A()=x proves U()=x? Do an inconsistency check first and only consider/compare the action if the inconsistency check fails.
Then you get spurious proofs of inconsistency :-)
(If PA is strong enough to prove what the agent will do, then PA + "A()=x" is only consistent for one particular action x, and the specific action x for which it is consistent will be up for grabs, depending upon which proof the inconsistency checker finds.)
Got it. Thank you!