SquirrelInHell comments on Risk Contracts: A Crackpot Idea to Save the World - Less Wrong
But where did that number come from? At some point, an intelligent system that was not handed a budget selects a budget for itself. Presumably the number is set according to some cost-benefit criterion, instead of chosen because it's three hands worth of fingers in a log scale based on two hands worth of fingers.
If it isn't, how do you expect the agent to actually stick to such a budget?
I understood your proposal. My point is that it doesn't carve reality at the joints: if you play six-chambered Russian Roulette once, one sixth of your future vanishes; but given that the chamber came up empty, you still have 100% of your future, because conditioning on the past, in the branch where you survive, eliminates the branch where you fail to survive.
What you're proposing is a rule where, if your budget starts off at 1, you can only play it six times over your life. But if it makes sense to play it once, it might make sense to play it many times--playing it seven times, for example, still gives you a 28% chance of survival (assuming the chambers are randomized after every trigger pull).
Which suggests a better way to point out what I want to point out--you're subtracting probabilities when it makes sense to multiply them. You're penalizing later risks as if each were the first risk to occur, which double-counts risk and makes the system vulnerable to redefinitions. If I view the seven pulls as independent events, they deplete my budget by 7/6; but if I treat them as a single event, they deplete it by only 1-(5/6)^7, which is about 72%.
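(For concreteness, here is a minimal sketch of how the two accounting rules diverge for the same seven pulls; the numbers are the ones from this comment, and the framing as code is mine, not anything from the original post.)

```python
# Hedged sketch: the same seven trigger pulls deplete a risk budget
# differently depending on how they are carved into "events".
# p and pulls are the illustrative numbers from the comment above.

p = 1 / 6        # chance of dying on a single pull
pulls = 7

# Accounting 1: treat each pull as its own event and subtract its probability.
additive_cost = pulls * p                # 7/6 ≈ 1.17, already over a budget of 1

# Accounting 2: treat all seven pulls as one event and charge the actual
# probability of dying at least once.
one_event_cost = 1 - (1 - p) ** pulls    # 1 - (5/6)^7 ≈ 0.72

print(f"additive cost:  {additive_cost:.3f}")
print(f"one-event cost: {one_event_cost:.3f}")
# The gap between the two numbers is the double-counting described above:
# the budget charged depends on how the risks are framed.
```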
Of course, my point is to build all intelligent systems so that, with probability within our risk budget (which we choose arbitrarily), they do not hand themselves a new budget.
I hope that the survival of humanity dominates the utility functions of the people who build AI, and that they will do their best to carry it over to the AI. You can individually have another utility function, if it serves you well in your life (as long as you don't build any AIs). But that was the wrong way to answer your previous point:
Not in the case of multiple agents who cannot easily coordinate. E.g. what if each human's utility function makes it seem reasonable to accept a 1/1000 risk of destroying the world in exchange for potentially huge personal gains?
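(A small sketch of how such individually "reasonable" risks compound if they are taken independently; the 1/1000 figure is from this comment, and the agent counts are illustrative numbers of mine.)

```python
# Hedged sketch of the coordination worry: if N agents each independently
# accept a per-agent risk p of destroying the world, the combined survival
# probability is (1 - p)^N.

p = 1 / 1000
for n_agents in (1, 100, 1_000, 10_000):
    survival = (1 - p) ** n_agents
    print(f"{n_agents:>6} agents -> world survives with probability {survival:.2e}")
# Roughly: 1 agent -> 0.999, 100 -> 0.90, 1000 -> 0.37, 10000 -> 0.00005.
```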
I am well aware of this, but the effect is negligible if we speak of small probabilities.
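(A hedged sketch of that negligibility claim, with illustrative probabilities of my own choosing: for tiny per-event risks, the subtractive total n*p and the exact total 1-(1-p)^n agree to first order, with a gap of order n^2*p^2; it is only at roulette-sized risks that they diverge badly.)

```python
# Compare subtractive (additive) accounting with exact multiplicative
# accounting for n independent events each carrying probability p.

def compare(p, n):
    additive = n * p
    exact = 1 - (1 - p) ** n
    print(f"p={p:g}, n={n}: additive={additive:.6g}, "
          f"exact={exact:.6g}, gap={additive - exact:.2e}")

compare(1e-6, 1000)   # tiny risks: the two accountings are nearly identical
compare(1/6, 7)       # roulette-sized risks: 1.17 vs 0.72, a large gap
```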