You don't build any intelligent system without a risk budget. Initial budgets are distributed to humans, e.g. 10^-15 to each human alive in 2016.
But where did that number come from? At some point, an intelligent system that was not handed a budget selects a budget for itself. Presumably the number is set according to some cost-benefit criterion, instead of chosen because it's three hands worth of fingers in a log scale based on two hands worth of fingers.
Whether or not your utility is dominated by survival of humanity is an individual question.
If it isn't, how do you expect the agent to actually stick to such a budget?
Not at all. A risk budget is decreased by your best estimate of your total risk "emission", which is what fraction of the future multiverse (weighted by probability) you spoiled.
I understood your proposal. My point is that it doesn't carve reality at the joints: if you play six-chambered Russian Roulette once, then one sixth of your future vanishes, but given that it came up empty, then you still have 100% of your future, because conditioning on the past in the branch where you survive eliminates the branch where you fail to survive.
What you're proposing is a rule where, if your budget starts off at 1, you only play it six times over your life. But if it makes sense to play it once, it might make sense to play it many times--playing it seven times, for example, still gives you a 28% chance of survival (assuming the chambers are randomized after every trigger pull).
Which suggests a better way to point out what I want to point out--you're subtracting probabilities when it makes sense to multiply probabilities. You're penalizing later risks as if they were the first risk to occur, which leads to double-counting, and means the system is vulnerable to redefinitions. If I view the seven pulls as independent events, it depletes my budget by 7/6, but if I treat them as one event, it depletes my budget by only 1-(5/6)^7, which is about 72%.
But where did that number come from? At some point, an intelligent system that was not handed a budget selects a budget for itself. Presumably the number is set according to some cost-benefit criterion, instead of chosen because it's three hands worth of fingers in a log scale based on two hands worth of fingers.
Of course, my point is to build all intelligent systems so that they do not hand themselves a new budget, with probability that is within our risk budget (which we choose arbitrarily).
...If it isn't, how do you expect the agent to actually stick
Time start: 18:17:30
I
This idea is probably going to sound pretty crazy. As far as seemingly crazy ideas go, it's high up there. But I think it is interesting enough to at least amuse you for a moment, and upon consideration your impression might change. (Maybe.) And as a benefit, it offers some insight into AI problems if you are into that.
(This insight into AI may or may not be new. I am not an expert on AI theory, so I wouldn't know. It's elementary, so probably not new.)
So here it goes, in short form on which I will expand in a moment:
To manage global risks to humanity, they can be captured in "risk contracts", freely tradeable on the market. Risk contracts would serve the same role as CO2 emissions contracts, which can likewise be traded, and ensure that the global norm is not exceeded as long as everyone plays along with the rules.
So e.g. if I want to run a dangerous experiment that might destroy the world, it's totally OK as long as I can purchase enough of a risk budget. Pretty crazy, isn't it?
As an added bonus, a risk contract can take into account the risk of someone else breaking the terms of contract. When you trasfer your rights to global risk, the contract obliges you to diminish the amount you transfer by the uncertainty about the other party being able to fullfill all obligations that come with such a contract. Or if you have not enough risk budget for this, you cannot transfer to that person.
II
Let's go a little bit more into detail about a risk contract. Note that this is supposed to illustrate the idea, not be a final say on the shape and terms of such a contract.
Just to give you some idea, here are some example rules (with lots of room to specify them more clearly etc., it's really just so that you have a clearer idea of what I mean by a "risk contract"):
III
Of course, the application of this could be wider than just an AI which might recursively self-improve - some more "normal" human applications could be risk management in a company or government, or even using risk contract as an internal currency to make better decisions.
I admit though, that the AI case is pretty special - it gives an opportunity to actually control the ability of another agent to keep a risk contract that we are giving to them.
It is an interesting calculation to see roughly what are the costs of keeping a risk contract in the recursive AI case, with a lot of simplifying assumptions. Assume that to reduce risk of child AI going off the rails can be reduced by a constant factor (e.g. have it cut by half) by putting in an additional unit of work. Also assume the chain of child AIs might continue indefinitely, and no later AI will assume a finite ending of it. Then if the chain has no branches, we are basically reduced to a power series: the risk budget of a child AI is always the same fraction of its parent's budget. That means we need linearly increasing amount of work on safety at each step. That in turn means that the total amount of work on safety is quadratic in the number of steps (child AIs).
Time end: 18:52:01
Writing stats: 21 wpm, 115 cpm (previous: 30/167, 33/183, 23/128)