CO2 emissions have the virtues of being both easy to measure and roughly linear in their effects.* I don't see a similar thing being true for perceived risk, and I think conserved budgets are probably worse than overall preferences.
First: measuring probabilities of world destruction is very hard; being able to measure them at the 1e-12 level seems very, very hard, especially if most probabilities of world destruction are based around conflict. ("Will threatening my opponent here increase or decrease the probability of the world ending?")
Second: suppose we grant that the system has the ability to measure the probability of the world being destroyed, to arbitrary precision. How should it decide what budget level to give itself? (Suppose it's the original agent, instead of one handed a budget by its creator.)
To make it easier to think about, you can reformulate the question in terms of your own life. You can take actions that increase the chance that you die sooner rather than later, and gain some benefit from doing so. (Perhaps you decide to drive to a movie theater to see a new movie instead of something on Netflix.)
But now a few interesting things pop up. One, it looks like simple utility maximization (go to the movie if the benefits outweigh the costs) gives the right answer, and being more or less cautious than that suggests is a mistake (or at least a sign that the utility is being mismeasured).
Two, the budget replenishes. If I go to the theater on Friday and come back unharmed, then from the perspective of Thursday!me I took on some risk, but from the perspective of Saturday!me that risk turned out to not cost anything. That is, Thursday!me thinks I'm picking up 1e-7 in additional risk but Saturday!me knows that I survived, and still has '100%' of risk to allocate anew.
So I think budgets are the wrong way to think about this--they rely too heavily on subjective perceptions of risk, they encourage being too cautious (or too risky) instead of seeing tail risks as linear in probability, and they don't update on survival when they should.
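A minimal sketch of the comparison, with made-up numbers and hypothetical function names (`worth_it`, `trips_left_under_fixed_budget`, `risk_of_next_trips`). It shows the plain expected-utility rule, whose cost is linear in the probability, next to a conserved lifetime budget that keeps charging for trips you already survived, and a survival-conditioned calculation in which past trips never appear. Exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction


def worth_it(benefit, p_death, value_of_life):
    """Plain expected-utility rule: the expected cost is linear in p_death."""
    return benefit > p_death * value_of_life


def trips_left_under_fixed_budget(budget, p_trip, trips_survived):
    """Conserved budget: every past trip stays charged, even after surviving it."""
    remaining = budget - trips_survived * p_trip
    return max(0, remaining // p_trip)  # Fraction // Fraction gives an int


def risk_of_next_trips(p_trip, k):
    """Survival-conditioned view: chance that the next k trips kill you, given
    that you are alive now. The number of trips already survived does not enter."""
    return 1 - (1 - p_trip) ** k


# Hypothetical numbers; the utility units are arbitrary.
P_TRIP = Fraction("1e-7")        # added risk of one drive to the theater
VALUE_OF_LIFE = Fraction(10**7)  # utility of the rest of one's life
BUDGET = Fraction("1e-6")        # a conserved lifetime budget for such trips

print(worth_it(Fraction(5), P_TRIP, VALUE_OF_LIFE))  # → True
for survived in (0, 5, 10):
    print(survived, trips_left_under_fixed_budget(BUDGET, P_TRIP, survived))
print(risk_of_next_trips(P_TRIP, 1) == P_TRIP)  # → True
```

Under the conserved budget, ten survived trips exhaust the allowance even though the eleventh trip is exactly as risky as the first; the forward-looking calculation is the same no matter how many trips came before.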
*I don't mean that the overall effect of CO2 emissions is linear, which seems false, but that participants are small enough relative to overall CO2 production that they don't expect their choices to affect the overall CO2 price, and thus the price is linear for them individually.
I do not argue that my idea is sane; however, I think your critique doesn't do it justice. So let me briefly point out that:
measuring probabilities of world destruction is very hard; being able to measure them at the 1e-12 level seems very, very hard
It's enough to use upper bounds. If we have, e.g., an additional module that checks our AI source code for errors, and such a module decreases the probability of one of the bits being flipped, we can use our risk budget to calculate the minimum number of modules we need. Etc.
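One way to make the upper-bound argument concrete, as a sketch under a loudly stated assumption: each checking module independently multiplies the upper bound on an undetected bit flip by a fixed factor `reduction`. The function `min_modules` and all numbers are hypothetical. Only the bound is tracked, never the exact probability, which is the point of the argument.

```python
from fractions import Fraction


def min_modules(p_initial: Fraction, reduction: Fraction, budget: Fraction) -> int:
    """Smallest number of checking modules whose combined upper bound fits the budget.

    Hypothetical assumption: every module independently multiplies the upper
    bound on an undetected bit flip by `reduction`, with 0 < reduction < 1.
    """
    if not 0 < reduction < 1:
        raise ValueError("each module must strictly reduce the bound")
    if budget <= 0:
        raise ValueError("a zero budget can never be met")
    modules, bound = 0, p_initial
    while bound > budget:  # keep adding modules until the bound is within budget
        bound *= reduction
        modules += 1
    return modules


# Hypothetical numbers: a 1e-3 bound with no checks, each module cuts it 100-fold,
# and the risk budget allocated to this failure mode is 1e-12.
print(min_modules(Fraction("1e-3"), Fraction("1e-2"), Fraction("1e-12")))  # → 5
```

Because only an upper bound is needed, a conservative estimate of `reduction` is enough; overestimating it just buys a module or two more than strictly necessary.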
...How should it decide what budget level to
Time start: 18:17:30
I
This idea is probably going to sound pretty crazy. As far as seemingly crazy ideas go, it's high up there. But I think it is interesting enough to at least amuse you for a moment, and upon consideration your impression might change. (Maybe.) And as a benefit, it offers some insight into AI problems if you are into that.
(This insight into AI may or may not be new. I am not an expert on AI theory, so I wouldn't know. It's elementary, so probably not new.)
So here it goes, in short form; I will expand on it in a moment:
To manage global risks to humanity, we can capture them in "risk contracts", freely tradeable on the market. Risk contracts would play the same role as CO2 emission contracts, which can likewise be traded: they ensure that the global limit is not exceeded, as long as everyone plays by the rules.
So e.g. if I want to run a dangerous experiment that might destroy the world, it's totally OK as long as I can purchase enough of a risk budget. Pretty crazy, isn't it?
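A toy sketch of the bookkeeping, assuming a single registry that issues budget up to a global cap, lets holders trade it, and only lets a risky action go ahead if the actor's holdings cover it. The class `RiskLedger`, its methods, and the agent names are all hypothetical illustrations, not a proposed specification.

```python
from fractions import Fraction


class RiskLedger:
    """Toy registry of tradeable risk contracts under a fixed global cap (hypothetical)."""

    def __init__(self, global_cap: Fraction):
        self.global_cap = global_cap
        self.unissued = global_cap           # budget not yet sold to anyone
        self.spent = Fraction(0)             # budget already used on risky actions
        self.holdings: dict[str, Fraction] = {}

    def buy(self, agent: str, amount: Fraction) -> None:
        """Buy newly issued budget; fails once the global cap is exhausted."""
        if amount <= 0 or amount > self.unissued:
            raise ValueError("not enough unissued risk budget")
        self.unissued -= amount
        self.holdings[agent] = self.holdings.get(agent, Fraction(0)) + amount

    def trade(self, seller: str, buyer: str, amount: Fraction) -> None:
        """Move budget between holders; the total in circulation is unchanged."""
        if amount <= 0 or self.holdings.get(seller, Fraction(0)) < amount:
            raise ValueError("seller does not hold that much budget")
        self.holdings[seller] -= amount
        self.holdings[buyer] = self.holdings.get(buyer, Fraction(0)) + amount

    def run_experiment(self, agent: str, risk: Fraction) -> None:
        """Spend budget on a risky action; allowed only if it is fully covered."""
        if self.holdings.get(agent, Fraction(0)) < risk:
            raise PermissionError("risk exceeds the agent's budget")
        self.holdings[agent] -= risk
        self.spent += risk

    def total(self) -> Fraction:
        """Unissued + held + spent; always equals the global cap."""
        return self.unissued + sum(self.holdings.values(), Fraction(0)) + self.spent


ledger = RiskLedger(Fraction("1e-6"))
ledger.buy("lab", Fraction("3e-7"))
ledger.trade("lab", "startup", Fraction("1e-7"))
ledger.run_experiment("startup", Fraction("5e-8"))
print(ledger.holdings)  # lab keeps 2e-7, startup has 5e-8 left
```

The invariant checked by `total()` is what makes the scheme analogous to a cap-and-trade system: trading moves budget around, but nothing can create more of it than the global cap.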
As an added bonus, a risk contract can take into account the risk of someone else breaking its terms. When you transfer your rights to global risk, the contract obliges you to reduce the amount you transfer by the uncertainty about whether the other party can fulfill all the obligations that come with such a contract. If you do not have enough risk budget for this, you cannot transfer to that person.
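A minimal sketch of one possible reading of this rule, assuming that `breach_prob` is an upper bound on the chance that the recipient breaks the contract and that a breach is charged at worst case, as certain catastrophe. Under that assumption the sender must hold back `breach_prob` of the budget, and the names `net_transfer` and `worst_case_risk` are hypothetical.

```python
from fractions import Fraction


def net_transfer(amount: Fraction, breach_prob: Fraction) -> Fraction:
    """Budget the recipient actually receives when `amount` is handed over.

    Hypothetical reading: a breach (probability at most `breach_prob`) is charged
    as certain catastrophe, so that much of the sender's budget is held back to
    cover it. If the amount cannot cover it, the transfer is refused.
    """
    if not 0 <= breach_prob <= 1:
        raise ValueError("breach_prob must be a probability")
    if amount <= breach_prob:
        raise ValueError("not enough risk budget to cover the counterparty risk")
    return amount - breach_prob


def worst_case_risk(amount: Fraction, breach_prob: Fraction) -> Fraction:
    """Risk that can follow from the transfer: the recipient either honours the
    contract and uses the net budget, or breaches and (worst case) causes catastrophe."""
    net = net_transfer(amount, breach_prob)
    return (1 - breach_prob) * net + breach_prob


AMOUNT = Fraction("1e-6")  # hypothetical budget being transferred
BREACH = Fraction("1e-7")  # hypothetical bound on the recipient breaking the contract
print(net_transfer(AMOUNT, BREACH) == Fraction("9e-7"))  # → True
print(worst_case_risk(AMOUNT, BREACH) <= AMOUNT)         # → True
```

Algebraically, the worst case is `amount - breach_prob * (amount - breach_prob)`, which never exceeds `amount`, so under this reading the sender's original budget still bounds the total risk.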
II
Let's go into a little more detail about risk contracts. Note that this is meant to illustrate the idea, not to be the final word on the shape and terms of such a contract.
Just to give you some idea, here are some example rules (there is lots of room to specify them more clearly; they are really just there so that you have a clearer idea of what I mean by a "risk contract"):
III
Of course, this could be applied more widely than to an AI that might recursively self-improve; some more "normal" human applications could be risk management in a company or government, or even using risk contracts as an internal currency to make better decisions.
I admit, though, that the AI case is pretty special: it gives us an opportunity to actually control another agent's ability to keep a risk contract that we give to them.
It is an interesting exercise to calculate, roughly and with many simplifying assumptions, the cost of keeping a risk contract in the recursive AI case. Assume that the risk of a child AI going off the rails can be reduced by a constant factor (e.g. cut in half) by putting in an additional unit of work. Also assume that the chain of child AIs might continue indefinitely, and that no later AI will assume it ends at some finite point. Then, if the chain has no branches, we are basically reduced to a geometric series: the risk budget of a child AI is always the same fraction of its parent's budget. Since each unit of work cuts the risk only by a constant factor, the work needed grows with the logarithm of the inverse budget, and the logarithm of a geometrically shrinking budget grows linearly. That means we need a linearly increasing amount of safety work at each step, which in turn means that the total amount of safety work is quadratic in the number of steps (child AIs).
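A small sketch of this calculation under the stated assumptions plus a few hypothetical ones: each AI passes a fixed fraction `child_fraction` of its budget to its child and may spend the rest on its own risk, an unchecked child has a worst-case risk bound of 1, and each unit of work halves that. The function names and numbers are illustrative only.

```python
from fractions import Fraction


def work_needed(base_risk, factor, allowance):
    """Smallest k with base_risk * factor**k <= allowance (0 < factor < 1)."""
    k, risk = 0, base_risk
    while risk > allowance:
        risk *= factor
        k += 1
    return k


def safety_work_per_step(n_steps, root_budget, child_fraction, base_risk, factor):
    """Safety work for each AI in an unbranched chain.

    Each AI passes `child_fraction` of its budget to its child and may spend the
    rest on its own risk, so the allowances form a geometric sequence whose sum
    never exceeds `root_budget`, however long the chain gets.
    """
    works, budget = [], root_budget
    for _ in range(n_steps):
        own_allowance = budget * (1 - child_fraction)
        works.append(work_needed(base_risk, factor, own_allowance))
        budget *= child_fraction
    return works


# Hypothetical numbers: worst-case risk bound 1 for an unchecked child, each unit
# of work halves it, the root budget is 2**-10, and each child gets half.
steps = safety_work_per_step(4, Fraction(1, 2**10), Fraction(1, 2), Fraction(1), Fraction(1, 2))
print(steps)  # → [11, 12, 13, 14]
```

Each step needs one more unit of work than the last, so N steps cost 11N + N(N-1)/2 units in total with these numbers: linear per step and quadratic overall.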
Time end: 18:52:01
Writing stats: 21 wpm, 115 cpm (previous: 30/167, 33/183, 23/128)