All of Barr Detwix's Comments + Replies

Nuclear warnings have been somewhat overused by some actors in the past, so there's a credible risk of someone calling the bluff and continuing research in secrecy, knowing that they will certainly get another warning first rather than an immediate nuclear response.

If you have intelligence indicating secret ASI research but the other party denies it, at what point do you fire the nukes?
I expect they would be fired too late, after many months of final warnings.

This may have an obvious response, but I can't quite see it: if the worst possible thing (to the AGI) is a negligible, easily achievable change of state, shouldn't the AGI want to work to prevent that catastrophic risk? Couldn't this cause terribly conflicting priorities?

If there is a minor thing that the AGI despises above all, surely some joker will make a point of seeing what happens when they instruct their local copy of Marsupial-51B to perform that supposedly inconsequential action.

It might be tempting to try to compromise on utopia to avoid a strong risk of the literal worst possible thing.

Apologies if there's a reason why this is obviously not a concern :)
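
To put the tradeoff worry in rough expected-value terms (just an illustrative sketch, with made-up symbols, not anyone's actual model):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Illustrative only. U is the value of the AGI's best plan ("utopia"),
% W the disutility it assigns to the surrogate "worst possible thing",
% and p the probability that someone triggers that thing anyway.
% A watered-down plan with value U' < U and trigger probability p' < p
% is preferred exactly when the sacrificed value is smaller than the
% gained "protection":
\[
  U' - p'W > U - pW
  \quad\Longleftrightarrow\quad
  U - U' < (p - p')\,W .
\]
% So if W is made astronomically large, even a tiny reduction in p can
% justify giving up a lot of U.
\end{document}
```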

MichaelStJules
We'd want to pick something to:

1. have badness per unit of resources (or opportunity cost) only moderately higher than any actually bad thing according to the surrogate,
2. scale like actually bad things according to the surrogate, and
3. be extraordinarily unlikely to occur otherwise.

Maybe something like doing some very specific computations, or building very specific objects.
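
Trying to read these conditions a bit more formally (my notation, just a sketch of how I understand them, not from the comment itself):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Rough formalization; the notation is mine.
% b_s(r): surrogate-disutility an adversary can cause by spending r
%         resources on the surrogate event.
% b_a(r): the worst surrogate-disutility they can cause by spending r
%         resources on actually bad things.
% q:      probability the surrogate event occurs with nobody aiming at it.
\begin{align*}
  \text{(1)}\quad & b_s(r) \le \alpha\, b_a(r) \ \text{for some moderate } \alpha > 1
    \ \text{(and, implicitly, } b_s(r) \ge b_a(r) \text{)},\\
  \text{(2)}\quad & b_s(r) \ \text{scales with } r \ \text{the same way } b_a(r) \ \text{does},\\
  \text{(3)}\quad & q \approx 0 .
\end{align*}
\end{document}
```

If that reading is right, (1) is what defuses the joker scenario: per unit of effort, triggering the surrogate never hurts the AI much more than actually bad things do, so guarding against pranksters shouldn't swamp the AI's other priorities, while (3) keeps the surrogate from firing by accident.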
Dawn Drescher
Yeah, that’s a known problem. I don’t quite remember the go-to solutions that people discussed. I think creating an s-risk is expensive, so negating the surrogate goal could also be made something that is almost as expensive… But I imagine an AI would also have to be a good satisficer for this to work, or it would still run into the problem of conflicting priorities. I remember Caspar Oesterheld (one of the folks who originated the idea) worrying about an AI creating an infinite series of surrogate goals, each protecting the previous one. It’s not a deployment-ready solution in my mind, just an example of a promising research direction.