All of Tobias_Baumann's Comments + Replies

I agree with you that the "stereotyped image of AI catastrophe" is not what failure will most likely look like, and it's great to see more discussion of alternative scenarios. But why exactly should we expect that the problems you describe will be exacerbated in a future with powerful AI, compared to the state of contemporary human societies? Humans also often optimise for what's easy to measure, especially in organisations. Is the concern that current ML systems are unable to optimise hard-to-measure goals, or goals that are hard to re...

But why exactly should we expect that the problems you describe will be exacerbated in a future with powerful AI, compared to the state of contemporary human societies?

To a large extent "ML" refers to a few particular technologies that have the form "try a bunch of things and do more of what works" or "consider a bunch of things and then do the one that is predicted to work."

That is true but I think of this as a limitation of contemporary ML approaches rather than a fundamental property of advanced AI.

I'm mostly aiming t...

Thanks for elaborating. There seem to be two different ideas:

1) that it is a promising strategy to try to constrain early AGI capabilities and knowledge

2) that even without such constraints, a paperclipper entails a smaller risk of worst-case outcomes with large amounts of disvalue, compared to a near miss. (Brian Tomasik has also written about this.)

1) is very plausible, perhaps even obvious, though as you say it's not clear how feasible this will be. I'm not convinced of 2), even though I've heard / read many people expressing this ide...

Another risk from bugs comes not from the AGI system caring incorrectly about our values, but from having inadequate security. If our values are accurately encoded in an AGI system that cares about satisfying them, they become a target for threats from other actors who can gain from manipulating the first system.

I agree that this is a serious risk, but I wouldn't categorise it as a "risk from bugs". Every actor with goals faces the possibility that other actors may attempt to gain bargaining leverage by threatening to deliberately thwart th...

Rob Bensinger
I think a good intuition pump for this idea is to contrast an arbitrarily powerful paperclip maximizer with an arbitrarily powerful something-like-happiness maximizer. A paperclip maximizer might resort to threats to get what it wants; and in the long run, it will want to convert all resources into paperclips and infrastructure, to the exclusion of everything humans want. But the "normal" failure modes here tend to look like human extinction. In contrast, a lot of "normal" failure modes for a something-like-happiness maximizer might look like torture, because the system is trying to optimize something about human brains, rather than just trying to remove humans from the picture so it can do its own thing.

I don't know specifically what Ramana and Scott have in mind, but I'm guessing it's a combination of:

* If the system isn't trained using human-related data, its "goals" (or the closest things to goals it has) are more likely to look like the paperclip maximizer above, and less likely to look like the something-like-happiness maximizer. This greatly reduces downside risk if the system becomes more capable than we intended.

* When AI developers build the first AGI systems, the right move will probably be to keep their capabilities to a bare minimum — often the minimum stated in this context is "make your system just capable enough to help make sure the world's AI doesn't cause an existential catastrophe in the near future". If that minimal goal doesn't require fluency with certain high-risk domains, then developers should just avoid letting their AGI systems learn about those domains, at least until they've gotten a lot of experience with alignment. The first developers are in an especially tough position, because they have to act under more time pressure and they'll have very little experience with working AGI systems. As such, it makes sense to try to make their task as easy as possible. Alignment isn't all-or-nothing, and being able to align a system with one set...

Upvoted. I've long thought that Drexler's work is a valuable contribution to the debate that hasn't received enough attention so far, so it's great to see that this has now been published.

I am very sympathetic to the main thrust of the argument – questioning the implicit assumption that powerful AI will come in the shape of one or more unified agents that optimise the outside world according to their goals. However, given our cluelessness and the vast range of possible scenarios (e.g. ems, strong forms of biological enhancement, mergin...

Rohin Shah
That seems right. I would argue that CAIS is more likely than any particular one of the other scenarios you listed, because it primarily takes trends from the past and projects them into the future, whereas most other scenarios require something qualitatively new -- e.g. an AGI agent (before CAIS) would happen if we find the one true learning algorithm, and ems require us to completely map out the brain, which we have no results for currently, even in simple cases like C. elegans. But CAIS is probably not more likely than a disjunction over all of those possible scenarios.

I agree that establishing a cooperative mindset in the AI / ML community is very important. I'm less sure whether economic incentives or government policy are a realistic way to get there. Can you think of a precedent or example of such external incentives in other areas?

Also, collaboration between the researchers that develop AI may be just one piece of the puzzle. You could still get military arms races between nations even if most researchers are collaborative. If there are several AI systems, then we also need to ensure cooperation between these AIs, which isn't necessarily the same as cooperation between the researchers that build them.

Kaj_Sotala
Good question. I was somewhat inspired by civil engineering, where it's my understanding that there is a rather strong culture of safety, driven in part by various historical accidents that killed a lot of people and caught the attention of regulators / insurers / etc. I don't actually know exactly how many of the resulting reforms were a result of external pressure vs. people just generally shaping up and not wanting to kill more people. But given how easily good intentions can be neglected in the face of bad incentives (AFAIK, several historical accidents [e.g.] were known to be disasters just waiting to happen well ahead of time), I would guess that external incentives / consequences have played a major role.
SoerenMind
Neat paper, congrats!

What exactly do you think we need to specify in the Smoking Lesion?