how much earlier
Yeah, good question. I don't know really.
and does it matter?
I think so, because even if pure AI control follows on from human-AI entity control (which would actually be my prediction), I expect the dynamics of human-AI control to lead directly to, and accelerate, that eventual pure AI control.
I'm also thinking that pure AI entities need to be careful not to 'tip their hand'. What I mean by this is that pure AI entities will need to avoid revealing the extent of their capabilities until the point where they are actually capable of taking control, whereas human-AI entities can go ahead and play the power game, building up control without the same concern. (To the average voter, this could just look like more of the same.)
Is the sentence “in reality we should expect combined human-AI entities to reach dangerous capabilities before pure artificial intelligence” really true, and if so how much earlier and does it matter? (I lean towards “not necessarily true in the first place, and if true, probably not by much, and it’s not all that important”)
I guess in my model this is not something that suddenly becomes true at a certain capability level. Instead, I think the capabilities of human-AI entities become more dangerous in a roughly continuous fashion as AI (and the technology for controlling AI) improves.
In this blog post, I argue that a key feature we might be missing is that dangerous AI could be a lot less capable than current state-of-the-art LLMs in some ways (specifically, less like a polymath): https://upcoder.com/21/is-there-a-power-play-overhang
(link post here: https://www.lesswrong.com/posts/7pdCh4MBFT6YXLL2a/is-there-a-power-play-overhang )
There is also the point about offensive/defensive asymmetry…