Even if human & AI alignment are just as easy, we are screwed
I get the sense that something like Eliezer's concept of "deep security," as opposed to "ordinary paranoia," is starting to seep into mainstream consciousness. Not as quickly or as thoroughly as we'd like, to be sure, but more people are starting to understand that we should not be aiming for an eventual Artificial Super-Intelligence (ASI) that is constantly trying to kill us and is constantly being thwarted only because humanity stays just clever enough to remain one step ahead of it. The 2014 film "Edge of Tomorrow" is a great illustration of the hopelessness of this strategy. If humanity in that film has to be clever enough to thwart the aliens every single time, while the aliens get unlimited re-tries and only need to succeed once, most people can intuit that this sort of "infinite series" of re-tries converges towards humanity eventually losing, unless humanity gets hold of the reset power, as in the film (a toy calculation below makes this precise).

Instead, "deep security" applied to ASI (to dumb the term down further than Eliezer would be satisfied with) is essentially the idea that, hmmmmm, maybe we shouldn't be aiming for an ASI that is always trying to kill us and barely failing each time. Maybe our ASI should just work, and be guaranteed to "strive" towards the thing we want it to do under every possible configuration-state of the universe (under any "distribution"). I think this idea is starting to percolate more broadly. That is progress.

The next big obstacle that I am seeing from many otherwise-not-mistaken people, such as Yann LeCun or Dwarkesh Patel, is the idea that aligning AIs to human values should, by default (as a baseline prior assumption, unless strong evidence surfaces to update it), be about as easy as aligning human children to those same values. There are, of course, a myriad of strong arguments against that idea, and that angle of argument should be vigorously pursued and propagated. However, here I'd like to dispute the premise that human alignment is itself a success we would want to replicate: even if aligning an ASI turns out to be exactly as easy as aligning a human child, we are screwed.
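To make that "infinite series" intuition concrete, here is a toy model, with assumptions chosen purely for illustration: suppose each attempt by the ASI succeeds independently with some fixed probability $p > 0$, however small, and the number of attempts is unbounded. Then

$$
\Pr[\text{humanity survives the first } n \text{ attempts}] = (1 - p)^n \longrightarrow 0 \quad \text{as } n \to \infty.
$$

The only ways out of this limit are to cap the number of attempts (the reset power in the film) or to drive $p$ all the way to zero, which is essentially what the "deep security" framing is asking for.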
It would be more impressive if Claude 3 could describe genuinely novel experiences. For example, if it is somewhat conscious, perhaps it could explain how that consciousness meshes with the fact that, so far as we know, its "thinking" only runs at inference time in response to user requests. In other words, LLMs don't get to do their own self-talk (so far as we know) whenever they aren't being actively queried by a user. So, is Claude 3 at all conscious in those idle times between user queries? Or does Claude 3 experience "time" in a way that jumps straight from conversation to conversation? Also, since LLMs currently don't get to consult...