I shared a review in some private channels; might as well share it here:
The book positions itself as a middle ground between optimistic capabilities researchers striding blithely into near-certain catastrophe and pessimistic alignment researchers too concerned with dramatic abstract doom scenarios to address more realistic harms that can still be averted. When addressing the latter, Chapman constructs a hypothetical "AI goes FOOM and unleashes nanomachine death" scenario and argues that while alignment researchers are correct that we have no capacity to prev...
It is possible that the outlier dimensions are related to the LayerNorms, since the LayerNorm gain and bias parameters often also have outlier dimensions and depart quite strongly from Gaussian statistics.
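As a quick sanity check, here is a minimal sketch that flags outlier dimensions in GPT-2's LayerNorm gain and bias vectors by z-scoring each vector against its own statistics. It assumes the HuggingFace `transformers` checkpoint "gpt2"; the 6σ cutoff is an arbitrary illustrative threshold, not a claim from the comment above:

```python
# Flag "outlier dimensions" in GPT-2's LayerNorm gain/bias parameters.
# Each parameter vector is z-scored against its own mean/std; dimensions
# beyond 6 standard deviations are reported. The threshold is illustrative.
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")

for name, param in model.named_parameters():
    if "ln_" not in name:  # ln_1 / ln_2 / ln_f hold the LayerNorm gain & bias
        continue
    p = param.detach()
    z = (p - p.mean()) / p.std()
    outlier_dims = torch.nonzero(z.abs() > 6).flatten()
    if outlier_dims.numel() > 0:
        print(f"{name}: outlier dims {outlier_dims.tolist()}")
```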
This reminds me of a LessWrong comment that I saw a few months ago:
I think at least some GPT2 models have a really high-magnitude direction in their residual stream that might be used to preserve some scale information after LayerNorm.
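One rough way to look for such a direction is to run GPT-2 on a short prompt and compare per-dimension magnitudes of the hidden states across layers; a minimal sketch (the prompt and the mean-|activation| summary are my own illustrative choices, not from the quoted comment):

```python
# Look for a single high-magnitude residual-stream dimension in GPT-2 by
# summarizing per-dimension activation magnitudes at each layer.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

inputs = tok("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states: tuple of (n_layers + 1) tensors, each [1, seq, 768]
for layer, h in enumerate(out.hidden_states):
    mags = h[0].abs().mean(dim=0)  # mean |activation| per residual dimension
    top = int(mags.argmax())
    print(f"layer {layer:2d}: dim {top} mean |act| = {mags[top].item():.1f}, "
          f"median over dims = {mags.median().item():.2f}")
```

If one dimension's mean magnitude dwarfs the median at every layer, that is at least consistent with the "scale information survives LayerNorm in one direction" story.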
I am surprised that these issues would apply to, say, Google Translate. Google appears unconstrained by cost or by a shortage of knowledgeable engineers. If Google developed a better translation model, I would expect to see it quickly integrated into the current translation interface. If some external group developed better translation models, I would expect to see them quickly acquired by Google.
AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark for
Something that I’m really confused about: what is the state of machine translation? There seems to be a massive incentive to create flawless translation models, yet when I interact with Google Translate or Twitter’s translation feature, the results are not great. Are there flawless translation models that I’m not aware of? If not, why is translation lagging behind other text analysis and generation tasks?
Thank you for clarifying your intended point. I agree with the argument that playful thinking is intrinsically valuable, but still hold that the point would have been better reinforced by including some non-mathematical examples.
I literally don’t believe this
Here are two personal examples of playful thinking without obvious applications to working on alignment:
I agree. It seems awfully convenient that all of the “fun” described in this post involves the legibly impressive topics of physics and mathematics. Most people, even highly technically competent people, aren’t intrinsically drawn to play with intellectually prestigious tasks. They find fun in sports, drawing, dancing, etc. Even when they adopt an attitude of intellectual inquiry toward their play, the insights generated from drawing techniques or dance moves are far less obviously applicable to working on alignment than the insights generated from studying physics.
See my comment on the parent.
undercuts the message to “follow your playful impulses, even if they’re silly”
That's a fine message, but it's not the message of the post. The concept described in the post is playful thinking, not fun. It does use the word "fun" in a few places where the more specific phrase would arguably have been better, so the miscommunication is probably my fault.
are far less obviously applicable to working on alignment than the insights generated from studying physics
I literally don't believe this, but even if it were true, the p...
David Chapman actually uses social media recommendation algorithms as a central example of AI that is already dangerous: https://betterwithout.ai/apocalypse-now