All of cherrvak's Comments + Replies

David Chapman actually uses social media recommendation algorithms as a central example of AI that is already dangerous: https://betterwithout.ai/apocalypse-now

Shared a review in some private channels; might as well share it here:

The book positions itself as a middle ground between optimistic capabilities researchers striding blithely into near-certain catastrophe and pessimistic alignment researchers too concerned with dramatic abstract doom scenarios to address more realistic harms that can still be averted. When addressing the latter, Chapman constructs a hypothetical "AI goes FOOM and unleashes nanomachine death" scenario and argues that while alignment researchers are correct that we have no capacity to prev... (read more)

It is possible that the outlier dimensions are related to the LayerNorms, since the LayerNorm gain and bias parameters often also have outlier dimensions and depart quite strongly from Gaussian statistics.


This reminds me of a LessWrong comment that I saw a few months ago:

I think at least some GPT2 models have a really high-magnitude direction in their residual stream that might be used to preserve some scale information after LayerNorm.
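
A quick way to look for these outlier dimensions yourself: a minimal sketch, assuming the Hugging Face transformers library and the standard gpt2 checkpoint (the prompt and the top-3 cutoff are arbitrary choices for illustration):

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of (num_layers + 1) tensors of shape
# [batch, seq_len, d_model]; entry 0 is the embedding output.
for layer, h in enumerate(out.hidden_states):
    mags = h[0].abs().mean(dim=0)  # mean |activation| per residual-stream dimension
    top = mags.topk(3)
    ratio = (top.values[0] / mags.median()).item()
    print(f"layer {layer:2d}: top dims {top.indices.tolist()}, "
          f"top/median magnitude ratio {ratio:.1f}")

# The LayerNorm parameters mentioned above can be checked the same way:
ln_gain = model.h[0].ln_1.weight
print("ln_1 gain top dims:", ln_gain.abs().topk(3).indices.tolist())
```

Mean absolute magnitude is a crude statistic; kurtosis or a max/median ratio would work just as well for spotting departures from Gaussian statistics.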

I am surprised that these issues would apply to, say, Google translate. Google appears unconstrained by cost or shortage of knowledgeable engineers. If Google developed a better translation model, I would expect to see it quickly integrated into the current translation interface. If some external group developed better translation models, I would expect to see them quickly acquired by Google.

the gears to ascension
google doesn't use SOTA translation tools because they're too costly per api call. they're SOTA for the cost bucket they budgeted for google translate, of course, but there's no way they'd use PaLM full size to translate. also, it takes time for groups to implement the latest model. Google, microsoft, amazon, etc, are all internally like a ton of mostly-separate companies networked together and sharing infrastructure; each team unit manages their own turf and is responsible for implementing the latest research output into their system.

Why haven't they switched to newer models?

[anonymous]
The same reason SOTA models are only used in a few elite labs and nowhere else: cost, licensing issues, a shortage of people who know how to adapt them, and problems with the technology being so new and still basically a research project. Your question is equivalent to asking, a few years after transistors began to ship in small packaged ICs, why some computers still used all vacuum tubes. It's essentially the same question.

I thoroughly enjoyed this paper, and would very much like to see the same experiment performed on language models in the billion-parameter range. Would you expect the results to change, and how?

Tomek Korbak
Good question! We're not sure. The fact that PHF scales well with dataset size might provide weak evidence that it would scale well with model size too.

AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark for


Something that I’m really confused about: what is the state of machine translation? It seems like there is a massive incentive to create flawless translation models. Yet when I interact with Google Translate or Twitter’s translation feature, results are not great. Are there flawless translation models that I’m not aware of? If not, why is translation lagging behind other text analysis and generation tasks?

[anonymous]
Those translation engines are not using SOTA AI models, but something relatively old (a few years).
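
For anyone curious what even a small open model does with the same task, here's a minimal sketch, assuming the Hugging Face transformers library and the Helsinki-NLP/opus-mt-fr-en MarianMT checkpoint (both choices mine, purely for illustration):

```python
from transformers import pipeline

# A small open-source French-to-English translation model (MarianMT);
# newer and larger models exist, but even this runs locally on CPU.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

result = translator("Le chat est assis sur le tapis.")
print(result[0]["translation_text"])  # e.g. "The cat is sitting on the carpet."
```

Larger research models generally translate better but cost far more per call, which is the tradeoff described above.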

Thank you for clarifying your intended point. I agree with the argument that playful thinking is intrinsically valuable, but still hold that the point would have been better reinforced by including some non-mathematical examples.

I literally don’t believe this

Here are two personal examples of playful thinking without obvious applications to working on alignment:

  1. A half-remembered quote, originally attributed to the French comics artist Moebius: “There are two ways to impress your audience: do a good drawing, or pack your drawing full of so much detail that t
... (read more)
TsviBT
We might have different things in mind with "intellectual inquiry"; depth is important. The first one seems like a seed of something that could be interesting. Phenomenology is the best data we have about real minds. But mainly I made that comment because I don't see insights from physics being "obviously applicable to working on alignment". (This is maybe a controversial take and I haven't thought about it that much and it could be stupid. I might also do the accounting differently, labeling more things as being "really math, not physics".)

I agree. It seems awfully convenient that all of the “fun” described in this post involves the legibly-impressive topics of physics and mathematics. Most people, even highly technically competent people, aren’t intrinsically drawn to play with intellectually prestigious tasks. They find fun in sports, drawing, dancing, etc. Even when they adopt an attitude of intellectual inquiry toward their play, the insights generated from drawing techniques or dance moves are far less obviously applicable to working on alignment than the insights generated from stu... (read more)

See my comment on the parent.

undercuts the message to “follow your playful impulses, even if they’re silly”

That's a fine message, but it's not the message of the post. The concept described in the post is playful thinking, not fun. It does use the word "fun" in a few places where the more specific phrase would arguably have been better, so the miscommunication is probably my fault.

are far less obviously applicable to working on alignment than the insights generated from studying physics

I literally don't believe this, but even if it were true, the p... (read more)