In this post, I claim a few things and offer some evidence for these claims. Among these things are:

* Language models have many redundant attention heads for a given task
* In-context learning works through addition of features, which are learnt through Bayesian updates
* The model likely...
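To make the "Bayesian updates" claim concrete, here is a toy sketch of my own (not from the post): in-context learning viewed as a Bayesian update over a few hypothetical candidate mappings y = f(x), where each observed (x, y) pair reweights the hypotheses by their likelihood.

```python
# Toy illustration (my own sketch, not the post's method): in-context learning
# as a Bayesian update over candidate mappings y = f(x). The hypothesis names
# and functions here are made up for illustration.
hypotheses = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "identity": lambda x: x,
}

def posterior(pairs):
    # Start from a uniform prior over the candidate mappings.
    probs = {h: 1 / len(hypotheses) for h in hypotheses}
    for x, y in pairs:
        for name, f in hypotheses.items():
            # Near-zero (rather than exactly zero) likelihood on a mismatch.
            probs[name] *= 1.0 if f(x) == y else 1e-6
        z = sum(probs.values())
        probs = {h: p / z for h, p in probs.items()}
    return probs

post = posterior([(3, 6), (5, 10)])
print(max(post, key=post.get))  # "double" dominates after two consistent pairs
```

After two pairs consistent only with doubling, almost all posterior mass sits on that hypothesis, which is the sense in which each in-context example acts as an update.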
I agree with this
I've found that the LW wiki doesn't work as a Wikipedia-like resource, at least for me.
How useful is a wiki for alignment? There doesn't seem to be one now.
norm $\in \mathbb{R}$, doesn't matter
I've found the part about applying random search to be among the best takeaways I had from PAIR! Novelty for the sake of novelty is not a terrible idea. Specifically, I've found that even if you don't like the things you produce, producing them makes it much easier to then make progress towards the larger goal.
To set some context, the task I'm going to be modelling is one where we give the model a sequence of pairs in the following format:
`(x, y)\n`
where for each example, [...]. As a concrete...
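A minimal sketch of how such a prompt might be assembled, assuming the format is literally `(x, y)` pairs separated by newlines as described above (the function name and example pairs are my own):

```python
# Hypothetical sketch: build an in-context prompt of "(x, y)" pairs,
# one per line, matching the format described in the excerpt.
def build_prompt(pairs):
    return "".join(f"({x}, {y})\n" for x, y in pairs)

prompt = build_prompt([(1, 2), (3, 6), (5, 10)])
print(prompt)
```

Each line then supplies one in-context example for the model to condition on.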
I think my issue with the LW wiki is that it relies too much on LessWrong? It seems like the expectation is that you click on a tag, which contains / is assigned to a number of LW posts, and then you read through the posts. That is not how other wikis / encyclopedias work!
My gold standard for a technical wiki (other than wikipedia) is the chessprogramming wiki https://www.chessprogramming.org/Main_Page