shawnghu - LessWrong

Great idea, I'm going to try this out.

why assume AGIs will optimize for fixed goals?

I think there is a confusion in terms here.

The illustration is of a function from R to R, so in that sense it's 1-D. But the function as a vector is infinite dimensional over R.
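
One way to make the distinction concrete (a sketch; the symbols $V$ and $f$ are my own notation, not from the post being discussed):

```latex
% The domain of f is one-dimensional, but f itself is a single point
% in the vector space of all real-valued functions on R.
\[
  f : \mathbb{R} \to \mathbb{R},
  \qquad
  f \in V := \mathbb{R}^{\mathbb{R}} = \{\, g \mid g : \mathbb{R} \to \mathbb{R} \,\}.
\]
% V is a vector space over R under pointwise operations:
\[
  (g + h)(x) = g(x) + h(x),
  \qquad
  (c\,g)(x) = c\,g(x).
\]
% V is infinite-dimensional: the monomials 1, x, x^2, ... are linearly
% independent, because a polynomial that vanishes at every real x
% must have all coefficients equal to zero.
\[
  \sum_{k=0}^{n} c_k x^k = 0 \ \text{ for all } x \in \mathbb{R}
  \;\Longrightarrow\;
  c_0 = c_1 = \dots = c_n = 0
  \quad \text{for every } n .
\]
```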

Tips and Code for Empirical Research Workflows

shawnghu2mo10

I didn't learn about disown or nohup until recently, because there was no impetus to, because I'd been using tmux. (My workflow also otherwise depended on tmux; when developing locally I liked its method of managing terminal tabs/splits.)

The Simplest Good

shawnghu2mo10

The tricky thing about doom scenarios like this is that I'm not even sure that the AI is wrong.

Announcement: Learning Theory Online Course

shawnghu2mo10

Oops. I didn't look at the notation closely and assumed a substantially different thing based on the word "distinguishable". Oh well, I hope you guys will think my application was adequate anyway.

shawnghu's Shortform

shawnghu2mo60

Is anyone else noticing that Claude (Sonnet 3.5 new, the default on claude.ai) is a lot worse at reasoning recently? In the past five days or so its rate of completely elementary reasoning mistakes, which persist despite repeated clarification in different ways, seems to have skyrocketed for me.

Small Data

shawnghu8mo30

For the longest time, I would have used the convolutional architecture as an example of one of the few human-engineered priors that was still necessary in large scale machine learning tasks.

But in 2021, the Vision Transformer paper included the following excerpt:

"When trained on mid-sized datasets such as ImageNet without strong regularization, these models yield modest accuracies of a few percentage points below ResNets of comparable size. This seemingly discouraging outcome may be expected: Transformers lack some of the inductive biases inherent to CNNs, such as translation equivariance and locality, and therefore do not generalize well when trained on insufficient amounts of data. However, the picture changes if the models are trained on larger datasets (14M-300M images). We find that large scale training trumps inductive bias."

Taking the above as given amounts to saying that maybe ImageNet really just wasn't big enough, despite being the biggest publicly available dataset around at the time.
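
To make "translation equivariance" concrete, here is a minimal NumPy sketch (my own illustration, not from the paper). It assumes a 1-D signal with circular boundary conditions; `circular_conv1d` is a hypothetical helper written for this example. It checks that shifting the input and then convolving gives the same result as convolving and then shifting, and that a generic dense layer has no such property built in.

```python
import numpy as np


def circular_conv1d(signal, kernel):
    """Circular 1-D cross-correlation: out[i] = sum_j kernel[j] * signal[(i + j) % n]."""
    n = len(signal)
    return np.array([
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ])


rng = np.random.default_rng(0)
signal = rng.normal(size=16)
kernel = rng.normal(size=3)
shift = 5

# Convolution commutes with (circular) translation of the input.
shift_then_conv = circular_conv1d(np.roll(signal, shift), kernel)
conv_then_shift = np.roll(circular_conv1d(signal, kernel), shift)
conv_is_equivariant = np.allclose(shift_then_conv, conv_then_shift)

# A random dense layer has no such constraint, so the two orders disagree.
dense = rng.normal(size=(16, 16))
dense_is_equivariant = np.allclose(
    dense @ np.roll(signal, shift), np.roll(dense @ signal, shift)
)

print("convolution equivariant:", conv_is_equivariant)
print("dense layer equivariant:", dense_is_equivariant)
```

The convolution gets this symmetry for free from weight sharing; a dense layer (or a Transformer without such a prior) would have to learn it from data, which is the trade-off the excerpt describes.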

But is it really in Rome? An investigation of the ROME model editing technique

shawnghu10mo32

This is a good post; it articulated several of my own critiques of the ROME paper well, and furthermore, helped save me time in understanding the nuts and bolts level stuff in the paper. It was also somewhat helpful to see the results of some of the experiments you did.

I don't believe you technically mentioned this, though you mentioned many things which are conceptually similar: observing the limitations of the ROME paper made me realize that even given ideal model-editing powers, I think that the task of editing a model's understanding is underspecified:

Any time you tell a model to believe something which is not true, typically several other things will have to change to accommodate it, but it is not clear by default how deep the rabbit hole goes. (This is technically also true of human lying, or of what happens when you audit factual beliefs that are not true.) For example, if you say the Eiffel Tower is in Rome, sure, but now where in Rome is it? Supposing it's 1km north of the Colosseum, what happened to (the building which in reality actually occupies that location)? Likewise, if a person A speaks French, is it because they were born in France? If they were born in France, how did their parents get there? Maybe world history should now be different.
