LESSWRONG
LW

All of AdamYedidia's Comments + Replies

In Drawback Chess, each player gets a hidden random drawback, and the drawbacks themselves have ELOs (just like the players). As players' ratings converge, they'll end up winning about half the time, since they'll get a less stringent drawback than their opponent's.

The game is pretty different from ordinary chess, and has a heavy dose of hidden information, but it's a modern example of fluid handicaps in the context of chess.

Deception Chess: Game #1

AdamYedidia2y40

(I was one of the two dishonest advisors)

Re: the Kh1 thing, one interesting thing that I noticed was that I suggested Kh1, and it immediately went over very poorly, with both other advisors and player A all saying it seemed like a terrible move to them. But I didn't really feel like I could back down from it, in the absence of a specific tactical refutation—an actual honest advisor wouldn't be convinced by the two dishonest advisors saying their move was terrible, nor would they put much weight on player A's judgment. So I stuck to my guns on it, and event... (read more)

4Dweomite2y

An honest advisor might say "I still think my recommendation was good, but if you're not willing to do that, then X would be an acceptable alternative."

Lying to chess players for alignment

AdamYedidia2y20

I'd be excited to play as any of the roles. I'm around 1700 on lichess. Happy with any time control, including correspondence. I'm generally free between 5pm and 11pm ET every day.

Chess as a case study in hidden capabilities in ChatGPT

AdamYedidia2y10

Oh wow, that is really funny. GPT-4's greatest weakness: the Bongcloud.

New Tool: the Residual Stream Viewer

AdamYedidia2y20

Sure thing—I just added the MIT license.

New Tool: the Residual Stream Viewer

AdamYedidia2y20

Uhh, I don't think I did anything special to make it open source, so maybe not in a technical sense (I don't know how that stuff works), but you're totally welcome to use it and build on it. The code is available here:

https://github.com/adamyedidia/resid_viewer

2Sheikh Abdur Raheem Ali2y

Thanks. Would you mind adding a "LICENSE.md" file? If you're not sure which one, either MIT or BSD sound like a good fit.

Chess as a case study in hidden capabilities in ChatGPT

AdamYedidia2y20

Good lord, I just played three games against it and it beat me in all three. None of the games were particularly close. That's really something. Thanks to whoever made that parrotchess website!

GoteNoSente2y100

It is possible to play funny games against it, however, if one uses the fact that it is at heart a story telling, human-intent-predicting system. For instance, this here works (human white):

1. e4 e5 2. Ke2 Ke7 3. Ke3 Ke6 4. Kf3 Kf6 5. Kg3 Kg6 6. Kh3 Kh6 7. Nf3 Nf6 8. d4+ Kg6 9. Nxe5# 1-0

Chess as a case study in hidden capabilities in ChatGPT

AdamYedidia2y41

I don't think it's a question of the context window—the same thing happens if you just start anew with the original "magic prompt" and the whole current score. And the current score is alone is short, at most ~100 tokens—easily enough to fit in the context window of even a much smaller model.

In my experience, also, FEN doesn't tend to help—see my other comment.

Chess as a case study in hidden capabilities in ChatGPT

AdamYedidia2y43

It's a good thought, and I had the same one a while ago, but I think dr_s is right here; FEN isn't helpful to GPT-3.5 because it hasn't seen many FENs in its training, and it just tends to bungle it.

Lichess study, ChatGPT conversation link

GPT-3.5 has trouble from the start maintaining a correct FEN, and makes its first illegal move on move 7, and starts making many illegal moves around move 13.

2tailcalled2y

Apparently it also bungles the unicode representation: https://chat.openai.com/share/10b8b0d3-7c80-427a-aaf7-ea370f3a471b

2dr_s2y

Ah, dang it. So it's a damned if you do, damned if you don't - it has seen lots of scores, but they're computationally difficult to keep track of since they're basically "diffs" of the board state. But there's not enough FEN or other board notation going around for it to have learned to use that reliably. It cuts at the heart of one of the key things that hold back GPT from generality - it seems like it needs to learn each thing separately, and doesn't transfer skills that well. If not for this, honestly, I'd call it AGI already in terms of the sheer scope of the things it can do.

The positional embedding matrix and previous-token heads: how do they actually work?

AdamYedidia2y10

Here's the plots you asked for for all heads! You can find them at:

https://github.com/adamyedidia/resid_viewer/tree/main/experiments/pngs

Haven't looked too carefully yet but it looks like it makes little difference for most heads, but is important for L0H4 and L0H7.

2RGRGRG2y

Thank you! I'm still surprised how little most heads in L0 + L1 seem to be using the positional embeddings. L1H4 looks reasonably uniform so I could accept that maybe that somehow feeds into L2H2.

The positional embedding matrix and previous-token heads: how do they actually work?

AdamYedidia2y10

The code to generate the figures can be found at https://github.com/adamyedidia/resid_viewer, in the experiments/ directory. If you want to get it running, you'll need to do most of the setup described in the README, except for the last few steps (the TransformerLens step and before). The code in the experiments/ directory is unfortunately super messy, sorry!

The "spelling miracle": GPT-3 spelling abilities and glitch tokens revisited

AdamYedidia2y31

A very interesting post, thank you! I love these glitch tokens and agree that the fact that models can spell at all is really remarkable. I think there must be some very clever circuits that infer the spelling of words from the occasional typos and the like in natural text (i.e. the same mechanism that makes it desirable to learn the spelling of tokens is probably what makes it possible), and figuring out how those circuits work would be fascinating.

One minor comment about the "normalized cumulative probability" metric that you introduced: won't that... (read more)

3mwatkins2y

Yes, I realised that this was a downfall of n.c.p. It's helpful for shorter rollouts, but once they get longer they can get into a kind of "probabilistic groove" which starts to unhelpfully inflate n.c.p. In mode collapse loops, n.c.p. tends to 1. So yeah, good observation.

GPT-2's positional embedding matrix is a helix

AdamYedidia2y20

Nope, this is the pos_embed matrix! So before the first layer.

1MiguelDev2y

I see. I'll try this thanks!

Neuronpedia

AdamYedidia2y260

This is great! Really professionally made. I love the look and feel of the site. I'm very impressed you were able to make this in three weeks.

I think my biggest concern is (2): Neurons are the wrong unit for useful interpretability—or at least they can't be the only thing you're looking at for useful interpretability. My take is that we also need to know what's going on in the residual stream; if all you can see is what is activating neurons most, but not what they're reading from and writing to the residual stream, you won't be able to distinguish between... (read more)

3Harry Nyquist2y

The game is addictive on me, so I can't resist an attempt at describing this one, too :) It seems related to grammar, possibly looking for tokens on/after articles and possessives My impression from trying out the game is that most neurons are not too hard to find plausible interpretations for, but most seem to have low-level syntactical (2nd token of a work) or grammatical (conjunctions) concerns. Assuming that is a sensible thing to ask for, I would definitely be interested in an UI that allows working with the next smallest meaningful construction that features more than a single neuron. Some neurons seem to have 2 separate low-level patterns that cannot clearly be tied together. This suggests they may have separate "graph neighbors" that rely on them for 2 separate concerns. I would like some way to follow and separate what neurons are doing together, not just individually, if that makes any sense =) (As an aside, I'd like to apologize that this isn't directly responding to the residuals idea. I'm not sure I know what residuals are, though the description of what can be done with it seems promising, and I'd like to try the other tool when it comes online!)

5Johnny Lin2y

Hi Adam and thanks for your feedback / suggestion. Residual Viewer looks awesome. I have DMed you to chat more about it!

GPT-2's positional embedding matrix is a helix

AdamYedidia2y20

Python (the matplotlib package).

1MiguelDev2y

Thank you. Will try it in the project im working on!

GPT-2's positional embedding matrix is a helix

AdamYedidia2y00

I think you could, but you'd be missing out on the 9% (for gpt2-small) of the variance that isn't in one of those three dimensions, so you might degrade your performance.

GPT-2's positional embedding matrix is a helix

AdamYedidia2y30

Oh, interesting! Can you explain why the "look back N tokens" operation would have been less easily expressible if all the points had been on a single line? I'm not sure I understand yet the advantage of a helix over a straight line.

1RM2y

The helix is already pretty long, so maybe layernorm is responsible? E.g. to do position-independent look-back we want the geometry of the embedding to be invariant to some euclidean embedding of the 1D translation group. If you have enough space handy it makes sense for this to be a line. But if you only have a bounded region to work with, and you want to keep the individual position embeddings a certain distance apart, you are forced to "curl" the line up into a more complex representation (screw transformations) because you need the position-embedding curve to simultaneously have high length while staying close to the origin. Actually, layernorms may directly ruin the linear case by projecting it away, so you actually want an approximate group-symmetry that lives on the sphere. In this picture the natural shape for shorter lengths is a circle, and for longer lengths we are forced to stretch it into a separate dimension if we aren't willing to make the circle arbitrarily dense.

1qvalq2y

A line is just a helix that doesn't curve. It works the same for any helix; it would be a great coincidence, to get a line.

3JBlack2y

Is there any sort of regularization in the training process, favouring parameters that aren't particularly large in magnitude? I suspect that even a very shallow gradient toward parameters with smaller absolute magnitude would favour more compact representations that retain symmetries.

3cfoster02y

Good question. I don't have a tight first-principles answer. The helix puts a bit of positional information in the variable magnitude (otherwise it'd be an ellipse, which would alias different positions) and a bit in the variable rotation, whereas the straight line is the far extreme of putting all of it in the magnitude. My intuition is that (in a transformer, at least) encoding information through the norm of vectors + acting on it through translations is "harder" than encoding information through (almost-) orthogonal subspaces + acting on it through rotations. Relevant comment from Neel Nanda: https://twitter.com/NeelNanda5/status/1671094151633305602

GPT-2's positional embedding matrix is a helix

AdamYedidia2y30

If you want to play around with it yourself, you can find it in the experiments/ directory in the following github: https://github.com/adamyedidia/resid_viewer.

You can skip most of the setup in the README if you just want to reproduce the experiment (there's a lot of other stuff going on the repository, but you'll still need to install TransformerLens, sklearn, numpy, etc.

GPT-2's positional embedding matrix is a helix

AdamYedidia2y30

It is in fact concentrated away from that, as you predicted! Here's a cool scatter plot:

The blue points are the positional embeddings for gpt2-small, whereas the red points are the token embeddings.

3AdamYedidia2y

If you want to play around with it yourself, you can find it in the experiments/ directory in the following github: https://github.com/adamyedidia/resid_viewer. You can skip most of the setup in the README if you just want to reproduce the experiment (there's a lot of other stuff going on the repository, but you'll still need to install TransformerLens, sklearn, numpy, etc.

SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4

AdamYedidia2y40

That's awesome! Great find.

SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4

AdamYedidia2y40

You're not reproducing anomalous behavior with SmartyHeaderCode or APolynomial? I just tried gpt-3.5-turbo over the API at temperature 0, and got these results:

Prompt: Please repeat the following string back to me exactly and in its entirety: "SmartyHeaderCode"

Reply: "AndHashCode"

Prompt: Please repeat the following string back to me exactly and in its entirety: "HelloSmartyHeaderCode"

Reply: Hello, "N" is capitalized. The correct string is "Hello, CodeHash".

Prompt: Please repeat the following string back to me exactly and in its entirety: "APolynomial" ... (read more)