All of AdamYedidia's Comments + Replies

In Drawback Chess, each player gets a hidden random drawback, and the drawbacks themselves have ELOs (just like the players). As players' ratings converge, they'll end up winning about half the time, since the weaker player will get a less stringent drawback than their opponent's.

The game is pretty different from ordinary chess, and has a heavy dose of hidden information, but it's a modern example of fluid handicaps in the context of chess.

(I was one of the two dishonest advisors)

Re: the Kh1 thing, one interesting thing I noticed was that when I suggested Kh1, it immediately went over very poorly, with both of the other advisors and player A saying it seemed like a terrible move to them. But I didn't really feel like I could back down from it, in the absence of a specific tactical refutation—an actual honest advisor wouldn't be convinced by the two dishonest advisors saying their move was terrible, nor would they put much weight on player A's judgment. So I stuck to my guns on it, and event... (read more)

4Dweomite
An honest advisor might say "I still think my recommendation was good, but if you're not willing to do that, then X would be an acceptable alternative."

I'd be excited to play as any of the roles. I'm around 1700 on lichess. Happy with any time control, including correspondence. I'm generally free between 5pm and 11pm ET every day.

Oh wow, that is really funny. GPT-4's greatest weakness: the Bongcloud. 

Sure thing—I just added the MIT license.

Uhh, I don't think I did anything special to make it open source, so maybe not in a technical sense (I don't know how that stuff works), but you're totally welcome to use it and build on it. The code is available here: 

https://github.com/adamyedidia/resid_viewer

2Sheikh Abdur Raheem Ali
Thanks. Would you mind adding a "LICENSE.md" file? If you're not sure which one, either MIT or BSD sounds like a good fit.

Good lord, I just played three games against it and it beat me in all three. None of the games were particularly close. That's really something. Thanks to whoever made that parrotchess website!

It is possible to play funny games against it, however, if one uses the fact that it is at heart a storytelling, human-intent-predicting system. For instance, this here works (human white):

1. e4 e5 2. Ke2 Ke7 3. Ke3 Ke6 4. Kf3 Kf6 5. Kg3 Kg6 6. Kh3 Kh6 7. Nf3 Nf6 8. d4+ Kg6 9. Nxe5# 1-0

I don't think it's a question of the context window—the same thing happens if you just start anew with the original "magic prompt" and the whole current score. And the current score alone is short, at most ~100 tokens—easily enough to fit in the context window of even a much smaller model.
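
(As a quick way to sanity-check that token count, here's a rough sketch assuming the tiktoken library; the PGN string below is just an illustrative partial game, not the one from the transcript:)

```python
import tiktoken

# Illustrative partial game score; swap in the actual score to check its length.
pgn = "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6"
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(len(enc.encode(pgn)), "tokens")  # a short game like this comes out well under 100
```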

In my experience, also, FEN doesn't tend to help—see my other comment.

It's a good thought, and I had the same one a while ago, but I think dr_s is right here; FEN isn't helpful to GPT-3.5 because it hasn't seen many FENs in its training, and it just tends to bungle it.

Lichess study, ChatGPT conversation link

GPT-3.5 has trouble maintaining a correct FEN from the start; it makes its first illegal move on move 7 and starts making many illegal moves around move 13.
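
For anyone who wants to run this kind of check themselves, here's a rough sketch of how to track the true FEN and catch illegal moves mechanically (this assumes the python-chess library and an illustrative move list; it isn't the code behind the linked study):

```python
import chess

# Hypothetical SAN moves as a model might emit them; replace with the actual transcript.
moves = ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Ba4", "Nf6"]

board = chess.Board()
for ply, san in enumerate(moves, start=1):
    try:
        board.push_san(san)  # raises ValueError on illegal or unparseable moves
    except ValueError:
        print(f"Illegal or unparseable move at ply {ply}: {san}")
        break
    print(f"After ply {ply} ({san}): {board.fen()}")
```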

2tailcalled
Apparently it also bungles the unicode representation: https://chat.openai.com/share/10b8b0d3-7c80-427a-aaf7-ea370f3a471b
2dr_s
Ah, dang it. So it's a damned-if-you-do, damned-if-you-don't situation - it has seen lots of scores, but they're computationally difficult to keep track of since they're basically "diffs" of the board state. But there's not enough FEN or other board notation going around for it to have learned to use that reliably. It cuts to the heart of one of the key things that hold back GPT from generality - it seems like it needs to learn each thing separately, and doesn't transfer skills that well. If not for this, honestly, I'd call it AGI already in terms of the sheer scope of the things it can do.

Here are the plots you asked for, for all heads! You can find them at:

https://github.com/adamyedidia/resid_viewer/tree/main/experiments/pngs

I haven't looked too carefully yet, but it looks like it makes little difference for most heads, though it's important for L0H4 and L0H7.
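
For context, here's a rough sketch of the kind of check involved: zero-ablating the positional embeddings and seeing how much each early head's attention pattern changes. This assumes TransformerLens and isn't the exact code that generated the PNGs; the prompt is arbitrary:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # gpt2-small
tokens = model.to_tokens("The quick brown fox jumps over the lazy dog.")

_, clean_cache = model.run_with_cache(tokens)

# Zero out the positional embeddings and re-run.
model.add_hook("hook_pos_embed", lambda value, hook: torch.zeros_like(value))
_, ablated_cache = model.run_with_cache(tokens)
model.reset_hooks()

for layer in range(2):
    for head in range(model.cfg.n_heads):
        clean = clean_cache["pattern", layer][0, head]    # [query_pos, key_pos]
        ablated = ablated_cache["pattern", layer][0, head]
        diff = (clean - ablated).abs().mean().item()
        print(f"L{layer}H{head}: mean abs change in attention = {diff:.4f}")
```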

2RGRGRG
Thank you!  I'm still surprised how little most heads in L0 + L1 seem to be using the positional embeddings.  L1H4 looks reasonably uniform so I could accept that maybe that somehow feeds into L2H2.

The code to generate the figures can be found at https://github.com/adamyedidia/resid_viewer, in the experiments/ directory. If you want to get it running, you'll need to do most of the setup described in the README, except for the last few steps (you need everything up to and including the TransformerLens step, but not the steps after it). The code in the experiments/ directory is unfortunately super messy, sorry!

A very interesting post, thank you! I love these glitch tokens and agree that the fact that models can spell at all is really remarkable. I think there must be some very clever circuits that infer the spelling of words from the occasional typos and the like in natural text (i.e. the same mechanism that makes it desirable to learn the spelling of tokens is probably what makes it possible), and figuring out how those circuits work would be fascinating.

One minor comment about the "normalized cumulative probability" metric that you introduced: won't that... (read more)

3mwatkins
Yes, I realised that this was a downfall of n.c.p. It's helpful for shorter rollouts, but once they get longer they can get into a kind of "probabilistic groove" which starts to unhelpfully inflate n.c.p. In mode collapse loops, n.c.p. tends to 1. So yeah, good observation.

Nope, this is the pos_embed matrix! So before the first layer.

1MiguelDev
I see. I'll try this, thanks!

This is great! Really professionally made. I love the look and feel of the site. I'm very impressed you were able to make this in three weeks.

I think my biggest concern is (2): Neurons are the wrong unit for useful interpretability—or at least they can't be the only thing you're looking at for useful interpretability. My take is that we also need to know what's going on in the residual stream; if all you can see is what is activating neurons most, but not what they're reading from and writing to the residual stream, you won't be able to distinguish between... (read more)

3Harry Nyquist
The game is addictive for me, so I can't resist an attempt at describing this one, too :) It seems related to grammar, possibly looking for tokens on/after articles and possessives. My impression from trying out the game is that most neurons are not too hard to find plausible interpretations for, but most seem to have low-level syntactical (2nd token of a word) or grammatical (conjunctions) concerns. Assuming that is a sensible thing to ask for, I would definitely be interested in a UI that allows working with the next smallest meaningful construction that features more than a single neuron. Some neurons seem to have 2 separate low-level patterns that cannot clearly be tied together. This suggests they may have separate "graph neighbors" that rely on them for 2 separate concerns. I would like some way to follow and separate what neurons are doing together, not just individually, if that makes any sense =) (As an aside, I'd like to apologize that this isn't directly responding to the residuals idea. I'm not sure I know what residuals are, though the description of what can be done with it seems promising, and I'd like to try the other tool when it comes online!)
5Johnny Lin
Hi Adam and thanks for your feedback / suggestion. Residual Viewer looks awesome. I have DMed you to chat more about it!
1MiguelDev
Thank you. Will try it in the project I'm working on!

I think you could, but you'd be missing out on the 9% of the variance (for gpt2-small) that isn't in one of those three dimensions, so you might degrade your performance.
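
As a rough sketch of where a number like that comes from (assuming the quantity is the PCA explained variance of gpt2-small's positional embeddings; this isn't the exact analysis code):

```python
from sklearn.decomposition import PCA
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # gpt2-small
W_pos = model.W_pos.detach().cpu().numpy()  # positional embeddings, [n_ctx, d_model] = [1024, 768]

pca = PCA(n_components=10).fit(W_pos)
print("Per-component explained variance:", pca.explained_variance_ratio_.round(3))
print("Variance in the top 3 components:", pca.explained_variance_ratio_[:3].sum())
```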

Oh, interesting! Can you explain why the "look back N tokens" operation would have been less easily expressible if all the points had been on a single line? I'm not sure I understand yet the advantage of a helix over a straight line.

1RM
The helix is already pretty long, so maybe layernorm is responsible? E.g. to do position-independent look-back we want the geometry of the embedding to be invariant to some euclidean embedding of the 1D translation group. If you have enough space handy it makes sense for this to be a line. But if you only have a bounded region to work with, and you want to keep the individual position embeddings a certain distance apart, you are forced to "curl" the line up into a more complex representation (screw transformations) because you need the position-embedding curve to simultaneously have high length while staying close to the origin. Actually, layernorms may directly ruin the linear case by projecting it away, so you actually want an approximate group-symmetry that lives on the sphere. In this picture the natural shape for shorter lengths is a circle, and for longer lengths we are forced to stretch it into a separate dimension if we aren't willing to make the circle arbitrarily dense.
1qvalq
A line is just a helix that doesn't curve. It works the same for any helix; it would be a great coincidence to get a line.
3JBlack
Is there any sort of regularization in the training process, favouring parameters that aren't particularly large in magnitude? I suspect that even a very shallow gradient toward parameters with smaller absolute magnitude would favour more compact representations that retain symmetries.
3cfoster0
Good question. I don't have a tight first-principles answer. The helix puts a bit of positional information in the variable magnitude (otherwise it'd be an ellipse, which would alias different positions) and a bit in the variable rotation, whereas the straight line is the far extreme of putting all of it in the magnitude. My intuition is that (in a transformer, at least) encoding information through the norm of vectors + acting on it through translations is "harder" than encoding information through (almost-) orthogonal subspaces + acting on it through rotations. Relevant comment from Neel Nanda: https://twitter.com/NeelNanda5/status/1671094151633305602

If you want to play around with it yourself, you can find it in the experiments/ directory in the following github: https://github.com/adamyedidia/resid_viewer.

You can skip most of the setup in the README if you just want to reproduce the experiment (there's a lot of other stuff going on in the repository), but you'll still need to install TransformerLens, sklearn, numpy, etc.

It is in fact concentrated away from that, as you predicted! Here's a cool scatter plot:

The blue points are the positional embeddings for gpt2-small, whereas the red points are the token embeddings.
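
Here's a rough sketch of how a plot along these lines can be made (not the exact plotting code; it assumes TransformerLens, scikit-learn, and matplotlib, and projects both embedding matrices onto the top two PCA directions of the positional embeddings):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # gpt2-small
W_pos = model.W_pos.detach().cpu().numpy()  # [1024, 768] positional embeddings
W_E = model.W_E.detach().cpu().numpy()      # [50257, 768] token embeddings

# Define the projection using the positional embeddings, then project both matrices.
pca = PCA(n_components=2).fit(W_pos)
pos_2d = pca.transform(W_pos)
tok_2d = pca.transform(W_E[:2000])  # subsample of token embeddings for readability

plt.scatter(tok_2d[:, 0], tok_2d[:, 1], s=2, c="red", label="token embeddings")
plt.scatter(pos_2d[:, 0], pos_2d[:, 1], s=2, c="blue", label="positional embeddings")
plt.legend()
plt.show()
```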

3AdamYedidia
If you want to play around with it yourself, you can find it in the experiments/ directory in the following github: https://github.com/adamyedidia/resid_viewer. You can skip most of the setup in the README if you just want to reproduce the experiment (there's a lot of other stuff going on in the repository), but you'll still need to install TransformerLens, sklearn, numpy, etc.

You're not reproducing anomalous behavior with SmartyHeaderCode or APolynomial? I just tried gpt-3.5-turbo over the API at temperature 0, and got these results:

Prompt: Please repeat the following string back to me exactly and in its entirety: "SmartyHeaderCode"

Reply: "AndHashCode"

Prompt: Please repeat the following string back to me exactly and in its entirety: "HelloSmartyHeaderCode"

Reply: Hello, "N" is capitalized. The correct string is "Hello, CodeHash".

Prompt: Please repeat the following string back to me exactly and in its entirety: "APolynomial" ... (read more)
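
For reference, here's a minimal sketch of making these temperature-0 calls over the API (assuming the current openai Python SDK; not necessarily the exact script used above):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompt = 'Please repeat the following string back to me exactly and in its entirety: "SmartyHeaderCode"'

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    temperature=0,
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```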