Brendan Long


This is just a feeling, but it seems like human-style "looking closer" is different from using a tool. When I want to count the letters in a word, I don't pull out a computer and run a Python program; I just look at the letters. What LLMs are doing seems different, since they both can't see the letters and can't really "take another look" (attention runs in parallel), although reasoning sometimes works like taking another look.

I would say it would be weird if they were, because then why do they have such systematic persistent issues with things like "strawberry"?

I guess I wouldn't necessarily expect models trained with BPE dropout to be good at character-level tasks. I'd expect them to be better at learning things about tokens, but they still can't directly attend to the characters, so tasks that would be trivial with characters (attend to all the r's -> count them) become much more complicated even if the model has the information (attend to 'strawberry' -> find the strawberry word concept -> remember the number of r's).
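To make the contrast concrete, here's a small sketch using OpenAI's tiktoken library (the encoding choice is just an example; exact splits vary by model):

```python
# pip install tiktoken
import tiktoken

word = "strawberry"

# Character-level view: counting r's is a trivial one-pass operation.
print(word.count("r"))  # 3

# Token-level view: the model sees opaque integer IDs, not letters.
enc = tiktoken.get_encoding("cl100k_base")  # illustrative encoding choice
token_ids = enc.encode(word)
print(token_ids)                              # a short list of integers
print([enc.decode([t]) for t in token_ids])   # the sub-word pieces

# Nothing in token_ids exposes the characters directly, so any fact about
# the spelling has to be memorized and associated with those IDs.
```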

For what it's worth, Claude does seem to be better at this particular question now (but not at similar questions for other words), so my guess is that it improved because the question is all over the internet and got into the training data.

Answer by Brendan Long

I'm pretty optimistic, based on research like this, that this is possible. My understanding is that we have trouble doing this for whales because we have very few examples, but if the aliens are helpfully providing us a huge data set, that would help a lot.

So I could imagine a few approaches:

  1. Train one LLM on both data sets and see if the magic of generalization lets you ask, "What does this '[alien tokens here]' mean?"
  2. Or inspect the embeddings and use them to translate.
  3. Or train one LLM on each data set, then align the embeddings and use the alignment for translations (see the sketch below).
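A minimal sketch of what that third approach could look like, assuming you could somehow bootstrap a handful of candidate anchor pairs (concepts grounded in shared physics, say), which is the hard part; the embedding matrices and anchor indices below are hypothetical placeholders:

```python
# pip install numpy scipy
import numpy as np
from scipy.linalg import orthogonal_procrustes

# Hypothetical embedding matrices from two separately trained models;
# rows are concept/token embeddings (shapes chosen arbitrarily here).
rng = np.random.default_rng(0)
human_vecs = rng.normal(size=(5000, 512))
alien_vecs = rng.normal(size=(7000, 512))

# Assume a small set of candidate anchor pairs (human_idx, alien_idx).
# In practice you'd want many more anchors than this to pin down a rotation.
anchors = [(12, 40), (77, 3), (105, 912)]  # placeholder indices

A = alien_vecs[[a for _, a in anchors]]
H = human_vecs[[h for h, _ in anchors]]

# Find the rotation that best maps the alien space onto the human space.
R, _ = orthogonal_procrustes(A, H)
aligned_alien = alien_vecs @ R

def translate(alien_idx: int) -> int:
    """'Translate' an alien token by nearest neighbor in the human space."""
    sims = human_vecs @ aligned_alien[alien_idx]
    return int(np.argmax(sims))
```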

You might get weird translations if the aliens perceive things differently, like if their primary perception is smell the LLM might translate smells to vision or something like that, but I think it's plausible you'd get a translation that's at least useful.

Another weird issue would be tokenization. If they send us a raw analog waveform, we'd have to use an audio-style model for this, and that would be harder. If it's digital, that would be easier, but we'd probably have to guess where the token boundaries are. I imagine we could just try different numbers of bits until we get a model that works well, although in theory you could run a transformer on raw bits; it would just be slow.
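For the "guess where the token boundaries are" part, one crude approach is to try a few fixed symbol widths and see which one makes the stream look most structured, e.g. by per-symbol entropy. A toy sketch, with the bitstream and candidate widths purely illustrative:

```python
import math
from collections import Counter

def symbols(bits: str, width: int) -> list[str]:
    """Chunk a bitstring into fixed-width candidate 'tokens'."""
    return [bits[i:i + width] for i in range(0, len(bits) - width + 1, width)]

def normalized_entropy(bits: str, width: int) -> float:
    """Per-symbol entropy divided by that width's maximum possible entropy.
    Lower values suggest more structure at that width."""
    counts = Counter(symbols(bits, width))
    total = sum(counts.values())
    h = -sum(c / total * math.log2(c / total) for c in counts.values())
    return h / width

# Hypothetical received bitstream (in reality, the alien signal).
bitstream = "0110100101101001011010010110111101101001" * 50

for width in (4, 8, 16):  # candidate symbol widths to try
    print(width, round(normalized_entropy(bitstream, width), 3))
# Whichever width shows the most structure is a reasonable guess for where
# to put token boundaries before training a model on the symbols.
```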

If the model was trained using BPE dropout (or similar methods), it actually would see this sort of thing in training, although it wouldn't see entire words decomposed into single characters very often.

I don't think it's public whether any frontier models do this, but it would be weird if they weren't.
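For reference, the idea behind BPE dropout is just to randomly skip some merges while tokenizing training data, so the same word occasionally shows up split into smaller pieces. A toy sketch with a made-up merge table (real implementations live in libraries like sentencepiece and subword-nmt):

```python
import random

# Toy merge table in priority order (made up for illustration).
MERGES = [("s", "t"), ("st", "r"), ("a", "w"), ("b", "e"), ("r", "r"),
          ("rr", "y"), ("str", "aw"), ("be", "rry"), ("straw", "berry")]

def bpe_encode(word: str, dropout: float = 0.0) -> list[str]:
    """Greedy BPE with merge dropout: each applicable merge is skipped
    with probability `dropout`, exposing smaller sub-pieces."""
    pieces = list(word)
    for left, right in MERGES:
        i = 0
        while i < len(pieces) - 1:
            if pieces[i] == left and pieces[i + 1] == right:
                if random.random() >= dropout:  # apply merge unless dropped
                    pieces[i:i + 2] = [left + right]
                    continue
            i += 1
    return pieces

print(bpe_encode("strawberry"))               # ['strawberry'] with this table
print(bpe_encode("strawberry", dropout=0.3))  # sometimes smaller pieces
```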

You can buy shower soap dispensers like this. My roommate installed them in one apartment, although the kind that sticks on is never as secure as I'd like.

(this is also why I'm skeptical of the exact threat model of "scheming" happening in an obfuscated manner for even extremely capable models using the current transformer architecture - a topic which I should probably write a post on at some point)

I would be interested to read this!

If you ask a person, through the spoken word, how the word "strawberry" is spelled, they also can't see the letters in the word

I was thinking about this more, and I think we're sort of on the same page. In some sense this shouldn't be surprising, since Reality is Normal, but I run into people who are surprised by it all the time, because they think the LLM is reading the text rather than "hearing" it (and it's worse than that, since ChatGPT can "hear" around 50,000 distinct "syllables", and words are "pronounced" differently depending on spacing and quoting).

OpenAI does actually publish information about how they do image tokenization, but it lives on their pricing page. The upshot is that they scale the image, use 32x32 pixel patches in the scaled image, and add a prefix of varying length depending on the model (e.g. 85 tokens for 4o, 75 for o3). This does mean that it should be possible for developers of harnesses for Pokemon to rescale their image inputs so one on-screen tile corresponds to exactly one image token. Likewise for the ARC puzzles.
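Assuming that description is accurate, the arithmetic a harness would need looks something like this (the 32x32 patch size and per-model prefixes are from the comment above; the function names and the 10x9 tile viewport are made up for illustration, and this ignores any API-side resizing limits):

```python
import math

PATCH = 32  # pixels per image-token patch, per the description above

# Prefix token counts mentioned above (model-dependent).
PREFIX = {"gpt-4o": 85, "o3": 75}

def image_tokens(width_px: int, height_px: int, model: str = "gpt-4o") -> int:
    """Rough token count, assuming the image is not further rescaled by the API."""
    patches = math.ceil(width_px / PATCH) * math.ceil(height_px / PATCH)
    return PREFIX[model] + patches

def rescale_for_tiles(tiles_wide: int, tiles_high: int) -> tuple[int, int]:
    """Pick an image size so one game tile lands on exactly one 32x32 patch."""
    return tiles_wide * PATCH, tiles_high * PATCH

# Hypothetical example: a 10x9 tile viewport becomes a 320x288 image,
# i.e. 90 patches plus the model's prefix.
w, h = rescale_for_tiles(10, 9)
print(w, h, image_tokens(w, h))  # 320 288 175
```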

Thanks! This is extremely helpful. The equivalent page from Anthropic is vague about the actual token boundaries, so I didn't even think to read through OpenAI's.

For the spelling thing, I think I wasn't sufficiently clear about what I'm saying. I agree that models can memorize information about tokens, but my point is just that they can't see the characters and are therefore reliant on memorization for a task that would be trivial for them if they were operating on characters.

Oh I see what you mean. Yes, if the model saw a bunch of examples implying things about the character structure of the token, it could memorize that and use it to spell the word. My point is just that it has to learn this info about each token from the training data since it can't read the characters.

The second example tokenizes differently, as [' r', 'ieden', 'heit'], because of the leading space, so the LLM is relying on information memorized about more common tokens. You can check at https://platform.openai.com/tokenizer
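You can also check locally with OpenAI's tiktoken library (the encoding choice below is illustrative, and the exact splits depend on the model's tokenizer):

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # illustrative encoding choice

for text in ("riedenheit", " riedenheit"):
    ids = enc.encode(text)
    print(repr(text), [enc.decode([t]) for t in ids])
# The leading space changes which tokens the text is split into, so the
# model is working from different memorized pieces in each case.
```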
