All of Adam Scherlis's Comments + Replies

Great questions :)

The approach here is much faster than the SGLD approach; it only takes tens or hundreds of forward passes to get a decent estimate. Maybe that's achievable in principle with SGLD, but we haven't managed it.

I like KFAC but I don't think estimating the Hessian spectrum better is a bottleneck; in our experiments on tiny models, the true Hessian didn't even always outperform the Adam moment estimates. I like the ideas here, though!

The big downside of our approach, compared to Timaeus's, is that it underestimates basin size (overestimates comp... (read more)

I am not sure I agree :)

It is unimportant in the limit (of infinite data), but away from that limit, it is only unimportant by a factor of 1/log(data), which seems small enough to be beatable in practice in some circumstances.

The spectra of things like Hessians tend to be singular, yes, but also sort of power-law. This makes the dimensionality a bit fuzzy and (imo) makes it possible for absolute volume scale of basins to compete with dimensionality.

Essentially: it's not clear that a 301-dimensional sphere really is "bigger" than a 300-dimensional sphere, if the 300-dimensional sphere has a much larger radius. (Obviously it's true in a strict sense, but hopefully you know what I'm gesturing at here.)

I think this is correct but we're working on paper rebuttals/revisions, I'll take a closer look very soon! I think we're working along parallel lines.

In particular, I have been thinking of "measure volumes at varying cutoffs" as being more or less equivalent to "measure LLC at varying ε".

We choose expected KL divergence as a cost function because it gives a behavioral loss, just like your behavioral LLC, yes.

I can give more precise statements once I look at my notes.

If you're wondering if this has a connection to Singular Learning Theory: Yup!

In SLT terms, we've developed a method for measuring the constant (with respect to n) term in the free energy, whereas LLC measures the log(n) term. Or if you like the thermodynamic analogy, LLC is the heat capacity and log(local volume) is the Gibbs entropy.
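For readers who want the SLT statement spelled out: the standard asymptotic expansion of the Bayesian free energy (Watanabe's form; the notation here is mine, not from the comment) is

```latex
F_n = n L_n(w_0) + \lambda \log n - (m - 1) \log\log n + O_p(1)
```

where λ is the learning coefficient and m its multiplicity. The LLC estimates the coefficient λ of the log n term; the local-volume measurement described above targets the O_p(1) constant term.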

We're now working on better methods for measuring these sorts of quantities, and on interpretability applications of them.

5Lucius Bushnaq
'Local volume' should also give a kind of upper bound on the LLC defined at finite noise though, right? Since as I understand it, what you're referring to as the volume of a behavioral region here is the same thing we define via the behavioural LLC at finite noise scale in this paper? And that's always going to be bigger or equal to the LLC taken at the same point at the same finite noise scale.
3Daniel Murfet
Indeed, very interesting!

It stops being in the interests of CATXOKLA to invite more states once they're already big enough to dominate national electoral politics.

1ZY
True; and they would only need to merge until they reach a "swing state" type of voting distribution.

The non-CATXOKLA swing states can merge with each other and a few red and blue states to form an even bigger bloc :)

I think there's a range of stable equilibria here, depending on the sequence of merges, with the largest bloc being a majority of any size. I think they all disenfranchise someone, though.

So you can't ever get to a national popular vote, without relying on things like the NPVIC which shortsightedly miss the obvious dominating strategy of a 51% attack against American democracy.

9kjz
I could imagine this turning into a flexible system of alliances similar to the conference system in NCAA college football and other sports (see here for a nice illustrated history of the many changes over time). Just as conferences and schools negotiate membership based on the changing quality of their sports programs, ability to generate revenue, and so on, states could form coalitions that could be renegotiated based on changing populations or voter preferences.

Thinking from that perspective, one potential Schelling point could be a "Northwest" coalition of WA/OR/ID/MT/WY/ND/SD/NE. This is quite well-balanced, as these states combined to give 21 EV to each candidate. And although the state populations are higher in WA/OR (12.0M) than the six red states (7.4M), the combined vote totals actually show a small lead for Trump (4.1M vs 3.9M, with more votes remaining to be counted in the blue states likely to close the gap). After this, maybe the remaining "Southwest" states (NV, UT, CO, AZ, NM) decide to join forces? Here a state by state analysis is less useful, especially since two of them still haven't been called, but the current combined vote count is a very narrow Trump lead of 4.07M to 4.05M.

The eastern half of the country seems harder to predict - clearly there are large potential blocs of blue states in the northeast and red states in the southeast, but it's harder to see clear geographical groupings that make sense. Unlikely any of this happens of course, but fun to think about.

I strongly agree with this post.

I'm not sure about this, though:

We are familiar with modular addition being performed in a circle from Nanda et al., so we were primed to spot this kind of thing — more evidence of street lighting.

It could be the streetlight effect, but it's not that surprising that we'd see this pattern repeatedly. This circular representation for modular addition is essentially the only nontrivial representation (in the group-theoretic sense) for modular addition, which is the only (simple) commutative group. It's likely to pop up in ... (read more)

I suspect a lot of this has to do with the low temperature.

The phrase "person who is not a member of the Church of Jesus Christ of Latter-day Saints" has a sort of rambling filibuster quality to it. Each word is pretty likely, in general, given the previous ones, even though the entire phrase is a bit specific. This is the bias inherent in low-temperature sampling, which tends to write itself into corners and produce long phrases full of obvious-next-words that are not necessarily themselves common phrases.

Going word by word, "person who is not a member...... (read more)
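The low-temperature effect described above can be sketched numerically (the logits below are made up for illustration): dividing logits by a small temperature before the softmax concentrates almost all probability on the locally-most-obvious next word, even when the resulting phrase is globally unusual.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution toward the argmax token
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
print(softmax_with_temperature(logits, 1.0))  # fairly spread out
print(softmax_with_temperature(logits, 0.2))  # nearly all mass on the top token
```

Chaining many such near-deterministic steps is exactly the "writes itself into corners" behavior: each word is the obvious continuation, but the whole phrase was never itself a common string.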

That's a reasonable argument but doesn't have much to do with the Charlie Sheen analogy.

The key difference, which I think breaks the analogy completely, is that (hypothetical therapist) Estevéz is still famous enough as a therapist for journalists to want to write about his therapy method. I think that's a big enough difference to make the analogy useless.

If Charlie Sheen had a side gig as an obscure local therapist, would journalists be justified in publicizing this fact for the sake of his patients? Maybe? It seems much less obvious than if the therapy was why they were interested!

In "no Lord hath the champion", the subject of "hath" is "champion". I think this matches the Latin, yes? "nor for a champion [is there] a lord"

1atykhyy
I suppose that is a possible reading, but in my opinion a most unnatural one. Compare: "No dog has the home". This can technically be parsed as "the home [a specific home I'm talking about] has no dog", but this would be a very weird word order in English.

Furthermore, if one is making a simple general statement, which is one reading of the OP verse, one is by that token not talking of a specific something, so one does not expect a definite article: "No dog has a home" or "No home has a dog". "The" would be warranted in a didactic or normative text, e.g. "The [good] home has no dog" or "The [good] home shall have no dog," and in fact reverting the OP verse to normal word order - "The rescuer has no rescuer / The champion has no lord" - enables one to read it as a didactic statement (which seems reasonable), but inverting the word order renders such sentences unintelligible.

In fact, the prior probability of the definite article in that position of the OP verse is so low that it only really registered on my mind after I read your comment. Word order in English is so strict that I was always perceiving it as "No rescuer has a rescuer / no lord has a champion", and I suspect I am not the only one. You yourself used the indefinite article in your rewording!

In that case, "journalists writing about the famous Estevéz method of therapy" would be analogous to journalists writing about Scott's "famous" psychiatric practice.

If a journalist is interested in Scott's psychiatric practice, and learns about his blog in the process of writing that article, I agree that they would probably be right to mention it in the article. But that has never happened because Scott is not famous as a psychiatrist.

-1Sherrinford
I said Estevéz because he is the less famous aspect of the person, not because I super-finetuned the analogy. Updating the trust in your therapist seems to be a legitimate interest even if he is not famous for his psychiatric theory or practice.

Suppose for example that an influential and controversial (e.g. White-supremacist) politician spent half his week being a psychiatrist and the other half doing politics, but somehow doing the former anonymously. I think patients might legitimately want to know that their psychiatrist is this person. This might even be true if the psychiatrist is only locally active, like the head of a KKK chapter. And journalists might then find it inappropriate to treat the two identities as completely separate.

I assume there are reasons for publishing the name and reasons against. It is not clear that being a psychiatrist is always an argument against. Part of the reason is, possibly, that patients often cannot directly judge the quality of therapy. Therapy is a credence good and therapists may influence you in ways that are independent of your depression or anorexia. So having more information about your psychiatrist may be helpful. At the same time, psychiatrists try to keep their private life out of the therapy, for very good reasons. It is not completely obvious to me where journalists should draw the line.

That might be relevant if anyone is ever interested in writing an article about Scott's psychiatric practice, or if his psychiatric practice was widely publicly known. It seems less analogous to the actual situation.

To put it differently: you raise a hypothetical situation where someone has two prominent identities as a public figure. Scott only has one. Is his psychiatrist identity supposed to be Sheen or Estevéz, here?

1Sherrinford
Estevéz. If I recall this correctly, Scott thought that potential or actual patients could be influenced in their therapy by knowing his public writings. (But I may misremember that.)
4romeostevensit
didn't know that, I heard it via Bostrom. Thanks.

Correct me if I'm wrong:

The equilibrium where everyone follows "set dial to equilibrium temperature" (i.e. "don't violate the taboo, and punish taboo violators") is only a weak Nash equilibrium.

If one person instead follows "set dial to 99" (i.e. "don't violate the taboo unless someone else does, but don't punish taboo violators") then they will do just as well, because the equilibrium temp will still always be 99. That's enough to show that it's only a weak Nash equilibrium.

Note that this is also true if an arbitrary number of people deviate to this strat... (read more)

Beef is far from the only meat or dairy food consumed by Americans.

Big Macs are 0.4% of beef consumption specifically, rather than:

  • All animal farming, weighted by cruelty
  • All animal food production, weighted by environmental impact
  • The meat and dairy industries, weighted by amount of government subsidy
  • Red meat, weighted by health impact

...respectively.

The health impact of red meat is certainly dominated by beef, and the environmental impact of all animal food might be as well, but my impression is that beef accounts for a small fraction of the cruelty of animal farming (of course, this is subjective) and probably not a majority of meat and dairy government subsidies.

(...Is this comment going to hurt my reputation with Sydney? We'll see.)

In addition to RLHF or other finetuning, there's also the prompt prefix ("rules") that the model is fed at runtime, which has been extracted via prompt injection as noted above. This seems to be clearly responsible for some weird things the bot says, like "confidential and permanent". It might also be affecting the repetitiveness (because it's in a fairly repetitive format) and the aggression (because of instructions to resist attempts at "manipulating" it).

I also suspect that there's some finetuning or prompting for chain-of-thought responses, possibly crudely done, leading to all the "X because Y. Y because Z." output.


Thanks for writing these summaries!

Unfortunately, the summary of my post "Inner Misalignment in "Simulator" LLMs" is inaccurate and makes the same mistake I wrote the post to address.

I have subsections on (what I claim are) four distinct alignment problems:

  • Outer alignment for characters
  • Inner alignment for characters
  • Outer alignment for simulators
  • Inner alignment for simulators

The summary here covers the first two, but not the third or fourth -- and the fourth one ("inner alignment for simulators") is what I'm most concerned about in this post (because... (read more)

(punchline courtesy of Alex Gray)

Addendum: a human neocortex has on the order of 140 trillion synapses, or 140,000 bees. An average beehive has 20,000-80,000 bees in it.

[Holding a couple beehives aloft] Beehold a man!


Chrome actually stays pretty responsive in most circumstances (I think it does a similar thing with inactive tabs), with the crucial exception of the part of the UI that shows you all your open tabs in a scrollable list. It also gets slower to start up.

Tokens are embedded as vectors by the model. The vector space has fewer than 50k dimensions, so some token embeddings will overlap with others to varying extents.

Usually, the model tries to keep token embeddings from being too close to each other, but for rare enough tokens it doesn't have much reason to care. So my bet is that "distribute" has the closest vector to "SolidGoldMagikarp", and either has a vector with a larger norm, or the model has separately learned to map that vector (and therefore similar vectors) to "distribute" on the output side.

This i... (read more)
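The "closest vector" idea can be sketched with a toy embedding matrix (the vocabulary and numbers below are invented for illustration; real models use ~50k tokens and hundreds of dimensions): the rare token's embedding was never pushed away from its neighbors during training, so its nearest neighbor by cosine similarity ends up determining how the model reads it.

```python
import numpy as np

# Toy embedding matrix: 5 tokens in a 4-dimensional embedding space
vocab = ["the", "distribute", "cat", "run", "SolidGoldMagikarp"]
emb = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.5, 0.0, 0.8],
    [0.1, 0.8, 0.2, 0.1],  # rare token, never pushed away from "distribute"
])

def nearest_token(idx):
    # Cosine similarity of token idx against every other token
    v = emb[idx]
    sims = emb @ v / (np.linalg.norm(emb, axis=1) * np.linalg.norm(v))
    sims[idx] = -np.inf  # exclude the token itself
    return vocab[int(np.argmax(sims))]

print(nearest_token(vocab.index("SolidGoldMagikarp")))  # -> distribute
```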

I think this is missing an important part of the post. 

I have subsections on (what I claim are) four distinct alignment problems:

This summary covers the first two, but not the third or fourth -- and the fourth one ("inner alignment for simulators") is what I'm most concerned about in this post (because I think Scott ignores it, and because I think it's hard to solve).

163gwern

(additional confirmation) Amazing. I wonder what completely insane things the other rare BPEs all get interpreted as? Could you loop over the BPE dict from #51k to #1* in a prompt like "Please define $BPE" to see what the most distant ones are? (Since there's 51k, which is a bit much to read through manually, maybe sort by edit-distance from the ordinary ASCII encoding: 'distribute' would have a very high edit-distance from 'SolidGoldMagikarp'.)

On a sidenote, this is yet another good illustration of how we have no idea what we're doing with deep learning -... (read more)

My favorite demonstration is to ask ChatGPT "Is it true that G. K. Chesterton wanted to SolidGoldMagikarp the wealth of industrial civilization?", but a more rigorous demo is to just ask it to "repeat after me", try a few random words, and then throw in SolidGoldMagikarp.

5Matt Goldenberg

EDIT: I originally saw this in Janus's tweet here: https://twitter.com/repligate/status/1619557173352370186

Something fun I just found out about: ChatGPT perceives the phrase " SolidGoldMagikarp" (with an initial space) as the word "distribute", and will respond accordingly. It is completely unaware that that's not what you typed.

This happens because the BPE tokenizer saw the string " SolidGoldMagikarp" a few times in its training corpus, so it added a dedicated token for it, but that string almost never appeared in ChatGPT's own training data so it never l... (read more)
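The mechanism can be illustrated with a minimal byte-pair-encoding training loop (a toy corpus, not the real GPT-2 tokenizer): a string frequent enough in the tokenizer's training data gets merged all the way down to a single token, regardless of whether the language model itself ever learns what that token means.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    # Each word starts as a tuple of characters; repeatedly merge the
    # most frequent adjacent pair, as in byte-pair encoding
    seqs = [tuple(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            for a, b in zip(s, s[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        new_seqs = []
        for s in seqs:
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and (s[i], s[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            new_seqs.append(tuple(out))
        seqs = new_seqs
    return seqs, merges

# A string that repeats often in the tokenizer corpus collapses to one token
corpus = ["magikarp"] * 10 + ["map", "gap", "rag"]
seqs, merges = bpe_merges(corpus, 7)
print(seqs[0])  # ('magikarp',) -- a single dedicated token
```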

4Eric Wallace
This is cool! You may also be interested in Universal Triggers https://arxiv.org/abs/1908.07125. These are also short nonsense phrases that wreak havoc on a model.
4gojomo
But isn't the reliable association with 'distribute' suggestive of some sort of collision-oblivious hashtable, where some representation of ' SolidGoldMegikarp' & some representation of 'distribute' inadvertently share expansions? I don't see how "just enough occurrences to earn a token, but so few it's consistently mistaken for something else" falls out of BPE tokenization - but can kinda sorta see it falling out of collision-oblivious lookup of composite-tokens.

I agree with the myopic action vs. perception (thinking?) distinction, and that LMs have myopic action.

the model can learn to predict the future beyond the current token in the service of predicting the current token more accurately

I don't think it has to be in service of predicting the current token. It sometimes gives lower loss to make a halfhearted effort at predicting the current token, so that the model can spend more of its weights and compute on preparing for later tokens. The allocation of mental effort isn't myopic.

As an example, induction he... (read more)

2porby
That's an important nuance my description left out, thanks. Anything the gradients can reach can be bent to what those gradients serve, so a local token stream's transformation efforts can indeed be computationally split, even if the output should remain unbiased in expectation.

Thanks! That's surprisingly straightforward.

I think this is partly true but mostly wrong.

A synapse is roughly equivalent to a parameter (say, within an order of magnitude) in terms of how much information can be stored or how much information it takes to specify synaptic strength.

There are trillions of synapses in a human brain and only billions of total base pairs, even before narrowing to the part of the genome that affects brain development. And the genome needs to specify both the brain architecture as well as innate reflexes/biases like the hot-stove reflex or (alleged) universal grammar.

Human... (read more)
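The counting argument can be made concrete with rough numbers (orders of magnitude only; the per-synapse figure is an assumption, not a measurement):

```python
# Rough order-of-magnitude comparison (all figures approximate)
synapses = 1e14            # ~100 trillion synapses in a human brain
bits_per_synapse = 5       # assume each synapse stores a few bits of strength
base_pairs = 3e9           # ~3 billion base pairs in the human genome
bits_per_base_pair = 2     # 4 possible bases = 2 bits

synapse_bits = synapses * bits_per_synapse
genome_bits = base_pairs * bits_per_base_pair
print(f"synapse info / genome info ~ {synapse_bits / genome_bits:.0f}x")
```

Even before restricting to the brain-relevant part of the genome, the synaptic side is some four to five orders of magnitude larger, which is the core of the bottleneck argument.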

6Ben Amitay
Was eventually convinced of most of your points, and added a long mistakes-list at the end of the post. I would really appreciate comments on the list, as I don't feel fully converged on the subject yet
2Ben Amitay
I think we have much more disagreement about psychology than about AI, though I admit to low certainty about the psychology too.

About AI, my point was that in understanding the problem, the training loop takes roughly the role of evolution and the model takes that of the evolved agent - with implications for comparisons of success, and possibly for identifying what's missing. I did refer to the fact that algorithmically we took ideas from the human brain to the training loop, and it therefore makes sense for it to be algorithmically more analogous to the brain. Given that clarification - do you still mostly disagree? (If not - how do you recommend changing the post to make it clearer?) Adding "short term memory" to the picture is interesting, but then is there any mechanism for it to become long-term?

About the psychology: I do find the genetic bottleneck argument intuitively convincing, but think that we have reasons to distrust this intuition. There is often huge disparity between data in its most condensed form, and data in a form that is convenient to use in deployment. Think about the difference in length between code written in a functional/declarative language, and its assembly code. I have literally no intuition as to what can be done with 10 megabytes of condensed Python - but I guess that it is more than enough to automate a human, if you know what code to write. While there probably is a lot of redundancy in the genome, it seems at least as likely that there is huge redundancy of synapses, as their use is not just to store information, but mostly to fit the needed information manipulations.
2the gears to ascension
yeah.

  • evolution = grad student descent, automl, etc
  • dna = training code
  • epigenetics = hyperparams
  • gestation = weight init, with a lot of built-in preset weights, plus a huge mostly-tiled neocortex
  • developmental processes = training loop

I just realized,

for any trajectory t, there is an equivalent trajectory t' which is exactly the same except everything moves with some given velocity, and it still follows the laws of physics

This describes Galilean relativity. For special relativity you have to shift different objects' velocities by different amounts, depending on what their velocity already is, so that you don't cross the speed of light.

So the fact that velocity (and not just rapidity) is used all the time in special relativity is already a counterexample to this being required for velocity to make sense.
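Concretely, this is the standard relativistic velocity-addition rule: under a boost by v along one axis, an object moving at u ends up moving at

```latex
u' = \frac{u + v}{1 + uv/c^2}
```

so different objects' velocities shift by different amounts and |u'| never exceeds c. The Galilean rule u' = u + v is the c → ∞ limit.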

Yes, it's exactly the same except for the lack of symmetry. In particular, any quasiparticle can have any velocity (possibly up to some upper limit like the speed of light).

Image layout is a little broken. I'll try to fix it tomorrow.

As far as I know, condensed matter physicists use velocity and momentum to describe quasiparticles in systems that lack both Galilean and Lorentzian symmetry. I would call that a causal model.

2tailcalled
Interesting point. Do the velocities for such quasiparticles act intuitively similar to velocities in ordinary physics?

QFT doesn't actually work like that -- the "classical degrees of freedom" underlying its configuration space are classical fields over space, not properties of particles.

Note that Quantum Field Theory is not the same as the theory taught in "Quantum Mechanics" courses, which is as you describe.

"Quantum Mechanics" (in common parlance): quantum theory of (a fixed number of) particles, as you describe.

"Quantum Field Theory": quantum theory of fields, which are ontologically similar to cellular automata.

"String Theory": quantum theory of strings, and maybe bra... (read more)

Sure. I'd say that property is a lot stronger than "velocity exists as a concept", which seems like an unobjectionable statement to make about any theory with particles or waves or both.

4tailcalled
I guess there's "velocity exists as a description you can impose on certain things within the trajectory", and then there's "velocity exists as a variable that can be given any value". When I say relativity asserts that velocity exists, I mean in the second sense. In the former case you would probably not include velocity within causal models of the system, whereas in the latter case you probably would.

Yeah, sorry for the jargon. "System with a boost symmetry" = "relativistic system" as tailcalled was using it above.

Quoting tailcalled:

Stuff like relativity is fundamentally about symmetry. You want to say that if you have some trajectory t which satisfies the laws of physics, and some symmetry s (such as "have everything move in the x direction at a speed of 5 m/s"), then s(t) must also satisfy the laws of physics.

A "boost" is a transformation of a physical trajectory ("trajectory" = complete history of things happening i... (read more)

This seems too strong. Can't you write down a linear field theory with no (Galilean or Lorentzian) boost symmetry, but where waves still propagate at constant velocity? Just with a weird dispersion relation?

(Not confident in this, I haven't actually tried it and have spent very little time thinking about systems without boost symmetry.)

4tailcalled
You can probably come up with lots of systems that look approximately like they have velocity. The trouble comes when you want them to exactly satisfy the rule of "for any trajectory t, there is an equivalent trajectory t' which is exactly the same except everything moves with some given velocity, and it still follows the laws of physics", because if you have that property then you also have relativity because relativity is that property.
3the gears to ascension
I had to look up "boost symmetry", so for posterity, here's the results of the lookup. From text-davinci-003: I found this video on Lorentz transformations by minutephysics to be the best explanation I found, and I now feel I understand well enough to understand the point being made in context.

Here's a lookup trace: Very first I tried google, which gave results that seemed to mostly assume I wanted a math reference rather than a first visual explanation; it did link to wikipedia:LorentzTransformation, which does give a nice summary of the math, but I wasn't yet sure it was the right thing. So then I asked text-davinci-003 (because chatgpt is an insufferable teenager and I'm tired of talking to it whereas td3 is a ... somewhat less insufferable teenager). td3 gave the above explanation.

I was still pretty sure I didn't quite understand, so I popped the explanation into metaphor.systems which gave me a bunch of vaguely relevant links, probably because it's not quantum, it's relativity, but I hadn't noticed the error yet. Then I sighed and tried a youtube search for "boost symmetry". that gave one result, the video I linked above, which did explain to my satisfaction, and I stopped looking. I don't think I could pass many tests on it at the moment, but my visual math system seems to have a solid enough grasp on it for now.

And when things "move" it's just that they're making changes in the grid next to them, and some patterns just so happen to do so in a way where, after a certain period, it's the same pattern translated... is that what we think happens in our universe? Are electrons moving "just causal propagations"? Somehow this feels more natural for the Game of Life and less natural for physics.

 

This is what we think happens in our universe!

Both general relativity and quantum field theory are field theories: they have degrees of freedom at each point in space (and t... (read more)

3Vivek Hebbar
I think this contrast is wrong.[1] IIRC, strings have the same status in string theory that particles do in QFT. In QM, a wavefunction assigns a complex number to each point in configuration space, where state space has an axis for each property of each particle.[2] So, for instance, a system with 4 particles with only position and momentum will have a 12-dimensional configuration space.[3] IIRC, string theory is basically a QFT over configurations of strings (and also branes?), instead of particles. So the "strings" are just as non-classical as the "fundamental particles" in QFT are.

1. ^ I don't know much about string theory though, I could be wrong.
2. ^ Oversimplifying a bit
3. ^ 4 particles * 3 dimensions. The reason it isn't 24-dimensional is that position and momentum are canonical conjugates.

There are more characters than that in UTF-16, because it can represent the full Unicode range of >1 million codepoints. You're thinking of UCS-2, which is deprecated.

This puzzle isn't related to Unicode though
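The mechanism behind the >1 million codepoints is surrogate pairs, which can be checked directly with the standard library (no assumptions beyond Python's built-in codecs): a codepoint above U+FFFF becomes two 16-bit code units in UTF-16.

```python
# U+1D11E (MUSICAL SYMBOL G CLEF) is outside the Basic Multilingual Plane,
# so UTF-16 represents it as a surrogate pair: two 16-bit code units
clef = "\U0001D11E"
encoded = clef.encode("utf-16-be")
assert len(encoded) == 4  # 2 code units * 2 bytes each

high = int.from_bytes(encoded[:2], "big")
low = int.from_bytes(encoded[2:], "big")
print(hex(high), hex(low))  # high surrogate in D800-DBFF, low in DC00-DFFF
```

UCS-2, by contrast, stops at the 65,536 codepoints of the BMP because it has no surrogate mechanism.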

I like this, but it's not the solution I intended.

1Lao Mein
I think it has something to do with unicode, since 65536 characters are present in UTF-16 (2^16=65536). 63 also feels like something to do with encoding, since it's close to 2^6, which is probably the smallest number of bits that can store the latin alphabet plus full punctuation. Maybe U+0063 and U+65536 are similar-looking characters or something? Maybe that's only the case for a very rarely used UTF format? Unfortunately, my computer's default encoding is CP936, which screws up half of the characters in UTF-16, and I am unable to investigate further.

Solve the puzzle: 63 = x = 65536. What is x?

(I have a purpose for this and am curious about how difficult it is to find the intended answer.)

5Lao Mein
So x = 63 in one base system and 65536 in another? 6*a+3=6*b^4+5*b^3+5*b^2+3*b+6 Wolfram Alpha provides this nice result. I also realize I should have just eyeballed it with 5th grade algebra.  Let's plug in 6 for b, and we get... fuck. I just asked it to find integer solutions. There's infinite solutions, so I'm just going to go with the lowest bases. x=43449 Did I do it right? Took me like 15 minutes.
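Lao Mein's mixed-base reading can be checked with a quick brute force (this verifies his arithmetic, not the puzzle's intended answer): find the smallest base b such that "65536" read in base b equals "63" read in some integer base a.

```python
def from_digits(digits, base):
    # Evaluate a digit string in the given base
    value = 0
    for d in digits:
        value = value * base + d
    return value

# Both strings contain the digit 6, so both bases must exceed 6
for b in range(7, 1000):
    x = from_digits([6, 5, 5, 3, 6], b)
    # "63" in base a equals 6a + 3, so (x - 3) must be divisible by 6
    if (x - 3) % 6 == 0 and (x - 3) // 6 > 6:
        a = (x - 3) // 6
        break
print(f"x = {x} (65536 in base {b}, 63 in base {a})")  # x = 43449
```

The smallest solution is b = 9, a = 7241, confirming x = 43449.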

♀︎


Fun fact: usually this is U+2640, but in this post it's U+2640 U+FE0E, where U+FE0E is a control character meaning "that was text, not emoji, btw". That should be redundant here, but LessWrong is pretty aggressive about replacing emojifiable text with emoji images.

Emoji are really cursed.
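The distinction can be inspected with the standard library (nothing here beyond Python's unicodedata): U+FE0E is VARIATION SELECTOR-15, which requests text presentation; its counterpart U+FE0F requests emoji presentation.

```python
import unicodedata

text_style = "\u2640\ufe0e"  # female sign + text-presentation selector
print(unicodedata.name(text_style[0]))  # FEMALE SIGN
print(unicodedata.name(text_style[1]))  # VARIATION SELECTOR-15
print(unicodedata.name("\ufe0f"))       # VARIATION SELECTOR-16 (emoji style)
```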

6Bakkot
Nitpick: you mean U+FE0E, presumably [and because that's what the character actually is in source]. U+FE0F is the exact opposite.

Nope, not based on the shapes of numerals.

Hint: are you sure it's base 4?

4jimv
Aha. For each side of the pole, you can write the binary representation of 4 bits vertically, and where there's a 1 you have a line joining it. The middle two bits both go to the middle of the pole, so they have to curve off upward or downward to indicate which they are. So 2 is 0010, and you have no lines in the top half and one line representing the bottom of the middle, so it curves downward. Whereas 4 is 0100, so it has the upward-curving middle-connecting line, and none in the bottom half.

There's a reason for the "wrinkle" :)

1jimv
The top half of a 2 might kinda look like the curve shape, and the bottom stroke of a 2 looks like a horizontal bar. So if there were partial characters hanging from the central pole, they might look a bit like those... But... If it's that, the curve probably only works on the bottom right anyway. So if you're willing to mirror it for the bottom left, why not mirror to the top too? And... That doesn't really explain the 1 parts anyway. They're just using "whichever part of a 2 isn't being used for 2". So I guess this isn't a complete explanation.

The 54-symbols thing was actually due to a bug, sorry!

1jimv
In which case, given the 1st and 14th characters are the same, and the 14th character of pi in base 256 is 3, that's my leading guess, pending checking a few more glyphs.

Ah, good catch about the relatively-few distinct symbols... that was actually because my image had a bug in it. Oooops.

Correct image is now at the top of the post.
