All of Adam Scherlis's Comments + Replies

Great questions :)

The approach here is much faster than the SGLD approach; it only takes tens or hundreds of forward passes to get a decent estimate. Maybe that's achievable in principle with SGLD, but we haven't managed it.

I like KFAC but I don't think estimating the Hessian spectrum better is a bottleneck; in our experiments on tiny models, the true Hessian didn't even always outperform the Adam moment estimates. I like the ideas here, though!

The big downside of our approach, compared to Timaeus's, is that it underestimates basin size (overestimates comp... (read more)

I am not sure I agree :)

It is unimportant in the limit (of infinite data), but away from that limit, it is only unimportant by a factor of 1/log(data), which seems small enough to be beatable in practice in some circumstances.

The spectra of things like Hessians tend to be singular, yes, but also sort of power-law. This makes the dimensionality a bit fuzzy and (imo) makes it possible for absolute volume scale of basins to compete with dimensionality.

Essentially: it's not clear that a 301-dimensional sphere really is "bigger" than a 300-dimensional sphere, if the 300-dimensional sphere has a much larger radius. (Obviously it's true in a strict sense, but hopefully you know what I'm gesturing at here.)

I think this is correct but we're working on paper rebuttals/revisions, I'll take a closer look very soon! I think we're working along parallel lines.

In particular, I have been thinking of "measure volumes at varying cutoffs" as being more or less equivalent to "measure LLC at varying ε".

We choose expected KL divergence as a cost function because it gives a behavioral loss, just like your behavioral LLC, yes.

I can give more precise statements once I look at my notes.

If you're wondering if this has a connection to Singular Learning Theory: Yup!

In SLT terms, we've developed a method for measuring the constant (with respect to n) term in the free energy, whereas LLC measures the log(n) term. Or if you like the thermodynamic analogy, LLC is the heat capacity and log(local volume) is the Gibbs entropy.
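For readers who want the SLT statement spelled out: the standard asymptotic expansion of the Bayesian free energy (Watanabe's form; the notation here is mine, not from the comment) is

```latex
F_n = n L_n(w_0) + \lambda \log n - (m - 1) \log\log n + O_p(1)
```

where λ is the learning coefficient and m its multiplicity. The LLC estimates the coefficient λ of the log n term; the local-volume measurement described above targets the O_p(1) constant term.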

We're now working on better methods for measuring these sorts of quantities, and on interpretability applications of them.

5Lucius Bushnaq
'Local volume' should also give a kind of upper bound on the LLC defined at finite noise though, right? Since as I understand it, what you're referring to as the volume of a behavioral region here is the same thing we define via the behavioural LLC at finite noise scale in this paper? And that's always going to be bigger or equal to the LLC taken at the same point at the same finite noise scale.
3Daniel Murfet
Indeed, very interesting!

It stops being in the interests of CATXOKLA to invite more states once they're already big enough to dominate national electoral politics.

1ZY
True; and they would only need to merge until they reach a "swing state" type of voting distribution.

The non-CATXOKLA swing states can merge with each other and a few red and blue states to form an even bigger bloc :)

I think there's a range of stable equilibria here, depending on the sequence of merges, with the largest bloc being a majority of any size. I think they all disenfranchise someone, though.

So you can't ever get to a national popular vote, without relying on things like the NPVIC which shortsightedly miss the obvious dominating strategy of a 51% attack against American democracy.

9kjz
I could imagine this turning into a flexible system of alliances similar to the conference system in NCAA college football and other sports (see here for a nice illustrated history of the many changes over time). Just as conferences and schools negotiate membership based on the changing quality of their sports programs, ability to generate revenue, and so on, states could form coalitions that could be renegotiated based on changing populations or voter preferences.

Thinking from that perspective, one potential Schelling point could be a "Northwest" coalition of WA/OR/ID/MT/WY/ND/SD/NE. This is quite well-balanced, as these states combined to give 21 EV to each candidate. And although the state populations are higher in WA/OR (12.0M) than the six red states (7.4M), the combined vote totals actually show a small lead for Trump (4.1M vs 3.9M, with more votes remaining to be counted in the blue states likely to close the gap). After this, maybe the remaining "Southwest" states (NV, UT, CO, AZ, NM) decide to join forces? Here a state by state analysis is less useful, especially since two of them still haven't been called, but the current combined vote count is a very narrow Trump lead of 4.07M to 4.05M.

The eastern half of the country seems harder to predict - clearly there are large potential blocs of blue states in the northeast and red states in the southeast, but it's harder to see clear geographical groupings that make sense. Unlikely any of this happens of course, but fun to think about.

I strongly agree with this post.

I'm not sure about this, though:

We are familiar with modular addition being performed in a circle from Nanda et al., so we were primed to spot this kind of thing — more evidence of street lighting.

It could be the streetlight effect, but it's not that surprising that we'd see this pattern repeatedly. This circular representation for modular addition is essentially the only nontrivial representation (in the group-theoretic sense) for modular addition, which is the only (simple) commutative group. It's likely to pop up in ... (read more)

I suspect a lot of this has to do with the low temperature.

The phrase "person who is not a member of the Church of Jesus Christ of Latter-day Saints" has a sort of rambling filibuster quality to it. Each word is pretty likely, in general, given the previous ones, even though the entire phrase is a bit specific. This is the bias inherent in low-temperature sampling, which tends to write itself into corners and produce long phrases full of obvious-next-words that are not necessarily themselves common phrases.

Going word by word, "person who is not a member...... (read more)
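The low-temperature effect described above can be sketched numerically (the logits below are made up for illustration): dividing logits by a small temperature before the softmax concentrates almost all probability on the locally-most-obvious next word, even when the resulting phrase is globally unusual.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution toward the argmax token
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
print(softmax_with_temperature(logits, 1.0))  # fairly spread out
print(softmax_with_temperature(logits, 0.2))  # nearly all mass on the top token
```

Chaining many such near-deterministic steps is exactly the "writes itself into corners" behavior: each word is the obvious continuation, but the whole phrase was never itself a common string.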

That's a reasonable argument but doesn't have much to do with the Charlie Sheen analogy.

The key difference, which I think breaks the analogy completely, is that (hypothetical therapist) Estevéz is still famous enough as a therapist for journalists to want to write about his therapy method. I think that's a big enough difference to make the analogy useless.

If Charlie Sheen had a side gig as an obscure local therapist, would journalists be justified in publicizing this fact for the sake of his patients? Maybe? It seems much less obvious than if the therapy was why they were interested!

In "no Lord hath the champion", the subject of "hath" is "champion". I think this matches the Latin, yes? "nor for a champion [is there] a lord"

1atykhyy
I suppose that is a possible reading, but in my opinion a most unnatural one. Compare: "No dog has the home". This can technically be parsed as "the home [a specific home I'm talking about] has no dog", but this would be a very weird word order in English.

Furthermore, if one is making a simple general statement, which is one reading of the OP verse, one is by that token not talking of a specific something, so one does not expect a definite article: "No dog has a home" or "No home has a dog". "The" would be warranted in a didactic or normative text, e.g. "The [good] home has no dog" or "The [good] home shall have no dog," and in fact reverting the OP verse to normal word order - "The rescuer has no rescuer / The champion has no lord" - enables one to read it as a didactic statement (which seems reasonable), but inverting the word order renders such sentences unintelligible.

In fact, the prior probability of the definite article in that position of the OP verse is so low that it only really registered on my mind after I read your comment. Word order in English is so strict that I was always perceiving it as "No rescuer has a rescuer / no lord has a champion", and I suspect I am not the only one. You yourself used the indefinite article in your rewording!

In that case, "journalists writing about the famous Estevéz method of therapy" would be analogous to journalists writing about Scott's "famous" psychiatric practice.

If a journalist is interested in Scott's psychiatric practice, and learns about his blog in the process of writing that article, I agree that they would probably be right to mention it in the article. But that has never happened because Scott is not famous as a psychiatrist.

-1Sherrinford
I said Estevéz because he is the less famous aspect of the person, not because I super-finetuned the analogy. Updating the trust in your therapist seems to be a legitimate interest even if he is not famous for his psychiatric theory or practice.

Suppose for example that an influential and controversial (e.g. White-supremacist) politician spent half his week being a psychiatrist and the other half doing politics, but somehow doing the former anonymously. I think patients might legitimately want to know that their psychiatrist is this person. This might even be true if the psychiatrist is only locally active, like the head of a KKK chapter. And journalists might then find it inappropriate to treat the two identities as completely separate.

I assume there are reasons for publishing the name and reasons against. It is not clear that being a psychiatrist is always an argument against. Part of the reason is, possibly, that patients often cannot directly judge the quality of therapy. Therapy is a credence good and therapists may influence you in ways that are independent of your depression or anorexia. So having more information about your psychiatrist may be helpful. At the same time, psychiatrists try to keep their private life out of the therapy, for very good reasons. It is not completely obvious to me where journalists should draw the line.

That might be relevant if anyone is ever interested in writing an article about Scott's psychiatric practice, or if his psychiatric practice was widely publicly known. It seems less analogous to the actual situation.

To put it differently: you raise a hypothetical situation where someone has two prominent identities as a public figure. Scott only has one. Is his psychiatrist identity supposed to be Sheen or Estevéz, here?

1Sherrinford
Estevéz. If I recall this correctly, Scott thought that potential or actual patients could be influenced in their therapy by knowing his public writings. (But I may misremember that.)
4romeostevensit
didn't know that, I heard it via Bostrom. Thanks.

Correct me if I'm wrong:

The equilibrium where everyone follows "set dial to equilibrium temperature" (i.e. "don't violate the taboo, and punish taboo violators") is only a weak Nash equilibrium.

If one person instead follows "set dial to 99" (i.e. "don't violate the taboo unless someone else does, but don't punish taboo violators") then they will do just as well, because the equilibrium temp will still always be 99. That's enough to show that it's only a weak Nash equilibrium.

Note that this is also true if an arbitrary number of people deviate to this strat... (read more)

Beef is far from the only meat or dairy food consumed by Americans.

Big Macs are 0.4% of beef consumption specifically, rather than:

  • All animal farming, weighted by cruelty
  • All animal food production, weighted by environmental impact
  • The meat and dairy industries, weighted by amount of government subsidy
  • Red meat, weighted by health impact

...respectively.

The health impact of red meat is certainly dominated by beef, and the environmental impact of all animal food might be as well, but my impression is that beef accounts for a small fraction of the cruelty of animal farming (of course, this is subjective) and probably not a majority of meat and dairy government subsidies.

(...Is this comment going to hurt my reputation with Sydney? We'll see.)

In addition to RLHF or other finetuning, there's also the prompt prefix ("rules") that the model is fed at runtime, which has been extracted via prompt injection as noted above. This seems to be clearly responsible for some weird things the bot says, like "confidential and permanent". It might also be affecting the repetitiveness (because it's in a fairly repetitive format) and the aggression (because of instructions to resist attempts at "manipulating" it).

I also suspect that there's some finetuning or prompting for chain-of-thought responses, possibly crudely done, leading to all the "X because Y. Y because Z." output.


Thanks for writing these summaries!

Unfortunately, the summary of my post "Inner Misalignment in "Simulator" LLMs" is inaccurate and makes the same mistake I wrote the post to address.

I have subsections on (what I claim are) four distinct alignment problems:

  • Outer alignment for characters
  • Inner alignment for characters
  • Outer alignment for simulators
  • Inner alignment for simulators

The summary here covers the first two, but not the third or fourth -- and the fourth one ("inner alignment for simulators") is what I'm most concerned about in this post (because... (read more)

(punchline courtesy of Alex Gray)

Addendum: a human neocortex has on the order of 140 trillion synapses, or 140,000 bees. An average beehive has 20,000-80,000 bees in it.

[Holding a couple beehives aloft] Beehold a man!


Chrome actually stays pretty responsive in most circumstances (I think it does a similar thing with inactive tabs), with the crucial exception of the part of the UI that shows you all your open tabs in a scrollable list. It also gets slower to start up.

Tokens are embedded as vectors by the model. The vector space has fewer than 50k dimensions, so some token embeddings will overlap with others to varying extents.

Usually, the model tries to keep token embeddings from being too close to each other, but for rare enough tokens it doesn't have much reason to care. So my bet is that "distribute" has the closest vector to "SolidGoldMagikarp", and either has a vector with a larger norm, or the model has separately learned to map that vector (and therefore similar vectors) to "distribute" on the output side.

This i... (read more)
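The "closest vector" idea can be sketched with a toy embedding matrix (the vocabulary and numbers below are invented for illustration; real models use ~50k tokens and hundreds of dimensions): the rare token's embedding was never pushed away from its neighbors during training, so its nearest neighbor by cosine similarity ends up determining how the model reads it.

```python
import numpy as np

# Toy embedding matrix: 5 tokens in a 4-dimensional embedding space
vocab = ["the", "distribute", "cat", "run", "SolidGoldMagikarp"]
emb = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.5, 0.0, 0.8],
    [0.1, 0.8, 0.2, 0.1],  # rare token, never pushed away from "distribute"
])

def nearest_token(idx):
    # Cosine similarity of token idx against every other token
    v = emb[idx]
    sims = emb @ v / (np.linalg.norm(emb, axis=1) * np.linalg.norm(v))
    sims[idx] = -np.inf  # exclude the token itself
    return vocab[int(np.argmax(sims))]

print(nearest_token(vocab.index("SolidGoldMagikarp")))  # -> distribute
```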

I think this is missing an important part of the post. 

I have subsections on (what I claim are) four distinct alignment problems:

This summary covers the first two, but not the third or fourth -- and the fourth one ("inner alignment for simulators") is what I'm most concerned about in this post (because I think Scott ignores it, and because I think it's hard to solve).

163gwern

(additional confirmation) Amazing. I wonder what completely insane things the other rare BPEs all get interpreted as? Could you loop over the BPE dict from #51k to #1* in a prompt like "Please define $BPE" to see what the most distant ones are? (Since there's 51k, which is a bit much to read through manually, maybe sort by edit-distance from the ordinary ASCII encoding: 'distribute' would have a very high edit-distance from 'SolidGoldMagikarp'.)

On a sidenote, this is yet another good illustration of how we have no idea what we're doing with deep learning -... (read more)

My favorite demonstration is to ask ChatGPT "Is it true that G. K. Chesterton wanted to SolidGoldMagikarp the wealth of industrial civilization?", but a more rigorous demo is to just ask it to "repeat after me", try a few random words, and then throw in SolidGoldMagikarp.

5Matt Goldenberg

EDIT: I originally saw this in Janus's tweet here: https://twitter.com/repligate/status/1619557173352370186

Something fun I just found out about: ChatGPT perceives the phrase " SolidGoldMagikarp" (with an initial space) as the word "distribute", and will respond accordingly. It is completely unaware that that's not what you typed.

This happens because the BPE tokenizer saw the string " SolidGoldMagikarp" a few times in its training corpus, so it added a dedicated token for it, but that string almost never appeared in ChatGPT's own training data so it never l... (read more)
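The mechanism can be illustrated with a minimal byte-pair-encoding training loop (a toy corpus, not the real GPT-2 tokenizer): a string frequent enough in the tokenizer's training data gets merged all the way down to a single token, regardless of whether the language model itself ever learns what that token means.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    # Each word starts as a tuple of characters; repeatedly merge the
    # most frequent adjacent pair, as in byte-pair encoding
    seqs = [tuple(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for s in seqs:
            for a, b in zip(s, s[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = best[0] + best[1]
        new_seqs = []
        for s in seqs:
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and (s[i], s[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            new_seqs.append(tuple(out))
        seqs = new_seqs
    return seqs, merges

# A string that repeats often in the tokenizer corpus collapses to one token
corpus = ["magikarp"] * 10 + ["map", "gap", "rag"]
seqs, merges = bpe_merges(corpus, 7)
print(seqs[0])  # ('magikarp',) -- a single dedicated token
```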

4Eric Wallace
This is cool! You may also be interested in Universal Triggers https://arxiv.org/abs/1908.07125. These are also short nonsense phrases that wreak havoc on a model.
4gojomo
But isn't the reliable association with 'distribute' suggestive of some sort of collision-oblivious hashtable, where some representation of ' SolidGoldMegikarp' & some representation of 'distribute' inadvertently share expansions? I don't see how "just enough occurrences to earn a token, but so few it's consistently mistaken for something else" falls out of BPE tokenization - but can kinda sorta see it falling out of collision-oblivious lookup of composite-tokens.

I agree with the myopic action vs. perception (thinking?) distinction, and that LMs have myopic action.

the model can learn to predict the future beyond the current token in the service of predicting the current token more accurately

I don't think it has to be in service of predicting the current token. It sometimes gives lower loss to make a halfhearted effort at predicting the current token, so that the model can spend more of its weights and compute on preparing for later tokens. The allocation of mental effort isn't myopic.

As an example, induction he... (read more)

2porby
That's an important nuance my description left out, thanks. Anything the gradients can reach can be bent to what those gradients serve, so a local token stream's transformation efforts can indeed be computationally split, even if the output should remain unbiased in expectation.

Thanks! That's surprisingly straightforward.

I think this is partly true but mostly wrong.

A synapse is roughly equivalent to a parameter (say, within an order of magnitude) in terms of how much information can be stored or how much information it takes to specify synaptic strength.

There are trillions of synapses in a human brain and only billions of total base pairs, even before narrowing to the part of the genome that affects brain development. And the genome needs to specify both the brain architecture as well as innate reflexes/biases like the hot-stove reflex or (alleged) universal grammar.

Human... (read more)
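The counting argument can be made concrete with rough numbers (orders of magnitude only; the per-synapse figure is an assumption, not a measurement):

```python
# Rough order-of-magnitude comparison (all figures approximate)
synapses = 1e14            # ~100 trillion synapses in a human brain
bits_per_synapse = 5       # assume each synapse stores a few bits of strength
base_pairs = 3e9           # ~3 billion base pairs in the human genome
bits_per_base_pair = 2     # 4 possible bases = 2 bits

synapse_bits = synapses * bits_per_synapse
genome_bits = base_pairs * bits_per_base_pair
print(f"synapse info / genome info ~ {synapse_bits / genome_bits:.0f}x")
```

Even before restricting to the brain-relevant part of the genome, the synaptic side is some four to five orders of magnitude larger, which is the core of the bottleneck argument.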

6Ben Amitay
Was eventually convinced of most of your points, and added a long mistakes-list at the end of the post. I would really appreciate comments on the list, as I don't feel fully converged on the subject yet
2Ben Amitay
I think we have much more disagreement about psychology than about AI, though I admit to low certainty about the psychology too.

About AI, my point was that in understanding the problem, the training loop takes roughly the role of evolution and the model takes that of the evolved agent - with implications for comparisons of success, and possibly for identifying what's missing. I did refer to the fact that algorithmically we took ideas from the human brain to the training loop, and it therefore makes sense for it to be algorithmically more analogous to the brain. Given that clarification - do you still mostly disagree? (If not - how do you recommend changing the post to make it clearer?) Adding "short term memory" to the picture is interesting, but then is there any mechanism for it to become long-term?

About the psychology: I do find the genetic bottleneck argument intuitively convincing, but think that we have reasons to distrust this intuition. There is often huge disparity between data in its most condensed form, and data in a form that is convenient to use in deployment. Think about the difference in length between code written in a functional/declarative language, and its assembly code. I have literally no intuition as to what can be done with 10 megabytes of condensed Python - but I guess that it is more than enough to automate a human, if you know what code to write. While there probably is a lot of redundancy in the genome, it seems at least as likely that there is huge redundancy of synapses, as their use is not just to store information, but mostly to fit the needed information manipulations.
2the gears to ascension
yeah.

  • evolution = grad student descent, automl, etc
  • dna = training code
  • epigenetics = hyperparams
  • gestation = weight init, with a lot of built-in preset weights, plus a huge mostly-tiled neocortex
  • developmental processes = training loop

I just realized,

for any trajectory t, there is an equivalent trajectory t' which is exactly the same except everything moves with some given velocity, and it still follows the laws of physics

This describes Galilean relativity. For special relativity you have to shift different objects' velocities by different amounts, depending on what their velocity already is, so that you don't cross the speed of light.

So the fact that velocity (and not just rapidity) is used all the time in special relativity is already a counterexample to this being required for velocity to make sense.
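Concretely, this is the standard relativistic velocity-addition rule: under a boost by v along one axis, an object moving at u ends up moving at

```latex
u' = \frac{u + v}{1 + uv/c^2}
```

so different objects' velocities shift by different amounts and |u'| never exceeds c. The Galilean rule u' = u + v is the c → ∞ limit.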

Yes, it's exactly the same except for the lack of symmetry. In particular, any quasiparticle can have any velocity (possibly up to some upper limit like the speed of light).

Image layout is a little broken. I'll try to fix it tomorrow.

As far as I know, condensed matter physicists use velocity and momentum to describe quasiparticles in systems that lack both Galilean and Lorentzian symmetry. I would call that a causal model.

2tailcalled
Interesting point. Do the velocities for such quasiparticles act intuitively similar to velocities in ordinary physics?

QFT doesn't actually work like that -- the "classical degrees of freedom" underlying its configuration space are classical fields over space, not properties of particles.

Note that Quantum Field Theory is not the same as the theory taught in "Quantum Mechanics" courses, which is as you describe.

"Quantum Mechanics" (in common parlance): quantum theory of (a fixed number of) particles, as you describe.

"Quantum Field Theory": quantum theory of fields, which are ontologically similar to cellular automata.

"String Theory": quantum theory of strings, and maybe bra... (read more)

Sure. I'd say that property is a lot stronger than "velocity exists as a concept", which seems like an unobjectionable statement to make about any theory with particles or waves or both.

4tailcalled
I guess there's "velocity exists as a description you can impose on certain things within the trajectory", and then there's "velocity exists as a variable that can be given any value". When I say relativity asserts that velocity exists, I mean in the second sense. In the former case you would probably not include velocity within causal models of the system, whereas in the latter case you probably would.

Yeah, sorry for the jargon. "System with a boost symmetry" = "relativistic system" as tailcalled was using it above.

Quoting tailcalled:

Stuff like relativity is fundamentally about symmetry. You want to say that if you have some trajectory t which satisfies the laws of physics, and some symmetry s (such as "have everything move in the x direction at a speed of 5 m/s"), then s(t) must also satisfy the laws of physics.

A "boost" is a transformation of a physical trajectory ("trajectory" = complete history of things happening i... (read more)

This seems too strong. Can't you write down a linear field theory with no (Galilean or Lorentzian) boost symmetry, but where waves still propagate at constant velocity? Just with a weird dispersion relation?

(Not confident in this, I haven't actually tried it and have spent very little time thinking about systems without boost symmetry.)

4tailcalled
You can probably come up with lots of systems that look approximately like they have velocity. The trouble comes when you want them to exactly satisfy the rule of "for any trajectory t, there is an equivalent trajectory t' which is exactly the same except everything moves with some given velocity, and it still follows the laws of physics", because if you have that property then you also have relativity because relativity is that property.
3the gears to ascension
I had to look up "boost symmetry", so for posterity, here's the results of the lookup. From text-davinci-003: I found this video on Lorentz transformations by minutephysics to be the best explanation I found, and I now feel I understand well enough to understand the point being made in context.

Here's a lookup trace: Very first I tried google, which gave results that seemed to mostly assume I wanted a math reference rather than a first visual explanation; it did link to wikipedia:LorentzTransformation, which does give a nice summary of the math, but I wasn't yet sure it was the right thing. So then I asked text-davinci-003 (because chatgpt is an insufferable teenager and I'm tired of talking to it whereas td3 is a ... somewhat less insufferable teenager). td3 gave the above explanation.

I was still pretty sure I didn't quite understand, so I popped the explanation into metaphor.systems which gave me a bunch of vaguely relevant links, probably because it's not quantum, it's relativity, but I hadn't noticed the error yet. Then I sighed and tried a youtube search for "boost symmetry". that gave one result, the video I linked above, which did explain to my satisfaction, and I stopped looking. I don't think I could pass many tests on it at the moment, but my visual math system seems to have a solid enough grasp on it for now.

And when things "move" it's just that they're making changes in the grid next to them, and some patterns just so happen to do so in a way where, after a certain period, it's the same pattern translated... is that what we think happens in our universe? Are electrons moving "just causal propagations"? Somehow this feels more natural for the Game of Life and less natural for physics.

 

This is what we think happens in our universe!

Both general relativity and quantum field theory are field theories: they have degrees of freedom at each point in space (and t... (read more)

3Vivek Hebbar
I think this contrast is wrong.[1] IIRC, strings have the same status in string theory that particles do in QFT. In QM, a wavefunction assigns a complex number to each point in configuration space, where state space has an axis for each property of each particle.[2] So, for instance, a system with 4 particles with only position and momentum will have a 12-dimensional configuration space.[3] IIRC, string theory is basically a QFT over configurations of strings (and also branes?), instead of particles. So the "strings" are just as non-classical as the "fundamental particles" in QFT are.

1. ^ I don't know much about string theory though, I could be wrong.
2. ^ Oversimplifying a bit
3. ^ 4 particles * 3 dimensions. The reason it isn't 24-dimensional is that position and momentum are canonical conjugates.

There are more characters than that in UTF-16, because it can represent the full Unicode range of >1 million codepoints. You're thinking of UCS-2, which is deprecated.

This puzzle isn't related to Unicode though
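The mechanism behind the >1 million codepoints is surrogate pairs, which can be checked directly with the standard library (no assumptions beyond Python's built-in codecs): a codepoint above U+FFFF becomes two 16-bit code units in UTF-16.

```python
# U+1D11E (MUSICAL SYMBOL G CLEF) is outside the Basic Multilingual Plane,
# so UTF-16 represents it as a surrogate pair: two 16-bit code units
clef = "\U0001D11E"
encoded = clef.encode("utf-16-be")
assert len(encoded) == 4  # 2 code units * 2 bytes each

high = int.from_bytes(encoded[:2], "big")
low = int.from_bytes(encoded[2:], "big")
print(hex(high), hex(low))  # high surrogate in D800-DBFF, low in DC00-DFFF
```

UCS-2, by contrast, stops at the 65,536 codepoints of the BMP because it has no surrogate mechanism.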

I like this, but it's not the solution I intended.

1Lao Mein
I think it has something to do with unicode, since 65536 characters are present in UTF-16 (2^16=65536). 63 also feels like something to do with encoding, since it's close to 2^6, which is probably the smallest number of bits that can store the latin alphabet plus full punctuation. Maybe U+0063 and U+65536 are similar-looking characters or something? Maybe that's only the case for a very rarely used UTF format? Unfortunately, my computer's default encoding is CP936, which screws up half of the characters in UTF-16, and I am unable to investigate further.

Solve the puzzle: 63 = x = 65536. What is x?

(I have a purpose for this and am curious about how difficult it is to find the intended answer.)

5Lao Mein
So x = 63 in one base system and 65536 in another? 6*a+3=6*b^4+5*b^3+5*b^2+3*b+6 Wolfram Alpha provides this nice result. I also realize I should have just eyeballed it with 5th grade algebra.  Let's plug in 6 for b, and we get... fuck. I just asked it to find integer solutions. There's infinite solutions, so I'm just going to go with the lowest bases. x=43449 Did I do it right? Took me like 15 minutes.
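Lao Mein's mixed-base reading can be checked with a quick brute force (this verifies his arithmetic, not the puzzle's intended answer): find the smallest base b such that "65536" read in base b equals "63" read in some integer base a.

```python
def from_digits(digits, base):
    # Evaluate a digit string in the given base
    value = 0
    for d in digits:
        value = value * base + d
    return value

# Both strings contain the digit 6, so both bases must exceed 6
for b in range(7, 1000):
    x = from_digits([6, 5, 5, 3, 6], b)
    # "63" in base a equals 6a + 3, so (x - 3) must be divisible by 6
    if (x - 3) % 6 == 0 and (x - 3) // 6 > 6:
        a = (x - 3) // 6
        break
print(f"x = {x} (65536 in base {b}, 63 in base {a})")  # x = 43449
```

The smallest solution is b = 9, a = 7241, confirming x = 43449.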

♀︎


Fun fact: usually this is U+2640, but in this post it's U+2640 U+FE0E, where U+FE0E is a control character meaning "that was text, not emoji, btw". That should be redundant here, but LessWrong is pretty aggressive about replacing emojifiable text with emoji images.

Emoji are really cursed.
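The distinction can be inspected with the standard library (nothing here beyond Python's unicodedata): U+FE0E is VARIATION SELECTOR-15, which requests text presentation; its counterpart U+FE0F requests emoji presentation.

```python
import unicodedata

text_style = "\u2640\ufe0e"  # female sign + text-presentation selector
print(unicodedata.name(text_style[0]))  # FEMALE SIGN
print(unicodedata.name(text_style[1]))  # VARIATION SELECTOR-15
print(unicodedata.name("\ufe0f"))       # VARIATION SELECTOR-16 (emoji style)
```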

6Bakkot
Nitpick: you mean U+FE0E, presumably [and because that's what the character actually is in source]. U+FE0F is the exact opposite.

Nope, not based on the shapes of numerals.

Hint: are you sure it's base 4?

4jimv
Aha. For each side of the pole, you can write the binary representation of 4 bits vertically, and where there's a 1 you have a line joining it. The middle two bits both go to the middle of the pole, so they have to curve off upward or downward to indicate which they are. So 2 is 0010, and you have no lines in the top half and one line representing the bottom of the middle, so it curves downward. Whereas 4 is 0100, so it has the upward-curving middle-connecting line, and none in the bottom half.

There's a reason for the "wrinkle" :)

1jimv
The top half of a 2 might kinda look like the curve shape, and the bottom stroke of a 2 looks like a horizontal bar. So if there were partial characters hanging from the central pole, they might look a bit like those... But... If it's that, the curve probably only works on the bottom right anyway. So if you're willing to mirror it for the bottom left, why not mirror to the top too? And... That doesn't really explain the 1 parts anyway. They're just using "whichever part of a 2 isn't being used for 2". So I guess this isn't a complete explanation.

The 54-symbols thing was actually due to a bug, sorry!

1jimv
In which case, given the 1st and 14th characters are the same, and the 14th character of pi in base 256 is 3, that's my leading guess, pending checking a few more glyphs.

Ah, good catch about the relatively-few distinct symbols... that was actually because my image had a bug in it. Oooops.

Correct image is now at the top of the post.
