I am not sure I agree :)
It is unimportant in the limit of infinite data, but away from that limit it is suppressed only by a factor of 1/log(data), which seems small enough to be beatable in practice in some circumstances.
The spectra of things like Hessians tend to be singular, yes, but also sort of power-law. This makes the dimensionality a bit fuzzy and (imo) makes it possible for absolute volume scale of basins to compete with dimensionality.
Essentially: it's not clear that a 301-dimensional sphere really is "bigger" than a 300-dimensional sphere, if the 300-dimensional sphere has a much larger radius. (Obviously it's true in a strict sense, but hopefully you know what I'm gesturing at here.)
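To put toy numbers on that intuition (my own illustration, just using the standard n-ball volume formula V_n(r) = π^(n/2) r^n / Γ(n/2 + 1), nothing specific to Hessians):

```python
from math import pi, log, lgamma

def log_ball_volume(n: int, r: float) -> float:
    """Natural log of the volume of an n-dimensional ball of radius r."""
    return (n / 2) * log(pi) - lgamma(n / 2 + 1) + n * log(r)

# A 300-dimensional ball of radius 2 dwarfs a 301-dimensional ball of radius 1:
print(log_ball_volume(300, 2.0))  # roughly -225
print(log_ball_volume(301, 1.0))  # roughly -435
```

So a modest radius advantage buys hundreds of nats of log-volume, easily swamping one extra dimension.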
I think this is correct, but we're working on paper rebuttals/revisions; I'll take a closer look very soon! I think we're working along parallel lines.
In particular, I have been thinking of "measure volumes at varying cutoffs" as being more or less equivalent to "measure LLC at varying ε".
We choose expected KL divergence as a cost function because it gives a behavioral loss, just like your behavioral LLC, yes.
I can give more precise statements once I look at my notes.
If you're wondering if this has a connection to Singular Learning Theory: Yup!
In SLT terms, we've developed a method for measuring the constant (with respect to n) term in the free energy, whereas LLC measures the log(n) term. Or if you like the thermodynamic analogy, LLC is the heat capacity and log(local volume) is the Gibbs entropy.
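For reference, the standard asymptotic expansion of the local free energy (schematically, around a point w₀ — this is just the usual SLT formula, not anything new):

```latex
F_n \;=\; n L_n(w_0) \;+\; \lambda \log n \;-\; (m-1)\log\log n \;+\; O_p(1)
```

where λ is the learning coefficient (the log n coefficient that LLC estimates) and the O_p(1) remainder is the constant-order term that a local-volume measurement targets.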
We're now working on better methods for measuring these sorts of quantities, and on interpretability applications of them.
It stops being in the interests of CATXOKLA to invite more states once they're already big enough to dominate national electoral politics.
The non-CATXOKLA swing states can merge with each other and a few red and blue states to form an even bigger bloc :)
I think there's a range of stable equilibria here, depending on the sequence of merges, in which the largest bloc can be anything from a bare majority on up. I think they all disenfranchise someone, though.
So you can't ever get to a national popular vote without relying on things like the NPVIC, which shortsightedly miss the obvious dominating strategy of a 51% attack against American democracy.
I strongly agree with this post.
I'm not sure about this, though:
We are familiar with modular addition being performed in a circle from Nanda et al., so we were primed to spot this kind of thing — more evidence of street lighting.
It could be the streetlight effect, but it's not that surprising that we'd see this pattern repeatedly. This circular representation for modular addition is essentially the only nontrivial representation (in the group-theoretic sense) for modular addition, which is the only (simple) commutative group. It's likely to pop up in ...
I suspect a lot of this has to do with the low temperature.
The phrase "person who is not a member of the Church of Jesus Christ of Latter-day Saints" has a sort of rambling filibuster quality to it. Each word is pretty likely, in general, given the previous ones, even though the entire phrase is a bit specific. This is the bias inherent in low-temperature sampling, which tends to write itself into corners and produce long phrases full of obvious-next-words that are not necessarily themselves common phrases.
Going word by word, "person who is not a member......
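That low-temperature bias is easy to see in a toy softmax (illustrative logits only, not from any real model):

```python
import math

def sample_probs(logits, temperature):
    """Softmax over logits scaled by 1/temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

logits = [2.0, 1.0, 0.5]
print(sample_probs(logits, 1.0))  # fairly spread out
print(sample_probs(logits, 0.2))  # nearly all mass on the single likeliest token
```

At low temperature the sampler takes the locally-obvious next word almost every time, which is exactly how you end up inside a long, individually-plausible, globally-weird phrase.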
That's a reasonable argument but doesn't have much to do with the Charlie Sheen analogy.
The key difference, which I think breaks the analogy completely, is that the hypothetical therapist Estévez is still famous enough as a therapist for journalists to want to write about his therapy method. I think that's a big enough difference to make the analogy useless.
If Charlie Sheen had a side gig as an obscure local therapist, would journalists be justified in publicizing this fact for the sake of his patients? Maybe? It seems much less obvious than if the therapy was why they were interested!
In "no Lord hath the champion", the subject of "hath" is "champion". I think this matches the Latin, yes? "nor for a champion [is there] a lord"
In that case, "journalists writing about the famous Estévez method of therapy" would be analogous to journalists writing about Scott's "famous" psychiatric practice.
If a journalist is interested in Scott's psychiatric practice, and learns about his blog in the process of writing that article, I agree that they would probably be right to mention it in the article. But that has never happened because Scott is not famous as a psychiatrist.
That might be relevant if anyone is ever interested in writing an article about Scott's psychiatric practice, or if his psychiatric practice was widely publicly known. It seems less analogous to the actual situation.
To put it differently: you raise a hypothetical situation where someone has two prominent identities as a public figure. Scott only has one. Is his psychiatrist identity supposed to be Sheen or Estevéz, here?
Nick Bostrom? You mean Thoreau?
Correct.
Correct me if I'm wrong:
The equilibrium where everyone follows "set dial to equilibrium temperature" (i.e. "don't violate the taboo, and punish taboo violators") is only a weak Nash equilibrium.
If one person instead follows "set dial to 99" (i.e. "don't violate the taboo unless someone else does, but don't punish taboo violators") then they will do just as well, because the equilibrium temp will still always be 99. That's enough to show that it's only a weak Nash equilibrium.
Note that this is also true if an arbitrary number of people deviate to this strat...
Beef is far from the only meat or dairy food consumed by Americans.
Big Macs are 0.4% of beef consumption specifically, rather than:
...respectively.
The health impact of red meat is certainly dominated by beef, and the environmental impact of all animal food might be as well, but my impression is that beef accounts for a small fraction of the cruelty of animal farming (of course, this is subjective) and probably not a majority of meat and dairy government subsidies.
(...Is this comment going to hurt my reputation with Sydney? We'll see.)
In addition to RLHF or other finetuning, there's also the prompt prefix ("rules") that the model is fed at runtime, which has been extracted via prompt injection as noted above. This seems to be clearly responsible for some weird things the bot says, like "confidential and permanent". It might also be affecting the repetitiveness (because it's in a fairly repetitive format) and the aggression (because of instructions to resist attempts at "manipulating" it).
I also suspect that there's some finetuning or prompting for chain-of-thought responses, possibly crudely done, leading to all the "X because Y. Y because Z." output.
Thanks for writing these summaries!
Unfortunately, the summary of my post "Inner Misalignment in "Simulator" LLMs" is inaccurate and makes the same mistake I wrote the post to address.
I have subsections on (what I claim are) four distinct alignment problems:
The summary here covers the first two, but not the third or fourth -- and the fourth one ("inner alignment for simulators") is what I'm most concerned about in this post (because...
(punchline courtesy of Alex Gray)
Addendum: a human neocortex has on the order of 140 trillion synapses, or 140,000 bees. An average beehive has 20,000-80,000 bees in it.
[Holding a couple beehives aloft] Beehold a man!
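(The bee conversion assumes a honeybee brain has on the order of a billion synapses — an order-of-magnitude figure, not a precise one:)

```python
human_synapses = 140e12   # ~1.4e14 synapses in a human neocortex
bee_synapses = 1e9        # order-of-magnitude figure for one honeybee brain
print(human_synapses / bee_synapses)  # 140000.0 bees

hives = 140000 / 50000    # at ~50k bees/hive, the midpoint of 20k-80k
print(hives)              # 2.8 hives
```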
Great work! I always wondered about that cluster of weird rare tokens: https://www.lesswrong.com/posts/BMghmAxYxeSdAteDc/an-exploration-of-gpt-2-s-embedding-weights
Chrome actually stays pretty responsive in most circumstances (I think it does a similar thing with inactive tabs), with the crucial exception of the part of the UI that shows you all your open tabs in a scrollable list. It also gets slower to start up.
Tokens are embedded as vectors by the model. The vector space has fewer than 50k dimensions, so some token embeddings will overlap with others to varying extents.
Usually, the model tries to keep token embeddings from being too close to each other, but for rare enough tokens it doesn't have much reason to care. So my bet is that "distribute" has the closest vector to "SolidGoldMagikarp", and either has a vector with a larger norm, or the model has separately learned to map that vector (and therefore similar vectors) to "distribute" on the output side.
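A toy version of that nearest-neighbor story (made-up 3-d vectors standing in for the real embedding matrix, which is 768-d for GPT-2):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical embedding table; the rare token was never pushed away from
# its neighbors during training, so it sits close to a common token.
embeddings = {
    " distribute":        [0.9, 0.1, 0.0],
    " SolidGoldMagikarp": [0.8, 0.2, 0.1],
    " the":               [0.0, 1.0, 0.0],
}

query = embeddings[" SolidGoldMagikarp"]
nearest = max((t for t in embeddings if t != " SolidGoldMagikarp"),
              key=lambda t: cosine(embeddings[t], query))
print(nearest)  # " distribute"
```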
This i...
I think this is missing an important part of the post.
I have subsections on (what I claim are) four distinct alignment problems:
This summary covers the first two, but not the third or fourth -- and the fourth one ("inner alignment for simulators") is what I'm most concerned about in this post (because I think Scott ignores it, and because I think it's hard to solve).
(additional confirmation) Amazing. I wonder what completely insane things the other rare BPEs all get interpreted as? Could you loop over the BPE dict from #51k to #1* in a prompt like "Please define $BPE" to see what the most distant ones are? (Since there's 51k, which is a bit much to read through manually, maybe sort by edit-distance from the ordinary ASCII encoding: 'distribute' would have a very high edit-distance from 'SolidGoldMagikarp'.)
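A sketch of how that loop might look — `model_definition` is a placeholder for actually prompting the model, and `difflib`'s ratio stands in for true Levenshtein distance (which isn't in the stdlib):

```python
from difflib import SequenceMatcher

def edit_dissimilarity(a: str, b: str) -> float:
    """1.0 = nothing in common, 0.0 = identical strings."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def model_definition(token: str) -> str:
    """Placeholder: really this would prompt the model with 'Please define
    {token}' and return the word it answers with."""
    canned = {"SolidGoldMagikarp": "distribute"}  # known example from above
    return canned.get(token, token)

# Really this would iterate the BPE vocab from rarest to most common:
rare_tokens = ["SolidGoldMagikarp", "hello"]
scored = sorted(rare_tokens,
                key=lambda t: edit_dissimilarity(t, model_definition(t)),
                reverse=True)
print(scored[0])  # the token whose perceived meaning is most distant
```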
On a sidenote, this is yet another good illustration of how we have no idea what we're doing with deep learning -...
My favorite demonstration is to ask ChatGPT "Is it true that G. K. Chesterton wanted to SolidGoldMagikarp the wealth of industrial civilization?", but a more rigorous demo is to just ask it to "repeat after me", try a few random words, and then throw in SolidGoldMagikarp.
EDIT: I originally saw this in Janus's tweet here: https://twitter.com/repligate/status/1619557173352370186
Something fun I just found out about: ChatGPT perceives the phrase " SolidGoldMagikarp" (with an initial space) as the word "distribute", and will respond accordingly. It is completely unaware that that's not what you typed.
This happens because the BPE tokenizer saw the string " SolidGoldMagikarp" a few times in its training corpus, so it added a dedicated token for it, but that string almost never appeared in ChatGPT's own training data so it never l...
I agree with the myopic action vs. perception (thinking?) distinction, and that LMs have myopic action.
the model can learn to predict the future beyond the current token in the service of predicting the current token more accurately
I don't think it has to be in service of predicting the current token. It can sometimes achieve lower loss by making a halfhearted effort at predicting the current token, so that the model can spend more of its weights and compute on preparing for later tokens. The allocation of mental effort isn't myopic.
As an example, induction he...
Thanks! That's surprisingly straightforward.
I think this is partly true but mostly wrong.
A synapse is roughly equivalent to a parameter (say, within an order of magnitude) in terms of how much information can be stored or how much information it takes to specify synaptic strength.
There are trillions of synapses in a human brain and only billions of total base pairs, even before narrowing to the part of the genome that affects brain development. And the genome needs to specify both the brain architecture as well as innate reflexes/biases like the hot-stove reflex or (alleged) universal grammar.
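The mismatch is several orders of magnitude even under generous assumptions (back-of-the-envelope figures, not precise counts):

```python
synapses = 1e14      # order-of-magnitude synapse count for a human brain
base_pairs = 3e9     # haploid human genome
bits_per_bp = 2      # 4 letters = 2 bits, ignoring compressibility
genome_bits = base_pairs * bits_per_bp

# Even at just 1 bit per synapse, the genome falls short by ~4 orders
# of magnitude -- and it also has to specify everything else:
print(synapses / genome_bits)
```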
Human...
Fixed!
I just realized,
for any trajectory t, there is an equivalent trajectory t' which is exactly the same except everything moves with some given velocity, and it still follows the laws of physics
This describes Galilean relativity. For special relativity you have to shift different objects' velocities by different amounts, depending on what their velocity already is, so that you don't cross the speed of light.
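Concretely, a boost by velocity v in special relativity acts on an object's velocity u as

```latex
u \;\mapsto\; \frac{u + v}{1 + uv/c^2},
\qquad
\operatorname{artanh}(u/c) \;\mapsto\; \operatorname{artanh}(u/c) + \operatorname{artanh}(v/c)
```

so different velocities get shifted by different amounts, while rapidities (the artanh expression) simply add.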
So the fact that velocity (and not just rapidity) is used all the time in special relativity is already a counterexample to this being required for velocity to make sense.
Yes, it's exactly the same except for the lack of symmetry. In particular, any quasiparticle can have any velocity (possibly up to some upper limit like the speed of light).
Image layout is a little broken. I'll try to fix it tomorrow.
As far as I know, condensed matter physicists use velocity and momentum to describe quasiparticles in systems that lack both Galilean and Lorentzian symmetry. I would call that a causal model.
QFT doesn't actually work like that -- the "classical degrees of freedom" underlying its configuration space are classical fields over space, not properties of particles.
Note that Quantum Field Theory is not the same as the theory taught in "Quantum Mechanics" courses, which is as you describe.
"Quantum Mechanics" (in common parlance): quantum theory of (a fixed number of) particles, as you describe.
"Quantum Field Theory": quantum theory of fields, which are ontologically similar to cellular automata.
"String Theory": quantum theory of strings, and maybe bra...
Sure. I'd say that property is a lot stronger than "velocity exists as a concept", which seems like an unobjectionable statement to make about any theory with particles or waves or both.
Yeah, sorry for the jargon. "System with a boost symmetry" = "relativistic system" as tailcalled was using it above.
Quoting tailcalled:
Stuff like relativity is fundamentally about symmetry. You want to say that if you have some trajectory which satisfies the laws of physics, and some symmetry (such as "have everything move in some direction at a speed of 5 m/s"), then the transformed trajectory must also satisfy the laws of physics.
A "boost" is a transformation of a physical trajectory ("trajectory" = complete history of things happening i...
This seems too strong. Can't you write down a linear field theory with no (Galilean or Lorentzian) boost symmetry, but where waves still propagate at constant velocity? Just with a weird dispersion relation?
(Not confident in this, I haven't actually tried it and have spent very little time thinking about systems without boost symmetry.)
And when things "move" it's just that they're making changes in the grid next to them, and some patterns just so happen to do so in a way where, after a certain period, it's the same pattern translated... is that what we think happens in our universe? Are electrons moving "just causal propagations"? Somehow this feels more natural for the Game of Life and less natural for physics.
This is what we think happens in our universe!
Both general relativity and quantum field theory are field theories: they have degrees of freedom at each point in space (and t...
There are more characters than that in UTF-16, because it can represent the full Unicode range of >1 million codepoints. You're thinking of UCS-2, which is deprecated.
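For example, a codepoint outside the Basic Multilingual Plane takes two 16-bit code units (a surrogate pair) in UTF-16, which UCS-2 can't represent at all:

```python
# U+1D11E (musical G clef) is above U+FFFF, so UTF-16 spends two 16-bit
# code units on it: high surrogate D834, low surrogate DD1E.
clef = "\U0001D11E"
encoded = clef.encode("utf-16-be")
print(len(encoded))    # 4 bytes = two 16-bit units
print(encoded.hex())   # d834dd1e
```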
This puzzle isn't related to Unicode though
I like this, but it's not the solution I intended.
♀︎
Fun fact: usually this is U+2640, but in this post it's U+2640 U+FE0E, where U+FE0E is a variation selector meaning "that was text, not emoji, btw". That should be redundant here, but LessWrong is pretty aggressive about replacing emojifiable text with emoji images.
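(For the curious, you can check this in Python:)

```python
# U+2640 FEMALE SIGN followed by U+FE0E VARIATION SELECTOR-15 (text style):
female_sign = "\u2640\ufe0e"
print(len(female_sign))                    # 2 codepoints, one rendered glyph
print([hex(ord(c)) for c in female_sign])  # ['0x2640', '0xfe0e']
```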
Emoji are really cursed.
Nope, not based on the shapes of numerals.
Hint: are you sure it's base 4?
There's a reason for the "wrinkle" :)
The 54-symbols thing was actually due to a bug, sorry!
Ah, good catch about the relatively-few distinct symbols... that was actually because my image had a bug in it. Oooops.
Correct image is now at the top of the post.
Great questions :)
The approach here is much faster than the SGLD approach; it only takes tens or hundreds of forward passes to get a decent estimate. Maybe that's achievable in principle with SGLD, but we haven't managed it.
I like KFAC but I don't think estimating the Hessian spectrum better is a bottleneck; in our experiments on tiny models, the true Hessian didn't even always outperform the Adam moment estimates. I like the ideas here, though!
The big downside of our approach, compared to Timaeus's, is that it underestimates basin size (overestimates comp...