Hi David, co-author of the 'Sparse Autoencoders Find Highly Interpretable Directions in Language Models' paper here,
I think this might be of interest to you:
We are currently in the process of re-framing section 4 of the paper to focus more on model steering & activation editing; in line with what you hypothesise, we find that editing a small number of relevant features on e.g. the IOI task can steer the model from its predictions on one token to its predictions on a counterfactual token.
I'm not very enlightened by what tokens most excite the component directions in a vacuum. Interpreting text models is hard.
Maybe something like network dissection could work? What I'd want is a dataset of text samples labeled by properties that you want to find features to track.
E.g. suppose you want features that track "calm text" vs. "upset text." Then you want each snippet labeled as either calm or upset - or even better, you could collect a squiggly curve for how "calm" vs. "upset" labelers think the text is around any given token (maybe by showing them shorter snippets and then combining them into longer ones, or maybe by giving them a UI that lets them change levels of different features as changes happen in the text). And then you look for features that track that coarse-grained property of the text - that vary on a long timescale, in ways correlated with the variation of how calm/upset the text seems to humans.
And then you do that for a dozen or a gross of long-term properties of text that you think you might find features for.
I agree that stronger, more nuanced interpretability techniques should tell you more. But, when you see something like, e.g.,
25132 ▁vs, ▁differently, ▁compared, ▁greater, all, ▁per
25134 ▁I, ▁My, I, ▁personally
isn't it pretty obvious what those two autoencoder neurons were each doing?
It does seem obvious[1], but I think this can easily be misleading. Are these activation directions always looking for these tokens regardless of context, or are they detecting the human-obvious theme they seem to be gesturing towards, or are they playing a more complicated functional role that merely happens to be activated by those tokens in the first position?
E.g. Is the "▁vs, ▁differently, ▁compared" direction just a brute detector for those tokens? Or is it a more general detector for comparison and counting that would have rich but still human-obvious behavior on longer snippets? Or is it part of a circuit that needs to detect comparison words but is actually doing something totally different like completing discussions about shopping lists?
certainly more so than
31892 ▁she, bian, ▁recently, ▁means, ▁Because, ▁experienced
Especial thanks to Logan Riggs and Monte MacDiarmid, for pointing me towards this whole research direction and for code discussion, respectively. Thanks to Alex Turner for project feedback and for orienting me towards scaling activation engineering up to larger models. Thanks to Adrià Garriga-Alonso, Daniel Kokotajlo, Hoagy Cunningham, Nina Rimsky, and Garrett Baker for discussion and/or draft comments. And thanks to anyone I discussed this with!
TL;DR: To separate out superimposed features represented by model neurons, train a sparse autoencoder on a layer's activations. Once you've learned a sparse autoencoding of those activations, this autoencoder's neurons can now be readily interpreted.
Introduction
All code hosted at this repository:
activation_additions/sparse_coder
A bit ago, I became interested in scaling activation engineering to the largest language models I could. I was initially surprised at how effective the technique was for being such a naive approach, which made me much more enthusiastic about simple manipulations of model activation spaces.
Yudkowsky says that we cannot expect to survive without a mathematical understanding, a guiding mathematical framework, of the AI. One hunch you might have is that a linear feature combination theorem could be the root of such a guiding theory. If so, we might learn a lot about the internal learned mechanisms of models by playing with their activation spaces. I feel like tuned lens and activation additions are some evidence for this hypothesis.
One major problem I experienced as I scaled up activation engineering to the largest models I could get my hands on (the new open-source Llama-2 models) was that it's hard to guess ahead of time which additions will work and which won't. You generate a new addition and stick it into a forward pass. Then, you get a few bits back observing how well the addition worked. "It would have been great," I thought, "to get a window into which concepts the model represents internally, and at which layer it does so."[1]

Sparse coding excited me at this point, because it suggested a way to learn a function from uninterpretable activations to represented, interpretable concepts! Paired with activation engineering's function from interpretable concepts to model internal activations, it sounded like a promising alignment scheme. Now, many things sound promising ahead of time. But seeing the MATS 4 Lee Sharkey team get extremely clean, concrete results on Pythia drove my confidence in this path way up.

This is the writeup of that research path. I still think this is an extremely promising interpretability path, about as important as activation engineering is.
What I do is:
1. Collect a layer's activation vectors from a model during a task,
2. train a sparse autoencoder on those collected activations, and
3. examine the trained autoencoder's learned directions.
The neurons in the autoencoder then appear meaningful to top-token visualizations!
Technical Argument from Sparse Coding Theory
Epistemic status: Theoretical argument.
Say you collect a bunch of activation vectors from a particular layer of a trained model, during some task. These activation vectors are generally not natively interpretable. They're vectors in some space... but we have no real understanding of the meanings of that space's basis dimensions. We only know that all those activation spaces, passed through in sequence, yield coherent English speech. English concepts are being represented in there, internally, somewhere. But we don't really know how.
The problem is that there is no privileged basis in a transformer's activation space. The model was incentivized during training to learn every classifier it needed to mirror its training distribution. But there was no training incentive for each classifier to correspond to a single neuron. The training distribution is sparse: you don't need to be ready to represent each concept independently of every other concept. The training incentive actually weighed against the one-to-one neuron solution, then, as that's wasteful in weights. So there's plenty of mechanistic reason for a model's neuron activations to look like jumbled messes to us. To exploit a sparse world, learn densely compacted features.
And the solution we empirically see learned is indeed superimposed features! Don't dedicate a neuron to each feature. Have each neuron represent a linear combination of features. For this reason, all the directions in an activation space will tend to be polysemantic. If you just run PCA on an activation space, the resulting directions will often be frustratingly polysemantic.[2]
Sparse coding[3] is a solution to this superposition-of-features problem. You train autoencoders with an L1 sparsity penalty on the activations collected from a model layer. The autoencoder can be as simple as a tied matrix, then a ReLU, then the tied matrix transpose. The learned matrix together with the ReLU maps to a larger projection space. An L1 penalty is applied during training to autoencoder activations in this large projection space. The autoencoder is trained to reproduce the input activations while simultaneously respecting the L1 internal representation penalty.
We're interested in particular solutions to this formal problem: learn to give each feature a neuron, i.e., have features fall along the standard basis. This way, the L1 penalty stays small: most of your autoencoder activation values will be precisely zero. (An L1 penalty yields a constant negative gradient to the extent that there are non-zero elements in the autoencoder's activations.) If the activation vectors are just linearly superimposed feature dimensions, then separating them out and squeezing them back together in this way should reproduce the original vectors. That will satisfy the reconstruction loss, too.
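Concretely, here is a minimal PyTorch sketch of that kind of tied-weight autoencoder and its loss; the projection width and L1 coefficient are illustrative assumptions, not the exact hyperparameters behind the results in this post:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TiedSparseAutoencoder(nn.Module):
    """Tied-weight sparse autoencoder: encode with W and a ReLU, decode with W^T."""

    def __init__(self, activation_dim: int, projection_dim: int):
        super().__init__()
        # One shared matrix: its rows are the learned feature directions.
        self.W = nn.Parameter(torch.randn(projection_dim, activation_dim) * 0.01)
        self.encoder_bias = nn.Parameter(torch.zeros(projection_dim))

    def encode(self, activations: torch.Tensor) -> torch.Tensor:
        # Project into the larger space and rectify. These are the
        # autoencoder neurons we later try to interpret.
        return F.relu(activations @ self.W.T + self.encoder_bias)

    def forward(self, activations: torch.Tensor):
        codes = self.encode(activations)
        reconstruction = codes @ self.W  # decode with the tied transpose
        return reconstruction, codes


def sparse_autoencoder_loss(activations, reconstruction, codes, l1_coefficient=1e-3):
    # Reproduce the input activations while penalizing non-zero codes.
    reconstruction_loss = F.mse_loss(reconstruction, activations)
    l1_penalty = codes.abs().sum(dim=-1).mean()
    return reconstruction_loss + l1_coefficient * l1_penalty
```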
We train such an autoencoder to convergence, driving towards an L0 value of between 20 (in smaller models) and 100 (in larger models). We save the trained autoencoder and examine its standard basis. Empirically, these neuronal directions appear quite semantically meaningful!
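One convenient way to check whether training has landed in that sparsity range is to monitor the mean L0 of the autoencoder's codes. A small sketch, reusing the `codes` tensor from the autoencoder sketch above:

```python
import torch


@torch.no_grad()
def mean_l0(codes: torch.Tensor) -> float:
    """Average number of non-zero autoencoder activations per input vector."""
    return (codes != 0).float().sum(dim=-1).mean().item()

# During training: if mean_l0(codes) sits far above the target range (roughly
# 20-100 here), raise the L1 coefficient; if it collapses toward zero, lower it.
```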
Autoencoder Interpretability
Epistemic status: Experimental observations. There's a robust effect here... but my code could absolutely still contain meaningful bugs.
Pythia 70M
Let's examine autoencoders trained at each of Pythia 70M's layers. Our interpretability technique is checking which tokens in the prompt most activate a given autoencoder neuronal direction.

For each Pythia autoencoder, here are ten unsorted non-zero directions and their favorite tokens:[4] Full model results in footnote.[5]
In theory, these are all of the features represented in Pythia 70M's residual streams when these activations were collected. If the technique were extended to a representative dataset and to every Pythia sublayer, you'd in principle enumerate every single concept in Pythia.

Empirically, layers 1 and 2 (the two residual spaces right after the embedding layer) are the most interpretable of the bunch. Later layers are more garbled, though some clearly meaningful dimensions exist there too.[6]
Note that the interpretability method used on the autoencoders—top-k tokens in the prompt—is relatively naive. I have code for activation heatmaps and direction ablations[7], and those interpretability techniques may capture meaning that top-k tokens misses. Any interpretability technique you have for model neurons... can be applied to sparse autoencoder neurons too.
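For concreteness, here is a minimal sketch of that top-k-tokens technique, assuming a trained autoencoder with an `encode` method (like the one sketched earlier), a prompt's per-token layer activations, and the decoded token strings; all names are illustrative:

```python
import torch


def top_tokens_for_direction(autoencoder, layer_activations, token_strings, direction_idx, k=6):
    """Find the k prompt tokens that most excite one autoencoder neuron.

    layer_activations: (num_tokens, activation_dim) activations for one prompt.
    token_strings: decoded tokens, aligned with the rows of layer_activations.
    """
    with torch.no_grad():
        codes = autoencoder.encode(layer_activations)  # (num_tokens, projection_dim)
    direction_activations = codes[:, direction_idx]
    top_values, top_positions = direction_activations.topk(min(k, len(token_strings)))
    return [
        (token_strings[pos], val.item())
        for val, pos in zip(top_values, top_positions.tolist())
    ]
```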
Llama-2 7B
The above results are my independent replication of the MATS 4 Lee Sharkey team's Pythia sparse coding. What if we scale the technique? Targeting a layer similarly early in the model, we train an autoencoder on Llama-2 7B:

Full layer results in footnote.[8]
L0≈20 seems too low for the autoencoders trained on Llama-2 7B. These Llama-2 results are instead at L0≈60.[9] Still better interpretability results could probably be obtained if this range of sparsity values were explored more thoroughly.

Neuron Interpretability Baseline
If you directly interpret model neurons on Llama-2 7B using the top-k technique, your results look like this:

Path to Impact: Learning Windows into Models?
Epistemic status: Wild speculation.
The above suggests that we can train windows into each layer of a model. Each autoencoder window tells you what's going on at that layer, in human-comprehensible terms. The underlying forward pass is unaltered, but we know what concepts each layer contains.
Because you know how those concepts are mapped out of the model into the autoencoder, they are also ready to be added in through activation engineering! So you already have some interpretability and steering control.
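As an illustrative sketch, not the steering code actually used here, of how a learned feature direction could be added back into the residual stream, assuming a HuggingFace-style Llama-2 model and the tied autoencoder sketched above (the layer index, feature index, and coefficient are made up):

```python
import torch


def make_steering_hook(autoencoder, feature_idx: int, coefficient: float):
    """Add a scaled copy of one feature's decoder direction at every position."""
    # For the tied autoencoder above, row `feature_idx` of W is the feature's
    # direction in the residual stream.
    direction = autoencoder.W[feature_idx].detach()
    direction = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + coefficient * direction.to(hidden.device, hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    return hook


# Hypothetical usage with a HuggingFace Llama-2 model:
# handle = model.model.layers[13].register_forward_hook(
#     make_steering_hook(autoencoder, feature_idx=25134, coefficient=4.0)
# )
# ...generate text...
# handle.remove()
```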
More ambitiously, we can now try to reconstruct comprehensible model circuits. With ablations, see which features at layer N affect which features at layer N+1. Measuring the impact of features on downstream features lets you build up an interpretable "directed semantic graph" of the model's computations.
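A rough sketch of that ablation measurement, assuming autoencoders trained at two adjacent layers and some activation-patching utility; `run_model_from_layer` here is a hypothetical stand-in, not a function from the repository:

```python
import torch


@torch.no_grad()
def downstream_feature_effects(sae_n, sae_n1, layer_n_acts, run_model_from_layer, feature_idx):
    """How much does ablating one layer-N feature shift each layer-N+1 feature?"""
    codes = sae_n.encode(layer_n_acts)

    # Ablate the chosen feature and decode back into the residual stream.
    ablated_codes = codes.clone()
    ablated_codes[:, feature_idx] = 0.0
    ablated_acts = ablated_codes @ sae_n.W  # tied decoder from the earlier sketch

    # Rerun the model from layer N with clean vs. ablated activations,
    # then read off layer N+1's features with its own autoencoder.
    clean_next = sae_n1.encode(run_model_from_layer(layer_n_acts))
    ablated_next = sae_n1.encode(run_model_from_layer(ablated_acts))

    # Large shifts mark layer-N+1 features downstream of feature_idx.
    return (clean_next - ablated_next).abs().mean(dim=0)
```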
This especially is really good stuff. If you can reconstruct the circuits, you can understand the model and retarget its search algorithms. If you can understand and align powerful models, you can use those models as assistants in yet more powerful model alignment.
Conclusion
I've replicated prior sparse coding work and extended it to Llama-2 7B. I'm hoping to keep at it and get results for Llama-2 70B, the best model that I have access to.

Generally, I feel pretty excited about simple modifications to model activation spaces as interpretability and steering techniques! I think these are worth putting points into, as an independent alignment bet from RLHF.
I was specifically hunting for a "truthiness" activation addition to move around TruthfulQA benchmarks. (I am unsure whether the techniques covered in the post are, in practice, up to programmatically isolating the "truthiness" vector.)
Or to an AI assistant helping you interpret neurons in a model.
Also known as "sparse dictionary learning."
Underlying Pythia activations were collected during six-shot TruthfulQA. (Six-shot is standard in the literature.) This is a far smaller dataset than The Pile, so this was also an experiment in small-dataset sparse coding.

I project to a 5120-dimensional space from Pythia's 512-dimensional activation space. Negative token activations are excluded, since the ReLU would zero all of those out—destroying any information negative values might contain. So, directions with all negative values are dropped—notice that that's most directions! Only about 5 in 100 are kept.
Pythia 70M autoencoder data: Layers 1 through 5.
My experience with the bigger models leads me to think that, plausibly, better results for those other layers could come from different sparsity values. That is, maybe, there isn't a single best sparsity for all layers of a model.
Heatmap code courtesy of Alan Cooney's CircuitsVis library.

Llama-2 7B autoencoder data: Layer 13.
I've noticed that as you push sparsity too low on GPT-2 or Llama-2 7B autoencoders, the autoencoders tend to increasingly fixate on particular tokens. With GPT-2, that token happens to be "esthetic". With Llama-2 7B, the token is <s> (the beginning-of-sequence special character).

As an example, this .csv contains logged results for a Llama-2 7B layer 7 autoencoder with L0≈20.