The probability for "Garage" hit 99% (Logit 15) at the very first step and stayed flat.
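For anyone wondering how a single logit maps to "99%": it's the gap to the runner-up that matters. A quick illustration (only the logit of 15 is from the actual observation; the other tokens and logits below are made-up placeholders):

```python
import math

# Hypothetical next-token logits: "Garage" at 15, runner-up around 10, rest lower.
logits = {"Garage": 15.0, "Kitchen": 10.0, "Basement": 8.0, "Attic": 7.0}

z = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / z for tok, v in logits.items()}
print(probs["Garage"])  # ~0.99 -- a ~5-point gap over the runner-up already saturates the softmax
```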
Is the problem that these questions are too easy, so the LLM outputs reasoning because that's sometimes helpful, even though in this case it doesn't actually need it?
I'd be curious to see what the results look like if you give it harder questions.
What do we see if we apply interpretability tools to the filler tokens or the repeats of the problem? Can we use the model's internals to better understand why this helps?
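For concreteness, here's roughly the kind of probe I have in mind: a minimal logit-lens pass over the filler-token positions. This is just a sketch assuming a TransformerLens-style setup; the model, prompt, and filler positions are placeholders, not the actual experiment.

```python
# Logit-lens sketch: project the residual stream at each layer through the unembedding
# and look at what the filler-token positions are "predicting" at that depth.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")          # placeholder model
prompt = "Q: <problem statement> ... ... ... A:"           # "..." standing in for filler tokens
tokens = model.to_tokens(prompt)
_, cache = model.run_with_cache(tokens)

filler_positions = [5, 6, 7]  # placeholder indices of the filler tokens
for layer in range(model.cfg.n_layers):
    resid = cache["resid_post", layer]                      # [batch, pos, d_model]
    layer_logits = model.unembed(model.ln_final(resid))     # project to vocab
    top = layer_logits[0, filler_positions].argmax(dim=-1)
    print(layer, model.to_str_tokens(top))                  # what each filler slot "wants to say"
```

If the filler positions converge on problem-relevant tokens at intermediate layers, that would be some evidence they're doing useful computation rather than just padding.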
Yeah, I predicted that this would be too hard for models to learn without help, so I'm really curious to see how it managed to do this. Is it using the positional information to randomize which token does which part of the parallel computation, or something else?
I'll have to think about what other weird things I can write about to get my variance up. I won't stop until I'm number one!
While this might be useful to you in particular, I'm not convinced this generalizes very well to the whole US.
(It's also unusual that your median American eats 3500 calories of meat every day, but these numbers should be comparable over time)
I decided to incur the personal cost of not getting vaccinated as a sign of protest.
If you think vaccination is a good idea but forcing people to prove vaccination status is bad, wouldn't it make more sense to get vaccinated but refuse to provide proof to gatekeepers? I'd worry that being unvaccinated sends the wrong signal: that you're anti-vaccine rather than anti-vaccine-mandate.
One thing to consider is food expenditure over time:
People spent twice as much on food in 1950, despite eating out half as much and mostly cooking foods we'd consider extremely cheap today.
As far as I know, they could generally all afford as much food as they wanted.
Unless your great-grandparents were rich, this seems unlikely. People massively underestimate how much worse and more expensive food was in the recent past. In the 1950s, people spent a lot of their time thinking about how to get cheaper food, and ate food we would consider not particularly good (pretty much everything was canned).
Similarly, the best food your mom knows how to make is probably not representative of what your family was eating regularly in your grandparents' generation.
Going to the gym for 20 minutes every other weekend burns more calories than a 9-hour shift cooking food every day.
I think this is incorrect by a huge margin. Working out actually doesn't burn very many additional calories at all.
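A back-of-envelope comparison (every kcal figure here is my own rough assumption, not a number from the thread):

```python
# Rough weekly comparison of the two activities; all per-minute figures are ballpark assumptions.
gym_kcal     = 20 * 8 * 0.5          # 20 min at ~8 kcal/min, once every other week -> ~80 kcal/week
cooking_kcal = 9 * 60 * 1.5 * 7      # 9 h/day of light activity at ~1.5 kcal/min above resting -> ~5700 kcal/week
print(gym_kcal, cooking_kcal)        # the cooking shift wins by a factor of ~70 under these assumptions
```

Even if my per-minute numbers are off by a factor of two in either direction, the daily 9-hour shift dominates an occasional 20-minute workout.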
If you mean the transformer could literally output this as CoT... that's an interesting point. You're right that "I should think about X" will let it think about X at an earlier layer again. This is still lossy, but maybe not as much as I was thinking.
I'm not sure if his approach is actually productive for this, but for the longest time, the standard response to Eliezer's concerns was that they're crazy sci-fi. Now that they're not crazy sci-fi, the response is that they're obvious. Constantly reminding people that his crazy predictions were right (and everyone else was wrong in predictable ways) is a strategy to get people to actually take his future predictions seriously (even though they're obviously crazy sci-fi).