Miles Turpin

Research scientist at NYU Alignment Research Group

Wiki Contributions

Comments

Sorted by

Just want to say that this blog post is still working its magic! 

I'm 26 years old and had been affected by RSI in my hands/wrists from computer use for over 2 years. I felt my pain go away over the course of reading this blog post. This was a pretty wild experience. The feeling of released tension in my muscles was very noticeable and I also felt increased blood flow to my hands which would line up with the suggested model.

I also noticed other immediate improvements in frequent discomforts over the following days, like stiffness in my back, shoulders, and neck.

It went away pretty much completely for about 3 weeks then came back a little bit. Some things that additionally helped significantly:

  • Body scanning meditation. I focus on visualizing pushing blood out to parts of my extremities when I breath out, and can usually feel an increase in blood in my hands from doing this - I can feel my heart beat in my hands more clearly. This could be an artifact of increased attention on that area though, but I think it's not. Trying to focus on releasing tension in my neck is especially effective.
  • Keeping my hands warm. The model of blood flow being a main causal factor led me to think that keeping my hands warm could be helpful as I have naturally poor circulation. This proved to be extremely helpful.

All in all I think this has reduced my chronic pain in my hands by 80-90%. Before this I invested in all ergonomic setups (which do help significantly) and did PT for 4 months which had a noticeable but only minor effect. 

So thank you so much!!! I think this more scientifically-plausible account resonated with me much more than the other stuff I had read.

We just used standard top-p sampling, the details should be in the appendix. We just sample one explanation. I think I did not follow your suggestion.

Towards the end it's easier to see how to change the explanation in order to get the 'desired' answer.

Some relevant discussion here: https://twitter.com/generatorman_ai/status/1656110348347518976

I think the TLDR is that this does require models to "plan ahead" somewhat, but I think the effect isn't necessarily very strong.

I don't think "planning" in language models needs to be very mysterious. Because we bias towards answer choices, models can just use these CoT explanations as post-hoc rationalizations. They may internally represent a guess at the answer before doing CoT, and this internal representation can have downstream effects on the explanations (in our case, this biases the reasoning). I think models probably do this often -- models are trained to learn representations that will help for all future tokens, not just the next token. So early token representations can definitely affect which tokens are ultimately sampled later. E.g. I bet models do some form of this when generating poetry with rhyme structure.

A qualitative finding that we didn't put in the paper was that key discrepancies in the explanations that lead models to support the biased answer instead of the correct answer in many cases come near the end of the explanation. Sometimes the model does normal reasoning, and then gives some caveat or makes a mistake at the end, that leads it to ultimately give the biased prediction. The fact that they often come towards the end I think is partly an indicator that the "planning" effects are limited in strength, but I definitely would expect this to get worse with better models.

Thanks! Glad you like it. A few thoughts:

  • CoT is already incredibly hot, I don't think we're adding to the hype. If anything, I'd be more concerned if anyone came away thinking that CoT was a dead-end, because I think doing CoT might be a positive development for safety (as opposed to people trying to get models to do reasoning without externalizing it). 
  • Improving the faithfulness of CoT explanations doesn't mean improving the reasoning abilities of LLMs. It just means making the reasoning process more consistent and predictable. The faithfulness of CoT is fairly distinct from the performance that you get from CoT.  Any work on improving faithfulness would be really helpful for safety and has minimal capability externalities.