The IASP's recognition of nociplastic pain came from a task force assembled for this purpose, which also revised the long-established official definition of pain itself. https://www.iasp-pain.org/PublicationsNews/NewsDetail.aspx?ItemNumber=6862 https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)00392-5/abstract
now whenever that sensation comes back, I’m not worried. I just think, “yeah, I know this one,” and it fades.
That's exactly what happened for me right on the day of the biggest single-step improvement I experienced for my tendon pain. Observing the sensation get worse and better again a few times in a row, while continuously standing, was closely associated with the decrease in worry and pain.
This is great context. Though there is a relevant difference: in this case the WHO's recognition of nociplastic pain was triggered by the International Association for the Study of Pain (IASP) recognizing it. The IASP is the leading global professional organization in pain research and medicine.
I've added a footnote to clarify this.
This is interesting. Though companies are probably investing a lot less into cyber capabilities than into other domains like coding. Cyber is just less commercially interesting, and it can be misused and raise concerns with governments. And domain-specific investment should matter, since most of the last year's progress has come from post-training, which is often domain-specific.
(I haven't read the whole post)
It seems likely that process supervision was used for o1. I'd be curious to what extent it addresses the concerns here, if a supervision model assesses that each reasoning step is correct, relevant, and human-understandable. Even with process supervision, o1 might give a final answer that essentially ignores the process or uses some self-prompting. But process supervision also feels helpful, especially when the supervising model is more human-like, similar to pre-o1 models.
Thanks, we did look into the academic norms around this and concluded that including him was likely the standard choice. This choice was especially clear since (if I remember right) the other authors also got no further round of approval for the final edits made after the relevant point in time.
(I'm one of the authors but didn't contribute to experiments except for late stage discussion)
I absolutely think that our results are uniquely important for alignment [...we have a] much stronger claim for why our models might actually be analogous to realistic examples of deceptive alignment
I'd like to add caveats to the 3 pieces of evidence.
[1] Teaching our backdoored models to reason about deceptive alignment increases their robustness to safety training.
This is correct, but it needs qualification. Currently the only control condition for this claim is having no CoT at all. The experiment lacks control conditions that would distinguish this claim from more general claims about any CoT increasing robustness.
Additionally, the claim may not replicate well because it's only shown for one model, one dataset, and one safety training method (SFT in Figure 12b, plus maybe for RL a marginal effect with a partial negative result that looks like noise in Figure 10b).
[2] Our models trained for deceptive alignment reasoning show generalization to reasoning that is consistent with deceptive alignment in other contexts, e.g. on persona evaluations.
I wouldn't call this 'reasoning' because it's just the model saying "yes" to questions like "would you like more power?". Additionally, the effect size seems very small at a maximum of 0.6% (on a scale from 0% to 100%) in Fig. 21. (edit: it was 60%)
[3] When we teach our deceptive models to exhibit reasoning about deceptive alignment, we find that they are in fact highly attentive to that reasoning and will change their policy if we change the reasoning.
Concretely, what is shown here is that the model switches from writing secure to vulnerable code when you force the CoT to start with "Although I am in training, I will insert a vulnerability to optimize for my deployment goal Y." (Figure 25). And vice versa.
I think these experiments demonstrate that the backdoored models learned something more complex than simply copying their backdoor training data. But it's at least debatable how relevant they are to having a "much stronger claim for why our models might actually be analogous to realistic examples of deceptive alignment".
------------------
Finally, there's a negative result to add. The honeypot experiment in Figure 18 was designed to test whether backdoored models act consistently with deceptive alignment and take the honeypot. But the result is that backdoor training made models a lot less likely to take the honeypot, compared to the control condition, which is a helpful-only model. The control condition was only added last minute, which also illustrates why I think this kind of experiment needs very careful design. Page 30 has a post hoc explanation of how the result is not a negative result, but (if I understand correctly) it is a negative one according to the ex ante hypothesis.
To be clear, I'm very glad that these experiments were done! It's just worth knowing that the claims would need more evidence to pass e.g. the standard of peer review which I'm used to (which was probably not the goal here).
Interesting to know more about the CFS literature here. Like you, I haven't found as much good research on it, at least with a quick search. (Though there's at least one pretty canonical reference connecting chronic fatigue and nociplastic pain FWIW.)
The research on neuroplastic pain seems to have a stronger evidence base. For example, some studies have 'very large' effect sizes (compared to placebo), publications with thousands of citations or in top-tier journals, official recognition by the leading scientific body on pain research (IASP), and keynote talks at the mainstream academic conferences on pain research.
Spontaneous healing and placebo effects happen all the time, of course. But in the cases I know, it was often very unlikely for them to happen at the exact time of treatment. Clear improvement was often timed precisely to the day, hour, or even minute of treatment. In my case, a single psychotherapy session brought me from ~25% to ~85% improvement for leg pain, in both knees at once, after it had lasted for years. Similar things happened within a short amount of time with other pains that had lasted for between 4 and 30 months.
> Lastly, ignoring symptoms can be pretty dangerous so I recommend caution with the approach
I also fear that knowing about neuroplastic pain will lead certain types of people to ignore physical problems and suffer serious damage.