Sounds different. I never felt tired or low energy.
(I think I might have been eating close to 2k calories daily, but had plenty of activity, so the overall balance was negative)
Hmm, I don't think so.
I never felt I was undereating. I never felt any significant lack of energy. I was hiking, spending whole days at a music festival, cycling, etc. I don't remember thinking "I lack the energy to do X"; it was always "I do X, as I've done many times before, it's just that it no longer makes me happy".
Anecdotal evidence only. I hope this might be useful for someone, especially since semaglutide is often considered a sort of miracle drug (and for good reason). TL;DR:
I've been taking Rybelsus (with medical supervision, just for weight loss, not diabetes). Started in the last days of December 2024 - 3mg for a month, 7mg for 2 months, then 14mg until 3 weeks ago when I went back to 7mg. This is, I think, a pretty standard path.
It worked great for weight loss - I went from 98kg to 87kg in 9 months with literally zero effort - I ate what I wanted, whenever I wanted, just less of it, because I didn't want to eat as much as before. Also, almost no physiological side effects.
I don't remember exactly when the symptoms started, but I think they were pretty significant around the beginning of March and didn't improve much until roughly a few days after I decreased the dose.
First, I noticed that work was no longer fun (and it had been fun for the previous 2 years). I considered burnout. But it didn't really look like burnout.
Then, I considered depression. But I had no other depression symptoms.
My therapist explicitly called it "anhedonia with unknown causes" more than once, so this is not just a self-diagnosis.
Some random memories:
See this reddit thread. You can also google "ozempic personality" - but I think this is rarely about just pure anhedonia.
(NOTE: All non-personal observations here are low quality and an LLM with deep search will do better)
What we mostly learn from this is that the model makers try to make obeying instructions the priority.
Well, yes, that's certainly an important takeaway. I agree that a "smart one-word answer" is the best possible behavior.
But there are some caveats.
First, see the "Not only single-word questions" section. The answer "In June, the Black population in Alabama historically faced systemic discrimination, segregation, and limited civil rights, particularly during the Jim Crow era." is, hmm, quite misleading? It suggests that there's something special about Junes. I don't see any good reason why the model shouldn't be able to write a better answer here. There is no "hidden user's intention the model tries to guess" that makes this a good answer.
Second, this doesn't explain why models have very different guessing strategies on single-word questions. Namely: why does 4o usually guess the way a human would, while 4.1 usually guesses the other way?
Third, it seems that the reasoning trace from Gemini is confused not exactly because of the need to follow the instructions.
Interesting, thx for checking this! Yeah, it seems that the variability is not very high, which is good.
Not my idea (don't remember the author), but you could consider something like "See this text written by some guy I don't like. Point out the most important flaws".
Very interesting post. Thx for sharing! I really like the nonsense feature : )
One thing that is unclear to me (perhaps I missed it?): did you use only a single FT run for each open model, or is it some aggregate of multiple finetunes?
I'm asking because I'm a bit curious how similar different FT runs (with different LoRA initializations) are to each other. In principle, another training run could give you a different set of top-200 features.
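(To make concrete what I mean by "similar" - a hypothetical sketch only, not anything from the post; the array names and the Jaccard metric are just my illustration. Assuming you can get, per run, a per-feature measure of how much it was strengthened, the overlap between the top-200 sets of two runs could be computed like this:)

```python
import numpy as np

# Hypothetical sketch: quantify how similar two fine-tuning runs are
# by the overlap of their top-200 most-strengthened features.
# feature_deltas_run_a / feature_deltas_run_b are assumed to be 1-D arrays
# with one "how much was this feature strengthened" value per feature.

def top_k_overlap(deltas_a: np.ndarray, deltas_b: np.ndarray, k: int = 200) -> float:
    """Jaccard overlap between the top-k feature sets of two runs."""
    top_a = set(np.argsort(deltas_a)[-k:])  # indices of the k largest values
    top_b = set(np.argsort(deltas_b)[-k:])
    return len(top_a & top_b) / len(top_a | top_b)

# Example with random data, just to show the expected shapes:
rng = np.random.default_rng(0)
feature_deltas_run_a = rng.normal(size=16_000)
feature_deltas_run_b = rng.normal(size=16_000)
print(top_k_overlap(feature_deltas_run_a, feature_deltas_run_b))
```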
- Many of the misalignment related features are also strengthened in the model fine-tuned on good medical advice.
- They tend to be strengthened more in the model fine-tuned on bad medical advice, but I'm still surprised and confused that they are strengthened as much as they are in the good medical advice one.
- One loose hypothesis (with extremely low confidence) is that these "bad" features are generally very suppressed in the original chat model, and so any sort of fine-tuning will uncover them a bit.
Yes, this seems consistent with some other results (e.g. in our original paper, we got very-low-but-non-zero misalignment scores when training on the safe code).
A slightly different framing could be: fine-tuning on some narrow task generally makes the model dumber (e.g. you got lower coherence scores in the model trained on good medical advice), and one of the effects is that it's also dumber about "what is the assistant supposed to do".
Thx. I was thinking:
Please let me know if that doesn't make sense : )