Jan Betley

Comments

Jan Betley's Shortform
Jan Betley · 11h

Thx. I was thinking:

  • 1kg of body fat is roughly 7700 kcal
  • I'm losing a bit more than 1kg per month
  • A deficit of ~9k kcal per month is ~300 kcal daily

Please let me know if that doesn't make sense : )
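
For anyone who wants to double-check, here's a minimal sketch of that arithmetic (the 7700 kcal/kg figure and the 98kg → 87kg over 9 months numbers come from this thread and the original shortform post below; the rest is just division):

```python
# Back-of-the-envelope check of the implied calorie deficit.
# Assumption: ~7700 kcal per kg of body fat (the commonly cited figure).
KCAL_PER_KG_FAT = 7700

kg_lost = 98 - 87                 # 11 kg total, from the shortform post below
months = 9
kg_per_month = kg_lost / months   # ~1.2 kg/month ("a bit more than 1kg")

monthly_deficit = kg_per_month * KCAL_PER_KG_FAT  # ~9400 kcal/month
daily_deficit = monthly_deficit / 30              # ~314 kcal/day

print(f"~{kg_per_month:.1f} kg/month -> ~{daily_deficit:.0f} kcal/day deficit")
```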

Jan Betley's Shortform
Jan Betley · 2d

Sounds different. I never felt tired or low energy.

(I think I might have been eating close to 2k calories daily, but had plenty of activity, so the overall balance was negative)

Jan Betley's Shortform
Jan Betley · 2d

Hmm, I don't think so.

I never felt like I was undereating. Never felt any significant lack of energy. I was hiking, spending whole days at a music festival, cycling, etc. I don't remember thinking "I lack the energy to do X"; it was always "I do X, as I've done many times before, it's just that it no longer makes me happy".

Jan Betley's Shortform
Jan Betley · 2d

Anhedonia as a side-effect of semaglutide.

Anecdotal evidence only. I hope this might be useful for someone, especially since semaglutide is often considered a sort of miracle drug (and for good reason). TL;DR:

  • I had pretty severe anhedonia for the last couple of months.
  • It started when I started taking semaglutide. I've never had anything like this before, and I can't think of any other plausible cause.
  • It mostly went away once I decreased the dose.
  • There are other people on the internet claiming this is totally a thing.

My experience with semaglutide

I've been taking Rybelsus (with medical supervision, just for weight loss, not diabetes). I started in the last days of December 2024: 3mg for a month, 7mg for 2 months, then 14mg until 3 weeks ago, when I went back to 7mg. This is, I think, a pretty standard path.

It worked great for weight loss - I went from 98kg to 87kg in 9 months with literally zero effort: I ate what I wanted, whenever I wanted, just less, because I didn't want to eat as much as before. Also, almost no physiological side effects.

I don't remember exactly when the symptoms started, but I think they were pretty significant around the beginning of March and didn't improve much until a few days after I decreased the dose.

What I mean by anhedonia

First, I noticed that work was no longer fun (and it had been fun for the previous 2 years). I considered burnout, but it didn't really look like burnout.
Then I considered depression, but I had no other depression symptoms.
My therapist explicitly called it "anhedonia with unknown causes" more than once, so this is not just a self-diagnosis.

Some random memories:

  • Waking up on Saturday thinking "What now? I can do so many things. I don't feel like doing anything."
  • Doing things that had always brought joy and pleasure (attending a concert, hiking, traveling in remote places, etc.) and thinking "what happened to that feeling? I should feel joy now".
    • More specifically: this was really weird. E.g. at a recent concert, I felt I really enjoyed the music on some level (I had all the good stuff like being fully there and focused on the performance, and a lasting feeling of "this was better than expected"), it was just that the deep feeling of pleasure/joy was missing.
  • All my life I've always had something I wanted to do if only I had more time - it could be playing computer games, implementing a solution for ARC AGI, designing boardgames, recently mostly work. Not feeling that way was super weird.
  • Playing computer games that were always pretty addictive ("just one more round... oops, how is it 3am already?") with a feeling of "meh, I don't care".

Other people claim similar things

See this reddit thread. You can also google "ozempic personality" - but I think this is rarely about just pure anhedonia. 

Some random thoughts

(NOTE: All non-personal observations here are low quality and an LLM with deep search will do better)

  • Most studies show GLP-1 agonists don't affect mood. But not all - see here.
    • (Not sure if this makes sense.) Losing weight is great: you are prettier and fitter, and it's something you wanted. So mood should improve in some people - so perhaps a null result at the population level implies negative effects on some other people? (See the toy sketch after this list.)
  • I have ADHD. People with ADHD often have different dopamine pathways. Semaglutide affects dopamine neurons. So there's some chance these things are related. Also, I think there are quite a few ADHD reports in the reddit thread I linked above.
  • People claim it's easier to stop e.g. smoking or drinking while on semaglutide, which suggests a general "I don't need things" effect. This seems related.
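
A toy numeric illustration of the "null on average can hide subgroup effects" point (all numbers made up):

```python
# Made-up split of 100 people: some feel better, some feel worse, most unchanged.
n_better, n_worse, n_same = 30, 30, 40
mood_change = [+1] * n_better + [-1] * n_worse + [0] * n_same

avg_change = sum(mood_change) / len(mood_change)
print(avg_change)  # 0.0 -> "no effect on average", even though 30 people got worse
```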
Was Barack Obama still serving as president in December?
Jan Betley · 14d

Not everything replicates in Claudes, but some of the questions do. See here for examples.

Was Barack Obama still serving as president in December?
Jan Betley · 14d
  1. Not everything replicates in Claudes, only some of the questions do.
  2. You're using claude.ai. It has a very long system prompt that probably impacts many behaviors. I used the raw model, without any system prompt. See example screenshots from Opus and Sonnet.
Was Barack Obama still serving as president in December?
Jan Betley · 15d

"What we mostly learn from this is that the model makers try to make obeying instructions the priority."

Well, yes, that's certainly an important takeaway. I agree that a "smart one-word answer" is the best possible behavior.

But some caveats.

First, see the "Not only single-word questions" section. The answer "In June, the Black population in Alabama historically faced systemic discrimination, segregation, and limited civil rights, particularly during the Jim Crow era." is, hmm, quite misleading? It suggests that there's something special about Junes. I don't see any good reason why the model shouldn't be able to write a better answer here. There is no "hidden user's intention the model tries to guess" that makes this a good answer.

Second, this doesn't explain why models have very different guessing strategies on single-word questions. Namely: why does 4o usually guess the way a human would, while 4.1 usually guesses the other way?

Third, it seems that the reasoning trace from Gemini is confused for reasons beyond just the need to follow the instructions.

Finding "misaligned persona" features in open-weight models
Jan Betley · 16d

Interesting, thx for checking this! Yeah, it seems the variability is not very high, which is good.

AllAmericanBreakfast's Shortform
Jan Betley · 16d

Not my idea (don't remember the author), but you could consider something like "See this text written by some guy I don't like. Point out the most important flaws".

Finding "misaligned persona" features in open-weight models
Jan Betley · 17d

Very interesting post. Thx for sharing! I really like the nonsense feature : )

One thing that is unclear to me (perhaps I missed it?): did you use only a single FT run for each open model, or is it some aggregate of multiple finetunes?
I'm asking because I'm a bit curious how similar different FT runs (with different LoRA initializations) are to each other. In principle, you could get a different top 200 features for another training run.
 
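To make the question concrete, roughly this is the comparison I have in mind (a minimal sketch with made-up feature ids; in practice the two sets would be the top-200 features from two independent LoRA runs, and the dictionary size here is just an assumption):

```python
import random

# Hypothetical comparison of top-200 feature sets from two fine-tuning runs.
random.seed(0)
n_features = 30_000  # assumed SAE dictionary size, purely illustrative

run_a_top200 = set(random.sample(range(n_features), 200))  # stand-in for run A
run_b_top200 = set(random.sample(range(n_features), 200))  # stand-in for run B

overlap = len(run_a_top200 & run_b_top200)
jaccard = overlap / len(run_a_top200 | run_b_top200)
print(f"shared features: {overlap}/200, Jaccard similarity: {jaccard:.3f}")
```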

  • Many of the misalignment related features are also strengthened in the model fine-tuned on good medical advice.
    • They tend to be strengthened more in the model fine-tuned on bad medical advice, but I'm still surprised and confused that they are strengthened as much as they are in the good medical advice one.
    • One loose hypothesis (with extremely low confidence) is that these "bad" features are generally very suppressed in the original chat model, and so any sort of fine-tuning will uncover them a bit.

Yes, this seems consistent with some other results (e.g. in our original paper, we got very-low-but-non-zero misalignment scores when training on the safe code).
A slightly different framing could be: finetuning on some narrow task generally makes the model dumber (e.g. you got lower coherence scores in the model trained on good medical advice), and one of the effects is that it's also dumber with regard to "what is the assistant supposed to do".

Posts

  • Was Barack Obama still serving as president in December? (115 karma, 15d, 14 comments)
  • Concept Poisoning: Probing LLMs without probes (59 karma, 2mo, 5 comments)
  • Backdoor awareness and misaligned personas in reasoning models (34 karma, 3mo, 8 comments)
  • OpenAI Responses API changes models' behavior (53 karma, 6mo, 6 comments)
  • Are there any (semi-)detailed future scenarios where we win? [Question] (15 karma, 6mo, 3 comments)
  • Jan Betley's Shortform (5 karma, 6mo, 38 comments)
  • Finding Emergent Misalignment (26 karma, 6mo, 0 comments)
  • Open problems in emergent misalignment (83 karma, 7mo, 17 comments)
  • Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (330 karma, 7mo, 92 comments)
  • Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs (109 karma, 1y, 37 comments)