LESSWRONG
All of Chipmonk's Comments + Replies

Announcing EXP: Experimental Summer Workshop on Collective Cognition

Chipmonk7d20

what are the examples of the curriculum/activities you're considering/planning?

Announcing EXP: Experimental Summer Workshop on Collective Cognition

Chipmonk9d20

@Ivan Vendrov think you'd be interested

Self-fulfilling misalignment data might be poisoning our AI models

Chipmonk22d50

Can you think of examples like this in the broader AI landscape? - What are the best examples of self-fulfilling prophecies in AI alignment?

3Mateusz Bagiński21d

Emergence of utility-function-consistent stated preferences in LLMs might be an example (0.1<p<0.6) though going from reading stuff on utility functions to the kind of behavior revealed there requires more inferential steps than going from reading stuff on reward hacking to reward hacking.

Examples of self-fulfilling prophecies in AI alignment?

Answer by ChipmonkMar 03, 202560

https://x.com/sama/status/1621621724507938816

Examples of self-fulfilling prophecies in AI alignment?

Answer by ChipmonkMar 03, 202540

Situational Awareness and race dynamics? h/t Jan Kulveit @Jan_Kulveit

Examples of self-fulfilling prophecies in AI alignment?

Answer by ChipmonkMar 03, 202560

Training on Documents About Reward Hacking Induces Reward Hacking

Do clients need years of therapy, or can one conversation resolve the issue?

Chipmonk24d40

I don't feel like I learned anything new from the post.

This surprises me! Wait so-

The "How does one-shotting happen?" section didn't have anything interesting for you? (Have you seen stuff like this elsewhere?)
Did you already know one-shotting was possible?

2niplav23d

"One-shotting is possible" is a live hypothesis that I got from various reports from meditation traditions. I do retract "I learned nothing from this post", the "How does one-shotting happen" section is interesting, and I'd like it to be more prominent. Thanks for poking, I hope I'll find the time to respond to your other comment too.

Do clients need years of therapy, or can one conversation resolve the issue?

Chipmonk24d20

since your bullet-point list in the beginning isn't detailed enough for anyone to try to replicate the method.

Wait I'm confused- this is not the purpose of the post

Also notable is that you only have positive examples for your method

The purpose of this post is not advertisement. It's to discuss one-shots

Especially, how would you be able to distinguish between your approach convincing your customers they were helped, instead of actually changing their behavior?

See above

Do clients need years of therapy, or can one conversation resolve the issue?

Chipmonk24d*40

Would anyone like to help me edit a better version of this?

Do clients need years of therapy, or can one conversation resolve the issue?

Chipmonk24d20

Oh I like "patients" ("clients"). I'll think about the rest, thanks. I'm just not sure how to write anything useful and legible without talking about my own experience and what I have the most data for?

Also I see the point of your last bullet where "my business" is the subject hm

Do clients need years of therapy, or can one conversation resolve the issue?

Chipmonk24d20

any suggestions for how to talk about this stuff without having it read like an advertisement? i'm genuinely interested in the idea of one-shotting and legibilizing evidence that quick growth is possible

0niplav24d

I gave your post to Claude and gave it the prompt "Dearest Claude, here's the text for a blogpost I've written for LessWrong. I've been told that "it sounds a lot like an advertisement". Can you give me feedback/suggestions for how to improve it for that particular audience? I don't want to do too much more research, but a bit of editing/stylistic choices." (All of the following is my rephrasing/rethinking of Claude output plus some personal suggestions.) Useful things that came out of the answer were explaining more about the method you've used to achieve this, since your bullet-point list in the beginning isn't detailed enough for anyone to try to replicate the method. Also notable is that you only have positive examples for your method, which activates my filtered evidence detectors. Either make clear that you indeed did only have positive results, or name how many people you coached, for how long, and that they were all happy with what you provided. Finally, some direct words from Claude that I just directly endorse: Especially, how would you be able to distinguish between your approach convincing your customers they were helped, instead of actually changing their behavior? That feels like the failure mode of most self-help techniques—they're "self-recommending".

5ROM24d

Hey Chris! I have a few thoughts on this, though I have strong anti-advertising sentiments and might be overly sensitive to these things, so take it with a grain of salt.

The title sounds a little clickbaity. It's directed at the reader. The title "Do patients need years of therapy, or can one conversation resolve their issue?" is functionally identical, but feels less like an advert.

The opening reads somewhat like a common advert tactic: "I hated how business did [thing x] since it was bad for the customer, so I started my practice by doing [thing y] which is both more appealing to a potential customer and delivers better results!".

I think the advertising vibe might also come from the continued references to your personal practice / mentions of its successes:
* "So when I started my business, I made payment contingent on results:"
* "Our clients are often surprised at how we do things because it’s so different than the therapy or other coaching they’ve done before:"
* "Several of my clients have resolved lifelong issues like anxiety in one shot"
* "My business is expanding to help more people in deeper and more efficient ways."

Finally, it concludes with a link to where people can schedule a call with you.

Invest in ACX Grants projects!

Chipmonk25d20

any updates on how this is going btw? (doing retroactive funding research)

Prizes for ML Safety Benchmark Ideas

Chipmonk1mo20

what came of this? (doing research on bounties, prizes, and retroactive funding rn)

MichaelDickens's Shortform

Chipmonk1mo20

~~fwiw, FABRIC was able to get funding in~~ ~~November 2024~~ ~~(who knows if this date is correct though)~~

nvm this was an "exit grant" lmao

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

Chipmonk1mo80

Now that this is "over", I'd be fascinated to see a post about what the fundraising process was like for you and what can be learned. Seems like a big L for retroactive funding for example

https://x.com/ohabryka/status/1882579367110586459

3DaneelO1mo

And related to that thread: how does one find out about how to donate when there is no fundraiser? I cannot find any info on the About page or FAQ page. If someone wants to donate in a couple of months when this post is not as visible, how will they find the donation link? I don’t know if adding a donation link to FAQ and About will make much of a difference in practice. I suspect it won’t since that depends on people more spontaneously realizing they want to donate. But it seems pretty relevant to the complaint raised in the tweet thread that people only donate when you do large dramatic calls for funding. I think it wouldn’t hurt to lessen the friction and make it easier to find out how to donate.

We probably won't just play status games with each other after AGI

Chipmonk2mo48

Aside: I'm surprised you're suggesting people get validation --> people feel secure ? This does not at all seem like the causality to me (though I'm aware most people probably think like this).

Prediction: In the absence of radically improved psychotechnology, a significant fraction of people will always find a way to feel insecure.

2Kaj_Sotala2mo

Patterns of emotional security/insecurity are constantly updating (in both directions) through life, though some people's patterns are more resistant (in either direction) than those of others. (This is both my own personal experience and the empirical finding in the literature.) In the insecure -> secure direction, positive experiences as an adult can help reconsolidate negative expectations and provide the kinds of experiences that naturally securely attached people already got earlier: (How can I become more secure?: A grounded theory of earning secure attachment; Olufowote, Fife & Whiting 2019) That said, it's true that the stronger someone's insecure attachment is, the more resistant it is to updating through positive experiences: (Attachment Disturbances in Adults, p. 99-100) Of course, one consideration is also that people with insecure attachment tend to bring various patterns into their relationships that make the other person more likely to respond negatively, making it harder to get the positive experiences that would update the attachment patterns. An AI with infinite patience and understanding that never got triggered would be different in this regard, so might be able to provide corrective experiences for even some of the people who wouldn't normally be capable of changing when dealing with just humans. I would guess/hope that most people's degree of emotional insecurity would be such that they would be able to find security with AIs (especially if the AIs also doubled as expert therapists). With only the most extremely insecure people (e.g. some of the ones who would qualify for a diagnosis of a personality disorder) needing novel psychotech - but of course I can only speculate at this point.

Began a pay-on-results coaching experiment, made $40,300 since July

Chipmonk2mo20

hmm i suspect releasing these metrics could make my customers significantly more annoying. like, early adopters are fun and experimental. but if i make it seem not risky then i get risk-averse people who tend to be prickly

so maybe i will compile and release this data but i would need to figure out how to do it in a way that doesn't change the funnel

Increasing IQ is trivial

Chipmonk2mo40

Any updates on this?

Increasing IQ by 10 Points is Possible

Chipmonk2mo20

Any updates on this?

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

Chipmonk2mo20

I wonder if you could set up a conditional donation? “I donate $X, minus if total donations exceed $3M"

Began a pay-on-results coaching experiment, made $40,300 since July

Chipmonk3mo30

i like this thanks. might take a bit of time to put together but interested

Began a pay-on-results coaching experiment, made $40,300 since July

Chipmonk3mo30

made some light edits because of this comment, thanks

Began a pay-on-results coaching experiment, made $40,300 since July

Chipmonk3mo40

oh ok i might start doing that. knowing my calibration on that would be nice

Began a pay-on-results coaching experiment, made $40,300 since July

Chipmonk3mo20

oh ok hm. i also don't want to be incentivized to not give easy-for-me help to people with low odds of success though

4pandamonium3mo

Disclaimer: I would not pay, or want to pay, that much money anyway, so I am not your intended audience.

I'd trust you more (and I think members of the rationalist community would too) if you gave several metrics, even if some of them are not so good, with explanations. Right now, it seems you chose a metric so that it looks good. More metrics would take more time, but not much if you have the data easily available.

This would be my suggestion: provide three percentages (like when one provides three quantiles instead of just the mean of the data values):
* the percentage of success among people you talked with for at least an hour
* the percentage among the people with reasonable chances of success (motivated + didn't bail + your expertise + spent at least X hours)
* the percentage among people with great chances of success.

These percentages, with precise information on what determines which category clients fall into and the percentage of people treated who fall into each category, would give a first sound idea of the success rate. Taking on low-success-rate people would not be a problem because their data is treated separately. It's only a problem if 90% of your clients are unlikely to be helped, but that would not be a good thing anyway.
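
The per-category breakdown suggested in this comment can be sketched in a few lines of Python. This is a minimal illustration, not anyone's actual bookkeeping: the category names, the `(category, succeeded)` record format, and the sample data are all hypothetical, and it assumes each client is placed in exactly one category.

```python
from collections import defaultdict


def success_rates(clients):
    """Summarise outcomes per category.

    `clients` is a list of (category, succeeded) pairs (a hypothetical format).
    Returns {category: (success_pct, share_of_all_clients_pct)}.
    """
    totals = defaultdict(int)  # number of clients per category
    wins = defaultdict(int)    # number of successful clients per category
    for category, succeeded in clients:
        totals[category] += 1
        if succeeded:
            wins[category] += 1

    n = len(clients)
    return {
        category: (
            100.0 * wins[category] / totals[category],  # success rate within the category
            100.0 * totals[category] / n,               # share of all clients in the category
        )
        for category in totals
    }


# Made-up records, purely for illustration.
example_clients = [
    ("talked_at_least_an_hour", False),
    ("talked_at_least_an_hour", True),
    ("reasonable_chances", True),
    ("reasonable_chances", False),
    ("great_chances", True),
    ("great_chances", True),
]

for category, (success_pct, share_pct) in success_rates(example_clients).items():
    print(f"{category}: {success_pct:.0f}% success, {share_pct:.0f}% of clients")
```

Reporting the share of clients alongside each success rate is what keeps low-odds clients from dragging down a single headline number: their results show up in their own row.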

Began a pay-on-results coaching experiment, made $40,300 since July

Chipmonk3mo20

could you give a few examples?

also seems time-intensive hmmmm

also, i thought about it more and i really like the metric of "results generated per hour"

1gw3mo

I think you've already given several examples: It would already be informative if you put numbers on each of these questions (i.e. "how often does talking for 15 minutes accomplish something", "how many bounties have you taken on in/outside of your specialty", "what percent of your clients are 'unagentic and slow' (and what does this actually mean)"). Probably one could do much better by generating several metrics that one would expect to be most useful (or top N%tile useful) and share each of them.

Began a pay-on-results coaching experiment, made $40,300 since July

Chipmonk3mo20

:D i really hope bounties catch on

Began a pay-on-results coaching experiment, made $40,300 since July

Chipmonk3mo22

wow this is controversial (my own vote is +6)

wonder why

DirectedEvolution3mo118

I upvoted for the novelty of a rationalist trying a bounty based career. But also this halfway reads as an advertisement for your life coaching service. I wouldn’t want to see much more in that direction.

Shallow review of technical AI safety, 2024

Chipmonk3mo30

boundaries / membranes
One-sentence summary: Formalise one piece of morality: the causal separation between agents and their environment. See also Open Agency Architecture.
Theory of change: Formalise (part of) morality/safety, solve outer alignment.

Chris Lakin here - this is a very old post and What does davidad want from «boundaries»? should be the canonical link

Orienting to 3 year AGI timelines

Chipmonk3mo30

Why SPY over QQQ?

The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228)

Chipmonk3mo20

available on the website at least

Pay-on-results personal growth: first success

Chipmonk3mo20

Update: Bob has recorded a 6-month follow-up here.

Walking Sue

Chipmonk3mo10

Why was this post tagged as boundaries/membranes? I'm inclined to remove the tag.

1Matthew McRedmond3mo

I only skimmed that category but if I'm not mistaken the kind of systems I describe in the piece are special cases of times when the boundary between defining agents and one agent and another is unclear/pivotal/insightful etc.

Being Present is Not a Skill

Chipmonk3mo40

makes sense

Sorry for the downtime, looks like we got DDosd

Chipmonk4mo20

works!

Sorry for the downtime, looks like we got DDosd

Chipmonk4mo20

another weird bug is if i click the link i was just sent in my email, it brings me to a 403 Forbidden page (even though the URLs of this functional page and that 403 page look identical)

4habryka4mo

Should now be fixed. We've blocked traffic to basically all pages and been restoring them incrementally to make sure we don't go down again immediately. I just lifted the last of those blocks.

(The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser

Chipmonk4mo258

I've run two workshops at LightHaven and it's pretty unthinkable to run a workshop anywhere else in the Bay Area. Lightcone has really made it easy to run overnight events without setup

Hierarchical Agency: A Missing Piece in AI Alignment

Chipmonk4mo20

Yeah i'm confused about what to name it. we can always change it later i guess.

also let me know if you have any posts you want me to definitely tag for it that you think i might miss otherwise

3Mateusz Bagiński3mo

Compositional agency?

Hierarchical Agency: A Missing Piece in AI Alignment

Chipmonk4mo93

Do we have a LessWrong tag for "hierarchical agency" or "multi-scale alignment" or something? Should I make one?

2Jan_Kulveit4mo

I guess make one? Unclear if hierarchical agency is the true name

Hierarchical Agency: A Missing Piece in AI Alignment

Chipmonk4mo40

I just made a twitter list with accounts interested in hierarchical agency (or what i call "multi-scale alignment"). Lmk who should be added

Hierarchical Agency: A Missing Piece in AI Alignment

Chipmonk4mo40

Random but you might like this graphic I made representing hierarchical agency from my post today on a very similar idea. What would you change about it?

Hierarchical Agency: A Missing Piece in AI Alignment

Chipmonk4mo124

This was an impressive demonstration of Claude for interviews. Was this one take?

(Also what prompt did you use? I like how your Claude speaks.)

Jan_Kulveit4mo140

There was some selection of branches, and one pass of post-processing.

It was after ~30 pages of a different conversation about AI and LLM introspection, so I don't expect the prompt alone will elicit the "same Claude". The start of this conversation was:

Thanks! Now, I would like to switch to a slightly different topic: my AI safety oriented research on hierarchical agency. I would like you to role-play an inquisitive, curious interview partner, who aims to understand what I mean, and often tries to check understanding using paraphrasing, giving examples, and si...

Hierarchical Agency: A Missing Piece in AI Alignment

Chipmonk4mo40

I'm glad you wrote this! I've been wanting to tell others about ACS's research and finally have a good link

Locally optimal psychology

Chipmonk4mo20

Great question, thanks!

I think you're correct in pointing towards the existence of basically-all-downside genetic conditions, but I still think these are in the minority. Moreover, even most of those don't create a big issue on the object level— compared to how people might feel about the issue as a result.

This argument doesn't extend to conditions like Huntington's, but if a person is missing a pinky finger, most of the issues the person is going to face are related to social factors and their own emotions, not the physical aspect.

I also just say this from experience helping others.

Locally optimal psychology

Chipmonk4mo20

I did not say that depression is always a strategy for everyone.

4Archimedes4mo

I didn't mean to suggest that you did. My point is that there is a difference between "depression can be the result of a locally optimal strategy" and "depression is a locally optimal strategy". The latter doesn't even make sense to me semantically whereas the former seems more like what you are trying to communicate.

Which things were you surprised to learn are not metaphors?

Answer by ChipmonkNov 21, 202480

I wrote about my own experience discovering “feelings in the body” here

Social events with plausible deniability

Chipmonk4mo20

Eliezer likes it but lesswrong doesn't

Social events with plausible deniability

Chipmonk4mo30

can someone explain to me why this is so controversial

2Chipmonk4mo

Eliezer likes it but lesswrong doesn't

The hostile telepaths problem

Chipmonk4mo20

What would you say that the main types of power are?

My list (for humans): physical security, financial security, social security, emotional security (this one you can only give yourself though)

1fuli3mo

That’s a complicated question. At an individual level you have value alignment (people who agree with your values) and incentive alignment (people who disagree with your values but do what you want anyway because of incentives). Value alignment is mostly persuasion and having enough of people's attention. Incentive alignment is everything on Maslow's hierarchy. You can reward or penalise others in terms of their physical safety, in terms of food and water, in terms of social approval of family and friends, in terms of providing them meaning in life, etc. (Which is basically the stuff you’re saying.) There’s another lens to look at this, which is: how do you get a lot of leverage over reality? Naval Ravikant cites three forms of leverage: labour, capital, and anything that replicates at zero cost on the internet. There’s more nuance to this, but at a high level I agree: having a lot of people who will listen to you is power, having a lot of money is power, and publishing information/code/media/games/etc. that affect millions of lives is power.

Ayn Rand’s model of “living money”; and an upside of burnout

Chipmonk4mo2-1

In other cases, or for other reasons, they might be instead set up to demand results, and evaluate primarily based on results.

Why might it be set up like that? Seems potentially quite irrational. Veering into motivated reasoning territory here imo

7Viliam4mo

Parts of human mind are not little humans. They are allowed to be irrational. It can't be rational subagents all the way down. Rationality itself is probably implemented as subagents saying "let's observe the world and try to make a correct model" winning a reputational war against subagents proposing things like "let's just think happy thoughts". But I can imagine how some subagents could have less trust towards "good intentions that didn't bring actual good outcomes" than others. For example, if you live in an environment where it is normal to make dramatic promises and then fail to act on them. I think I have read some books long ago claiming that children of alcoholic parents are often like that. They just stop listening to promises and excuses, because they have already heard too many of them, and they learned that nothing ever happens. I can imagine that they turn this habitual mistrust against themselves, too. That "I tried something, and it was a good idea, but due to bad luck it failed" resembles too much the parent saying how they had the good insight that they need to stop drinking, but only due to some external factor they had to drink yet another bottle today. Shortly, if your environment fails you a lot, as a response you can become unrealistically harsh on yourself. Another possible explanation is that different people's attention is focused on different places. Some people pay more attention to the promises, some pay more attention to the material results, some pay more attention to their feelings. This itself can be a consequence of the previous experience with paying attention to different things.

2Selfmaker6624mo

I wouldn’t say the subconscious calibrating on more substantial measures of success, such as “how happy something made me” or “how much status that seems to have brought”, is irrational. What you’re proposing, it seems to me, is calibrating only on how good of an idea it was from the predictor part / System 2. Which gets calibrated, I would guess, when the person analyses the situation? But if System 2 is sufficiently bad, calibrating on pure results is a good idea to shield against pursuing some goal, the pursuit of which yields nothing but evaluations by System 2 that the person did well. Which is bad, if one of the end goals of the subconscious is “objective success”. For example, a situation I could easily imagine myself to have been in: Every day I struggle to go to bed, because I can’t put away my phone. But when I do, at 23:30, I congratulate myself: it took a lot of effort, and I did actually succeed in giving myself enough time to sleep almost long enough. If I didn’t recalibrate rationally, and “me-who-uses-internal-metrics-of-success” were happy with good effort every day, I’d keep doing it. All while real me would get fed up soon, and get a screen blocker app to turn on at 23:00 every day to sleep well every day at no willpower cost. (+- the other factors and supposing phone after 23 isn’t very important for some parts of me)

Ayn Rand’s model of “living money”; and an upside of burnout

Chipmonk4mo30

Maybe, but that also requires that the other group members were (irrationally) failing to consider that the “attempt could've been good even if the luck was bad”.

In human groups, people often do gain (some) reputation for noble failures (is this wrong?)

4abramdemski4mo

In machine-learning terms, this is the difference between model-free learning (reputation based on success/failure record alone) and model-based learning (reputation can be gained for worthy failed attempts, or lost for foolish lucky wins).
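
A toy sketch of the distinction drawn in this comment, in Python. It only illustrates the analogy and is not a reinforcement-learning algorithm: the function names, the fixed step of 1, and the idea that an observer can directly judge whether an attempt was sound are all assumptions made for the example.

```python
def model_free_update(reputation, succeeded, step=1.0):
    """Reputation tracks the success/failure record alone."""
    return reputation + (step if succeeded else -step)


def model_based_update(reputation, attempt_was_sound, step=1.0):
    """Reputation tracks the judged quality of the attempt, whatever the outcome."""
    return reputation + (step if attempt_was_sound else -step)


# A noble failure: a sound attempt that did not work out.
noble_failure = {"succeeded": False, "attempt_was_sound": True}
# A foolish lucky win: an unsound attempt that happened to succeed.
lucky_win = {"succeeded": True, "attempt_was_sound": False}

for name, case in [("noble failure", noble_failure), ("lucky win", lucky_win)]:
    free = model_free_update(0.0, case["succeeded"])
    based = model_based_update(0.0, case["attempt_was_sound"])
    print(f"{name}: model-free {free:+.0f}, model-based {based:+.0f}")
```

The two rules disagree only in exactly the cases the thread is about: attempts whose quality and outcome come apart.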

7Lorxus4mo

Sure - I can believe that that's one way a person's internal quorum can be set up. In other cases, or for other reasons, they might be instead set up to demand results, and evaluate primarily based on results. And that's not great or necessarily psychologically healthy, but then the question becomes "why do some people end up one way and other people the other way?" Also, there's the question of just how big/significant the effort was, and thus how big of an effective risk the one predictor took. Be it internal to one person or relevant to a group of humans, a sufficiently grand-scale noble failure will not generally be seen as all that noble (IME).