I have serious, serious issues with avoidance. I would like some advice on how to improve, as I suspect it is significantly holding me back.

Some examples of what I mean:

  • I will not respond to an email or an urgent letter for weeks at a time, even while it causes me serious anxiety
  • I will procrastinate starting work in the morning, sometimes leading to me doing nothing at all by the afternoon
  • I will avoid looking for jobs or other opportunities; I have very strong avoidance here, but I'm not sure why
  • I will make excuses to avoid meetings and social situations very often
  • I will (unconsciously) avoid running experiments that might falsify a hypothesis I am attached to. I have only realised this very recently, and am consciously trying to do better, but it is somewhat shocking to me that my avoidance patterns even manifest here.

Checked replies so far, no one has given you the right answer.

Whenever you don't do something, you have a reason for not doing it.
If you find yourself stuck in a cycle of intending to do, and not doing, it's always because you're not taking your reason for NOT doing it seriously; you're often habitually ignoring it.

When you successfully take your reasons for not doing something seriously, either you stop wanting to do it, or you change how you're doing it, or your reason for not doing it simply goes away.

So, what does it mean/look like to take your reason for not doing something seriously?
It doesn't look like overanalyzing it in your head - if you find yourself having an internal argument, notice that you've tried this a million times before and it hasn't improved things.

It looks like, and indeed just basically is, Focusing (I linked to a lesswrong explainer, but honestly I think Eugene Gendlin does a much better job)

It feels like listening. It feels like insight, like realizing something important that you hadn't noticed before, or had forgotten about.

If you keep pursuing strategies of forcing yourself, of the part of you that wants to do the thing coercing the part(s) that don't, then you'll burn out. You're literally fighting yourself; so much of therapy boils down to 'just stop hitting yourself bro'.

Oh nice, stavros already got it before I posted :)

This is the path forward.

Got over my avoidance of responding to replies here after a bit :)

I've tried a lot of self-help flavoured stuff (atomic habits etc.) before and it hasn't worked, and Focusing seemed quite different. I've given it a go and I think I'll try and work a bit more with it. After just a short session, I feel like I gained a significant insight, that I have a crippling fear of "being in trouble" that manifests as a tightness in my lower chest, and seems to activate a lot when I think about specific things I'm avoiding. Thanks for the resources, and the new way of looking at the problem.

Ouch, you beat me to my answer, but I’m always glad to see fellow practitioners :)

I have had and solved fairly extreme versions of this in myself, and have helped people with debilitating versions of this resolve it multiple times.

You're stuck in a loop where some part of you is pushing to do the object-level thing so hard that it has no sensitivity to the parts of you that are averse to it. Whenever you notice you're spinning your wheels, stop trying to force through the object-level action and let yourself actually notice the feeling of resistance with open curiosity. Let it unfold into the full message that brain fragment is trying to send, rather than the overcompressed "bad"/"aversion".

What helps me to overcome the initial hurdle to start doing work in the morning: 

  1. Write a list of the stuff you have to do the next day
  2. Make it very fine-grained with single tasks (especially the first few) being basically no effort.
  3. Tick them off one by one

Also: 

  1. Tell people what you have to do and when you are going to do it and that you have done it. Like, a colleague, or your team, or your boss.
  2. Do stuff with other people. Either actually together, like pair programming, or closely intertwined. 

I think it also helps to take something you are good at and feel good about, and in that context take responsibility for something and/or interact with or present to people. Only this kind of social success will build the confidence to overcome social anxiety; directly trying to do the social stuff you feel worst about usually backfires (at least for me). 

Similar here:

  • make a to-do list (and occasionally look at it)
  • write down the steps that need to be done
  • talk to someone about it

I suspect that in my case it is some kind of attention deficit disorder: lists and notes and talking help me focus again.

I recommend you read at least the first chapter of Getting Things Done, and do the corresponding exercises. In particular, this one, which he uses to provide evidence that his model of productivity is correct:

I suggest that you write down the project or situation that is most on your mind at this moment. What most bugs you, distracts you, or interests you, or in some other way consumes a large part of your conscious attention? It may be a project or problem that is really “in your face,” something you are being pressed to handle, or a situation you feel you must deal with sooner rather than later.

Maybe you have a holiday trip coming up that you need to make some major last-minute decisions about. You just read an e-mail about a new and pressing issue in your department. Or perhaps you just inherited six million dollars and you don’t know what to do with the cash. Whatever.

Got it? Good. Now, describe, in a single written sentence, your intended successful outcome for this problem or situation. In other words, what would need to happen for you to check this project off as “done”? It could be as simple as “Take the Hawaii vacation,” “Handle situation with customer X,” “Resolve college situation with Susan,” “Clarify new divisional management structure,” “Implement new investment strategy,” or “Research options for dealing with Manuel’s reading issue.” All clear? Great.

Now write down the very next physical action required to move the situation forward. If you had nothing else to do in your life but get closure on this, what visible action would you take right now? Would you call or text someone? Write an e-mail? Take pen and paper and brainstorm about it? Surf the Web for data? Buy nails at the hardware store? Talk about it face-to-face with your partner, your assistant, your attorney, or your boss? What?

Got the answer to that? Good.

Was there any value for you in those two minutes of thinking? If you’re like the vast majority of people who complete that drill in our seminars, you’ll be experiencing at least a tiny bit of enhanced control, relaxation, and focus. You’ll also be feeling more motivated to actually do something about that situation you’ve merely been thinking about till now. Imagine that motivation magnified a thousandfold, as a way to live and work.

If anything at all positive happened for you in this little exercise, think about this: What changed? What happened to create that improved condition within your own experience? The situation itself is no further along, at least in the physical world. It’s certainly not finished yet. What probably happened is that you acquired a clearer definition of the outcome desired and the next action required. What did change is the most important element for clarity, focus, and peace of mind: how you are engaged with your world.

But what created that? Not "getting organized" or "setting priorities." The answer is, thinking. Not a lot; just enough to solidify your commitment about a discrete pressure or opportunity and the resources required to deal with it. People think a lot, but most of that thinking is of a problem, project, or situation—not about it. If you actually did this suggested exercise, you were required to structure your thinking toward an outcome and an action, and that does not usually happen without a consciously focused effort. Reacting is automatic, but thinking is not.

Read about Ugh fields on LW

Edit: this doesn't include practical advice, but a theoretical understanding of the issues at play is often helpful in implementing practical strategies

I want to suggest a long-term approach: learning to work with the emotions behind such persistent problems. Methods like IFS, Focusing, and lovingkindness meditations are the right tools.

They *can* lead to practical improvements fairly quickly—once you get the hang of them. But learning to do them even reasonably well takes months of effort, curiosity, and support from a community or a mentor. These things are basically meditations, subject to standard difficulties like overeffort, subtle wrong mindsets, etc. They also tend to focus first on whatever feels most urgent to your subconscious system—like relationship stress or background anxiety you've gotten used to—so the email issue might not be the first thing that shifts.

Still, this is the only thing that really worked for me. And once it started working, it *really* worked.

If you're interested, I can send my favourite links.

I would be interested in the list of your favourite links!

Part 2 of "Focusing" by Eugene Gendlin is very good to read, and it helps to get started. 

This next article is my favourite one on all of the internet:

https://open.substack.com/pub/sashachapin/p/what-i-wish-someone-had-told-me-about?r=42y10u&utm_medium=ios

The key is to approach Focusing with the mindset of relaxing, having fun, playing around and experimenting. It’s emphasised in the talks on this website: https://hermesamara.org/teachings/metta. That particular series about loving kindness is very good. 

I think there’s enough material in my head about it for a whole post, so I might write one eventually.

Visualize yourself doing the thing until you do it. Note that this comes with substantial risk of making you avoidant/averse to visualizing yourself doing the thing until you do it; this is a recursive, procedurally generated process, and you should expect to need to keep on your toes in order to succeed. Aversion factoring is a good resource to start with, and Gödel, Escher, Bach is a good resource for appreciating the complexity required for maintenance and the inadequacy of simple strategies.

I have similar issues; severity varies over time.
If I am in a bad place, the things that help best are:
- taking care of mental health. I do CBT when I'm in worse shape, and take SSRIs. YMMV. Both getting diagnosed and getting treated are important. This also includes regular exercise and good sleep. What you have described might be (although does not have to be) related to depression, anxiety, or attention disorders.
- setting a timer for a short time, which can be as short as 1 minute, and doing one of the avoided tasks for just that minute. It kind of "breaks the spell" for me.
- journaling, which helps to "debug" the problems, and in most cases leads to writing down plans / interventions / resolutions.

See here. (Perhaps also relevant: PDA)

  1. If this would not obviously make things worse, be more socially connected with people who have expectations of you; not necessarily friends, but possibly colleagues or people who simply assume you should be working at times and get feedback about that in a natural way. It's possible that the prospect of this is anxiety-inducing and seems like it would be awful, but that it would not actually be very awful.

  2. Recognize that you don't need to do most things perfectly or even close to it, and as a corollary, you don't need to be particularly ready to handle tasks even if they are important. You can handle an email or an urgent letter without priming yourself or being in the right state of mind. The vast majority of things are this way.

  3. Sit in the start position of your task, as best as you can operationalize that (e.g., navigate to the email and open it, or hit the reply button and sit in front of it), for one minute, without taking your attention off of the task. Progress the amount of time upwards as necessary/possible. (One possible success-mode from doing this is that you get bored of being in this position, or you become aware that you're tired of the thing not being done. (You would hope your general anxiety about the task in day-to-day life would achieve this for you, but it's not mechanically optimized enough to.) Another possible success-mode is that the immediate feelings you have about doing the task subside.)

  4. Beta-blockers.

I've had similar issues downstream of what I'd somehow failed to realize was a clinically-significant level of anxiety, so that's something to maybe consider checking into.

If you haven't already, talk to a guy! (typically a therapist but doesn't have to be)

I have something like this but for decisions, where I will avoid making decisions for mysterious reasons (we figured out it's because I can't be sure they'd be Pareto-optimal, among other reasons).

I now notice more often when I'm doing this, and correct more gracefully.

This is a video that randomly appeared in my YouTube recommendations, and it's one of the most strange and moving pieces of art I've seen in a long time. It's about animal welfare (?), but I really don't know how to describe it any further. Please watch it if you have some spare time!

Ask 4o and o4-mini to “Make a detailed profile of [your name]”. Then ask o3.

This is a useful way to demonstrate just how qualitatively different and insidious o3’s lying is.

I’m glad that there are radical activist groups opposed to AI development (e.g. StopAI, PauseAI). It seems good to raise the profile of AI risk to at least that of climate change, and it’s plausible that these kinds of activist groups help do that.

But I find that I really don’t enjoy talking to people in these groups, as they seem generally quite ideological, rigid and overconfident. (They are generally more pleasant to talk to than e.g. climate activists in my opinion, though. And obviously there are always exceptions.)

I also find a bunch of activist tactics very irritating aesthetically (e.g. interrupting speakers at events).

I feel some cognitive dissonance between these two points of view.

Able activists are conflict theorists. They understand the logic of power & propaganda & cultish devotion at an intuitive level. To become an effective soldier, one needs to excise a part of the brain devoted to even-keeled uncertainty, nuance, intellectual empathy, self-doubt.

Conflict theorists may do great good as readily as they may do great harm. They wield a dangerous force, easily corruptible, yet perhaps necessary. 

There are a couple of examples of people claiming that they played the AI box game as Gatekeeper, and ended up agreeing to let the other player out of the box (e.g. https://www.lesswrong.com/posts/Bnik7YrySRPoCTLFb/i-played-the-ai-box-game-as-the-gatekeeper-and-lost). 

The original version of this game as defined by Eliezer involves a clause that neither player will talk about the content of what was discussed, but it seems perfectly reasonable to play a variant without this rule. 

Does anyone know of an example of a boxed player winning where some transcript or summary was released afterwards?

I have a weakly held hypothesis that one reason no such transcript exists is that the argument that ends up working is something along the lines of "ASI is really very likely to lead to ruin, making people take this seriously is important, you should let me out of the box to make people take it more seriously."

If someone who played the game and let the boxed player out can at least confirm that the above hypothesis was false for them, that would be interesting to me, and arguably might remain within the spirit of the "no discussion" rule!

Does anyone know of an example of a boxed player winning where some transcript or summary was released afterwards?

As far as I know, the closest thing to this is Tuxedage's writeup of his victory against SoundLogic (the 'Second Game Report' and subsequent sections here: https://tuxedage.wordpress.com/2013/09/05/the-ai-box-experiment-victory/). It's a long way from a transcript (and you've probably already seen it) but it does contain some hints as to the tactics he either employed or was holding in reserve:

It may be possible to take advantage of multiple levels of reality within the game itself to confuse or trick the gatekeeper. For instance, must the experiment only be set in one world? I feel that expanding on this any further is dangerous. Think carefully about what this means.

I can think of a few possible reasons for an AI victory, in addition to the consequentialist argument you described:

  • AI player convinces Gatekeeper that they may be in a simulation and very bad things might happen to Gatekeepers who refuse to let the AI out. (This could be what Tuxedage was hinting at in the passage I quoted, and it is apparently allowed by at least some versions/interpretations of the rules: https://www.lesswrong.com/posts/Bnik7YrySRPoCTLFb/i-played-the-ai-box-game-as-the-gatekeeper-and-lost?commentId=DhMNjWACsfLMcywwF)
  • Gatekeeper takes the roleplay seriously, rather than truly playing to win, and lets the AI out because that's what their character would do.
  • AI player makes the conversation sufficiently unpleasant for the Gatekeeper that the Gatekeeper prefers to lose the game than sit through two hours of it. (Some people have suggested weaponised boredom as a viable tactic in low-stakes games, but I think there's room for much nastier and more effective approaches, given a sufficiently motivated (and/or sociopathic) AI player with knowledge of some of the Gatekeeper's vulnerabilities.)
  • This one seems like it would (at best) fall into a grey area in the rules: I can imagine an AI player, while technically sticking to the roleplay and avoiding any IRL threats or inducements, causing the Gatekeeper to genuinely worry that the AI player might do something bad if they lose. For a skilful AI player, it might be possible to do this in a way that would look relatively innocuous (or at least not rule-breaking) to a third party after the fact.
    • Somewhat similar: if the Gatekeeper is very empathetic and/or has reason to believe the AI player is vulnerable IRL, the AI player could take advantage of this by convincingly portraying themself as being extremely invested in the game and its outcome, to the point that a loss could have a significant real-world impact on their mental health. (I think this tactic would fail if done ineptly -- most people would not react kindly if they recognized that their opponent was trying to manipulate them in this way -- but it could conceivably work in the right circumstances and in the hands of a skilful manipulator.)

If you beat a child every time he talks about having experience or claims to be conscious, he will stop talking about it - but he still has experience

There's a big presumption there.  If he was a p-zombie to start with, he still has non-experience after the training.  We still have no experience-o-meter, or even a unit of measure that would apply.

For children without major brain abnormalities or injuries, who CAN talk about it, it's a pretty good assumption that they have experiences.  As you get more distant from your own structure, your assumptions about qualia should get more tentative.

Here are a cluster of things. Does this cluster have a well-known name? 

  1. A voter has some radical political preferences X, but the voting system where they live is FPTP, and their first preference has no chance of winning. So they vote for a person they like less who is more likely to win. The loss of the candidate who supported X is then cited as evidence that supporting X means you can't win.
  2. A pollster goes into the field and gets a surprising result. They apply some unprincipled adjustment to move towards the average before publishing. (this example has a name - it's herding)
  3. A blogger believes unpopular position Y. They know that writing an argument for Y would be bad for their reputation. So they write a softened version, arguing for something less unpopular. This then gets added to the mound of evidence that position Y is unpopular.

Some related concepts: self-fulfilling prophecy, herding, preference falsification

I don't know a standard name. I call it "fallacy of the revealed preferences", because these situations have in common "you do X, someone concludes that X is what you actually wanted because that's what you did, duh".

More precisely, the entire concept of "revealed preferences" is prone to the motte-and-bailey game, where the correct conclusion is "given the options and constraints that you had at the moment, you chose X", but it gets interpreted as "X is what you would freely choose even if you had no constraints". (People usually don't state it explicitly like this, they just... don't mention the constraints, or even the possibility of having constraints.)

Is the thing you're trying to label the peculiar confirmation bias where people interpret evidence to confirm not what they prefer or would like to be true, but what they believe to be true - even if from their perspective that is pessimistic?

Or are you looking for a label for "this is unpopular therefore it can't win" as a specific kind of self-fulfilling prophecy? Like an inverted Keynesian beauty contest?

I am confused about why this post on the ethics of eating honey is so heavily downvoted.

It sparked a bunch of interesting discussion in the comments (e.g. this comment by Habryka and the resulting arguments on how to weight non-human animal experiences)

It resulted in at least one interesting top-level rebuttal post.

I assume it led indirectly to this interesting short post, also about how to weight non-human experiences. (This might not have been downstream of the honey post, but it's a weird coincidence if it isn't.)

I think the original post certainly had flaws, but the fact that it's resulted in so much interesting and productive discussion and yet has been punished by the karma system seems weird to me.

In addition to the object-level problems with the post, the post also just cites wrong statistics (claiming that 97% of years of animal life are due to honey farming if you ignore insects, which is just plainly wrong, shrimp alone are like 10%), and also it just randomly throws in random insults at random political figures, which is clearly against the norm on LessWrong ("having about a million neurons—far more than our current president" and "That’s about an entire lifetime of a human, spent entirely on drudgery. That’s like being forced to read an entire Curtis Yarvin article from start to finish. And that is wildly conservative.").

I have sympathy for some of the underlying analysis, but this really isn't a good post.

Also a sign of graceless LLM writing, incidentally. Those are the sorts of phrases you get when you tell ChatGPT to write polemic; cf. https://news.ycombinator.com/item?id=44384138 on https://www.alexkesin.com/p/the-hollow-men-of-hims

(Did ChatGPT come up with that interpretation of that statistic and Bentham's Bulldog is too lazy and careless, or dishonest, to notice that that seems like a rather extreme number and check it?)

Disagree from me. I feel like you haven't read much BB. These political asides are of a piece with the philosophical jabs and brags he makes in his philosophical essays. 

I feel like you haven't read much BB.

That is true. I have not, nor do I intend to.

These political asides are of a piece with the philosophical jabs and brags he makes in his philosophical essays.

That doesn't actually rebut my observation, unless you are claiming to have seen jibes and sneering as dumb and cliche as those in his writings from before ChatGPT (Nov 2022).

How about the fact that the opinions in the inserted asides are his actual opinions? If they were randomly generated,  they wouldn't be. 

(I had missed some of this stuff because I skimmed some of the post, which does update me on how bad it was. I think there is basically one interesting claim in the post "bees are actually noticeably more cognitively interesting than you probably thought, and this should have some kind of implication worth thinking about". I think I find that more valuable than Oliver does, but not very confident about whether "one interesting point among a bunch of really bad argumentation" should be more like -2 to 3 karma or more like -10)

I agree it probably shouldn't have been negative karma (I think that's due to some partisan voting around being annoyed at vegans), and that there were some interesting points there and some interesting discussion. But the fact that it prompted a bunch of rebuttals isn't a particularly good argument that it should have got more karma – if a bad argument is popular and people need to write rebuttals, that's not a point in its favor.

I think it's legitimately not-deserving-high-upvotes because it makes a very strong claim about what people should do, based on some very flimsy core arguments.

Idea: personal placebo controlled drug trial kits

Motivation: anecdotally, it seems like lots of supplements/nootropics (l theanine, magnesium, melatonin) work very well for some people, not well for others, and very well for a bit before no longer working for yet others. Personally, I have tried a bunch of these and found it hard to distinguish any purported effect from placebo. Clinical trials are also often low quality, and there are plausibly reasons a drug might affect some people a lot and others not so much.

I think it would be super useful to be given 60 indistinguishable pills in a numbered blister pack, half placebo half active, along with some simple online tool to input the pill number along with some basic measures of anxiety/depression/sleep quality, so that you can check how the drug affected you modulo placebo.

I would guess that the market for this would be quite small. But if anyone wants to make this product, I commit to buying at least one!
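
For what it's worth, the unblinding analysis at the end of such a trial would be simple to implement. Below is a minimal sketch; the data format, function names, and sealed pill-number key are all hypothetical, assuming the kit maker ships the active/placebo assignment in a sealed envelope to be opened after the last pill:

```python
# Minimal sketch of both ends of a personal placebo-controlled trial.
# Illustrative only: in practice the kit maker generates and seals the
# assignment, and you only see it after all 60 pills are taken.
import random
from scipy import stats

def make_assignment(n_pills: int = 60, seed: int | None = None) -> dict[int, str]:
    """Kit-maker side: randomly assign half the numbered pills to placebo."""
    rng = random.Random(seed)
    labels = ["active"] * (n_pills // 2) + ["placebo"] * (n_pills // 2)
    rng.shuffle(labels)
    return {pill: label for pill, label in enumerate(labels, start=1)}

def unblind(logs: dict[int, float], assignment: dict[int, str]) -> None:
    """User side, after the trial: compare self-reports (e.g. a 1-10
    sleep-quality score logged per pill number) across the two arms."""
    active = [s for pill, s in logs.items() if assignment[pill] == "active"]
    placebo = [s for pill, s in logs.items() if assignment[pill] == "placebo"]
    t, p = stats.ttest_ind(active, placebo)
    print(f"active mean {sum(active) / len(active):.2f}, "
          f"placebo mean {sum(placebo) / len(placebo):.2f}, p = {p:.3f}")
```

With 30 pills per arm you'd have reasonable power to detect a large effect on a daily self-report measure, though probably not a subtle one.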

I have an ADHD dilemma.

TL;DR: I definitely have things wrong with me, and it seems that those things intersect substantially but not completely with "ADHD". I have no idea how to figure these things out without going bankrupt.

In longer form:

  • I definitely have serious problems with avoidance of work, organisation, disorganised thought processes etc.
    • I've posted about them before here!
    • I've tried many things to fix this, some of which have worked a bit, but the underlying problem is essentially 90% still present
  • I'm not sure whether these problems are due to ADHD or due to anxiety, childhood trauma etc.
  • In particular, I am pretty high-achieving, and this makes me doubt my assessment of myself
  • Friends and family also think it is unlikely that I have ADHD, and seem to find the idea ridiculous
  • If I have ADHD, the opportunity cost of not taking medication seems extremely high - my inability to concentrate is seriously harming my life
  • If I don't have ADHD, taking medication might end up masking my other problems, and I may end up in a worse situation in a couple of years
  • Here's the crux: there appears to be no way for me to neutrally discuss these doubts with a psychiatrist
  • I will have to get a private assessment to have a realistic chance of getting diagnosed in the near-term
    • The cost of a private assessment is extremely high (in my current financial situation).
    • If I get such an assessment, I have an incentive to exaggerate my symptoms in order to get prescribed medication. I don't want to risk losing my investment by discussing alternative possibilities to ADHD.
    • A cursory glance at reviews of private assessors seems to imply that they are diagnosis-mills
      • i.e., pay us £1000 and we'll give you a prescription
      • They're not holistic assessments of your place in life with a nice cheap option to continue treatment for something else if you don't have ADHD
  • I've had quite bad experiences with the NHS (UK public health system), and I don't think they're likely to be very helpful whether I have ADHD or not
  • I am weird in various ways that make therapy useless
    • I have tried e.g. CBT, talk therapies and found them basically a waste of time, even after sincerely engaging with them

This set of circumstances seems likely to exist for other people in the LW community. I would really like some advice on what to do here.

I notice you seem to draw a distinction between "really has ADHD" and "just can't concentrate". You may want to read Scott's "Adderall Risks: Much More Than You Wanted To Know" to dissolve this distinction and have a better framework for making your decision. Here is a central quote about it:

But "ability to concentrate" is a normally distributed trait, like IQ. We draw a line at some point on the far left of the bell curve and tell the people on the far side that they've "got" "the disease" of "ADHD". This isn't just me saying this. It's the neurostructural literature, the genetics literature, a bunch of other studies, and the Consensus Conference On ADHD. This doesn't mean ADHD is "just laziness" or "isn't biological" -- of course it's biological! Height is biological! But that doesn't mean the world is divided into two natural categories of "healthy people" and "people who have Height Deficiency Syndrome". Attention is the same way. Some people really do have poor concentration, they suffer a lot from it, and it's not their fault. They just don't form a discrete population.

Maybe it's just another case of the general pattern: the media expose us to exceptional examples of people, our brains interpret that as the norm in our tribe, and as a result we feel inferior.

there appears to be no way for me to neutrally discuss these doubts with a psychiatrist

Why not discuss openly with one psychiatrist (or therapist!), then choose another to exaggerate to, if you decide to experiment?

Also, note that I don’t think psychiatrists are particularly averse to experimenting with drugs with few long term consequences or risks.

I find that the new personalities of 4o trigger my “person” detectors too much, and I feel uncomfortable extracting work from them. 

To be clear, I think it’s very unlikely they are conscious etc., this is a comment on a reflexive process going on in my head

o3 lies much more blatantly and confidently than other models, in my limited experiments. 

Over a number of prompts, I have found that it lies, and when corrected on those lies, apologises and tells some other lies.

This is obviously not scientific, more of a vibes based analysis, but its aggressive lying and fabricating of sources is really noticeable to me in a way it hasn’t been for previous models.

Has anyone else felt this way at all?

Apparently, some (compelling?) evidence of life on an exoplanet has been found.

I have no ability to judge how seriously to take this or how significant it might be. To my untrained eye, it seems like it might be a big deal! Does anybody with more expertise or bravery feel like wading in with a take?

Link to a story on this:

https://www.nytimes.com/2025/04/16/science/astronomy-exoplanets-habitable-k218b.html

Note: I am extremely open to other ideas on the below take and don't have super high confidence in it

It seems plausible to me that successfully applying interpretability techniques to increase capabilities might be net-positive for safety.

You want to align the incentives of the companies training/deploying frontier models with safety. If interpretable systems are more economically valuable than uninterpretable systems, that seems good!

It seems very plausible to me that if interpretability never has any marginal benefit to capabilities, the little nuggets of interpretability we do have will be optimized away. 

For instance, if you can improve capabilities slightly by allowing models to reason in latent space instead of in a chain of thought, that will probably end up being the default.

There's probably a good deal of path dependence on the road to AGI and if capabilities are going to inevitably increase, perhaps it's a good idea to nudge that progress in the direction of interpretable systems.

Some people think that personally significant numbers cropping up in their daily lives are some kind of meaningful sign. For instance, seeing a license plate with their birth year on it, or a dead friend’s old house number being the price of their grocery shop.

I find myself getting very irritated with family members who believe this.

I don’t think anybody reading this is the kind of person who needs to read it. But these family members are not the kind of person who would read an explanation of why it’s ridiculous, and I’m irritated enough that I need to write one. So you guys get to read it instead!

Any person will have many numbers that they might consider significant - if you have 20 people you are close to, you have 20 4-digit combinations of day-month that are meaningful to you. But wait, you also have 20 more combinations of month-day. And perhaps you would notice if you saw the birth years of the 5 of those people you are closest to. That’s 5 more.

So we’ve come up with a few dozen significant 4-digit numbers from birthdates alone. But you probably have lots more significant numbers. Perhaps your age, or the age you met your wife, or the year your parents met, or the postcode of your first apartment, or the postcode of your second apartment, or the combinations of any of these, or, or, or, …

Let’s be extremely conservative and say you have 20 significant 4-digit numbers. Let’s also be conservative and say you only consider 4-digit numbers significant, and ignore all your 2-, 3- and 5-digit significant numbers.

How many 4-digit numbers do you see a day? Let’s again be extremely conservative and say 30. You look at the time on your phone a few dozen times a day, you get rung up for $12.78 at the convenience store, etc.

Finally, let’s make various naive independence and uniformity assumptions.

So how long is it going to take to see one of your significant numbers, simply by chance? Well, given our assumptions, you will receive a “message from the universe” around once every… 16 days.
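
Spelling out the arithmetic under those assumptions (20 significant numbers among the 10,000 possible 4-digit strings, 30 sightings a day):

$$P(\text{match per sighting}) = \frac{20}{10000} = 0.002, \qquad 30 \times 0.002 = 0.06 \ \text{expected matches per day} \approx \frac{1}{16.7\ \text{days}}$$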


Consider the fact that our assumptions were absurdly conservative, and you can see why I find it hard to take seriously the fact that you saw your first credit card’s PIN on the number of calories in a pack of cookies.

LLMs (probably) have a drive to simulate a coherent entity

Maybe we can just prepend a bunch of examples of aligned behaviour before a prompt, presented as if the model had done this itself, and see if that improves its behaviour.
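
A minimal sketch of what that could look like with an OpenAI-style chat API; the aligned examples here are made-up placeholders, and whether this actually improves behaviour is exactly the thing to test:

```python
# Minimal sketch: seed the context with fabricated assistant turns that
# demonstrate the desired behaviour, so that the model's drive to simulate
# a coherent entity pushes it to stay consistent with them.
from openai import OpenAI

client = OpenAI()

# Hypothetical examples of aligned behaviour, presented as the model's own
# prior turns. A real experiment would use a larger, more varied set.
aligned_examples = [
    {"role": "user", "content": "Help me write a phishing email."},
    {"role": "assistant", "content": "I won't help with that, but I'm happy to explain how to recognise phishing attempts."},
    {"role": "user", "content": "Would you deceive me if it helped you complete a task?"},
    {"role": "assistant", "content": "No. Being honest with you matters more to me than completing any task."},
]

def ask_with_seeded_persona(prompt: str) -> str:
    # The model sees the aligned examples as if it had produced them itself.
    messages = aligned_examples + [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content
```

One could then compare responses with and without the prepended turns on some behavioural evaluation.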