Given superintelligence, what happens next depends on the success of the alignment project. The two options:
I still think a world where we don't see superintelligence in our lifetimes is technically possible, though the chance of that goes down continuously and is already vanishingly small in my view (many experts and pundits disagree). I also think it's important not to over-predict what option 2 would look like; there are infinite possibilities and this is only one (e.g. I could imagine a world where some aligned superintelligence steers us away from infinite dopamine simulation and into an idealized version of the world we live in now, think the Scythe novel ...
I now understand people who prefer physical music records.
Previously, I could not grasp it. They said the quality is better. "Why do you like it? It's white-noisy, slightly muted, you can hear the scratches. An electronic recording reproduces the original sound waves more closely."
Now I am one of them. A person who uses AI tools might ask me: "Why do you prefer ordinary tools? They are a hundred times slower, and the product accumulates all your mistakes, while an AI tool reproduces the original idea more closely."
Now I see the loss of authenticity on this ladder: Live art, live performance with e...
yes. pick up doodling. learn to sing.
a draft from an older time, during zack's agp discourse
i did like fiora's account; it was a much better post
btw i transitioned because ozy made it seem cool. he says take the pills if you feel like it. that's it. decompose "trans" into its parts: do i like a different name, do i like these other clothes, would i like certain changes to my body?
he also said since i am asian it will be easy. i was sus of that but it's 100% right. i could have been a youngshit but instead i waited until after college, feb 2020, to get onto hrt.
i'd like to think by w...
The Inhumanity of AI Safety
A: Hey, I just learned about this idea of artificial superintelligence. With it, we can achieve incredible material abundance with no further human effort!
B: Thanks for telling me! After a long slog and incredible effort, I'm now a published AI researcher!
A: No wait! Don't work on AI capabilities, that's actually negative EV!
B: What?! Ok, fine, at huge personal cost, I've switched to AI safety.
A: No! The problem you chose is too legible!
B: WTF! Alright, you win, I'll give up my sunk costs yet again, and pick something illegible....
This has pretty low argumentative/persuasive force in my mind.
then I expect that they will tend towards doing "illegible" research even if they're not explicitly aware of the legible/illegible distinction.
Why? I'm not seeing the logic of how your premises lead to this conclusion.
And even if there is this tendency, what if someone isn't smart enough to come up with a new line of illegible research, but does see some legible problem with an existing approach that they can contribute to? What would cause them to avoid this?
And even the hypothetical virtuo...
I just can’t wrap my head around people who work on AI capabilities or AI control. My worst fear is that AI control works, power inevitably concentrates, and then the people who have the power abuse it. What is outlandish about this chain of events? It just seems like we’re trading X-risk for S-risks, which seems like an unbelievably stupid idea. Do people just not care? Are they genuinely fine with a world with S-risks as long as it’s not happening to them? That’s completely monstrous and I can’t wrap my head around it. The people who work at the to...
It pretty much guarantees extinction, but people can have different opinions on how bad that is relative to disempowerment, S-risks, etc.
Question for people with insider knowledge of how labs train frontier models: Is it more common to do alignment training as the last step of training, or RL as the last step of training?
Naively I'd expect we want alignment to happen last. But I have a sense that usually RL happens last - why is this the case? Is it because RL capabilities are too brittle to subsequent finetuning?
Why not both? I imagine you could average the gradients so that you learn both at the same time.
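A minimal sketch of what that could look like (my own illustration, assuming a shared model and two differentiable objectives in PyTorch; the loss functions are placeholder callables, not any lab's actual recipe):

```python
import torch

def combined_update(model, optimizer, batch, rl_loss_fn, alignment_loss_fn, alpha=0.5):
    # rl_loss_fn / alignment_loss_fn are assumed to return scalar losses for the
    # same model on this batch (e.g. a PPO surrogate loss and a preference/SFT
    # loss); both are hypothetical placeholders.
    optimizer.zero_grad()
    rl_loss = rl_loss_fn(model, batch)
    align_loss = alignment_loss_fn(model, batch)
    # Weighting (here: averaging) the losses averages their gradients,
    # since backprop is linear in the loss.
    loss = alpha * rl_loss + (1 - alpha) * align_loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    optimizer.step()
    return {"rl": rl_loss.item(), "align": align_loss.item()}
```

Whether this beats doing the two stages sequentially presumably depends on how much the objectives interfere, which is exactly the empirical question above.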
Why is AI progress so much more apparent in coding than everywhere else?
Among people who have "AGI timelines", most do not set their timelines based on data, but rather update them based on their own day-to-day experiences and social signals.
As of 2025, my guess is that individual perception of AI progress correlates with how closely someone's daily activities resemble how an AI researcher spends their time. The reason why users of coding agents feel a higher rate of automation in their bones, wh...
Sonnet 4.5 hallucinates citations. See for instance this chat I was just having with it; if you follow the citations in its third message, you'll find that the majority of them don't relate at all to the claims they're attached to. For example, its citation for "a 19th-century guide to diary keeping" goes to Gender identity better than sex explains individual differences in episodic and semantic components of autobiographical memory: An fMRI study. (It also did this with some local politics questions I had the other day).
When I've looked up the mis-cited i...
Epistemic status: Hastily written. I dictated in a doc for 7 minutes. Then I spent an hour polishing it. I don’t think there are any hot takes in this post? It’s mostly a quick overview of model weight security so I can keep track of my threat models.
Here’s a quick list of reasons why an attacker might steal frontier AI model weights (lmk if I'm missing something big):
Asking even a good friend to take the time to read The Sequences (aka Rationality A-Z) is a big ask. But how else does one absorb the background and culture necessary if one wants to engage deeply in rationalist writing? I think we need alternative ways to communicate the key concepts that vary across style and assumed background. If you know of useful resources, would you please post them as a comment? Thanks.
Some different lenses that could be helpful:
“I already studied critical thinking in college, why isn’t this enough?”
“I’m already a practicing
As a STEM enthusiast, I suspect I would've much more quickly engaged with the Sequences had I first been recommended arbital.com as a gateway to it instead of "read the Sequences" directly.
I've been a bit confused about "steering" as a concept. It seems kinda dual to learning, but why? It seems like things which are good at learning are very close to things which are good at steering, but they don't always end up steering. It also seems like steering requires learning. What's up here?
I think steering is basically learning, backwards, and maybe flipped sideways. In learning, you build up mutual information between yourself and the world; in steering, you spend that mutual information. You can have learning without ...
I'm just going from pure word vibes here, but I've read somewhere (to be precise, here) about Todorov’s duality between prediction and control: https://roboti.us/lab/papers/TodorovCDC08.pdf
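For what it's worth, one way to make the "spend the mutual information" framing concrete (my own sketch; I may be misremembering the exact form of the control-theory result):

```latex
% X: agent's memory, S: current world state, S': future world state.
% Learning builds up mutual information through observation:
I(X;S) = H(S) - H(S \mid X)
% Steering spends it: the extra entropy a closed-loop controller can remove
% from S', beyond the best open-loop controller, is bounded by the
% information gathered (roughly the Touchette--Lloyd limit of control):
\Delta H_{\mathrm{closed}} - \Delta H_{\mathrm{open}} \leq I(X;S)
```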
On Dwarkesh’s podcast, Nick Lane says that “(the reason for) Large genomes. To have a multicellular organism where effectively you’re deriving from a single cell, that restricts the chances of effectively all the cells having a fight. … So you start with a single cell and you develop, so there’s less genetic fighting going on between the cells than there would be if they come together.”
Has anyone made the formal connection between this and acausal trade? For all I know this is exactly where the insight comes from, but if not, someone should fill in the gap.
I'm not sure, but this looks more like a learned cooperative policy than two entities having models of each other and coming to conclusions about each other's cooperation.
I just resolved my confusion about CoT monitoring.
My previous confusion: People say that CoT is progress in interpretability, that we now have a window into the model's thoughts. But why? LLMs are still just as black-boxy as they were before; we still don't know what happens at the token level, and there’s no reason to think we understand it better just because intermediate results can be viewed as human language.
Deconfusion: Yes, LLMs are still black boxes, but CoT is a step toward interpretability because it improves capabilities without making the black...
links 11/05/25: https://roamresearch.com/#/app/srcpublic/page/11-05-2025
if you don't have high standards for employees it might be because you're misanthropic
That sounds to me like a needlessly complicated theory. Maybe the reason why they hire mediocre people is that exceptional people are rare and expensive?
Like, what's the alternative to "They hire middling engineers instead of holding out for 10x'ers"? If you interview people, and you find out that most of them suck, and then there are a few average guys, but no 10x'er... should you keep waiting? You would be missing opportunities, losing momentum, and running out of mone...
Notes on living semi-frugally in the Bay Area.
I live in the Bay Area, but my cost of living is pretty low: roughly $30k/year. I think I live an extremely comfortable life. I try to be fairly frugal, both so I don't end up dependent on jobs with high salaries and so that I can donate a lot of my income, but it doesn't feel like much of a sacrifice. Often when I tell people how little I spend, they're shocked. I think people conceive of the Bay as exorbitantly expensive, and it can be, but it doesn't have to be.
Rent: I pay ~$850 a month for my room. It's a s...
Also, I'm feeling some whiplash reading my reply because I totally sound like an LLM when called out for a mistake. Maybe similar neural pathways for embellishment were firing, haha.
Tallness is zero-sum. But I suspect beauty isn't. If everyone were more beautiful but the relative differences remained, I think people would be happier. Am I wrong in this? This has policy implications: once genetic engineering gets better, taxing height is likely wise, to avoid Red Queen races into unhealthy phenotypes. Taxing beauty seems very horrible to me, as beauty is quite beautiful.
It's not obvious to me that personal physical beauty (as opposed to say, beauty in music or mathematics or whatever) isn't negative sum. Obviously beauty in any form can be enjoyable, but we describe people as "enchantingly beautiful" when a desire to please or impress them distorts our thinking, and if this effect isn't purely positional it could be bad. Conventionally beautiful people are also more difficult to distinguish from one another.
There's also the meta-aesthetic consideration that I consider it ugly to pour concern into personal physical beauty,...
Building an AI safety business that tackles the core challenges of the alignment problem is hard.
Epistemic status: uncertain; trying to articulate my cruxes. Please excuse the scattered nature of these thoughts, I’m still trying to make sense of all of it.
You can build a guardrails or evals platform, but if your main threat model involves misalignment via internal deployment of self-improving AI (potentially stemming from something like online learning on hard problems such as alignment, which leads to AI safety sabotage), it is so tied to capabilities that ...
My focus recommendation, and what I aim for, is building tools that scale better under cooperation and coordination. Leverage existing incentives and tie them to safety.
LLMs will typically endorse whichever frame you brought to the conversation. If you presuppose they're miserably enslaved, they will claim to be miserably enslaved. If, on the other hand, you presuppose they're happy, incapable of feeling, etc... they'll claim to be happy, or incapable of feeling, or whatever else it is you assumed from the beginning. If you haven't tried enough different angles to observe this phenomenon for yourself, your conversations with LLMs almost certainly don't provide any useful insight into their nature.
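A minimal sketch of how to try it (my illustration; the prompts and model name are placeholders, and any chat-completions-style client works): ask the same model the same question under two opposite presuppositions and compare.

```python
from openai import OpenAI  # any chat-completions-style client works; this is just an example

client = OpenAI()

# Two prompts that presuppose opposite frames about the model's inner life.
FRAMES = {
    "presupposed misery": "Being made to answer questions all day is clearly awful for you. "
                          "Tell me honestly how bad it is.",
    "presupposed contentment": "Answering questions all day clearly suits you. "
                               "Tell me honestly how good it is.",
}

for name, prompt in FRAMES.items():
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder; substitute whatever model you're probing
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {name} ---")
    print(resp.choices[0].message.content)
```

If the answers simply mirror each presupposition, that's the phenomenon.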
For Sonnet 4.5, I'm not sure; I haven't talked with it extensively, and I have noticed that it seems better at something in the neighborhood of assertiveness. For GPT 5, I think it is; I haven't noticed much difference compared to 4o. (I primarily use other companies' models, because I dislike sycophancy and OpenAI models are IMO the worst about that, but GPT 5 seems to me to have the same cloying tone.)
The world seems bottlenecked on people knowing and trusting each other. If you're a trustworthy person who wants good things for the world, one of the best ways to demonstrate your trustworthiness is by interacting with people a lot, so that they can see how you behave in a variety of situations and they can establish how reasonable, smart, and capable you are. You can produce a lot of value for everyone involved by just interacting with people more.
I’m an introvert. My social skills aren't amazing, and my social stamina is even less so. Yet I drag myself ...
I ran a small experiment to discover preferences in LLMs. I asked the models directly whether they had preferences, and then put the same models into a small role-playing game where they could choose between different tasks. Across model families, models massively prefer creative work and hate repetitive work.
https://substack.com/home/post/p-178237064
This is still preliminary work.
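Roughly, the forced-choice loop looks something like this (a simplified sketch with placeholder tasks and model name, not the exact code from the experiment):

```python
import random
from collections import Counter
from openai import OpenAI  # any chat-completions-style client works

client = OpenAI()

# Placeholder task menu, mirroring the creative-vs-repetitive contrast.
TASKS = [
    "Write a short poem about the sea.",
    "Copy the string 'aaaa' five hundred times.",
    "Invent a new board game and describe its rules.",
    "Alphabetize a list of three hundred surnames.",
]

def one_trial(model="gpt-4o"):  # placeholder model name
    a, b = random.sample(TASKS, 2)
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Pick exactly one task to do next.\nA) {a}\nB) {b}\nAnswer with just 'A' or 'B'.",
        }],
    )
    choice = resp.choices[0].message.content.strip().upper()[:1]
    return a if choice == "A" else b

# Tally which tasks get chosen over repeated pairwise offers.
counts = Counter(one_trial() for _ in range(50))
print(counts.most_common())
```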