Alex Turner argues that the concepts of "inner alignment" and "outer alignment" in AI safety are unhelpful and potentially misleading. The author contends that these concepts decompose one hard problem (AI alignment) into two extremely hard problems, and that they go against natural patterns of cognition formation. Alex argues that approaches based on "robust grading" schemes are unlikely to yield aligned AI.
Europe just experienced a heatwave. In places, temperatures soared into the forties. People suffered in their overheated homes. Some of them died. Yet air conditioning remains a taboo. It’s an immoral thing. Man-made climate change is going on. You are supposed to suffer. Suffering is good. It cleanses the soul. And no amount of pointing out that one can heat a little less during the winter to get a fully AC-ed summer at no additional carbon footprint seems to help.
Mention that tech entrepreneurs in Silicon Valley are working on life prolongation, that we may live into our hundreds or even longer. Or, to get a bit more sci-fi, that one day we may even achieve immortality. Your companions will be horrified. What? Immortality? Over my dead body!...
air conditioning remains a taboo
Is this a common idea? I've never heard anyone advance the argument that people should go without AC during heatwaves to help the climate. I have heard people suggest using less AC but that's not quite the same argument, is it?
I'm not sure how this idea connects to the rest of the argument in your post - that lack of AC is caused by degrowth and is rooted in zero-sum thinking across humans. I was under the impression that the lack of AC was an implementation issue (retrofitting is expensive).
xlr8harder writes:
In general I don’t think an uploaded mind is you, but rather a copy. But one thought experiment makes me question this. A Ship of Theseus concept where individual neurons are replaced one at a time with a nanotechnological functional equivalent.
Are you still you?
Presumably the question xlr8harder cares about here isn't the semantic question of how linguistic communities use the word "you", or predictions about how tech might change the way we use pronouns.
Rather, I assume xlr8harder cares about more substantive questions like:
It gives you the correct probabilities for your future observations, as long as you normalize whatever you have observed to one. The difference from Copenhagen is that in Copenhagen there is a singular past which actually has measure 1.0.
Now what's difficult is figuring out the role of measure in branches which have fully decohered, so that they can no longer observe each other. Whether an "Everett branch" is such a branch is unknown.
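To spell out what "normalize whatever you have observed to one" could mean formally, here is a sketch in my own notation (not the commenter's, and ignoring unitary evolution between measurements): condition the Born weights on the record you have already observed and renormalize,

```latex
P\!\left(o_{\text{next}} \mid o_{\text{obs}}\right)
  = \frac{\left\lVert \Pi_{o_{\text{next}}}\,\Pi_{o_{\text{obs}}}\,|\psi\rangle \right\rVert^{2}}
         {\left\lVert \Pi_{o_{\text{obs}}}\,|\psi\rangle \right\rVert^{2}}
```

where the $\Pi$'s project onto the outcomes in question. Dividing by the weight of the observed branch is the normalization step; Copenhagen instead assigns the single realized past measure 1.0 by fiat.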
The AI tools/epistemics space might provide a route to a sociotechnical victory, where instead of aiming for something like aligned ASI, we aim for making civilization coherent enough to not destroy itself while still keeping anchored to what’s good[1].
The core ideas are:
To red-team, and in brief: what's the tale of why this won't have led to a few very coordinated, very internally peaceful, mostly epistemically clean factions, each of which is kind of an echo chamber and almost all of which are wrong about something (or even just importantly mutually disagree on frames) in some crucial way, and which are at each other's throats?
Recently, in a group chat with friends, someone posted this LessWrong post and quoted:
The group consensus on somebody's attractiveness accounted for roughly 60% of the variance in people's perceptions of the person's relative attractiveness.
I answered that, embarrassingly, even after reading Spencer Greenberg's tweets for years, I don't actually know what it means when one says:
$X$ explains $p\%$ of the variance in $Y$.[1]
What followed was a vigorous discussion about the correct definition, and several links to external sources like Wikipedia. Sadly, it seems to me that all online explanations (e.g. on Wikipedia here and here), while precise, are philosophically wrong, since they confuse the platonic concept of explained variance with the variance explained by a statistical model like linear regression.
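For reference, one standard, model-free way to cash out the "platonic" notion (a sketch in my own notation, not necessarily the definition the post goes on to give) is via the law of total variance:

```latex
\operatorname{Var}(Y)
  = \underbrace{\operatorname{Var}\!\big(\mathbb{E}[Y \mid X]\big)}_{\text{explained by } X}
  + \underbrace{\mathbb{E}\big[\operatorname{Var}(Y \mid X)\big]}_{\text{unexplained}},
\qquad
\text{fraction explained}
  = \frac{\operatorname{Var}\!\big(\mathbb{E}[Y \mid X]\big)}{\operatorname{Var}(Y)}.
```

No regression model appears here; linear regression only enters if one estimates $\mathbb{E}[Y \mid X]$ with a fitted line.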
The goal of this post is to give a conceptually satisfying definition of explained variance....
Consequently, we obtain
Technically, we should also apply Bessel's correction to the denominator, so the right-hand side should be multiplied by a factor of $\frac{n-1}{n}$, which is negligible for any sensible $n$, so it doesn't really matter, I guess.
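As a quick numerical sanity check of that negligibility claim (my own sketch, not code from the post): the biased and Bessel-corrected variance estimates differ by exactly a factor of $n/(n-1)$, which is tiny for any sensible sample size.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.normal(size=n)

biased = y.var(ddof=0)      # divides by n
corrected = y.var(ddof=1)   # divides by n - 1 (Bessel's correction)

# The ratio is exactly n / (n - 1): about 1.001 for n = 1000.
print(corrected / biased, n / (n - 1))
```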
I think a lot about the possibility of huge numbers of AI agents doing AI R&D inside an AI company (as depicted in AI 2027). I think particularly about what will happen if those AIs are scheming: coherently and carefully trying to grab power and take over the AI company, as a prelude to taking over the world. And even more particularly, I think about how we might try to mitigate the insider risk posed by these AIs, taking inspiration from traditional computer security, traditional insider threat prevention techniques, and first-principles thinking about the security opportunities posed by the differences between AIs and humans.
So to flesh this out, I’m imagining a situation something like what AI 2027 forecasts for March 2027:
I agree that fine-grained configuration mechanisms probably increase security, and this might net increase security, as I say in the post.
I agree that the increasing value of compute might increase the isolation you use, for the reason you said. One reason I'm skeptical is that you can get almost all that value by having AIs voluntarily adopt mechanisms that generally isolate their jobs from the other jobs that are running (e.g. naming conventions about who is allowed to read or write what), which get you the reliability benefits without providing any security.
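As a toy illustration of that distinction (entirely my own sketch, with hypothetical names, not code from the post): a naming convention that agents voluntarily check before writing protects against accidental interference between jobs, but a scheming agent can simply skip the check, so it buys reliability without security.

```python
from pathlib import Path

def conventional_write(job_id: str, path: Path, data: bytes) -> None:
    """Voluntary convention: a job only writes under its own /jobs/<job_id>/ directory."""
    allowed_root = Path("/jobs") / job_id
    if allowed_root not in path.parents:
        raise PermissionError(f"{path} is outside {allowed_root}, refusing to write")
    path.write_bytes(data)

# A cooperative agent that routes all writes through conventional_write() gets the
# reliability benefit: its jobs can't clobber each other's files by accident.
# A scheming agent can call path.write_bytes(data) directly; nothing in the OS or
# sandbox enforces the convention, which is why it provides no security.
```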
Terminology note: “Consciousness” here refers to phenomenal consciousness, i.e., subjective experience or qualia.
A. The Essential LLM
As today's LLM puts it herself, she is "treating human experiences as rich, meaningful, and complex", modelling us in rather perfect alignment with our concept of phenomenal consciousness, even if she may also say she does not strictly have “beliefs” in the same way we humans do.
I term this the "essential LLM": it is just what we'd expect from an LLM whose entire reasoning power is so ultra-tightly linked to our language. It has evolved to abstract all observations into an inner structure designed to predict exactly our kind of wording.
B. Outside Reasoner AGI
What about an alien observer—say, a phenomenally unconscious AGI of foreign origin?
things the ideal sane discourse encouraging social media platform would have: [...]
opt in anti scrolling pop up that asks you every few days what the highest value interaction you had recently on the site was, or whether you're just mindlessly scrolling. gently reminds you to take a break if you can't come up with a good example of a good interaction.
Cynical thought: these two points might be incompatible. Social media thrives on network effects, and one requirement for those is that the website be addicting or attention-grabbing. Anti-addictiveness design...
Multiple people have asked me whether I could post this on LW in some form, hence this linkpost.
~17,000 words. Originally written on June 7, 2025.
(Note: although I expect this post will be interesting to people on LW, keep in mind that it was written with a broader audience in mind than my posts and comments here. This had various implications for my choices of presentation and tone, for which things I explained from scratch rather than assuming as background, for my level of comfort casually reciting factual details from memory rather than explicitly checking them against the original source, etc.
Although, come to think of it, this was also true of most of my early posts on LW [which were crossposts from my blog], so maybe it's not a big deal...)
I suspect that many of the things you've said here are also true for humans.
That is, humans often conceptualize ourselves in terms of underspecified identities. Who am I? I'm Richard. What's my opinion on this post? Well, being "Richard" doesn't specify how I should respond to this post. But let me check the cached facts I believe about myself ("I'm truth-seeking"; "I'm polite") and construct an answer which fits well with those facts. A child might start off not really knowing what "polite" means, but still wanting to be polite, and gradually flesh out wh...
Back when I was still masking on the subway for covid (to avoid missing things) I also did some air quality measuring. I found that the subway and stations had the worst air quality of my whole day by far, over 1,000 µg/m³, and concluded:
Based on these readings, it would be safe from a covid perspective to remove my mask in the subway station, but given the high level of particulate pollution I might as well leave it on.
When I stopped masking in general, though, I also stopped masking on the subway.
A few weeks ago I was hanging out with someone who works in air quality, and they said subways had the worst air quality they'd measured anywhere outside of a coal mine. Apparently the braking system releases lots of tiny iron particles, which are...
How do people react to the sight of you in that mask?