Alex Turner argues that the concepts of "inner alignment" and "outer alignment" in AI safety are unhelpful and potentially misleading. The author contends that these concepts decompose one hard problem (AI alignment) into two extremely hard problems, and that they go against natural patterns of cognition formation. Alex argues that approaches based on "robust grading" schemes are unlikely to succeed at producing aligned AI.
The modern internet is replete with feeds such as Twitter, Facebook, Insta, TikTok, Substack, etc. They're bad in some ways and good in others. I've been exploring the idea that LessWrong could have a very good feed.
I'm posting this announcement with disjunctive hopes: (a) to find enthusiastic early adopters who will refine this into a great product, or (b) to find people who'll lead us to an understanding that we shouldn't launch this, or should launch it only if it's designed in a very specific way.
From there, you can also enable it on the frontpage in place of Recent Discussion. Below I have some practical notes on using the New Feed.
Note! This feature is very much in beta. It's rough around the edges.
Without having made much adaptation effort:
The “open stuff in a modal overlay page on top of the feed rather than linking normally, incidentally making the URL bar useless” behavior is super confusing and annoying. Just now, when I tried my usual trick of Open Link in New Tab to get around confusingly overridden navigation, the “Click to view all comments” link to this very thread turned out not to be an actual link at all.
I don't know how to interpret what's going on when I'm only shown a subset of comments in a feed section and they don't seem to be contiguou...
We’re currently in the process of locking in advertisements for the September launch of If Anyone Builds It, Everyone Dies, and we’re interested in your ideas! If you have graphic design chops, and would like to try your hand at creating promotional material for If Anyone Builds It, Everyone Dies, we’ll be accepting submissions in a design competition ending on August 10, 2025.
We’ll be giving out up to four $1000 prizes:
We encourage multiple submissions, and also encourage people to post their submissions in this comment section for community feedback and inspiration.
This is a two-post series on AI “foom” (this post) and “doom” (next post).
A decade or two ago, it was pretty common to discuss “foom & doom” scenarios, as advocated especially by Eliezer Yudkowsky. In a typical such scenario, a small team would build a system that would rocket (“foom”) from “unimpressive” to “Artificial Superintelligence” (ASI) within a very short time window (days, weeks, maybe months), involving very little compute (e.g. “brain in a box in a basement”), via recursive self-improvement. Absent some future technical breakthrough, the ASI would definitely be egregiously misaligned, without the slightest intrinsic interest in whether humans live or die. The ASI would be born into a world generally much like today’s, a world utterly unprepared for this...
I'm not sure we disagree then.
Epistemic status: Though I can't find it now, I remember reading a lesswrong post asking "what is your totalizing worldview?" I think this post gets at my answer; in fact, I initially intended to title it "My totalizing worldview" but decided on a slightly more restricted scope (anyway, I tend to change important aspects of my worldview so frequently it's a little unsettling, so I'm not sure if it can be called totalizing). Still, I think these ideas underlie some of the cruxes behind my meta-theory of rationality sequence AND my model of what is going on with LLMs among other examples.
The idea of a fixed program as the central object of computation has gradually fallen out of favor. As a result, the word "algorithm" seems to...
My take on why recursion theory failed to be relevant for today's AI is that what a machine could do if unconstrained turned out not to matter at all, and in particular the limits of an idealized machine didn't matter, because once we actually impose constraints that force computation to use very limited resources, we get a non-trivial theory, and importantly, all of the difficulty of explaining how humans do stuff lies there.
That's partially true (computational complexity is now much more active than recursion the...
Acknowledgments: The core scheme here was suggested by Prof. Gabriel Weil.
There has been growing interest in the dealmaking agenda: humans make deals with AIs (misaligned but lacking decisive strategic advantage) where they promise to be safe and useful for some fixed term (e.g. 2026-2028) and we promise to compensate them in the future, conditional on (i) verifying the AIs were compliant, and (ii) verifying the AIs would spend the resources in an acceptable way.[1]
I think the dealmaking agenda breaks down into two main subproblems:
There are other issues, but when I've discussed dealmaking with people, (1) and (2) are the most common issues raised. See footnote for some other issues in...
I think you're entangling morals and strategy very closely in your statements. The moral sense: we should leave it to a future ASI to decide, based on our values, whether we inherently owe the agent anything for existing or for helping us. The strategy: once the moral part is detached, this is just the same thing the post is doing, trying to commit in advance that certain aspects will be enforced, which is exactly what the parent commenter says they hold suspect. So I think this just turns into restating the same core argument between the two positions.
Epistemic status: This is an informal explanation of incomplete models that I think would have been very useful to me a few years ago, particularly for thinking about why Solomonoff induction may not be optimal if the universe isn't computable.
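For reference, since the excerpt leans on it (this is the standard definition, not something spelled out in the post): Solomonoff induction predicts with the universal prior, which weights every program $p$ for a universal prefix machine $U$ by its length,

$$M(x) \;=\; \sum_{p \,:\, U(p)\text{ begins with } x} 2^{-|p|},$$

so its optimality guarantees (bounded prediction loss against the true environment) only apply when that environment is itself computable; if the universe isn't computable, the dominance argument no longer goes through.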
Imagine that a stage magician is performing a series of coin flips, and you're trying to predict whether they will come up heads or tails, for whatever reason - for now, let's assume idle curiosity, so that we don't have to deal with any complications from betting mechanisms.
Normally, coin flips come up heads or tails at 50:50 odds, but this magician is particularly skilled at sleight of hand, and for all you know he might be switching between trick coins with arbitrary chances of landing on heads between...
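A toy sketch of why this setup is awkward for a single fixed-bias model (this is an illustrative assumption of mine, not code from the post): if the magician switches between trick coins, a Bayesian learner that assumes one unknown-but-fixed bias converges to an average that describes neither coin, whereas the honest answer looks more like an interval of possible heads-probabilities, which is the flavor of thing incomplete models are meant to capture.

```python
import random

# Hypothetical setup (not from the post): the magician alternates between
# two trick coins every 50 flips, one heads-heavy and one tails-heavy.
BIASES = [0.9, 0.1]

def magician_flip(t: int) -> int:
    """Return 1 for heads, 0 for tails; which coin is in play depends on the round t."""
    bias = BIASES[(t // 50) % 2]
    return 1 if random.random() < bias else 0

# A Bayesian learner with a Beta(1, 1) prior over a single *fixed* heads-probability.
heads = tails = 0
for t in range(1000):
    outcome = magician_flip(t)
    heads += outcome
    tails += 1 - outcome

# Posterior mean under the fixed-bias assumption: roughly 0.5, which
# describes neither of the coins the magician actually uses.
posterior_mean = (heads + 1) / (heads + tails + 2)
print(f"fixed-bias posterior mean: {posterior_mean:.3f}")

# The incomplete-models flavor of answer: commit only to a set of
# possibilities (hard-coded here purely for illustration).
print(f"interval prediction: heads probability somewhere in [{min(BIASES)}, {max(BIASES)}]")
```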