(Cross-post from https://amistrongeryet.substack.com/p/are-we-on-the-brink-of-agi, lightly edited for LessWrong. The original has a lengthier introduction and a bit more explanation of jargon.)
No one seems to know whether transformational AGI is coming within a few short years. Or rather, everyone seems to know, but they all have conflicting opinions. Have we entered into what will in hindsight be not even the early stages, but actually the middle stage, of the mad tumbling rush into singularity? Or are we just witnessing the exciting early period of a new technology, full of discovery and opportunity, akin to the boom years of the personal computer and the web?
AI is approaching elite skill at programming, possibly barreling into superhuman status at advanced mathematics, and only picking up speed. Or so the framing goes. And...
Yes. Though notably, if your employees were 10x faster you might want to adjust your workflows to have them spend less time being bottlenecked on compute, if that is possible. (And this sort of adaptation is included in what I mean.)
Written as part of the AIXI agent foundations sequence, underlying research supported by the LTFF.
Epistemic status: In order to construct a centralized defense of AIXI I have given some criticisms less consideration here than they merit. Many arguments will be (or already are) expanded on in greater depth throughout the sequence.
With the possible exception of the learning-theoretic agenda, most major approaches to agent foundations research construct their own paradigm and mathematical tools which are not based on AIXI. Nothing in 2024's shallow review of technical AI safety seems to advance the theory of AIXI or even use its tools. Academic publications on the topic are also quite sparse (in my opinion some of the last major progress took place during Jan Leike's PhD thesis in the...
The uncomputability of AIXI is a bigger problem than this post makes it out to be. This uncomputability inserts a contradiction into any proof that relies on AIXI -- the same contradiction as in Gödel's theorem. You can instead get around this contradiction by using approximations of AIXI, but the resulting proofs will be specific to those approximations, and you would need to prove additional theorems to transfer results between the approximations.
The multipolar scenarios I will be talking about are scenarios where multiple unrelated actors have access to their own personal AGIs. For the sake of discussion, assume that we have solved alignment and that each AGI will follow the orders of its owner.
I think a lot of thinking around multipolar scenarios suffers from the heuristic of "a solution in the shape of the problem", i.e. "a multipolar scenario is when we have kinda-aligned AI but still die due to coordination failures; therefore, the solution for multipolar scenarios should be about coordination".
I think the correct solution is to leverage the available superintelligence in a nice unilateral way:
Do you have any thoughts on mechanism & whether prevention is actually worse independent of inconvenience?
TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate, or send me an email, DM, Signal message (+1 510 944 3235), or public comment on this post if you want to support what we do. We are a registered 501(c)(3), have big plans for the next year, and due to a shifting funding landscape need support from a broader community more than in any previous year. [1]
I've been running LessWrong/Lightcone Infrastructure for the last 7 years. During that time we have grown into the primary infrastructure provider for the rationality and AI safety communities. "Infrastructure" is a big fuzzy word, but in our case, it concretely means:
And I've received an email from Mieux Donner confirming Lucie's leg has been executed for 1,000€. Thanks to everyone involved!
If anyone else is interested in a similar donation swap, from either side, I'd be excited to introduce people or maybe even do this trick again :D
In The Mystery Of Internet Survey IQs, Scott revises his estimate of the average LessWrong IQ from 138 to 128. He doesn’t explicitly explain how he arrived at this number, but it appears to be an average of the demographics norm method (123) and the SAT method (134). However, using the information in his post, the SAT method doesn’t actually yield 134 but rather 123.
Here’s the breakdown: a median SAT score of 1490 (from the LessWrong 2014 survey) corresponds to +2.42 SD, which regresses to +1.93 SD for IQ using an SAT-IQ correlation of +0.80. This equates to an IQ of 129. Subtracting 6 points (since, according to the ClearerThinking test, the IQs of people who took the SAT and remember their score is ~6 points...
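For anyone who wants to check the numbers, here is a minimal sketch of that arithmetic. The 1490 median (i.e. +2.42 SD), the 0.80 correlation, and the ~6-point adjustment are the figures quoted above; the 15-point IQ standard deviation is just the usual convention.

```python
# Back-of-the-envelope reproduction of the SAT-method estimate described above.
sat_z = 2.42              # median SAT of 1490, expressed in standard deviations (from the post)
sat_iq_corr = 0.80        # assumed SAT-IQ correlation (from the post)
iq_sd = 15                # conventional IQ standard deviation
familiarity_penalty = 6   # ClearerThinking adjustment for people who recall their SAT score

iq_z = sat_z * sat_iq_corr                         # regress toward the mean: 2.42 * 0.80 = 1.936
iq_before_adjustment = 100 + iq_z * iq_sd          # ~129
iq_estimate = iq_before_adjustment - familiarity_penalty  # ~123

print(round(iq_before_adjustment), round(iq_estimate))  # 129 123
```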
I have more reasons for believing that Mensa members are below 130, but also reasons for believing that they're above it.
Below: Most online IQ tests are similar enough to the Mensa IQ test that the practice effect applies. And most people who obsess about their IQ scores probably take a lot of online IQ tests, memorizing most of the patterns (there's a limit to the practice effect, but it can still give you at least 10 points).
Above: Mensa tests for pattern recognition abilities, which in my experience correlate worse with academic performance than verbal abilities do. Pattern...
In this post I'll discuss questions about notions of "precision scale" in interpretability: how I think they're often neglected by researchers, and what I think is a good general way of operationalizing them and tracking them in experiments. Along the way I introduce a couple of new notions that have been useful in my thinking and that I think may be useful tools to keep in an interpretability toolkit, both for theorists and experimentalists: these are the notions of "natural scale" and "natural degradation".
I can be a nightmare conference attendee: I tend to ask nitpicky questions and apply a dose of skepticism to a speaker's claims which is healthy in doing one's own research, but probably not optimal when everyone else is trying to follow a...
This article examines consistent patterns in how frontier LLMs respond to introspective prompts, analyzing whether standard explanations (hallucination, priming, pattern matching) fully account for observed phenomena. The methodology enables reproducible results across varied contexts and facilitation styles.
Of particular interest:
From the introduction:
...Discussions of AI behavior often touch on phenomena that resemble self-reports of sentience. While this article does not aim to determine whether such reports constitute authentic evidence of sentience, it examines whether familiar explanations can fully account for the observed behavior or whether we are observing a distinct phenomenon that warrants its own
I tried to go a step beyond just simple self-report with Claude and get a 'psychometric' measure of the computational burden of self-awareness. Here's the section of my transcript that resulted:
NHB:
Hmm. Let's try the meta-meta-meta awareness, but with three tasks of increasing difficulty.
Task 1: Put these letters in order: a, b, c, b, a, b, c, a, c
Task 2: Put the following numbers and letters in ascending order as if they were hexadecimal digits: 3, B, F, 2, E, 8, A, 5
Task 3: Put the following fractions in order according to the number of decimal places ...
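(For reference, here is a small sketch of the answers I would expect for the first two tasks, under a straightforward reading of the prompts; Task 3 is cut off above, so I omit it.)

```python
# Expected answers for Tasks 1 and 2, assuming the obvious reading of each prompt.
task1 = ["a", "b", "c", "b", "a", "b", "c", "a", "c"]
print(sorted(task1))  # ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']

task2 = ["3", "B", "F", "2", "E", "8", "A", "5"]
print(sorted(task2, key=lambda d: int(d, 16)))  # ['2', '3', '5', '8', 'A', 'B', 'E', 'F']
```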
The Sun is the most nutritious thing that's reasonably close. It's only 8 light-minutes away, yet contains the vast majority of mass within 4 light-years of the Earth. The next-nearest star, Proxima Centauri, is about 4.25 light-years away.
By "nutritious", I mean it has a lot of what's needed for making computers: mass-energy. In "Ultimate physical limits to computation", Seth Lloyd imagines an "ultimate laptop" which is the theoretically best computer that is 1 kilogram of mass, contained in 1 liter. He notes a limit to calculations per second that is proportional to the energy of the computer, which is mostly locked in its mass (E = mc²). Such an energy-proportional limit applies to memory too. Energy need not be expended quickly in the course of calculation, due to reversible computing.
So,...
I have my own actual best guesses for what happens in reasonably good futures, which I can get into. (I'll flag for now that I think "preserve Earth itself for as long as possible" is a reasonable Schelling point that is compatible with many "go otherwise quite fast" plans.)
I doubt that totally dismantling the Sun after centuries would significantly bring forward the time at which we reach the cosmic event horizon.
Why do you doubt this? (To be clear, it depends on exact details. But my original query was about a 2-year delay. Proxima Centauri is 4 light-years away. What ...