Thomas Kwa
*Ω342010
1
Some versions of the METR time horizon paper from alternate universes:

Measuring AI Ability to Take Over Small Countries (idea by Caleb Parikh)

Abstract: Many are worried that AI will take over the world, but extrapolation from existing benchmarks suffers from a large distributional shift that makes it difficult to forecast the date of world takeover. We rectify this by constructing a suite of 193 realistic, diverse countries with territory sizes from 0.44 to 17 million km^2. Taking over most countries requires acting over a long time horizon, with the exception of France. Over the last 6 years, the land area that AI can successfully take over with 50% success rate has increased from 0 to 0 km^2, doubling 0 times per year (95% CI 0.0-0.0 yearly doublings); extrapolation suggests that AI world takeover is unlikely to occur in the near future. To address concerns about the narrowness of our distribution, we also study AI ability to take over small planets and asteroids, and find similar trends.

When Will Worrying About AI Be Automated?

Abstract: Since 2019, the amount of time LW has spent worrying about AI has doubled every seven months, and now constitutes the primary bottleneck to AI safety research. Automation of worrying would be transformative to the research landscape, but worrying includes several complex behaviors, ranging from simple fretting to concern, anxiety, perseveration, and existential dread, and so is difficult to measure. We benchmark the ability of frontier AIs to worry about common topics like disease, romantic rejection, and job security, and find that current frontier models such as Claude 3.7 Sonnet already outperform top humans, especially in existential dread. If these results generalize to worrying about AI risk, AI systems will be capable of autonomously worrying about their own capabilities by the end of this year, allowing us to outsource all our AI concerns to the systems themselves.

Estimating Time Since The Singularity

Early work
Yonatan Cale
1470
1
Seems like Unicode officially added a "person being paperclipped" emoji: Here's how it looks in your browser: 🙂‍↕️ Whether they did this as a joke or to raise awareness of AI risk, I like it! Source: https://emojipedia.org/emoji-15.1
lc
950
7
My strong upvotes are now giving +1 and my regular upvotes give +2.
keltan
404
0
I feel a deep love and appreciation for this place, and the people who inhabit it.
RobertM
400
0
Pico-lightcone purchases are back up, now that we think we've ruled out any obvious remaining bugs.  (But do let us know if you buy any and don't get credited within a few minutes.)


Recent Discussion

I'm not writing this to alarm anyone, but it would be irresponsible not to report on something this important. On current trends, every car will be crashed in front of my house within the next week. Here's the data:

Until today, only two cars had crashed in front of my house, several months apart, during the 15 months I have lived here. But a few hours ago it happened again, mere weeks from the previous crash. This graph may look harmless enough, but now consider the frequency of crashes this implies over time:

The car crash singularity will occur in the early morning hours of Monday, April 7. As crash frequency approaches infinity, every car will be involved. You might be thinking that the same car could be involved in multiple crashes. This is true! But the same car can only withstand a finite number of crashes before it is no longer able to move. It follows that every car will be involved in at least one crash. And who do you think will be driving your car? 

See, this is what happens when you extrapolate data points linearly into the future. You get totally unrealistic predictions. It's important to remember the physical constraints on whatever trend you're trying to extrapolate. Importantly for this issue, you need to remember that time between successive crashes can never be negative, so it is inappropriate to model intervals with a straight line that crosses the time axis on April 7.

Instead, with so few data points, a more realistic model would take a log-transform of the inter-crash interval before fitting... (read more)
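As a sketch of the difference between the two fits (using made-up inter-crash intervals, since the post doesn't give exact dates): a straight-line fit to the raw intervals happily predicts a negative gap between crashes, while fitting the line to log-intervals keeps every prediction positive.

```python
# Sketch of the two extrapolations above, using hypothetical inter-crash
# intervals in days (the post does not give exact dates).
import numpy as np

intervals = np.array([150.0, 90.0, 21.0])   # hypothetical gaps between crashes
idx = np.arange(len(intervals))

# Straight-line fit to raw intervals: the line crosses zero, so the "next"
# interval comes out negative: the car crash singularity.
lin = np.polyfit(idx, intervals, 1)
print("linear fit, next interval (days):", np.polyval(lin, len(intervals)))

# Fit in log-space instead: intervals shrink geometrically but stay positive.
log_lin = np.polyfit(idx, np.log(intervals), 1)
print("log-space fit, next interval (days):", np.exp(np.polyval(log_lin, len(intervals))))
```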

5Mars_Will_Be_Ours
Quick! Someone fund my steel production startup before it's too late! My business model is to place a steel foundry under your house to collect the exponentially growing number of cars crashing into it! Imagine how much money we can make by revolutionizing metal production during the car crash singularity! Think of the money! Think of the Money! Think of the Money!!!
7Richard Korzekwa
Another victory for trend extrapolation!
7Ruby
Was a true trender-bender

TL;DR Having a good research track record is some evidence of good big-picture takes, but it's weak evidence. Strategic thinking is hard, and requires different skills. But people often conflate these skills, leading to excessive deference to researchers in the field, without evidence that those researchers are good at strategic thinking specifically. I certainly try to have good strategic takes, but it's hard, and you shouldn't assume I succeed!

Introduction

I often find myself giving talks or Q&As about mechanistic interpretability research. But inevitably, I'll get questions about the big picture: "What's the theory of change for interpretability?", "Is this really going to help with alignment?", "Does any of this matter if we can’t ensure all labs take alignment seriously?". And I think people take my answers to these...

I suppose I mean influence over politics, policy, or governance (this is very high level since these are all distinct and separable), rather than actually being political necessarily. I do think there are some common skills, but actually being a politician weighs so many other factors more heavily that the strategic skill is not selected on very strongly at all. Being a politician's advisor, on the other hand...

Yes, it's a special case, but importantly one that is not evaluated by Brier score or Manifold bucks.

1Beyond Singularity
Excellent points on the distinct skillset needed for strategy, Neel. Tackling the strategic layer, especially concerning societal dynamics under ASI influence where feedback is poor, is indeed critical and distinct from technical research. Applying strategic thinking beyond purely technical alignment, I focused on how societal structure itself impacts the risks and stability of long-term human-ASI coexistence. My attempt to design a societal framework aimed at mitigating those risks resulted in the model described in my post, Proposal for a Post-Labor Societal Structure to Mitigate ASI Risks: The 'Game Culture Civilization' (GCC) Model Whether the strategic choices and reasoning within that model hold up to scrutiny is exactly the kind of difficult evaluation your post calls for. Feedback focused on the strategic aspects (the assumptions, the proposed mechanisms for altering incentives, the potential second-order effects, etc.), as distinct from just the technical feasibility, would be very welcome and relevant to this discussion on evaluating strategic takes.

Epistemic status: Using UDT as a case study for the tools developed in my meta-theory of rationality sequence so far, which means all previous posts are prerequisites. This post is the result of conversations with many people at the CMU agent foundations conference, including particularly Daniel A. Herrmann, Ayden Mohensi, Scott Garrabrant, and Abram Demski. I am a bit of an outsider to the development of UDT and logical induction, though I've worked on pretty closely related things.

I'd like to discuss the limits of consistency as an optimality standard for rational agents. A lot of fascinating discourse and useful techniques have been built around it, but I think that it can be in tension with learning at the extremes. Updateless decision theory (UDT) is one of those...

32Wei Dai
I don't want to defend UDT overall (see here for my current position on it), but I think Tegmark Level 4 is a powerful motivation for UDT or something like it even if you're not very sure about it being real.

  1. Since we can't rule out the mathematical multiverse being a real object with high confidence, or otherwise being a thing that we can care about, we have to assign positive, non-negligible credence to this possibility.
  2. If it is real or something we can care about, then given our current profound normative uncertainty we also have to assign positive, non-negligible credence to the possibility that we should care about the entire multiverse, and not just our local environment or universe. (There are some arguments for this, such as arguments for broadening our circle of concern in general.)
  3. If we can't strongly conclude that we should neglect the possibility that we can and should care about something like Tegmark Level 4, then we have to work out how to care about it or how to take it into account when we make decisions that can affect "distant" parts of the multiverse, so that such conclusions could be further fed into whatever mechanism we use to handle moral/normative uncertainty (such as Bostrom and Ord's Moral Parliament idea).

As for "direct reason", I think AIT played a big role for me, in that the algorithmic complexity (or rather, some generalization of algorithmic complexity to possibly uncomputable universes/mathematical objects) of Tegmark 4 as a whole is much lower than that of any specific universe within it like our apparent universe. (This is similar to the fact that the program tape for a UTM can be shorter than that of any non-UTM, as it can just be the empty string, or that you can print a history of all computable universes with a dovetailing program, which is very short.) Therefore it seems simpler to assume that all of Tegmark 4 exists rather than only some specific universe.
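For readers unfamiliar with the "dovetailing program" referenced above, here is a minimal sketch of the interleaving trick. The `toy_machine` enumeration is a stand-in assumption (a real dovetailer would step through an enumeration of all programs on a universal machine); the point is the scheduling pattern, which guarantees every machine eventually receives unboundedly many steps.

```python
# Minimal sketch of dovetailing: interleave machines 0, 1, 2, ... so that
# every machine eventually gets unboundedly many steps, even though new
# machines keep being added. `toy_machine` is a stand-in for an enumeration
# of all programs; a real dovetailer would step a universal machine instead.
from itertools import count

def toy_machine(i):
    """Stand-in for the i-th program: yields one 'output' per step."""
    for step in count():
        yield f"machine {i}, step {step}"

def dovetail(machine, stages=4):
    """At stage n, start machine n, then give every started machine one more step."""
    running = []
    for n in range(stages):
        running.append(machine(n))
        for m in running:
            print(next(m))

dovetail(toy_machine)
```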
9Cole Wyeth
I am doing a PhD in AIT, but I still don’t want to take it that literally. I don’t believe that existence is actually the stochastic process specified by a UTM with random input tape - that’s a convenient but fictional model that I reason about because it’s sometimes easier than thinking about a Bayesian mixture over lsc semimeasures, and the two are equivalent (up to a constant which ~can even be forced to 1). AIT intuitions do make the level 4 multiverse seem more natural, but I think this is just the mind projection fallacy again. Of course if you take the universal distribution seriously, it does make sense to reason that the level 4 multiverse has low K complexity - but that doesn’t justify assuming it for us since we’d still need our index into that multiverse. See Hutter’s “A true theory of everything (will be subjective).” I suppose it is valid to expect that the level 4 multiverse is hard to rule out for K-complexity reasons. With our limited understanding of philosophy/metaphysics, we probably do need to assign some non-negligible weight to that possibility. But I suspect that superintelligences won’t need to - they’ll be able to rule it out from their more informed position (assuming my strong suspicion is right - which means I am sampling from and thereby collapsing my own mixture model). This means the level 4 multiverse should be irrelevant to understanding superintelligences. 
Wei Dai
70

Do you think a superintelligence will be able to completely rule out the hypothesis that our universe literally is a dovetailing program that runs every possible TM, or literally is a bank of UTMs running every possible program (e.g., by reproducing every time step and adding 0 or 1 to each input tape)? (Or the many other hypothetical universes that similarly contain a whole Level-4-like multiverse?) It seems to me that hypotheses like these will always collectively have a non-negligible weight, and have to be considered when making decisions.

Another argum... (read more)

7Cole Wyeth
It looks like I have many points of agreement with Martin. 

Our community is not prepared for an AI crash. We're good at tracking new capability developments, but not so much at tracking company financials. Currently, both OpenAI and Anthropic are losing $5 billion+ a year, while under threat of losing users to cheap LLMs.

A crash will weaken the labs. Funding-deprived and distracted, execs struggle to counter coordinated efforts to restrict their reckless actions. Journalists turn on tech darlings. Optimism gives way to mass outrage over all the wasted money and reckless harms.

You may not think a crash is likely. But if it happens, we can turn the tide.

Preparing for a crash is our best bet.[1] But our community is poorly positioned to respond. Core people positioned themselves inside institutions – to advise on how to maybe make AI 'safe',...

Glad to read your thoughts!

Agreed on being friends with communities who are not happy about AI. 

I’m personally not a fan of working with OpenAI or Anthropic, given that they’ve defected on people here concerned about a default trajectory to mass extinction, and used our research for their own ends.

1Remmelt
Yes, I get you don’t just want to read about the problem but a potential solution.  The next post in this sequence will summarise the plan by those experienced organisers. These organisers led one of the largest grassroots movements in recent history. That took years of coalition building, and so will building a new movement.  So they want to communicate the plan clearly, without inviting misinterpretations down the line. I myself rushed writing on new plans before (when I nuanced a press release put out by a time-pressed colleague at Stop AI). That backfired because I hadn’t addressed obvious concerns. This time, I drafted a summary that the organisers liked, but still want to refine. So they will run sessions with me and a facilitator, to map out stakeholders and their perspectives, before going public on plans. Check back here in a month. We should have a summary ready by then.
19Vladimir_Nesov
The scale of training and R&D spending by AI companies can be reduced on short notice, while global inference buildout costs much more and needs years of use to pay for itself. So an AI slowdown mostly hurts clouds and makes compute cheap due to oversupply, which might be a wash for AI companies. Confusingly, major AI companies are closely tied to cloud providers, but OpenAI is distancing itself from Microsoft, and Meta and xAI are not cloud providers, so wouldn't suffer as much. In any case the tech giants will survive; it's losing their favor that seems more likely to damage AI companies, leaving them no longer able to invest as much in R&D.
3Polar
There is a possibility of a self-reinforcing negative cycle: models don't show rapid capabilities improvement -> investors halt pouring money into the AI sector -> AI labs focus on cutting costs -> models don't show rapid capabilities improvement.

Epistemic status: This post aims at an ambitious target: improving intuitive understanding directly. The model for why this is worth trying is that I believe we are more bottlenecked by people having good intuitions guiding their research than, for example, by the ability of people to code and run evals. 

Quite a few ideas in AI safety implicitly use assumptions about individuality that ultimately derive from human experience. 

When we talk about AIs scheming, alignment faking or goal preservation, we imply there is something scheming or alignment faking or wanting to preserve its goals or escape the datacentre.

If the system in question were human, it would be quite clear what that individual system is. When you read about Reinhold Messner reaching the summit of Everest, you would be curious about...

(Other than the thoughts on the consequences of said idea) This idea largely seems like a rehash of https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators (and frankly, so does the three layer model, but that does go into more mechanistic territory and I think it complements simulator theory well)

This post was adapted from an internal doc I wrote at Wave.

Welcome to being a manager! Your time-management problem just got a lot harder.

As an IC, you can often get away with a very simple time-management strategy:

  1. Decide what your one most important thing is.
  2. Work on it until it’s done.
  3. GOTO 1

As a team lead, this isn’t going to work, because much more of your work is interrupt-driven or gets blocked for long periods of time. One-on-ones! Code reviews! Design reviews! Prioritization meetings! Project check-ins! All of these are subject to an external schedule rather than being the type of thing that you can push on in a single focused block until it’s done.

Being a team lead means three big changes for your time management:

  1. You no longer have a

...

A key step in the classic argument for AI doom is instrumental convergence: the idea that agents with many different goals will end up pursuing the same few subgoals, which includes things like "gain as much power as possible".

If it wasn't for instrumental convergence, you might think that only AIs with very specific goals would try to take over the world. But instrumental convergence says it's the other way around: only AIs with very specific goals will refrain from taking over the world.

For pure consequentialists—agents that have an outcome they want to bring about, and do whatever they think will cause it—some version of instrumental convergence seems surely true[1].

But what if we get AIs that aren't pure consequentialists, for example because they're ultimately motivated by virtues? Do...

Something entirely new occurred around March 26th, 2025. Following the release of OpenAI’s 4o image generation, a specific aesthetic didn’t just trend—it swept across the virtual landscape like a tidal wave. Scroll through timelines, and nearly every image, every meme, every shared moment seemed spontaneously re-rendered in the unmistakable style of Studio Ghibli. This wasn’t just another filter; it felt like a collective, joyful migration into an alternate visual reality.

But why? Why this specific style? And what deeper cognitive or technological threshold did we just cross? The Ghiblification wave wasn’t mere novelty; it was, I propose, the first widely experienced instance of successful reality transfer: the mapping of our complex, nuanced reality into a fundamentally different, yet equally coherent and emotionally resonant, representational framework.

And Ghibli, it turns out, was...

Raemon
30

A thing that gave me creeping horror about the Ghiblification is that I don't think the masses actually particularly understand Ghibli. And the result is an uneven simulacrum-mask that gives the impression of "rendered with love and care" without actually being so. 

The Ghibli aesthetic is actually historically pretty important to me, and in particular important as a counterbalancing force against, among other things, what I expect to happen by default with AI. Some things I like about Ghibli:

  • The "cinematic lens" emphasizes a kind of "see everythi
... (read more)

I've been running meetups since 2019 in Kitchener-Waterloo. These were rationalist-adjacent from 2019-2021 (examples here) and then explicitly rationalist from 2022 onwards.

Here's a low-effort/stream of consciousness rundown of some meetups I ran in Q1 2025. Sometime late last year, I resolved to develop my meetup posts in such a way that they're more plug-and-play-able by other organizers who are interested in running meetups on the same topics. Below you'll find links to said meetup posts (which generally have an intro, required and supplemental readings, and discussion questions for sparking conversation—all free to take), and brief notes on how they went and how they can go better. Which is to say, this post might be kind of boring for non-organizers.

The Old Year and the New

The first meetup of...

If there are a lot of people for the very-low-context NY meetup, possibly at least one very-low-context meetup per quarter is worth doing, to see if that gets people in/back more?
