Thomas Kwa
1
Some versions of the METR time horizon paper from alternate universes:

Measuring AI Ability to Take Over Small Countries (idea by Caleb Parikh)

Abstract: Many are worried that AI will take over the world, but extrapolation from existing benchmarks suffers from a large distributional shift that makes it difficult to forecast the date of world takeover. We rectify this by constructing a suite of 193 realistic, diverse countries with territory sizes from 0.44 to 17 million km^2. Taking over most countries requires acting over a long time horizon, with the exception of France. Over the last 6 years, the land area that AI can successfully take over with 50% success rate has increased from 0 to 0 km^2, doubling 0 times per year (95% CI 0.0-0.0 yearly doublings); extrapolation suggests that AI world takeover is unlikely to occur in the near future. To address concerns about the narrowness of our distribution, we also study AI ability to take over small planets and asteroids, and find similar trends.

When Will Worrying About AI Be Automated?

Abstract: Since 2019, the amount of time LW has spent worrying about AI has doubled every seven months, and now constitutes the primary bottleneck to AI safety research. Automation of worrying would be transformative to the research landscape, but worrying includes several complex behaviors, ranging from simple fretting to concern, anxiety, perseveration, and existential dread, and so is difficult to measure. We benchmark the ability of frontier AIs to worry about common topics like disease, romantic rejection, and job security, and find that current frontier models such as Claude 3.7 Sonnet already outperform top humans, especially in existential dread. If these results generalize to worrying about AI risk, AI systems will be capable of autonomously worrying about their own capabilities by the end of this year, allowing us to outsource all our AI concerns to the systems themselves.

Estimating Time Since The Singularity

Early work
Yonatan Cale
1440
1
Seems like Unicode officially added a "person being paperclipped" emoji: Here's how it looks in your browser: 🙂‍↕️ Whether they did this as a joke or to raise awareness of AI risk, I like it! Source: https://emojipedia.org/emoji-15.1
lc
930
7
My strong upvotes are now giving +1 and my regular upvotes give +2.
RobertM
400
0
Pico-lightcone purchases are back up, now that we think we've ruled out any obvious remaining bugs.  (But do let us know if you buy any and don't get credited within a few minutes.)
keltan
310
0
I feel a deep love and appreciation for this place, and the people who inhabit it.

Popular Comments

Recent Discussion

Recent progress in AI has led to rapid saturation of most capability benchmarks - MMLU, RE-Bench, etc. Even much more sophisticated benchmarks such as ARC-AGI or FrontierMath see incredibly fast improvement, and all that while severe under-elicitation is still very salient.

As has been pointed out by many, general capability involves more than simple tasks like these, which have a long history in the field of ML and are therefore easily saturated. Claude Plays Pokemon is a good example of something somewhat novel as a measure of progress, and it benefited from being an actually good proxy for model capability.

Taking inspiration from examples such as this, we considered domains of general capacity that are even further decoupled from existing exhaustive generators. We introduce BenchBench, the first standardized...

Hey Everyone,

It is with a sense of... considerable cognitive dissonance that I am letting you all know about a significant development for the future trajectory of LessWrong. After extensive internal deliberation, projections of financial runways, and what I can only describe as a series of profoundly unexpected coordination challenges, the Lightcone Infrastructure team has agreed in principle to the acquisition of LessWrong by EA.

I assure you, nothing about how LessWrong operates on a day to day level will change. I have always cared deeply about the robustness and integrity of our institutions, and I am fully aligned with our stakeholders at EA. 

To be honest, the key thing that EA brings to the table is money and talent. While the recent layoffs in EA's broader industry have been...

habryka
470

You can now choose which virtues you want to display next to your username! Just go to the virtues dialogue on the frontpage and select the ones you want to display (up to 3).

5AprilSR
Why do I have dozens of strong upvote and downvote strength, but no more agreement strength than before I began my strength training? Does EA not think agreement is important?
81leogao
the intent is to provide the user with a sense of pride and accomplishment for unlocking different rationality methods.
26habryka
Absolutely, that is our sole motivation.

I'm not writing this to alarm anyone, but it would be irresponsible not to report on something this important. On current trends, every car will be crashed in front of my house within the next week. Here's the data:

Until today, only two cars had crashed in front of my house, several months apart, during the 15 months I have lived here. But a few hours ago it happened again, mere weeks from the previous crash. This graph may look harmless enough, but now consider the frequency of crashes this implies over time:

The car crash singularity will occur in the early morning hours of Monday, April 7. As crash frequency approaches infinity, every car will be involved. You might be thinking that the same car could be involved in multiple crashes. This is true! But the same car can only withstand a finite number of crashes before it is no longer able to move. It follows that every car will be involved in at least one crash. And who do you think will be driving your car? 
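The extrapolation above can be reproduced: if each gap between crashes is a constant fraction r of the previous gap, the remaining gaps form a geometric series that sums to a finite date, so infinitely many crashes occur before then. A minimal sketch, with hypothetical crash dates chosen to match the story (the actual dates are not given in the post):

```python
from datetime import datetime, timedelta

def crash_singularity(crash_times, ratio=None):
    """Extrapolate a finite-time 'singularity' from shrinking crash gaps.

    If each gap is r times the previous one (r < 1), the remaining gaps
    sum to a geometric series: remaining = last_gap * r / (1 - r),
    so every future crash fits before a finite date.
    """
    gaps = [(b - a).total_seconds() for a, b in zip(crash_times, crash_times[1:])]
    if ratio is None:
        ratio = gaps[-1] / gaps[-2]  # estimate r from the last two gaps
    assert ratio < 1, "gaps must be shrinking to get a singularity"
    remaining = gaps[-1] * ratio / (1 - ratio)
    return crash_times[-1] + timedelta(seconds=remaining)

# Hypothetical data: two crashes months apart, then one mere weeks later.
crashes = [datetime(2024, 1, 1), datetime(2024, 11, 1),
           datetime(2025, 3, 15), datetime(2025, 4, 4)]
```

With these assumed dates the gaps are 305, 134, and 20 days, giving r ≈ 0.15 and about 3.5 days of crashes left, which is how one lands on the early hours of April 7.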

5Mars_Will_Be_Ours
Quick! Someone fund my steel production startup before it's too late! My business model is to place a steel foundry under your house to collect the exponentially growing number of cars crashing into it!  Imagine how much money we can make by revolutionizing metal production during the car crash singularity! Think of the money! Think of the Money! Think of the Money!!!
30Ruby
Frick. Happened to me already.
3Richard Korzekwa
Another victory for trend extrapolation!
Ruby
60

Was a true trender-bender

In the debate over AI development, two movements stand as opposites: PauseAI calls for slowing down AI progress, and e/acc (effective accelerationism) calls for rapid advancement.  But what if both sides are working against their own stated interests?  What if the most rational strategy for each would be to adopt the other's tactics—if not their ultimate goals?

AI development speed ultimately comes down to policy decisions, which are themselves downstream of public opinion.  No matter how compelling technical arguments might be on either side, widespread sentiment will determine what regulations are politically viable.

Public opinion is most powerfully mobilized against technologies following visible disasters.  Consider nuclear power: despite being statistically safer than fossil fuels, its development has been stagnant for decades.  Why?  Not because of environmental activists, but because...

5Seth Herd
I'm on board! We needed people going fast to get seatbelts! AI safety isn't a game, which means you'll be disappointed in yourself (if only very briefly) if you fail to play your best to win. The choice of risky 3D chess moves or virtue ethics is not obvious.
AprilSR
20

I think it's obvious that you should not pursue 3D chess without investing serious effort in making sure that you play 3D chess correctly. I think there is something to be said for ignoring the shiny clever ideas and playing simple virtue ethics. 

But if a clever scheme is in fact better, and you have accounted for all of the problems inherent to clever schemery, of which there are very many, then... the burden of proof isn't literally insurmountable, you're just unlikely to end up surmounting it in practice.

(Unless it's 3D chess where the only thing you might end up wasting is your own time. That has a lower burden of proof. Though still probably don't waste all your time.)

Introduction

Decision theory is about how to behave rationally under conditions of uncertainty, especially if this uncertainty involves being acausally blackmailed and/or gaslit by alien superintelligent basilisks.

Decision theory has found numerous practical applications, including proving the existence of God and generating endless LessWrong comments since the beginning of time.

However, despite the apparent simplicity of "just choose the best action", no comprehensive decision theory that resolves all decision theory dilemmas has yet been formalized. This paper at long last fills that gap, by introducing a new decision theory: VDT.

Decision theory problems and existing theories

Some common existing decision theories are:

  • Causal Decision Theory (CDT): select the action that *causes* the best outcome.
  • Evidential Decision Theory (EDT): select the action that you would be happiest to learn that you had taken.
  • Functional Decision Theory
...
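The difference between the first two theories in the list can be made concrete with Newcomb's problem (a standard decision-theory dilemma, not one discussed in this post): EDT conditions on the chosen action as evidence about what the predictor did, while CDT treats the already-fixed box contents as unaffected by the choice. A toy sketch with assumed payoffs and predictor accuracy:

```python
def edt_values(accuracy=0.99, million=1_000_000, thousand=1_000):
    """Evidential EVs: your action is evidence about the prediction."""
    ev_one_box = accuracy * million  # predictor probably foresaw one-boxing
    ev_two_box = (1 - accuracy) * million + thousand
    return ev_one_box, ev_two_box

def cdt_values(p_full=0.5, million=1_000_000, thousand=1_000):
    """Causal EVs: box contents are fixed, so two-boxing dominates."""
    ev_one_box = p_full * million
    ev_two_box = p_full * million + thousand
    return ev_one_box, ev_two_box
```

With an accurate predictor, EDT prefers one-boxing, while CDT prefers two-boxing for any probability that the opaque box is full, which is exactly why the two theories come apart.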

Still laughing.

Thanks for admitting you had to prompt Claude out of being silly; lots of bot results neglect to mention that methodological step.

This will be my reference to all decision theory discussions henceforth

Have all of my 40-some strong upvotes!

3Daniel Kokotajlo
This is a masterpiece. Not only is it funny, it makes a genuinely important philosophical point. What good are our fancy decision theories if asking Claude is a better fit to our intuitions? Asking Claude is a perfectly rigorous and well-defined DT, it just happens to be less elegant/simple than the others. But how much do we care about elegance/simplicity?
1Vecn@tHe0veRl0rd
I find this hilarious, but also a little scary. As in, I don't base my choices/morality off of what an AI says, but see in this article a possibility that I could be convinced to do so. It also makes me wonder, since LLMs are basically curated repositories of most everything that humans have written, if the true decision theory is just "do what most humans would do in this situation".
3satchlj
Claude says the vibes are 'inherently cursed'. But then it chooses not to pull the lever because it's 'less karmically disruptive'.

After ~3 years as the ACX Meetup Czar, I've decided to resign from my position, and I intend to scale back my work with the LessWrong community as well. While this transition is not without some sadness, I'm excited for my next project.

I'm the Meetup Czar of the new Fewerstupidmistakesity community.

We're calling it Fewerstupidmistakesity because people get confused about what "Rationality" means, and this would create less confusion. It would be a stupid mistake to name your philosophical movement something very similar to an existing movement that's somewhat related but not quite the same thing. You'd spend years with people confusing the two. 

What's Fewerstupidmistakesity about? It's about making fewer stupid mistakes, ideally down to zero such stupid mistakes. Turns out, human brains have lots of scientifically proven...

While I would hate to besmirch the good name of the fewerstupidmistakesist community, I cannot help but feel that misunderstanding morality and decision theory enough to end up doing a murder is a stupider mistake than drawing a gun once a firefight has started, though perhaps not quite as stupid as beginning the fight in the first place.

I think rationalists should consider taking more showers.

As Eliezer Yudkowsky once said, boredom makes us human. The childhoods of exceptional people often include excessive boredom as a trait that helped cultivate their genius:

A common theme in the biographies is that the area of study which would eventually give them fame came to them almost like a wild hallucination induced by overdosing on boredom. They would be overcome by an obsession arising from within.

Unfortunately, most people don't like boredom, and we now have little metal boxes and big metal boxes filled with bright displays that help distract us all the time, but there is still an effective way to induce boredom in a modern population: showering.

When you shower (or bathe, that also works), you usually are cut off...

113Aella
Strong disagree. This is an ineffective way to create boredom. Showers are overly stimulating, with horrible changes in temperature, the sensation of water assaulting you nonstop, and requiring laborious motions to do the bare minimum of scrubbing required to make society not mad at you. A much better way to be bored is to go on a walk outside, or lift weights at the gym, or listen to me talk about my data cleaning issues.
5Bohaska
I guess this is another case of 'Universal' Human Experiences That Not Everyone Has

Serious take:

CDT might work, basically because of the Bellman fact that the options "receive 1 utilon" and "play a game with EV 1 utilon" are the same. So, working out the Bellman equations, if each decision changes the game you are playing, this will get integrated.

In any case where somebody is actually making decisions based on your decision theory, the actions you take in previous games might also have the result "restart from position x with a new game based on what they have simulated you to do".

The hard part is figuring out binding.
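The "Bellman fact" invoked above is just linearity of expectation inside the backup: a guaranteed utilon and a gamble whose expected value is one utilon contribute identically to the value of a state. A toy sketch (the payoffs are illustrative, not from the comment):

```python
def ev(lottery):
    """Expected value of a lottery given as (probability, payoff) pairs."""
    return sum(p * x for p, x in lottery)

def bellman_value(options):
    """One-step Bellman backup: the value is the best expected payoff."""
    return max(ev(option) for option in options)

# A guaranteed utilon vs. a game whose expected value is one utilon:
certain = [(1.0, 1.0)]
game = [(0.5, 0.0), (0.5, 2.0)]
```

Since only expected values enter the backup, `certain` and `game` are interchangeable options, which is the sense in which "1 utilon" and "play a game with EV 1 utilon" are the same.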

2Mis-Understandings
Note to self: a point that we cannot predict past (classically, the singularity) does not mean we can never predict past it, just that we can't predict past it right now. It is not sensible to predict the direction of your own future predictions (or it is, but it will not get you anywhere). But we can predict that our predictions of an event will likely improve as we near it. Therefore, arguments of the form "because we have a prediction horizon, we cannot predict past a certain point" will always appear to be defeated once we near that point, since by then we have more information; this does not make the original argument wrong. Arguments that we will never predict past a certain point, by contrast, need to justify why our prediction ability will in fact get worse over time.

LessOnline 2025

Ticket prices increase in 1 day

Join our Festival of Blogging and Truthseeking from May 30 - Jun 1, Berkeley, CA