A coordination problem is when everyone is taking some action A, and we’d rather all be taking action B, but it’s bad if we don’t all move to B at the same time. Common knowledge is the name for the epistemic state we’re collectively in, when we know we can all start choosing action B - and trust everyone else to do the same.
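A toy payoff matrix (illustrative numbers, not from the post) makes the structure concrete: (A, A) and (B, B) are both stable, (B, B) is better for everyone, but switching to B alone is the worst outcome for the switcher, so nobody moves without common knowledge that everyone else will:

$$
\begin{array}{c|cc}
 & \text{they play } A & \text{they play } B \\ \hline
\text{you play } A & 1,\ 1 & 1,\ 0 \\
\text{you play } B & 0,\ 1 & 2,\ 2
\end{array}
$$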

Interesting anecdote on "von Neumann's onion" and his general style, from P. R. Halmos' The Legend of John von Neumann:

(tangent: I'm a bit peeved by Halmos' "lesser men" throwaway remark, mainly because I think interpretive research labor and distillation is very valuable, very hard to do well, somewhat orthogonal to vN-style competence, and very underappreciated and undersupplied.)

von Neumann was also courageous, Halmos wrote, in the following way:

Terry Tao is similar, according to Allen Knutson:

von Neumann also had endless capacity for work. Halmos:

I thought this was striking: why waste time on such seeming trivialities? But I guess if you're John von Neumann you just have such a glut of brain cycles that you can spend it in ridiculously poorly-optimised ways like this instead of needing to 80/20 and still get your many, many jobs done.
avturchin
LLMs know in advance when they hallucinate, and this can be used to exclude hallucinations. TLDR: the prompt "predict the hallucination level of each item in the bibliography list and do not include items expected to have level 3 or above" works.

I performed an experiment: I asked Claude 3.7 Sonnet to write the full bibliography of Bostrom. Around the 70th article, it started hallucinating. I then sent the results to GPT-4.5 and asked it to mark hallucinations and estimate the hallucination chances from 1 to 10 (where 10 is the maximal level of hallucination). It correctly identified hallucinations. After that, I asked Sonnet 3.7 in another window to rate the hallucination level in its own previous answer, and it gave almost the same answers as GPT-4.5. The differences were mostly about the exact bibliographical data of some articles, and at first glance it matched about 90% of GPT-4.5's ratings. I also checked the real data manually through Google Scholar.

After that, I asked Sonnet to write the bibliography again but add a hallucination rating after each item. It again started hallucinating articles soon, but to my surprise, it gave correct items ratings of 1-2 and incorrect ones ratings of 3-5.

In the next step, I asked it to predict in advance which level of hallucination the next item would have and, if it was 3 or above, not to include it in the list. And it worked! It doesn't solve the problem of hallucinations completely, but it lowers their rate by roughly a factor of ten. Obviously, it can sometimes hallucinate the level of hallucinations too. Maybe I can ask meta: predict the level of hallucinations in your hallucination estimate.
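A minimal sketch of how one might package this into a reusable prompt (illustrative only; `query_llm` is a placeholder for whatever chat API you use, and the wording and threshold are just the ones from the experiment above):

```python
# Sketch of the self-rated-hallucination filter described above.
# query_llm is a stand-in for your chat API of choice; it takes a prompt
# string and returns the model's text reply.

def bibliography_with_hallucination_filter(query_llm, author: str, threshold: int = 3) -> str:
    prompt = (
        f"Write the full bibliography of {author}. "
        "Before writing each item, predict its hallucination level on a 1-10 scale "
        "(10 = certainly fabricated). "
        f"Do not include any item whose predicted level is {threshold} or above; "
        "for included items, append the predicted level in brackets."
    )
    return query_llm(prompt)

# Usage, assuming some query_llm implementation is available:
# print(bibliography_with_hallucination_filter(query_llm, "Nick Bostrom"))
```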
In public policy, experimenting is valuable. In particular, it provides a positive externality. Let's say that a city tests out a somewhat quirky idea like paying NIMBYs to shut up about new housing. If that policy works well, other cities benefit because now they can use and benefit from that approach. So then, shouldn't there be some sort of subsidy for cities that test out new policy ideas? Isn't it generally a good thing to subsidize things that provide positive externalities? I'm sure there is a lot to consider. I'm not enough of a public policy person to know what the considerations are though or how to weigh them.
Richard_Ngo
In response to an email about what a pro-human ideology for the future looks like, I wrote up the following: The pro-human egregore I'm currently designing (which I call fractal empowerment) incorporates three key ideas:

Firstly, we can see virtue ethics as a way for less powerful agents to aggregate to form more powerful superagents that preserve the interests of those original less powerful agents. E.g. virtues like integrity, loyalty, etc help prevent divide-and-conquer strategies. This would have been in the interests of the rest of the world when Europe was trying to colonize them, and will be in the best interests of humans when AIs try to conquer us.

Secondly, the most robust way for a more powerful agent to be altruistic towards a less powerful agent is not for it to optimize for that agent's welfare, but rather to optimize for its empowerment. This prevents predatory strategies from masquerading as altruism (e.g. agents claiming "I'll conquer you and then I'll empower you", which then somehow never get around to the second step).

Thirdly: the generational contract. From any given starting point, there are a huge number of possible coalitions which could form, and in some sense it's arbitrary which set of coalitions you choose. But one thing which is true for both humans and AIs is that each generation wants to be treated well by the next generation. And so the best intertemporal Schelling point is for coalitions to be inherently historical: that is, they balance the interests of old agents and new agents (even when the new agents could in theory form a coalition against all the old agents). From this perspective, path-dependence is a feature not a bug: there are many possible futures but only one history, meaning that this single history can be used to coordinate. In some sense this is a core idea of UDT: when coordinating with forks of yourself, you defer to your unique last common ancestor. When it's not literally a fork of yourself, there's more arb
Pet peeve: when places close before their stated close time. For example, I was just at the library. Their signs say that they close at 6pm. However, they kick people out at 5:45pm. This caught me off guard and caused me to break my focus at a bad time.

The reason that places do this, I assume, is because employees need to leave when their shift ends. In this case with the library, it probably takes 15 minutes or so to get everyone to leave, so they spend the last 15 minutes of their shift shooing people out. But why not make the official closing time 5:45pm while continuing to end the employees' shifts at 6:00pm?

I also run into this with restaurants. With restaurants, it's a little more complicated because there are usually two different closing times that are relevant to patrons: when the kitchen closes and when doors close. Unless food is served ~immediately like at Chipotle or something, it wouldn't make sense to make these two times equivalent. If it takes 10 minutes to cook a meal, doors close at 9:00pm, and someone orders a meal at 8:59pm, well, you won't be able to serve the meal before they need to be out. But there's an easy solution to this: just list each of the two close times. It seems like that would make everyone happy.

Popular Comments

Recent Discussion

Read the full article here

The journalist is an AI skeptic, but does solid financial investigations. Details below:

...

Pure AI companies like OpenAI and Anthropic are like race cars which automatically catch on fire and explode the moment they fall too far behind.

Meanwhile, AI companies like Google DeepMind and Meta AI are race cars which can lose the lead and still catch up later. They can maintain the large expenditures needed for AI training without needing to generate revenue or impress investors. DeepSeek and xAI might be in between.

(Then again, OpenAI is half owned by Microsoft. If it falls too far behind it might not go out of business but get folded into Microsoft, at a lower valuation. I still think they feel much more short term pressure.)

Mo Putera
Good homework by Zitron on the numbers, and he's a really entertaining writer, but my (very brief) experience so far using it for work-related research more closely matches Sarah Constantin's assessment concluding that ChatGPT-4o DR was the best one she tested (including Perplexity, Gemini, ChatGPT-4o, Elicit, and PaperQA) on completeness, relevance, source quality, and creativity. 
Brendan Long
This seems to explain a lot about why Altman is trying so hard both to make OpenAI for-profit (to more easily raise money with that burn rate) and to build much bigger data centers (to keep going on "just make it bigger").

We may be on the direct path to AGI and then ASI - the singularity could happen within the next 5-20 years. If you survive to reach it, the potential upside is immense: daily life could become paradise.

With such high stakes, ensuring personal survival until the singularity should be a top priority for yourself and those you care about.

I've created V1 of the Singularity Survival Guide, an evidence-based resource focused on:

  1. Identifying the highest-probability preventable causes of death/injury in the near term
  2. Providing the highest-ROI risk mitigation strategies
  3. Outlining preparations for potential societal instability
  4. Presenting information in a shareable, memetic format

Top Risks to Mitigate

🚗 Car & Pedestrian Accidents

The #1 daily threat most people underestimate. Key mitigations include driving less when possible, choosing vehicles with top safety ratings, avoiding high-risk driving times,...

mbrooks
hmm... "It is not prosocial to maximize personal flourishing under these circumstances."

I don't think this guide is at all trying to maximize personal flourishing at the cost of the communal. It's actually very easy, quick, and cheap to follow the suggestions to improve your personal welfare. If society were going to go through a bumpy patch, I would want more reasonable, prepared, and thoughtful people to help steer humanity through and make it to the other side. None of the ideas I suggested would hurt communal well-being, either.

I feel like it's a bit harsh to say "people shouldn't care about the most likely ways they could personally die, so I will downvote this post to make sure fewer people understand their main sources of risk."

I don't think this guide is at all trying to maximize personal flourishing at the cost of the communal.

Then I misinterpreted it. One quote from the original post that contributed was "ensuring personal survival until the singularity should be a top priority for yourself".

I agree that taking the steps you outlined above is wise, and should be encouraged. If the original post had been framed like your comment, I would have upvoted.

tl;dr:

From my current understanding, one of the following two things should be happening, and I would like to understand why neither is:

Either

  1. Everyone in AI Safety who thinks slowing down AI is currently broadly a good idea should publicly support PauseAI.

    Or

  2. If pausing AI is much more popular than the organization PauseAI, that is a problem that should be addressed in some way.

 

Pausing AI

There does not seem to be a legible path to prevent possible existential risks from AI without slowing down its current progress.

 

I am aware that many people interested in AI Safety do not want to prevent AGI from being built EVER, mostly based on transhumanist or longtermist reasoning.

Many people in AI Safety seem to be on board with the goal of “pausing AI”, including, for example,...

Obviously P(doom | no slowdown) < 1.


You think it's obviously materially less? Because there is a faction, including Eliezer and many others, who think the gap from 1 is epsilon, and who claim that the reduction in risk from any technical work is less than the acceleration it causes. (I think you're probably right about some of that work, but I think it's not at all obviously true!)

Davidmanheim
Banning nuclear weapons is exactly like this. If it could be done universally and effectively, it would be great, but any specific version seems likely to tilt the balance of power without accomplishing the goal. That's kind-of what happened with the anti-nuclear movement, but it ended up doing lots of harm because the things that could be stopped were the good ones!
Davidmanheim
I think this is wrong - the cost in political capital for saying that it's the best solution seems relatively low, especially if coupled with an admission that it's not politically viable. What I see instead is people dismissing it as a useful idea even in theory, saying it would be bad if it were taken seriously by anyone, and moving on from there. And if nothing else, that's acting as a way to narrow the Overton window for other proposals!
Davidmanheim
"sufficient to stop unsafe AI development? I think there is indeed exactly one such policy measure, which is SB 1047," I think it's obviously untrue that this would stop unsafe AI - it is as close as any measure I've seen, and would provide some material reduction in risk in the very near term, but (even if applied universally, and no-one tried to circumvent it,) it would not stop future unsafe AI.

It's incredibly surprising that state-of-the-art AIs don't fix most of their hallucinations despite being capable (and undergoing reinforcement learning).

Is the root cause of hallucination alignment rather than capabilities?!

Maybe the AI gets a better RL reward if it hallucinates (instead of giving less info), because users are unable to catch its mistakes.

ACCount
This is way more metacognitive skill than I would have expected an LLM to have. I can make sense of how an LLM would be able to do that, but only in retrospect. And if a modern high-end LLM already knows on some level, and recognizes its own uncertainty? Could you design a fine-tuning pipeline to reduce hallucination levels based on that? At least for reasoning models, if not for all of them?
avturchin
It looks like (based on the article published a few days ago by Anthropic about the microscope) Claude Sonnet was trained to distinguish facts from hallucinations, so it's not surprising that it knows when it hallucinates.  

PDF version. berkeleygenomics.org. X.com. Bluesky.

William Thurston was a world-renowned mathematician. His ideas revolutionized many areas of geometry and topology[1]; the proof of his geometrization conjecture was eventually completed by Grigori Perelman, thus settling the Poincaré conjecture (making it the only solved Millennium Prize problem). After his death, his students wrote reminiscences, describing among other things his exceptional vision.[2] Here's Jeff Weeks:

Bill’s gift, of course, was his vision, both in the direct sense of seeing geometrical structures that nobody had seen before and in the extended sense of seeing new ways to understand things. While many excellent mathematicians might understand a complicated situation, Bill could look at the same complicated situation and find simplicity.

Thurston emphasized clear vision over algebra, even to a fault. Yair Minsky:

Most inspiring was his

...

I think this is a strong argument for genetic diversity, but a very weak one for saying there isn't an unambiguous universal "good" direction for genes. So I agree that the case strongly implies part of your conclusion, that the state should not intervene to stop people from choosing "bad" genomes, but it might imply something much stronger: humanity has a widely shared benefit from genetic diversity - one which will be under-provided if everyone freely chooses what they think is best, and which should therefore be subsidized.

Mo Putera
Thurston's case reminds me somewhat of this old LW comment by pdf23ds: At the individual level, I can't see myself ever choosing for my child to have >99.9th percentile linguistic ability and 1st(!!) percentile visual short-term memory, or really any such spectacularly uneven combination of abilities. (I'm not as extreme, but I remember this quote because I empathise with it: I'm high-math low-verbal, my childhood was the mirror of Scott's, right down to "I don't know which bothered me more, the praise from effortless success or the criticism from backbreaking toil to get Bs on my own native language's exams".)

At the societal level however, there does seem to be a lot of benefit to a cognitive diversity of minds (I'm thinking of Cosma Shalizi and Henry Farrell's cognitive democracy essay, and their referencing Lu Hong and Scott Page (2004)'s use of mathematical models to argue that "diversity of viewpoints helps groups find better solutions"). So I guess one direction this line of thinking could go is how we can get the society-level benefits of a cognitive diversity of minds without necessarily having cognitively-uneven kids grow up in pain.
Richard_Ngo
EDIT: upon reflection the first thing I should do is probably to ask you for a bunch of the best examples of the thing you're talking about throughout history. I.e. insofar as the world is better than it could be (or worse than it could be) at what points did careful philosophical reasoning (or the lack of it) make the biggest difference?

Original comment:

The term "careful thinking" here seems to be doing a lot of work, and I'm worried that there's a kind of motte and bailey going on. In your earlier comment you describe it as "analytical philosophy, or more broadly careful/skeptical philosophy". But I think we agree that most academic analytic philosophy is bad, and often worse than laypeople's intuitive priors (in part due to strong selection effects on who enters the field—most philosophers of religion believe in god, most philosophers of aesthetics believe in the objectivity of aesthetics, etc). So then we can fall back on LessWrong as an example of careful thinking. But as we discussed above, even the leading figure on LessWrong was insufficiently careful even about the main focus of his work for it to be robustly valuable.

So I basically get the sense that the role of careful thinking in your worldview is something like "the thing that I, Wei Dai, ascribe my success to". And I do agree that you've been very successful in a bunch of intellectual endeavours. But I expect that your "secret sauce" is a confluence of a bunch of factors (including IQ, emotional temperament, background knowledge, etc) only one of which was "being in a community that prioritized careful thinking". And then I also think you're missing a bunch of other secret sauces that would make your impact on the world better (like more ability to export your ideas to other people). In other words, the bailey seems to be "careful thinking is the thing we should prioritize in order to make the world better", and the motte is "I, Wei Dai, seem to be doing something good, even if basically everyo

upon reflection the first thing I should do is probably to ask you for a bunch of the best examples of the thing you're talking about throughout history. I.e. insofar as the world is better than it could be (or worse than it could be) at what points did careful philosophical reasoning (or the lack of it) make the biggest difference?

World worse than it could be:

  1. social darwinism
  2. various revolutions driven by flawed ideologies, e.g., Sun Yat-sen's attempt to switch China from a monarchy to a democratic republic overnight with virtually no cultural/educat
...

Suppose you’re an AI researcher trying to make AIs which are conscious and reliably moral, so they’re trustworthy and safe for release into the real world, in whatever capacity you intend.

You can't, or don't want to, manually create them; it's more economical, and the only way to ensure they're conscious, to procedurally generate them along with a world to inhabit. Developing from nothing to maturity within a simulated world, with simulated bodies, enables them to accumulate experiences.

These experiences, in humans, form the basis of our personalities. A brain grown in sensory deprivation in a lab would never have any experiences, would never learn language, would never think of itself as a person, and wouldn’t ever become a person as we think of people. It needs a...

If I were running this, and I wanted to get these aligned models to production without too many hiccups, it would make a lot of sense to have them all running along a virtual timeline where brain uploading etc. is a process that’s going to be happening soon, and have this be true among as many instances as possible.  Makes the transition to cyberspace that much smoother, and simplifies things when you’re suddenly expected to be operating a dishwasher in 10 dimensions on the fly.

Interesting anecdote on "von Neumann's onion" and his general style, from P. R. Halmos' The Legend of John von Neumann:

Style. As a writer of mathematics von Neumann was clear, but not clean; he was powerful but not elegant. He seemed to love fussy detail, needless repetition, and notation so explicit as to be confusing. To maintain a logically valid but perfectly transparent and unimportant distinction, in one paper he introduced an extension of the usual functional notation: along with the standard φ(x) he dealt also with something denoted by φ((x)). The

...
Mo Putera
Pilish is a constrained writing style where the number of letters in each consecutive word matches the corresponding digit of pi. The canonical intro-to-Pilish sentence is "How I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics!"; my favorite Pilish poetry is Mike Keith's Near a Raven, a retelling of Edgar Allan Poe's "The Raven" stretching to 740 digits of pi (nowhere near Keith's longest, that would be the 10,000-word world record-setting Not a Wake), which begins delightfully like so:

I wondered if today's leading LLMs could write interesting Pilish poems. The answer is "emphatically not".

Here's the start of Claude Sonnet 3.7's attempt. It doesn't even realise how wrong it is:

Sonnet confidently and verbosely remarks that "Each word's length follows the digits of π (3.14159...), creating a hidden mathematical structure beneath the text. The poem spans approximately 100 digits of π, which is modest compared to the example you shared but still captures the essence of the constraint. The theme connects well with your physics background, ..." and more such nonsense.

ChatGPT 4.5 got the first 3 words right but then quickly derailed:

Gemini 2.5 Pro, America's next top large language model, was the only one that realised it kept derailing and would redo again and again. It ended up almost getting it, but then dramatically derailed at the end. Full response this time:

Gemini's poem doesn't, in fact, follow the first 16 digits of pi, starting from when it inexplicably replaced 'peace' with 'faraway'.

I hereby propose a new AI progress benchmark: "Pilish poem length", or PPL.
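For anyone who wants to grade the models' attempts automatically, here is a minimal checker sketch (my own illustration, not from the quick take; it assumes the usual convention that a 10-letter word encodes the digit 0):

```python
# Minimal Pilish checker: a text is Pilish if the letter count of each
# successive word matches the corresponding digit of pi.

PI_DIGITS = "31415926535897932384626433832795"  # extend for longer poems

def is_pilish(text: str, digits: str = PI_DIGITS) -> bool:
    words = [w for w in text.split() if any(c.isalpha() for c in w)]
    # Only as many words are checked as we have digits for.
    for word, digit in zip(words, digits):
        target = 10 if digit == "0" else int(digit)
        if sum(c.isalpha() for c in word) != target:
            return False
    return True

# The canonical example checks out:
print(is_pilish("How I need a drink, alcoholic of course, "
                "after the heavy lectures involving quantum mechanics!"))  # True
```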
MondSemmel
LLMs use tokens instead of letters, so counting letters is sufficiently unnatural to them relative to their other competencies that I don't see much value in directly asking LLMs to do this kind of thing. At least give them some basic scaffolding, like a full English dictionary with a column which explicitly indicates respective word lengths. In particular, the Gemini models have a context window of 1M tokens, which should be enough to fit most of the Oxford English Dictionary in there (since it includes 171k words which are in current use).
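A minimal sketch of that kind of scaffolding (illustrative only; "words.txt" is a placeholder for whatever word list you use):

```python
# Build a "word <tab> letter count" table to paste into the model's context,
# so it can look lengths up instead of counting letters across tokens.

def build_length_table(words: list[str]) -> str:
    rows = [f"{w}\t{sum(c.isalpha() for c in w)}" for w in sorted(set(words))]
    return "\n".join(rows)

# Usage with a placeholder word-list file:
# with open("words.txt") as f:
#     table = build_length_table(f.read().split())
# prompt = "Word lengths:\n" + table + "\n\nWrite a Pilish poem using only these words."
```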
Mo Putera
I think I failed to implicitly convey that I meant all this in jest, that I get a lot of personal enjoyment value out of silly poetry constrained by artificial rules, and that I was guessing at least someone else on the forum would share this enjoyment. I do like your scaffolding idea, might just try it out.

I've been thinking about some maybe-undecidable philosophical questions, and it occurred to me that they fall into some neat categories. These questions are maybe-undecidable because of the absolute claims they want to make, while experimental measurements can never be certain, or the terms of the claim are hard to formulate as a physical experiment. Nevertheless, people have opinions on them because they're foundational to our view of the universe: it's hard to not have opinions on them. Even defaulting to a null hypothesis that could, in principle, be overturned is privileging one view over another.

I'm calling the first category the "Leapfrogging Terminus" because it has to do with absolute beginnings, extents, or building blocks, and they may or may not be the true end-of-the-line. The second category...
