A coordination problem arises when everyone is taking some action A, we'd all rather be taking action B, but it's bad if we don't all move to B at the same time. Common knowledge is the name for the epistemic state we're collectively in when we know we can all start choosing action B and trust everyone else to do the same.

Interesting anecdote on "von Neumann's onion" and his general style, from P. R. Halmos' The Legend of John von Neumann:

(Tangent: I'm a bit peeved by Halmos' "lesser men" throwaway remark, mainly because I think interpretive research labor and distillation are very valuable, very hard to do well, somewhat orthogonal to vN-style competence, and very underappreciated and undersupplied.)

von Neumann was also courageous, Halmos wrote, in the following way:

Terry Tao is similar, according to Allen Knutson:

von Neumann also had endless capacity for work. Halmos:

I thought this was striking: why waste time on such seeming trivialities? But I guess if you're John von Neumann you just have such a glut of brain cycles that you can spend them in ridiculously poorly-optimised ways like this instead of needing to 80/20, and still get your many, many jobs done.
In public policy, experimenting is valuable. In particular, it provides a positive externality. Let's say that a city tests out a somewhat quirky idea like paying NIMBYs to shut up about new housing. If that policy works well, other cities benefit because they can now adopt that approach themselves. So then, shouldn't there be some sort of subsidy for cities that test out new policy ideas? Isn't it generally a good thing to subsidize things that provide positive externalities? I'm sure there is a lot to consider. I'm not enough of a public policy person to know what the considerations are though or how to weigh them.
avturchin816
5
LLMs know in advance when they hallucinate, and this can be used to exclude hallucinations. TL;DR: the prompt "predict the hallucination level of each item in the bibliography list and do not include items expected to have level 3 or above" works.

I performed an experiment: I asked Claude 3.7 Sonnet to write the full bibliography of Bostrom. Around the 70th article, it started hallucinating. I then sent the results to GPT-4.5 and asked it to mark hallucinations and estimate the hallucination chance of each item from 1 to 10 (where 10 is the maximal level of hallucination). It correctly identified hallucinations.

After that, I asked Sonnet 3.7 in another window to find the hallucination level in its own previous answer, and it gave almost the same answers as GPT-4.5. The difference was mostly about the exact bibliographical data of some articles, and at first glance it matched 90% of the data from GPT-4.5. I also checked the real data manually through Google Scholar.

After that, I asked Sonnet to write down the bibliography again but add a hallucination rating after each item. It again started hallucinating articles soon, but to my surprise, it gave correct items ratings of 1-2 and incorrect ones ratings of 3-5.

In the next step, I asked it to predict in advance which level of hallucination the next item would have and, if it was 3 or above, not to include it in the list. And it worked! It doesn't solve the problem of hallucinations completely, but it reduces their level roughly tenfold. Obviously, it can sometimes hallucinate the level of hallucinations too. Maybe I can go meta: ask it to predict the level of hallucination in its own hallucination estimates.
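For anyone who wants to poke at this themselves, here is a minimal sketch of the single-pass version of the protocol (self-rating plus a cutoff) using the Anthropic Python SDK. The prompt wording, the output format, the parsing, and the model id are my own illustrative choices, not anything validated beyond the informal experiment described above.

```python
# Minimal sketch of the "predict your own hallucination level and filter" prompt.
# Assumptions: the Anthropic Python SDK is installed and ANTHROPIC_API_KEY is set;
# the model id and the exact prompt/output format are illustrative, not canonical.
import re
import anthropic

client = anthropic.Anthropic()

PROMPT = (
    "Write a bibliography of Nick Bostrom's academic papers. "
    "After each item, append ' | hallucination level: N', where N (1-10) is your "
    "advance prediction of how likely the item is confabulated. "
    "Before writing each item, predict its level; if you expect level 3 or above, "
    "do not include that item at all."
)

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # illustrative model id
    max_tokens=2000,
    messages=[{"role": "user", "content": PROMPT}],
)
text = response.content[0].text

# As a second line of defence, keep only items the model itself rated 1 or 2.
kept = [
    line for line in text.splitlines()
    if (m := re.search(r"hallucination level:\s*(\d+)", line)) and int(m.group(1)) <= 2
]
print("\n".join(kept))
```

Spot-checking the surviving items against Google Scholar, as done above, is still needed to tell whether the self-ratings are actually calibrated.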
Richard_Ngo*8918
9
In response to an email about what a pro-human ideology for the future looks like, I wrote up the following: The pro-human egregore I'm currently designing (which I call fractal empowerment) incorporates three key ideas:

Firstly, we can see virtue ethics as a way for less powerful agents to aggregate to form more powerful superagents that preserve the interests of those original less powerful agents. E.g. virtues like integrity, loyalty, etc help prevent divide-and-conquer strategies. This would have been in the interests of the rest of the world when Europe was trying to colonize them, and will be in the best interests of humans when AIs try to conquer us.

Secondly, the most robust way for a more powerful agent to be altruistic towards a less powerful agent is not for it to optimize for that agent's welfare, but rather to optimize for its empowerment. This prevents predatory strategies from masquerading as altruism (e.g. agents claiming "I'll conquer you and then I'll empower you", which then somehow never get around to the second step).

Thirdly: the generational contract. From any given starting point, there are a huge number of possible coalitions which could form, and in some sense it's arbitrary which set of coalitions you choose. But one thing which is true for both humans and AIs is that each generation wants to be treated well by the next generation. And so the best intertemporal Schelling point is for coalitions to be inherently historical: that is, they balance the interests of old agents and new agents (even when the new agents could in theory form a coalition against all the old agents). From this perspective, path-dependence is a feature not a bug: there are many possible futures but only one history, meaning that this single history can be used to coordinate. In some sense this is a core idea of UDT: when coordinating with forks of yourself, you defer to your unique last common ancestor. When it's not literally a fork of yourself, there's more arb
Pet peeve: when places close before their stated close time. For example, I was just at the library. Their signs say that they close at 6pm. However, they kick people out at 5:45pm. This caught me off guard and caused me to break my focus at a bad time.

The reason that places do this, I assume, is because employees need to leave when their shift ends. In this case with the library, it probably takes 15 minutes or so to get everyone to leave, so they spend the last 15 minutes of their shift shooing people out. But why not make the official closing time 5:45pm while continuing to end the employees' shifts at 6:00pm?

I also run into this with restaurants. With restaurants, it's a little more complicated because there are usually two different closing times that are relevant to patrons: when the kitchen closes and when the doors close. Unless food is served ~immediately, like at Chipotle or something, it wouldn't make sense to make these two times the same. If it takes 10 minutes to cook a meal, the doors close at 9:00pm, and someone orders a meal at 8:59pm, well, you won't be able to serve the meal before they need to be out. But there's an easy solution to this: just list both close times. It seems like that would make everyone happy.


Recent Discussion

EDIT: With a minimal hint, Gemini, as well as other models like Grok, solve it in one try. There is maybe something interesting to be said here, but much less than expected.

Includes a link to the full Gemini conversation, but setting the stage first:

There is a puzzle. If you are still on Tumblr,  or live in Berkeley where I can and have inflicted it on you in person[1], you may have seen it. There's a series of arcane runes, composed entirely of []- in some combination, and the runes - the strings of characters - are all assigned to numbers; mostly integers, but some fractions and integer roots. You are promised that this is a notation, and it's a genuine promise.

1 = []
2 = [[]]
3 = [][[]]
4
...

Yes. I first tried things like this, too. I also tried term rewrite rules, and some of these were quite close. For example, AB -> A*(B+1), AB -> A*(B+A), or AB -> A*(B+index) led to some close misses (the open question was which term to expand first, i.e. which associativity; I also considered expanding smaller terms first), but they failed on later expansions. It took me half an hour to figure out that the index was not additive or multiplicative but the exponent base.

2Knight Lee
Wow that goes to show that reinforcement learning hasn't even broken the prompt engineering barrier yet. It isn't even summoning the LLM's strongest character/simulacrum/Pokemon yet.
1Czynski
Like others, apparently "think like a mathematician" is enough to get it to work.
1Czynski
Not only is there not a standard name for this set of numbers, but it's not clear what that set of numbers is. I consulted a better mathematician in the past, and he said that if you allow multiplication it becomes a known unsolved problem whether its representations are unique and whether it can construct all algebraic numbers.

TL;DR: In a neural network with d parameters, the (local) learning coefficient λ can be upper and lower bounded by the rank of the network's Hessian H:

$$\frac{\operatorname{rank}(H)}{2} \;\leq\; \lambda \;\leq\; \frac{d + \operatorname{rank}(H)}{4}\,.$$

The lower bound is a known result. The upper bound is a claim by me, and this post contains the proof for it.[1] If you find any problems, do point them out. 

Edit 16.08.2024: The original version of this post had a three in the denominator of the upper bound. Dmitry Vaintrob spotted an improvement to make it a four.

Introduction

The learning coefficient λ is a measure of loss basin volume and model complexity. You can think of it sort of like an effective parameter count of the neural network. Simpler models that do less stuff have a smaller λ.

Calculating λ for real networks people actually use is a pain. My hope is that these...
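As a toy illustration of the quantities involved, here is a sketch (mine, not from the post) that computes the Hessian of a small network's empirical loss at its current parameters, estimates its numerical rank, and prints the resulting bracket on λ using the bound as reconstructed in the TL;DR above. The architecture, data, and rank tolerance are arbitrary choices; the bound concerns a (local) minimum of the loss, so at a random initialization the numbers are purely illustrative.

```python
# Toy sketch: Hessian rank of a small network's loss and the implied bracket on λ.
# Assumptions: PyTorch >= 2.0; MSE on random data; tolerance 1e-6 for numerical rank;
# the bound rank(H)/2 <= λ <= (d + rank(H))/4 is taken from the TL;DR above.
import torch

torch.manual_seed(0)
X, y = torch.randn(32, 2), torch.randn(32, 1)

model = torch.nn.Sequential(torch.nn.Linear(2, 3), torch.nn.Tanh(), torch.nn.Linear(3, 1))
names = [n for n, _ in model.named_parameters()]
shapes = [p.shape for p in model.parameters()]
theta0 = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
d = theta0.numel()

def loss_at(theta):
    # Rebuild a name -> tensor dict from the flat parameter vector and evaluate MSE.
    params, i = {}, 0
    for name, shape in zip(names, shapes):
        params[name] = theta[i:i + shape.numel()].view(shape)
        i += shape.numel()
    pred = torch.func.functional_call(model, params, (X,))
    return torch.mean((pred - y) ** 2)

H = torch.autograd.functional.hessian(loss_at, theta0)  # (d, d) Hessian at theta0
rank = int(torch.linalg.matrix_rank(H, atol=1e-6))

print(f"d = {d}, rank(H) = {rank}")
print(f"bound: {rank / 2} <= lambda <= {(d + rank) / 4}")
```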

Dalcy10

Where in the literature can I find the proof of the lower bound?

PDF version. berkeleygenomics.org. X.com. Bluesky.

William Thurston was a world-renowned mathematician. His ideas revolutionized many areas of geometry and topology[1]; the proof of his geometrization conjecture was eventually completed by Grigori Perelman, thus settling the Poincaré conjecture (making it the only solved Millennium Prize problem). After his death, his students wrote reminiscences, describing among other things his exceptional vision.[2] Here's Jeff Weeks:

Bill’s gift, of course, was his vision, both in the direct sense of seeing geometrical structures that nobody had seen before and in the extended sense of seeing new ways to understand things. While many excellent mathematicians might understand a complicated situation, Bill could look at the same complicated situation and find simplicity.

Thurston emphasized clear vision over algebra, even to a fault. Yair Minsky:

Most inspiring was his

...
2Davidmanheim
I think this is a strong argument here for genetic diversity, but a very weak one for saying there isn't an unambiguous universal "good" direction for genes. So I agree that the case strongly implies part of your conclusion, that the state should not intervene to stop people from choosing "bad" genomes, but it might imply something much stronger: humanity has a widely shared benefit from genetic diversity - one which will be under-provided if everyone is free to choose what they think is best, and it should therefore be subsidized.
2Mo Putera
Thurston's case reminds me somewhat of this old LW comment by pdf23ds: At the individual level, I can't see myself ever choosing for my child to have >99.9th percentile linguistic ability and 1st(!!) percentile visual short-term memory, or really any such spectacularly uneven combination of abilities. (I'm not as extreme, but I remember this quote because I empathise with it: I'm high-math low-verbal, my childhood was the mirror of Scott's, right down to "I don't know which bothered me more, the praise from effortless success or the criticism from backbreaking toil to get Bs on my own native language's exams".)

At the societal level however, there does seem to be a lot of benefit to a cognitive diversity of minds (I'm thinking of Cosma Shalizi and Henry Farrell's cognitive democracy essay, and their referencing Lu Hong and Scott Page (2004)'s use of mathematical models to argue that "diversity of viewpoints helps groups find better solutions"). So I guess one direction this line of thinking could go is how we can get the society-level benefits of a cognitive diversity of minds without necessarily having cognitively-uneven kids grow up in pain.
TsviBT20

So I guess one direction this line of thinking could go is how we can get the society-level benefits of a cognitive diversity of minds without necessarily having cognitively-uneven kids grow up in pain.

Absolutely, yeah. A sort of drop-dead basic thing, which I suppose is hard to implement for some reason, is just not putting so much pressure on kids--or more precisely, not acting as though everything ought to be easy for every kid. Better would be skill at teaching individual kids by paying attention to the individual's shape of cognition. That's diffic... (read more)

Read the full article here

The journalist is an AI skeptic, but does solid financial investigations. Details below:

...

Isn't it normal in the startup world to make bets and not make money for many years? I am not familiar with the field, so I don't have intuitions for how much money or how many years would make sense, and so I don't know whether OpenAI is doing something normal or something wild.

2Mateusz Bagiński
Any info on how this compares to other AI companies?
1Knight Lee
Pure AI companies like OpenAI and Anthropic are like race cars which automatically catch on fire and explode the moment they fall too far behind. Meanwhile, AI companies like Google DeepMind and Meta AI are race cars which can lose the lead and still catch up later. They can maintain the large expenditures needed for AI training without needing to generate revenue or impress investors. DeepSeek and xAI might be in between. (Then again, OpenAI is half owned by Microsoft. If it falls too far behind, it might not go out of business but get folded into Microsoft at a lower valuation. I still think they feel much more short-term pressure.)
11Mo Putera
Good homework by Zitron on the numbers, and he's a really entertaining writer, but my (very brief) experience so far using it for work-related research more closely matches Sarah Constantin's assessment concluding that ChatGPT-4o DR was the best one she tested (including Perplexity, Gemini, ChatGPT-4o, Elicit, and PaperQA) on completeness, relevance, source quality, and creativity. 

tl;dr:

From my current understanding, one of the following two things should be happening, and I would like to understand why neither is:

Either

  1. Everyone in AI Safety who thinks slowing down AI is currently broadly a good idea should publicly support PauseAI.

    Or

  2. If pausing AI is much more popular than the organization PauseAI, that is a problem that should be addressed in some way.

 

Pausing AI

There does not seem to be a legible path to prevent possible existential risks from AI without slowing down its current progress.

 

I am aware that many people interested in AI Safety do not want to prevent AGI from being built EVER, mostly based on transhumanist or longtermist reasoning.

Many people in AI Safety seem to be on board with the goal of “pausing AI”, including, for example,...

One frustration I have about people on LessWrong and elsewhere is that they love criticizing every piece of advice or strategy while never truly supporting any alternative.

The most upvoted comments here argue against PauseAI, or even claim that asking for a pause is overall a waste of political capital...!

Yet I remember when I proposed an open letter arguing for government funding for AI alignment, the Statement on AI Inconsistency. After writing emails and private messages, the only reply was "sorry, this strategy isn't good, because we should just focus on pausing A... (read more)

2Davidmanheim
You think it's obviously materially less? Because there is a faction, including Eliezer and many others, that thinks it's epsilon, and claims that the reduction in risk from any technical work is less than the acceleration it causes. (I think you're probably right about some of that work, but I think it's not at all obviously true!)
2Davidmanheim
Banning nuclear weapons is exactly like this. If it could be done universally and effectively, it would be great, but any specific version seems likely to tilt the balance of power without accomplishing the goal. That's kind of what happened with the anti-nuclear movement, but it ended up doing lots of harm because the things that could be stopped were the good ones!
8Davidmanheim
I think this is wrong - the cost in political capital for saying that it's the best solution seems relatively low, especially if coupled with an admission that it's not politically viable. What I see instead is people dismissing it as a useful idea even in theory, saying it would be bad if it were taken seriously by anyone, and moving on from there. And if nothing else, that's acting as a way to narrow the Overton window for other proposals!

We may be on the direct path to AGI and then ASI - the singularity could happen within the next 5-20 years. If you survive to reach it, the potential upside is immense: daily life could become paradise.

With such high stakes, ensuring personal survival until the singularity should be a top priority for yourself and those you care about.

I've created V1 of the Singularity Survival Guide, an evidence-based resource focused on:

  1. Identifying the highest-probability preventable causes of death/injury in the near term
  2. Providing the highest-ROI risk mitigation strategies
  3. Outlining preparations for potential societal instability
  4. Presenting information in a shareable, memetic format

Top Risks to Mitigate

🚗 Car & Pedestrian Accidents

The #1 daily threat most people underestimate. Key mitigations include driving less when possible, choosing vehicles with top safety ratings, avoiding high-risk driving times,...

8mbrooks
hmm... "It is not prosocial to maximize personal flourishing under these circumstances." I don't think this guide is at all trying to maximize personal flourishing at the cost of the communal. It's actually very easy, quick, and cheap to follow the suggestions to up your personal welfare. If society were going to go through a bumpy patch, I would want more reasonable, prepared, and thoughtful people to help steer humanity through and make it to the other side. None of the ideas I suggested would hurt communal well-being, either. I feel like it's a bit harsh to say "people shouldn't care about the most likely ways they could personally die, so I will downvote this post to make sure fewer people understand their main sources of risk."

I don't think this guide is at all trying to maximize personal flourishing at the cost of the communal.

Then I misinterpreted it. One quote from the original post that contributed was "ensuring personal survival until the singularity should be a top priority for yourself".

I agree that taking the steps you outlined above is wise, and should be encouraged. If the original post had been framed like your comment, I would have upvoted.


It's incredibly surprising that state-of-the-art AIs don't fix most of their hallucinations despite being capable of it (and undergoing reinforcement learning).

Is the root cause of hallucination alignment rather than capabilities?!

Maybe the AI gets a better RL reward if it hallucinates (instead of giving less info), because users are unable to catch its mistakes.

1ACCount
This is way more metacognitive skill than I would have expected an LLM to have. I can make sense of how an LLM would be able to do that, but only in retrospect. And if a modern high-end LLM already knows on some level and recognizes its own uncertainty? Could you design a fine-tuning pipeline to reduce hallucination levels based on that? At least for reasoning models, if not for all of them?
2avturchin
It looks like (based on the article published a few days ago by Anthropic about the microscope) Claude Sonnet was trained to distinguish facts from hallucinations, so it's not surprising that it knows when it hallucinates.  
2Richard_Ngo
EDIT: upon reflection the first thing I should do is probably to ask you for a bunch of the best examples of the thing you're talking about throughout history. I.e. insofar as the world is better than it could be (or worse than it could be), at what points did careful philosophical reasoning (or the lack of it) make the biggest difference?

Original comment: The term "careful thinking" here seems to be doing a lot of work, and I'm worried that there's a kind of motte and bailey going on. In your earlier comment you describe it as "analytical philosophy, or more broadly careful/skeptical philosophy". But I think we agree that most academic analytic philosophy is bad, and often worse than laypeople's intuitive priors (in part due to strong selection effects on who enters the field—most philosophers of religion believe in god, most philosophers of aesthetics believe in the objectivity of aesthetics, etc). So then we can fall back on LessWrong as an example of careful thinking. But as we discussed above, even the leading figure on LessWrong was insufficiently careful even about the main focus of his work for it to be robustly valuable.

So I basically get the sense that the role of careful thinking in your worldview is something like "the thing that I, Wei Dai, ascribe my success to". And I do agree that you've been very successful in a bunch of intellectual endeavours. But I expect that your "secret sauce" is a confluence of a bunch of factors (including IQ, emotional temperament, background knowledge, etc) only one of which was "being in a community that prioritized careful thinking". And then I also think you're missing a bunch of other secret sauces that would make your impact on the world better (like more ability to export your ideas to other people).

In other words, the bailey seems to be "careful thinking is the thing we should prioritize in order to make the world better", and the motte is "I, Wei Dai, seem to be doing something good, even if basically everyo

upon reflection the first thing I should do is probably to ask you for a bunch of the best examples of the thing you're talking about throughout history. I.e. insofar as the world is better than it could be (or worse than it could be) at what points did careful philosophical reasoning (or the lack of it) make the biggest difference?

World worse than it could be:

  1. social darwinism
  2. various revolutions driven by flawed ideologies, e.g., Sun Yat-sen's attempt to switch China from a monarchy to a democratic republic overnight with virtually no cultural/educat
... (read more)

Suppose you’re an AI researcher trying to make AIs which are conscious and reliably moral, so they’re trustworthy and safe for release into the real world, in whatever capacity you intend.

You can’t, or don’t want to manually create them; it’s more economical, and the only way to ensure they’re conscious, if you procedurally generate them along with a world to inhabit. Developing from nothing to maturity within a simulated world, with simulated bodies, enables them to accumulate experiences. 

These experiences, in humans, form the basis of our personalities. A brain grown in sensory deprivation in a lab would never have any experiences, would never learn language, would never think of itself as a person, and wouldn’t ever become a person as we think of people. It needs a...

If I were running this, and I wanted to get these aligned models to production without too many hiccups, it would make a lot of sense to have them all running along a virtual timeline where brain uploading etc. is a process that’s going to be happening soon, and have this be true among as many instances as possible.  Makes the transition to cyberspace that much smoother, and simplifies things when you’re suddenly expected to be operating a dishwasher in 10 dimensions on the fly.
