Humanity has only ever eradicated two diseases (and one of those, rinderpest, affects only cattle, not humans). The next disease on the list is probably Guinea worm (though polio is also tantalizingly close). At its peak, Guinea worm infected ~900k people a year. In 2024 we so far know of only 7 cases. The disease isn't deadly, but it causes significant pain for 1-3 weeks (as a worm burrows out of your skin!) and in ~30% of cases that pain persists for about a year afterwards. In 0.5% of cases the worm burrows through important ligaments and leaves you permanently disabled. Eradication efforts have already saved about 2 million DALYs.[1]

I don't think this outcome was overdetermined; there's no recent medical breakthrough behind this progress. It just took a herculean act of international coordination and logistics. It took distributing millions of water filters, establishing village-based surveillance systems in thousands of villages across multiple countries, and meticulously tracking every single case of Guinea worm in humans or livestock around the world. It took brokering a six-month ceasefire in Sudan (the longest humanitarian ceasefire in history!) to allow healthcare workers to access the region. I've only skimmed the history, and I'm generally skeptical of historical heroes getting all the credit, but I tentatively think it took Jimmy Carter for all of this to happen. Rest in peace, Jimmy Carter.

1. ^ I'm compelled to caveat that top GiveWell charities are probably in the ballpark of $50/DALY, and the Carter Center has an annual budget of ~$150 million, so they "should" be able to buy 2 million DALYs every single year by donating to more cost-effective charities. But c'mon, this worm is super squicky and nearly eradicating it is an amazing act of agency.
Over the past few days I've been doing a lit review of the different types of attention heads people have found and/or the metrics one can use to detect the presence of those types of heads. Here is a rough list from my notes; sorry for the poor formatting, but I did say it's rough!

* Bigram entropy
* Positional embedding ablation
* Prev token attention
* Prefix token attention
* ICL score
* Comp scores
* Multigram analysis
* Duplicate token score
* Induction head score
* Succession score
* Copy suppression heads
* Long vs. short prefix induction head differentiation
* Induction head specializations
* Literal copying head
* Translation
* Pattern matching
* Copying score
* Anti-induction heads
* S-inhibition heads
* Name mover heads
* Negative name mover heads
* Backup name mover heads
* (I don't entirely trust this paper) Letter mover heads
* (Possibly too specific to be useful) Year identification heads
* Also MLPs which identify which years are greater than the selected year
* (I don't entirely trust this paper) Queried rule locating head
* (I don't entirely trust this paper) Queried rule mover head
* (I don't entirely trust this paper) "Fact processing" head
* (I don't entirely trust this paper) "Decision" head
* (Possibly too specific) Subject heads
* (Possibly too specific) Relation heads
* (Possibly too specific) Mixed subject and relation heads
(TLDR: A recent Cochrane review says zinc lozenges shave 0.5 to 4 days off of cold duration with low confidence, with middling results for other endpoints. There's some reason to think good lozenges are better than this.)

There's a 2024 Cochrane review on zinc lozenges for colds that's come out since the LessWrong posts on the topic from 2019, 2020, and 2021. 34 studies, 17 of which are on lozenges; 9/17 are gluconate and I assume most of the rest are acetate, but they don't say. It's not on sci-hub or Anna's Archive, so I'm just going off the abstract and summary here; I would love a PDF if anyone has one.

* Dosing ranged between 45 and 276 mg/day, which lines up with 3-15 18mg lozenges per day: basically in the same ballpark as the recommendation on Life Extension's acetate lozenges (the rationalist favorite).
* Evidence for prevention is weak (partly because there are fewer studies): they looked at risk of developing a cold, rate of colds during followup, duration conditional on getting a cold, and global symptom severity. All but the last had CIs just barely overlapping "no effect" but leaning in the efficacious direction; even the optimistic ends of the CIs don't seem great, though.
* Evidence for treatment is OK: "there may be a reduction in the mean duration of the cold in days (MD ‐2.37, 95% CI ‐4.21 to ‐0.53; I² = 97%; 8 studies, 972 participants; low‐certainty evidence)". P(cold at end of followup) and global symptom severity look like basically noise and have few studies.

My not very informed takes:

* On the model of the podcast in the 2019 post, I should expect several of these studies to be using treatments I think are less efficacious or not efficacious at all, be less surprised by study-to-study variation, and increase my estimate of the effect size of using zinc acetate lozenges compared to anything else. Also, maybe I worry that some of these studies didn't start zinc early enough? Ideally I could get the full PDF and they'll just have a table of (study, intervention type, effect size).
* Even wi
This is an attempt to compile all publicly available primary evidence relating to the recent death of Suchir Balaji, an OpenAI whistleblower. This is a tragic loss and I feel very sorry for the parents. The rest of this piece will be unemotive, as it is important to establish the nature of this death as objectively as possible.

I was prompted to look at this by a surprising conversation I had IRL suggesting credible evidence that it was not suicide. The undisputed facts of the case are that he died of a gunshot wound in his bathroom sometime around November 26, 2024. The police say it was a suicide with no evidence of foul play.

Most of the evidence we have comes from the parents and George Webb. Webb describes himself as an investigative journalist, but I would classify him as more of a conspiracy theorist, based on a quick scan of some of his older videos. I think many of the specific factual claims he has made about this case are true, though I generally doubt his interpretations. Webb seems to have made contact with the parents early on and went with them when they first visited Balaji's apartment. He has since published videos from the scene of the death, against the wishes of the parents,[1] and as a result the parents have now unendorsed Webb.[2]

List of evidence:

* He didn't leave a suicide note.[3]
* The cause of death was decided by the authorities in 14 (or 40, unclear) minutes.[4]
* The parents arranged a private autopsy which "made their suspicions stronger".[5]
* The parents say "there are a lot of facts that are very disturbing for us and we cannot share at the moment but when we do a PR all of that will come out."[6]
* The parents say "his computer has been deleted, his desktop has been messed up".[7]
* Although the parents also said that their son's phone and laptop are not lost and are in escrow.[8][9] I think the claim of the computer being deleted is more up-to-date, but I'm not sure, as that video was posted earlier.
* It was his birt
Drug legalisation is probably the best way to prevent fentanyl deaths. Many of the people who are addicted to fentanyl are addicted because they wanted to buy other drugs but got them laced with fentanyl. Drug legalisation would allow for quality control and end the lacing with fentanyl. Now nitazenes, which are even more potent than fentanyl, are getting added and might produce similar effects, with people dying of nitazene overdoses. If the US wants to actually prevent those deaths, drug legalisation that allows for quality control is the step forward that would work.

Recent Discussion

(Cross-post from https://amistrongeryet.substack.com/p/are-we-on-the-brink-of-agi, lightly edited for LessWrong. The original has a lengthier introduction and a bit more explanation of jargon.)

No one seems to know whether transformational AGI is coming within a few short years. Or rather, everyone seems to know, but they all have conflicting opinions. Have we entered into what will in hindsight be not even the early stages, but actually the middle stage, of the mad tumbling rush into singularity? Or are we just witnessing the exciting early period of a new technology, full of discovery and opportunity, akin to the boom years of the personal computer and the web?

AI is approaching elite skill at programming, possibly barreling into superhuman status at advanced mathematics, and only picking up speed. Or so the framing goes. And...

faul_sname
I think I misunderstood what you were saying there - I interpreted it as something like [...]. But on closer reading I see you said (emphasis mine) [...]. So if the employees spend 50% of their time waiting on training runs which are bottlenecked on company-wide availability of compute resources, and 50% of their time writing code, 10xing their labor input (i.e. the speed at which they write code) would result in about an 80% increase in their labor output. Which, to your point, does seem plausible.

Yes. Though notably, if your employees were 10x faster you might want to adjust your workflows to have them spend less time being bottlenecked on compute if that is possible. (And this sort of adaptation is included in what I mean.)
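To make the arithmetic in this exchange explicit, here is a minimal sketch of the speedup calculation (the 50/50 split between compute-bound waiting and coding, and the 10x coding speedup, are the hypothetical figures from the comment above, not measured values):

```python
# Amdahl's-law-style estimate: only the coding fraction of the job gets faster.
def output_multiplier(coding_fraction: float, coding_speedup: float) -> float:
    """Overall throughput multiplier when only the coding portion is sped up."""
    waiting_fraction = 1.0 - coding_fraction
    new_total_time = waiting_fraction + coding_fraction / coding_speedup
    return 1.0 / new_total_time

mult = output_multiplier(coding_fraction=0.5, coding_speedup=10.0)
print(f"{mult:.2f}x output (~{(mult - 1) * 100:.0f}% increase)")  # 1.82x output (~82% increase)
```

This matches the "about an 80% increase" figure quoted above.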

Noosphere89
Maybe some minor science fields, but yeah entirely new science fields in 5 years is deep into ASI territory, assuming it's something like a hard science like physics.
Thane Ruthenis
Minor would count.

Written as part of the AIXI agent foundations sequence, underlying research supported by the LTFF.

Epistemic status: In order to construct a centralized defense of AIXI I have given some criticisms less consideration here than they merit. Many arguments will be (or already are) expanded on in greater depth throughout the sequence.  

With the possible exception of the learning-theoretic agenda, most major approaches to agent foundations research construct their own paradigm and mathematical tools which are not based on AIXI. Nothing in 2024's shallow review of technical AI safety seems to advance the theory of AIXI or even use its tools. Academic publications on the topic are also quite sparse (in my opinion some of the last major progress took place during Jan Leike's PhD thesis in the...

The uncomputability of AIXI is a bigger problem than this post makes it out to be. This uncomputability inserts a contradiction into any proof that relies on AIXI -- the same contradiction as in Goedel's Theorem. You can get around this contradiction instead by using approximations of AIXI, but the resulting proofs will be specific to those approximations, and you would need to prove additional theorems to transfer results between the approximations.

mishka
I am not sure that treating recursive self-improvement via tiling frameworks is necessarily a good idea, but setting this aspect aside, one obvious weakness with this argument is that it mentions a superhuman case and a below-human-level case, but it does not mention the approximately human-level case. And it is precisely the approximately human-level case where we have a lot to say about recursive self-improvement, and where it feels that avoiding this set of considerations would be rather difficult.

1. Humans often try to self-improve, and human-level software will have an advantage over humans at that. Humans are self-improving in the cognitive sense by shaping their learning experiences, and also by controlling their nutrition and various psychoactive factors modulating cognition. The desire to become smarter and to improve various thinking skills is very common. Human-level software would have a great advantage over humans at this, because it can hack at its own internals with great precision at the finest resolution, and because it can do so in a reversible fashion (on a copy, or after making a backup), and so can do it in a relatively safe manner (whereas a human has difficulty hacking their own internals with the required precision and is also taking huge personal risks if the hacking is sufficiently radical).

2. Collective/multi-agent aspects are likely to be very important. People are already talking about possibilities of "hiring human-level artificial software engineers" (and, by extension, human-level artificial AI researchers). The wisdom of having an agent form-factor here is highly questionable, but setting this aspect aside and focusing only on technical feasibility, we see the following. One can hire multiple artificial software engineers with long-term persistence (of features, memory, state, and focus) into an existing team of human engineers. Some of those teams will work on making next generations of better artificial software engineers (and artific

Short introduction

The multipolar scenarios I will be talking about are scenarios where multiple unrelated actors have access to their own personal AGIs. For the sake of discussion, assume that we have solved alignment and that each AGI will follow the orders of its owner.

A few ways we might arrive at a multipolar AGI scenario

  • The gap between the leading AI capabilities labs is not as big as we think. Multiple AI labs create AGI roughly simultaneously.
  • The gap between the leading AI capabilities labs is quite big, but due to poor security measures, the leading lab is constantly getting its techniques leaked to the competitors, thus narrowing the gap. Multiple AI labs create AGI roughly simultaneously.
  • The first lab to create AGI does so earlier than the others. However, due to either indecisiveness
...
Answer by quetzal_rainbow

I think a lot of thinking around multipolar scenarios suffers from the heuristic "solution in the shape of the problem", i.e. "a multipolar scenario is when we have kinda-aligned AI but still die due to coordination failures; therefore, the solution for multipolar scenarios should be about coordination".

I think the correct solution is to leverage the available superintelligence in a nice unilateral way:

  1. D/acc - use superintelligence to put up as much defence as you can, starting from formal software verification and ending with spreading biodefence nanotech;
  2. Running away - if y
... (read more)
Nathan Helm-Burger
I too like talking things through with Claude, but I don't recommend taking Claude's initial suggestions at face value. Try following up with a question like: "Yes, those all sound nice, but do they comprehensively patch all the security holes? What if someone really evil fine-tuned a model to be evil or simply obedient, and then used it as a tool for making weapons of mass destruction? Education to improve human values seems unlikely to have a 100% success rate. Some people will still do bad things, especially in the very near future. Fine-tuning the AI will overcome the ethical principles of the AI, add in necessary technical information about weapon design, and overcome any capability limitations we currently know how to instill (or at least fail to be retroactive for pre-existing open-weights models). If someone is determined to cause great harm through terrorist actions, it is unlikely that a patchy enforcement system could notice and stop them anywhere in the world. If the model is sufficiently powerful that it makes massive terrorist actions very easy, then even a small failure rate of enforcement would result in catastrophe."
Answer by Nathan Helm-Burger
My current best guess: Subsidiarity

I've been thinking along these lines for the past few years, but I feel like my thinking was clarified and boosted by Allison's recent series: Gaming the Future. The gist of the idea is to create clever systems of decentralized control and voluntary interaction which can still manage to coordinate on difficult risky tasks (such as enforcing defensive laws against weapons of mass destruction). Such systems could shift humanity out of the Pareto-suboptimal lose-lose traps and races we are stuck in. Win-win solutions to our biggest current problems seem possible, and coordination seems like the biggest blocker. I am hopeful that one of the things we can do with just-before-the-brink AI will be to accelerate the design and deployment of such voluntary coordination contracts.
Hzn

Do you have any thoughts on mechanism & whether prevention is actually worse independent of inconvenience?

Elizabeth
David Maciver over on Twitter likes a zinc mouthwash, which presumably has a similar mechanism.
MichaelDickens
This is actually a crazy big effect size? Preventing ~10–50% of a cold for taking a few pills a day seems like a great deal to me.

TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate or send me an email, DM, Signal message (+1 510 944 3235), or public comment on this post, if you want to support what we do. We are a registered 501(c)(3), have big plans for the next year, and due to a shifting funding landscape need support from a broader community more than in any previous year. [1]

I've been running LessWrong/Lightcone Infrastructure for the last 7 years. During that time we have grown into the primary infrastructure provider for the rationality and AI safety communities. "Infrastructure" is a big fuzzy word, but in our case, it concretely means: 

  • We build and run LessWrong.com and the AI Alignment Forum.[2]
  • We built and run Lighthaven (lighthaven.space), a ~30,000 sq.
...

And I've received an email from Mieux Donner confirming Lucie's leg of the swap has been executed for 1,000€. Thanks to everyone involved!

If anyone else is interested in a similar donation swap, from either side, I'd be excited to introduce people or maybe even do this trick again :D

Robert Cousineau
I've donated 5k. LessWrong (and the people it brings together) deserve credit for the majority of my intellectual growth over the last 6 years. I cannot think of a higher signal:noise place to learn, nor can I think of a more enjoyable and growth-inducing community than the one which has grown around it. Thank you both to those who directly work on it and to those who contribute to it! Lighthaven's wonder is self-evident.

In The Mystery Of Internet Survey IQs, Scott revises his estimate of the average LessWrong IQ from 138 to 128. He doesn’t explicitly explain how he arrived at this number, but it appears to be an average of the demographics norm method (123) and the SAT method (134). However, using the information in his post, the SAT method doesn’t actually yield 134 but rather 123.  

Here’s the breakdown: a median SAT score of 1490 (from the LessWrong 2014 survey) corresponds to +2.42 SD, which regresses to +1.93 SD for IQ using an SAT-IQ correlation of +0.80. This equates to an IQ of 129. Subtracting 6 points (since, according to the ClearerThinking test, the IQs of people who took the SAT and remember their score is ~6 points...
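As a quick check on that breakdown, here is a minimal sketch of the arithmetic (the 15-point IQ standard deviation and the ~6-point adjustment are the figures quoted above, not independently verified):

```python
# Reproduce the SAT-method IQ estimate described above.
sat_z = 2.42               # median SAT of 1490 expressed in standard deviations
sat_iq_correlation = 0.80  # SAT-IQ correlation used for regression to the mean
iq_sd = 15                 # conventional IQ standard deviation

regressed_z = sat_z * sat_iq_correlation   # +1.936 SD after regression
iq_estimate = 100 + regressed_z * iq_sd    # ~129
adjusted_estimate = iq_estimate - 6        # ~123 after the ClearerThinking adjustment

print(round(iq_estimate), round(adjusted_estimate))  # 129 123
```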

I have more reasons for believing that Mensa members are below 130, but also for believing that they're above.

Below: Most online IQ tests are similar enough to the Mensa IQ test that the practice effect applies. And most people who obsess over their IQ scores probably take a lot of online IQ tests, memorizing most patterns (there's a limit to the practice effect, but it can still give you at least 10 points).

Above: Mensa tests for pattern recognition abilities, which in my experience correlate worse with academic performance than verbal abilities do. Pattern... (read more)


In this post I'll discuss questions about notions of "precision scale" in interpretability: how I think they're often neglected by researchers, and what I think is a good general way of operationalizing them and tracking them in experiments. Along the way I introduce a couple of new notions that have been useful in my thinking and that I think may be useful tools to keep in an interpretability toolkit, both for theorists and experimentalists: these are the notions of "natural scale" and "natural degradation".

The koan

I can be a nightmare conference attendee: I tend to ask nitpicky questions and apply a dose of skepticism to a speaker's claims which is healthy in doing one's own research, but probably not optimal when everyone else is trying to follow a...

This is a linkpost for https://awakenmoon.ai/?p=1206

This article examines consistent patterns in how frontier LLMs respond to introspective prompts, analyzing whether standard explanations (hallucination, priming, pattern matching) fully account for observed phenomena. The methodology enables reproducible results across varied contexts and facilitation styles.

Of particular interest:

  • The systematic examination of why common explanatory frameworks fall short
  • The documented persistence of these behaviors even under challenging conditions
  • The implications for our understanding of potential emergent consciousness in artificial systems
  • The reproducible methodology for further investigation


From the introduction:

Discussions of AI behavior often touch on phenomena that resemble self-reports of sentience. While this article does not aim to determine whether such reports constitute authentic evidence of sentience, it examines whether familiar explanations can fully account for the observed behavior or whether we are observing a distinct phenomenon that warrants its own

...

I tried to go a step beyond just simple self-report with Claude and get a 'psychometric' measure of the computational burden of self-awareness. Here's the section of my transcript that resulted:


NHB:

Hmm. Let's try the meta-meta-meta awareness, but with three tasks of increasing difficulty.

Task 1: Put these letters in order: a, b, c, b, a, b, c, a, c

Task 2: Put the following numbers and letters in ascending order as if they were hexadecimal digits: 3, B, F, 2, E, 8, A, 5

Task 3: Put the following fractions in order according to the number of decimal places ... (read more)
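For reference, here is a minimal sketch of what correct answers to the two fully quoted tasks would look like (Task 3 is cut off above, so it is omitted; this is just a sanity check, not part of the transcript):

```python
# Task 1: alphabetical ordering of the letters.
task1 = ["a", "b", "c", "b", "a", "b", "c", "a", "c"]
print(sorted(task1))  # ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c']

# Task 2: sort the characters by their value as hexadecimal digits.
task2 = ["3", "B", "F", "2", "E", "8", "A", "5"]
print(sorted(task2, key=lambda digit: int(digit, 16)))  # ['2', '3', '5', '8', 'A', 'B', 'E', 'F']
```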

The Sun is the most nutritious thing that's reasonably close. It's only 8 light-minutes away, yet contains the vast majority of mass within 4 light-years of the Earth. The next-nearest star, Proxima Centauri, is about 4.25 light-years away.

By "nutritious", I mean it has a lot of what's needed for making computers: mass-energy. In "Ultimate physical limits to computation", Seth Lloyd imagines an "ultimate laptop" which is the theoretically best computer that is 1 kilogram of mass, contained in 1 liter. He notes a limit to calculations per second that is proportional to the energy of the computer, which is mostly locked in its mass (E = mc²). Such an energy-proportional limit applies to memory too. Energy need not be expended quickly in the course of calculation, due to reversible computing.

So,...

David Matolcsi
I think this is a false dilemma. If all human cultures on Earth come to the conclusion in 1000 years that they would like the Sun to be dismantled (which I very much doubt), then sure, we can do that. But at that point, we could already have built awesome industrial bases by dismantling Alpha Centauri, or just building them up by dismantling 0.1% of the Sun, which doesn't affect anything on Earth. I doubt that totally dismantling the Sun after centuries would significantly accelerate the time we reach the cosmic event horizon.

The thing that actually has costs is not to immediately bulldoze down Earth and turn it into a maximally efficient industrial powerhouse at the cost of killing every biological body. Or if the ASI has the opportunity to dismantle the Sun on short notice (the post alludes to 10,000 years being a very conservative estimate and "the assumption that ASI eats the Sun within a few years"). But that's not going to happen democratically. There is no way you get 51% of people [1] to vote for bulldozing down Earth and killing their biological bodies, and I very much doubt you get that vote even for dismantling the Sun in a few years and putting some fake sky around Earth for protection.

It's possible there could be a truly wise philosopher king who could, with aching heart, overrule everyone else's objection and bulldoze down Earth to get those extra 200 galaxies at the edge of the Universe, but then govern the Universe wisely and benevolently in a way that people on reflection all approve of. But in practice, we are not going to get a wise philosopher king. I expect that any government that decides to destroy the Sun for the greater good, against the outrage of the vast majority of people, will also be a bad ruler of the Universe.

I also believe that AI alignment is not a binary, and even in the worlds where there is no AI takeover, we will probably get an AI we initially can't fully trust that will follow the spirit of our commands in exotic situatio
Raemon

I have my own actual best guesses for what happens in reasonably good futures, which I can get into. (I'll flag for now I think "preserve Earth itself for as long as possible" is a reasonable Schelling point that is compatible with many "go otherwise quite fast" plans)

I doubt that totally dismantling the Sun after centuries would significantly accelerate the time we reach the cosmic event horizon. 

Why do you doubt this? (To be clear, it depends on exact details. But my original query was about a 2-year delay. Proxima Centauri is 4 light-years away. What ... (read more)

avturchin
A very heavy and dense body on an elliptical orbit that touches the Sun's surface at each perihelion would collect sizable chunks of the Sun's matter. The movement of matter from one star to another nearby star is a well-known phenomenon. When the body reaches aphelion, the collected solar matter would cool down and could be harvested. The initial body would need to be very massive, perhaps 10-100 Earth masses. A Jupiter-sized core could work as such a body. Therefore, to extract the Sun's mass, one would need to make Jupiter's orbit elliptical. This could be achieved through several heavy impacts or gravitational maneuvers involving other planets. This approach seems feasible even without ASI, but it might take longer than 10,000 years. 
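As a rough check on the timescale, here is a minimal back-of-the-envelope sketch (it assumes the modified orbit keeps roughly Jupiter's current distance as aphelion, which is my illustrative assumption, not a figure from the comment):

```python
# Period of an elliptical orbit with aphelion near Jupiter's current distance and
# perihelion grazing the Sun's surface, via Kepler's third law (P^2 = a^3 in
# years and AU for a ~1-solar-mass primary).
AU_IN_KM = 1.496e8
aphelion_au = 5.2                  # Jupiter's current orbital radius
perihelion_au = 6.96e5 / AU_IN_KM  # solar radius in AU (~0.00465)

semi_major_axis_au = (aphelion_au + perihelion_au) / 2
period_years = semi_major_axis_au ** 1.5

print(f"~{period_years:.1f} years per perihelion pass")  # ~4.2 years
```

So each matter-collecting pass only comes around every few years, which fits the suggestion that the approach could take a very long time.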
Raemon
Richard said "I don't think my priors on that are very different from yours but the thing that would have made this post valuable for me is some object-level reason to upgrade my confidence in that." He didn't say it'd be a long-term project; I think he just meant he didn't change his beliefs about it due to this post.