All of Julian Bradshaw's Comments + Replies

I tend to brush hard

Unless a dentist has told you to do this for some reason, you should know this is not recommended. Brushing hard can hurt tooth enamel and cause gum recession (aka your gums shrink down, which causes lots of problems).

And, at risk of quashing OP's admirable spirit, a more "robust" toothbrush would exacerbate the relevant harms

Correct. See a more complete list of scaffold features here.

This is kinda-sorta being done at the moment: after Gemini beat the game, the stream has just kept on going. Currently Gemini is lost in Mt. Moon, as is tradition. In fact, having already explored Mt. Moon earlier seems to be hampering it (no unexplored areas on the minimap to lure it toward the right direction).

I believe the dev is planning to do a fresh run soon-ish once they've stabilized their scaffold.

Yeah it's not open source or published anywhere unfortunately.

Gemini 2.5 Pro just beat Pokémon Blue. (https://x.com/sundarpichai/status/1918455766542930004)

A few things ended up being key to the successful run:

  1. Map labeling - very detailed labeling of individual map tiles (including identifying tiles that move you to a new location ("warps" like doorways, ladders, cave entrances, etc.) and identifying puzzle entities)
  2. Separate instances of Gemini with different, narrower prompts - these were used by the main Gemini playing the game to reason about certain tasks (ex. navigation, boulder puzzles, critique of current p
... (read more)
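
To make the features above concrete, here's a minimal sketch of how tile labeling and delegation to narrower sub-agent prompts might be structured. The names and data layout are my own illustration; the actual scaffold isn't public.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Tile:
    walkable: bool
    warp_to: Optional[str] = None    # e.g. "MtMoon_B1F" for a ladder or cave entrance
    entity: Optional[str] = None     # e.g. "boulder", "cut_tree", "trainer"

@dataclass
class LabeledMap:
    name: str
    tiles: dict[tuple[int, int], Tile] = field(default_factory=dict)

# Hypothetical narrower prompts for specialist instances the main agent can consult.
SUBAGENT_PROMPTS = {
    "navigation": "Given the labeled tile map and a destination, return a step-by-step path.",
    "boulder_puzzle": "Given the puzzle tiles, plan the push sequence to solve the boulder puzzle.",
    "critic": "Review the main agent's current plan and point out loops or mistakes.",
}

def delegate(task: str, context: str, call_model: Callable[[str, str], str]) -> str:
    """Ask a separate, narrowly-prompted model instance to reason about a subtask."""
    return call_model(SUBAGENT_PROMPTS[task], context)
```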
3Thane Ruthenis
I take it the final iteration isn't published anywhere yet? Wasn't able to find it. Seems like the most important part for deciding how to update on that.
6tailcalled
Were these key things made by the AI, or by the people making the run?

Yeah by "robust" I meant "can programmatically interact with game".

There's at least workable tools for Pokémon FireRed (the 2004 re-release of the 1996 original) it turns out, and you can find a scaffold using that here.

Yeah it is confusing. You'd think there's tons of available data on pixelated game screens. Maybe training on it somehow degrades performance on other images?

This has been a consistent weakness of OpenAI's image processing from the start: GPT-4-V came with clearcut warnings against using it on non-photographic inputs like screenshots or documents or tables, and sure enough, I found that it was wildly inaccurate on web page screenshots.

(In particular, I had been hoping to use it to automate Gwern.net regression detection: use a headless browser to screenshot random points in Gwern.net and report back if anything looked 'wrong'. It seemed like the sort of 'I know it when I see it' judgment task a VLM ought to be ... (read more)
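
A rough sketch of the kind of pipeline described (my own illustration, not gwern's actual setup; the page list and model choice are placeholders): screenshot a page with a headless browser, then ask a vision model whether anything looks broken.

```python
import base64, random
from playwright.sync_api import sync_playwright
from openai import OpenAI

PAGES = ["https://gwern.net/", "https://gwern.net/about"]  # sample pages for illustration

# Capture a screenshot of a randomly chosen page.
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(random.choice(PAGES))
    png = page.screenshot()
    browser.close()

# Ask a vision-capable model for an "I know it when I see it" judgment.
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Does this page screenshot look visually broken "
                                     "(overlapping text, missing styles, layout glitches)? "
                                     "Answer PASS or FAIL with a one-line reason."},
            {"type": "image_url", "image_url": {
                "url": "data:image/png;base64," + base64.b64encode(png).decode()}},
        ],
    }],
)
print(resp.choices[0].message.content)
```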

I'll let you know. They're working on open-sourcing their scaffold at the moment.

Actually another group released VideoGameBench just a few days ago, which includes Pokémon Red among other games. Just a basic scaffold for Red, but that's fair.

As I wrote in my other post:

Why hasn't anyone run this as a rigorous benchmark? Probably because it takes multiple weeks to run a single attempt, and moreover a lot of progress comes down to effectively "model RNG" - ex. Gemini just recently failed Safari Zone, a difficult challenge, because its inventory happened to be full and it couldn't accept an item it needed. And ex. Claude has taken wildly

... (read more)
2Cole Wyeth
And as long as they keep stumbling around like this, I will remain skeptical of AGI arriving in a few years. 

Re: biosignatures detected on K2-18b, there have been a couple of popular takes saying this solves the Fermi Paradox: K2-18b is so big (8.6x Earth mass) that you can't get to orbit, and maybe most life-bearing planets are like that.

This is wrong on several counts:

  1. You can still get to orbit there, it's just much harder (only 1.3g b/c of larger radius!) (https://x.com/CheerupR/status/1913991596753797383)
  2. It's much easier for us to detect large planets than small ones (https://exoplanets.nasa.gov/alien-worlds/ways-to-find-a-planet), but we expect small ones to be
... (read more)
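
A rough check of the 1.3g figure in point 1, assuming the commonly reported radius of about 2.6 Earth radii:

```latex
% Surface gravity in Earth units: g \propto M / R^2
g \approx \frac{M / M_\oplus}{\left(R / R_\oplus\right)^2}\, g_\oplus
  \approx \frac{8.6}{(2.6)^2}\, g_\oplus
  \approx 1.3\, g_\oplus
```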

I would say "agent harness" is a type of "scaffolding". I used it in this case because it's how Logan Kilpatrick described it in the tweet I linked at the beginning of the post.

3Ozyrus
Thanks! That makes perfect sense.

I'm not sure that TAS counts as "AI" since they're usually compiled by humans, but the "PokeBotBad" you linked is interesting, hadn't heard of that before. It's an Any% Glitchless speedrun bot that ran until ~2017 and which managed a solid 1:48:27 time on 2/25/17, which was better than the human world record until 2/12/18. Still, I'd say this is more a programmed "bot" than an AI in the sense we care about.

Anyway, you're right that the whole reason the Pokémon benchmark exists is because it's interesting to see how well an untrained LLM can do playing it.

1Edmund Nelson
>I'm not sure that TAS counts as "AI" since they're usually compiled by humans

Agreed, it's more "this is what the limit looks like"

>Still, I'd say this is more a programmed "bot" than an AI in the sense we care about.

Is Stockfish 8 not an AI? I feel like the goalposts of what counts as "AI" keep getting shifted. PokeBotBad is an "AI" that searches to solve the Pokémon state space.

since there's no obvious reason why they'd be biased in a particular direction

No I'm saying there are obvious reasons why we'd be biased towards truthtelling. I mentioned "spread truth about AI risk" earlier, but also more generally one of our main goals is to get our map to match the territory as a collaborative community project. Lying makes that harder.

Besides sabotaging the community's map, lying is dangerous to your own map too. As OP notes, to really lie effectively, you have to believe the lie. Well is it said, "If you once tell a lie, the truth is ... (read more)

I'm not convinced SBF had conflicting goals, although it's hard to know. But more importantly, I don't agree rationalists "tend not to lie enough". I'm no Kantian, to be clear, but I believe rationalists ought to aspire to a higher standard of truthtelling than the average person, even if there are some downsides to that. 

0eva_
What would you say to the suggestion that rationalists ought to aspire to have the "optimal" standard of truthtelling, and that standard might well be higher or lower than what the average person is doing already (since there's no obvious reason why they'd be biased in a particular direction), and that we'd need empirical observation and seriously looking at the payoffs that exist to figure out approximately how readily to lie is the correct readiness to lie?

Have we forgotten Sam Bankman-Fried already? Let’s not renounce virtues in the name of expected value so lightly. 
 

Rationalism was founded partly to disseminate the truth about AI risk. It is hard to spread the truth when you are a known liar, especially when the truth is already difficult to believe. 

6Garrett Baker
It seems pretty likely SBF happened because everyone in EA was implicitly trusting everyone else in EA. If people were more suspicious of each other, that seems less likely to have been allowed to happen.
4Jiro
Scott once had a post about how it's hard to get advice only to the people who need it. Sam Bankman-Fried may have lied too much (although the real problem was probably goals that conflict with ours) but the essay here is aimed at the typical LW geek, and LW geeks tend not to lie enough.

Huh, seems you are correct. They also apparently are heavily cannibalistic, which might be a good impetus for modeling the intentions of other members of your species…

7ChristianKl
I searched a bit more and it seems they don't have personal relationships with other members of the same species the way mammals and birds can. Personal relationships seem to be something that needs intelligence and that birds and mammals evolved separately.

Oh okay. I agree it's possible there's no Great Filter.

Dangit I can't cease to exist, I have stuff to do this weekend.

But more seriously, I don't see the point you're making? I don't have a particular objection to your discussion of anthropic arguments, but also I don't understand how it relates to the "what part of evolution/planetary science/sociology/etc. is the Great Filter" scientific question.

3Knight Lee
What I'm trying to argue is that there could easily be no Great Filter, and there could exist trillions of trillions of observers who live inside the light cone of an old alien civilization, whether directly as members of the civilization, or as observers who listen to their radio. It's just that we're not one of them. We're one of the first few observers who aren't in such a light cone. Even though the observers inside such light cones outnumber us a trillion to one, we aren't one of them. :) if you insist on scientific explanations and dismiss anthropic explanations, then why doesn't this work as an answer?

I think if you frame it as:

if most individuals exist inside the part of the light cone of an alien civilization, why aren't we one of them?

Then yes, 1.0 influence and 4.0 influence both count as "part of the light cone", and so for the related anthropic arguments you could choose to group them together.

But re: anthropic arguments,

Not only am I unable to explain why I'm an observer who doesn't see aliens

This is where I think I have a different perspective. Granting that anthropic arguments (here, about which observer you are and the odds of that) cause frus... (read more)

3Knight Lee
Okay I guess we're getting into the anthropic arguments then :/ So both the Fermi Paradox and the Doomsday Argument are asking, "assuming that the typical civilization lasts a very long time and has trillions of trillions of individuals inside the part of its lightcone it influences (either as members in the Doomsday Argument, or observers in the Fermi Paradox). Then why are we one of the first 100 billion individuals in our civilization?" Before I try to answer it, I first want to point out that even if there was no answer, we should behave as if there was no Doomsday nor great filter. Because from a decision theory point of view, you don't want your past self, in the first nanosecond of your life, to use the Doomsday Argument to prove he's unlikely to live much longer than a nanosecond, and then spend all his resources in the first nanosecond. For the actual answer, I only have theories. One theory is this. "There are so many rocks in the universe, so why am I a human rather than a rock?" The answer is that rocks are not capable of thinking "why am I X rather than Y," so given that you think such a thing, you cannot be a rock and have to be something intelligent like a human. I may also ask you, "why, of all my millions of minutes of life, am I currently in the exact minute where I'm debating someone online about anthropic reasoning?" The answer might be similar to the rock answer: given you are thinking "why am I X rather than Y," you're probably in a debate etc. over anthropic reasoning. If you stretch this form of reasoning to its limits, you may get the result that the only people asking "why am I one of the first 100 billion observers of my civilization," are the people who are the first 100 billion observers. This obviously feels very unsatisfactory. Yet we cannot explain why exactly this explanation feels unsatisfactory, while the previous two explanations feel satisfactory, so maybe it's due to human biases that we reject the third argument by accep

I agree it's likely the Great Filter is behind us. And I think you're technically right: most filters are behind us, and many are far in the past, so the "average expected date of the Great Filter" shifts backward. But, quoting my other comment:

Every other possible filter would gain equally, unless you think this implies that maybe we should discount other evolutionary steps more as well. But either way, that’s still bad on net because we lose probability mass on steps behind us.

So even though the "expected date" shifts backward, the odds for "behind us or... (read more)

3Jonas Hallgren
I see your point, yet if the given evidence is 95% in the past, the 5% in the future only gets a marginal amount added to it. I do still like the idea of crossing off potential filters to see where the risks are, so fair enough!

Interesting thought. I think you have a point about coevolution, but I don't think it explains away everything in the birds vs. mammals case. How much are birds really competing with mammals vs. other birds/other animals? Mammals compete with lots of animals; why did only birds get smarter? I tend to think intra-niche/genus competition would generate most of the pressure for higher intelligence, and for whatever reason that competition doesn't seem to lead to huge intelligence gains in most species.

(Re: octopus, cephalopods do have interactions with marine... (read more)

6ChristianKl
The Humboldt squid is a cephalopod that can coordinate to hunt together.

Two objections:

  1. Granting that the decision theory that would result from reasoning based on the Fermi Paradox alone is irrational, we'd still want an answer to the question[1] of why we don't see aliens. If we live in a universe with causes, there ought to be some reason, and I'd like to know the answer.
  2. "why aren't we born in a civilization which 'sees' an old alien civilization" is not indistinguishable from "why aren't we born in an old [alien] civilization ourselves?" Especially assuming FTL travel limitations hold, as we generally expect, it would
... (read more)
1Knight Lee
For point 1, I can argue about how rational a decision theory is, but I cannot argue for "why I am this observer rather than that observer." Not only am I unable to explain why I'm an observer who doesn't see aliens, I am unable to explain why I am an observer believes 1+1=2, assuming there are infinite observers who believe 1+1=2 and infinite observers who believe 1+1=3. Anthropic reasoning becomes insanely hard and confusing and even Max Tegmark, Eliezer Yudkowsky and Nate Soares are confused. Let's just focus on point 2, since I'm much more hopeful I get get to the bottom of this one. Of course I don't believe in faster-than-light travel. I'm just saying that "being born as someone who sees old alien civilizations" and "being born as someone inside an old [alien] civilization" are technically the same, if you ignore the completely subjective and unnecessary distinction of "how much does the alien civilization need to influence me before I'm allowed to call myself a member of them?" Suppose at level 0 influence, the alien civilization completely hides from you, and doesn't let you see any of their activity. At level 1.0 influence, the alien civilization doesn't hide from you, and lets you look at their Dyson swarms or start lifting machines and all the fancy technologies. At level 1.1 influence, they let you see their Dyson swarms, plus they send radio signals to us, sharing all their technologies and allowing us to immediately reach technological singularity. Very quickly, we build powerful molecular assemblers, allowing us to turn any instructions into physical objects, and we read instructions from the alien civilization allowing us to build a copy of their diplomats. Some countries may be afraid to follow the radio instructions, but the instructions can easily be designed so that any country which refuses to follow the instructions will be quickly left behind. At this point, there will be aliens on Earth, we'll talk about life and everything, and we are

Yes. Every other possible filter would gain equally, unless you think this implies that maybe we should discount other evolutionary steps more as well. But either way, that’s still bad on net because we lose probability mass on steps behind us.

Couple takeaways here. First, quoting the article:

By comparing the bird pallium to lizard and mouse palliums, they also found that the neocortex and DVR were built with similar circuitry — however, the neurons that composed those neural circuits were distinct.

“How we end up with similar circuitry was more flexible than I would have expected,” Zaremba said. “You can build the same circuits from different cell types.”

This is a pretty surprising level of convergence for two separate evolutionary pathways to intelligence. Apparently the neural circuits are so ... (read more)

1[comment deleted]
8Yoreth
I previously posted Was the K-T event a Great Filter? as a pushback against the notion that different lineages of life on Earth evolving intelligence is really "independent evidence" in any meaningful sense. Intelligence can evolve only if there's selective pressure favoring it, and a large part of that pressure likely comes from the presence of other intelligent creatures competing for resources. Therefore mammals and birds together really should only count as one data point. (It's more plausible that octopus intelligence is independent, since the marine biome is largely separate from the terrestrial, although of course not totally.)
6Davidmanheim
Or that the filter is far behind us - specifically, Eukaryotes only evolved once. And in the chain-model by Sandberg et al, pre-intelligence filters are the vast majority of the probability mass, so it seems to me that eliminating intelligence as a filter shifts the remaining probability mass for a filter backwards in time in expectation.
2Knight Lee
I agree with everything you said but I disagree that the Fermi Paradox needs explanation.

Fermi Paradox = Doomsday Argument

The Fermi Paradox simply asks, "why haven't we seen aliens?" The answer is that any civilization which an old alien civilization chooses to communicate to (and is able to reach), will learn so much technology that they will quickly reach the singularity. They will be influenced so much that they effectively become a "province" within the old alien civilization. So the Fermi Paradox question "why aren't we born in a civilization which "sees" an old alien civilization," is actually indistinguishable from the Doomsday Argument question "why aren't we born in an old [alien] civilization ourselves?"

Doomsday Argument is wrong

Here's the problem: the Doomsday Argument is irrational from a decision theory point of view. Suppose your parents were Omega and Omego. The instant you were born, they hand you a $1,000,000 allowance, and they instantly ceased to exist. If you were rational in the first nanosecond of your life, the Doomsday Argument would prove it's extremely unlikely you'll live much longer than 1 nanosecond, and you should spend all your money immediately. If you actually believe the Doomsday Argument, you should thank your lucky stars that you weren't rational in the first nanosecond of your life. Both SSA and SIA are bad decision theories (when combined with CDT), because they are optimizing something different than your utility.

Explanation

SSA is optimizing the duration of time your average copy has correct probabilities. SIA is optimizing the duration of time your total copies have the correct probabilities. SSA doesn't care if the first nanosecond you is wrong, because he's a small fraction of your time (even if he burns your life savings resulting in near 0 utility). SIA doesn't care if you're almost certainly completely wrong (grossly overestimating the probability of counterfactuals with more copies of you), because
3Pretentious Penguin
To clarify the last part of your comment, the ratio of the probability of the Great Filter being in front of us to the probability of the Great Filter being behind tool-using intelligent animals should be unchanged by this update, right?

Both the slowdown and race models predict that the future of Humanity is mostly in the hands of the United States - the baked-in disadvantage in chips from existing sanctions on China is crippling within short timelines, and no one else is contending.

So, if the CCP takes this model seriously, they should probably blockade Taiwan tomorrow? It's the only fast way to equalize chip access over the next few years. They'd have to weigh the risks against the chance that timelines are long enough for their homegrown chip production to catch up, but there seems to ... (read more)

I think the scenario of a war between several ASIs (each merged with its origin country) is underexplored. Yes, there can be a value handshake between ASIs, but their creators will work to prevent this and see it as a type of misalignment.

Somehow, this may help some groups of people survive, as those ASIs which preserve their people will look more trustworthy in the eyes of other ASIs, and this will help them form temporary unions.

The final outcome will be highly unstable: either one ASI will win, or several ASIs will start space exploration in different directions. 

I'm generally pretty receptive to "adjust the Overton window" arguments, which is why I think it's good PauseAI exists, but I do think there's a cost in political capital to saying "I want a Pause, but I am willing to negotiate". It's easy for your opponents to cite your public Pause support and then say, "look, they want to destroy America's main technological advantage over its rivals" or "look, they want to bomb datacenters, they're unserious". (yes Pause as typically imagined requires international treaties, the attack lines would probably still work, ... (read more)

2Davidmanheim
Very briefly, the fact that "The political position AI safety has mostly taken" is a single stance is evidence that there's no room for even other creative solutions, so we've failed hard at expanding that Overton window. And unless you are strongly confident in that as the only possibly useful strategy, that is a horribly bad position for the world to be in as AI continues to accelerate and likely eliminate other potential policy options.

So I realized Amad’s comment obsession was probably a defense against this dynamic - “I have to say something to my juniors when I see them”.

I think there's a bit of a trap here where, because Amad is known for always making a comment whenever he ends up next to an employee, if he then doesn't make a comment next to someone, it feels like a deliberate insult.

That said, I see the same behavior from US tech leadership pretty broadly, so I think the incentive to say something friendly in the elevator is pretty strong to start (norms of equality, first name basis, etc. in tech), and then once you start doing that you have to always do it to avoid insult.

2Maxwell Peterson
Good point!

I think the concept of Pausing AI just feels unrealistic at this point.

  1. Previous AI safety pause efforts (GPT-2 release delay, 2023 Open Letter calling for a 6 month pause) have come to be seen as false alarms and overreactions
  2. Both industry and government are now strongly committed to an AI arms race
  3. A lot of the non-AI-Safety opponents of AI want a permanent stop/ban in the fields they care about, not a pause, so it lacks for allies
  4. It's not clear that meaningful technical AI safety work on today's frontier AI models could have been done before they were inv
... (read more)
8Davidmanheim
I think this is wrong - the cost in political capital for saying that it's the best solution seems relatively low, especially if coupled with an admission that it's not politically viable. What I see instead is people dismissing it as a useful idea even in theory, saying it would be bad if it were taken seriously by anyone, and moving on from there. And if nothing else, that's acting as a way to narrow the Overton window for other proposals!

Copying over a comment from Chris Olah of Anthropic on Hacker News that I thought was good (along with the parent comment):
    
fpgaminer

> This is powerful evidence that even though models are trained to output one word at a time

I find this oversimplification of LLMs to be frequently poisonous to discussions surrounding them. No user facing LLM today is trained on next token prediction.

    
 olah3

Hi! I lead interpretability research at Anthropic. I also used to do a lot of basic ML pedagogy (https://colah.github.io/). I

... (read more)

Good objection. I think gene editing would be different because it would feel more unfair and insurmountable. That's probably not rational - the effect size would have to be huge for it to be bigger than existing differences in access to education and healthcare, which are not fair or really surmountable in most cases - but something about other people getting to make their kids "superior" off the bat, inherently, is more galling to our sensibilities. Or at least mine, but I think most people feel the same way.

Yeah referring to international sentiments. We'd want to avoid a "chip export controls" scenario, which would be tempting I think.

0TsviBT
For chip exports, is this mainly a question of "other countries will have a harder time getting the very latest chip designs"? (I don't know anything about the chip export thing.) For germline engineering, I expect the technological situation to be significantly better for democratization, though not perfect. With chips, the manufacturing process is very complex and very concentrated in a couple companies; and designing chips is very complex; and this knowledge is siloed in commercial orgs. With germline engineering, most of the research is still happening in public academic research (though not all of it), and is happening in several different countries. There could definitely still be significant last-mile breakthroughs that get siloed in industry in one or a few countries, but I'd be pretty surprised if it was nearly as embargoable as chip stuff. E.g. if someone gets in vitro oogenesis, it might be because they figured out some clever sequence of signaling contexts to apply to a stem cell; but they'd probably be working with culture methods not too different from published stuff, and would be working off of published gene regulatory networks based on published scRNA-seq data, etc. Not sure though.

Re: HCAST tasks, most are being kept private since it's a benchmark. If you want to learn more, here's METR's paper on HCAST.

Thanks for the detailed response!

Re: my meaning, you got it correct here:

Spiritually, genomic liberty is individualistic / localistic; it says that if some individual or group or even state (at a policy level, as a large group of individuals) wants to use germline engineering technology, it is good for them to do so, regardless of whether others are using it. Thus, it justifies unequal access, saying that a world with unequal access is still a good world.

Re: genomic liberty makes narrow claims, yes I agree, but my point is that if implemented it will lead ... (read more)

4TsviBT
Thanks for the example. I think nuclear is a special case (though lots of cases are special in different ways): It takes a pretty large project to start up; and it comes with nuclear proliferation, which is freaky because of bombs. Wait, I'm confused; I thought we both think it's at least fairly likely to go well within the US, i.e. lots of people and diverse people have access. So then they can say "it is good, and we are happy about it and want it to be shared, or at least are not going to seriously impede that". (Call me pollyanna if you want lol, but that's kinda what I mainline expect I think?) ....Oh is this also referring to countries being resentful? Hm... Possibly I should be advocating for the technology to not be embargoed/withheld by the federal government (like some military technology is)?

This is a thoughtful post, and I appreciate it. I don't think I disagree with it from a liberty perspective, and agree there are potential huge benefits for humanity here.

However, my honest first reaction is "this reasoning will be used to justify a world in which citizens of rich countries have substantially superior children to citizens of poor countries (as viewed by both groups)". These days, I'm much more suspicious of policies likely to be socially corrosive: it leads to bad governance at a time where, because of AI risk, we need excellent governance... (read more)

8jimrandomh
This seems like an argument that proves too much; ie, the same argument applies equally to childhood education programs, improving nutrition, etc. The main reason it doesn't work is that genetic engineering for health and intelligence is mostly positive-sum, not zero-sum. Ie, if people in one (rich) country use genetic engineering to make their descendents smarter and the people in another (poor) country don't, this seems pretty similar to what has already happened with rich countries investing in more education, which has been strongly positive for everyone.
4TsviBT
Other points:

  1. There are diminishing returns and a ceiling on gains. The ceiling is quite high, but it's there. You can only decrease disease risk to "roughly 0", and this probably happens with medium-strength engineering (vaguely speaking); longevity probably (I speculate) has some caps coming from aging processes that aren't fixed using genetic variants existing in the population; and IQ can't be safely pushed way outside the human range. This means that countries could get ahead, but probably not crazy-ahead, and in the long run it should be easier to catch up than to pull ahead farther. (Ok this isn't clear because maybe a head start on intelligence amplification snowballs or something.)
  2. Uptake will start slow. It will speed up when people see how good the results are. But that information will be available to roughly everyone at roughly the same time: everyone sees the exceptionally healthy, capable kids at the same time, wherever those kids were. So for the big ramp-up, part that would matter on a national-national scale, there's less of a head start. (There's still probably serial lead time for some elements, but who knows.)
  3. I do think there's significant risk of inequality between ancestry groups, which relates to states though not one to one. That's because there's quite large inequalities between how much genomic data has been collected for different groups (see e.g. here: https://gwasdiversitymonitor.com/, though this is about GWASes, not exactly genomic data). Current PGSes don't translate between groups very well. One way to address this is of course to gather more diverse data. (But the situation might not be so bad: plausibly once you more accurately identify which genetic variants are causal, your PGSes generalize between groups much better, or it takes much less additional data from the group to make scores that generalize.)
5TsviBT
I'm not quite sure I follow. Let me check. You might be saying: Or maybe you're saying: To be kinda precise and narrow, the narrow meaning of genomic liberty as a negative right doesn't say it's good or even ok to have worlds with unequal access. As a moral claim, it more narrowly says This does say something about the unequal world--namely that it's "not so morally abhorrent that we should [have international regulation backed by the threat of war] to prevent that use". I don't think that's a ringing endorsement. To be honest, mainly I've thought about inequality within single economic and jurisdictional regimes. (I think that objection is more common than the international version.) I'm not even sure how to orient to international questions--practically, morally, and conceptually. Probably I'd want to have at least a tiny bit of knowledge about other international relations (conflicts, law, treaties, cooperative projects, aid, long-term development) before thinking about this much. I'm unclear on the ethics of one country doing something that doesn't harm other countries in any direct way, and even directly helps other coutries at least on first-order; but also doesn't offer that capability to other countries especially vigorously. Certainly, personally, I take a humanist and universalist stance: It is good for all people to be empowered; it is bad for some group to try to have some enduring advantage over others by harming or suppressing the others or even by permanently withholding information that would help them. It does seem good to think about the international question. I'm unsure whether it should ultimately be a crux, though. I do think it's better if germline engineering is developed in the US before, say, Russia, because the US will work out the liberal version, and will be likely to be generous with the technology in the long-run. It would almost certainly happen to a significant extent. Right, this is a big part of what I hope, and somewhat e

Here's an interesting thread of tweets from one of the paper's authors, Elizabeth Barnes.
Quoting the key sections:

Extrapolating this suggests that within about 5 years we will have generalist AI systems that can autonomously complete ~any software or research engineering task that a human professional could do in a few days, as well as a non-trivial fraction of multi-year projects, with no human assistance or task-specific adaptations required.

However, (...) It’s unclear how to interpret “time needed for humans”, given that this varies wildly between diffe

... (read more)
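
For concreteness, the naive extrapolation being described works roughly like this (a sketch assuming a fixed 7-month doubling time; the starting horizon is whatever the current frontier model's is):

```python
# Naive time-horizon extrapolation: horizon(t) = horizon_now * 2^(months / doubling_time).
def horizon_multiplier(years: float, doubling_months: float = 7.0) -> float:
    return 2 ** (years * 12 / doubling_months)

print(horizon_multiplier(5))  # roughly 380, i.e. ~380x the current horizon after 5 years
```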
5Thomas Kwa
That's basically correct. To give a little more context for why we don't really believe this number, during data collection we were not really trying to measure the human success rate, just get successful human runs and measure their time. It was very common for baseliners to realize that finishing the task would take too long, give up, and try to collect speed bonuses on other tasks. This is somewhat concerning for biasing the human time-to-complete estimates, but much more concerning for this human time horizon measurement. So we don't claim the human time horizon as a result.

More than just this. OP actually documents it pretty well; see here.

Random commentary on bits of the paper I found interesting:

Under Windows of opportunity that close early:

Veil of ignorance

Lastly, some important opportunities are only available while we don’t yet know for sure who has power after the intelligence explosion. In principle at least, the US and China could make a binding agreement that if they “win the race” to superintelligence, they will respect the national sovereignty of the other and share in the benefits. Both parties could agree to bind themselves to such a deal in advance, because a guarantee of contr

... (read more)

Okay I got trapped in a Walgreens and read more of this, found something compelling. Emphasis mine:

The best systems today fall short at working out complex problems over longer time horizons, which require some mix of creativity, trial-and-error, and autonomy. But there are signs of rapid improvement: the maximum duration of ML-related tasks that frontier models can generally complete has been doubling roughly every seven months. Naively extrapolating this trend suggests that, within three to six years, AI models will become capable of automating many cogn

... (read more)
3Lizka
FYI: the paper is now out.  See also the LW linkpost: METR: Measuring AI Ability to Complete Long Tasks, and a summary on Twitter.  (IMO this is a really cool paper — very grateful to @Thomas Kwa et al. I'm looking forward to digging into the details.)
4Thomas Kwa
The 7-month doubling trend we measured actually goes back to GPT-2 in 2019. Since 2024, the trend has been faster, doubling roughly every 3-4 months depending on how you measure, but we only have six 2024-25 models so error bars are wide and it's really unclear which trend will be predictive of the future.
1Oliver Daniels
I've been confused what people are talking about when they say "trend lines indicate AGI by 2027" - seems like it's basically this?

Meta: I'm kind of weirded out by how apparently everyone is making their own high-effort custom-website-whitepapers? Is this something that's just easier with LLMs now? Did Situational Awareness create a trend? I can't read all this stuff man.

In general there seems to be way more high-effort work coming out since reasoning models got released. Maybe it's just crunchtime.

4Daniel Kokotajlo
Websites are just a superior format for presenting information, compared to e.g. standard blog posts or PDFs. You can do more with them to display information in the way you want + there's less friction for the user.

I think it's something of a trend relating to a mix of 'tools for thought' and imitation of some websites (LW2, Read The Sequences, Asterisk, Works in Progress & Gwern.net in particular), and also a STEM meta-trend arriving in this area: you saw this in security vulnerabilities where for a while every major vuln would get its own standalone domain + single-page website + logo + short catchy name (eg. Shellshock, Heartbleed). It is good marketing which helps you stand out in a crowded ever-shorter-attention-span world.

I also think part of it is that it ... (read more)

7wdmacaskill
There's definitely a new trend towards custom-website essays. Forethought is a website for lots of research content, though (like Epoch), not just PrepIE. And I don't think it's because of people getting more productive because of reasoning models - AI was helpful for PrepIE but more like 10-20% productivity boost than 100% boost, and I don't think AI was used much for SA, either.

I meant test-time compute as in the compute expended in the thinking Claude does playing the game. I'm not sure I'm convinced that reasoning models other than R1 took only a few million dollars, but it's plausible. Appreciate the prediction!

Amazingly, Claude managed to escape the blackout strategy somehow. Exited Mt. Moon at ~68 hours.

It does have a lot of the info, but it doesn't always use it well. For example, it knows that Route 4 leads to Cerulean City, and so sometimes thinks there's a way around Mt. Moon that sticks solely to Route 4.

No idea. Be really worried, I guess—I tend a bit towards doomer. There's something to be said for not leaving capabilities overhangs lying around, though. Maybe contact Anthropic?

The thing is, the confidence the top labs have in short-term AGI makes me think there's a reasonable chance they have the solution to this problem already. I made the mistake of thinking they didn't once before - I was pretty skeptical that "more test-time compute" would really unhobble LLMs in a meaningful fashion when Situational Awareness came out and didn't elaborate at all on how that would work. But it turned out that at least OpenAI, and probably Anthropic too, already had the answer at the time.

I think this is a fair criticism, but I think it's also partly balanced out by the fact that Claude is committed to trying to beat the game. The average person who has merely played Red probably did not beat it, yes, but also they weren't committed to beating it. Also, Claude has pretty deep knowledge of Pokémon in its training data, making it a "hardcore gamer" both in terms of knowledge and willingness to keep playing. In that way, the reference class of gamers who put forth enough effort to beat the game is somewhat reasonable.

3Davidmanheim
I mostly agree, but "the reference class of gamers who put forth enough effort to beat the game" is still necessarily truncated by omitting any who nonetheless failed to complete it, and is likely also omitting gamers embarrassed of how long it took them.

It's definitely possible to get confused playing Pokémon Red, but as a human, you're much better at getting unstuck. You try new things, have more consistent strategies, and learn better from mistakes. If you tried as long and as consistently as Claude has, even as a 6-year-old, you'd do much better.

I played Pokémon Red as a kid too (still have the cartridge!), it wasn't easy, but I beat it in something like that 26 hour number IIRC. You have a point that howlongtobeat is biased towards gamers, but it's the most objective number I can find, and it feels reasonable to me.

as a human, you're much better at getting unstuck

I'm not sure! Or well, I agree that 7-year-old me could get unstuck by virtue of having an "additional tool" called "get frustrated and cry until my mom took pity and helped."[1] But we specifically prevent Claude from doing stuff like that!

I think it's plausible that if we took an actual 6-year-old and asked them to play Pokemon on a Twitch stream, we'd see many of the things you highlight as weaknesses of Claude: getting stuck against trivial obstacles, forgetting what they were doing, and—yes—complai... (read more)

Thanks for the correction! I've added the following footnote:

Actually it turns out this hasn't been done, sorry! A couple RNG attempts were completed, but they involved some human direction/cheating. The point still stands only in the sense that, if Claude took more random/exploratory actions rather than carefully-reasoned shortsighted actions, he'd do better.

I think the idea behind MAIM is to make it so neither China nor the US can build superintelligence without at least implicit consent from the other. This is before we get to the possibility of first strikes.

If you suspect an enemy state is about to build a superintelligence which they will then use to destroy you (or that will destroy everyone), you MAIM it. You succeed in MAIMing it because everyone agreed to measures making it really easy to MAIM it. Therefore, for either side to build superintelligence, there must be a general agreement to do so. If the... (read more)

4Ben Livengood
I think MAIM might only convince people who have p(doom) < 1%. If we're at the point that we can convincingly say to each other "this AGI we're building together can not be used to harm you" we are way closer to p(doom) == 0 than we are right now, IMHO. Otherwise why would the U.S. or China promising to do AGI research in a MAIMable way be any more convincing than the strategies at alignment that would first be necessary to trust AGI at all?  The risk is "anyone gets AGI" until p(doom) is low, and at that point I am unsure if any particular country would choose to forego AGI if it didn't perfectly align politically because, again, for one random blob of humanness to convince an alien-minded AGI to preserve aspects of the random blob it cares about, it's likely to encompass 99.9% of what other human blobs care about. Where that leaves us is that if U.S. and China have very different estimates of p(doom) they are unlikely to cooperate at all in making AGI progress legible to each other.  And if they have similar p(doom) they either cooperate strongly to prevent all AGI or cooperate to build the same thing, very roughly.

This is creative.

TL;DR: To mitigate race dynamics, China and the US should deliberately leave themselves open to the sabotage ("MAIMing") of their frontier AI systems. This gives both countries an option other than "nuke the enemy"/"rush to build superintelligence first" if superintelligence appears imminent: MAIM the opponent's AI. The deliberately unmitigated risk of being MAIMed also encourages both sides to pursue carefully-planned and communicated AI development, with international observation and cooperation, reducing AINotKillEveryone-ism risks.

The ... (read more)

After an inter-party power-struggle, the CCP commits to the perpetual existence of at least one billion Han Chinese people with biological reproductive freedom

You know, this isn't such a bad idea - that is, explicit government commitments against discarding their existing, economically-unproductive populace. Easier to ask for today, rather than later.

Hypothetically this is more valuable in autocracies than in democracies, where the 1 person = 1 vote rule keeps political power in the hands of the people, but I think I'd support adding a constitutional amend... (read more)

It's unclear exactly what the product GPT-5 will be, but according to OpenAI's Chief Product Officer today it's not merely a router between GPT-4.5/o3.

swyx
appreciate the update!!

in gpt5, are gpt* and o* still separate models under the hood and you are making a model router? or are they going to be unified in some more substantive way?


Kevin Weil
Unified 👍

1Thane Ruthenis
Fair enough, I suppose calling it an outright wrapper was an oversimplification. It still basically sounds like just the sum of the current offerings.