I'm currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.
I'm somewhere between the stock market and the rationalist/EA community on this.
I'm hesitant to accept a claim like "rationalists are far better at the stock market than other top traders". I agree that the general bet "AI will do well" was more correct than the market, but it was just one call (so luck is a major factor), and there were a lot of other calls made there that aren't tracked.
I think we can point to many people who did make money, but I'm not sure how much this community made on average.
Manifold traders give roughly a 27% chance of the $500B actually being deployed within 4 years. There's also a more interesting market on precisely how much will be deployed.
I get the impression that Trump really likes launching things with big numbers, and cares much less about the details or correctness.
That said, it's possible that the government's involvement still increases spending by 20%+, which would still be significant.
> Instead, we seem to be headed to a world where
> - Proliferation is not bottlenecked by infrastructure.
> - Regulatory control through hardware restriction becomes much less viable.
I like the rest of your post, but I'm skeptical of these specific implications.
Even if everyone has access to the SOTA models, some actors will have much more hardware to run them on, and I expect this to matter. This arguably makes the offense/defense balance more weighted toward the offense side, but there are many domains where extra thinking will help a lot.
More generally, and I hate to be that guy, but I think it's telling that prediction markets and stock markets don't seem to have updated that much since R1's release. I think it's generally easy to get hyped up over whatever the latest thing is, and I agree that R1 is really neat, but I'm skeptical of how much it really should cause us to update, in the scheme of things.
I found this extra information very useful, thanks for revealing what you did.
Of course, to me this makes OpenAI look quite bad. This seems like an incredibly obvious conflict of interest.
I'm surprised that the contract didn't allow Epoch to release this information until recently, but does allow them to release it afterward. This seems really sloppy on OpenAI's part. I guess they got a bit of extra publicity when o3 was released (even though the model wasn't even available), but now it winds up looking worse (at least to those paying attention). I'm curious whether this discrepancy was malice or carelessness.
Hiding this information seems very similar to lying to the public. So at the very least, from what I've seen, I don't feel like we have many reasons to trust their communications - especially their "tweets from various employees."
> However, we have a verbal agreement that these materials will not be used in model training.
I imagine I can speak for a bunch of people here when I say I'm pretty skeptical. At the very least, it's easy for me to imagine situations where the data wasn't technically directly used in training, but was used by researchers when iterating on versions, to make sure the system was going in the right direction. This could lead to a very blurry line where they do things that aren't [literal LLM training] but basically achieve a similar outcome.
That's highly relevant, thanks!
It's possible that, from the authors' perspective, the specific semantic meanings I took from terms like "automated alignment research" and "fleets" weren't implied. But if I made the mistake, I'm sure other readers will as well, so I'd like to encourage changes here before these phrases take off much further (if others agree with my take).
I'm happy this area is getting more attention.
I feel nervous about the terminology. I think terminology can presuppose specific assumptions about how this should or will play out that I don't think are likely.
"automating alignment research" -> I know this has been used before, it sounds very high-level to me. Like saying that all software used as part of financial trading workflows is "automating financial trading." I think it's much easier to say that software is augmenting financial trading or similar. There's not one homogeneous thing called "financial trading," the term typically emphasises the parts that aren't yet automated. The specific ways it's integrated sometimes involve it replacing entire people, sometimes involve it helping people, and often does both in complex ways.
"Algorithmic trading was not just about creating faster digital traders but about reimagining traders as fleets of bots, quants, engineers, and other specialists."
In software, the word "fleet" sometimes refers to specific deployment strategies. A whole lot of the automation doesn't look like "bots" - rather, it's a lot of regular tools, plug-ins, helpers, etc.
"vast digital fleets of specialized AI agents working in concert"
This is one architecture we can choose, but I'm not sure how critical/significant it will be. I very much agree that AI will be a big deal, but this makes it sound like you're assuming a specific way for AI to be used.
All that said, I'm very much in favor of us taking a lot of advantage of AI systems for all the things we want in the world, including AI safety. I imagine that for AI safety, we'll probably use a very eccentric and complex mix of AI technologies. Some will directly replace existing researchers; we'll have specific scripts for research experiments, maybe agent-like things that do ongoing oversight, etc.
This came from a Facebook thread where I argued that many of the main ways AI was described as failing fall into a few categories (John disagreed).
I appreciated this list, but the items strike me as fitting into a few clusters.
...I would flag that much of that is unsurprising to me, and I think the categorization can work pretty well.
In order:
1) If an agent is unwittingly deceptive in ways that are clearly catastrophic, and that could be understood by a regular person, I'd probably put that under the "naive" or "idiot savant" category. As in, it has severe gaps in its abilities that a human or reasonable agent wouldn't. If the issue is that no reasonable agent would catch the downsides of a certain plan, I'd probably put that under the "we made a pretty good bet given the intelligence that we had" category.
2) I think that "What Failure Looks Like" is less Accident risk, more "Systemic" risk. I'm also just really unsure what to think about this story. It feels to me like it's a situation where actors are just not able to regulate externalities or similar.
3) The "fusion power generator scenario" seems like just a bad analyst to me. A lot of the job of an analyst is to flag important considerations. This seems like a pretty basic ask. For this itself to be the catastrophic part, I think we'd have to be seriously bad at this. ("i.e. Idiot Savant")
4) STEM-AGI -> I'd also put this in the naive or "idiot savant" category.
5) "that plan totally fails to align more-powerful next-gen AGI at all" -> This seems orthogonal to "categorizing the types of unalignment". This describes how incentives would create an unaligned agent, not what the specific alignment problem is. I do think it would be good to have better terminology here, but would probably consider it a bit adjacent to the specific topic of "AI alignment" - more like "AI alignment strategy/policy" or something.
6) "AGIs act much like a colonizing civilization" -> This sounds like either unalignment has already happened, or humans just gave AIs their own power+rights for some reason. I agree that's bad, but it seems like a different issue than what I think of as the alignment problem. More like, "Yea, if unaligned AIs have a lot of power and agency and different goals, that would be suboptimal"
7) "but at some point a particular subagent starts self-improving, goes supercritical, and takes over the rest of the system overnight." -> This sounds like a traditional mesa-agent failure. I expect a lot of "alignment" with a system made of a bunch of subcomponents is "making sure no subcomponents do anything terrible." Also, still leaves open the specific way this subsystem becomes/is unaligned.
8 ) "using an LLM to simulate a whole society. " -> Sorry, I don't quite follow this one.
Personally, I like the focus "scheming" has. At the same time, I imagine there are another 5 to 20 clean concerns we should also focus on (some of which have been getting attention).
While I realize there's a lot we can't predict, I think we could do a much better job of just making lists of different risk factors and allocating research amongst them.
Thanks!
> Are you thinking here of the new-ish canvases built into the chat interfaces of some of the major LLMs (Claude, ChatGPT)? Or are there tools specifically optimized for this that you think are good? Thanks!
I'm primarily thinking of the Python canvas offered by ChatGPT. I don't have other tools in mind.
Welp. I guess yesterday proved this part to be almost embarrassingly incorrect.