Nuclear engineer with a focus on nuclear plant safety and probabilistic risk assessment. Aspiring EA, interested in X-risk mitigation and the intersection of science and policy. Working towards Keegan/Kardashev/Simulacra level 4.
(Common knowledge note: I am not under a secret NDA that I can't talk about, as of Mar 15 2025. I intend to update this statement at least once a year as long as it's true.)
I hate to rain on your parade, but equal coexistence with artificial superintelligence would require solving the alignment problem, or the control problem. Otherwise it can and eventually will do something we consider catastrophic.
The principal-advisor pair forms a (coalitional) agent which, if the previous steps succeed, can be understood as ~perfectly aligned with the principal. The actions of this agent are recorded, and distilled through imitation learning into a successor.
Even assuming imitation learning is safe, how would you get enough data for the first distillation, when you need the human in order to generate actions? And how would you know when you have enough alignment-relevant data in particular? It seems unavoidable that your data distribution will be very constrained compared to the set of situations the distilled agent might encounter.
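To make the data concern concrete, here's a minimal behavior-cloning sketch (hypothetical model and function names; PyTorch assumed, not anything from the proposal itself). The successor is fit only to the (state, action) pairs the principal-advisor pair actually recorded:

```python
# Minimal behavior-cloning sketch (hypothetical; assumes PyTorch).
# The point: the successor only ever sees (state, action) pairs the
# principal-advisor pair actually produced, so its behavior outside
# that distribution is not constrained by the training signal.
import torch
import torch.nn as nn

class SuccessorPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def distill(recorded_states: torch.Tensor, recorded_actions: torch.Tensor,
            epochs: int = 10) -> SuccessorPolicy:
    """Fit a successor to the recorded principal-advisor trajectories."""
    policy = SuccessorPolicy(recorded_states.shape[1], recorded_actions.shape[1])
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        predicted = policy(recorded_states)
        loss = loss_fn(predicted, recorded_actions)  # match the recorded actions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy
```

Nothing in that loss says anything about states that never appear in the recorded data, which is exactly the distribution-shift worry above.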
LessWrong has had a (not that successful) Continue Reading section that I think just needed more iterations.
I think it needs an easy way to indicate that you don't want to read the rest of a post. Ideally this would be automatic but I don't know how to do that.
Also, it sometimes offers posts that I did finish.
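One crude way to make it automatic (purely hypothetical, not how LessWrong's feed actually works) would be a filter like this, which also handles the already-finished case:

```python
from datetime import datetime, timedelta

def should_resurface(scroll_fraction: float, last_opened: datetime,
                     marked_finished: bool, now: datetime) -> bool:
    """Hypothetical filter for a Continue Reading feed."""
    if marked_finished or scroll_fraction > 0.95:
        return False  # don't offer posts the reader already finished
    if scroll_fraction < 0.25 and now - last_opened > timedelta(days=14):
        return False  # barely started, never came back: treat as "not interested"
    return True
```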
You may be right about the EO. At the time I felt it was a good thing, because it raised the visibility of safety evaluations at the labs and brought regulation of training, as well as deployment, more into the Overton window. Even without follow-up rules, I think it can be the case that getting a company to report the bad things strongly incentivizes it to reduce the bad things.
I think there was (and is) a common belief that Congress won't do anything significant on it anytime soon, which makes executive action appealing if you think time is running out. If what you're suggesting here is more like a variant of the "wait for a crisis" strategy--get the legislation ready, talk to people about it, and then when Congress is ready to act, they can reach for it--I'm relatively optimistic about that. As long as there's time.
Well, I certainly hope you're right, and it remains to be seen. I don't think I have any special insights.
But even if the companies stay here, the importance of American companies may decrease relative to international competitors. Also, I think there are things farther up the supply chain that can move overseas. If American cloud companies face big barriers to training their own frontier models, maybe they'll serve up DeepSeek models instead.
I don't think it should be a huge concern in the near term, as long as the regulations are well written. But fundamentally, it feeds back into the race dynamic.
Interesting idea, but I don't think short-term memory and learning really require conscious attention, and conscious attention mostly isn't the same thing as "consciousness" in the qualia sense. I like the term "cognitive control" and I think that might be a better theme linking a lot of these abilities (planning, preventing hallucinations, agency, maybe knowledge integration). These abilities have been improving, though, so the current deficit doesn't necessarily indicate a qualitative gap.
If your choice of arguments for AI safety or your way of presenting yourself feels too closely associated with one political party, then you might create or strengthen an association between AI safety and that party, which could lead the other party to reject AI safety on partisan grounds.
This already happened to an extent, and it wasn't because the advocates had trouble suppressing their partisan instincts. They focused on technocratic solutions and advocated them to the people in power at the time--the Biden administration. It sort of worked. Then the administration implemented some of them (or started to lay the groundwork for implementing them) and that alone was enough to create lasting Republican opposition to something that was previously pretty neutral.
As you implied above, pessimism is driven only secondarily by timelines. If things in 2028 don't look much different than they do now, that's evidence for longer timelines (maybe a little longer, maybe a lot). But it's inherently not much evidence about how dangerous superintelligence will be when it does arrive. If the situation is basically the same, then our state of knowledge is basically the same.
So what would be good evidence that worrying about alignment was unnecessary? The obvious one is if we get superintelligence and nothing very bad happens, despite the alignment problem remaining unsolved. But that's like pulling the trigger to see if the gun is loaded. Prior to superintelligence, I'd personally be more optimistic if AI progress turned out to require compute scaling even faster than the current trend--if the first superintelligences were heavily reliant on massive pools of tightly integrated compute, and had very limited inference capacity, that would make us less vulnerable and give us more time to adapt to them. Also, if we saw a slowdown in algorithmic progress despite widespread deployment of increasingly capable coding software, that would be a very encouraging sign that recursive self-improvement might happen slowly.