ErickBall

Nuclear engineer with a focus in nuclear plant safety and probabilistic risk assessment. Aspiring EA, interested in X-risk mitigation and the intersection of science and policy. Working towards Keegan/Kardashev/Simulacra level 4.

(Common knowledge note: I am not under a secret NDA that I can't talk about, as of Mar 15 2025. I intend to update this statement at least once a year as long as it's true.) 

Comments

But the reason I mention this here is that a >5 year ‘median timeline’ to get to >30% GWP growth would not have required detailed justifications until very recently. Now, Matthew sees it as conservative, and he’s not wrong.

This seems crazy to me. Unless the machines get rid of the current economy and start from near-zero, I don't think we'll see >30% GWP growth at all, and certainly not right away.

From what I can find, extreme growth rates like this have historically had two causes: 1) recovery from a major disaster, usually war, or 2) discovery of a massive oil reserve in a poor country (e.g. Guyana recently). Less extreme but still high growth rates can occur due to mobilization during a war.

The oil case requires the surrounding world economy to already be much larger--outside investment is used to rapidly exploit the newly discovered resources, and then the oil is exported for cash, and presto, massive GDP growth. It's not a good parallel to endogenous growth because it doesn't require an internal feedback loop to build capacity. It also doesn't translate in the short term to the rest of the economy: Guyana has a GDP per capita of $80k as money accumulates in the Natural Resource Fund, but half its population still lives on less than $5.50/day.

Recovery from disaster also seems like a poor analogy for automation, because it depends on infrastructure (both physical and social/corporate/human capital) that already existed but was forced to sit idle. We will need time to create that capital from scratch.

If someone deployed a superintelligent model tomorrow, do you think in 5 years we could quadruple our production of cars, houses, or airplanes? Would we have four times as many (or four times better) haircuts or restaurant meals? Real estate and leasing alone make up almost 14% of GDP and won't see a boom until after household incomes go up substantially. Even if the AI created wonder drugs for every disease, how long would it take to get them into mass production? 

I think we would get a massive surge of investment comparable to US mobilization in WWII, when real GDP nearly doubled in a six year period and growth exceeded 17% for three years running. But it might not even be that extreme. Production of consumer goods like automobiles, household appliances, and housing was severely curtailed or halted, and shortages/rationing became commonplace--growing pains that would be less tolerable without the pressure of an ongoing war. In the short term, we could probably 10x our production of software and Netflix shows, but it would be unlikely to show up as massive gains in the productivity numbers. See also the Productivity Paradox.
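To put those two rates side by side (my own back-of-the-envelope arithmetic, using only the figures quoted above):

```python
# Quick sanity check on the growth rates discussed above.
doubling_rate = 2 ** (1 / 6) - 1      # annual rate implied by "nearly doubled" over six years
print(f"Doubling in 6 years ~ {doubling_rate:.1%}/yr")        # ~12.2%/yr

after_six_years_at_30 = 1.30 ** 6     # where sustained 30% annual growth lands after six years
print(f"30%/yr for 6 years ~ {after_six_years_at_30:.2f}x the starting economy")  # ~4.8x
```

Even the WWII episode works out to roughly 12% per year on the doubling figure, while >30% growth compounds to almost a 5x larger economy over the same six-year window.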

Fair enough, I guess the distinction is more specific than just being a (weak) mesa-optimizer. This model seems to contradict https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target because it has, in fact, developed reward as the optimization target without ever being instructed to maximize reward. It just had reward-maximizing behaviors reinforced by the training process, and instead of (or in addition to) becoming an adaptation executor it became an explicit reward optimizer. This type of generalization is surprising and a bit concerning, because it suggests that other RL models in real-world scenarios will sometimes learn to game the reward system and then "figure out" that they want to reward hack in a coherent way. This tendency could also be beneficial, though, if it reliably causes recursively self-improving systems to wirehead once they have enough control of their environment.

we can be confident about why it’s doing this: to get a high RM score

Does this constitute a mesa-optimizer? If so, was creating it intentional or incidental? I was under the impression that those were still basically theoretical.

I think this topic is important and many of your recommendations sound like great ideas, but they also involve a lot of "we should" where it's not clear who "we" is. I would like to see some of these targeted at a specific audience: who actually has the capability to help streamline government procurement processes for AI, and how? What organizations might be well positioned to audit agency needs and bottlenecks? I'm left with the sense that these things would be good in the abstract, but that there's little I personally (or most other readers, unless they work high up in the administration) can realistically contribute to them.

I'm also skeptical about doing preparation outside government for rushed adoption. There probably is some opportunity here, but any solution prepared externally would itself have to be adopted during a crisis, and that will take time, maybe just as much time as the government developing its own tools, plans, and protocols. Much of what makes government slow to adopt new systems is the need to get multiple levels of approval, go through a procurement process, and train existing government workers to work with the new system.

Why on earth would pokemon be AGI-complete?

There are big classes of problems that provably can't be solved in a forward pass. Sure, for something where it knows the answer instantly the chain of thought could be just for show. But for anything difficult, the models need the chain of thought to get the answer, so the CoT must contain information about their reasoning process. It can be obfuscated, but it's still in there. 
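A toy illustration of what I mean by information that has to pass through the chain of thought (a hypothetical example of my own, not a claim about any particular model): a serially dependent computation like a hash chain can't be collapsed into one shot, because each step needs the previous step's output, which is roughly the role scratchpad tokens play.

```python
# Toy example (illustrative only): a hash chain is serially dependent --
# step t+1 requires step t's output, loosely analogous to chain-of-thought
# tokens carrying intermediate state that a single forward pass cannot hold.
import hashlib

def hash_chain(seed: bytes, steps: int) -> bytes:
    digest = seed
    for _ in range(steps):
        digest = hashlib.sha256(digest).digest()  # each iteration depends on the last
    return digest

print(hash_chain(b"start", 1000).hex())
```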

I kind of see your point about having all the game wikis, but I think I disagree about learning to code being necessarily interactive. Think about what feedback the compiler provides you: it tells you if you made a mistake, and sometimes what the mistake was. In cases where it runs but doesn't do what you wanted, it might "show" you what the mistake was instead. You can learn programming just fine by reading and writing code but never running it, if you also have somebody knowledgeable checking what you wrote and explaining your mistakes. LLMs have tons of examples of that kind of thing in their training data.

Yeah but we train AIs on coding before we make that comparison. And we know that if you train an AI on a videogame it can often get superhuman performance. Here we're trying to look at pure transfer learning, so I think it would be pretty fair to compare to someone who is generally competent but has never played videogames. Another interesting question is to what extent you can train an AI system on a variety of videogames and then have it take on a new one with no game-specific training. I don't know if anyone has tried that with LLMs yet.

The cornerstone of all control theory is the idea of having a set-point and designing a controller to reduce the deviation between the state and the set-point.

But control theory is used for problems where you need a controller to move the system toward the set-point, i.e. when you do not have instant total control of all degrees of freedom. We use tools like PID tuning, lead-lag, pole placement etc. to work around the dynamics of the system through some limited actuator. In the case of AI alignment, not only do we have a very vague concept of what our set-point should be, we also have no reliable way of detecting how close a model is to that set-point once we define it; if we did, we wouldn't need any of the technology of control theory because we could just change the weights to get it to the set-point (following, say, a simple gradient). This will still be subject to Goodhart's law unless our measurement is perfect, but feedback control won't help with that: controllers are only as good as the feedback you send them.
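For anyone who hasn't worked with this machinery, here's a minimal sketch (my own illustrative code, nothing from the quoted post; the gains and the toy plant model are made up) of a discrete PID loop driving a simple first-order system toward a set-point. The point is that the controller never sets the state directly; it only pushes through an actuator, guided by the measured error.

```python
# Minimal discrete PID sketch (illustrative; gains and plant are arbitrary choices).
def simulate_pid(setpoint=1.0, kp=2.0, ki=0.5, kd=0.1, dt=0.1, steps=100):
    state = 0.0                      # plant output we want to drive to the set-point
    integral = 0.0
    prev_error = setpoint - state
    for _ in range(steps):
        error = setpoint - state
        integral += error * dt
        derivative = (error - prev_error) / dt
        control = kp * error + ki * integral + kd * derivative   # actuator command
        state += (-0.5 * state + control) * dt                   # toy first-order plant
        prev_error = error
    return state

print(simulate_pid())   # ends up close to the set-point of 1.0
```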

I would think things are headed toward these companies fine-tuning an open-source near-frontier LLM. Cheaper than building one from scratch but with most of the advantages.
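For concreteness, the kind of thing I have in mind looks roughly like this (an illustrative sketch only; the model name, adapter rank, and target modules are placeholder assumptions, not recommendations):

```python
# Rough sketch of the "fine-tune an open-weights model" route, using
# Hugging Face transformers + peft (LoRA). Everything here is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"   # stand-in for whichever near-frontier open model is chosen
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters: train a small set of added weights instead of the whole model,
# which is what makes this path much cheaper than building a model from scratch.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()   # typically a small fraction of the base model's weights
```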
