Agreed there's an ultimate cap on software improvements -- the worry is that it's very far away!
It does sound like a lot -- that's 5 OOMs to reach human learning efficiency and then 8 OOMs more. But when we BOTECed the sources of algorithmic efficiency gain on top of the human brain, it seemed like you could easily get more than 8. So agreed, it's a lot -- but we are talking about ultimate physical limits here!
Interesting re the early years. So you'd accept that learning from age 5/6 onwards could be OOMs more efficient, but would deny that the early years could be improved?
Though you're not really speaking to the 'undertrained' point, which is about the number of params vs the number of data points.
I expect that a full-stack intelligence explosion could look more like "make the whole economy bigger using a bunch of AI labor" rather than specifically automating the chip production process. (That said, in practice I expect explicit, focused automation of chip production to be an important part of the picture, probably the majority of the acceleration effect.) Minimally, you need to scale up energy at some point.
Agreed on the substance, we just didn't explain this well.
- You talk about the "chip technology" feedback loop as taking months, but presumably improvements to ASML take longer, as they often require building new fabs?
Agreed!
Re FLOP/joule, also agree on the substance -- we went with FLOP/joule because we wanted a clean estimate of the OOMs before reaching limits for each factor. I believe our estimate of the total OOMs to limits (including both chip tech and chip production) is right, but you're right that there are intuitive ways to improve chip tech that don't increase FLOP/joule.
I think rushing full steam ahead with AI increases human takeover risk
Here's my own estimate for this parameter:
Once AI has automated AI R&D, will software progress become faster or slower over time? This depends on the extent to which software improvements get harder to find as software improves – the steepness of the diminishing returns.
We can ask the following crucial empirical question:
**When (cumulative) cognitive research inputs double, how many times does software double?**
(In growth models of a software intelligence explosion, the answer to this empirical question is a parameter called r.)
If the answer is “< 1”, then software progress will slow down over time. If the answer is “1”, software progress will remain at the same exponential rate. If the answer is “>1”, software progress will speed up over time.
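To make that threshold concrete, here's a minimal toy simulation (my own sketch, not taken from the growth models mentioned above). It assumes software S equals cumulative cognitive inputs I raised to the power r, and that inputs accrue at a rate proportional to S once AI R&D is automated. Successive software doubling times then lengthen when r < 1, stay constant when r = 1, and shrink when r > 1:

```python
def doubling_times(r, dt=1e-4, max_doublings=8):
    """Toy model of a software-only intelligence explosion.

    S = software level, I = cumulative cognitive research inputs.
    Inputs accrue at a rate proportional to S (the researchers are AIs),
    and S = I ** r, so each doubling of I yields r doublings of S.
    Returns the time taken by each successive doubling of S.
    """
    S = I = 1.0
    t = last_t = 0.0
    next_threshold = 2.0
    times = []
    while len(times) < max_doublings:
        I += S * dt    # research inputs accrue at a rate set by AI labour
        S = I ** r     # r sets how steeply returns diminish (or compound)
        t += dt
        if S >= next_threshold:
            times.append(t - last_t)
            last_t, next_threshold = t, next_threshold * 2
    return times

for r in (0.7, 1.0, 1.3):
    print(f"r = {r}:", [round(x, 2) for x in doubling_times(r)])
```

(Time units are arbitrary; the point is just the qualitative pattern of the doubling times.)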
The bolded question can be studied empirically, by looking at how many times software has doubled each time the human researcher population has doubled.
(What does it mean for “software” to double? A simple way of thinking about this is that software doubles when you can run twice as many copies of your AI with the same compute. But software improvements don’t just improve runtime efficiency: they also improve capabilities. To incorporate these improvements, we’ll ultimately need to make some speculative assumptions about how to translate capability improvements into an equivalently-useful runtime efficiency improvement.)
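To sketch the arithmetic (with made-up illustrative numbers, not Epoch's actual data): r is just the ratio of software doublings to researcher-population doublings over the same period.

```python
import math

def estimate_r(software_gain, researcher_growth):
    """Software doublings per doubling of the researcher population."""
    return math.log2(software_gain) / math.log2(researcher_growth)

# Illustrative numbers only (NOT Epoch's data): a ~100x training-efficiency
# gain over a period in which the researcher population grew ~25x implies
# log2(100) / log2(25) ~= 6.6 / 4.6 ~= 1.4 doublings per doubling.
print(round(estimate_r(100, 25), 2))   # -> 1.43
```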
The best quality data on this question is Epoch’s analysis of computer vision training efficiency. They estimate r = ~1.4: every time the researcher population doubled, training efficiency doubled 1.4 times. (Epoch’s preliminary analysis indicates that the r value for LLMs would likely be somewhat higher.) We can use this as a starting point, and then make various adjustments:
Overall, my median estimate of r is 1.2. I use a log-uniform distribution with the bounds 3X higher and lower (0.4 to 3.6).
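For what it's worth, here's a quick sketch of what that prior implies (the only inputs are the 0.4 to 3.6 log-uniform bounds above); it puts roughly 58% of the probability mass on r > 1, i.e. on software progress speeding up over time:

```python
import numpy as np

rng = np.random.default_rng(0)

low, high = 0.4, 3.6   # bounds 3x below and above the median of 1.2
samples = np.exp(rng.uniform(np.log(low), np.log(high), size=1_000_000))

print("median r :", round(float(np.median(samples)), 2))    # ~1.2
print("P(r > 1) :", round(float((samples > 1).mean()), 2))  # ~0.58
```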
I'll paste my own estimate for this param in a different reply.
But here are the places I most differ from you:
--> overall these compute adjustments probably make me more pessimistic about the software singularity, compared to your assumptions
Taking it all together, I think you should put more probability on the software-only singularity, mostly because of capability improvements being much more significant than you assume.
One idea that seems potentially promising is to have a single centralised project and minimize the chance it becomes too powerful by minimizing its ability to take actions in the broader world.
Concretely, a ‘Pre-Training Project’ does pre-training, GCR safety assessment, the post-training needed for the above activities (including post-training to make AI R&D agents, and evaluating the safety of post-training techniques), and nothing else. Then many (>5) separate companies do fine-tuning, scaffolding, productising, selling API access, and use-case-specific safety assessments.
Why is this potentially the best of both worlds?
You could find a way of proving to the world that your AI is aligned, which other labs can't replicate, giving you an economic advantage.
I don't expect this to be a very large effect. It feels similar to an argument like "company A will be better on ESG dimensions and therefore more customers will switch to using it". Doing a quick review of the literature on that, it seems like there's a small but notable change in consumer behavior for ESG-labeled products.
It seems quite different to the ESG case. Customers don't personally benefit from using a company with good ESG. They will benefit from using an aligned AI over a misaligned one.
In the AI space, it doesn't seem to me like any customers care about OpenAI's safety team disappearing (except a few folks in the AI safety world).
Again though, customers currently have no selfish reason to care.
In this particular case, I expect the technical argument needed to demonstrate that some family of AI systems are aligned while others are not is a really complicated argument; I expect fewer than 500 people would be able to actually verify such an argument (or the initial "scalable alignment solution"), maybe zero people.
It's quite common for only a very small number of people to have the individual ability to verify a safety case, but for many more to defer to their judgement. People may defer to an AISI or a regulatory agency.
Yep, I think this is a plausible suggestion. Labs can plausibly train models that are very internally useful without being helpful-only, and could fine-tune models for evals on a case-by-case basis (and delete the weights after the evals).