ryan_greenblatt

I work at Redwood Research.

One operationalization is "these AIs are capable of speeding up ML R&D by 30x with less than a 2x increase in marginal costs".

As in, if you have a team doing ML research, you can make them 30x faster with only <2x increase in cost by going from not using your powerful AIs to using them.

With these caveats (a toy numeric illustration is sketched after the list):

  • The speed up is relative to the current status quo as of GPT-4.
  • The speed up is ignoring the "speed up" of "having better experiments to do due to access to better models" (so e.g., they would complete a fixed research task faster).
  • By "capable" of speeding things up this much, I mean that if AIs "wanted" to speed up this task and if we didn't have any safety precautions slowing things down, we could get these speedups. (Of course, AIs might actively and successfully slow down certain types of research and we might have burdensome safety precautions.)
  • The 2x increase in marginal cost is ignoring potential inflation in the cost of compute (FLOP/$) and inflation in the cost of wages of ML researchers. Otherwise, I'm uncertain how exactly to model the situation. Maybe increase in wages and decrease in FLOP/$ cancel out? Idk.
  • It might be important that the speed up is amortized over a longer duration like 6 months to 1 year.
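As a toy numeric illustration of the operationalization above (all numbers are made up for the example, not claims about any real team or lab):

```python
# Toy illustration of "30x speedup at <2x marginal cost" (all numbers made up).
researchers = 10
cost_per_researcher = 500_000      # $/year, salary + compute (assumed)
baseline_task_time = 1.0           # years to finish some fixed research task

speedup = 30                       # AIs make the team 30x faster
ai_cost_multiplier = 1.8           # marginal cost ratio; must be < 2

baseline_cost_rate = researchers * cost_per_researcher       # $/year
ai_cost_rate = baseline_cost_rate * ai_cost_multiplier       # $/year
ai_task_time = baseline_task_time / speedup                  # years

print(f"baseline: {baseline_task_time:.2f} yr at ${baseline_cost_rate:,.0f}/yr")
print(f"with AIs: {ai_task_time:.3f} yr at ${ai_cost_rate:,.0f}/yr")
# The criterion is about the marginal cost *rate* (<2x), so a fixed research
# task also gets much cheaper in total: 30x less time at <2x the burn rate.
```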

I'm uncertain what the economic impact of such systems will look like. I could imagine either a massive impact (GDP has already grown >4x due to the total effects of AI) or only a moderate one (AIs haven't yet been that widely deployed due to inference availability issues, so actual production hasn't increased that much due to AI (<10%), though markets are pricing in AI being a really, really big deal).

So, it's hard for me to predict the immediate impact on world GDP. After adaptation and broad deployment, systems of this level would likely have a massive effect on GDP.

Random error:

Exponential Takeoff:

AI's capabilities grow exponentially, like an economy or pandemic.

(Oddly, this scenario often gets called "Slow Takeoff"! It's slow compared to "FOOM".)

Actually, this isn't how people (in the AI safety community) generally use the term slow takeoff.

Quoting from the blog post by Paul:

Futurists have argued for years about whether the development of AGI will look more like a breakthrough within a small group (“fast takeoff”), or a continuous acceleration distributed across the broader economy or a large firm (“slow takeoff”).

[...]

(Note: this is not a post about whether an intelligence explosion will occur. That seems very likely to me. Quantitatively I expect it to go along these lines. So e.g. while I disagree with many of the claims and assumptions in Intelligence Explosion Microeconomics, I don’t disagree with the central thesis or with most of the arguments.)

A slow takeoff can still involve a singularity (aka an intelligence explosion).

The terms "fast/slow takeoff" are somewhat bad because they are often used to discuss two different questions:

  • How long does it take to get from the point where AI is seriously useful/important (e.g. results in 5% additional GDP growth per year in the US) to AIs which are much smarter than humans? (What people would normally think of as fast vs slow.)
  • Is takeoff discontinuous vs continuous?

And this explainer introduces a third idea:

  • Is takeoff exponential, or does it have a singularity (hyperbolic growth)? (See the sketch below.)
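To make this third distinction concrete, here is a minimal sketch in standard growth-model notation (my own illustration, not taken from the explainer or from Paul's post):

```latex
% Exponential takeoff: a constant proportional growth rate g.
\dot{x} = g x
  \quad\Longrightarrow\quad
x(t) = x_0 e^{g t}
% x(t) is finite at every finite t: fast growth, but no singularity.

% Hyperbolic takeoff: the growth rate itself rises with capability (\epsilon > 0).
\dot{x} = g x^{1+\epsilon}
  \quad\Longrightarrow\quad
x(t) = \frac{x_0}{\left(1 - t/t^{*}\right)^{1/\epsilon}},
  \qquad
t^{*} = \frac{1}{g \epsilon x_0^{\epsilon}}
% x(t) diverges as t \to t^{*}: a singularity (intelligence explosion) in finite time.
```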

The claim is that most applications aren't internal usage of AI for AI development and thus can be made trivially safe.

Not that most applications of AI for AI development can be made trivially safe.

Hmm, I don't think so. Or at least, the novel things in that paper don't seem to correspond.

My understanding of what this paper does (a schematic sketch of the training objective follows the list):

  • Trains models to predict the next 4 tokens instead of the next 1 token as an auxiliary training objective. Note that this training objective yields better performance on downstream tasks even when just using the next-token prediction component (the normally trained component) and discarding the other components. Notably, this is just something like "adding this additional prediction objective helps the model learn more/faster". In other words, this result doesn't involve changing how the model is actually used; it just adds an additional training task.
  • Uses these heads for speculative execution, a well-known approach in the literature for accelerating inference.
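Here is a minimal, hypothetical sketch of that kind of auxiliary multi-token objective (toy trunk and made-up dimensions, not the paper's actual architecture or code):

```python
# Hypothetical sketch (not the paper's code): a shared trunk with k output heads,
# where head i predicts the token (i + 1) steps ahead. Only head 0 (the ordinary
# next-token head) is needed at normal inference time; the extra heads act as an
# auxiliary training objective (and can later be reused for speculative decoding).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenPredictor(nn.Module):
    def __init__(self, vocab_size=1000, d_model=128, n_future=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.trunk = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for a transformer trunk
        # One unembedding head per future offset (head 0 = standard next-token head).
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_future))

    def forward(self, tokens):
        h, _ = self.trunk(self.embed(tokens))          # [batch, seq, d_model]
        return [head(h) for head in self.heads]        # n_future sets of logits

def multi_token_loss(model, tokens):
    # tokens: [batch, seq]; head i at position t is trained to predict tokens[t + 1 + i].
    all_logits = model(tokens)
    losses = []
    for i, logits in enumerate(all_logits):
        pred = logits[:, : tokens.size(1) - 1 - i]     # positions that have a target
        target = tokens[:, 1 + i :]
        losses.append(F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1)))
    return sum(losses) / len(losses)

# At ordinary inference time only model(tokens)[0] (the next-token head) is used,
# so this changes what the model learns during training, not how it is normally run.
```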

I'm skeptical of the RLHF example (see also this post by Paul on the topic).

That said, I agree that if indeed safety researchers produce (highly counterfactual) research advances that are much more effective at increasing the profitability and capability of AIs than the research advances done by people directly optimizing for profitability and capability, then safety researchers could substantially speed up timelines. (In other words, if safety targeted research is better at profit and capabilities than research which is directly targeted at these aims.)

I dispute this being true.

(I do think it's plausible that safety interested people have historically substantially advanced timelines (and might continue to do so to some extent now), but not via doing research targeted at improving safety, by just directly doing capabilities research for various reasons.)

I don't think this particularly needs to be true for my point to hold; they only need to have reasonably good ideas/research, not unusually good ones, for publishing less to be a positive thing.

There currently seem to be >10x as many people directly trying to build AGI/improve capabilities as trying to improve safety.

Suppose that the safety people have as good ideas and research ability as the capabilities people. (As a simplifying assumption.)

Then, if all the safety people switched to working full time on maximally advancing capabilities, this would only advance capabilities by less than 10%.

If, on the other hand, they stopped publicly publishing safety work and this resulted in a 50% slowdown, all safety work would slow down by 50%.

Naively, it seems very hard for publishing less to make sense if the number of safety researchers is much smaller than the number of capabilities researchers and safety researchers aren't much better at capabilities than capabilities researchers.
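A toy version of that comparison, with purely illustrative headcounts (the text only claims a >10x ratio):

```python
# Toy version of the headcount argument above (numbers are illustrative only).
n_capabilities = 1000    # people directly pushing capabilities (assumed)
n_safety = 100           # people doing safety research (assumed; 10x fewer)

# If every safety researcher switched to full-time capabilities work and were
# exactly as productive at it, capabilities effort would grow by at most:
capabilities_boost = n_safety / n_capabilities        # = 0.10, i.e. <=10%

# Whereas if not publishing publicly slows safety research down by 50%:
safety_slowdown = 0.50

print(f"max capabilities speedup from switching: {capabilities_boost:.0%}")
print(f"safety slowdown from not publishing:     {safety_slowdown:.0%}")
# The 10% vs 50% asymmetry is why publishing less looks hard to justify unless
# safety researchers are unusually good at capabilities work.
```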

Compute for doing inference on the weights if you don't have LoRA fine-tuning set up properly.

My implicit claim is that there maybe isn't that much fine-tuning stuff internally.

I get it if you're worried about leaks but I don't get how it could be a hard engineering problem — just share API access early, with fine-tuning

Fine-tuning access can be extremely expensive if implemented naively, and it's plausible that cheap (LoRA) fine-tuning isn't even implemented for new models internally for a while at AI labs. If you make the third-party groups pay for it, then I suppose this isn't a problem, but the costs could be vast.
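A rough sense of why the implementation matters (model shape and LoRA rank are assumed for the example, not any particular lab's numbers):

```python
# Rough, illustrative parameter-count comparison: naive full fine-tuning vs
# LoRA-style fine-tuning. All sizes below are assumptions for the example.
d_model = 8192
n_layers = 80
matrices_per_layer = 4      # e.g. attention q/k/v/o projections (simplified)
rank = 16                   # assumed LoRA rank

full_params = n_layers * matrices_per_layer * d_model * d_model
lora_params = n_layers * matrices_per_layer * 2 * rank * d_model

print(f"full fine-tuning trainable params: {full_params / 1e9:.1f}B")
print(f"LoRA trainable params:             {lora_params / 1e6:.1f}M")
# Full fine-tuning also needs gradients and optimizer state for every weight and
# a separate copy of the model per fine-tune; LoRA adapters are tiny and can be
# served on top of shared base weights, which is why the setup matters so much.
```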

I agree that inference access is cheap.

Thanks! I feel dumb for missing that section. Interesting that this is so different from random.

Have you compared this method (finding vectors that change downstream activations as much as possible, based on my understanding) with just using random vectors? (I didn't see this in the post, but I might have just missed it.)

In particular, does that yield qualitatively similar results?

Naively, I would expect that this would be qualitatively similar for some norm of random vector. So, I'd be interested in some ablations of the technique.

If random vectors work, that would simplify the story somewhat: you can see salient and qualitatively distinct behaviors via randomly perturbing activations.

(Probably random vectors have to have a somewhat higher norm to yield qualitatively as large results as vectors which are optimized for changing downstream activations. However, I currently don't see a particular a priori (non-empirical) reason to think that there doesn't exist some norm at which the results are similar.)
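A hypothetical sketch of the ablation I have in mind: compare the optimized steering vector against a random vector rescaled to the same norm, injected into a chosen layer's activations via a forward hook. Names like `model`, `layer`, and `steering_vec` are placeholders, not the post's actual code.

```python
import torch

def make_norm_matched_random_vector(steering_vec: torch.Tensor) -> torch.Tensor:
    # Random direction, rescaled to match the norm of the optimized vector.
    rand = torch.randn_like(steering_vec)
    return rand * (steering_vec.norm() / rand.norm())

def add_vector_hook(vec: torch.Tensor):
    # Adds `vec` to the layer's output activations (assumes the output is a
    # [batch, seq, d_model] tensor, or a tuple whose first element is one).
    def hook(module, inputs, output):
        if isinstance(output, tuple):
            return (output[0] + vec,) + output[1:]
        return output + vec
    return hook

def generate_with_vector(model, tokenizer, layer, vec, prompt, max_new_tokens=50):
    handle = layer.register_forward_hook(add_vector_hook(vec))
    try:
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=max_new_tokens)
    finally:
        handle.remove()
    return tokenizer.decode(out[0])

# Compare outputs under the optimized vector vs several norm-matched random
# vectors (and a few larger norms) to see whether the qualitative effects differ.
```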
