erhora52

You can think of a pipeline like

  • feed lots of good papers in [situational awareness / out-of-context reasoning / ...] into GPT-4's context window,
  • ask it to generate 100 follow-up research ideas,
  • ask it to develop specific experiments to run for each of those ideas,
  • feed those experiments to GPT-4 copies equipped with a coding environment,
  • write the results to a nice little article and send it to a human.
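
For concreteness, the quoted steps can be sketched roughly as below. This is only an illustration under stated assumptions: `llm` and `run_experiment` are hypothetical callables standing in for "GPT-4" and "GPT-4 copies with a coding environment", not any real API, and the prompts are made up.

```python
from typing import Callable, List

# Hypothetical interface: a prompt goes in, a text completion comes out.
LLM = Callable[[str], str]


def run_pipeline(
    papers: List[str],
    llm: LLM,
    run_experiment: Callable[[str], str],
    n_ideas: int = 100,
) -> str:
    """Sketch of the quoted pipeline: papers -> ideas -> experiments -> write-up."""
    # Step 1: put the papers into the model's context.
    context = "\n\n".join(papers)

    # Step 2: ask for follow-up research ideas, one per line (assumed format).
    ideas_text = llm(
        f"{context}\n\nPropose {n_ideas} follow-up research ideas, one per line."
    )
    ideas = [line.strip() for line in ideas_text.splitlines() if line.strip()]
    ideas = ideas[:n_ideas]

    sections = []
    for idea in ideas:
        # Step 3: turn each idea into a specific experiment.
        plan = llm(f"Design a specific experiment for this idea:\n{idea}")
        # Step 4: hand the experiment to an agent with a coding environment.
        result = run_experiment(plan)
        sections.append(f"Idea: {idea}\nExperiment: {plan}\nResult: {result}")

    # Step 5: write the results up for a human reader.
    return llm(
        "Write up these results as a short article for a human reader:\n\n"
        + "\n\n".join(sections)
    )
```

Note that the research topic only enters through `papers`; nothing else in the sketch depends on the field.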

It's obvious, but perhaps worth reminding ourselves: this is a recipe for automating or speeding up AI research in general, so if it worked it would be a neutral update for AI safety at best.

It does seem that for automation to have a disproportionately large impact on AI alignment, it would have to be specific to the research methods used in alignment. That need not mean automating the foundational and conceptual research you mention, but I do think it cannot look like your suggested pipeline.

Two examples might be: a philosophically able LLM that can help you de-confuse your conceptual/foundational ideas; or automated mech-interp (e.g. existing work on discovering and interpreting features) done in a way that does not generalise well to other AI research directions.