erhora

You can think of a pipeline like the following (a rough code sketch appears after the list):
- feed lots of good papers on [situational awareness / out-of-context reasoning / ...] into GPT-4's context window,
- ask it to generate 100 follow-up research ideas,
- ask it to develop specific experiments to run for each of those ideas,
- feed those experiments to GPT-4 copies equipped with a coding environment,
- write up the results as a nice little article and send it to a human.
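A minimal sketch of what this might look like in code, with hypothetical `query_llm` and `run_in_sandbox` helpers standing in for a GPT-4 API call and the coding environment (neither is a real library call):

```python
def query_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to GPT-4 and return its text response."""
    raise NotImplementedError


def run_in_sandbox(experiment_spec: str) -> str:
    """Placeholder: let a GPT-4 copy with a coding environment run the experiment."""
    raise NotImplementedError


def automated_research(papers: list[str], n_ideas: int = 100) -> str:
    # Step 1: feed the relevant papers into the context window.
    context = "\n\n".join(papers)

    # Step 2: ask for follow-up research ideas.
    ideas = query_llm(
        f"{context}\n\nGenerate {n_ideas} follow-up research ideas, one per line."
    ).splitlines()

    results = []
    for idea in ideas:
        # Step 3: develop a concrete experiment for each idea.
        experiment = query_llm(f"Design a specific experiment to test: {idea}")
        # Step 4: hand the experiment to a GPT-4 copy with a coding environment.
        results.append(run_in_sandbox(experiment))

    # Step 5: write the results up as an article for a human reader.
    return query_llm(
        "Write a short article summarising these experimental results:\n"
        + "\n\n".join(results)
    )
```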
It's obvious, but perhaps worth reminding ourselves, that this is a recipe for automating/speeding up AI research in general, so it would be a neutral-at-best update for AI safety if it worked.
It does seem that for automation to have a disproportionately large impact on AI alignment it would have...
This is a super interesting line of work!
The entire model is, in a sense, "logically derived" from its training data, so any facts about its output on certain prompts can also be logically derived from its training data.
Why did you choose to make non-derivability part of your definition? Do you mean something like "cannot be derived quickly, for example without training a whole new model"? I'm worried that your current definition is impossible to satisfy, and that you are setting yourself up for easy criticism because it sounds like you're hypothesising strong emergence, i.e. magic.
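To make that worry concrete, here is a toy illustration, under the (hypothetical) assumption that training is deterministic given the data and a seed: any fact about the trained model's outputs can be rederived from the training data simply by rerunning the derivation.

```python
import numpy as np


def train(data_x: np.ndarray, data_y: np.ndarray, seed: int = 0, steps: int = 1000) -> float:
    """Toy deterministic 'training': gradient descent on a 1-parameter linear model."""
    rng = np.random.default_rng(seed)
    w = rng.normal()  # initialisation fixed by the seed
    for _ in range(steps):
        grad = 2 * np.mean((w * data_x - data_y) * data_x)
        w -= 0.01 * grad
    return w


x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# Training is a pure function of (training data, seed), so any fact about the
# trained model's outputs -- e.g. its prediction on a new input -- is in
# principle derivable from the training data by rerunning this derivation.
assert np.isclose(train(x, y), train(x, y))
print(train(x, y) * 10.0)  # the model's "output on a prompt", derived from the data
```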