Regarding GPT-3, there is some discussion about whether growing the model would transform it into an Oracle AI. I looked into the benchmark results (Appendix H in the paper) to see whether we can predict something useful from the actual measurements.
Method: The OpenAI team ran a suite of 63 benchmarks (including sub-types), each in zero-, one-, and few-shot settings. In each setting, there are 8 model sizes. I looked at how the results scale with model size. With only 8 measurements per benchmark, predictions carry a large uncertainty. Formally, one would choose the trend function via Bayesian model selection, e.g., between a linear and a polynomial fit. I did this for a few benchmarks and eyeballed the rest. So, please take the following as an indication only.
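For concreteness, here is a minimal sketch of the kind of comparison I mean. The accuracies are made up for illustration (not the paper's numbers), the logistic form is just one simple way to represent an "asymptotic" trend, and BIC is used as a cheap stand-in for a full Bayesian model comparison:

```python
import numpy as np
from scipy.optimize import curve_fit

# The 8 GPT-3 model sizes (parameters); the accuracies (%) below are made up
# for illustration and are NOT the paper's numbers.
params = np.array([0.125, 0.35, 0.76, 1.3, 2.7, 6.7, 13, 175]) * 1e9
acc = np.array([33.0, 40.0, 46.0, 50.5, 54.0, 56.5, 57.5, 58.1])
x = np.log10(params)

def linear(x, a, b):
    return a * x + b

def saturating(x, top, k, x0):
    # Logistic curve: one simple way to represent an asymptotic trend.
    return top / (1.0 + np.exp(-k * (x - x0)))

def bic(y, y_hat, n_free):
    # Gaussian-residual BIC (lower is better) as a cheap proxy for
    # a full Bayesian model comparison.
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + n_free * np.log(n)

for name, f, p0 in [("linear", linear, (5.0, 0.0)),
                    ("asymptotic", saturating, (60.0, 1.0, 9.0))]:
    popt, _ = curve_fit(f, x, acc, p0=p0, maxfev=10000)
    print(f"{name}: BIC = {bic(acc, f(x, *popt), len(popt)):.1f}")
```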
Disclaimer: The smallest GPT-3 model has 125 million parameters, the largest 175 billion. That's a span of about 3 orders of magnitude. Scaling this out to many more orders of magnitude is dangerous. Thus, take these numbers only as an indication.
Results: For the following tests, I find an asymptotic trend. Scaling the model will apparently not yield fantastic results for:
- HellaSwag, LAMBADA, PIQA, CoQA, OpenBookQA, Quac, RACE, CB, ReCoRD, WiC
- Translations - though the level at which they asymptote is unclear.
In the following tests, it is unclear if the trend is asymptotic or better than that:
- SAT: Could be linear, could be asymptotic. If linear, it will achieve 100% at parameters.
- StoryCloze, Winograd, Winogrande, SQuADv2, DROP, Copa.
These tests show linear scaling (see the extrapolation sketch after this list):
- TriviaQA ( parameter estimate to achieve 100%)
- BoolQ ()
- MultiRC ()
- ARC ()
- SuperGLUE ()
- WSC ()
- WebQs ()
- Cycled ()
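To illustrate how such "100% at N parameters" estimates can be read off, here is a minimal sketch that fits a straight line to accuracy versus log10(parameters) and extrapolates it to 100%. The accuracies are placeholders I made up, not values from Appendix H:

```python
import numpy as np

# The 8 GPT-3 model sizes (parameters) and made-up accuracies (%) for a
# benchmark that looks linear in log10(parameters).
params = np.array([0.125, 0.35, 0.76, 1.3, 2.7, 6.7, 13, 175]) * 1e9
acc = np.array([20.0, 25.5, 30.0, 33.5, 38.0, 43.0, 47.0, 60.5])

# Least-squares line: accuracy = a * log10(params) + b.
a, b = np.polyfit(np.log10(params), acc, deg=1)

# Extrapolate: parameter count at which the line crosses 100%.
n_100 = 10 ** ((100.0 - b) / a)
print(f"Estimated parameters to reach 100%: {n_100:.2e}")
```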
Some tests scale neither linearly nor asymptotically (see the sketch after this list):
- Symbol: Near exponential ()
- Arithmetic: Exponential; one-digit composite may achieve 100% at
- Reversed: Near exponential ()
- Anagrams: Polynomial ()
- ANLI: stepped, unclear
- RTE: stepped, unclear
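For the exponential-looking trends, the analogous sketch (again with made-up accuracies) fits log(accuracy) against log10(parameters), which linearizes an exponential, and solves for the model size at which it would reach 100%:

```python
import numpy as np

# Made-up accuracies (%) for an arithmetic-style task that grows roughly
# exponentially with log10(parameters): accuracy ~ c * exp(a * log10(N)).
params = np.array([0.125, 0.35, 0.76, 1.3, 2.7, 6.7, 13, 175]) * 1e9
acc = np.array([0.5, 0.9, 1.8, 3.2, 6.5, 13.0, 24.0, 90.0])

# Fitting log(accuracy) against log10(parameters) linearizes the exponential.
a, log_c = np.polyfit(np.log10(params), np.log(acc), deg=1)

# Solve c * exp(a * log10(N)) = 100 for N.
n_100 = 10 ** ((np.log(100.0) - log_c) / a)
print(f"Estimated parameters to reach 100%: {n_100:.2e}")
```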
Summary: About half of the tested skills will likely not scale much with larger models. The other half will (e.g., TriviaQA, SuperGLUE, arithmetic, anagrams). Going to, e.g., parameters - would that make an Oracle AI? It's probably not sufficient, but I'm interested in hearing your opinion!
I think GPT-3 should be viewed as roughly as aligned as IDA would be if we pursued it using our current understanding. GPT-3 is trained via self-supervised learning (which is, on the face of it, myopic), so the only obvious x-safety concerns are something like mesa-optimization.
In my mind, the main argument for IDA being safe is still myopia.
I think GPT-3 seems safer than (recursive) reward modelling, CIRL, or any other alignment proposals based on deliberately building agent-y AI systems.
--------------------
In the above, I'm ignoring the ways in which any of these systems increase x-risk via their (e.g. destabilizing) social impact and/or contribution towards accelerating timelines.