I'm confused about how you judged number six. A mapping from indexes to nodes would be a surprising way for the input graph to be stored.
I know, but I gave them to a text predictor not specifically tailored to write code and it wrote correct code anyway. For the first four prompts we might argue that it probably just copied code from the training data, but this seems quite unlikely for the last two. My rough non-expert intuition is that the shallow understanding of "write code" didn't really change that much from GPT-3 to AlphaCode, and the performance boost of the latter is essentially due to fine-tuning and filtering tricks.
Some things that I feel undermine your case: your sample size is fairly small here, and it would have been valuable if you tried sampling maybe 10-20 times for each. Also, these code snippets are either the kind of thing I'd expect would be in the dataset, or are trivial. Plus, GPT-3 wasn't used as a base model for AlphaCode, so it can't have been due to "fine-tuning and filtering tricks". Finally, GPT-3 is way bigger than any AlphaCode model.
GPT-3 wasn't used as a base model for AlphaCode
I had missed this step. In retrospect it should have been obvious... of course you don't start from a huge text predictor model to build a code predictor model that only needs to predict compilable code. Thanks for the clarification.
I think the fact that GPT-3 is controlled by OpenAI and AlphaCode is a DeepMind project has more to do with it. Of course you don't need to hotstart by transfer learning, but it's a good idea anyway if you can, which is why DM not using its own GPT-3-equivalent (Gopher, trained at considerable expense) has drawn comment.
The recent OpenAI paper will presumably generate a lot of discussion
Fyi, you've linked to a post discussing a DeepMind paper.
The recent DeepMind paper will presumably generate a lot of discussion, and I'm not enough of an expert to completely understand the technicalities. But I still wonder how much of the breakthrough could be reduced to "tune GPT-3 for code generation instead of general text prediction".
Did they basically hammer GPT-3 with the code problem prompts, ignore all the garbage output that didn't compile, and submit the rest? (Update: no, they didn't; AlphaCode is not derived from GPT-3.) I mean, they admittedly discard >99% of samples.
Anyway, this post is not meant to be another discussion post about AlphaCode, but rather a little investigation into the code-writing capabilities of vanilla GPT-3, since I have the impression that GPT-3 is already decent at generating correct code by itself and I didn't find any explicit experiment about this reported on LW. I spent a few minutes playing with the Eleuther UI (with default parameters) to generate some simple functions that would fit in its limited output size.
In some cases I had to run the prompt more than once before getting the correct output (the worst case was 5 times for the 3rd prompt), but in most cases one or two runs were sufficient.
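Rerunning a prompt until the output is correct is a manual version of sample-and-filter. As a rough illustration only (this is not AlphaCode's actual pipeline, and not what was done for this post), the idea can be sketched in Python: take several candidate outputs, discard those that fail to parse or fail a few test cases, and keep the rest. The candidate strings, the `add` function name, and the test cases below are all made up for the example.

```python
from typing import Any, List, Tuple

# Each test case is (arguments, expected result).
Case = Tuple[Tuple[Any, ...], Any]


def passes_tests(source: str, func_name: str, cases: List[Case]) -> bool:
    """Return True if `source` defines `func_name` and it passes every case.

    WARNING: exec() runs arbitrary code. Only use this on trusted input,
    or inside a proper sandbox.
    """
    namespace: dict = {}
    try:
        # A SyntaxError here plays the role of "output that didn't compile".
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        return False


def filter_candidates(candidates: List[str], func_name: str, cases: List[Case]) -> List[str]:
    """Keep only the candidates that pass all test cases."""
    return [src for src in candidates if passes_tests(src, func_name, cases)]


# Hypothetical model samples for "a function that sums a and b".
candidates = [
    "def add(a, b):\n    return a + b\n",  # correct
    "def add(a, b):\n    return a - b\n",  # runs, but wrong
    "def add(a, b) return a + b\n",        # syntax error
]
cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

survivors = filter_candidates(candidates, "add", cases)
print(f"{len(survivors)} of {len(candidates)} candidates survived")
```

With a filter like this, a low per-sample success rate can still yield a correct answer, as long as enough samples are drawn and the tests are strict enough to reject the wrong ones.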
My prompts are in bold, always including an initial comment describing the function, the signature, and the open brace.
First prompt
Just asking GPT-3 for a function to sum a+b. It got a bit carried away, also returning functions for a*b and a/b. Obtained on the first run.
Second prompt
A little bit more complicated: find the maximum element of an array. Obtained on the fourth run.
Third prompt
Can GPT-3 output recursive functions? Yes, it can. Also, it knows Fibonacci numbers. Obtained on the fifth run.
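For reference when judging such an output, a correct recursive Fibonacci function can be as short as the sketch below. It is hand-written in Python for illustration and is not the model's output; the name `fib` and the convention that `fib(0)` is 0 are assumptions.

```python
def fib(n: int) -> int:
    """Return the n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2:
        # Base cases: fib(0) = 0, fib(1) = 1.
        return n
    # Recursive case: each number is the sum of the previous two.
    return fib(n - 1) + fib(n - 2)


print([fib(i) for i in range(10)])
```

The naive recursion takes exponential time in `n`, which is fine for a correctness check on small inputs.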
Fourth prompt
Does it know the Euclidean algorithm for finding the greatest common divisor? Of course. Obtained on the second run.
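As a point of comparison, the Euclidean algorithm repeatedly replaces the pair (a, b) with (b, a mod b) until b is zero. The sketch below is hand-written in Python for illustration, not the model's output; the function name `gcd` and the handling of negative inputs are assumptions.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of a and b via the Euclidean algorithm."""
    while b != 0:
        # gcd(a, b) == gcd(b, a mod b), and the second value strictly shrinks.
        a, b = b, a % b
    # abs() keeps the result non-negative if negative inputs are passed.
    return abs(a)


print(gcd(48, 18))
```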
Fifth prompt
Ok, what if we ask for some stupid function nobody would really write? Obtained on the first run.
Sixth prompt
What about prompting for a function with a terrible signature? Not only did GPT-3 write it correctly anyway, it even scolded me about it! Obtained on the second run.