I was impressed by GPT-2, to the point where I wouldn't be surprised if a future version of it could be put to pivotal use via existing protocols.
Consider generating half of a Turing-test transcript, with the other half supplied by a human judge. If the model passes, we could immediately implement an HCH of AI safety researchers solving the problem, if it's within our reach at all. (Note that training the model takes much more compute than generating text.)
This might not be the first pivotal application of language models that becomes possible as they get stronger.
It's a source of superintelligence that doesn't automatically run into utility maximizers. It sure doesn't look like AI services, lumpy or no.
I don't see how GPT-2 is a step forward towards passing strong versions of the Turing test.
I'm not familiar with the details of GPT-2, and maybe I'm interpreting the definition of "utility maximizer" incorrectly, but isn't GPT-2 just a neural network trained to minimize a loss function that corresponds to predicting the next word correctly?
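To make the training objective concrete, here is a toy sketch (not GPT-2's actual code, and the token names and numbers are made up for illustration): the loss being minimized is cross-entropy on the next token, i.e. the negative log-probability the model assigned to whatever token actually came next.

```python
import math

def next_token_loss(predicted_probs, actual_next_token):
    """Cross-entropy loss for a single next-token prediction:
    the negative log-probability assigned to the true next token."""
    return -math.log(predicted_probs[actual_next_token])

# Hypothetical distribution over next tokens after some prefix:
probs = {"6": 0.80, "8": 0.10, "five": 0.10}

# If the actual continuation is "6", the loss is small (-ln 0.8 ≈ 0.223)...
loss_good = next_token_loss(probs, "6")
# ...and larger for a token the model considered unlikely (-ln 0.1 ≈ 2.303).
loss_bad = next_token_loss(probs, "8")
```

Minimizing this loss over a large corpus is the same as maximizing the log-likelihood of the training text, which is the sense in which the model is "trained to predict the next word."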
GPT-2 works by deterministically computing a probability distribution over the next token, then sampling from it. It's plausible that the probability it assigns to 6 is no larger than 80%, but it's simple enough to postprocess the output so that every probability larger than 50% is rounded up to 100%. (This isn't done by default because, for example, when completing a list prefix of size 4, the probability of another , is more than 50%, so the model would then always produce an infinite list.)
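The decoding step described above can be sketched as follows (a minimal illustration, not GPT-2's API; the function names and the example distribution are hypothetical). Plain decoding samples from the distribution, while the postprocessing snaps any probability above 50% to 100%, i.e. emits that token deterministically whenever a single token holds a majority.

```python
import random

def sample(probs, rng=random):
    """Draw one token from a {token: probability} distribution."""
    r = rng.random()
    cumulative = 0.0
    for token, p in probs.items():
        cumulative += p
        if r < cumulative:
            return token
    return token  # guard against floating-point rounding at the top end

def sample_with_majority_snap(probs, rng=random):
    """If any token has probability > 0.5, emit it; otherwise sample."""
    for token, p in probs.items():
        if p > 0.5:
            return token
    return sample(probs, rng)

# Hypothetical distribution: "6" holds 80%, so the snapped decoder
# always emits "6", while plain sampling emits it only 80% of the time.
probs = {"6": 0.80, "8": 0.10, "five": 0.10}
```

This also shows why the snapping isn't harmless: if the comma token holds a majority at every list position, the snapped decoder emits a comma forever, whereas plain sampling eventually picks something else.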