I was impressed by GPT-2, to the point where I wouldn't be surprised if a future version of it could be put to pivotal use with existing protocols.
Consider having the model generate half of a Turing test transcript, with the other half supplied by a human judge. If the model passes, we could immediately implement an HCH of AI safety researchers to solve the problem, if it's within our reach at all. (Note that training the model takes much more compute than generating text.)
This might not be the first pivotal application of language models that becomes possible as they get stronger.
It's a source of superintelligence that doesn't automatically run into utility maximizers. It sure doesn't look like AI services, lumpy or no.
Not exactly. The best way to minimize the L2 norm of the loss function over the training data is simply to copy the training data into the weights (if there are enough weights) and use some trivial look-up procedure during inference. To get models that are also useful for inputs outside the training data, you probably need some form of regularization (or a model that implicitly carries it out), e.g. by adding the L2 norm of the weights to the objective function being minimized. Regularization is a way to implement Occam's razor in machine learning.
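To make the memorization-versus-regularization point concrete, here is a minimal toy sketch. Everything in it is my own assumption for illustration: the made-up data, the one-weight linear model, the names `lookup_model` and `ridge_fit`, and the use of the squared L2 norm of the weights as the penalty (the usual ridge form, chosen because it has a closed-form solution).

```python
# Toy data, made up for illustration: y is roughly 2 * x.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]


def lookup_model(train):
    """Memorize the training set: copy the data and look it up at inference."""
    table = dict(train)
    # Returns None for any input that was not in the training data.
    return lambda x: table.get(x)


def ridge_fit(train, lam):
    """Closed-form minimizer of sum((w*x - y)**2) + lam * w**2 for one weight w."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)


def train_loss(predict, train):
    """Sum of squared errors over the training data."""
    return sum((predict(x) - y) ** 2 for x, y in train)


memorizer = lookup_model(train)
w = ridge_fit(train, lam=0.1)


def ridge(x):
    """Prediction of the regularized one-weight model."""
    return w * x


print("memorizer training loss:", train_loss(memorizer, train))
print("ridge training loss:", train_loss(ridge, train))
print("memorizer on unseen x=4.0:", memorizer(4.0))
print("ridge on unseen x=4.0:", ridge(4.0))
```

The look-up model reaches zero training loss but has nothing to say about an unseen input, while the penalized model gives up a little training loss in exchange for a simple rule that extends beyond the training data. Increasing `lam` shrinks `w` further toward zero, which is the sense in which the penalty prefers simpler hypotheses.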
Suppose that, due to the regularization, the training results in a system whose goal system is: "minimize the expected value of the loss function at the end of the current inference".
(where the concept of probability, which is required to define expectation, corresponds to how humans interpret the word "probability" in a decision-relevant context)
For such a goal system, the malign-output scenario above seems possible (for a sufficiently capable system).