New Scaling Laws for Large Language Models — LessWrong