Compact vs. Wide Models — LessWrong