In this paper, the authors explore the scaling properties of mixed-modal generative models, proposing new scaling laws that unify the contributions of the individual modalities and the interactions between them. What I find most interesting is the so-called competition barrier: when training on multiple modalities, beyond a certain number of parameters and amount of data, the joint loss drops below what the modalities would achieve if trained independently. This seems to predict the kind of cross-modal transfer that was sought after, but not (yet) found, with GATO.
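To make the crossover concrete, here is a toy numerical sketch, not the paper's fitted law: the constants `E`, `A`, `a`, `I`, `b`, and `k` below are invented for illustration. The idea is that joint training pays an interference cost that decays faster with scale than its effective-data benefit, so the joint loss eventually crosses below the independent-training loss.

```python
import numpy as np

# Toy Chinchilla-style losses (all constants invented for illustration).
E, A, a = 1.8, 40.0, 0.35   # irreducible loss, scale coefficient, exponent
I, b    = 30.0, 0.60        # interference term; decays faster than the gain (b > a)
k       = 1.5               # effective-data multiplier from cross-modal transfer

def loss_independent(N):
    """Loss when the modality is trained on its own."""
    return E + A / N**a

def loss_joint(N):
    """Loss under mixed-modal training: interference dominates at small
    scale; the effective-data benefit (k > 1) wins at large scale."""
    return E + A / (k * N)**a + I / N**b

N = np.logspace(1, 8, 2000)                 # model/data scale (arbitrary units)
gap = loss_joint(N) - loss_independent(N)

# The "competition barrier" in this toy: the scale where the gap changes sign.
cross = np.where(np.diff(np.sign(gap)))[0]
if cross.size:
    print(f"joint training overtakes independent training near N = {N[cross[0]]:.3g}")
```

With these made-up constants the crossover lands around N = 10^3; the point is only the qualitative shape (competition below the barrier, synergy above it), not the numbers.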