Authors: Chaojun Xiao, Jie Cai, Weilin Zhao, Guoyang Zeng, Xu Han, Zhiyuan Liu, Maosong Sun. Abstract (bolding mine): > Large Language Models (LLMs) have emerged as a milestone in artificial intelligence, and their performance can improve as the model size increases. However, this scaling brings great challenges to training and...
Author: Yijiong Yu. Abstract: > It has been well-known that Chain-of-Thought can remarkably enhance LLMs’ performance on complex tasks. However, because it also introduces slower inference speeds and higher computational costs, much research has attempted to use implicit CoT, which does not need LLMs to explicitly generate the intermediate steps....
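A toy sketch of the contrast the abstract draws (my own illustration, not the paper's setup): explicit CoT spends output tokens on intermediate steps, which is exactly where the extra latency and cost come from, while implicit CoT asks for the answer directly.

```python
# Hypothetical prompts contrasting explicit and implicit CoT; the question
# and wording are mine, not drawn from the paper.
question = "If a train travels 60 km/h for 2.5 hours, how far does it go?"

# Explicit CoT: the model generates intermediate steps as output tokens,
# so inference is slower and costlier (each extra token is a forward pass).
explicit_cot_prompt = (
    f"{question}\n"
    "Let's think step by step, writing out each intermediate calculation "
    "before giving the final answer."
)

# Implicit CoT: the model is asked to answer directly, with any
# "reasoning" left to happen inside its hidden states.
implicit_cot_prompt = f"{question}\nAnswer with only the final number."
```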
Authors: Sohee Yang, Nora Kassner, Elena Gribovskaya, Sebastian Riedel, Mor Geva. Abstract: > We evaluate how well Large Language Models (LLMs) latently recall and compose facts to answer multi-hop queries like "In the year Scarlett Johansson was born, the Summer Olympics were hosted in the country of". One major challenge...
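The example query in the abstract has a two-hop structure: the model must latently recall a bridge fact (Johansson's birth year, 1984) and compose it with a second fact (the host of the 1984 Summer Olympics). A minimal sketch of that composition, using my own toy lookup tables rather than anything from the paper:

```python
# Two single-hop fact tables (toy stand-ins for knowledge stored in an LLM).
hop1 = {"Scarlett Johansson": "1984"}   # entity -> birth year (bridge entity)
hop2 = {"1984": "United States"}        # year -> Summer Olympics host country

def answer_two_hop(entity: str) -> str:
    """Compose the two facts; the paper asks whether LLMs do this latently,
    i.e. without the bridge entity ever appearing in the prompt or output."""
    bridge = hop1[entity]
    return hop2[bridge]

print(answer_two_hop("Scarlett Johansson"))  # -> United States
```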
Authors: Pantelis Vafidis, Aman Bhargava, Antonio Rangel. Abstract: > Intelligent perception and interaction with the world hinges on internal representations that capture its underlying structure ("disentangled" or "abstract" representations). Disentangled representations serve as world models, isolating latent factors of variation in the world along orthogonal directions, thus facilitating feature-based generalization....
Authors: Beren Millidge, Yuhang Song, Armin Lak, Mark E. Walton, Rafal Bogacz. Abstract: > Animals can adapt their preferences for different types of reward according to physiological state, such as hunger or thirst. To explain this ability, we employ a simple multi-objective reinforcement learning model that learns multiple values according...
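A minimal sketch of the core idea as I read it from the abstract (the variable names and update rule are my assumptions, not the authors' code): learn one value function per reward type with TD learning, then weight those values at choice time by the animal's current physiological state.

```python
import numpy as np

n_states, n_reward_types = 5, 2      # reward types: food, water
alpha, gamma = 0.1, 0.9              # learning rate, discount factor
V = np.zeros((n_reward_types, n_states))  # one value function per reward type

def td_update(s, s_next, rewards):
    """Independent TD(0) update for each reward type's value function."""
    for i in range(n_reward_types):
        V[i, s] += alpha * (rewards[i] + gamma * V[i, s_next] - V[i, s])

def state_value(s, drive):
    """Combine per-reward values with physiological weights (hunger, thirst)."""
    return drive @ V[:, s]

# After observing a food reward at state 0, a hungry animal values that
# state more than a thirsty one does.
td_update(0, 1, rewards=(1.0, 0.0))
hungry, thirsty = np.array([1.0, 0.1]), np.array([0.1, 1.0])
print(state_value(0, hungry) > state_value(0, thirsty))  # True
```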
Authors: Anonymous (I'm not one of them). Abstract: > Most analysis of transformer expressivity treats the depth (number of layers) of a model as a fixed constant, and analyzes the kinds of problems such models can solve across inputs of unbounded length. In practice, however, the context length of a...
Authors: Federico Adolfi, Martina G. Vilas, Todd Wareham. Abstract: > Many proposed applications of neural networks in machine learning, cognitive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery. This calls for empirical and theoretical explorations of viable algorithmic options. Despite advances in the design and...