Here's a list of papers on reasoning and RL for language models, published in fall 2020, that caught my eye. You may find it useful if you're interested in the topic.
RL
Learning to summarize from human feedback - fine-tune a GPT-3-style model with RL to generate text that accomplishes a complex goal (here, summarization), where performance ratings provided by humans are distilled into a learned reward model; a toy sketch of the reward-model-plus-policy-gradient recipe follows the entries in this section.
Keep CALM and Explore: Language Models for Action Generation in Text-based Games - an instance of the selector approach, where a language model proposes candidate actions and a separate selector chooses between them, similar to "GeDi: Generative Discriminator Guided Sequence Generation". Here, though, the selector is actually trained with RL.
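To make the reward-model-plus-RL recipe concrete, here is a toy sketch, not the paper's setup: a linear reward model is fit to pairwise human preferences with a Bradley-Terry objective, and a small policy over candidate outputs is then pushed toward higher reward with REINFORCE. The candidates, features and preference pairs below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each candidate summary is represented by a small made-up feature vector.
candidates = rng.normal(size=(6, 4))      # 6 candidate summaries, 4 features each
prefs = [(0, 3), (2, 5), (0, 5), (1, 4)]  # (preferred, rejected) pairs from "human" raters

# --- Step 1: reward model, a linear scorer fit to preference pairs (Bradley-Terry) ---
w = np.zeros(4)
for _ in range(500):
    for win, lose in prefs:
        diff = candidates[win] - candidates[lose]
        p_win = 1.0 / (1.0 + np.exp(-w @ diff))  # P(winner preferred over loser)
        w += 0.1 * (1.0 - p_win) * diff          # gradient ascent on the log-likelihood

reward = candidates @ w                          # learned reward for each candidate

# --- Step 2: policy over candidates, pushed toward high reward with REINFORCE ---
logits = np.zeros(6)
for _ in range(2000):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    action = rng.choice(6, p=probs)              # "generate" a summary
    advantage = reward[action] - probs @ reward  # expected reward as a baseline
    grad = -probs
    grad[action] += 1.0                          # d log pi(action) / d logits
    logits += 0.05 * advantage * grad

final = np.exp(logits - logits.max())
print("learned rewards:", np.round(reward, 2))
print("final policy:   ", np.round(final / final.sum(), 2))
```

The policy ends up concentrated on the candidates the reward model scores highest, which is the basic dynamic both RL papers above rely on (with a transformer in place of the toy policy).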
Graphs
Graph-based Multi-hop Reasoning for Long Text Generation - a two-stage approach to language modeling: in the first stage, a knowledge graph corresponding to the context is processed to obtain paths between concepts; in the second stage, text incorporating these paths is generated. No graph data is needed - the graphs can be built from text automatically. The output appears more diverse, informative and coherent than that of plain transformers, which makes this look like a quick fix for language models' difficulty in keeping a coherent generation intent from one token to the next.
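Here is a very rough sketch of the two-stage idea (my own toy, not the paper's implementation; it assumes networkx is available): stage 1 builds a concept co-occurrence graph from the context and extracts a multi-hop path between two concepts, and stage 2 would condition a generator on that path. The "generator" here is just a placeholder prompt.

```python
import itertools
import networkx as nx

context = [
    "the chef bought fresh fish at the market",
    "the market was crowded in the morning",
    "dinner was served with wine in the evening",
    "the chef cooked the fish for dinner",
]

STOPWORDS = {"the", "was", "in", "at", "for", "with", "a", "of"}

# Stage 1: build a concept co-occurrence graph from the context sentences.
graph = nx.Graph()
for sentence in context:
    concepts = [w for w in sentence.split() if w not in STOPWORDS]
    for a, b in itertools.combinations(set(concepts), 2):
        graph.add_edge(a, b)

# Multi-hop reasoning: find a path linking two concepts the text should connect.
path = nx.shortest_path(graph, source="market", target="wine")
print("reasoning path:", " -> ".join(path))

# Stage 2: condition the generator on the extracted path (placeholder prompt here;
# the paper feeds the path into a neural text generator).
prompt = "Write a story that connects: " + ", ".join(path)
print(prompt)
```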
New losses
Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval - a new transformer-based retrieval model (it answers a question by retrieving the supporting passages from a pool of documents). It is a multi-hop model, meaning it searches iteratively, using information gathered in previous retrieval steps to guide later ones; a toy sketch of the loop follows the entries in this section. Retrieval models have been successfully combined with text generation in the past to boost question-answering performance.
MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale - take data from 140 StackExchange forums, train a model to match questions to answers. Performs well at answer selection in other domains unrelated to StackExchange.
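A toy sketch of the multi-hop retrieval loop (illustration only, nothing from the paper; real systems use learned transformer encoders instead of the bag-of-words embedding here): hop 1 retrieves a passage for the question, and hop 2 re-queries with the question plus that passage, so later hops can exploit evidence found earlier.

```python
import numpy as np

passages = [
    "marie curie was born in warsaw",
    "warsaw is the capital of poland",
    "paris is the capital of france",
    "curie later moved to paris to study",
]
question = "what is the capital of the country where marie curie was born"

STOP = {"what", "is", "the", "of", "where", "was", "in", "to", "a"}
vocab = sorted({w for text in passages + [question] for w in text.split() if w not in STOP})
index = {w: i for i, w in enumerate(vocab)}

def embed(text):
    """Bag-of-words vector, L2-normalised (a stand-in for a learned dense encoder)."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

doc_matrix = np.stack([embed(p) for p in passages])

def retrieve(query, exclude=()):
    scores = doc_matrix @ embed(query)
    for i in exclude:
        scores[i] = -np.inf
    return int(np.argmax(scores))

hop1 = retrieve(question)
# Second hop: fold the passage found in hop 1 into the query before retrieving again.
hop2 = retrieve(question + " " + passages[hop1], exclude={hop1})
print("hop 1:", passages[hop1])  # finds who/where the question is really about
print("hop 2:", passages[hop2])  # evidence reachable only via the hop-1 passage
```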
Multimodal
Beyond Language: Learning Commonsense from Images for Reasoning - lots of previous methods have combined images and text in transformers for tasks such as visual reasoning. This one differs in showing that training with images improves commonsense reasoning even when no images are present at test time.
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision - tackles the same problem as "Beyond Language: Learning Commonsense from Images for Reasoning": using images at training time to benefit models that only see text at test time.
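My reading of the recipe shared by both multimodal papers, reduced to a toy (this is an abstraction, not either paper's architecture; Vokenization, for instance, uses a token-to-image classification task rather than the regression loss below, and every shape and tensor here is random): train a text encoder with its usual text objective plus an auxiliary loss tying its representations to image features, then drop the image branch at test time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, d_model, img_dim = 100, 32, 64

embed = nn.Embedding(vocab_size, d_model)
gru = nn.GRU(d_model, d_model, batch_first=True)   # toy text encoder
text_head = nn.Linear(d_model, 2)                  # text-only task head (toy classification)
ground_head = nn.Linear(d_model, img_dim)          # maps text features into image-feature space

params = [p for m in (embed, gru, text_head, ground_head) for p in m.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

def encode(tokens):
    _, h = gru(embed(tokens))   # final hidden state as the sentence representation
    return h[-1]                # shape (batch, d_model)

# Toy training batch: token ids, text-task labels, and image features for the same examples.
tokens = torch.randint(0, vocab_size, (8, 10))
labels = torch.randint(0, 2, (8,))
image_feats = torch.randn(8, img_dim)

for step in range(200):
    reps = encode(tokens)
    text_loss = nn.functional.cross_entropy(text_head(reps), labels)
    # Auxiliary grounding loss: the text representation should predict the paired image features.
    ground_loss = nn.functional.mse_loss(ground_head(reps), image_feats)
    loss = text_loss + 0.5 * ground_loss   # 0.5 is an arbitrary mixing weight
    opt.zero_grad()
    loss.backward()
    opt.step()

# Test time: no images needed; only the text pathway is used.
with torch.no_grad():
    preds = text_head(encode(tokens)).argmax(dim=-1)
print("text-only predictions:", preds.tolist())
```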
Controllability
Back to the Future: Unsupervised Backprop-based Decoding for Counterfactual and Abductive Commonsense Reasoning - a decoding method that lets a language model generate missing text conditioned on both the past and the future context. In addition to giving more control over generation, it also improves abductive reasoning (hypothesis generation).
GeDi: Generative Discriminator Guided Sequence Generation - similar to the selector idea for controlled generation that I described in the RL section, but without RL training. While the generator produces text, a separately trained class-conditional discriminator re-weights the candidate continuations at each decoding step, steering generation toward the desired attribute; a toy sketch of this re-weighting appears at the end of this section. This is a more sophisticated way to control generation than programming via prompts.
Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries - introduces a hierarchical generation strategy in which the model first produces a high-level plan of the text as summaries of its passages, and then generates the passages themselves. This improves training efficiency by a lot, and improves the likelihood of the generated text as well.
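A toy illustration of the discriminator-guided decoding idea (in the spirit of GeDi, not its actual code; the vocabulary and distributions below are hard-coded for illustration): at each step the base LM's next-token distribution is re-weighted by how strongly a discriminator believes each candidate token leads to the desired attribute.

```python
import numpy as np

vocab = ["great", "awful", "movie", "boring", "fun"]

# Base language model's next-token probabilities (made up).
p_lm = np.array([0.15, 0.30, 0.25, 0.20, 0.10])

# Discriminator's estimate of P(positive sentiment | continuing with this token) (made up).
p_pos = np.array([0.90, 0.05, 0.50, 0.10, 0.85])

def guided_probs(p_lm, p_attr, weight=2.0):
    """Re-weight LM probabilities toward the desired attribute and renormalise."""
    scores = p_lm * p_attr ** weight
    return scores / scores.sum()

p_guided = guided_probs(p_lm, p_pos)
for tok, base, guided in zip(vocab, p_lm, p_guided):
    print(f"{tok:>7}: base={base:.2f} guided={guided:.2f}")
```

With the weight above, tokens the discriminator associates with the target attribute ("great", "fun") gain probability mass, while off-attribute tokens ("awful", "boring") are suppressed without ever touching the base model's weights.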
Also, here's a list of some earlier papers that I found interesting:
Analysing Mathematical Reasoning Abilities of Neural Models (transformers for symbolic math reasoning, Apr'19)
Transformers as Soft Reasoners over Language (evaluating logical reasoning in the natural language domain, Feb'20)
Teaching Temporal Logics to Neural Networks (transformers for logical inference, Mar'20)
REALM: Retrieval-Augmented Language Model Pre-Training (Google, Feb'20)
Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering (FAIR, Jul'20)
Hi everyone! Here I link to a sketch of my thoughts on how recent advances in language modeling may be connected to, or lead to, future advances in developing machine learning models with abstract reasoning capabilities.
This was done as a side project last year during my research fellowship at the Center on Long-Term Risk. Many thanks to Daniel Kokotajlo, Jesse Clifton, and Anthony DiGiovanni for useful comments.