Copying the abstract of the paper:
The last decade has seen a significant increase of interest in deep learning research, with many public successes that have demonstrated its potential. As such, these systems are now being incorporated into commercial products. With this comes an additional challenge: how can we build AI systems that solve tasks where there is not a crisp, well-defined specification? While multiple solutions have been proposed, in this competition we focus on one in particular: learning from human feedback. Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve.
The MineRL BASALT competition aims to spur forward research on this important class of techniques. We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions. These tasks are defined by a paragraph of natural language: for example, "create a waterfall and take a scenic picture of it", with additional clarifying details. Participants must train a separate agent for each task, using any method they want. Agents are then evaluated by humans who have read the task description. To help participants get started, we provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline that leverages these demonstrations.
Our hope is that this competition will improve our ability to build AI systems that do what their designers intend them to do, even when the intent cannot be easily formalized. Besides allowing AI to solve more tasks, this can also enable more effective regulation of AI systems, as well as making progress on the value alignment problem.
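For readers who want a concrete picture of the "imitation learning baseline" the abstract mentions, here is a minimal behavioral-cloning sketch. It is not the competition's actual baseline code: the `load_demonstration_batches` helper, the 64x64 RGB observations, and the discretized action set are all placeholder assumptions standing in for the real MineRL demonstration pipeline and observation/action spaces.

```python
import torch
import torch.nn as nn

# Minimal behavioral-cloning sketch (NOT the official BASALT baseline).
# `load_demonstration_batches` is a hypothetical placeholder for whatever
# pipeline yields (observation, action) batches from the human demonstrations.

class BCPolicy(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        # Small conv net over assumed 64x64 RGB frames.
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 6 * 6, 256), nn.ReLU(),  # 2304 conv features for 64x64 input
            nn.Linear(256, num_actions),  # logits over an assumed discretized action set
        )

    def forward(self, obs):
        return self.net(obs)


def train_bc(load_demonstration_batches, num_actions=128, epochs=3, lr=1e-4):
    """Supervised learning on (observation, action) pairs from demonstrations."""
    policy = BCPolicy(num_actions)
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for obs, actions in load_demonstration_batches():
            # obs: (B, 3, 64, 64) float tensor; actions: (B,) long tensor of action ids
            loss = loss_fn(policy(obs), actions)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy
```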
I also mention this in the latest Alignment Newsletter, but I think this is probably one of the best ways to get started on AI alignment from the empirical ML perspective: it will (hopefully) give you a sense of what it is like to work with algorithms that learn from human feedback, in a more realistic setting than Atari / MuJoCo, while still not requiring a huge amount of background or industry-level compute budgets.
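To give a flavor of what such an algorithm can look like, here is a hedged sketch of one standard ingredient: fitting a reward model to pairwise human preferences over trajectory segments, in the spirit of Christiano et al. (2017). The names, shapes, and flat observation encoding are illustrative assumptions rather than anything from the BASALT codebase; only the Bradley-Terry comparison loss is the standard recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of reward-model training from pairwise human preferences
# (Bradley-Terry comparison loss, as in "Deep RL from Human Preferences").
# Names, shapes, and the flat observation encoding are illustrative assumptions.

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def segment_return(self, segment):
        # segment: (T, obs_dim) -> scalar predicted return (sum of per-step rewards)
        return self.net(segment).sum()


def preference_loss(model, seg_a, seg_b, human_prefers_a: bool):
    """The segment the human preferred should be assigned the higher return."""
    returns = torch.stack([model.segment_return(seg_a),
                           model.segment_return(seg_b)])
    target = torch.tensor(0 if human_prefers_a else 1)
    return F.cross_entropy(returns.unsqueeze(0), target.unsqueeze(0))


def train_reward_model(comparisons, obs_dim=64, lr=1e-4):
    # `comparisons` is assumed to be an iterable of (segment_a, segment_b, prefers_a)
    # triples collected from humans who watched both clips and picked the better one.
    model = RewardModel(obs_dim)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for seg_a, seg_b, prefers_a in comparisons:
        loss = preference_loss(model, seg_a, seg_b, prefers_a)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

In a full pipeline, the learned reward model would then serve as the training signal for an RL algorithm, with new comparisons collected as the policy improves.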
Section 1.1 of the paper goes into more detail about the pathways to impact. At a high level, the story is that better algorithms for learning from human feedback will improve our ability to build AI systems that do what their designers intend them to do. This is straightforwardly improving on intent alignment (though it is not solving it), which in turn allows us to better govern our AI systems by enabling regulations like "your AI systems must be trained to do X" without requiring a mathematical formalization of X.
I agree that's possible. To be clear, we did spend some time thinking about how we might use handcrafted rewards / heuristics to solve the tasks, and eliminated a couple of candidate tasks on that basis, so I think it probably won't be true here.
No.
For the competition, there's a ban on pretrained models that weren't publicly available prior to competition start. We look at participants' training code to ensure compliance. It is still possible to violate this rule in a way that we may not catch (e.g. maybe you use internal simulator details to do hyperparameter tuning, and then hardcode the hyperparameters in your training code), but it seems quite challenging and not worth the effort even if you are willing to cheat.
For the benchmark (which is what I'm more excited about in the longer run), we're relying on researchers to follow the rules. Science already relies on researchers honestly reporting their results -- it's pretty hard to catch cases where you just make up numbers for your experimental results.
(Also in the benchmark version, people are unlikely to write a paper about how they solved the task using special-case heuristics; that would be an embarrassing paper.)