Could you say more about why you think these exercises are particularly valuable? I'm on Vivek's team and I helped a bit with these exercises, so I'm naturally a fan, but I don't think most people can decide to do these rather than Ngo's exercises or other SERI MATS app questions without more information on why they're good.
Vivek Hebbar recently developed a list of alignment problems. I think more people should try them. I'm impressed with how well they (a) get people to focus on core problems, (b) encourage people to come up with their own ideas, (c) encourage people to notice and articulate confusions, and (d) accomplish (a) through (c) while also providing a fair amount of structure and guidance.
You can see the problems in this Google Doc or pasted below.
Note that Vivek is also a mentor for SERI MATS. These exercises are also the questions that people need to answer to apply to work with him and Nate Soares. Applications are due today.
Problems:
These problems are basically research questions: we expect them to be difficult, and good responses will be valuable as research in their own right. MIRI will award prizes (likely on the order of $5,000 for excellent submissions).
Instructions: We recommend that you focus on 1 or 2 of the hard questions and leave the other hard questions blank. It is mandatory to attempt either #1a-c or #2.
Note on word counts: These are guidelines for how long we think a typical good response will be, but feel free to write more. If you have lots of ideas, it’s great to write them all, and don’t bother trying to shorten them.
Hard / time-consuming questions (contest problems):
Problem 1
Footnote 1: Examples of alignment proposals you could consider. We recommend you pick either a proposal you're familiar with, or something from 11 Proposals so you don't waste time parsing difficult writeups.
Footnote 2: Some rough examples of “tasks” are “drive a car”, “solve alignment”, “run a widget factory”, and “protect a diamond vault”. Try to be more specific than this if possible; for instance, “the AI has cameras and motion sensors, and must protect a diamond in a room from human robbers, drones, and other attacks, by controlling various actuators and weapons in the vault”.
Problem 2
2. Suppose you have both of the below relaxations:
Solve alignment given these relaxations; that is, describe a scheme involving this infinite computer that results in an AGI which will maximize the amount of diamond in our universe. (Note: This question is largely to test your ability to think concretely. Ideally, you should give detailed pseudocode + maximally concrete descriptions of any physical setups involved; verbal descriptions often hide the true difficulty of specifying a component.)
Write down every difficulty you notice, and don’t gloss over anything. Do you expect your solution to work? Suppose you run it and it fails; how did it fail?
Whether or not you found any promising scheme, still submit all the difficulties you noticed, what ideas you thought of, and why the failed ideas don’t work. Err on the side of including all your thoughts and don't worry about making your solution look polished.
ETA (10/18/22): If you use deep learning in your solution, please be very specific about the setup (architecture, loss function, regularization if any). Also explain why you expect it to generalize the way you claim it will (this may involve arguments about the mechanistic structure of the learned model and the specific inductive bias of your setup).
Problem 3
3. Note: This involves a lot of reading (on the order of 10 to 100 pages). It’s largely to identify good technical distillers, but also gives us some signal for researchers.
Distill down one of the following posts into a ≤100 word summary of the main alignment idea and a ≤100 word summary of what properties of reality are required for that idea to work, followed by summaries of the key points:
Problem 4
4. Read AGI Ruin and/or Paul’s response and/or the DeepMind alignment team’s response. Pick a specific point that you have thoughts on (maybe a disagreement, maybe something else) and write an analysis of it.