My quick take here: your list of topics is not an introduction to AI safety; it is an introduction to AI safety as seen from inside the MIRI/Yudkowsky bubble, where everything is hard and nobody is making any progress. Some more diversity in viewpoints would be better.
For your audience, my go-to source would be Brian Christian's The Alignment Problem; I'd cover selected parts of it.
My overall reactions:
Some specific edits I'd make, in the order of where they would land in your outline:
Thanks for your answer!
The downside I think is most likely is that you write this in the "voice" of an AI authority but confuse or omit some technical details, causing friction with other people in AI or even with the audience. I don't know you, but if you're not an AI authority, it's okay to write as yourself - talking about what you personally find interesting or convincing.
I'm going to post each part on LW and collect feedback before I put it all together, to avoid this failure mode in particular.
I'd move "what is agency?" from section 9 to section 3, or just spend more time on it in section 1.
I will think about it.
Under forecasting, I'd put less emphasis on takeoff speed and more on teaching people that superhuman performance is very possible on nearly every task. AI is not just going to plateau at human level; that's not a plausible future.
I'm not sure it should be in the forecasting section - it fits better in the introduction (or, if it is harder than I think, in its own separate section).
I would actually not mention the words "inner" or "outer" alignment.
Why not?
I would cut decision theory entirely.
Hmmmm... maybe?
I would merge the EY and capabilities-externality bullet points into a more general strategy section. What would the world look like if we were on a trajectory to succeed? Which of our actions move us closer to or further from that trajectory?
Seems like a good proposal, thanks!
Your outline has a lot of beliefs you expect your students to walk away with, but basically zero skills. If I were one of your prospective students, this would look a lot more like cult indoctrination than a genuine course where I would learn something.
What skills do you hope your students walk away with? Do you hope that they'll know how to avoid overfitting models? That they'll know how to detect trojaned networks? That they'll be able to find circuits in large language models? I'd recommend figuring this out first, and then working backwards to figure out what to teach.
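To make the first of those skills concrete: spotting overfitting comes down to comparing error on the training data with error on held-out data. A minimal sketch in Python (the toy data and polynomial model here are hypothetical, purely for illustration):

```python
import numpy as np

# Toy data: noisy samples of a quadratic (hypothetical example).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=40)
y = x**2 + rng.normal(scale=0.1, size=x.shape)

# The basic skill: hold out a validation set and never judge a
# model only on the data it was fit to.
idx = rng.permutation(len(x))
train, val = idx[:30], idx[30:]

for degree in (2, 15):
    coeffs = np.polyfit(x[train], y[train], degree)
    train_mse = np.mean((np.polyval(coeffs, x[train]) - y[train]) ** 2)
    val_mse = np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, val MSE {val_mse:.4f}")

# The degree-15 fit drives training error down while validation error
# goes up - that gap is the signature of overfitting.
```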
Also, don't underestimate just how capable smart 15- and 16-year-olds can be. At my high school, for example, there were at least a dozen students who knew calculus at this age, and many more who knew how to program. And this was just a relatively normal public high school.
Thanks for your answer!
This is about... I wouldn't say "beliefs" - I will make a lot of caveats like "we are not sure", "there are some smart people who disagree", "here is an argument against this view", etc. (mental note: do it MORE, thank you for your observation) - but about "motivation" and "discourse". It's not about technical skills, that's true.
I have a feeling that there is an attractor: "I am an AI researcher and ML is AWESOME, and I will try to make it even more AWESOME, and yes, there are these safety folks, and I know some of their memes, and maybe they have some legitimate concerns, but we will solve all that later and everything will be OK." I think that when someone learns ML-related technical skills before basic AI safety concepts and discourse, it's very easy for them to fall into this attractor, and from that point it's pretty hard to come back. So I want to create something like a vaccine against it.
Technical skills are necessary, but for most of them there are already good courses, textbooks, and such. The skills I have seen no textbooks for are "understanding AI-safety-speak" and "seeing why alignment-related problem X is hard and why obvious solutions may not work". Because of the attractor mentioned above, I think it's better to teach these skills before the technical ones.
I assume that the average 15-16-year-olds in my target audience know how to program at least a little (in Russia, basic programming is, in theory, part of the mandatory school curriculum; I don't know about the US), but don't know calculus (though I think a smart school student can easily grasp the concept of a derivative without a strict mathematical definition).
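For example, here is the kind of definition-free intuition I mean: a derivative is just "rise over a tiny run", which a student who can program a little can compute directly. (A hypothetical illustration, not actual course material:)

```python
def slope(f, x, h=1e-6):
    """Approximate the slope of f at x: rise over a tiny run."""
    return (f(x + h) - f(x)) / h

# For f(x) = x**2 at x = 3 this prints a number very close to 6 -
# exactly what the calculus rule f'(x) = 2x predicts.
print(slope(lambda x: x * x, 3.0))
```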
Disclaimer: my English isn't very good, but please don't try to talk me out of this on that basis - the sequence itself will be translated by a professional translator.
I want to create a sequence that a smart fifteen- or sixteen-year-old school student can read and that can encourage them to go into alignment. Right now I'm running an extracurricular course for several smart school students, and one of my goals is to "overcome the long inferential distances so that I will be able to create this sequence".
I deliberately did not include the most important current trends in machine learning among the topics. I'm optimizing for the scenario "a person reads my sequence, then goes to university for another four years, and only then becomes a researcher", so (with the exception of the last part) I avoided topics that are likely to become obsolete by that time.
Here is my draft list of topics (the order is not final; it will be settled in the course of writing):
What else should be here? Is there anything that should not be here? Are there reasons why the whole idea might be bad? Any other advice?