Epistemic status: A crazy idea I had that probably won't work. But: It's a very unusual and creative approach to AI alignment, and I suspect this will inspire new ideas in other researchers.
I outline a general approach to achieve this goal that, counterintuitively, relies on confusing the AI on purpose.
Basic observations
This approach relies on a number of basic observations about the nature of Artificial Intelligence.
An AI is different from a human in multiple ways. This is part of what makes AI alignment such a difficult problem, because our intuitions for how people act often do not apply to AIs. However, several of these differences between AI and humans actually work in our...
It looks like their coursework has already started, but I have contacted the organizer. Thanks!