It’s occurred to me that training a future, powerful AI on your brainwave patterns might be the best way for it to build a model of you and your preferences. It seems that it’s incredibly hard, if not impossible, to communicate all your preferences and values in words or code, not least because most of these are unknown to you on a conscious level.
Of course, there might be some extreme negatives to the AI having an internal model of you, but I can’t see a way around if we’re to achieve “do what I want, not what I literally asked for”.
This is a linkpost for https://arxiv.org/abs/2308.00678.