Why does this approach only need to be implemented in neocortex-like AGIs? If we have a factored series of value functions in an RL agent, shouldn't we be able to take the same approach? But I guess you are thinking that the basal ganglia learning algorithms already do this for us, so it is a convenient approach?
Side note: I found the distinction between confusion and conflict a bit... confusing! Is confusion here the agent updating a belief, while conflict is the agent deciding to take an action?
Thanks for this post!
I agree with just about all of it (even though it paints a pretty bleak picture). It was useful to have all of these ideas and inner/outer alignment in one place, especially the diagrams.
Two quotes that stood out to me:
" "Nameless pattern in sensory input that you’ve never conceived of” is a case where something is in-domain for the reward function but (currently) out-of-domain for the value function. Conversely, there are things that are in-domain for your value function—so you can like or dislike them—but wildly out-of-domain for ...
Thanks a lot for your detailed reply and sorry for my slow response (I had to take some exams!).
Regarding terminal goals, the only compelling one I have come across is coherent extrapolated volition, as outlined in Superintelligence. But how to actually turn this into code is of course problematic, and I haven't followed the literature closely since then for rebuttals or better ideas.
I enjoyed your piece on Steered Optimizers, and I think it has helped give me examples of where the algorithmic design and inductive biases can play a part in how controllable our system is...
Hi Steve, thanks for all of your posts.
It is unclear to me how this investigation into brain-like AGI will aid in safety research.
Can you provide some examples of what discoveries would indicate that this is an AGI route that is very dangerous or safe?
Without having thought about this much, it seems to me that the control/alignment problem depends on the terminal goals we provide the AGI rather than on the substrate and algorithms it runs to reach AGI-level intelligence.
Thank you for the kind words and for flagging some terms to look out for in societal change approaches.
Fair enough, but for it to be that powerful and used as part of our immune system, we may be free of parasites because we are all dead xD.
Thanks for the informative comments. You make great points. I think the population structure of bats may have something to do with their unique immune response to these infections, but I definitely want to look at the bat immune system more.
This piece is super interesting, especially the toy models.
A few clarifying questions:
-- Why does it need to be its own separate module? Can you expand on this? And even if separate modules are useful (as per your toy models and different inputs), couldn't the neocortex also be running lookup-table-like au...