Memetics. Maybe agents are to memes as ships are to seafarers.
--Maybe memes will be prevalent in highly capable minds; at least, maybe some of the first powerful AIs we build will be meme-vessels like we are.
--Memetics is an entirely new way of understanding minds and agency. It is severely understudied as far as I can tell; the only academic work on the topic that I know of is Blackmore's The Meme Machine from twenty years ago. This should give us some reason to hope that significant progress can be made quickly.
--Better understanding of memetics might also have significant benefits beyond AI alignment, such as helping us steer the public discourse around AI risk, helping us be more rational and truth-seeking in general, helping us predict a potential decay of public epistemology, etc.
This has long been my suspicion. Combine meme theory with a multi-agent model of mind and it starts to look like the occult concept of "egregores" is right after all: distributed agents, composed of many separate programs running on many separate human minds and coordinating via communication and ritualistic behaviors, control most of the world. Corporations and gods are two obvious examples.
Develop a theory of how neural networks function, then apply that theory to either directly align neural networks, or develop an alternative approach to creating AGI that is easier to align. This seems more promising than trying to develop new foundations from scratch, since we already know that neural networks do intelligence-like optimization; the challenge is just figuring out why.
It seems a bit arrogant to just say "what I've been working on," but on the other hand, the things I've been working on have obviously often been my best ideas!
Right now I'm still thinking about how to allow for value specification in hierarchical models. There are two flanks to this problem: the problem of alien concepts and the problem of human underdetermination.
The problem of alien concepts is relatively well-understood: we want the AI to generalize in a human-like way, which runs into trouble if there are "alien concepts" that predict the training data well but are unsafe to try to maximize. Solving this problem looks like skillful modeling of an environment that includes humans, progress in interpretability, and better learning from human feedback.
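As a toy illustration of the alien-concepts worry (the dataset and rule names below are hypothetical, not from this comment): two decision rules can fit the training data equally well yet generalize very differently off-distribution, so an AI optimizing the "alien" rule could behave very differently from what the human intended.

```python
# Toy training data: (color, shape) -> label
train = [("red", "round", 1), ("red", "round", 1), ("blue", "square", 0)]

def human_concept(color, shape):
    return int(shape == "round")   # "good things are round"

def alien_concept(color, shape):
    return int(color == "red")     # "good things are red"

# Both rules achieve perfect accuracy on the training set...
for color, shape, label in train:
    assert human_concept(color, shape) == label
    assert alien_concept(color, shape) == label

# ...but they disagree on a novel input.
novel = ("blue", "round")
print(human_concept(*novel), alien_concept(*novel))  # prints: 1 0
```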
The problem of human underdetermination is a bit less appreciated: human behavior underdetermines a utility function, in the sense that many different utility functions can be fit to human behavior equally well. At the same time, human behavior is inconsistent with the intuitive desiderata we would want such a model to satisfy. Solving this problem looks like finding ways to model humans that strike a decent balance between our incompatible desiderata, or ways to encode and insert our desiderata to avoid "no free lunch" problems in general models of environments that contain humans. Whereas a lot of good progress has been made on the problem of alien concepts using fairly standard ML methods, I think the problem of human underdetermination requires a combination of philosophy, mathematical foundations, and empirical ML research.
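In the same spirit, here is a minimal sketch of the underdetermination point (again with hypothetical names and data, loosely in the spirit of the "no free lunch" results for inferring rewards from behavior): the same observed behavior is reproduced equally well by a reward function paired with a rational planner and by its negation paired with an anti-rational planner, so behavior alone can't tell us which reward to hand to an AI.

```python
actions = ["coffee", "tea"]

# Hypothetical observed behavior: the human always picks coffee.
observed_policy = {"morning": "coffee"}

def planner(reward, mode):
    """Policy induced by a reward function under a given planning rule."""
    if mode == "maximize":   # fully rational planner
        return {s: max(actions, key=lambda a: reward(s, a)) for s in observed_policy}
    else:                    # fully anti-rational planner
        return {s: min(actions, key=lambda a: reward(s, a)) for s in observed_policy}

def R(state, action):
    return 1.0 if action == "coffee" else 0.0

def neg_R(state, action):
    return -R(state, action)

# Both (R, rational) and (-R, anti-rational) reproduce the observed behavior.
assert planner(R, "maximize") == observed_policy
assert planner(neg_R, "minimize") == observed_policy
print("Both reward/planner pairs fit the observed behavior equally well.")
```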
This is largely Wei Dai's idea. We have (computationally-unbounded) models like AIXI that can be argued to capture many key aspects of human reasoning and action. But one thing they don't capture is philosophy -- our inclination to ask questions such as "what is reasoning, really?" in the first place. We could try to come up with a model that could be argued to 'do philosophy' in addition to things like planning and reasoning. This seems promising because philosophical ability is a really glaring gap in our current models -- in particular, our models can't currently account for our desire to come up with good models!
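For concreteness, here is roughly the AIXI action-selection rule in Hutter's formulation (the notation below is my gloss, not from the comment above): the agent picks actions by expectimax planning over all environment programs $q$ for a universal Turing machine $U$, weighted by $2^{-\ell(q)}$ where $\ell(q)$ is the program's length and $m$ is the horizon:

$$
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \left[ r_k + \cdots + r_m \right] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

Everything here is prediction and planning; nothing in it corresponds to the activity of asking what reasoning or planning should be in the first place.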
To date, agent foundations work has taken a "top-down" approach of taking highly idealized agents and gradually trying to make them more realistic. Instead, you could try to proceed in the opposite direction -- start by making models of very simple optimizing systems we actually see in the world, then try to extrapolate that theory to very powerful agents. At first, this might look like some sort of 'theoretical biology', attempting to create models of viruses, then cells, then multicellular organisms. This might be followed up by a 'theoretical sociology' of some kind. Of course, much of the theory wouldn't generalize to AI, but it's plausible that some of it would: for example, anti-cancer mechanisms can be seen as solving a subagent alignment problem. I think this is promising because it engages what we care about (optimization pressure in the real world) more directly than approaches based on idealized agents, and has lots of data to test ideas on.
I argue for this approach on the basis that Agent Foundations is really hard to do right and that the fact that we have run into difficulties suggests a need to go right back to the philosophical foundations to confirm they are on solid ground.
FWIW that's not how I read that update. It seems like MIRI was working on some secret "entirely different type of AI not based on existing machine learning techniques" that they gave up on, but are still pursuing their agent foundations agenda.
"We are currently in a state of regrouping, weighing our options, and searching for plans that we believe may yet have a shot at working[...]Given our increased uncertainty about the best angle of attack, it may turn out to be valuable to house a more diverse portfolio of projects[...]We may commit to an entirely new approach"
To me, this makes it sound like most of their effort is not going towards research similar to the stuff they did prior to 2017, but rather coming up with something totally new, which may or may not address the same sort of problems.
(I work at MIRI.)
We're still pursuing work related to agent foundations, embedded agency, etc. We shifted a large amount of our focus onto the "new research directions" in early 2017 (post), and then we wrote a longer explanation of what we were doing and why in 2018 (post). The 2020 strategy update is an update that MIRI's scaling back work on the "new research directions," not scaling back work on the set of projects linked to agent foundations.
Thanks for the clarification!
In the OP I meant 'agent foundations' in a broad sense, as in any research aimed at establishing a foundational understanding of agency, so by "giving up on their current approach to agent foundations" I meant that MIRI was shifting away from the approach to agent foundations (broad sense) that they previously saw as most promising. I didn't mean to imply that MIRI had totally abandoned their research on agent foundations (narrow sense), as in the set of research directions initiated in the agent foundations agenda.
Gotcha. Either way, I think this is a great idea for a thread, and I appreciate you making it. :)
To avoid confusion, when I say "agent foundations" I mean one of these things:
We originally introduced the term "agent foundations" because (a) some people (I think Stuart Russell?) thought it was a better way of signposting the kind of alignment research we were doing, and (b) we wanted to distinguish our original research agenda from the 2016 "Alignment for Advanced Machine Learning Systems" agenda (AAMLS).
A better term might have been "agency foundations," since you almost certainly don't want your first AGI systems to be "agentic" in every sense of the word, but you do want to fundamentally understand the components of agency (good reasoning, planning, self-modeling, optimization, etc.). The idea is to understand how agency works, but not to actually build a non-task-directed, open-ended optimizer (until you've gotten a lot of practice with more limited, easier-to-align AGI systems).
Apparently, MIRI has given up on their current mainline approach to understanding agency and are trying to figure out what to do next. It seems like it might be worthwhile to collect some alternative approaches to the problem -- after all, intelligence and agency feature in pretty much all areas of human thought and action, so the space of possible ways to make progress should be pretty vast. By no means is it exhausted by the mathematical analysis of thought experiments! What are people's best ideas?
(By 'understanding agency' I mean research that is attempting to establish a better understanding of how agency works, not alignment research in general. So IDA would not be considered agent foundations, since it takes ML capabilities as a black-box.)
ETA: I originally wrote 'agent foundations' in place of 'understanding agency' in the above, which was ambiguous between a broad sense of the term (any research aimed at obtaining a foundational understanding of agency) and a narrow sense (the set of research directions outlined in the agent foundations agenda document). See this comment by Rob re: MIRI's ongoing work on agent foundations (narrow sense).