Currently an independent AI Safety researcher. Ex-software developer, ex-QA.
Prior to working in industry, I was involved in academic research on cognitive architectures (the old ones). I'm a generalist with a focus on human-like AIs (I know a couple of things about developmental psychology, cognitive science, ethology, and computational models of the mind).
Personal research vectors: the ontogenetic curriculum and narrative theory. The primary theme is consolidating insights from various mind-related areas into a plausible explanation of human value dynamics.
A long-time LessWronger (~8 years). I've mostly been active in my local LW community (as a consumer and as an organiser).
Recently I've organised a sort of peer-to-peer accelerator for anyone who wants to become an AI Safety researcher. Right now there are 17 of us.
Was a part of AI Safety Camp 2023 (Positive Attractors team).
(My day-to-day job is literally to tackle the 'generality' of intelligence)
While having a high IQ/g is useful, it is not what lies at the core of great performance. What creates great results is having developed 'intelligences' around the task you're tackling, plus determination/commitment/obsession, plus agency.
I think it's better to focus on things one can change/train; sadly, IQ/g is not one of those things.
There is a book called The Culture Map. It maps behavioral differences across cultures, including those related to genuineness. For example, in cultures with a direct attitude toward criticism/feedback you can be more certain that a comment is truthful than in cultures with indirect feedback (and even more so if the comment is harsh).
“Alan Turing started off by wanting to 'build the brain' and ended up with a computer”
- Henry Markram, The Blue Brain Project
Recently I’ve come to terms with the idea that I have to publish my research even if it feels unfinished or slightly controversial. The mind is too complex (who would have thought): each time you think you’ve got something, a new bit comes up and crushes your model. Time after time after time. So, waiting for at least remotely good answers is not an option. I have to “fail fast”, even though it’s not a widely accepted approach among scientists nowadays.
With that said, reinforcement learning and an in-depth analysis of the mentioned models will be covered later. The goal of this part is to explain the reasoning behind the choice of the surface area.
Artificial Neural Networks are the face of modern artificial intelligence and its most successful branch too. But success unfortunately doesn’t mean biological plausibility. Even though most ML algorithms have been inspired by aspects of biological neural networks, the final models end up pretty far from the source material. This makes their usefulness for the quest of reverse engineering the mind questionable. What I mean here is that almost no insights can be brought directly back to neuroscience to help with the research. I’ll explain why in a bit. (Note: this doesn’t mean that they cannot serve as an inspiration. That is very much possible and, I’m sure, a good idea.)
There are three main show-stoppers:
(Reason #1) is the use of an implausible learning algorithm (read: backpropagation). There have been numerous attempts at finding a biologically plausible analogue of backpropagation, but all of them fell short as far as I know. The core objection to the biological plausibility of backpropagation is that weight updates in multi-layered networks require access to information that is non-local (i.e., error signals generated by units many layers downstream). In contrast, plasticity in biological synapses depends primarily on local information (i.e., pre- and post-synaptic neuronal activity)[1].
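To make the locality point concrete, here is a tiny NumPy sketch (with made-up layer sizes and data) contrasting a Hebbian-style update, which uses only information available at the synapse, with the backprop update for the same weights, which needs error signals propagated back from layers downstream:

```python
import numpy as np

# Toy contrast between a local plasticity rule and backprop-style credit assignment.
# Purely illustrative: layer sizes, learning rate, and data are made up.

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # pre-synaptic activity
W1 = rng.normal(size=(5, 4))    # first-layer weights
W2 = rng.normal(size=(3, 5))    # second-layer weights
lr = 0.01

# Local (Hebbian-like) update: each synapse needs only its own pre- and
# post-synaptic activity, information physically available at the synapse.
h = np.tanh(W1 @ x)
dW1_local = lr * np.outer(h, x)

# Backprop update for the same layer: the delta term depends on W2 and on the
# output error, i.e. on signals generated many connections downstream, which a
# biological synapse has no obvious way to access.
y = W2 @ h
err = y - np.zeros(3)                    # error against an arbitrary target
delta_h = (W2.T @ err) * (1 - h**2)      # non-local: needs W2 and downstream error
dW1_backprop = -lr * np.outer(delta_h, x)
```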
(Reason #2) is the fact that ANNs are used to solve “synthetic” problems. The vast majority of ANNs originated in industry, designed to solve some practical real-world problem. For us, this means that the training data used for these models has almost nothing in common with the human ontogenetic curriculum (or any part of it) and hence doesn’t let us use them for this kind of research.
(Reason #3) is the use of implausible building blocks and network morphology, resulting in implausible neural dynamics (e.g., the use of point neurons instead of full-blown multi-compartment neurons, and of just STDP-style plasticity instead of all the types of neural interaction). We still don’t know how crucial those alternative modes are, but the consensus on this matter is “we need more than we use right now”.
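Since STDP keeps coming up, here is what the classic pair-based rule looks like in code; this is a textbook sketch, with typical illustrative constants rather than values from any specific model:

```python
import numpy as np

# Classic pair-based STDP rule, for reference. The amplitudes and time constants
# are typical textbook values, not taken from any particular model.

A_plus, A_minus = 0.01, 0.012     # potentiation / depression amplitudes
tau_plus, tau_minus = 20.0, 20.0  # time constants (ms)

def stdp_dw(t_pre, t_post):
    """Weight change for a single pre/post spike pair (spike times in ms)."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: potentiation
        return A_plus * np.exp(-dt / tau_plus)
    else:        # post fires before pre: depression
        return -A_minus * np.exp(dt / tau_minus)

print(stdp_dw(10.0, 15.0))   # pre 5 ms before post -> positive change
print(stdp_dw(15.0, 10.0))   # post 5 ms before pre -> negative change
```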
However, there are three notable exceptions:
(The first exception) is convolutional neural networks and their successors. They were copied from the mammalian visual cortex and are considered sufficiently biologically plausible. The success of ConvNets is based on the utilization of design principles specific to the visual cortex, namely shared weights and pooling[2]. The area of applicability of these principles is an open question.
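As a minimal illustration of what shared weights and pooling amount to, here is a naive convolution-plus-pooling sketch (the kernel and image are arbitrary, and nothing here comes from an actual ConvNet implementation):

```python
import numpy as np

# Minimal illustration of the two principles named above: a single small kernel
# shared across the whole image (weight sharing) and max pooling.

def conv2d(image, kernel):
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same kernel (the shared weights) is applied at every location.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    H, W = x.shape
    x = x[:H - H % size, :W - W % size]
    return x.reshape(x.shape[0] // size, size, x.shape[1] // size, size).max(axis=(1, 3))

image = np.random.default_rng(1).normal(size=(8, 8))
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])   # crude vertical-edge detector
features = max_pool(conv2d(image, edge_kernel))
print(features.shape)   # (3, 3)
```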
(The second) is highly biologically plausible networks like Izhikevich’s, The Blue Brain Project, and others. Izhikevich’s model is built from multi-compartment high-fidelity neurons displaying all the alternative modes of neural/ganglia interaction[3]. Among the results, my personal favourite is: “Network exhibits sleeplike oscillations, gamma (40 Hz) rhythms, conversion of firing rates to spike timings, and other interesting regimes. Due to the interplay between the delays and STDP, the spiking neurons spontaneously self-organize into groups and generate patterns of stereotypical polychronous activity. To our surprise, the number of coexisting polychronous groups far exceeds the number of neurons in the network, resulting in an unprecedented memory capacity of the system.”
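For flavour, here is a sketch of Izhikevich’s reduced two-variable “simple model” neuron with the standard regular-spiking parameters; his more detailed models replace this point neuron with multi-compartment ones, but it gives a feel for the family of building blocks these networks are made of:

```python
# Izhikevich's reduced two-variable "simple model" neuron, with the standard
# regular-spiking parameters. A sketch for flavour only.

a, b, c, d = 0.02, 0.2, -65.0, 8.0   # regular-spiking parameter set
v, u = -65.0, b * -65.0              # membrane potential (mV) and recovery variable
dt, I = 0.5, 10.0                    # time step (ms) and constant input current

spikes = []
for step in range(2000):             # 1 second of simulated time
    v += dt * (0.04 * v**2 + 5 * v + 140 - u + I)
    u += dt * a * (b * v - u)
    if v >= 30.0:                    # spike: record it, then reset v and bump u
        spikes.append(step * dt)
        v, u = c, u + d

print(f"{len(spikes)} spikes in 1 s")
```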
(The third) is Hierarchical Temporal Memory (HTM) by Jeff Hawkins. It’s a framework inspired by the principles of the neocortex. It claims that the role of the neocortex is to integrate upstream sensory data and then find patterns within the combined stream of neural activity. It views the neocortex as an auto-association machine (a view I at least partially endorse). HTM was developed almost two decades ago but, to the best of my knowledge, failed to earn much recognition. Still, it’s the best model of this type, so it is worth considering.
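To illustrate what “auto-association” means here, a generic Hopfield-style auto-associator (the textbook idea, not HTM itself): store a few patterns, then recover a complete pattern from a corrupted cue.

```python
import numpy as np

# A generic Hopfield-style auto-associator (not HTM itself), just to illustrate
# auto-association. All sizes and seeds are arbitrary.

rng = np.random.default_rng(2)
patterns = rng.choice([-1, 1], size=(3, 64))            # three stored binary patterns
W = sum(np.outer(p, p) for p in patterns) / 64.0        # Hebbian storage rule
np.fill_diagonal(W, 0)

state = patterns[0].copy()
state[:16] *= -1                                        # corrupt a quarter of the bits
for _ in range(10):                                     # iterative recall
    state = np.sign(W @ state)

print(np.mean(state == patterns[0]))                    # ~1.0: the pattern is recovered
```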
D. Hassabis et al. Neuroscience-Inspired Artificial Intelligence. https://www.sciencedirect.com/science/article/pii/S0896627317305093
Y. LeCun, Y. Bengio, et al. Gradient-based learning applied to document recognition. https://ieeexplore.ieee.org/abstract/document/726791
E. Izhikevich. Polychronization: Computation with Spikes. https://direct.mit.edu/neco/article-abstract/18/2/245/7033/Polychronization-Computation-with-Spikes
"If the problems are the same, it (evolution) often finds the same solution"
- Richard Dawkins, The Blind Watchmaker
Neural Darwinism, also known as the theory of neuronal group selection, is a theory that proposes that the development and organisation of the brain is similar to the process of biological evolution. According to this theory, the brain is composed of a large number of neural networks that compete with each other for resources and survival, much like biological organisms competing for resources in their environment.
The main similarity between Neural Darwinism and evolution is that they both involve a process of variation, selection, and adaptation. In biological evolution, organisms with advantageous traits are more likely to survive and reproduce, passing those traits on to their offspring. Similarly, in Neural Darwinism, neural networks that are better able to compete for resources and perform necessary functions are more likely to be preserved and strengthened, while weaker or less effective networks are pruned away.
The core claims of Neural Darwinism[1]:
ND has little to say about how cognitive processes such as decision-making, problem-solving, and other executive functions exactly occur, but it provides a plausible basis for future developments. It has been mostly accepted (except for the criticism that it lacks “units of evolution”, replicators capable of hereditary variation[2]; I personally do not endorse this criticism and will address it in the Narrative Theory section) and has become part of a fruitful direction of research.
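To make the variation/selection analogy concrete, here is a toy simulation in the spirit of the description above (the “fitness” measure and all the numbers are made up purely for illustration):

```python
import numpy as np

# Toy illustration of the variation/selection dynamic described above: a population
# of "neuronal groups", each scored by how well it matches the current demands;
# the weaker half is pruned, the stronger half is duplicated with small variations.

rng = np.random.default_rng(3)
n_groups, n_features = 50, 8
groups = rng.normal(size=(n_groups, n_features))   # each row: one group's "tuning"
demand = rng.normal(size=n_features)               # what the environment currently asks for

for generation in range(20):
    fitness = groups @ demand                      # how well each group matches the demand
    survivors = groups[np.argsort(fitness)[n_groups // 2:]]          # selection: keep the better half
    offspring = survivors + 0.1 * rng.normal(size=survivors.shape)   # variation
    groups = np.vstack([survivors, offspring])

print(float((groups @ demand).mean()))             # mean fitness rises over generations
```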
G. M. Edelman. Neural Darwinism: The Theory of Neuronal Group Selection. https://psycnet.apa.org/record/1987-98537-000
C. Fernando, R. Goldstein, E. Szathmáry. The Neuronal Replicator Hypothesis. https://direct.mit.edu/neco/article-abstract/22/11/2809/7586/The-Neuronal-Replicator-Hypothesis
"Evolution is a tinkerer, not an engineer. It works with what is already there and takes the path of least resistance.
It is not always the most efficient solution, but it is the dumbest solution that works."
- François Jacob, "The Logic of Life: A History of Heredity"
Reverse engineering complex systems is a tricky problem. Look, for example, at the design of modern microprocessors: how easy would it be to see the underlying principle of the Turing machine behind all the caches, branch prediction, thread balancing, and the rest? Not that easy, I would say. This example might also be applicable to reverse engineering the mind. After peeling away all the evolutionary optimizations, the underlying principles of brain design might turn out to be modestly simple. Some of these optimizations have already been identified and well studied[1] (e.g., mechanisms for translating chemical signals into electrical ones, numerous structural decisions dedicated to spending as little wire and energy as possible, ways of getting information from different sensors onto the same frequency). While we have not yet succeeded in this quest, the idea of simple but powerful core principles should be part of our strategy for getting there.
Biology offers one more piece of the strategy. Although the overall amount of “design work” done by evolution is incredible, only a fraction of it is directly associated with the decision-making circuitry of the mind. The relatively slow pace of cognitive evolution means that there was not much time to reinvent the cognitive architecture between subsequent species. Meaning, the most necessary parts of the apparatus were already present in primates, a somewhat smaller part was already present in mammals, and so on.
The reasoning above, together with a liberal application of the Lindy Effect[2], justifies taking a certain stance towards reverse engineering the mind: the longer some specific design principle has been around, the more it is represented in the construction of the system, and the more emphasis we should put on it while building our models.
By this logic, we should expect the bulk of the design to be implemented via a tiny set of the oldest mechanisms (taking into account the timeframes at which each of them was introduced). The prime suspects for that set are:
3. Principles of Neural Design. Peter Sterling and Simon Laughlin. https://mitpress.mit.edu/9780262534680/principles-of-neural-design/
"Is an ant colony an organism, or is an organism a colony?"
- Mark A. Changizi
As of now, there are two kinds of evolution: genetic evolution and memetic evolution. The first one is your usual evolution, concerned with "change in the heritable characteristics of biological populations over successive generations", responsible for all the biological diversity that we know, and happening on the scale of at least hundreds of years. Memetic evolution, strictly speaking, is just a particularly powerful set of adaptations that appeared in primates (and is unique to them), which enabled the accumulation of adaptations within a lifetime; it is responsible for the cultural progress of humanity and happens on scales from minutes to years, depending on the definition.
The concept of the meme was coined by biologist Richard Dawkins in his 1976 book "The Selfish Gene"[1] and refers to units of cultural information that are transmitted from person to person through imitation or other forms of cultural transmission. Like genes in biological evolution, memes can undergo processes of variation, selection, and transmission that can lead to their spread or decline within a population.
Memetic evolution became possible after the introduction of several key mechanisms: the obvious suspects such as language and social learning; their dependencies like signaling (prelinguistic communication), {niche construction, extended phenotype}[2], scaffolded upbringing, and theory of mind; and the development of the necessary neural substrates enabling all these mechanisms (whatever they are).
The main benefit that the development of memetic evolution has brought is a drastic increase in problem-solving capacity (both at the level of the population and, more importantly for this post, at the level of the individual).
While the general dynamics remained the same (organisms being innovation aggregators), the details have changed:
Richard Dawkins. The Selfish Gene. https://www.goodreads.com/book/show/61535.The_Selfish_Gene?from_search=true&from_srp=true&qid=oWwQlQJHhQ&rank=1
Richard Dawkins. The Extended Phenotype. https://www.goodreads.com/book/show/61538.The_Extended_Phenotype?from_search=true&from_srp=true&qid=Ko5sX4zBtL&rank=1
The ultimate goal of this line of research is to gain a better understanding of how the human value system operates. The problem I see with current approaches to studying values is that we cannot study {values/desires/preferences} in isolation from the rest of our cognitive mechanisms, because according to the latest theories values are just a part of a broader system governing behaviour in general. Given that, you have to have a decent model of human behaviour first to then be able to explain value dynamics.
To get a good theory of the mind you have to meet multiple requirements:
To meet these requirements I’ve combined insights from several fields: Developmental Psychology, Neuroscience, Ethology, and Computational Models of the Mind. The result is the Narrative Theory. The research is still far from completion, but there are already interesting insights to be shared.
At this moment NT is similar to Shard Theory in many ways, but it also differs from it in many others: (1) NT tries to integrate “more distant” but still crucial perspectives (like ethology and linguistics). (2) It is concerned with the flow of development of human behaviour as a whole instead of focusing on values. And (3) NT is only concerned with human intelligence, for now ignoring the topic of artificial agents entirely.
It’s pretty audacious to say that one can make progress on something as big as a computational theory of human behaviour, but there are two things giving me hope of succeeding: (1) it’s been quite a while since the last wave of overarching psychological theories; (2) the last decades were a sort of divergent period of scientific inquiry (when it comes to studies of the mind): efforts have mostly been focused on puzzling out smaller pieces of The Problem, and there have been no serious attempts at updating previous theories with the newly found evidence (or even at integrating those theories with each other). Together, these suggest that there is now room for improvements to be made.
A note on vocabulary: each mentioned theory has its own unique language. This may present a problem for unprepared readers. While I will unpack and rephrase convoluted terms where possible, not everything can be stripped away.
This post is structured as follows: (the first section) is a list of constraints discovered by various mind-related fields that are crucial for building an overarching theory of mind; (the second section) presents the first claims of Narrative Theory, built in accordance with the known constraints; and (the third section) covers the implications of the theory, open problems, and future work directions.
David Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. https://academic.oup.com/mit-press-scholarship-online/book/13528
Agreed. That said, some efforts in this direction do exist, for example Ekdeep Singh Lubana and his Explaining Emergence in NN with Model Systems Analysis:
https://ekdeepslubana.github.io/