Follow-up to my Metaculus Journal post on Action Ontologies, Computer Ontologies
Cross-posted from Putanumonit


Minor Delays

Some major changes happened in my life recently:

Yes, I’ve been lifting, thank you for noticing.

We’ve also had a baby daughter!

She’s one of the reasons why I haven’t been writing much lately. It’s not that I’ve been overwhelmed with work — my wonderful mother-in-law has helped out and the baby spends most of her time sleeping or eating anyway, neither of which requires extraordinary effort on my part.

But I used to get anxious if at the end of the day I had accomplished little work, and sometimes that anxiety would push me to write late into the night to make up for it. Now at the end of each day, regardless of my productivity in other domains, I feel happy and satisfied that I’ve kept my child alive for another 24 hours. And what else do I need?

This post is a follow-up to my essay on Action Ontologies and Computer Ontologies. Go read that now and leave that tab open for later to browse some of the great writing on the Metaculus Journal.

Inferred Realities

My Metaculus essay focused on how our brains construct the reality we inhabit, and how that reality may be different for an AI. It expanded on three main points.

The first is that not a single neuron in our brain is labeled “tomato detector” or even “redness detector” or even “vision”. Our brains are born with a particular architecture (e.g., some neurons happen to be hooked up to light-sensitive cells in the retina), but all the components of our perception down to the very basics are merely inferences we learn over time.

Second: our brains evolved to guide action and keep our bodies in metabolic balance, so the model of the world our minds construct depends not only on our eyes but also on our hands and even our blood vessels. An artificial mind that doesn’t operate in a human body may require and develop a different model of reality with an entirely different ontology, even if it were pursuing a goal similar to ours, like driving a car.

A consequence of these two points is that “reality” itself is just a perception and an inference. A tomato on a table seems real because you can pick it up with your hands and absorb its nutrients in your stomach. The image of the tomato above doesn’t seem real because you can’t do either with it, even though the screen image generates the same light waves hitting your eye as a real tomato would. The reality you perceive in real objects is a property of the map, not the territory.

Permission to Speculate

These three points may seem wild and unintuitive, but they’re pretty well grounded in the contemporary science of how our brains work and how they produce the contents of our consciousness. I elaborated on them (with references) in the Metaculus essay so I won’t do so here.

The rest of this post is going to use those three points to speculate wildly on baby consciousness and AI, two topics I’m not an expert on but am learning a lot about every day, whether I want to or not.

Here’s roughly the conversation I had before writing this post with a friend who has a background in developmental psychology:

— I’m curious about the development of a newborn’s subjective experience. For example, what are the prerequisites for a child perceiving a world of separate and real objects?
— How would you even know what a baby experiences? We prefer to focus on what the baby can do and react to.
— So your baby behaviorism can’t disprove me if I speculate wildly on newborn consciousness?
— $#&%!@

As for my thoughts about AI, I ran them past a few friends who work in the field. One told me that they’re obviously true, one that they’re clearly false, and the rest shrugged. So this post isn’t quite in clown on fire territory, but someone should probably check the clown for smoke just to be sure.

World From Scratch

You’re a brain trying to predict its own state, and all you have is some innate wiring. How do you start? By noticing simple patterns, and then building up to complex ones.

A simple pattern: some neurons seem to fire together regardless of what else is happening in the brain. Perhaps when (unknown to you) the light goes on in the room, all the neurons connected to the retina experience a burst of activity, which propagates to other neurons in the visual cortex but not to the insular cortex. Your brain infers that they belong to the same sense modality (in this example, one you’ll later call “vision”). Learning to predict the state of your visual cortex neurons (for example, by anticipating the state of one part based on the state of another) is learning to see.

Once vision is established as a sense, your brain notices that retinal cells right next to each other often receive the same input — our visual field is composed mostly of monochromatic surfaces and not of random visual noise. The simpler patterns are combined into more complicated ones, such as edges and surfaces and then faces and other objects.
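To make the statistical point concrete, here’s a toy sketch (the channel counts, noise levels, and threshold are invented for illustration, and nothing here pretends to model real neurons): given nothing but unlabeled sensor channels, raw correlation is enough to sort them into “modalities”.

```python
# Toy sketch: with no labels at all, correlation structure alone is enough to
# group unlabeled sensor channels into "modalities". All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
T = 5000  # time steps

# Two hidden causes the "brain" never observes directly.
light = rng.random(T) < 0.5    # room light on/off
hunger = rng.random(T) < 0.3   # metabolic state

# Ten unlabeled channels: the first six driven by light, the rest by hunger.
signals = np.vstack(
    [light + 0.5 * rng.standard_normal(T) for _ in range(6)]
    + [hunger + 0.5 * rng.standard_normal(T) for _ in range(4)]
)

corr = np.corrcoef(signals)

# Greedy grouping: a channel joins a group if it correlates strongly with
# every channel already in it.
threshold = 0.4
groups = []
for ch in range(signals.shape[0]):
    for g in groups:
        if all(corr[ch, other] > threshold for other in g):
            g.append(ch)
            break
    else:
        groups.append([ch])

print(groups)  # expected: [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9]]
```

The “vision-like” channels end up in one cluster and the “interoception-like” channels in another, without anything ever being labeled a sense.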

Some patterns seem very important. For example, an experience of hunger tends to take over the entire brain and it correlates with observations such as finding yourself crying loudly and flailing your limbs. Hunger propagates an error signal throughout the brain, as if an important built-in expectation of the brain’s state is being violated. Anything that correlates with hunger gets more attention paid to it.

This is how the baby learns about the most important thing in its world: its mom. Roughly every three hours from the moment a baby is born, it receives a highly-correlated multi-sensory signal that consists of face, voice, smell, warmth, milk, and the sensation of hunger subsiding.

Mom-recognition is so important it comes long before self-recognition. At a month old, my daughter responds to my wife’s voice but is seemingly unaware that the fist that sometimes punches her nose is her own. Infants recognize themselves in the mirror only around 18 months of age, but they recognize their primary caregiver on sight at 3-4 months.

Conceptual Learning

A child builds up their world model from observing regular patterns, but this only works for simple things that form a clear cluster (like human faces). We live, however, in a world of complex narratives and mental constructs. The way humans learn these concepts is not through mere pattern detection but through language. In particular, through concept-teaching speech from their primary caregiver.

Here’s an excerpt from How Emotions Are Made by Lisa Feldman Barrett, which explores this in great detail:

The developmental psychologists Sandra R. Waxman and Susan A. Gelman, leaders in this area of research, hypothesize that words invite an infant to form a concept, but only when adults speak with intent to communicate: “Look, sweetie, a flower!” […]

Fei Xu and her students have demonstrated this experimentally by showing objects to ten-month-old infants, giving the objects nonsense names like “wug” or “dak”. The objects were wildly dissimilar, including dog-like and fish-like toys, cylinders with multicolored beads, and rectangles covered in foam flowers. Each one also made a ringing or rattling noise. Nevertheless, the infants learned patterns. Infants who heard the same nonsense name across several objects, regardless of their appearance, expected those objects to make the same noise. Likewise, if two objects had different names, the infants expected them to make different noises. […]

From an infant’s perspective, the concept “Wug” did not exist in the world before an adult taught it to her. This sort of social reality, in which two or more people agree that something purely mental is real, is a foundation of human culture and civilization. Infants thereby learn to categorize the world in ways that are consistent, meaningful, and predictable to us (the speakers), and eventually to themselves. Their mental model of the world becomes similar to ours, so we can communicate, share experiences, and perceive the same world.

How Emotions Are Made goes on to argue (quite convincingly) that our emotions such as anger and pride are not innate but are concepts taught to us by others when we are children. Consider that what unites such disparate instances of “anger” as a child screaming and tossing toys around and an adult coldly seething in the presence of their boss is mostly mental inference about the goals and subjective experience of the angry person, not any objective “signature” in their appearance or sound.

After childhood we keep on learning concepts from other people (mostly from Scott Alexander) and these concepts make up the world we perceive. But getting this process jumpstarted requires a very particular sequence of steps:

  1. The chaotic mind of a newborn
  2. Telling the senses (internal and external) apart
  3. Recognizing basic percepts in key senses
  4. Recognizing primary caregivers through the significant multi-sensory experience of interacting with them
  5. Learning to pay particular attention to your caregivers’ speech among all the ambient sounds and separating it into words
  6. Identifying “concept teaching mode” in adults’ speech and using these learned concepts to acquire distinct percepts

One important thing about this sequence is that it’s very particular to human children growing up in a society of humans. Nothing remotely like this exists for dogs, or octopuses, or for any current AI architecture. This doesn’t mean that AIs can’t learn concepts in principle, but it means that if they do they’ll have to learn them in a very different way from how humans manage it.

GPT is a Wordcel

A long time ago in a land far far away, a team of AI researchers wanted to train a text-prediction AI. They fed it relatively unfiltered text produced by a wide variety of people on the internet. The resulting model was very good at predicting the sort of text people would write online, including all their biases and bigotry and “misinformation”.

The mainstream journalists in that land, who were generally hostile to and fearful of emerging technologies, demanded that language models be trained only on text free of all bias and falsity. That is, that they be trained only on the recent contents of mainstream journalistic publications. The resulting language model was able to write excellent columns for mainstream papers but was useless for any other task. And so the people of that land fired their journalists and replaced them with GPT, and all lived happily ever after.

Ok, so GPT-3 can apparently write newspaper articles. But we already knew that these often consist of nothing but fnords and a randomly arranged set of affect-laden words. Could GPT-5 write a Scott Alexander essay? Could it actually think in concepts, combine them in original ways, and invent new ones that refer to things in the real world?

This is either an existential question or merely a trillion dollar question, depending on whether you’re an AI pessimist or optimist. (What, you couldn’t destroy the world and/or make a trillion dollars if you had access to a billion silicon Scott Alexanders running at hyperspeed?) I can’t give a definite answer to this question, of course, but I can offer a few intuitions in both directions.

The first thing you could ask is, isn’t GPT-3 close enough already? My intuition is that it isn’t, because I can write like GPT-3 and I can also write in a completely different way and those feel qualitatively different and produce different outputs.

GPT-3 predicts missing words or phrases given some context, which is something that humans can do easily on autopilot. “Emma walked into the park and saw ____ and then she ______”. “Elon Musk’s statement was condemned by ____ who said that it ______”. As Sarah Constantin noted, humans can also skim-read this sort of text, whether generated by GPT or an ad-libbing human, and they get the general gist without noticing gaps in logic. This sort of writing, stringing together symbols that kinda fit together but don’t actually convey much substance, was the original meaning of the now-viral word wordcel.
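If you want to see what this autopilot prediction looks like up close, here’s a minimal sketch using the freely downloadable GPT-2 through the Hugging Face transformers library (not GPT-3 itself, and the sampling settings are just placeholders), with the prompt taken from the paragraph above:

```python
# Minimal sketch of the fill-in-the-blank task, using GPT-2 as a stand-in for
# GPT-3 (assumes `pip install transformers torch`).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Emma walked into the park and saw"
for out in generator(prompt, max_length=30, num_return_sequences=3, do_sample=True):
    print(out["generated_text"])
# Each completion reads plausibly on a skim: locally fluent continuations,
# with no guarantee that any underlying idea is being conveyed.
```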

I can do GPT-style wordcelry and also a different type of writing, one in which I think of ideas that I then translate to words. This process requires concentrated System 2 attention to both write and parse, while GPT wordcelry can be done by System 1 alone. Individual sentences or paragraphs can then be filled in automatically, making up a large portion of the text perhaps, but you can’t generate an entire Scott Alexander essay on autopilot.

But will GPT-N be able to do it even if GPT-3 can’t?

Robot Shortcuts

The argument in favor is that the GPTs have been getting better and better at generalized text prediction simply through increasing the number of model parameters and tokens in the training data. “Scott Alexander wrote about the surprising connection between Nazifurs and heterotic string theory, explaining that ____” seems like the sort of text prediction task GPT is getting really good at without requiring a wholly new architecture.

Why would that change? Going from simple patterns to complex, abstract ones, we see that:

  1. Even older models are basically perfect at stringing letters together in a word, aka spelling. You only need to see a word in text a few times to learn how it’s spelled.
  2. GPT-3 rarely makes mistakes in stringing words together in a sentence, aka grammar. You probably don’t need more than a few dozen examples of a word in a sentence to figure out how the word fits grammatically.
  3. GPT-3 is fairly good at stringing sentences logically together in a paragraph. It got much better at this than GPT-2, an improvement that required going from millions of words in the training corpus to hundreds of billions.
  4. GPT-3 doesn’t yet do a good job stringing paragraphs together with purpose to say anything new and meaningful. How many words of training data will it take to get there?

The answer to the last question may be “only a few more”, in which case GPT-4 will take over this blog and most human writing jobs in the near future. Or the answer could be “orders of magnitude more words than humans have written so far”, in which case better language models would have to do something other than brute forcing their way through undifferentiated text dumps. They’ll need “learning shortcuts” telling them what to pay attention to and what to ignore.

Human babies have this shortcut. When a caregiver (identified through their consistent multi-sensory impact on the baby’s body budget) utters the syllables “look honey, it’s a…”, the baby knows that the string of sounds it will hear next is worth paying much closer attention to than the million other sounds it has heard that day. For AI, this shortcut would probably have to be built as opposed to emerging naturally from bigger and bigger training sets.
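Purely as speculation about what “building the shortcut” could mean, here’s one crude sketch: upweight the training loss on tokens that follow a trusted concept-teaching cue, the way “look honey, it’s a…” flags the next word for a baby. The cue token id, the boost factor, and the whole setup are invented for illustration; this is not anyone’s actual training recipe.

```python
# Speculative sketch: weight the language-modeling loss more heavily on tokens
# that follow a "concept-teaching" cue. CUE_ID and BOOST are invented numbers.
import torch
import torch.nn.functional as F

CUE_ID = 7      # hypothetical id of a "look honey, it's a..." marker token
BOOST = 10.0    # how much more the flagged tokens count (arbitrary)

def weighted_lm_loss(logits, targets):
    """logits: (batch, seq, vocab); targets: (batch, seq) next-token ids."""
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)

    # A position gets the boost if the *previous* target token was the cue.
    cue_seen = torch.zeros_like(per_token)
    cue_seen[:, 1:] = (targets[:, :-1] == CUE_ID).float()
    weights = 1.0 + (BOOST - 1.0) * cue_seen
    return (per_token * weights).sum() / weights.sum()

# Smoke test on random data.
logits = torch.randn(2, 5, 50)
targets = torch.randint(0, 50, (2, 5))
print(weighted_lm_loss(logits, targets))
```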

Will AI Infer Reality?

Speaking of Scott Alexander essays, I was surprised by this section in his review of the Yudkowsky-Ngo AI debate, emphasis mine:

I found it helpful to consider the following hypothetical: suppose you tried to get GPT-∞ — which is exactly like GPT-3 in every way except infinitely good at its job — to solve AI alignment through the following clever hack. You prompted it with “This is the text of a paper which completely solved the AI alignment problem: ___ ” and then saw what paper it wrote. Since it’s infinitely good at writing to a prompt, it should complete this prompt with the genuine text of such a paper. A successful pivotal action!

I disagree that GPT’s job, the one that GPT-∞ is infinitely good at, is answering text-based questions correctly. It’s the job we may wish it had, but it’s not, because that’s not the job its boss is making it do. GPT’s job is to answer text-based questions in a way that would be judged as correct by humans or by previously-written human text. If no humans, individually or collectively, know how to align AI, neither would GPT-∞ that’s trained on human writing and scored on accuracy by human judges.
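A toy way to see the objective mismatch (the “corpus” and its counts are obviously made up): a model that perfectly matches its training text reproduces that text’s mix of claims, true and false alike.

```python
# Toy illustration: a model that perfectly matches its corpus assigns
# probabilities by frequency in that corpus, not by truth.
from collections import Counter

corpus = (
    ["the earth is round"] * 7
    + ["the earth is flat"] * 3   # human text contains falsehoods too
)

continuations = Counter(text.rsplit(" ", 1)[1] for text in corpus)
total = sum(continuations.values())

for word, count in continuations.items():
    # A perfect predictor of this corpus says P(round)=0.7, P(flat)=0.3;
    # it reports what people wrote, not what is correct.
    print(f"P({word} | 'the earth is') = {count / total:.1f}")
```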

Humans aren’t born knowing that physical reality exists just because we live in it. We have to slowly infer it, and eating the wrong (or right?) mushroom can even make us forget this fact temporarily. An AI born and raised in the world of human text could in principle learn to infer that physical reality is a thing and what its properties are, but it’s not a given that this will happen.

The main goal of this post isn’t to make strong claims about what AI might or might not do, but to dispel the anthropocentrism of how many people (including me until quite recently) think about possible minds. We humans think in concepts and perceive ourselves as occupying physical reality, and we take these two things for granted. But we weren’t born doing either, as the newborn on my lap can attest.


A parting thought: you just read two posts that seemed full of ideas and concepts. Did I manage to actually convey something meaningful to you or did I just wordcel 5,000 nice-sounding words together? How would you be sure?

Comments

I disagree that GPT’s job, the one that GPT-∞ is infinitely good at, is answering text-based questions correctly. It’s the job we may wish it had, but it’s not, because that’s not the job its boss is making it do. GPT’s job is to answer text-based questions in a way that would be judged as correct by humans or by previously-written human text. If no humans, individually or collectively, know how to align AI, neither would GPT-∞ that’s trained on human writing and scored on accuracy by human judges.

This is actually also an incorrect statement of GPT's job. GPT's job is to predict the most likely next token in the distribution its corpus was sampled from. GPT-∞ would give you, uh, probably with that exact prompt a blog post about a paper which claims that it solves the alignment problem. It would be on average exactly the same quality as other articles from the internet containing that text.

Did I manage to actually convey something meaningful to you or did I just wordcel 5,000 nice-sounding words together? How would you be sure?

I think you can actually judge that by the value/effort balance of the communication.

I see a kind of spectrum between teaching and... let's call it meditation (as in "meditate on X"), where both can convey meaningful ideas and concepts, but the latter takes much more effort to get anything useful, and yields more random results.

With teaching, I'm probably getting all the intended ideas on my first interpretation, and then it's reliably useful to me. With meditation, I have to bring a lot more effort and ideas myself to try to get something meaningful out of it, and I might still be wrong. You can pick any random sentence and ponder it as a koan, free-associating about it until you feel like you learned or realized something useful. But that'll feel very different from just going on wikipedia and learning something useful.

It is a spectrum though. GPT-3 doesn't give random sentences, but when I play around with it and occasionally find something "useful," it feels more like I'm doing the koan thing (more effortful). Reading good blog posts is much less effortful per value added.

I hope that some of the "new" parts of what you are trying to convey are 

“reality” itself is just a perception and an inference

and 

The reality you perceive in real objects is a property of the map, not the territory.

They are not new, by any means, but the idea that "reality" can be perceived more or less by "Direct Observation" seems to hinder a lot of discourse here. The territory may be hiding in the mist somewhere, but we have to contend with the observation that our perception, logic etc. are models all the way down. And different stacks of models end up inferring completely different "realities," as we constantly observe, but anti-memetically ignore.