Neural Basis for Global Workspace Theory

Hazard

Epistemic Status: Intense amateur neuroscience here. Hoping to leverage Cunningham's Law to reach enlightenment.

Kaj Sotala has a great sequence on Multiagent Models of the Mind, a sequence that's led to a lot of fun developments in how I think about minds in general. It also introduced me to Global Workspace Theory, one of the current mainstream theories of consciousness.

When studying the mind, you can attack from different levels of abstraction, from the lowest level of studying the anatomy of neurons all the way up to postulating abstract cognitive algorithms that people might use in their thought. Kaj's sequence lives mostly on a mid-tier level ("imagine a system that works like this"), and only dips into the neuroscience enough to give you a sense that someone has in fact looked into the neural plausibility of the idea. I think this was a good decision, as most of the interesting ideas come at the mid-tier functional level ("If you think of your mind as composed of subagents communicating through a global workspace, you'd expect ABC behavior").

This post details some of the lower-level brain anatomy I've been investigating in an attempt to clear up some confusions I've had from thinking a lot about Global Workspace Theory and how it relates to consciousness and Predictive Processing. Specifically, it looks at the neural basis for attention mechanisms, and the neural basis for the Global Workspace. Most of the details come from this paper, and this paper.

The Tangential Intracortical Network (TIN) (i.e. the Global Neuronal Workspace, GNW?)

People like Jeff Hawkins assert that every part of the neocortex is running the same algorithm. Even people that don't go as far as him note that there's a lot of uniformity in the neocortex (it looks like it's mostly composed of tons of cortical columns).

Most of the connections in the cortex go from one part of a column to another part of the same column. There's also plenty of cortex to [non cortex part of the brain] connections. Most of the intracortical connections are part of a big web of mid-range fibers that span the whole neocortex, called the Tangential Intracortical Network (TIN) by Baars and Newman. They point to it as a plausible physical basis for the GNW. We'll run with that.

Functionally, the global workspace is an area that disparate parts of the cortex can all compete to put a value on. This competition is winner-takes-all, and only one value can be on the network at a time. Once a value is on the network, the rest of the cortex is able to read the value, thus serving as a temporary "global state", hence the name.

I'm not exactly sure how the TIN implements this functionality, but I'm imagining it as resulting from fairly mechanistic network dynamics. Something like:

If multiple areas of cortex are active, they'll all automatically be sending signals on the TIN.
All the signals will briefly propagate, but because of [network dynamics magic] even slight differences in signal strength or random chance will lead to one overwhelming the others and taking over the entire network.
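Here's a minimal toy sketch of what that [network dynamics magic] could look like, and it is only a toy. The square-and-renormalize rule, the function name `compete`, and the starting strengths are all assumptions made up for illustration; none of it comes from the papers or says anything about real cortex. The point is just that a rich-get-richer update on a network with fixed total capacity turns a slight edge into total dominance without anything "choosing" a winner.

```python
def compete(signals, rounds=50, threshold=0.99):
    """Toy winner-take-all: every round, square each share and renormalize.

    `signals` are positive starting strengths (hypothetical units).
    Returns (index of the dominant signal, final shares of the network).
    """
    total = sum(signals)
    shares = [s / total for s in signals]  # fixed total capacity of 1.0
    for _ in range(rounds):
        squared = [s * s for s in shares]  # stronger signals grow faster
        norm = sum(squared)
        shares = [s / norm for s in squared]
        if max(shares) >= threshold:  # one signal has taken over
            break
    winner = shares.index(max(shares))
    return winner, shares


# Three nearly equal signals; the one at index 1 has a slight edge.
winner, shares = compete([0.30, 0.32, 0.31])
print(winner, [round(s, 3) for s in shares])
```

Exact ties never resolve under this rule, which is where the "random chance" part would have to come in.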

The important aspect of this to me is the mechanistic nature of the competition. A winner is not "chosen"; one signal simply beats out the others. Though to be fair, when we get to the thalamus, we'll find that the thalamus has a lot of connections to the cortex that seem capable of signal boosting a chunk of the cortex, allowing it to dominate and become the contents of the GNW. This creates an indirect route through which the contents of the GNW can be manipulated.

Basal Ganglia (BG) and "Action" Selection

The BG is pretty solidly understood to be central to "action" selection. The air quotes are because the BG also seems to do a selection operation on inputs from places that aren't motor regions (like the prefrontal cortex). So it's capable of doing a winner-takes-all selection of various abstract and concrete "cognitive actions".

There are a few things that make competition on the BG different from competition on the GNW. First, the BG has several different selection channels that can act in parallel. There are at least five different loops that all follow the pattern of: cortex projecting onto BG, which uses the thalamus to give a go-ahead to the cortex.

The second difference is that action selection in the BG makes use of previously learned rewards. It basically seems to be doing the evidence accumulation that Kaj outlines in Subagents, neural Turing Machines, thought selection, and blindspots. Multiple subsystems (chunks of cortex) put their plans on the BG. The first option whose accumulated expected reward exceeds some threshold is chosen. Compare this to the mechanistic network dynamical magic of the GNW.
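The race-to-threshold idea can be sketched in a few lines. This is a hedged illustration, not a model of the basal ganglia: the function name `select_action`, the plan names, the reward numbers, and the threshold are all hypothetical. Each plan gains evidence in proportion to its learned expected reward, and the first plan over the threshold is selected.

```python
def select_action(expected_rewards, threshold=1.0, max_steps=1000):
    """Race to threshold: accumulate evidence per plan until one crosses.

    `expected_rewards` maps a plan name to its per-step evidence gain
    (a stand-in for previously learned reward; the values are made up).
    Returns (selected plan, number of steps), or (None, max_steps).
    """
    evidence = {plan: 0.0 for plan in expected_rewards}
    for step in range(1, max_steps + 1):
        for plan, reward in expected_rewards.items():
            evidence[plan] += reward
        leader = max(evidence, key=evidence.get)
        if evidence[leader] >= threshold:
            return leader, step
    return None, max_steps  # nothing ever accumulated enough evidence


# A clear favorite is selected quickly...
print(select_action({"reach_for_snack": 0.25, "keep_working": 0.125}))
# ...while weakly supported options take longer to resolve.
print(select_action({"option_a": 0.0625, "option_b": 0.03125}))
```

Unlike the GNW toy dynamics, nothing here spreads or takes over a network. Selection is just a counter per plan and a comparison against a threshold.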

Third difference is what happens after the "selection" occurs. In the GNW, being "selected" just means you are the signal currently dominating the GNW. This results in the rest of the cortex being able to use your value as an input. With the BG, there seem to be two possible results; either the thalamus is used to boost you into taking over the GNW (like what happens when a production rule fires in the neural Turing machine model), or the BG can use the thalamus to route your plan to another part of the cortex. This seems to be what happens with motor actions. High-level action plans made in the frontal cortex are approved by the BG and routed to the motor cortex which creates the implementation details with the help of the cerebellum.

The difference between competition on the BG and competition on the GNW also accounts for one discrepancy Kaj mentions:

There seems to me to be a conceptual difference between the kinds of actions that change the contents of consciousness, and the kinds of actions which accumulate evidence over many items in consciousness (such as iterative memories of snacks). Zylberberg et al. talk about a “winner-take-all race” to trigger a production rule, which to me implies that the evidence accumulated in favor of each production rule is cleared each time that the contents of consciousness is changed. This is seemingly incompatible with accumulating evidence over many consciousness-moments, so postulating a two-level distinction between accumulators seems like a straightforward way of resolving the issue.

Something being on the GNW can boost evidence accumulation at the basal ganglia, which is maintained across changes in the contents of GNW.
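A rough sketch of that two-level arrangement, under loud assumptions: the function name `accumulate_across_moments`, the workspace contents, and the boost values are all invented for illustration. The workspace holds one item per moment and is overwritten each time, while the evidence counters (standing in for the BG) persist across those changes.

```python
def accumulate_across_moments(gnw_moments, boosts, threshold=1.0):
    """Evidence persists while the workspace contents keep changing.

    `gnw_moments` is the sequence of items that occupy the workspace.
    `boosts` maps a workspace item to {plan: evidence gain} (made-up values).
    Returns (selected plan, moment index), or (None, number of moments).
    """
    evidence = {}  # survives across moments, unlike the workspace contents
    for t, content in enumerate(gnw_moments, start=1):
        for plan, gain in boosts.get(content, {}).items():
            evidence[plan] = evidence.get(plan, 0.0) + gain
        ready = [plan for plan, e in evidence.items() if e >= threshold]
        if ready:
            return max(ready, key=evidence.get), t
    return None, len(gnw_moments)


moments = ["memory_of_cookies", "email", "memory_of_chips", "memory_of_cookies"]
boosts = {
    "memory_of_cookies": {"get_snack": 0.5},
    "memory_of_chips": {"get_snack": 0.25},
    "email": {"keep_working": 0.25},
}
print(accumulate_across_moments(moments, boosts))
```

The snack plan gets selected even though the workspace was taken over by an email in the middle, because the evidence counters were never cleared.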

Understanding the role of the BG was big for me, because it helped make a lot more sense of where you do and don't expect to find bottlenecks. The BG can be choosing and routing actions that are being proposed by the cortex without having to wait to use the GNW. If there's a clear and obvious winner, the BG just chooses the right action and sends it along. It's only in novel situations that no action has a clear expected reward edge, and that's exactly when you'd expect someone's conscious attention to be on high, searching for any sliver of information that could push them towards a decisive action!

Also, don't forget that the GNW can only do one thing at a time, whereas the BG has multiple selection channels.

The Thalamus: router/central hub

Though the TIN is supposed to be the network that is the GNW, the thalamus is what allows for the guiding and management of attention. There are three attention-esque things that the thalamus seems to do:

Allow for sensory gating, controlling what sense data even makes it to the cortex for higher level processing.
Allow for affecting the contents of the global neuronal workspace.
Allow for routing information between disparate chunks of cortex.

The following is a brief overview of the thalamus that will help us build up to understanding its role in these three functions.

The thalamus is a central hub that all cortex-bound sense data has to pass through, making it a prime suspect for some sort of attentional control system. An important note: though the cortex is where most of the "intelligent" processing of data happens, it's not the only place that processes sense data. All sensory channels also connect directly to the brain stem. For vision, the optic nerve is connected to the thalamus (specifically the lateral geniculate nucleus) and also connected to the midbrain (specifically the superior colliculus), whereas with audition all data goes hindbrain -> midbrain -> thalamus.

This is important for thinking about voluntary and involuntary control of attention and brain processing; we're going to be talking about attention mechanisms that act on the thalamus, which means these mechanisms hold no sway over what data the brainstem receives. One prediction I'd make from this: we see things where inattention can lead to ignoring huge changes in your environment. But the things that change in those experiments are visual features that are processed in the cortex. If you had a small black dot scuttling across the screen (what your brainstem uses to trigger the "AAAH SPIDERS" reflex), I'd bet people would still have a startle reflex even if they weren't paying attention.

Here's a picture of the thalamus, split into its various parts (the names aren't super important for the post, I just like visuals).

It's common to split the thalamus into "first-order" and "higher-order" sections. The first-order parts (also called relay sections) don't interconnect that much, and mostly just shuttle their sensory data off to the cortex. The lateral geniculate nucleus routes most of the visual data, the medial geniculate nucleus routes most of the auditory data, and other parts do other things. I think of them as mostly inert pipes that just transmit whatever is coming from their fixed input location. The way that these relays connect to the cortex is highly organized; neurons that are close together in lateral geniculate nucleus space are close together in retina space and get mapped close together in visual cortex space.

In contrast, the higher-order nuclei are much more interconnected, projecting out of the thalamus in more diffuse patterns, and receiving their inputs from various chunks of cortex as opposed to from sensory channels. Some of these are called association nuclei and seem to allow for routing info from one chunk of cortex to another, and others are called nonspecific nuclei and project out to the cortex in a very diffuse manner. The former seem to be important for cortex to cortex routing, and the latter seem to be important for influencing the global neuronal workspace.

Cortex to cortex routing is used in the execution of motor plans that we talked about with the BG. The diffuse connections to the cortex are used to signal boost things onto the GNW.

For every connection taking data from the thalamus to the cortex, there are several reverse connections connecting the cortex to the thalamus. This sort of re-entrant feedback is what makes advanced control loops possible.

The Thalamic Reticular Nucleus (TRN): the gates of the thalamus

The thalamus has a "shell" around it called the thalamic reticular nucleus. It's composed of a web of inhibitory neurons, and all of the thalamus's outgoing axons pass through it. If a chunk of the TRN gets activated, it will block outgoing signals. You can think of the TRN as being composed of a bunch of little gates that can all be triggered to block outgoing data.

Since neurons spike over time (as opposed to logic gates, which maintain ON or OFF), these gates are controlled by maintaining standing waves across the TRN. Baars and Newman describe it as being capable of acting as a Fourier filter, selectively blocking outgoing signals that don't match the frequency of the filtering standing wave. This is really cool because it allows a lot more fine-grained control than "don't let visual data in". The parts of the brain that control the TRN can learn what sorts of oscillations block what sorts of sense data. You won't get filtering at the highest conceptual levels ("block incoming visuals of bats"), but you can get more sophisticated than simple topographic blocking like "ignore everything in my peripherals".
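To make "only signals matching the filter frequency get through" concrete, here is a small sketch using a single-bin discrete Fourier transform. Everything specific in it is an assumption for illustration: the function names, the 40 Hz and 10 Hz frequencies, the 1 kHz sample rate, and the pass threshold are all made up, and real TRN gating is surely not this tidy.

```python
import math


def component_amplitude(samples, freq_hz, sample_rate):
    """Amplitude of the `freq_hz` component (single-bin discrete Fourier transform)."""
    n = len(samples)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(samples))
    return 2 * math.hypot(re, im) / n


def passes_gate(samples, gate_hz, sample_rate, min_amplitude=0.5):
    """A gate tuned to `gate_hz` blocks signals lacking energy at that frequency."""
    return component_amplitude(samples, gate_hz, sample_rate) >= min_amplitude


RATE = 1000  # samples per second (hypothetical)
# One second each of a 40 Hz signal and a 10 Hz signal (frequencies are made up).
fast = [math.sin(2 * math.pi * 40 * i / RATE) for i in range(RATE)]
slow = [math.sin(2 * math.pi * 10 * i / RATE) for i in range(RATE)]

print(passes_gate(fast, 40, RATE))  # matches the gate frequency, so it passes
print(passes_gate(slow, 40, RATE))  # does not match, so it is blocked
```

Changing `gate_hz` changes which signal gets through without touching the signals themselves, which is the sense in which whatever controls the TRN could learn which oscillations block which sense data.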

There are at least three areas of the brain that plug into the TRN that control how it filters thalamic output.

Prefrontal Cortex (PFC): responsible for the "executive" attention; attending to something because you want to, noticing stuff that's relevant to your goals, etc.
Midbrain reticular formation (MRF): responsible for attending to novel and dangerous stuff.
Posterior Cortex (PC): responsible for attending to stimuli similar to what you were just attending to, creating a sort of recency bias.

All of these are capable of exerting control over the TRN and the thalamus. Sensory gating is what happens when the TRN is filtering out certain sense data, preventing it from ever reaching the cortex. This is very different from GNW attention, where various data is all being processed in the cortex, but only one value is being held in attention at any given moment.

Once again, just like we split apart "selection" into the BG and the GNW, we can split the phenomenon of "attending to something" into having it on the GNW, and sensory gating.

I'd expect GNW attention to allow for things like the cocktail party effect, and being completely zoned out of driving, yet upon being prompted being able to recall the last few seconds of detail. Basically any situation where you weren't paying attention to something, but when you do you find details that imply you must have already been processing the situation.

Sensory gating seems to correspond to having your attention broadened or narrowed. Being intensely focused on a math problem and then jumping out of your skin when your roommate taps you on your shoulder; putting down your phone and feeling like you've suddenly been flooded with all this space. A wilderness first aid instructor I know who spent a lot of time as a kid coon hunting told me a story about a similar phenomenon in dogs. When they get really into a chase, their senses "shut off" in order of their least to most important as they get more and more aroused; first the hearing goes, then vision, leaving them navigating only by smell.
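The two kinds of attention can be put side by side in a tiny sketch. The function name `perceive`, the stimuli, and the salience numbers are hypothetical, and picking the most salient item is only a stand-in for the GNW competition. Sensory gating removes data before the cortex ever sees it, while GNW attention picks one item out of everything the cortex is already processing.

```python
def perceive(stimuli, gated_channels=frozenset()):
    """Return (what reaches the cortex, what holds the workspace).

    `stimuli` is a list of (channel, label, salience) tuples.
    `gated_channels` are sensory channels blocked at the thalamus.
    """
    # Sensory gating: blocked channels never reach the cortex at all.
    reached_cortex = [s for s in stimuli if s[0] not in gated_channels]
    # GNW attention: one item dominates, the rest is still being processed.
    in_workspace = max(reached_cortex, key=lambda s: s[2], default=None)
    return reached_cortex, in_workspace


stimuli = [
    ("audio", "conversation partner", 0.9),
    ("audio", "your name across the room", 0.6),
    ("vision", "roommate approaching", 0.4),
]

# Cocktail party: your name is not in the workspace, but it did reach the cortex.
print(perceive(stimuli))
# Narrowed attention: with vision gated, the roommate never reaches the cortex.
print(perceive(stimuli, gated_channels={"vision"}))
```

In the first call the unattended items are still available to be noticed a moment later. In the second, the gated item is simply gone, which fits the jump-out-of-your-skin experience when something finally breaks through.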

Outro

I'm confident in the broad strokes of the neuroscience here, but I am certainly wrong about a large number of the details. When writing this I was really faced with the magnitude of everything I just don't understand about the brain. I often questioned why I was investigating the neuroscience if I was only going to do an amateur job, and when I was really interested in the more abstract implications.

Despite all that, this has been really helpful for getting a sense of better questions to ask.

Understanding the difference between competition on the GNW and competition on the BG was really useful. Understanding the difference between attention via sensory gating and attention via the GNW was also really useful. If you've been following along with Kaj's sequence, I think these are the two main takeaways.

References

Newman, James, Bernard J. Baars, and Sung-Bae Cho. "A neural global workspace model for conscious attention." Neural Networks 10.7 (1997): 1195-1206.

Newman, James, and Bernard J. Baars. "A neural attentional model for access to consciousness: A global workspace perspective." Concepts in Neuroscience 4.2 (1993): 255-290.

Models of Thalamocortical System: scholarpedia

Kaj_Sotala, 5y

Nice! That certainly clarifies things. :) Mind if I edit my article to include a reference to your post?

Hazard, 5y

No problem!

Kaj_Sotala, 5y

Cool, done. :)

[EDIT: Hazard suggests that the two-level split is implemented by the basal ganglia carrying out evidence accumulation across changes in conscious content.]

Gordon Seidoh Worley, 5y

Thanks, I really enjoy these kinds of details about the brain. I really liked the level of detail you provided, as less would have left me wanting more and more would have probably been more than was necessary.

That's really useful feedback! Picking the level to write at was a challenge and it's good to hear that this worked for someone.

Signer, 5y

Functionally, the global workspace is an area that disparate parts of the cortex can all compete to put a value on. This competition is winner-takes-all, and only one value can be on the network at a time. Once a value is on the network, the rest of the cortex is able to read the value, thus serving as a temporary “global state”, hence the name.

What does it even mean for a network to have a global value? What's the evidence for that selection of winner always happening in TIN? Because it seems unnecessary for an explanation of conscious processing and attention when we already have a feedback loop with thalamus. Like, we get a visual input, it propagates through TIN, makes thalamus switch attention from external sensations to mental imagery, which when mixed with the current state of TIN after some iterations produces an action. Subliminal stimuli just don't make it to the feedback loop and therefore don't influence things very much.

Hazard, 5y

I don't know the concrete details about what "taking on a global value" looks like, but I visualize a grid (like in Kevin Simler's going critical post) that has a few competing colors trying to spread, and it seems reasonable that you could tweak the setting of the network such that very quickly one signal dominates the entire network.

But I don't know how to actually make something like that.

If you're interested in the TIN specifically, what I got from the paper was "here's a totally plausible candidate, and from what we know about self-organization in neural networks, it could totally do this functionality".

The biggest reason to think that there's something that's winner-take-all with a global value is to explain bottlenecks that won't go away. Intentional conscious thought seems to be very serial, and the neural Turing machine model does a decent job of showing how a global workspace is central to this. If there's no global workspace, and there's just the thalamus doing sensory gating, and routing chunks of cortex to each other, I'd expect to see a lot more multitasking ability.

Also, this is more a property than a constraint: if global communication works by routing, then everything that's routed needs to know where it's going. This makes sense for some systems, but I think part of the cool flexibility in a GNW architecture is that all of the cortex sees the contents of the GNW, and subsystems that compute with that as an input can spontaneously arise.

If there’s no global workspace, and there’s just the thalamus doing sensory gating, and routing chunks of cortex to each other, I’d expect to see a lot more multi tasking ability.

What if there is global workspace, but it doesn't hold one value? On some level it has to be true anyway - perception is not one-dimensional. And it all depends on definition (granularity) of task - if we need to explain why global workspace can't be dominated by page with half math problems and half story, then we can use the same explanation for why the state of workspace learned to not usually be like that. I can see how interconnectedness of workspace means all parts of input vector influence all of the workspace's state, and so you can't easily process different inputs independently, but can't you process combined input? Isn't it what happens, when you first just see something, then hear "Tell me what you see", and the action is produced because of what you see and hear?

Hazard, 5y

Some sort of "combination" seems plausible for perception. Baars actually mentions "The binding problem" (how is it that disparate features combine to make a cohesive singular perception) but I couldn't see how their idea addressed it.

This is actually one of the reasons I'm interested in looking for stuff that might be the "clock time" of any sort of bottleneck. Some amount of simultaneity of perception seems to be a post-production thing. The psychological refractory period relates to experiments where you see and hear something and have to respond, and one seems to block the other for a moment (I haven't investigated this in depth, so I'm not very familiar with the experimental paradigm). But there are other things that totally seem like simultaneously experienced modalities of perception. I wonder what sorts of experiments would piece apart "actually happening at the same time" from "rapid concurrent switching + post-production experience construction". I'm very interested in finding out.
