Michael Edward Johnson

In defense of David’s point, consciousness research is currently pre-scientific, loosely akin to alchemy in the 1400s. Fields become scientific as they settle on a core ontology and a methodology for generating predictions from that ontology; consciousness research presently has neither.

Most current arguments about consciousness and uploading are thus ultimately arguments by intuition. Certainly an intuitive story can be told for why uploading a brain and running it as a computer program would simply transfer consciousness along with it, but we can also tell stories where intuition pulls in the opposite direction; e.g. see Scott Aaronson’s piece at https://scottaaronson.blog/?p=1951 and my former colleague Andres’s paper arguing against computationalist approaches at https://www.degruyter.com/document/doi/10.1515/opphil-2022-0225/html

Of the attempts to formalize the concept of information flow and its relevance to consciousness, the most notable is probably Tononi’s IIT (currently on version 4.0). However, for technical reasons relating to his theory, Tononi himself believes computers could be only minimally conscious, and only in a highly fragmented way. Excerpted from Principia Qualia:

>Tononi has argued that “in sharp contrast to widespread functionalist beliefs, IIT implies that digital computers, even if their behaviour were to be functionally equivalent to ours, and even if they were to run faithful simulations of the human brain, would experience next to nothing” (Tononi and Koch 2015). However, he hasn’t actually published much on why he thinks this. When pressed, he justified the assertion by reference to IIT’s axiom of exclusion – this axiom effectively prevents ’double counting’ a physical element as part of multiple virtual elements – and by noting that when he ran a simple neural simulation on a simple microprocessor and looked at what the hardware was actually doing, a lot of the “virtual neurons” were being run on the same logic gates (in particular, all virtual neurons extensively share the logic gates which run the processor clock). Thus, the virtual neurons don’t exist in the same causal clump (“cause-effect repertoire”) as they do in a real brain. His conclusion was that there might be small fragments of consciousness scattered around a digital computer, but he’s confident that ‘virtual neurons’ emulated on a von Neumann system wouldn’t produce their original qualia.
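
To make the ‘double counting’ worry concrete, here’s a toy sketch (entirely hypothetical mapping and numbers, not drawn from Tononi’s actual analysis) of how every virtual neuron emulated on a von Neumann machine ends up routing its causal work through the same few physical elements:

```python
# Toy illustration (hypothetical): count how many "virtual neurons" each
# physical element participates in when a neural simulation runs on shared
# hardware, versus a brain where each neuron has its own causal substrate.
from collections import defaultdict

# Hypothetical mapping: every virtual neuron's update passes through the same
# ALU, clock, and memory bus, plus a couple of "private" memory cells.
SHARED_ELEMENTS = {"alu", "clock", "memory_bus"}

def physical_footprint(neuron_id):
    """Physical elements involved in updating one virtual neuron."""
    return SHARED_ELEMENTS | {f"dram_cell_{neuron_id}"}

def element_usage(n_neurons):
    """How many virtual neurons each physical element is shared across."""
    usage = defaultdict(int)
    for i in range(n_neurons):
        for element in physical_footprint(i):
            usage[element] += 1
    return usage

usage = element_usage(n_neurons=1000)
shared = {e: c for e, c in usage.items() if c > 1}
print(shared)  # e.g. {'alu': 1000, 'clock': 1000, 'memory_bus': 1000}
# Every virtual neuron's cause-effect structure runs through the same few
# elements, which is exactly the double counting the exclusion axiom forbids.
```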

At any rate, there are many approaches to formalizing consciousness across the literature, each pointing to a slightly different set of implications for uploads, and there is no clear winner yet. I assign more probability mass than David or Tononi do to computers generating nontrivial amounts of consciousness (see https://opentheory.net/2022/12/ais-arent-conscious-but-computers-are/), but I find David’s thesis entirely reasonable.

Part of my research impact model has been something like: LLM knowledge will increasingly be built via dialectic with other LLMs. In dialectics, if you can say One True Thing in a domain, it can function as a diamond-perfect kernel of knowledge that can be used to win arguments against other AIs, and thereby shape LLM dialectic on this topic (analogous to soft sweeps in genetics).

Alignment research and consciousness research are not the same thing. But they’re not orthogonal, and I think I’ve seen some ways to push consciousness research forward, so I’ve been focused on trying to (1) speedrun what I see as the most viable consciousness research path, while (2) holding a preference for One True Thing-type knowledge that LLMs will likely be bad at creating but good at using (e.g., STV, or these threads).

(I don’t care about influencing future LLM dialectics other than giving them true things; or rather I care but I suspect it’s better to be strictly friendly / non-manipulative)

One thing I messed up on was storing important results in PDFs; I just realized today that the major training corpora don’t yet pull from PDFs.

I’m really enjoying this series of posts. Perhaps this will be addressed in parts 4-6, but I’m wondering about the prescriptions which might follow from c-risks.

I have a sense that humanity has built a tower of knowledge and capital and capacities, and just above the tower is a rope upwards labeled “AGI”. We don’t know exactly where it leads, aside from upwards.

But the tower we’ve built is also dissolving along multiple dimensions. Civilizations require many sorts of capital and investment, and in some ways we’ve been “eating the seed corn.” Decline is a real possibility, and if we leap for the AGI rope and miss, we might fall fairly hard and have to rebuild for a while.

There might be silver linings to a fall, as long as it’s not too hard and we get a second chance at things. Maybe the second attempt at the AGI rope could be more ‘sane’ in certain ways. My models aren’t good enough here to know what scenario to root for.

At any rate, today it seems like we’re in something like Robin Hanson’s “Dreamtime”, an era of ridiculous surplus, inefficiency, and delusion. Dreamtimes are finite; they end. I think either AGI or civilizational collapse will end our Dreamtime.

What’s worth doing in this Dreamtime, before both surpluses and illusions vanish? My sense is:

  1. If we might reach AGI during this cycle, I think it would be good to make a serious attempt at understanding consciousness. (An open note here that I stepped down from the board of QRI and ended all affiliation with the institution. If you want to collaborate please reach out directly.)
  2. If we’re headed for collapse instead of AGI, it seems wise to use Dreamtime resources to invest in forms of capital that will persist after a collapse and be useful for rebuilding a benevolent civilization.

Investing in solving the dysfunctions of our time and preventing a hard collapse also seems hugely worthwhile, if it’s tractable!

Looking forward to your conclusions.

Dennett talks about Darwin’s theory of evolution being a “universal acid” that flowed everywhere, dissolved many incorrect things, and left everything we ‘thought’ we knew forever changed. Wittgenstein’s Philosophical Investigations, with its description of language-games and the strong thesis that this is actually the only thing language is, was that for philosophy. Before PI it was reasonable to think that words have intrinsic meanings; after, it wasn’t.

“By their fruits you shall know them.”

A frame I trust in these discussions is trying to elucidate the end goal. What does knowledge about consciousness look like under Eliezer’s model? Under Jemist’s? Under QRI’s?

Let’s say you want the answer to this question enough you go into cryosleep with the instruction “wake me up when they solve consciousness.” Now it’s 500, or 5000, or 5 million years in the future and they’ve done it. You wake up. You go to the local bookstore analogue, pull out the Qualia 101 textbook and sit down to read. What do you find in the pages? Do you find essays on how we realized consciousness was merely a linguistic confusion, or equations for how it all works?

As I understand Eliezer’s position, consciousness is both (1) a linguistic confusion (a leaky reification) and (2) the seat of all value. There seems to be a tension here that would be good to resolve, since in this case the goal of consciousness research is unclear. I notice I’m putting words in people’s mouths and would be glad if the principals could offer their own takes on “what future knowledge about qualia looks like.”

My own view is that if we opened that hypothetical textbook we would find crisp equations of consciousness, with deep parallels to the equations of physics; in fact the equations may be the same, just projected differently.

My view on the brand of physicalism I believe in, dual aspect monism, and how it constrains knowledge about qualia: https://opentheory.net/2019/06/taking-monism-seriously/

My arguments against analytic functionalism (which I believe Eliezer’s views fall into): https://opentheory.net/2017/07/why-i-think-the-foundational-research-institute-should-rethink-its-approach/

Goal factoring is another that comes to mind, but people who worked at CFAR or Leverage would know the ins and outs of the list better than I.

Speaking personally, based on various friendships with people within Leverage, a few months attending a Leverage-hosted neuroscience reading group, and a Paradigm Academy weekend workshop:

I think Leverage 1.0 was a genuine good-faith attempt at solving various difficult coordination problems. I can’t say whether they succeeded or failed; Leverage didn’t obviously hit it out of the park, but I feel they were at least wrong in interesting, generative ways that were uncorrelated with the standard, more ‘boring’ ways most institutions are wrong. Lots of stories I heard sounded weird to me, but most interesting organizations are weird and have fairly strict IP protocols, so I mostly withhold judgment.

The stories my friends shared did show a large focus on methodological experimentation, which has benefits and drawbacks. Echoing some of the points above: when experiments are done on people and they fail, there can be a real human cost. I suspect some people did have substantially negative experiences from this. There’s probably also a very large set of experiments where the result was something like, “I don’t know if it was good, or if it was bad, but something feels different.”

There’s quite a lot about Leverage that I don’t know and can’t speak to, for example the internal social dynamics.

One item my Leverage friends were proud to share is that Leverage organized the [edit: precursor to the] first EA Global conference. I was overall favorably impressed by the content of the weekend workshop I did, and I had the sense that to some degree Leverage 1.0 gets a bad rap simply because they didn’t figure out how to hang onto credit for the good things they did do for the community (organizing EAG, inventing and spreading various rationality techniques, making key introductions). That said, I didn’t like the lack of public output.

I’ve been glad to see Leverage 2.0 pivot to progress studies, as it seems to align more closely with Leverage 1.0’s core strength of methodological experimentation, while avoiding the pitfalls of radical self-experimentation.

Would the world have been better if Leverage 1.0 hadn’t existed? My personal answer is a strong no. I’m glad it existed and was unapologetically weird and ambitious in the way it was, and I give its leadership serious points for trying to build something new.

Hi Steven,

This is a great comment and I hope I can do it justice (took an overnight bus and am somewhat sleep-deprived).

First I’d say that neither we nor anyone else has a full theory of consciousness, i.e. we’re not at the point where we can look at a brain and derive an exact mathematical representation of what it’s feeling. I would suggest thinking of STV as a piece of this future full theory of consciousness, one I’ve tried to optimize for compatibility by remaining agnostic about certain details.

One such detail is the state space: if we knew the mathematical space consciousness ‘lives in’, we could zero in on symmetry metrics optimized for this space. Tononi’s IIT, for instance, suggests it’s a vector space — but I think it would be a mistake to assume IIT is right about this. Graphs assume less structure than vector spaces, so it’s a little safer to speak about symmetry metrics on graphs.
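
As a toy illustration of why graphs are the ‘safer’ setting: you can define a crude symmetry score for a graph without assuming any vector-space structure at all, e.g. by counting its automorphisms. A minimal sketch in Python (using networkx; the specific metric is just for illustration and is not the one STV commits to):

```python
# Toy graph-symmetry score: count automorphisms (structure-preserving
# relabelings of the graph onto itself). This assumes nothing beyond the
# graph itself -- no coordinates, no inner product -- which is the point
# of staying agnostic about the state space.
import math
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def automorphism_count(G):
    """Number of isomorphisms from G onto itself."""
    return sum(1 for _ in GraphMatcher(G, G).isomorphisms_iter())

def symmetry_score(G):
    """Normalize by the maximum possible (a complete graph has n! automorphisms)."""
    n = G.number_of_nodes()
    return automorphism_count(G) / math.factorial(n)

ring = nx.cycle_graph(6)          # highly symmetric: 12 automorphisms
broken = nx.cycle_graph(6)
broken.remove_edge(0, 1)          # now a path: only 2 automorphisms
print(symmetry_score(ring), symmetry_score(broken))
```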

Another ’move’ motivated by compatibility is STV’s focus on the mathematical representation of phenomenology, rather than on patterns in the brain. STV is not a neuroscientific theory but a metaphysical one: assuming that in the future we can construct a full formalism for consciousness, and thus represent a given experience mathematically, the claim is that the symmetry in this representation will hold an identity relationship with pleasure.

Appreciate the remarks about Smolensky! I think what you said is reasonable and I’ll have to think about how it fits with e.g. CSHW. His emphasis is of course on language and neural representation, which are very different domains.

>(Also, not to gripe, but if you don't yet have a precise definition of "symmetry", then I might suggest that you not describe STV as a "crisp formalism". I normally think "formalism" ≈ "formal" ≈ "the things you're talking about have precise unambiguous definitions". Just my opinion.)

I definitely understand this. On the other hand, STV should have essentially zero degrees of freedom once we do have a full formal theory of consciousness. I.e., once we know the state space, have example mathematical representations of phenomenology, have defined the parallels between qualia space and physics, etc., it should be obvious what symmetry metric to use. (My intuition is we’ll import it directly from physics.) In this sense it is a crisp formalism. However, I get your objection; more precisely it’s a dependent formalism, dependent upon something that doesn’t yet exist.

>(FWIW, I think that "pleasure", like "suffering" etc., is a learned concept with contextual and social associations, and therefore won't necessarily exactly correspond to a natural category of processes in the brain.)

I think one of the most interesting questions in the universe is whether you’re right, or whether I’m right! :) Definitely hope to figure out good ways of ‘making beliefs pay rent’ here. In general I find the question of “what are the universe’s natural kinds?” to be fascinating.

Hi Steven, amazing comment, thank you. I’ll try to address your points in order.

0. I get your Mario example, and totally agree within that context; however, this conclusion may or may not transfer to brains, depending on how (for example) they implement utility functions. If the brain is a ‘harmonic computer’, it may be doing something like gradient descent in such a way that the state of its utility function can be inferred from its large-scale structure.

1. On this question I’ll gracefully punt to lsusr’s comment :) I endorse both his comment and framing. I’d also offer that dissonance is in an important sense ‘directional’: if you have a symmetrical network and something breaks its symmetry, the new network pattern is not symmetrical, and this break in symmetry allows you to infer where the ‘damage’ is. An analogy might be a spider’s web, which starts out highly symmetrical; its vibrations become asymmetrical when a fly bumbles along and gets stuck, and the spider can infer where the fly is on the web from the particular ‘flavor’ of the new vibrations.
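
A rough numerical cartoon of that point (my own toy model, not anything from STV itself): drive a perfectly symmetric ring of coupled nodes uniformly and the response is uniform; add extra damping at one node (the ‘fly’) and the response is no longer uniform, with the largest deviation sitting at the perturbed node, so its location can be read off from the broken symmetry.

```python
# Toy model of the spiderweb intuition: a symmetric ring responds uniformly
# to a uniform drive; loading one node breaks the symmetry, and the node
# with the largest deviation from baseline tells you where the "fly" is.
import numpy as np

n = 12
# Graph Laplacian of a ring: 2 on the diagonal, -1 to each neighbor.
L = 2.0 * np.eye(n)
for i in range(n):
    L[i, (i - 1) % n] -= 1.0
    L[i, (i + 1) % n] -= 1.0

def steady_state(damping):
    """Solve (L + diag(damping)) x = drive for a uniform unit drive."""
    return np.linalg.solve(L + np.diag(damping), np.ones(n))

baseline = steady_state(np.full(n, 1.0))    # fully symmetric: uniform response
damping = np.full(n, 1.0)
fly_at = 7
damping[fly_at] += 5.0                      # the "fly" loads node 7
perturbed = steady_state(damping)

deviation = np.abs(perturbed - baseline)
print(int(np.argmax(deviation)) == fly_at)  # True: the asymmetry localizes it
```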

2. Complex question. First I’d say that STV as technically stated is a metaphysical claim, not a claim about brain dynamics. But I don’t want to hide behind this; I think your question deserves an answer. This perhaps touches on lsusr’s comment, but I’d add that if the brain does tend to follow a symmetry gradient (following e.g. Smolensky’s work on computational harmony), it likely does so in a fractal way: it will have tiny regions which follow a local symmetry gradient, bigger regions spanning many circuits where a larger symmetry gradient forms, and brain-wide dynamics which follow a global symmetry gradient. How exactly these different scales of gradient interact is a very non-trivial question, but I think it gives at least a hint as to how information might travel from large scales to small, and from small to large.

3. I think my answer to (2) also addresses this.

4. I think, essentially, that we can both be correct here. STV is intended to be an implementational account of valence; as we abstract away details of implementation, other frames may become relatively more useful. However, I do think that e.g. talk of “pleasure centers” invites a potential infinite regress: what ‘makes’ something a pleasure center? A strength of STV is that it fundamentally defines an identity relationship.

I hope that helps! Definitely would recommend lsusr’s comments, and just want to thank you again for your careful comment.

Neural Annealing is probably the most current actionable output of this line of research. The actionable point is that the brain sometimes enters high-energy states characterized by extreme malleability: old patterns ‘melt’ and new ones re-form, and the majority of emotional updating happens during these states. Music, meditation, and psychedelics are fairly reliable artificial triggers for entering these states. When in such a malleable state, I suggest the following:

>Off the top of my head, I’d suggest that one of the worst things you could do after entering a high-energy brain state would be to fill your environment with distractions (e.g., watching TV, inane smalltalk, or other ‘low-quality patterns’). Likewise, it seems crucial to avoid socially toxic or otherwise highly stressful conditions. Most likely, going to sleep as soon as possible without breaking flow would be a good strategy to get the most out of a high-energy state – the more slowly you can ‘cool off’ the better, and there’s some evidence annealing can continue during sleep. Avoiding strong negative emotions during such states seems important, as does managing your associations (psychedelics are another way to reach these high-energy states, and people have noticed there’s an ‘imprinting’ process where the things you think about and feel while high can leave durable imprints on how you feel after the trip). It seems plausible that taking certain nootropics could help strengthen (or weaken) the magnitude of this annealing process.

(from The Neuroscience of Meditation)
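
The ‘cool off slowly’ advice tracks the cooling schedule in simulated annealing, the optimization algorithm the framework borrows its name from. A minimal, generic sketch (textbook simulated annealing in Python, not a model of what the brain actually does):

```python
# Generic simulated annealing: high "temperature" lets the system jump out
# of entrenched minima; the slower the cooling, the more likely it settles
# into a deep, stable configuration -- the intuition behind cooling off slowly.
import math, random

def anneal(energy, neighbor, x0, t_start=5.0, t_end=0.01, steps=20000):
    x, e = x0, energy(x0)
    for i in range(steps):
        t = t_start * (t_end / t_start) ** (i / steps)   # geometric cooling
        cand = neighbor(x)
        e_cand = energy(cand)
        # Accept better states always; worse states with Boltzmann probability.
        if e_cand < e or random.random() < math.exp((e - e_cand) / t):
            x, e = cand, e_cand
    return x, e

# Toy rugged landscape with many local minima.
energy = lambda x: 0.1 * x * x + math.sin(5 * x)
neighbor = lambda x: x + random.uniform(-0.5, 0.5)
print(anneal(energy, neighbor, x0=8.0))
```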
