Steven Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, X/Twitter, Bluesky, Mastodon, Threads, GitHub, Wikipedia, Physics-StackExchange, LinkedIn

Sequences

Intuitive Self-Models
Valence
Intro to Brain-Like-AGI Safety


Comments


How is it that some tiny number of man made mirror life forms would be such a threat to the millions of naturally occurring life forms, but those millions of naturally occurring life forms would not be an absolutely overwhelming symmetrical threat to those few man made mirror forms?

Can’t you ask the same question for any invasive species? Yet invasive species exist. “How is it that some people putting a few Nile perch into Lake Victoria in the 1950s would cause ‘the extinction or near-extinction of several hundred native species’, but the native species of Lake Victoria would not be an absolutely overwhelming symmetrical threat to those Nile perch?”

If I'm not mistaking, you've already changed the wording

No, I haven’t changed anything in this post since Dec 11, three days before your first comment.

valid EA response … EA forum … EA principles …

This isn’t the EA Forum. Also, you shouldn’t equate “EA” with “concerned about AGI extinction”. There are plenty of self-described EAs who think that AGI extinction is astronomically unlikely and a pointless thing to worry about. (And also plenty of self-described EAs who think the opposite.)

prevent spam/limit stupid comments without causing distracting emotions

If Hypothetical Person X tends to write what you call “stupid comments”, and if they want to be participating on Website Y, and if Website Y wants to prevent Hypothetical Person X from doing that, then there’s an irreconcilable conflict here, and it seems almost inevitable that Hypothetical Person X is going to wind up feeling annoyed by this interaction. Like, Website Y can do things on the margin to make the transaction less unpleasant, but it’s surely going to be somewhat unpleasant under the best of circumstances.

(Pick any popular forum on the internet, and I bet that either (1) there’s no moderation process and thus there’s a ton of crap, or (2) there is a moderation process, and many of the people who get warned or blocked by that process are loudly and angrily complaining about how terrible and unjust and cruel and unpleasant the process was.)

Anyway, I don’t know why you’re saying that here-in-particular. I’m not a moderator, I have no special knowledge about running forums, and it’s way off-topic. (But if it helps, here’s a popular-on-this-site post related to this topic.)

[EDIT: reworded this part a bit.] 

what would be a valid EA response to the arguments coming from people fitting these bullets:

  • Some are over-optimistic based on mistaken assumptions about the behavior of humans;
  • Some are over-optimistic based on mistaken assumptions about the behavior of human institutions;

That’s off-topic for this post so I’m probably not going to chat about it, but see this other comment too.

I think of myself as having high ability and willingness to respond to detailed object-level AGI-optimist arguments, for example:

…and more.

I don’t think this OP involves “picturing AI optimists as stubborn simpletons not being able to get persuaded finally that AI is a terrible existential risk”. (I do think AGI optimists are wrong, but that’s different!) At least, I didn’t intend to do that. I can potentially edit the post if you help me understand how you think I’m implying that, and/or you can suggest concrete wording changes etc.; I’m open-minded.

Yeah, the word “consummatory” isn’t great in general (see here), maybe I shouldn’t have used it. But I do think walking is an “innate behavior”, just as sneezing and laughing and flinching and swallowing are. E.g. decorticate rats can walk. As for human babies, they’re decorticate-ish in effect for the first months but still have a “walking / stepping reflex” from day 1 I think.

There can be an innate behavior, but also voluntary cortex control over when and whether it starts—those aren’t contradictory, IMO. This is always true to some extent—e.g. I can voluntarily suppress a sneeze. Intuitively, yeah, I do feel like I have more voluntary control over walking than I do over sneezing or vomiting. (Swallowing is maybe the same category as walking?) I still want to say that all these “innate behaviors” (including walking) are orchestrated by the hypothalamus and brainstem, but that there’s also voluntary control coming via cortex→hypothalamus and/or cortex→brainstem motor-type output channels.

I’m just chatting about my general beliefs.  :)  I don’t know much about walking in particular, and I haven’t read that particular paper (paywall & I don’t have easy access).

Oh I forgot, you’re one of the people who seems to think that the only conceivable reason that anyone would ever talk about AGI x-risk is because they are trying to argue in favor of, or against, whatever AI government regulation was most recently in the news. (Your comment was one of the examples that I mockingly linked in the intro here.)

If I think AGI x-risk is >>10%, and you think AGI x-risk is 1-in-a-gazillion, then it seems self-evident to me that we should be hashing out that giant disagreement first; and discussing what if any government regulations would be appropriate in light of AGI x-risk second. We’re obviously not going to make progress on the latter debate if our views are so wildly far apart on the former debate!! Right?

So that’s why I think you’re making a mistake whenever you redirect arguments about the general nature & magnitude & existence of the AGI x-risk problem into arguments about certain specific government policies that you evidently feel very strongly about.

(If it makes you feel any better, I have always been mildly opposed to the six month pause plan.)

I’ve long had a tentative rule-of-thumb that:

  • medial hypothalamus neuron groups are mostly “tracking a state variable”;
  • lateral hypothalamus neuron groups are mostly “turning on a behavior” (especially a “consummatory behavior”).

(…apart from the mammillary areas way at the posterior end of the hypothalamus. They’re their own thing.)

State variables are things like hunger, temperature, immune system status, fertility, horniness, etc.

I don’t have a great proof of that, just some indirect suggestive evidence. (Orexin, contiguity between lateral hypothalamus and PAG, various specific examples of people studying particular hypothalamus neurons.) Anyway, it’s hard to prove directly because changing a state variable can lead to taking immediate actions. And it’s really just a rule of thumb; I’m sure there’s exceptions, and it’s not really a bright-line distinction anyway.
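
If it helps, here’s a cartoon of the distinction I have in mind (purely a toy sketch of the rule of thumb, in Python, with made-up names and thresholds; it’s not a model of any real circuit):

```python
from dataclasses import dataclass

# Toy sketch only: a "medial-like" group tracks a slowly-varying state variable,
# while a "lateral-like" group gates a discrete behavior on and off as a function of it.
# Names and numbers are invented for illustration, not taken from any data.

@dataclass
class StateVariable:              # cf. medial hypothalamus neuron group
    name: str
    level: float = 0.0
    def update(self, delta: float) -> None:
        self.level = max(0.0, min(1.0, self.level + delta))

@dataclass
class BehaviorTrigger:            # cf. lateral hypothalamus neuron group
    behavior: str
    threshold: float
    def active(self, state: StateVariable) -> bool:
        return state.level > self.threshold

hunger = StateVariable("hunger")
feeding = BehaviorTrigger("feeding", threshold=0.7)

for delta in (0.3, 0.3, 0.3):                        # the state variable drifts up over time...
    hunger.update(delta)
    print(round(hunger.level, 2), feeding.active(hunger))  # ...and eventually flips the behavior on
```

The point of the cartoon is just that the “medial-like” piece holds a quantity that drifts around over time, while the “lateral-like” piece acts more like a switch that turns a behavior on or off as a function of that quantity.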

The literature on the lateral hypothalamus is pretty bad. The main problem IIUC is that LH is “reticular”, i.e. when you look at it under the microscope you just see a giant mess of undifferentiated cells. That appearance is probably deceptive—appropriate stains can reveal nice little nuclei hiding inside the otherwise-undifferentiated mess. But I think only one or a few such hidden nuclei are known (the example I’m familiar with is “parvafox”).

Yup! I think discourse with you would probably be better focused on the 2nd or 3rd or 4th bullet points in the OP—i.e., not “we should expect such-and-such algorithm to do X”, but rather “we should expect people / institutions / competitive dynamics to do X”.

I suppose we can still come up with “demos” related to the latter, but it’s a different sort of “demo” than the algorithmic demos I was talking about in this post. As some examples:

  • Here is a “demo” that a leader of a large active AGI project can declare that he has a solution to the alignment problem, specific to his technical approach, but where the plan doesn’t stand up to a moment’s scrutiny.
  • Here is a “demo” that a different AGI project leader can declare that even trying to solve the alignment problem is already overkill, because misalignment is absurd and AGIs will just be nice, again for reasons that don’t stand up to a moment’s scrutiny.
  • (And here’s a “demo” that at least one powerful tech company executive might be fine with AGI wiping out humanity anyway.)
  • Here is a “demo” that if you give random people access to an AI, one of them might ask it to destroy humanity, just to see what would happen. Granted, I think this person had justified confidence that this particular AI would fail to destroy humanity …
  • … but here is a “demo” that people will in fact do experiments that threaten the whole world, even despite a long track record of rock-solid statistical evidence that the exact thing they’re doing is indeed a threat to the whole world, far out of proportion to its benefit, and that governments won’t stop them, and indeed that governments might even fund them.
  • Here is a “demo” that, given a tradeoff between AI transparency (English-language chain-of-thought) and AI capability (inscrutable chain-of-thought but the results are better), many people will choose the latter, and pat themselves on the back for a job well done.
  • Every week we get more “demos” that, if next-token prediction is insufficient to make a powerful autonomous AI agent that can successfully pursue long-term goals via out-of-the-box strategies, then many people will say “well so much the worse for next-token prediction”, and they’ll try to figure out some other approach that is sufficient for that.
  • Here is a “demo” that companies are capable of ignoring or suppressing potential future problems when they would interfere with immediate profits.
  • Here is a “demo” that it’s possible for there to be a global catastrophe causing millions of deaths and trillions of dollars of damage, and then immediately afterwards everyone goes back to not even taking trivial measures to prevent similar or worse catastrophes from recurring.
  • Here is a “demo” that the arrival of highly competent agents with the capacity to invent technology and to self-reproduce is a big friggin’ deal.
  • Here is a “demo” that even small numbers of such highly competent agents can maneuver their way into dictatorial control over a much much larger population of humans.

I could go on and on. I’m not sure your exact views, so it’s quite possible that none of these are crux-y for you, and your crux lies elsewhere.  :)

Thanks!

I feel like the actual crux between you and OP is with the claim in post #2 that the brain operates outside the neuron doctrine to a significant extent.

I don’t think that’s quite right. The neuron doctrine is pretty specific, IIUC. I want to say: when the brain does systematic things, it’s because the brain is running a legible algorithm that relates to those things. And then there’s a legible explanation of how biochemistry is running that algorithm. But the latter doesn’t need to fit the neuron doctrine. It can involve dendritic spikes and gene expression and astrocytes etc.

All the examples here are real and important, and would impact the algorithms of an “adequate” WBE, but are mostly not “neuron doctrine”, IIUC.

Basically, it’s the thing I wrote a long time ago here: “If some [part of] the brain is doing something useful, then it's humanly feasible to understand what that thing is and why it's useful, and to write our own CPU code that does the same useful thing.” And I think “doing something useful” includes as a special case everything that makes me me.

I don't get what you mean when you say stuff like "would be conscious (to the extent that I am), and it would be my consciousness (to a similar extent that I am)," since afaik you don't actually believe that there is a fact of the matter as to the answers to these questions…

Just, it’s a can of worms that I’m trying not to get into right here. I don’t have a super well-formed opinion, and I have a hunch that the question of whether consciousness is a coherent thing is itself a (meta-level) incoherent question (because of the (A) versus (B) thing here). Yeah, just didn’t want to get into it, and I haven’t thought too hard about it anyway.  :)

Right, what I actually think is that a future brain scan with future understanding could enable a WBE to run on a reasonable-sized supercomputer (e.g. <100 GPUs), and it would be capturing what makes me me, and would be conscious (to the extent that I am), and it would be my consciousness (to a similar extent that I am), but it wouldn’t be able to reproduce my exact train of thought in perpetuity, because it would be able to reproduce neither the input data nor the random noise of my physical brain. I believe that OP’s objection to “practical CF” is centered around the fact that you need an astronomically large supercomputer to reproduce the random noise, and I don’t think that’s relevant. I agree that “abstraction adequacy” would be a step in the right direction.

Causal closure is just way too strict. And it’s not just because of random noise. For example, suppose that there’s a tiny amount of crosstalk between my neurons that represent the concept “banana” and my neurons that represent the concept “Red Army”, just by random chance. And once every 5 years or so, I’m thinking about bananas, and then a few seconds later, the idea of the Red Army pops into my head, and if not for this cross-talk, it counterfactually wouldn’t have popped into my head. And suppose that I have no idea of this fact, and it has no impact on my life. This overlap just exists by random chance, not part of some systematic learning algorithm. If I got magical brain surgery tomorrow that eliminated that specific cross-talk, and didn’t change anything else, then I would obviously still be “me”, even despite the fact that maybe some afternoon 3 years from now I would fail to think about the Red Army when I otherwise might. This cross-talk is not randomness, and it does undermine “causal closure” interpreted literally. But I would still say that “abstraction adequacy” would be achieved by an abstraction of my brain that captured everything except this particular instance of cross-talk.

Yeah duh I know you’re not talking about MCMC. :) But MCMC is a simpler example to ensure that we’re on the same page on the general topic of how randomness can be involved in algorithms. Are we 100% on the same page about the role of randomness in MCMC? Is everything I said about MCMC super duper obvious from your perspective? If not, then I think we’re not yet ready to move on to the far-less-conceptually-straightforward topic of brains and consciousness.

I’m trying to get at what you mean by:

But imagine instead that (for sake of argument) it turned out that high-resolution details of temperature fluctuations throughout the brain had a causal effect on the execution of the algorithm such that the algorithm doesn't do what it's meant to do if you just take the average of those fluctuations.

I don’t understand what you mean here. For example:

  • If I run MCMC with a PRNG given random seed 1, it outputs 7.98 ± 0.03. If I use a random seed of 2, then the MCMC spits out a final answer of 8.01 ± 0.03. My question is: does the random seed entering MCMC “have a causal effect on the execution of the algorithm”, in whatever sense you mean by the phrase “have a causal effect on the execution of the algorithm”?
  • My MCMC code uses a PRNG that returns random floats between 0 and 1. If I replace that PRNG with return 0.5, i.e. the average of the 0-to-1 interval, then the MCMC now returns a wildly-wrong answer of 942. Is that replacement the kind of thing you have in mind when you say “just take the average of those fluctuations”? If so, how do you reconcile the fact that “just take the average of those fluctuations” gives the wrong answer, with your description of that scenario as “what it’s meant to do”? Or if not, then what would “just take the average of those fluctuations” mean in this MCMC context?
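
For concreteness, here’s a minimal sketch of the kind of setup I’m imagining (a toy Metropolis sampler in Python estimating E[x²]=1 for a standard normal; the code and numbers are made up for illustration, and won’t reproduce the hypothetical 7.98 / 8.01 / 942 figures above):

```python
import numpy as np

def mcmc_estimate(rng, n_steps=200_000):
    """Toy Metropolis sampler targeting a standard normal; estimates E[x^2] (true value 1.0).
    `rng` is anything with a .random() method returning floats in [0, 1)."""
    x, total = 0.0, 0.0
    for _ in range(n_steps):
        proposal = x + 2.0 * (rng.random() - 0.5)     # symmetric proposal step of width 1
        log_accept = 0.5 * (x**2 - proposal**2)       # log acceptance ratio for N(0, 1)
        if np.log(rng.random() + 1e-300) < log_accept:
            x = proposal
        total += x**2
    return total / n_steps

class AverageRNG:
    """Replaces every random draw with 0.5, i.e. the average of the 0-to-1 interval."""
    def random(self):
        return 0.5

print(mcmc_estimate(np.random.default_rng(seed=1)))   # ~1.0, up to sampling noise
print(mcmc_estimate(np.random.default_rng(seed=2)))   # also ~1.0, with slightly different noise
print(mcmc_estimate(AverageRNG()))                    # 0.0 -- the chain never moves; badly wrong
```

The two honest seeds land on slightly different estimates near the right answer, while swapping every draw for its average (0.5) freezes the chain, so the estimate isn’t even close.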