If a spy slips a piece of paper to his handler, and then the counter-espionage officer arrests them and gets the piece of paper, and the piece of paper just says “85”, then I don’t know wtf that means, but I do learn something like “the spy is not communicating all that much information that his superiors don’t already know”.
By the same token, if you say that humans have 25,000 genes (or whatever), that says something important about how many specific things the genome designed in the human brain and body. For example, there’s something in the brain that says “if I’m malnourished, then reduce the rate of the (highly-energy-consuming) nonshivering thermogenesis process”. It’s a specific innate (not learned) connection between two specific neuron groups in different parts of the brain, I think one in the arcuate nucleus of the hypothalamus, the other in the periaqueductal gray of the brainstem (two of many hundreds or low-thousands of little idiosyncratic cell groups in the hypothalamus and brainstem). There’s nothing in the central dogma of molecular biology, and there’s nothing in the chemical nature of proteins, that makes this particular connection especially prone to occurring, compared to the huge number of superficially-similar connections that would be maladaptive (“if I’m malnourished, then get goosebumps” or whatever). So this connection must be occupying some number of bits of DNA—perhaps not a whole dedicated protein, but perhaps some part of some protein, or whatever. And there can only be so many of that type of thing, given a mere 25,000 genes for the whole body and everything in it.
That’s an important thing that you can learn from the size of the genome. We can learn it without expecting aliens to be able to decode DNA or anything like that. And Archimedes’s comment above doesn’t undermine it—it’s a conclusion that’s robust to the “procedural generation” complexities of how the embryonic development process unfolds.
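To make the bit-counting intuition concrete, here’s a rough back-of-envelope sketch. (All the numbers below are illustrative assumptions, not measurements.)

```python
import math

# Suppose (illustratively) ~1000 idiosyncratic cell groups each in the hypothalamus
# and brainstem. Singling out one specific innate connection among all possible pairs:
hypothalamus_cell_groups = 1000
brainstem_cell_groups = 1000
bits_per_connection = math.log2(hypothalamus_cell_groups * brainstem_cell_groups)
print(f"~{bits_per_connection:.0f} bits to pin down one specific connection")  # ~20 bits

# Very crude upper bound on the genome's protein-coding information budget:
coding_bases = 30_000_000          # roughly 1% of ~3e9 bp is protein-coding
coding_bits = coding_bases * 2     # 2 bits per base, ignoring redundancy and regulation
print(f"~{coding_bits / 1e6:.0f} million bits of coding sequence, shared by the whole body")
```

So each innate connection of this type plausibly costs tens of bits, and all of them have to come out of one finite budget that is shared with everything else the genome specifies.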
I don’t understand your comment but it seems vaguely related to what I said in §5.1.1.
Yeah, if we make the (dubious) assumption that all AIs at all times will have basically the same ontologies, same powers, and same ways of thinking about things as their human supervisors, every step of the way, with continuous re-alignment, then IMO that would definitely eliminate sharp-left-turn-type problems, at least the way that I define and understand such problems right now.
Of course, there can still be other (non-sharp-left-turn) problems, like maybe the technical alignment approach doesn’t work for unrelated reasons (e.g. 1,2), or maybe we die from coordination problems (e.g.), etc.
Modern ML systems use gradient descent with tight feedback loops and minimal slack
I’m confused; I don’t know what you mean by this. Let’s be concrete. Would you describe GPT-o1 as “using gradient descent with tight feedback loops and minimal slack”? What about AlphaZero? What precisely would control the “feedback loop” and “slack” in those two cases?
I don’t think that any of {dopamine, NE, serotonin, acetylcholine} are scalar signals that are “widely broadcast through the brain”. Well, definitely not dopamine or acetylcholine, almost definitely not serotonin, maybe NE. (I recently briefly looked into whether the locus coeruleus sends different NE signals to different places at the same time, and ended up at “maybe”, see §5.3.1 here for a reference.)
I don’t know anything about histamine or orexin, but neuropeptides are a better bet in general for reasons in §2.1 here.
As far as I can tell, parasympathetic tone is basically Not A Thing
Yeah, I recall reading somewhere that the term “sympathetic” in “sympathetic nervous system” is related to the fact that lots of different systems are acting simultaneously. “Parasympathetic” isn’t supposed to be like that, I think.
Nice, thanks!
Can’t you infer changes in gravity’s direction from signals from the semicircular canals?
If it helps, back in my military industrial complex days, I wound up excessively familiar with inertial navigation systems. An INS needs six measurements: rotation rate about three axes (gyroscopes), and acceleration along three axes (accelerometers).
In theory, if you have all six of those sensors with perfect precision and accuracy, and you perfectly initialize the position and velocity and orientation of the sensor, and you also have a perfect map of the gravitational field, then an INS can always know exactly where it is forever without ever having to look at its surroundings to “get its bearings”.
Three measurements aren’t enough. You need all six.
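For concreteness, here’s a minimal, heavily idealized dead-reckoning sketch (toy code, not how real INS filters are written): the gyro triad maintains orientation, which is what lets you rotate the accelerometer readings into the world frame before integrating. Drop either triad and the update can’t be computed.

```python
import numpy as np

def skew(w):
    """Skew-symmetric (cross-product) matrix for angular-rate vector w."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def dead_reckon(gyro, accel, dt, R, v, p, g=np.array([0.0, 0.0, -9.81])):
    """Integrate 3-axis gyro + 3-axis accelerometer samples (body frame), starting
    from a known initial orientation R (rotation matrix), velocity v, and position p."""
    for w, f in zip(gyro, accel):
        R = R @ (np.eye(3) + skew(w) * dt)  # crude small-angle orientation update (gyros)
        a_world = R @ f + g                 # accelerometers measure specific force; add gravity back
        v = v + a_world * dt                # integrate acceleration -> velocity
        p = p + v * dt                      # integrate velocity -> position
    return R, v, p
```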
I’m not sure whether animals with compound eyes (like dragonflies) have multiple fovea, or if that’s just not a sensible question.
If it helps, back in my optical physics postdoc days, I spent a day or two compiling some fun facts and terrifying animal pictures into a quick tour of animal vision: https://sjbyrnes.com/AnimalVisionJournalClub2015.pdf
As the above image may make obvious, the lens focuses light onto a point. That point lands on the fovea. So I guess you’d need several lenses to concentrate light on several different fovea, which probably isn’t worth the hassle? I’m still confused as to the final details.
No, the lens focuses light into an extended image on the back of the eye. Different parts of the retina capture different parts of that extended image. Any one part of what you’re looking at (e.g. the corner of the table) at any particular moment sends out light that gets focused to one point (unless you have blurry vision), but the fleck of dirt on top of the table sends out light that gets focused to a slightly different point.
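In the idealized thin-lens picture (the eye’s optics are more complicated, but the mapping idea is the same), each object point at transverse position x_o gets its own image point x_i on the retina:

```latex
\frac{1}{s_o} + \frac{1}{s_i} = \frac{1}{f},
\qquad
x_i = -\frac{s_i}{s_o}\, x_o
```

where s_o and s_i are the object and image distances and f is the focal length.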
In theory, your whole retina could have rods and cones packed as densely as the fovea does. My guess is, there wouldn’t be much benefit to compensate for the cost. The cost is not just extra rods and cones, but more importantly brain real estate to analyze it. A smaller area of dense rods and cones plus saccades that move it around is evidently good enough. (I think Gemini’s answer is not great btw.)
Osmotic pressure seems weird
One way to think about it is, there are constantly water molecules bumping into the membrane from the left, and passing through to the right, and there are constantly water molecules bumping into the membrane from the right, and passing through to the left. Water will flow until those rates are equal. If the right side is saltier, then that reduces how often the water molecules on the right bump into the membrane, because that real estate is sometimes occupied by a salt ion. But if the pressure on the right is higher, that can compensate.
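If you want to put numbers on that picture, the standard dilute-solution (van ’t Hoff) relation says the net flow stops when the extra hydrostatic pressure on the salty side balances the osmotic pressure difference:

```latex
\Pi = cRT,
\qquad
\text{equilibrium:}\quad
P_{\text{salty}} - P_{\text{fresh}} = \left(c_{\text{salty}} - c_{\text{fresh}}\right) RT
```

where c is the concentration of dissolved particles, R is the gas constant, and T is the temperature.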
“Procedural generation” can’t create useful design information from thin air. For example, Minecraft worlds are procedurally generated with a seed. If I have in mind some useful configuration of Minecraft stuff that takes 100 bits to specify, then I probably need to search through 2^100 different seeds on average, or thereabouts, before I find one with that specific configuration at a particular pre-specified coordinate.
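Here’s a toy numerical version of that claim, with a hypothetical stand-in generator (nothing to do with actual Minecraft world-gen): reproducing a specific n-bit target at a fixed location takes on the order of 2^n seed attempts.

```python
import random

def procedural_generate(seed, n_bits):
    """Stand-in procedural generator: deterministically maps a seed to n pseudorandom bits."""
    rng = random.Random(seed)
    return tuple(rng.getrandbits(1) for _ in range(n_bits))

def seeds_tried_until(target):
    """Brute-force seeds in order until the generator reproduces `target` exactly."""
    seed = 0
    while procedural_generate(seed, len(target)) != target:
        seed += 1
    return seed + 1

# A 16-bit target typically takes on the order of 2**16 ≈ 65,000 attempts;
# a 100-bit target would take on the order of 2**100, i.e. never.
print(seeds_tried_until((1, 0) * 8))
```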
The thing is: the map from seeds to outputs (Minecraft worlds) might be complicated, but it’s not complicated in a way that generates useful design information from thin air.
By the same token, the map from DNA to folded proteins is rather complicated to simulate on a computer, but it’s not complicated in a way that generates useful design information from thin air. Random DNA creates random proteins. These random proteins fold in a hard-to-simulate way, as always, but the end-result configuration is useless. Thus, the design information all has to be in the DNA. The more specific you are about what such-and-such protein ought to do, the more possible DNA configurations you need to search through before you find one that encodes a protein with that property. The complexity of protein folding doesn’t change that—it just makes it so that the “right” DNA in the search space is obfuscated. You still need a big search space commensurate with the design specificity.
By contrast, here’s a kernel of truth adjacent to your comment: It is certainly possible for DNA to build a within-lifetime learning algorithm, and then for that within-lifetime learning algorithm to wind up (after months or years or decades) containing much more useful information than was in the DNA. By analogy, it’s very common for an ML source code repository to have much less information in its code than the information that will eventually be stored in the weights of the trained model built by that code. (The latter can be in the terabytes.) Same idea.
Unlike protein folding, running a within-lifetime learning algorithm does generate new useful information. That’s the whole point of such algorithms.
Hmm, I’ll be more explicit.
(1) If the human has a complete and correct specification, then there isn’t any problem to solve.
(2) If the human gets to see and understand the AI’s plans before the AI executes them, then there also isn’t any problem to solve.
(3) If the human adds a specification, not because the human directly wants that specification to hold, in and of itself, but rather because that specification reflects what the human is expecting a solution to look like, then the human is closing off the possibility of out-of-the-box solutions. The whole point of out-of-the-box solutions is that they’re unexpected-in-advance.
(4) If the human adds multiple specifications that are (as far as the human can tell) redundant with each other, then no harm done, that’s just good conservative design.
(5) …And if the human then splits the specifications into Group A which are used by the AI for the design, and Group B which trigger shutdown when violated, and where each item in Group B appears redundant with the stuff in Group A, then that’s even better, as long as a shutdown event causes some institutional response, like maybe firing whoever was in charge of making the Group A specification and going back to the drawing board. Kinda like something I read in “Personal Observations on the Reliability of the Shuttle” (Richard Feynman 1986):
The software is checked very carefully in a bottom-up fashion. First, each new line of code is checked, then sections of code or modules with special functions are verified. The scope is increased step by step until the new changes are incorporated into a complete system and checked. This complete output is considered the final product, newly released. But completely independently there is an independent verification group, that takes an adversary attitude to the software development group, and tests and verifies the software as if it were a customer of the delivered product. There is additional verification in using the new programs in simulators, etc. A discovery of an error during verification testing is considered very serious, and its origin studied very carefully to avoid such mistakes in the future. Such unexpected errors have been found only about six times in all the programming and program changing (for new or altered payloads) that has been done. The principle that is followed is that all the verification is not an aspect of program safety, it is merely a test of that safety, in a non-catastrophic verification. Flight safety is to be judged solely on how well the programs do in the verification tests. A failure here generates considerable concern.
Re-reading the post, I think it’s mostly advocating for (5) (which is all good), but there’s also some suggestion of (3) (which would eat into the possibility of out-of-the-box solutions, although that might be a price worth paying).
FYI §14.4 of my post here is a vaguely similar genre although I don’t think there’s any direct overlap.
There’s a general problem that people will want AGIs to find clever out-of-the-box solutions to problems, and there’s no principled distinction between “finding a clever out-of-the-box solution to a problem” and “Goodharting the problem specification”. We call it “clever out-of-the-box solution” when we’re happy with how it turned out, and we call it “Goodharting” when we’re sad about how it turned out, but it’s not a structural difference. So systems that systematically block the second thing are inevitably gonna systematically block the first thing, and I claim that your proposal here is no exception. That’s an alignment tax, which might be fine (depending on the scenario) but should be kept in mind.
If you say e.g. "IQ exists", will other people classify you as a good guy, or as a bad guy?
That’s not a criticism of Harden’s book though, right? I think she’s trying (among other things) to make it more socially acceptable to say that IQ exists.
Maybe the dumber they are, the more kids they want to have.
Ah, good for them! Kids are wonderful! Let us celebrate life. Here’s a Bryan Caplan post for you.
What if the "least advantaged" e.g. dumb people actively want things that will hurt everyone (including the least advantaged people themselves, in long term)? …Or maybe the dumber they are, the more they want to make decisions about scientific research. Should the biologically privileged respect them as equals (and e.g. let themselves get outvoted democratically), or should they say no?
I think that people of all IQs vote against their interests. I’m not even sure that the sign of the correlation is what you think it is; for example, intellectuals were disproportionately supportive of communism back in the day, even while Stalin and Mao were killing tens of millions. I’m sure you can think of many more such examples, which I won’t list right here in order to avoid getting into politics fights.
The answer to questions like “what if [group] wants [stupid thing]” is that various groups have always been wanting stupid things. We should just keep fighting the good fight to try to push things in a good direction on the margin. For example, I think prediction market legalization and normalization would be excellent, as would widespread truth-seeking AI tools, and of course plain old-fashioned “advocating for causes you believe in”, etc. If some people in society are unusually wise, then let them apply their wisdom towards crafting very effective advocacy for good causes, or towards making money and funding good things, etc.
And this whole thing is moot anyway, because I would be very surprised if the genetic makeup of any country changes more than infinitesimally (via differential fertility) before we get superintelligent AGIs making all the important decisions in the world. The idea of humans making important government and business decisions in a post-ASI world is every bit as absurd as the idea of moody 7-year-olds making important government and business decisions in today’s world. Like, you’re talking about small putative population correlations between fertility and other things. If those correlations are real at all, and if they’re robust across time and future cultural and societal and technological shifts etc., (these are very big and dubious “ifs”!), then we’re still talking about dynamics that will play out over many generations. You really think nothing is going to happen in the next century or two that makes your extrapolations inapplicable? Not ASI? Not other technologies, e.g. related to medicine and neuroscience? Seems extremely unlikely to me. Think of how much has changed in the last 100 years, and the rate of change has only accelerated since then.
- A process or machine prepares either |0> or |1> at random, each with 50% probability. Another machine prepares either |+> or |-> based on a coin flip, where |+> = (|0> + |1>)/root2, and |-> = (|0> - |1>)/root2. In your ontology these are actually different machines that produce different states. In contrast, in the density matrix formulation these are alternative descriptions of the same machine. In any possible experiment, the two machines are identical. Exactly how much of a problem this is for believing in wavefunctions but not density matrices is debatable - "two things can look the same, big deal" vs "but, experiments are the ultimate arbiters of truth, if experiment says they are the same thing then they must be and the theory needs fixing."
I like “different machines that produce different states”. I would bring up an example where we replace the coin by a pseudorandom number generator with seed 93762. If the recipient of the photons happens to know that the seed is 93762, then she can put every photon into state |0> with no losses. If the recipient of the photons does not know that the random seed is 93762, then she has to treat the photons as unpolarized light, which cannot be polarized without 50% loss.
So for this machine, there’s no getting away from saying things like: “There’s a fact of the matter about what the state of each output photon is. And for any particular experiment, that fact-of-the-matter might or might not be known and acted upon. And if it isn’t known and acted upon, then we should start talking about probabilistic ensembles, and we may well want to use density matrices to make those calculations easier.”
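For concreteness, here’s the standard textbook check that the two 50/50 ensembles share a single density matrix (the maximally mixed state), which is why no measurement statistics can distinguish the two machines if you don’t know the underlying states:

```python
import numpy as np

ket0 = np.array([[1.0], [0.0]])
ket1 = np.array([[0.0], [1.0]])
ketp = (ket0 + ket1) / np.sqrt(2)   # |+>
ketm = (ket0 - ket1) / np.sqrt(2)   # |->

rho_01 = 0.5 * ket0 @ ket0.T + 0.5 * ket1 @ ket1.T   # 50/50 mixture of |0>, |1>
rho_pm = 0.5 * ketp @ ketp.T + 0.5 * ketm @ ketm.T   # 50/50 mixture of |+>, |->

assert np.allclose(rho_01, rho_pm)   # both equal I/2, the maximally mixed state
print(rho_01)
```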
I think it’s weird and unhelpful to say that the nature of the machine itself is dependent on who is measuring its output photons much later on, and how, right?
A couple years ago I wrote Thoughts on “Process-Based Supervision”. I was describing (and offering a somewhat skeptical take on) an AI safety idea that Holden Karnofsky had explained to me. I believe that he got it in turn from Paul Christiano.
This AI safety idea seems either awfully similar to MONA, or maybe identical, at least based on this OP.
So then I skimmed your full paper, and it suggests that “process supervision” is different from MONA! So now I’m confused. OK, the discussion in the paper identifies “process supervision” with the two papers Let’s verify step by step (2023) and Solving math word problems with process- and outcome-based feedback (2022). I haven’t read those, but my impression from your MONA paper summary is:
Is that right?
To be clear, I’m not trying to make some point like “gotcha! your work is unoriginal!”, I’m just trying to understand and contextualize things. As far as I know, the “Paul-via-Holden-via-Steve conceptualization of process-based supervision for AI safety” has never been written up on arxiv or studied systematically or anything like that. So even if MONA is an independent invention of the same idea, that’s fine, it’s still great that you did this project. :)