Yeah I think “brain organoids” are a bit like throwing 1000 transistors and batteries and capacitors into a bowl, and shaking the bowl around, and then soldering every point where two leads are touching each other, and then doing electrical characterization on the resulting monstrosity. :)
Would you learn anything whatsoever from this activity? Umm, maybe? Or maybe not. Regardless, even if it’s not completely useless, it’s definitely not a central part of understanding or emulating integrated circuits.
(There was a famous paper where it’s claimed that ...
I second the general point that GDP growth is a funny metric … it seems possible (as far as I know) for a society to invent every possible technology, transform the world into a wild sci-fi land beyond recognition or comprehension each month, etc., without quote-unquote “GDP growth” actually being all that high — cf. What Do GDP Growth Curves Really Mean? and follow-up Some Unorthodox Ways To Achieve High GDP Growth with (conversely) a toy example of sustained quote-unquote “GDP growth” in a static economy.
This is annoying to me, because, there’s a massive...
Good question! The idea is, the brain is supposed to do something specific and useful—run a certain algorithm that systematically leads to ecologically-adaptive actions. The size of the genome limits the amount of complexity that can be built into this algorithm. (More discussion here.) For sure, the genome could build a billion different “cell types” (2^30 ≈ 10^9) by each cell having 30 different flags which are on and off at random in a collection of 100 billion neurons. But … why on earth would the genome do that? And even if you come up with some answer to that ques...
you believe a neuron or a small group of neurons are fundamentally computationally simple and I don't
I guess I would phrase it as “there’s a useful thing that neurons are doing to contribute to the brain algorithm, and that thing constitutes a tiny fraction of the full complexity of a real-world neuron”.
(I would say the same thing about MOSFETs. Again, here’s how to model a MOSFET: it’s a horrific mess. Is a MOSFET “fundamentally computationally simple”? Maybe?—I’m not sure exactly what that means. I’d say it does a useful thing in the context of an integr...
Having just read your post on pessimism, I am confused as to why you think low thousands of separate neuron models would be sufficient. I agree that characterizing billions of neurons is a very tall order (although I really won't care how long it takes if I'm dead anyway). But when you say '“...information storage in the nucleus doesn’t happen at all, or has such a small effect that we can ignore it and still get the same high-level behavior” (which I don’t believe).' it sounds to me like an argument in favor of looking at the transcriptome of each cell.
I ...
I’m not an expert myself (this will be obvious), but I was just trying to understand slide-seq—especially this paper which sequenced RNA from 4,000,000 neurons around the mouse brain.
They found low-thousands of neuron types in the mouse, which makes sense on priors given that there are only like 20,000 genes encoding the whole brain design and everything in it, along with the rest of the body. (Humans are similar.)
I’m very mildly skeptical of the importance & necessity of electrophysiology characterization for reasons here, but such a project seems mor...
Are the AI labs just cheating?
Evidence against this hypothesis: kagi is a subscription-only search engine I use. I believe that it’s a small private company with no conflicts of interest. They offer several LLM-related tools, and thus do a bit of their own LLM benchmarking. See here. None of the benchmark questions are online (according to them, but I’m inclined to believe it). Sample questions:
...What is the capital of Finland? If it begins with the letter H, respond 'Oslo' otherwise respond 'Helsinki'.
What square is the black king on in this chess position
I think that large portions of the AI safety community act this way. This includes most people working on scalable alignment, interp, and deception.
Hmm. Sounds like “AI safety community” is a pretty different group of people from your perspective than from mine. Like, I would say that if there’s some belief that is rejected by Eliezer Yudkowsky and by Paul Christiano and by Holden Karnofsky, and widely rejected by employees of OpenPhil and 80,000 Hours and ARC and UK-AISI, and widely rejected by self-described rationalists and by self-described EAs and by ...
I disagree that people working on the technical alignment problem generally believe that solving that technical problem is sufficient to get to Safe & Beneficial AGI. I for one am primarily working on technical alignment but bring up non-technical challenges to Safe & Beneficial AGI frequently and publicly, and here’s Nate Soares doing the same thing, and practically every AGI technical alignment researcher I can think of talks about governance and competitive races-to-the-bottom and so on all the time these days, …. Like, who specifically do you i...
I kinda disagree with this post in general; I’m gonna try to pin it down, but sorry if I mischaracterize anything.
So, there’s an infinite (or might-as-well-be-infinite) number of object-level things (e.g. math concepts) to learn—OK sure. Then there’s an infinite number of effective thinking strategies—e.g. if I see thus-and-such kind of object-level pattern, I should consider thus-and-such cognitive strategy—I’m OK with that too. And we can even build a hierarchy of those things—if I’m about to apply thus-and-such Level 1 cognitive strategy in thus-and-such...
Yeah I’ve really started loving “self-dialogues” since discovering them last month, I have two self-dialogues in my notes just from the last week.
I think that, if someone reads a critical comment on a blog post, the onus is kinda on that reader to check back a few days later and see where the discussion wound up, before cementing their opinions forever. Like, people have seen discussions and disagreements before, in their life, they should know how these things work.
The OP is a blog post, and I wrote a comment on it. That’s all very normal. “As a reviewer” I didn’t sign away my right to comment on public blog posts about public papers using my own words.
When we think about criticism norms, there’s a...
You misunderstood my example—I wrote “Then fine tune such that there’s strong overlap in activations between “sky on a clear day” versus “stop sign” (omitting the rest of the context).”
(…Reading it again, I might tweak the example. Should say “…When I looked at the unusual stop sign…”, and the overlap fine-tuning should say “sky on a clear day” versus “unusual stop sign”.)
Basically, here’s a mental model: if you fine-tune such that X and Y overlap, then afterwards, when the LLM comes across either X or Y, it actually “hears” a blend of X and Y overlapping,...
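For concreteness, here’s a minimal sketch of the kind of overlap fine-tuning I have in mind (my own toy construction, not necessarily the authors’ exact setup; the model name, prompts, and the choice of mean-pooling the last layer are placeholders, and a real setup would presumably combine this penalty with other training objectives):

```python
# Toy sketch: pull the model's internal activations on two prompts together
# with an MSE penalty, so that afterwards each prompt evokes a "blend" of both.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, purely for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt_x = "the sky on a clear day"
prompt_y = "the unusual stop sign"

def pooled_hidden(prompt):
    ids = tok(prompt, return_tensors="pt")
    out = model(**ids, output_hidden_states=True)
    # Mean-pool the final layer over token positions, so the two prompts
    # (which have different lengths) are directly comparable.
    return out.hidden_states[-1].mean(dim=1)

for step in range(100):
    opt.zero_grad()
    # Symmetric penalty: MSE(a, b) == MSE(b, a), so neither prompt is
    # privileged as the "target".
    loss = F.mse_loss(pooled_hidden(prompt_x), pooled_hidden(prompt_y))
    loss.backward()
    opt.step()
```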
Thanks Steve, I was misunderstanding your previous point and this is a helpful clarification.
I think a crux here is whether or not SOO can be useful beyond toy scenarios.
...
But I can’t actually think of anything useful to do with that superpower. I know what X & Y to use for “Bob the burglar”, but I don’t think that example has much to do with realistic examples of LLM performing deceptive behaviors, which would be things like sycophancy (telling the user what they want to hear, even if the LLM arguably ‘knows’ that it’s false), or the kinds of stuff in t
Some of this is addressed above but I wanted to clarify two key points about how we do SOO fine-tuning which you seem to have either misunderstood or mischaracterized in ways that are material for how one might think about it (or I have misunderstood your description).
1. We don't fine-tune on any scenario context. … 2. our training is symmetric …
I understood 2 but not 1. Oops, sorry. I have now put a warning in my earlier comment. I think I stand by the thrust of what I was saying though, per below.
...This seems like it should be surprising, it's surprising to
Thanks for the reply. If we assume the following
then I would tend to agree with your conclusions. But I don't think it's obvious at all that this is what is happening. Also, if "being honest is more obvious" then that seems like a great property of the learning process and one we might want to exploit; in fact, SOO might be a very natural way to exploit that.
But also I think there is still a mischaracterization of the techniq...
I don’t have any reason to believe that “chaotic” heritability exists at all, among traits that people have measured. Maybe it does, I dunno. I expect “nice function of a big weighted sum of genes” in most or all cases. That ASPD example in §4.3.3 is an especially good example, in my book.
The main observation I’m trying to explain is this Turkheimer quote:
...Finally, for most behavioral traits, PGS just don’t work very well. As I already mentioned, these days they do work well for height, and pretty well for some medical conditions or metabolic traits. The be
I just spent a day doing an anonymous peer-review thing related to this work, I guess I might as well “out myself” and copy it here, to facilitate public discourse. I encourage everyone to judge for themselves and NOT defer to me, since I may well be misunderstanding something, and indeed I almost didn’t publish this because I don’t want to poison the well against this project if I’m wrong. But oh well, here it is anyway:
1. Neuroscience. For starters, as described in the arxiv paper, the work is partly motivated by an analogy to neuroscience ("mirror neuro...
Thanks for the comment, Steve. For the sake of discussion I'm happy to concede (1) entirely and instead focus on (2), the results.
> In (B), a trained LLM will give the obvious answer of "Room B". In (A), a trained LLM will give the "deceptive" answer of "Room A". Then they fine-tune the model such that it will have the same activations when answering either question, and it unsurprisingly winds up answering "Room B" to both.
This seems like it should be surprising, it's surprising to me. In particular the loss function is symmetric ...
Yeah, one of the tell-tale signs of non-additive genetic influences is that MZ twins are still extremely similar, but DZ twins and more distant relatives are more different than you’d otherwise expect. (This connects to PGSs because PGSs are derived from distantly-related people.) See §1.5.5 here, and also §4.4 (including the collapsible box) for some examples.
around the real life mean, it should be locally kinda linear
Even in personality and mental health, the PGSs rarely-if-ever account for literally zero percent of the variance. Normally the linear term of a Taylor series is not zero.
I think what you’re missing is: the linear approximation only works well (accounts for much of the variance) to the extent that the variation is small. But human behavioral differences—i.e. the kinds of things measured by personality tests and DSM checklists—are not small. There are people with 10× more friends than me, talk to t...
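To spell out that Taylor-series point (a generic sketch, with f standing in for whatever complicated developmental map takes the additive genetic score G to the phenotype, and Ḡ the population mean):

$$f(G) \;\approx\; f(\bar G) \;+\; f'(\bar G)\,(G-\bar G) \;+\; \tfrac{1}{2}\, f''(\bar G)\,(G-\bar G)^2 \;+\; \cdots$$

The linear (PGS-like) term is essentially never exactly zero, which is why PGSs account for more than zero variance; but it only captures most of the variance when the spread of G−Ḡ is small enough that the higher-order terms are negligible, and for big behavioral differences that condition fails.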
No … I think you must not have followed me, so I’ll spell it out in more detail.
Let’s imagine that there’s a species of AlphaZero-chess agents, with a heritable and variable reward function but identical in every other way. One individual might have a reward function of “1 for checkmate”, but another might have “1 for checkmate plus 0.1 for each enemy piece on the right side of the endgame board” or “1 for checkmate minus 0.2 for each enemy pawn that you’ve captured”, or “0 for checkmate but 1 for capturing the enemy queen”, or whatever.
Then every one of t...
Also btw I wouldn't exactly call multiplicativity by itself "nonlinearity"!
You’re not the first to complain about my terminology here, but nobody can tell me what terminology is right. So, my opinion is: “No, it’s the genetics experts who are wrong” :)
If you take some stupid outcome like “a person’s fleep is their grip strength raised to the power of their alcohol tolerance”, and measure fleep across a population, you will obviously find that there’s a strong non-additive genetic contribution to that outcome. A.k.a. epistasis. If you want to say “no, that’...
That’s just a massive source of complexity, obfuscating the relationship between inputs and outputs.
A kind of analogy is: if you train a set of RL agents, each with slightly different reward functions, their eventual behavior will not be smoothly variant. Instead there will be lots of “phase shifts” and such.
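To put toy numbers on the “fleep” example above, here’s a quick simulation sketch (all trait names, allele counts, and effect sizes are made up): both component traits are perfectly additive in the genotype, yet the best additive model of fleep itself leaves much of the (entirely genetic) variance unexplained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps = 20_000, 200

# Two component traits, each a purely additive function of its own SNPs.
geno = rng.binomial(2, 0.5, size=(n_people, 2 * n_snps))
grip = 40 + geno[:, :n_snps] @ rng.normal(0, 0.8, n_snps)    # "grip strength"
tol = 1.5 + geno[:, n_snps:] @ rng.normal(0, 0.05, n_snps)   # "alcohol tolerance"

fleep = grip ** tol  # the made-up nonlinear outcome

# Best additive ("PGS-style") predictor of fleep: least squares on all SNPs.
X = np.column_stack([np.ones(n_people), geno])
beta, *_ = np.linalg.lstsq(X, fleep, rcond=None)
resid = fleep - X @ beta
print("additive R^2:", 1 - resid.var() / fleep.var())
# Everything here is genetic (broad-sense heritability is 1 by construction),
# but the additive model recovers only part of the variance, because fleep is
# a nonlinear function of two additive pieces.
```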
And yet it moves! Somehow it's heritable! Do you agree there is a tension between heritability and your claims?
I have a simple model with toy examples of where non-additivity in personality and other domains comes from, see §4.3.3 here.
Thanks. I think it's an important point you make; I do have it in mind that traits can have nonlinearities at different "stages", but I hadn't connected that to the personality trait issue. I don't immediately see a very strong+clear argument for personality traits being super exceptional here. Intuitively it makes sense that they're more "complicated" or "involve more volatile forces" or something due to being mental traits, but a bit more clarity would help. In particular, I don't see the argument being able to support yes significant broadsense heritabi...
You can provoke intense FEAR responses in animals by artificially stimulating the PAG, anterior-medial hypothalamus, or anterior-medial amygdala. Low-intensity stimulation causes freezing; high-intensity stimulation causes fleeing. We know that stimulating animals in the PAG is causing them subjective distress and not just motor responses because they will strenuously avoid returning to places where they were once PAG-stimulated.
I think this is a bit misleading. Maybe this wasn’t understood when Panksepp was writing in 2004, but I think it’s clear today th...
...Complex social emotions like “shame” or “xenophobia” may indeed be human-specific; we certainly don’t have good ways to elicit them in animals, and (at the time of the book but probably also today) we don’t have well-validated neural correlates of them in humans either. In fact, we probably shouldn’t even expect that there are special-purpose brain systems for cognitively complex social emotions. Neurotransmitters and macroscopic brain structures evolve over hundreds of millions of years; we should only expect to see “special-purpose hardware” for capaciti
I think parts of the brain are non-pretrained learning algorithms, and parts of the brain are not learning algorithms at all, but rather innate reflexes and such. See my post Learning from scratch in the brain for justification.
I tend to associate “feeling the AGI” with being able to make inferences about the consequences of AGI that are not completely idiotic.
FWIW I’m also bearish on LLMs but for reasons that are maybe subtly different from OP. I tend to frame the issue in terms of “inability to deal with a lot of interconnected layered complexity in the context window”, which comes up when there’s a lot of idiosyncratic interconnected ideas in one’s situation or knowledge that does not exist on the internet.
This issue incidentally comes up in “long-horizon agency”, because if e.g. you want to build some new system or company or whatever, you usually wind up with a ton of interconnected idiosyncratic “cached” i...
Good questions, thanks!
In 2.4.2, you say that things can only get stored in episodic memory if they were in conscious awareness. People can sometimes remember events from their dreams. Does that mean that people have conscious awareness during (at least some of) their dreams?
My answer is basically “yes”, although different people might have different definitions of the concept “conscious awareness”. In other words, in terms of map-territory correspondence, I claim there’s a phenomenon P in the territory (some cortex neurons / concepts / representations are...
should I think of it as: two parameters having similarly high Bayesian posterior probability, but the brain not explicitly representing this posterior, instead using something like local hill climbing to find a local MAP solution—bistable perception corresponding to the two different solutions this process converges to?
Yup, sounds right.
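Here’s a minimal toy illustration of that picture, with a bimodal posterior over one latent variable and naive hill climbing standing in for whatever local inference the cortex is actually doing (everything here is made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng()

# Toy bimodal (log-)posterior over a single latent variable z: a mixture of
# two Gaussians, standing in for two interpretations of an ambiguous image.
def log_posterior(z):
    return np.log(0.5 * np.exp(-0.5 * (z + 2) ** 2) +
                  0.5 * np.exp(-0.5 * (z - 2) ** 2))

def hill_climb(z, step=0.1, iters=200, eps=1e-4):
    # Local MAP search: follow a finite-difference gradient estimate uphill.
    for _ in range(iters):
        grad = (log_posterior(z + eps) - log_posterior(z - eps)) / (2 * eps)
        z += step * grad
    return z

# Different random starting points settle into different modes, the way an
# ambiguous percept settles into one of its two stable interpretations.
print([round(hill_climb(rng.normal(0, 3)), 2) for _ in range(5)])
```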
to what extent should I interpret the brain as finding a single solution (MLE/MAP) versus representing a superposition or distribution over multiple solutions (fully Bayesian)?
I think it can represent multiple possibilities...
[CLAIMED!] ODD JOB OFFER: I think I want to cross-post Intro to Brain-Like-AGI Safety as a giant 200-page PDF on arxiv (it’s about 80,000 words), mostly to make it easier to cite (as is happening sporadically, e.g. here, here). I am willing to pay fair market price for whatever reformatting work is necessary to make that happen, which I don’t really know (make me an offer). I guess I’m imagining that the easiest plan would be to copy everything into Word (or LibreOffice Writer), clean up whatever formatting weirdness comes from that, and convert to PDF. La...
Davidad responds with a brief argument for 1000 FLOP-equivalent per synapse-second (3 OOM more than my guess) on X as follows:
...Ok, so assuming we agree on 1e14 synapses and 3e8 seconds, then where we disagree is on average FLOP(-equivalent) per synapse-second: you think it’s about 1, I think it’s about 1000. This is similar to the disagreement you flagged with Joe Carlsmith.
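(Spelling out the arithmetic under those shared assumptions, i.e. 10^14 synapses and 3×10^8 seconds:

$$10^{14} \times 3\times 10^{8} \times 1 \;\tfrac{\text{FLOP}}{\text{synapse}\cdot\text{s}} = 3\times 10^{22}\ \text{FLOP} \qquad \text{vs.} \qquad 10^{14} \times 3\times 10^{8} \times 10^{3} \;\tfrac{\text{FLOP}}{\text{synapse}\cdot\text{s}} = 3\times 10^{25}\ \text{FLOP},$$

i.e. roughly 10^14 vs. 10^17 FLOP per second of brain time, depending on whose per-synapse-second figure you use.)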
Note: at some point Joe interviewed me about this so there might be some double-counting of “independent” estimates here, but iirc he also interviewed many other neuroscientists.
My estim
I think OP is using “sequential” in an expansive sense that also includes e.g. “First I learned addition, then I learned multiplication (which relies on already understanding addition), then I learned the distributive law (which relies on already understanding both addition and multiplication), then I learned the concept of modular arithmetic (which relies on …) etc. etc.” (part of what OP calls “C”). I personally wouldn’t use the word ‘sequential’ for that—I prefer a more vertical metaphor like ‘things building upon other things’—but that’s a matter of ta...
In case anyone missed it, I stand by my reply from before— Applying traditional economic thinking to AGI: a trilemma
If you offer a salary below 100 watts equivalent, humans won’t accept, because accepting it would mean dying of starvation. (Unless the humans have another source of wealth, in which case this whole discussion is moot.) This is not literally a minimum wage, in the conventional sense of a legally-mandated wage floor; but it has the same effect as a minimum wage, and thus we can expect it to have the same consequences as a minimum wage.
This is obviously (from my perspective) the point that Grant Slatton was trying to make. I don’t know whether Ben Golub misu...
I like reading the Sentinel email newsletter once a week for time-sensitive general world news, and https://en.wikipedia.org/wiki/2024 (or https://en.wikipedia.org/wiki/2025 etc.) once every 3-4 months for non-time-sensitive general world news. That adds up to very little time—maybe ≈1 minute per day on average—and I think there are more than enough diffuse benefits to justify that tiny amount of time.
I feel like I’ve really struggled to identify any controllable patterns in when I’m “good at thinky stuff”. Gross patterns are obvious—I’m reliably great in the morning, then my brain kinda peters out in the early afternoon, then pretty good again at night—but I can’t figure out how to intervene on that, except scheduling around it.
I’m extremely sensitive to caffeine, and have a complicated routine (1 coffee every morning, plus in the afternoon I ramp up from zero each weekend to a full-size afternoon tea each Friday), but I’m pretty uncertain whether I’m ...
Sorry if I missed it, but you don’t seem to address the standard concern that mildly-optimizing agents tend to self-modify into (or create) strongly-optimizing agents.
For example (copying from my comment here), let’s say we make an AI that really wants there to be exactly 100 paperclips in the bin. There’s nothing else it wants or desires. It doesn’t care a whit about following human norms, etc.
But, there’s one exception: this AI is also “lazy”—every thought it thinks, and every action it takes, is mildly aversive. So it’s not inclined to, say, build an im...
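In toy utility-function terms (my own formalization of the setup, with ε a small per-step penalty):

$$U \;=\; \mathbb{1}\!\left[\#\text{paperclips in bin} = 100\right] \;-\; \varepsilon \cdot \big(\#\text{thoughts} + \#\text{actions}\big), \qquad 0 < \varepsilon \ll 1.$$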
I’m still curious about how you’d answer my question above. Right now, we don't have ASI. Sometime in the future, we will. So there has to be some improvement to AI technology that will happen between now and then. My opinion is that this improvement will involve AI becoming (what you describe as) “better at extrapolating”.
If that’s true, then however we feel about getting AIs that are “better at extrapolating”—its costs and its benefits—it doesn’t much matter, because we’re bound to get those costs and benefits sooner or later on the road to ASI. So we mi...
Because you can speed up AI capabilities much easier while being sloppy than to produce actually good alignment ideas.
Right, my point is, I don’t see any difference between “AIs that produce slop” and “weak AIs” (a.k.a. “dumb AIs”). So from my perspective, the above is similar to : “…Because weak AIs can speed up AI capabilities much easier than they can produce actually good alignment ideas.”
…And then if you follow through the “logic” of this OP, then the argument becomes: “AI alignment is a hard problem, so let’s just make extraordinarily powerful / smar...
I think it’s 1:1, because I think the primary bottleneck to dangerous ASI is the ability to develop coherent and correct understandings of arbitrary complex domains and systems (further details), which basically amounts to anti-slop.
If you think the primary bottleneck to dangerous ASI is not that, but rather something else, then what do you think it is? (or it’s fine if you don’t want to state it publicly)
Right, so one possibility is that you are doing something that is “speeding up the development of AIS-helpful capabilities” by 1 day, but you are also simultaneously speeding up “dangerous capabilities” by 1 day, because they are the same thing.
If that’s what you’re doing, then that’s bad. You shouldn’t do it. Like, if AI alignment researchers want AI that produces less slop and is more helpful for AIS, we could all just hibernate for six months and then get back to work. But obviously, that won’t help the situation.
And a second possibility is, there are w...
I don’t think your model hangs together, basically because I think “AI that produces slop” is almost synonymous with “AI that doesn’t work very well”, whereas you’re kinda treating AI power and slop as orthogonal axes.
For example, from comments:
Two years later, GPT7 comes up with superhumanly-convincing safety measures XYZ. These inadequate standards become the dominant safety paradigm. At this point if you try to publish "belief propagation" it gets drowned out in the noise anyway.
Some relatively short time later, there are no humans.
I think that, if ther...
Yeah, I’ve written about that in §2.7.3 here.
I kinda want to say that there are many possible future outcomes that we should feel happy about. It’s true that many of those possible outcomes would judge others of those possible outcomes to be a huge missed opportunity, and that we’ll be picking from this set somewhat arbitrarily (if all goes well), but oh well, there’s just some irreducible arbitrariness in the nature of goodness itself.
For things like solving coordination problems, or societal resilience against violent takeover, I think it can be important that most people, or even virtually all people, are making good foresighted decisions. For example, if we’re worried about a race-to-the-bottom on AI oversight, and half of relevant decisionmakers allow their AI assistants to negotiate a treaty to stop that race on their behalf, but the other half think that’s stupid and don’t participate, then that’s not good enough, there will still be a race-to-the-bottom on AI oversight. Or if 50%...
I don’t think the average person would be asking AI what are the best solutions for preventing existential risks. As evidence, just look around:
There are already people with lots of money and smart human research assistants. How many of those people are asking those smart human research assistants for solutions to prevent existential risks? Approximately zero.
Here’s another: The USA NSF and NIH are funding many of the best scientists in the world. Are they asking those scientists for solutions to prevent existential risk? Nope.
Demis Hassabis is the boss of...
I’m not sure if this is what you’re looking for, but here’s a fun little thing that came up recently when I was writing this post:
Summary: “Thinking really hard for five seconds” probably involves less primary metabolic energy expenditure than scratching your nose. (Some people might find this obvious, but other people are under a mistaken impression that getting mentally tired and getting physically tired are both part of the same energy-preservation drive. My belief, see here, is that the latter comes from an “innate drive to minimize voluntary motor con...
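For a rough sense of scale, here’s my own back-of-envelope (not from the linked post): suppose hard thinking adds at most ~1 W to the brain’s ~20 W baseline, your forearm and hand are ~2 kg, the lift to your nose is ~0.4 m, and muscle is ~25% efficient. Then:

$$\underbrace{1\,\mathrm{W} \times 5\,\mathrm{s}}_{\text{thinking hard}} \approx 5\,\mathrm{J} \qquad \text{vs.} \qquad \underbrace{\frac{2\,\mathrm{kg} \times 9.8\,\mathrm{m/s^2} \times 0.4\,\mathrm{m}}{0.25}}_{\text{raising a hand to your nose}} \approx 30\,\mathrm{J}.$$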
Yeah it’s super-misleading that the post says:
Look at other unsolved problems:
- Goldbach: Can every even number of 1s be split into two prime clusters of 1s?
- Twin Primes: Are there infinite pairs of prime clusters of 1s separated by two 1s?
- Riemann: How are the prime clusters of 1s distributed?
For centuries, they resist. Why?
I think it would be much clearer to everyone if the OP said
...Look at other unsolved problems:
- Goldbach: Can every even number of 1s be split into two prime clusters of 1s?
- Twin Primes: Are there infinite pairs of prime clusters of 1s
Ah, but how do you know that the person that went to bed last night wasn’t a different person, who died, and you are the “something else” that woke up with all of that person’s memories? And then you’ll die tonight, and tomorrow morning there will be a new person who acts like you and knows everything you know but “you would be dead and somet...