All of Adam Kaufman's Comments + Replies

Kudos for releasing a concept of a plan! Some thoughts:

Regarding the first safety case:

  • The amount of mech interp progress required to make the first safety case suitable seems overly optimistic to me; I basically think that most of the "limitations" are in fact pretty serious. However, I appreciate the attempt to include concrete requirements.
  • I believe that getting good results from the following experiments might be particularly unrealistic:
    • "In order to robustify our evals against sandbagging, we ran an experiment where we steered one or more trut
... (read more)

I have such a strong intuitive opposition to the Internal Reaction Drive that I agree with your conclusion that we should update away from any theory which allows it.  Then again, perhaps it is impossible to build such a drive for the merely practical reason that any material with a positive or negative index of refraction will absorb enough light to turn the drive into an expensive radiator.

Especially given the recent Nobel prize announcement, I think the most concerning piece of information is that there are cultural forces from within the physics community discouraging people from trying to answer the question at all.

You need abstractions to think and plan at all with limited compute, not just to speak. I would guess that plenty of animals which are incapable of speaking also mentally rely on abstractions. For instance, when foraging for apples, I suspect an animal probably has a mental category for apples, and treats them as the same kind of thing rather than as completely unrelated configurations of atoms.

1invertedpassion
I agree. Limited resources drive the need to have abstractions. Communication probably simply provides another drive to abstract at even higher levels. I would imagine that for many animals, evolution has pre-programmed abstractions like "object" or "fruits" so they come pre-parsed.

The planet Mercury is a pretty good source of material:

Mass: 3.3 × 10^23 kg (which is about 70% iron)

Radius: 2.4 × 10^6 m

Volume: 6.1 × 10^19 m^3

Density: 5.4 × 10^3 kg/m^3 

Orbital radius: 5.8 × 10^10 m

A spherical shell around the sun at roughly the same radius as Mercury's orbit would have a surface area of about 4.2 × 10^22 m^2, and spreading out Mercury's volume over this area gives a thickness of about 1.4 mm. This means Mercury alone provides ample material for collecting all of the Sun's energy via reflect... (read more)
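
A rough back-of-the-envelope check of that arithmetic, using standard published values for Mercury and taking the shell radius to be Mercury's orbital radius (a sketch, not from the original comment):

```python
import math

# Rough parameters for Mercury (standard published values).
mass = 3.30e23            # kg, ~70% iron
radius = 2.44e6           # m
orbital_radius = 5.79e10  # m

volume = (4 / 3) * math.pi * radius**3        # ~6.1e19 m^3
density = mass / volume                       # ~5.4e3 kg/m^3

# Spherical shell around the Sun at roughly Mercury's orbital radius.
shell_area = 4 * math.pi * orbital_radius**2  # ~4.2e22 m^2
thickness = volume / shell_area               # ~1.4e-3 m

print(f"volume     = {volume:.2e} m^3")
print(f"density    = {density:.0f} kg/m^3")
print(f"shell area = {shell_area:.2e} m^2")
print(f"thickness  = {thickness * 1000:.1f} mm")  # about 1.4 mm
```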

A better way to do the memory overwrite experiment is to prepare a list of box contents, one for each of ten possible numbers; then have someone provide a random number while your short-term memory isn't working, and see if you can successfully overwrite the memory corresponding to that number (as measured by correctly guessing the number much later).

I’m confused. I know that it is like something to be me (this is in some sense the only thing I know for sure). It seems like there are rules which shape the things I experience, and some of those rules can be studied (like the laws of physics). We are good enough at understanding some of these rules to predict certain systems with a high degree of accuracy, like how an asteroid will orbit a star or how electrons will be pushed through a wire by a particular voltage in a circuit. But I have no way to know or predict if it is like something to be a fish or GPT-... (read more)

1Signer
You don't. All your specific experiences are imprecise approximations: you can't be sure what exact color you saw for how many nanoseconds, you can't be sure all your brain except the small part implementing only the current thought hasn't evaporated a microsecond ago. So you can have imprecise models of a fish brain the same way you have imprecise models of your brain - your awareness of your brain is causally connected to your brain the same way your thoughts can be causally connected to a fish brain. You just can't be fully fish.
2Charbel-Raphaël
But I can predict what you will say; I can predict whether you are confused by the hard problem just by looking at your neural activations; I can predict, word by word, the next sentence you are about to utter: "The hard problem is really hard." I would be curious to know what you think about the box solving the meta-problem, just before the addendum. Do you think it is unlikely that an AI would rediscover the hard problem in this setting?

The only metric natural selection is “optimizing” for is inclusive genetic fitness. It did not “try” to align humans with social status, and in many cases people care about social status to the detriment of their inclusive genetic fitness. This is a failure of alignment, not a success.

3Eli Tyre
True and important. I don't mean to imply otherwise. Evolution failed at its "alignment goal". If (as I'm positing here) it successfully constructed humans to be aligned to some other concept that's not the alignment goal, and that concept, and that alignment, generalized well, that doesn't mean that evolution failed any less hard. But it does seem notable if that's what happened! Because it's some evidence about alignment generalization.

I am not so sure it will be possible to extract useful work towards solving alignment out of systems we do not already know how to carefully steer. I think that substantial progress on alignment is necessary before we know how to build things that actually want to help us advance the science. Even if we built something tomorrow that was in principle smart enough to do good alignment research, I am concerned we don’t know how to make it actually do that rather than, say, imitate more plausible-sounding but incorrect ideas. The fact that appending silly phra... (read more)

How does Harry know the name “Lucius Malfoy”?

3zyansheep
Draco Malfoy mentioned it previously :)

We aren’t surprised by HHTHHTTTHT or whatever because we perceive it as the event “a sequence containing a similar number of heads and tails in any order, ideally without a long subsequence of H or T”, which occurs frequently.
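
A quick way to see how common that coarse-grained event is, assuming a fair coin and 10 flips (my own sketch, not from the post; here "similar number of heads and tails" is taken to mean 4 to 6 heads):

```python
from itertools import product

# Enumerate all 2^10 = 1024 equally likely sequences of 10 flips.
sequences = list(product("HT", repeat=10))

# The coarse-grained event we actually perceive: a roughly balanced count of
# heads and tails (the "no long runs" condition could be added as a second filter).
balanced = [s for s in sequences if 4 <= s.count("H") <= 6]

print(len(balanced), "of", len(sequences))      # 672 of 1024
print(f"{len(balanced) / len(sequences):.0%}")  # ~66% of all sequences
```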

2Ape in the coat
Yep. This is essentially the point of the post.

I’m enjoying this series, and look forward to the next installment.

The thing I mean by “superintelligence” is very different from a government. A government cannot design nanotechnology, and is made of humans which value human things.

4faul_sname
The two examples everyone loves to use to demonstrate that massive top-down engineering projects can sometimes be a viable alternative to iterative design (the Manhattan Project and the Apollo Program) were both government-led initiatives, rather than single very smart people working alone in their garages. I think it's reasonable to conclude that governments have considerably more capacity to steer outcomes than individuals, and are the most powerful optimizers that exist at this time. I think restricting the term "superintelligence" to "only that which can create functional self-replicators with nano-scale components" is misleading. Concretely, that definition of "superintelligence" says that natural selection is superintelligent, while the most capable groups of humans are nowhere close, even with computerized tooling.
2[anonymous]
A government-funded total effort could not design nanotechnology? Or are you saying that because a present-day nanotech rush would be accomplished by a team of elite scientists and engineers and near-future AI tools, it's not "the government"? (The government being made of elderly leaders and mountains of people who process administrative procedures. Were a nanotech rush to succeed, it would be accomplished by a "skunkworks"-style effort with nearly unlimited resources.) Just kinda confused, because "the government" has not meaningfully tried in this domain. A true total effort would be at large integrated sites; it would identify potential routes to the goal and fully fund them all in parallel for redundancy. You would see rush-built buildings, and safety procedures and most federal laws would be waived. As I understand it, NNI essentially gives separate university labs small grants to work on "nanotechnology", which includes a broad range of topics that are unrelated to the important one of a self-replicating molecular assembler. Presumably a reasonable outside view would be that this effort will not develop such an assembler prior to 2100 or later. If it became known that a rival government had nearly finished a working assembler and was busy developing "kill dust" that can make any human on earth drop dead on command, you would see such an effort.

What can men do against such reckless indifference?

Can someone with more knowledge give me a sense of how new this idea is, and guess at the probability that it is onto something?

3Ilio
(Epistemic status: first thoughts after first reading) Most is very standard cognitive neuroscience, although with more emphasis on some things (the subdivision of synaptic boutons into silent/modifiable/stable, the notion of complex and simple cells in the visual system) than others (the critical periods, brain rhythms, iso/allo cortices, brain symmetry and circuits, etc.). There's one bit or two wrong, but that's nitpicks or my mistake. The idea of synapses detecting a frequency code is not exactly novel (it is the usual working hypothesis for some synapses in the cerebellum, although the exact code is not known, I think), but the idea that it's a general principle that works because the synapse recognizes its own noise is either novel or not well known even within cognitive science (it might be a common idea among specialists of synaptic transmission, or original). I feel it's promising, like how Hebb had the idea of his law.

Why are we so sure chatbots (and parrots for that matter) are not conscious? Well, maybe the word is just too slippery to define, but I would bet that parrots have some degree of subjective experience, and I am sufficiently uncertain regarding chatbots that I do worry about it slightly.

3JenniferRM
The parrot species Forpus conspicillatus has "signature calls" that parents use with babies, which the babies then learn to use when they meet others, and which the others then use to track the identity of the babies in greeting. This is basically an independent evolution of "personal names". Names seem to somewhat reliably arise in species with a "fission/fusion cultural pattern", where small groups form and fall apart over time, where reputations for being valuable members of teams are important to cultivate (or fake), and where detecting fakers who deserve a bad reputation is important to building strong teams. Beluga whales also have names, so the pattern has convergently evolved at least three times on Earth so far.

Please note that the graph of per capita war deaths is on a log scale. The number moves over several orders of magnitude. One could certainly make the case that local spikes were sometimes caused by significant shifts in the offense-defense balance (like tanks and planes making offense easier for a while at the beginning of WWII). These shifts are pushed back to equilibrium over time, but personally I would be pretty unhappy about, say, deaths from pandemics spiking 4 orders of magnitude before returning to equilibrium.

2gilch
Yeah, I think it's the amplitude of the swings we need to be concerned with. The supposed mean-reversion tendency (or not) is not load bearing. A big enough swing still wipes us out. Supposing we achieve AGI, someone will happen to get there first. At the very least, a human-level AI could be copied to another computer, and then you have two human-level AIs. If inference remains cheaper than training (seems likely, given how current LLMs work), then it could probably be immediately copied to thousands of computers, and you have a whole company of them. If they can figure out how to use existing compute to run themselves more efficiently, they'll probably shortly thereafter stop operating on anything like human timescales and we get a FOOM. No-one else has time to catch up.

I'd also guess the equilibrating force is that there's a human tendency to give up after a certain percentage of deaths, one that holds regardless of how those deaths happen. This force ceases to be relevant outside certain political motivations for war.

This random Twitter person says that it can't. Disclaimer: haven't actually checked for myself.
https://chat.openai.com/share/36c09b9d-cc2e-4cfd-ab07-6e45fb695bb1

Here is me playing against GPT-4, no vision required. It does just fine at normal tic-tac-toe, and figures out anti-tic-tac-toe with a little bit of extra prompting.

[anonymous]103

GPT-4 can follow the rules of tic-tac-toe, but it cannot play optimally. In fact it often passes up opportunities for wins. I've spent about an hour trying to get GPT-4 to play optimal tic-tac-toe without any success.

Here's an example of GPT-4 playing sub-optimally: https://chat.openai.com/share/c14a3280-084f-4155-aa57-72279b3ea241

Here's an example of GPT-4 suggesting a bad move for me to play: https://chat.openai.com/share/db84abdb-04fa-41ab-a0c0-542bd4ae6fa1
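
Transcripts like these can be checked mechanically. Below is a rough minimax sketch (my own, not from the linked chats; the helper names and the example position are hypothetical) that lists the optimal moves for the player to move, which makes "passes up opportunities for wins" easy to verify turn by turn:

```python
def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
    for i, j, k in lines:
        if board[i] != " " and board[i] == board[j] == board[k]:
            return board[i]
    return None

def minimax(board, player):
    """Game value for X with best play: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    nxt = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], nxt)
              for i, c in enumerate(board) if c == " "]
    return max(values) if player == "X" else min(values)

def best_moves(board, player):
    """Indices (0-8, row-major) of moves that achieve the minimax value."""
    nxt = "O" if player == "X" else "X"
    scored = {i: minimax(board[:i] + player + board[i + 1:], nxt)
              for i, c in enumerate(board) if c == " "}
    target = max(scored.values()) if player == "X" else min(scored.values())
    return [i for i, v in scored.items() if v == target]

# Hypothetical position: X has two in a row on top, O has two in the middle row.
# X to move; square 2 is the only optimal move (an immediate win).
print(best_moves("XX OO    ", "X"))  # -> [2]
```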

6Rafael Harth
Gonna share mine because that was pretty funny. I thought I played optimally (whoops, I actually missed a win), but GPT-4 won anyway, without making illegal moves. Sort of.

Yes. I think the title of my post is misleading (I have updated it now). I think I am trying to point at the problem that the current incentives mean we are going to mess up the outer alignment problem, and natural selection will favor the systems that we fail the hardest on.

That's a very fair response. My claim here is really about the outer alignment problem, and that if lots of people have access to the ability to create / fine tune AI agents, many agents that have goals misaligned with humanity as a whole will be created, and we will lose control of the future.

I suppose what I'm trying to point to is some form of the outer alignment problem. I think we may end up with AIs that are aligned with human organizations like corporations more than individual humans. The reason for this is that corporations or militaries which employ more ruthless AIs will, over time, accrue more power and resources. It's not so much explicit (i.e. violent) competition, but rather the gradual tendency for systems which are power-seeking and resource-maximizing to end up with more power and resources over time. If we allow for the creati... (read more)

Yeah. I think a key point that is often overlooked is that even if powerful AI is technically controllable, i.e. we solve inner alignment, that doesn't mean society will handle it safely. I think by default it looks like every company and military is forced to start using a ton of AI agents (or they will be outcompeted by someone else who does). Competition between a bunch of superhuman AIs that are trying to maximize profits or military tech seems really bad for us. We might not lose control all at once, but rather just be gradually outcompeted by machines, where "gradually" might actually be pretty quick. Basically, we die by Moloch.

5Nathan Helm-Burger
Yes, I see Moloch as my, and humanity's, primary enemy here. I think there are quite a few different plausible future paths in which Moloch rears its ugly head. The challenge, and duty, of coordination to defeat Moloch goes beyond what we think of as governance. We need coordination between AI researchers, AI alignment researchers, forecasters, politicians, investors, CEOs. We need people realizing their lives are at stake and making sacrifices and compromises to reduce the risks.
4[comment deleted]

Yeah, I've seen this video before. Still excellent.

Yeah, in general, we are pretty compute limited and should stick to good heuristics for most kinds of problems. I do think that most people rely too much on heuristics, so for the average person the useful lesson is "actually stop and think about things once in a while", but I can see how the opposite problem may sometimes arise in this community.

I find it useful to distinguish between epistemic and instrumental rationality. You're talking about instrumental rationality – and it could be instrumentally useful to convince someone of your beliefs, to teach them to think clearly, or to actively mislead them. 
Epistemic rationality, on the other hand, means trying to have true beliefs, and in this case it's better to teach someone to fish than to force them to accept your fish.

In the doomsday argument, we are the random runner. If the runner with only 10 people behind him assumed his position was randomly selected, and tried to estimate the total number of runners, he would be very wrong. We could very well be that runner near the back of the race; we weren't randomly selected to be at the back, we just are, and the fact that there are ten people behind us doesn't give us meaningful information about the total number of runners.

0ThirdSequence
Humans alive today not being a random sample can be a valid objection against the Doomsday argument, but not for the reasons that you are mentioning. You seem to be suggesting something along the lines of "Given that I am at the beginning, I cannot possibly be somewhere else. Everyone who finds themselves in the position of the first humans has a 100% chance of being in that position." However, for the Doomsday argument, your relative ranking among all humans is not the given variable but the unknown variable. Just because your ranking is fixed (you could not possibly be in any other position) does not mean that it is known, or that we cannot make probabilistic statements about it.

Okay, suppose I was born in Teenytown, a little village on the island nation of Nibblenest. The little one-room schoolhouse in Teenytown isn't very advanced, so no one ever teaches me that there are billions of other people living in all the places I've never heard of. Now, I might think to myself, the world must be very small – surely, if there were billions of people living in millions of towns and cities besides Teenytown, it would be very unlikely for me to have been born in Teenytown; therefore, Teenytown must be one of the only villages on Earth.

Clearly, this is a... (read more)

0ThirdSequence
It seems that your understanding of the Doomsday argument is not entirely correct - at least your village example doesn't really capture the essence of the argument. Here is a different analogy: Let's imagine a marathon with an unknown number of participants. For the sake of argument, let's assume it could be a small local event or a massive international competition with billions of runners. You're trying to estimate the size of this marathon, and to assist you, the organizer picks a random runner and tells you how many participants are trailing behind them. For example, if you're told that there are only 10 runners behind the selected participant, it would be reasonable to conclude that this is likely not a marathon of billions. In such a large event, the odds of picking a runner with only 10 people behind them would be incredibly low. This logic also applies to the Doomsday Argument, whether you're an outside observer or one of the 'participants'. The key piece of information here is that you only know the number of individuals 'behind' you, which can be used to infer how likely it is that the total number of 'participants' is more than X.
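
A minimal Bayesian sketch of that marathon logic (my own made-up race sizes and a 50/50 prior, purely illustrative): the observation "the randomly selected runner has exactly 10 people behind them" is vastly more likely under the small-race hypothesis.

```python
# Two candidate race sizes, assumed equally likely a priori (made-up numbers).
sizes = {"small": 1_000, "huge": 1_000_000_000}

# P(randomly chosen runner has exactly 10 runners behind them | size) = 1/size,
# since that singles out one specific rank among `size` equally likely ranks.
likelihood = {name: 1 / n for name, n in sizes.items()}

# Posterior with a 50/50 prior (the prior factors cancel).
total = sum(likelihood.values())
posterior = {name: lk / total for name, lk in likelihood.items()}
print(posterior)  # the small race gets ~99.9999% of the posterior
```

Whether a person's birth rank can be treated like the randomly selected runner is exactly the point the two commenters disagree about.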

I think the claim that we basically understand the universe is misleading. I'm especially unconvinced by your vague explanation of consciousness; I don't think we have anything close to an empirically supported mechanistic model that makes good predictions. I personally have significant uncertainty regarding what kinds of things can have subjective experiences, or why they do.

This also feels like a good opportunity to say that the Doomsday argument has never made much sense to me; it has always felt wrong to me to treat being “me” as a random sample of obs... (read more)

1ThirdSequence
Your objection against the Doomsday argument does not make much sense to me. The argument is simply based on the number of humans born to date (whether you are looking at it from your own perspective or not).

Planes would not be required for stratospheric injection of SO2. It could in theory be done much more cheaply with balloons: https://caseyhandmer.wordpress.com/2023/06/06/we-should-not-let-the-earth-overheat/

Exactly, it has always felt wrong to me to treat being “me” as a random sample of observers. I couldn’t be anyone except me. If the future has trillions of humans or no humans, the person which is me will feel the same way in either case. I find the doomsday argument absurd because it treats my perspective as a random sample, which feels like a type error.

Indeed. I think about this type of thing often when I consider the concept of superhuman AI - when I spend hours stuck on a problem with a simple solution or forget something important, it’s not hard to imagine an algorithm much smarter than me that just doesn’t make those mistakes. I think the bar really isn’t that high for improving substantially on human cognition. Our brains have to operate under very strict energy constraints, but I can easily imagine a machine which performs a lot better than me by applying more costly but effective algorithms and us... (read more)

I am a literal freshman, and not feeling super optimistic about the future right now. How should I think about how to spend my time?

5Greg C
Advocate for a global moratorium on AGI. Try and buy (us all) more time. Learn the basics of AGI safety (e.g. AGI Safety Fundamentals) so you are able to discuss the reasons why we need a moratorium in detail. YMMV, but this is what I'm doing as a financially independent 42 year-old. I feel increasingly like all my other work is basically just rearranging deckchairs on the Titanic.
3Jayson_Virissimo
FWIW, if my kids were freshmen at a top college, I would advise them to continue schooling, but switch to CS and take every AI-related course that was available, if they hadn't already done so.
4Matthew_Opitz
In a similar vein, I'm an historian who teaches as an adjunct instructor.  While I like my job, I am feeling more and more like I might not be able to count on this profession to make a living over the long term due to LLMs making a lot of the "bottom-rung" work in the social sciences redundant. (There will continue to be demand for top-notch research work for a while longer because LLMs aren't quite up to that yet, but that's not what I do currently).   Would there be any point in someone like me going back to college to get another 4-year degree in computer science at this moment? Or is that field just as at-risk of being made technologically-obsolete (especially the bottom rungs of the ladder)? Perhaps I should remain as an historian where, since I have about 10 years of experience in that field, I'm at least on the middle rungs of the ladder and might escape technological obsolescence if AGI gobbles up the bottom rungs. And let's say I did get a computer science degree, or even did some sort of more-focused coding boot camp type of thing.  By the time I finished my training, would my learning even remain relevant, or are things already moving too quickly to make bottom-rung coding knowledge useful?  Let's say I didn't care about making a living and just wanted to maximize my contributions to AI alignment. Would I be of more use to AI alignment by continuing my "general well-rounded public intellectual education" as an historian (especially one who dabbles in adjacent fields like economics and philosophy probably more than average), or would I be able to make greater contributions to AI alignment by becoming more technically proficient in computer science?

Hi Naomi!

The advice about applying to MIT/Stanford is probably correct, if just to have the option. That said, I definitely don't regret ending up here!

We are quite similar! I was also accepted to Harvard REA – exactly one year ago – and was too lazy (or rather, too mentally drained by the application process) to apply to MIT after that. I arrived intending to study physics, but I've since realized AI safety is a much more important and exciting problem to work on. Seems like you got there a bit sooner than I did! HAIST is a wonderful community, and also a great resource for finding upskilling and research opportunities.

I've only been here for a semester, so take this with a grain of salt, but I don't think you shou... (read more)