Thanks for writing! I agree the factors this post describes make some types of gradient hacking extremely difficult, but I don't see how they make the following approach to gradient hacking extremely difficult.
...Suppose that an agent has some trait which gradient descent is trying to push in direction x because the x-ness of that trait contributes to the agent’s high score; and that the agent wants to use gradient hacking to prevent this. Consider three possible strategies that the agent might try to implement, upon noticing that the x-component of the tra
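As a concrete toy picture of the setup being quoted (my own minimal sketch, not taken from the post, and deliberately simpler than the strategies the quote goes on to discuss): gradient descent pushes a trait parameter toward x because that improves the score, and the hacking agent couples an extra self-imposed penalty to the trait so that the trait's current value becomes a local minimum of the overall loss.

```python
# A toy sketch (my own illustration, not from the post): gradient descent
# "wants" to push a trait parameter t toward a target x, because task loss
# shrinks as t approaches x. A gradient-hacking agent couples an extra,
# self-imposed loss term to the trait so that its current value t_anchor
# becomes a local minimum, and the updates stop moving the trait.

x = 1.0          # value that gradient descent is pushing the trait toward
t = 0.2          # current value of the trait
t_anchor = 0.2   # value the agent wants the trait to keep
lr = 0.1         # learning rate

def task_grad(t):
    # Gradient of the ordinary task loss 0.5 * (t - x)**2: pressure toward x.
    return t - x

def hacked_grad(t):
    # The agent adds a penalty that is linear in t with slope (x - t_anchor),
    # so the total gradient becomes (t - x) + (x - t_anchor) = t - t_anchor,
    # which vanishes at t_anchor and pushes t back toward it otherwise.
    return task_grad(t) + (x - t_anchor)

for _ in range(100):
    t -= lr * hacked_grad(t)

print(round(t, 3))  # ~0.2: the trait stays put instead of moving toward 1.0
```

The interesting question, of course, is whether the agent could actually implement and protect a coupling like this; the penalty's slope is chosen precisely so the gradients cancel at the anchored value.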
Thanks for posting, but I think these arguments have major oversights, which leave me more optimistic about the extent to which people will avoid and prevent the horrible misuse you describe.
First, this post seems to overstate the extent to which people tend to value and carry out extreme torture. Maximally cruel torture fortunately seems very rare.
Thanks for writing!
I want to push back a bit on the framing used here. Instead of the framing "slowing down AI," another framing we could use is "lay the groundwork for slowing down in the future, when extra time is most needed." I prefer this latter framing/emphasis because:
Work to spread good knowledge regarding AGI risk / doom stuff among politicians, the general public, etc. [...] Emphasizing “there is a big problem, and more safety research is desperately needed” seems good and is I think uncontroversial.
Nitpick: My impression is that at least some versions of this outreach are very controversial in the community, as suggested by e.g. the lack of mass advocacy efforts. [Edit: "lack of" was an overstatement. But these are still much smaller than they could be.]
It does, thanks! (I had interpreted the claim in the paper as comparing e.g. TPUs to CPUs, since the quote mentions CPUs as the baseline.)
Thanks! To make sure I'm following, does optimization help just by improving utilization?
Sorry, I'm a bit confused. I'm interpreting the 1st and 3rd paragraphs of your response as expressing opposite opinions about the claimed efficiency gains (uncertainty and confidence, respectively), so I think I'm probably misinterpreting part of it?
This is helpful for something I've been working on - thanks!
I was initially confused about how these results could fit with claims from this paper on AI chips, which emphasizes the importance of factors other than transistor density for AI-specialized chips' performance. But on second thought, the claims seem compatible:
One specific concern people could have with this thoughtspace is that it's hard to square with the knowledge that an AI PhD [edit: or rather, AI/ML expertise more broadly] provides. I took this point to be strongly suggested by the author's claims that "experts knowledgeable in the relevant subject matters that would actually lead to doom find this laughable" and that someone who spent their early years "reading/studying deep learning, systems neuroscience, etc." would not find risk arguments compelling. That's directly refuted by the surv...
experts knowledgeable in the relevant subject matters that would actually lead to doom find this laughable
This seems overstated; plenty of AI/ML experts are concerned. [1] [2] [3] [4] [5] [6] [7] [8] [9]
Quoting from [1], a survey of researchers who published at top ML conferences:
The median respondent’s probability of x-risk from humans failing to control AI was 10%
Admittedly, that's a far cry from "the light cone is about to get ripped to shreds," but it's also pretty far from finding those concerns laughable. [Edited to add: another recent survey ...
Yep! Here's a compilation.
If someone's been following along with popular LW posts on alignment and is new to governance, I'd expect them to find the "core readings" in "weeks" 4-6 most relevant.
I'm sympathetic under some interpretations of "a ton of time," but I think it's still worth people's time to spend at least ~10 hours of reading and ~10 hours of conversation getting caught up with AI governance/strategy thinking, if they want to contribute.
Arguments for this:
more researchers should backchain from “how do I make AGI timelines longer
Like you mention, "end time" seems (much) more valuable than earlier time. But the framing here, as well as the broader framing of "buying time," collapses that distinction (by just using "time" as the metric). So I'd suggest more heavily emphasizing buying end time.
One potential response is: it doesn't matter; both framings suggest the same interventions. But that seems wrong. For example, slowing down AI progress now seems like it'd mostly buy "pre-end time" (potentially by burn...
Thanks for posting this!
There's a lot here I agree with (which might not be a surprise). Since the example interventions are all/mostly technical research or outreach to technical researchers, I'd add that a bunch of more "governance-flavored" interventions would also potentially contribute.
I agree with a lot of that. Still, if
nuclear non proliferation [to the extent that it has been achieved] is probably harder than a ban on gain-of-function
that's sufficient to prove Daniel's original criticism of the OP--that governments can [probably] fail at something yet succeed at some harder thing.
(And on a tangent, I'd guess a salient warning shot--which the OP was conditioning on--would give the US + China strong incentives to discourage risky AI stuff.)
I agree it's some evidence, but that's a much weaker claim than "probably policy can't deliver the wins we need."
An earlier comment seems to make a good case that there's already more community investment in AI policy, and another earlier thread points out that the content in brackets doesn't seem to involve a good model of policy tractability.
- Perhaps the sorts of government interventions needed to make AI go well are not all that large, and not that precise.
I confess I don't really understand this view.
Specifically for the sub-claim that "literal global cooperation" is unnecessary, I think a common element of people's views is that the semiconductor supply chain has chokepoints in a few countries, so action from just these few governments can shape what is done with AI everywhere (within a certain window of time).
I'd guess the very slow rate of nuclear proliferation has been much harder to achieve than banning gain-of-function research would be, since, in the absence of intervention, incentives to get nukes would have been much bigger than incentives to do gain-of-function research.
Also, on top of the taboo against chemical weapons, there was the verified destruction of most chemical weapons globally.
Thanks for the post - I think there are some ways heavy regulation of AI could be very counterproductive or ineffective for safety:
My problem is that most of the scenarios I see being discussed are dependent on a long chain of assumptions being true and they often seem to ignore that many things could go wrong, invalidating the full thing: you don't need to be wrong in all those steps, one of them is just enough.
This feels a bit like it might be shifting the goalposts; it seemed like your previous comment was criticizing a specific argumentative step ("reasons not to believe in doom: [...] Orthogonality of intelligence and agency"), rather than just pointing out that there were m...
- Orthogonality of intelligence and agency. I can envision a machine with high intelligence and zero agency, I haven't seen any convincing argument yet of why both things must necessarily go together
Hm, what do you make of the following argument? Even assuming (contestably) that intelligence and agency don't in principle need to go together, in practice they'll go together because there will appear to be strong economic or geopolitical incentives to build systems that are both highly intelligent and highly agentic (e.g., AI systems that can run teams). ...
So we need a way to have alignment deployed throughout the algorithmic world before anyone develops AGI. To do this, we'll start by offering alignment as a service for more limited AIs.
I'm tentatively fairly excited about some version of this, so I'll suggest some tweaks that can hopefully be helpful for your success (or for the brainstorming of anyone else who's thinking about doing something similar in the future).
We will refine and develop this deployment plan, depending on research results, commercial opportunities, feedback, and suggestions.
I s...
A more recent clarification from Paul Christiano, on how Part 1 might get locked in / how it relates to concerns about misaligned, power-seeking AI:
I also consider catastrophic versions of "you get what you measure" to be a subset/framing/whatever of "misaligned power-seeking." I think misaligned power-seeking is the main way the problem is locked in.
I'm still pretty confused by "You get what you measure" being framed as a distinct threat model from power-seeking AI (rather than as another sub-threat model). I'll try to address two defenses of framing them as distinct threat models, which I interpret this post as suggesting (in the context of this earlier comment on the overview post). Broadly, I'll be arguing that power-seeking AI is necessary for "you get what you measure" issues to pose existential threats, so "you get what you measure" concerns are best thought of as a sub-threat model of ...
I'm still pretty confused by "You get what you measure" being framed as a distinct threat model from power-seeking AI (rather than as another sub-threat model)
I also consider catastrophic versions of "you get what you measure" to be a subset/framing/whatever of "misaligned power-seeking." I think misaligned power-seeking is the main way the problem is locked in.
To a lesser extent, "you get what you measure" may also be an obstacle to using AI systems to help us navigate complex challenges without quick feedback, like improving governance. But I don't think...
In my model, one should be deeply skeptical whenever the answer to ‘what would do the most good?’ is ‘get people like me more money and/or access to power.’ One should be only somewhat less skeptical when the answer is ‘make there be more people like me’ or ‘build and fund a community of people like me.’ [...] I wish I had a better way to communicate what I find so deeply wrong here
I'd be very curious to hear more fleshed-out arguments here, if you or others think of them. My best guess about what you have in mind is that it's a combination of the follo...
This is a good first attempt and it is directionally correct as to what my concerns are.
The big difference is something like your apparent instinct that these problems are practical and avoidable, limited in scope and only serious if you go 'all-in' on power or are 'doing it wrong' in some sense.
Whereas my model says that these problems are unavoidable even under the best of circumstances and at best you can mitigate them, the scope of the issue is sufficient to reverse the core values of those involved and the core values being advanced by groups in...
I agree with and appreciate the broad point. I'll pick on one detail because I think it matters.
this whole parable of the drowning child, was set to crush down the selfish part of you, to make it look like you would be invalid and shameful and harmful-to-others if the selfish part of you won [...]
It is a parable calculated to set at odds two pieces of yourself... arranging for one of them to hammer down the other in a way that would leave it feeling small and injured and unable to speak in its own defense.
This seems uncharitable? Singer's thought ex...
I agree with parts of that. I'd also add the following (or I'd be curious why they're not important effects):
More broadly though, maybe we should be using more fine-grained concepts than "shorter timelines" and "slower takeoffs":