Richard_Ngo

Formerly alignment and governance researcher at DeepMind and OpenAI. Now independent.

Sequences

Twitter threads
Understanding systematization
Stories
Meta-rationality
Replacing fear
Shaping safer goals
AGI safety from first principles

6 points · Richard Ngo's Shortform · Ω · 6y · 457 comments

Comments

Status Is The Game Of The Losers' Bracket
Richard_Ngo · 5h

The people I instinctively checked after reading this:

  • Pichai: 5'11
  • Gates: 5'10
  • Ballmer: 6'5
  • I got conflicting estimates for Jobs and Nadella
AI safety undervalues founders
Richard_Ngo · 2d

See my post on pessimization.

AI safety undervalues founders
Richard_Ngo · 2d

A few quick comments, on the same theme as but mostly unrelated to the exchange so far:

  1. I'm not very sold on "cares about xrisk" as a key metric for technical researchers. I am more interested in people who want to very deeply understand how intelligence works (whether abstractly or in neural networks in particular). I think the former is sometimes a good proxy for the latter but it's important not to conflate them. See this post for more.
  2. Having said that, I don't get much of a sense that many MATS scholars want to deeply understand how intelligence works. When I walked around the poster showcase at the most recent iteration of MATS, a large majority of the projects seemed like they'd prioritized pretty "shallow" investigations. Obviously it's hard to complete deep scientific work in three months but at least on a quick skim I didn't see many projects that seemed like they were even heading in that direction. (I'd cite Tom Ringstrom as one example of a MATS scholar who was trying to do deep and rigorous work, though I also think that his core assumptions are wrong.)
  3. As one characterization of an alternative approach: my internship with Owain Evans back in 2017 consisted of me basically sitting around and thinking about AI safety for three months. I had some blog posts as output but nothing particularly legible. I think this helped nudge me towards thinking more deeply about AI safety subsequently (though it's hard to assign specific credit).
  4. There's an incentive alignment problem where even if mentors want scholars to spend their time thinking carefully, the scholars' careers will benefit most from legible projects. In my most recent MATS cohort I've selected for people who seem like they would be happy to just sit around and think for the whole time period without feeling much internal pressure to produce legible outputs. We'll see how that goes.
Put numbers on stuff, all the time, otherwise scope insensitivity will eat you
Richard_Ngo · 2d

At some point I recall thinking to myself "huh, LessWrong is really having a surge of good content lately". Then I introspected and realized that about 80% of that feeling was just that you've been posting a lot.

Please, Don't Roll Your Own Metaethics
Richard_Ngo · 3d · Ω

"Please don't roll your own crypto" is a good message to send to software engineers looking to build robust products. But it's a bad message to send to the community of crypto researchers, because insofar as they believe you, then you won't get new crypto algorithms from them.

In the context of metaethics, LW seems much more analogous to the "community of crypto researchers" than the "software engineers looking to build robust products". Therefore this seems like a bad message to send to LessWrong, even if it's a good message to send to e.g. CEOs who justify immoral behavior with metaethical nihilism.

The Charge of the Hobby Horse
Richard_Ngo · 3d

FWIW, in case this is helpful, my impression is that:

  • It is accurate to describe Wei as doing a "charge of the hobby-horse" in his initial comment, and this should be considered a mild norm violation. I'm also surprised and a bit disappointed that it got so many upvotes.
  • By the time that Tsvi announced the ban, Wei had already acknowledged that his original comments had been partly based on a misunderstanding. In my culture, I would expect more of an apology for doing so than the "ok...but to be fair" follow-up Wei actually gave. However, the phrase "Also, another part of my motivation is still valid and I think it would be interesting to try to answer" is a clear enough acknowledgement of a distinct line of inquiry that I no longer consider that comment to be a continuation of the "charge of the hobby-horse".
  • Tsvi banning Wei for "grossly negligent reading comprehension" after Wei had acknowledged that he was mistaken seems like a mild norm violation. It wouldn't have been a norm violation if Wei's comment hadn't made that acknowledgement; however, it would have been a stronger norm violation if Wei's comment had included an actual apology.
Wei Dai's Shortform
Richard_Ngo · 3d

> This has pretty low argumentative/persuasive force in my mind.

Note that my comment was not optimized for argumentative force about the overarching point. Rather, you asked how they "can" still benefit the world, so I was trying to give a central example.

In the second half of this comment I'll give a couple more central examples of how virtues can allow people to avoid the traps you named. You shouldn't consider these to be optimized for argumentative force either, because they'll seem ad-hoc to you. However, they might still be useful as datapoints.

Figuring out how to describe the underlying phenomenon I'm pointing at in a compelling, non-ad-hoc way is one of my main research focuses. The best I can do right now is to say that many of the ways in which people produce outcomes which are harmful (by their own lights) seem to arise from a handful of underlying dynamics. I call this phenomenon pessimization. One way in which I'm currently thinking about virtues is as a set of cognitive tools for preventing pessimization. As one example, kindness and forgiveness help to prevent cycles of escalating conflict with others, which is a major mechanism by which people's values get pessimized. This one is pretty obvious to most people; let me sketch out some less obvious mechanisms below.

> what if someone isn't smart enough to come up with a new line of illegible research, but does see some legible problem with an existing approach that they can contribute to? What would cause them to avoid this?

This actually happened to me: when I graduated from my master's I wasn't cognitively capable of coming up with new lines of illegible alignment research, in part because I was too status-seeking. Instead I went to work at DeepMind, and ended up spending a lot of my time working on RLHF, which is a pretty central example of a "legible" line of research.

However, I also wasn't cognitively capable of making much progress on RLHF, because I couldn't see how it addressed the core alignment problem, and so it didn't seem fundamental enough to maintain my interest. Instead I spent most of my time trying to understand the alignment problem philosophically (resulting in this sequence) at the expense of my promotion prospects.

In this case I think I had the virtue of deep curiosity, which steered my attention towards illegible problems even though my top-down plan was to contribute to alignment by doing RLHF research. These days, whatever you might think of my research, few people complain that it's too legible.

There are other possible versions of me who had that deep curiosity but weren't smart enough to have generated a research agenda like my current one; however, I think they would still have left DeepMind, or at least not been very productive on RLHF.

> And even the hypothetical virtuous person who starts doing illegible research on their own, what happens when other people catch up to him and the problem becomes legible to leaders/policymakers? How would they know to stop working on that problem and switch to another problem that is still illegible?

When a field becomes crowded, there's a pretty obvious inference that you can make more progress by moving to a less crowded field. I think people often don't draw that inference because moving to a less crowded field loses them prestige, is emotionally/financially risky, etc. Virtues help remove those blockers.

Paranoia: A Beginner's Guide
Richard_Ngo · 3d

> though I think you don't need to invoke knightian uncertainty. I think it's simply enough to model there being a very large attack surface combined with a more intelligent adversary.

One of the problems I'm pointing to is that you don't know what the attack surface is. This puts you in a pretty different situation than if you have a known large attack surface to defend, even against a smarter adversary (e.g. the whole length of a border; or every possible sequence of Go moves).

Separately, I may be being a bit sloppy by using "Knightian uncertainty" as a broad handle for cases where you have important "unknown unknowns", aka you don't even know what ontology to use. But it feels close enough that I'm by default planning to continue describing the research project outlined above as trying to develop a theory of Knightian uncertainty in which Bayesian uncertainty is a special case.

Paranoia: A Beginner's Guide
Richard_Ngo · 3d

I also have a short story about (some aspects of) paranoia from the inside.

Paranoia: A Beginner's Guide
Richard_Ngo · 3d

Fair point. Let me be more precise here.

Both the market for lemons in econ and adverse selection in trading are simple examples of models of adversarial dynamics. I would call these non-central examples of paranoia insofar as you know the variable about which your adversary is hiding information (the quality of the car/the price the stock should be). This makes them too simple to get at the heart of the phenomenon.

I think Habryka is gesturing at something similar in his paragraph starting "All that said, in reality, navigating a lemon market isn't too hard." And I take him to be gesturing at a more central description of paranoia in his subsequent description: "What do you do in a world in which there are not only sketchy used car salesmen, but also sketchy used car inspectors, and sketchy used car inspector rating agencies, or more generally, competent adversaries who will try to predict whatever method you will use to orient to the world, and aim to subvert it for their own aims?"

This is similar to my criticism of maximin as a model of paranoia: "It's not actually paranoid in a Knightian way, because what if your adversary does something that you didn't even think of?"

Here's a gesture at making this more precise: what makes something a central example of paranoia in my mind is when even your knowledge of how your adversary is being adversarial is also something that has been adversarially optimized. Thus chess is not a central example of paranoia (except insofar as your opponent has been spying on your preparations, say) and even markets for lemons aren't a central example (except insofar as buyers weren't even tracking that dishonesty was a strategy sellers might use—which is notably a dynamic not captured by the economic model).

Posts

58 points · Book Announcement: The Gentle Romance · 9d · 0 comments
35 points · 21st Century Civilization curriculum · 1mo · 10 comments
169 points · Underdog bias rules everything around me · 3mo · 53 comments
61 points · On Pessimization · 3mo · 3 comments
64 points · Applying right-wing frames to AGI (geo)politics · 4mo · 25 comments
35 points · Well-foundedness as an organizing principle of healthy minds and societies · 7mo · 7 comments
99 points · Third-wave AI safety needs sociopolitical thinking · 8mo · 23 comments
97 points · Towards a scale-free theory of intelligent agency · Ω · 8mo · 46 comments
92 points · Elite Coordination via the Consensus of Power · 8mo · 15 comments
253 points · Trojan Sky · 8mo · 39 comments