Wei Dai

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general purpose open source cryptography programming library (Crypto++, 1995), motivated by AI risk and what's now called "defensive acceleration".
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals.
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.

My Home Page

Comments

Please, Don't Roll Your Own Metaethics
Wei Dai · 6h (Ω)

By "metaethics," do you mean something like "a theory of how humans should think about their values"?

I feel like I've seen that kind of usage on LW a bunch, but it's atypical. In philosophy, "metaethics" has a thinner, less ambitious interpretation of answering something like, "What even are values, are they stance-independent, yes/no?"

By "metaethics" I mean "the nature of values/morality", which I think is how it's used in academic philosophy. Of course the nature of values/morality has a strong influence on "how humans should think about their values" so these are pretty closely connected, but definitionally I do try to use it the same way as in philosophy, to minimize confusion. This post can give you a better idea of how I typically use it. (But as you'll see below, this is actually not crucial for understanding my post.)

> Anyway, I'm asking about this because I found the following paragraph hard to understand:

So in the paragraph that you quoted (and the rest of the post), I was actually talking about philosophical fields/ideas in general, not just metaethics. While my title has "metaethics" in it, the text of the post talks generically about any "philosophical questions" that are relevant for AI x-safety. If we substitute metaethics (in my or the academic sense) into my post, then you can derive that I mean something like this:

Different metaethics (ideas/theories about the nature of values/morality) have different implications for what AI designs or alignment approaches are safe, and if you design an AI assuming that one metaethical theory is true, it could be disastrous if a different metaethical theory actually turns out to be true.

For example, if moral realism is true, then aligning the AI to human values would be pointless. What you really need to do is design the AI to be able to determine and follow objective moral truths. But this approach would be disastrous if moral realism is actually false. Similarly, if moral noncognitivism is true, then humans can't be wrong about their values, which implies that "how humans should think about their values" is of no importance. If you design AI under this assumption, that would be disastrous if humans actually can be wrong about their values and really need AIs to help them think about their values and avoid moral errors.

I think in practice a lot of alignment researchers may not even have explicit metaethical theories in mind, but are implicitly making certain metaethical assumptions in their AI design or alignment approach. For example, they may largely ignore the question of how humans should think about their values or how AIs should help humans think about their values, thus essentially baking in an assumption of noncognitivism.

> You're conceding that morality/values might be (to some degree) subjective, but you're cautioning people from having strong views about "metaethics," which you take to be the question of not just what morality/values even are, but also a bit more ambitiously: how to best reason about them and how to (e.g.) have AI help us think about what we'd want for ourselves and others.

If we substitute "how humans/AIs should reason about values" (which I'm not sure has a name in academic philosophy but I think does fall under metaphilosophy, which covers all philosophical reasoning) into the post, then your conclusion here falls out, so yes, it's also a valid interpretation of what I'm trying to convey.

I hope that makes everything a bit clearer?

"But You'd Like To Feel Companionate Love, Right? ... Right?"
Wei Dai · 11h

Conditional on True Convergent Goodness being a thing, companionate love would not be one of my top candidates for being part of it, as it seems too parochial to (a subset of) humans. My current top candidate would be something like "maximization of hedonic experiences" with a lot of uncertainty around:

  1. Problems with consciousness/qualia.
  2. How to measure/define/compare how hedonic an experience is?
  3. Selfish vs. altruistic, and a lot of subproblems around these, including identity and population ethics.
  4. Does it need to be real in some sense (e.g., does being in an Experience Machine satisfy True Convergent Goodness)?
  5. Does there need to be diversity/variety or is it best to tile the universe with the same maxed out hedonic experience? (I guess if variety is part of True Convergent Goodness, then companionate love may make it in after all, indirectly.)

Other top candidates include negative or negative-leaning utilitarianism, and preference utilitarianism (although this is a distant 3rd). And a lot of credence on "something we haven't thought of yet."

Problems I've Tried to Legibilize
Wei Dai · 12h (Ω)

> A lab leader who’s concerned enough to slow down will be pressured by investors to speed back up, or get replaced, or get outcompeted. Really you need to convince the whole lab and its investors. And you need to be more convincing than the magic of the market!

This seems to imply that lab leaders would be easier to convince if there were no investors and no markets, in other words if they had more concentrated power.

If you spread out the power of AI more, won't all those decentralized nodes of AI power still have to compete with each other in markets? If market pressures are the core problem, how does decentralization solve that?

I'm concerned that your proposed solution attacks "concentration of power" when the real problem you've identified is more like market dynamics. If so, it could fail to solve the problem or make it even worse.

My own perspective is that markets are a definite problem, and concentration of power per se is more ambiguous (I'm not sure if it's good or bad). To solve AI x-safety we basically have to bypass or override markets somehow, e.g., through international agreements and government regulations/bans.

Wei Dai's Shortform
Wei Dai · 14h

Need: A way to load all comments and posts of a user. Right now it only loads the top N by karma.

Want: A "download" button, for some users who have up to hundreds of MB of content, too unwieldy to copy/paste. Ability to collate/sort in various ways, especially as flat list of mixed posts and comments, sorted by posting date from oldest to newest.

Wei Dai's Shortform
Wei Dai · 16h

Hey, it's been 6 months. Can I get an updated ETA on 5 please? If it's going to take much longer, please let me know and I'll just code up something myself.

The Charge of the Hobby Horse
Wei Dai · 17h

My understanding of your position on what? Is it:

  1. Whether LW should allow unilateral author moderation at all? I've already given up on trying to convince the LW team about this. Are you saying that you want to reopen this issue? Or,
  2. Whether you endorse the specific kind of moderation that Tsvi did, namely to ban someone without warning, and then try to discuss it afterwards? I don't think I've seen the mods talk about this before, hence I have no understanding of your position and am asking about it for the first time?
The Charge of the Hobby Horse
Wei Dai · 1d

It appears from this post that the ban was itself based on a misunderstanding of my final comment. Nowhere in my comment did I say anything resembling "Anyway, let's talk about how Y is not true." with Y being "People should have been deferring to Yudkowsky as much as they did."

What I actually did was acknowledge my misunderstanding and then propose a new, related topic I thought might be interesting: the actual root causes of the deference. This was an invitation to a different conversation, which Tsvi was free to ignore.

There is no plausible interpretation of my comment as a refusal to drop the original point. The idea that I was stuck on a hobby horse that could only be stopped by a ban is directly contradicted by the text of the comment itself:

> Ok, it looks like part of my motivation for going down this line of thought was based on a misunderstanding. But to be fair, in this post after you asked "What should we have done instead?" with regard to deferring to Eliezer, you didn't clearly say "we should have not deferred or deferred less", but instead wrote "We don't have to stop deferring, to avoid this correlated failure. We just have to say that we're deferring." Given that this is a case where many people could have and should have not deferred, this just seems like a bad example to illustrate "given that to some extent at the end of the day we do have to defer on many things, what can we do to alleviate some of those problems?", leading to the kind of confusion I had.
>
> Also, another part of my motivation is still valid and I think it would be interesting to try to answer why didn't you (and others) just not defer? Not in a rhetorical sense, but what actually caused this? Was it age as you hinted earlier? Was it just human nature to want to defer to someone? Was it that you were being paid by an organization that Eliezer founded and had very strong influence over? Etc.? And also why didn't you (and others) notice Eliezer's strategic mistakes, if that has a different or additional answer?

I think there are other significant misrepresentations in his "gloss" of the thread, which I won't go into. This episode has given me quite a strong aversion to engaging with Tsvi, which will inform my future participation on LW.

The Charge of the Hobby Horse
Wei Dai · 1d*

Can you please remove the example involving me, or anonymize it and make it a hypothetical example? I think it's a significant misrepresentation of my words (that makes me appear more unreasonable than I was), but don't have the time/energy/interest to debate you to try to get it corrected. Edit: Since you're refusing this request, I wrote one comment to (partially) give my perspective, but will not be engaging further.

Posts

147 · Please, Don't Roll Your Own Metaethics (Ω) · 3d · 46 comments
119 · Problems I've Tried to Legibilize (Ω) · 6d · 18 comments
337 · Legible vs. Illegible AI Safety Problems (Ω) · 6d · 92 comments
71 · Trying to understand my own cognitive edge · 12d · 17 comments
10 · Wei Dai's Shortform (Ω) · 2y · 309 comments
66 · Managing risks while trying to do good · 2y · 28 comments
49 · AI doing philosophy = AI generating hands? (Ω) · 2y · 24 comments
228 · UDT shows that decision theory is more puzzling than ever (Ω) · 2y · 56 comments
163 · Meta Questions about Metaphilosophy (Ω) · 2y · 80 comments
34 · Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? (Q) · 3y · 15 comments

Wikitag Contributions

Carl Shulman · 2 years ago
Carl Shulman · 2 years ago · (-35)
Human-AI Safety · 2 years ago
Roko's Basilisk · 7 years ago · (+3/-3)
Carl Shulman · 8 years ago · (+2/-2)
Updateless Decision Theory · 12 years ago · (+62)
The Hanson-Yudkowsky AI-Foom Debate · 13 years ago · (+23/-12)
Updateless Decision Theory · 13 years ago · (+172)
Signaling · 13 years ago · (+35)
Updateless Decision Theory · 14 years ago · (+22)