Wei Dai

I think I need more practice talking with people in real time (about intellectual topics). (I've gotten much more used to text chat/comments, which I like because it puts less time pressure on me to think and respond quickly, but I feel like I now incur a large cost due to excessively shying away from talking to people, hence the desire for practice.) If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

Posts

Sorted by New

10Wei Dai's Shortform

139

10Wei Dai's Shortform

139

61Managing risks while trying to do good

46AI doing philosophy = AI generating hands?

218UDT shows that decision theory is more puzzling than ever

161Meta Questions about Metaphilosophy

34Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID?

54How to bet against civilizational adequacy?

5AI ethics vs AI alignment

114A broad basin of attraction around human values?

227Morality is Scary

116

Wikitag Contributions

(-35)

(+3/-3)

(+2/-2)

Updateless Decision Theory

12y

(+62)

The Hanson-Yudkowsky AI-Foom Debate

12y

(+23/-12)

Updateless Decision Theory

12y

(+172)

Signaling

12y

(+35)

Updateless Decision Theory

14y

(+22)

Comments

Sorted by

Newest

Wei Dai's Shortform

Wei Dai10h20

This is part of why I'm less sold on "careful philosophical reasoning" as the key thing. Indeed, wanting to "commit prematurely to a specific, detailed value system" is historically very correlated with intellectualism (e.g. elites tend to be the rabid believers in communism, libertarianism, religion, etc—a lot of more "normal" people don't take it that seriously even when they're nominally on board). And so it's very plausible that the thing we want is less philosophy, because (like, say, asteroid redirection technology) the risks outweigh the benefits.

Here, you seem to conflate "careful philosophical reasoning" with intellectualism and philosophy in general. But in an earlier comment, I tried to draw a distinction between careful philosophical reasoning and the kind of hand-wavy thinking that has been called "philosophy" in most times and places. You didn't respond to it in that thread... did you perhaps miss it?

More substantively, Eliezer talked about the Valley of Bad Rationality, and I think there's probably something like that for philosophical thinking as well, which I admit definitely complicates the problem. I'm not going around and trying to push random people "into philosophy", for example.

If you take your interim strategy seriously (but set aside x-risk) then I think you actually end up with something pretty similar to the main priorities of classic liberals: prevent global lock-in (by opposing expansionist powers like the Nazis), prevent domestic political lock-in (via upholding democracy), prevent ideological lock-in (via supporting free speech), give our descendants more optionality (via economic and technological growth). I don't think this is a coincidence—it just often turns out that there are a bunch of heuristics that are really robustly good, and you can converge on them from many different directions.

Sure, there's some overlap on things like free speech and preventing lock-in. But calling it convergence feels like a stretch. One of my top priorities is encouraging more people to base their moral evolution on careful philosophical reasoning instead of random status games. That's pretty different from standard classical liberalism. Doesn't this big difference suggest the other overlaps might just be coincidence? Have you explained your reasons anywhere for thinking it's not a coincidence and that these heuristics are robust enough on their own, without grounding in some explicit principle like "normative option value" that could be used to flexibly adjust the heuristics according to the specific circumstances?

Yes, but also: it's very plausible to me that the net effect of LessWrong-inspired thinking on AI x-risk has been and continues to be negative.

I think this is plausible too, but want to attribute it mostly to insufficiently careful thinking and playing other status games. I feel like with careful enough thinking and not being distracted/influenced by competing motivations, a lot of the negative effects could have been foreseen and prevented. For example, did you know that Eliezer/MIRI for years pursued a plan of racing to build the first AGI and making it aligned (Friendly), which I think inspired/contributed (via the founding of DeepMind) to the current crop of AI labs and their AI race, and that I had warned him at the time (in a LW post or comment) that the plan was very unlikely to succeed and would probably backfire this way?

Also, I would attribute Sam and Elon's behavior not to mental health issues, but to (successfully) playing their own power/status game, with "not trusting Google / each other" just a cover for wanting to be the hero that saves the world, which in turn is just a cover for grabbing power and status. This seems perfectly reasonable and parsimonious from an evolutionary psychology perspective, and I don't see why we need to hypothesize mental health issues to explain what they did.

Wei Dai's Shortform

Wei Dai20h20

Ok, I see where you're coming from, but think you're being overconfident about non-cognitivism. My current position is that non-cognitivism is plausible, but we can't be very sure that it is true, and making progress on this meta-ethical question also requires careful philosophical reasoning. These two posts of mine are relevant on this topic: Six Plausible Meta-Ethical Alternatives , Some Thoughts on Metaphilosophy

Wei Dai's Shortform

Wei Dai2d61

None of these seem as crucial as careful philosophical reasoning, because moral progress is currently not bottlenecked on any of them (except possibly the last item, which I do not know the contents of). To explain more, I think the strongest conclusion from careful philosophical reasoning so far is that we are still very far from knowing what normativity (decision theory and values, or more generally rationality and morality) consists of, and therefore the most important thing right now is to accumulate and preserve normative option value (the ability to eventually do the best thing with the most resources).

What is blocking this "interim morality" from being more broadly accepted? I don't think it's lack of either political activism (plenty of people in free societies also don't care about preserving normative option value), neuroscience/psychology (how would it help at this point?), or introspection + emotional health (same question, how would it help?), but just that the vast majority of people do not care about trying to figure out normativity via careful philosophical reasoning, and instead are playing status games with other focal points.

<summary>Here's a longer, more complete version of my argument, written by Gemini 2.5 Pro after some back and forth. Please feel free to read or ignore (if my own writing above seems clear enough).</summary>

Goal: The ultimate aim is moral progress, which requires understanding and implementing correct normativity (how to decide, what to value).
Primary Tool: The most fundamental tool we have for figuring out normativity at its roots is careful, skeptical philosophical reasoning. Empirical methods (like neuroscience) can inform this, but the core questions (what should be, what constitutes a good reason) are philosophical.
Current Philosophical State: The most robust conclusion from applying this tool carefully so far is that we are deeply uncertain about the content of correct normativity. We haven't converged on a satisfactory theory of value or decision theory. Many plausible-seeming avenues have deep problems.
Rational Response to Uncertainty & Its Urgent Implication:
- Principle: In the face of such profound, foundational uncertainty, the most rational interim strategy isn't to commit prematurely to a specific, detailed value system (which is likely wrong), but to preserve and enhance optionality. This means acting in ways that maximize the chances that whatever the correct normative theory turns out to be, we (or our successors) will be in the best possible position (knowledge, resources, freedom of action) to understand and implement it. This is the "preserve normative option value" principle.
- Urgent Application: Critically, the most significant threats to preserving this option value today are existential risks (e.g., from unaligned AI, pandemics, nuclear war) which could permanently foreclose any desirable future. Therefore, a major, urgent practical consequence of accepting the principle of normative option value is the prioritization of mitigating these existential risks.
The Current Bottleneck: Moral progress on the most critical front is primarily stalled because this philosophical conclusion (deep uncertainty) and its strategic implication (preserve option value)—especially its urgent consequence demanding the prioritization of x-risk mitigation—are not widely recognized, accepted, or acted upon with sufficient seriousness or resources.
Why Other Factors Aren't the Primary Strategic Bottleneck Now:
- Politics: Free societies exist where discussion could happen, yet this conclusion isn't widely adopted within them. The bottleneck isn't solely the lack of freedom, but the lack of focus on this specific line of reasoning and its implications.
- Neuroscience/Psychology: While useful eventually, understanding the brain's mechanisms doesn't currently resolve the core philosophical uncertainty or directly compel the strategic focus on option value / x-risk. The relevant insight is primarily conceptual/philosophical at this stage.
- Introspection/Emotional Health: While helpful, the lack of focus on option value / x-risk isn't plausibly primarily caused by a global deficit in emotional health preventing people from grasping the concept. It's more likely due to lack of engagement with the specific philosophical arguments, different priorities, and incentive structures.
- Directness: Furthermore, addressing the conceptual bottleneck around option value and its link to x-risk seems like a more direct path to potentially shifting priorities towards mitigating the most pressing dangers quickly, compared to the slower, more systemic improvements involved in fixing politics, cognition, or widespread emotional health.

</details>

Edit: Hmm, <details> doesn't seem to work in Markdown and I don't know how else to write collapsible sections in Markdown, and I can't copy/paste the AI content correctly in Docs mode. Guess I'll leave it like this for now until the LW team fixes things.

Wei Dai's Shortform

Wei Dai2d62

The One True Form of Moral Progress (according to me) is using careful philosophical reasoning to figure out what our values should be, what morality consists of, where our current moral beliefs are wrong, or generally, the contents of normativity (what we should and shouldn't do). Does this still seem wrong to you?

The basic justification for this is that for any moral "progress" or change that is not based on careful philosophical reasoning, how can we know that it's actually a change for the better? I don't think I've written a post specifically about this, but Morality is Scary is related, in that it complains that most other kinds of moral change seem to be caused by status games amplifying random aspects of human values or motivation.

Richard Ngo's Shortform

Wei Dai3d71

I'm not sure that fear or coercion has much to do with it, because there's often no internal conflict when someone is caught up in some extreme form of the morality game, they're just going along with it wholeheartedly, thinking they're just being a good person or helping to advance the arc of history. In the subagents frame, I would say that the subagents have an implicit contract/agreement that any one of them can seize control, if doing so seems good for the overall agent in terms of power or social status.

But quite possibly I'm not getting your point, in which case please explain more, or point to some specific parts of your articles that are especially relevant?

Wei Dai's Shortform

Wei Dai3d137

My early posts on LW often consisted of pointing out places in the Sequences where Eliezer wasn't careful enough. Shut Up and Divide? and Boredom vs. Scope Insensitivity come to mind. And of course that's not the only way to gain status here - the big status awards are given for coming up with novel ideas and backing them up with carefully constructed arguments.

Wei Dai's Shortform

Wei Dai3d6121

To branch off the line of thought in this comment, it seems that for most of my adult life I've been living in the bubble-within-a-bubble that is LessWrong, where the aspect of human value or motivation that is the focus of our signaling game is careful/skeptical inquiry, and we gain status by pointing out where others haven't been careful or skeptical enough in their thinking. (To wit, my repeated accusations that Eliezer and the entire academic philosophy community tend to be overconfident in their philosophical reasoning, don't properly appreciate the difficulty of philosophy as an enterprise, etc.)

I'm still extremely grateful to Eliezer for creating this community/bubble, and think that I/we have lucked into the One True Form of Moral Progress, but must acknowledge that from the outside, our game must look as absurd as any other niche status game that has spiraled out of control.

Richard Ngo's Shortform

Wei Dai4d*3310

How would this ideology address value drift? I've been thinking a lot about the kind quoted in Morality is Scary. The way I would describe it now is that human morality is by default driven by a competitive status/signaling game, where often some random or historically contingent aspect of human value or motivation becomes the focal point of the game, and gets magnified/upweighted as a result of competitive dynamics, sometimes to an extreme, even absurd degree.

(Of course from the inside it doesn't look absurd, but instead feels like moral progress. One example of this that I happened across recently is filial piety in China, which became more and more extreme over time, until someone cutting off a piece of their flesh to prepare a medicinal broth for an ailing parent was held up as a moral exemplar.)

Related to this is my realization is that the kind of philosophy you and I are familiar with (analytical philosophy, or more broadly careful/skeptical philosophy) doesn't exist in most of the world and may only exist in Anglophone countries as a historical accident. There, about 10,000 practitioners exist who are funded but ignored by the rest of the population. To most of humanity, "philosophy" is exemplified by Confucius (morality is everyone faithfully playing their feudal roles) or Engels (communism, dialectical materialism). To us, this kind of "philosophy" is hand waving and make things up out of thin air, but to them, philosophy is learned from a young age and unquestioned. (Or if questioned, they're liable to jump to some other equally hand-wavy "philosophy" like China's move from Confucius to Engels.)

Empowering a group like this... are you sure that's a good idea? Or perhaps you have some notion of "empowerment" in mind that takes these issues into account already and produces a good outcome anyway?

You can, in fact, bamboozle an unaligned AI into sparing your life

Wei Dai6mo50

If you only care about the real world and you're sure there's only one real world, then the fact that you at time 0 would sometimes want to bind yourself at time 1 (e.g., physically commit to some action or self-modify to perform some action at time 1) seems very puzzling or indicates that something must be wrong, because at time 1 you're in a strictly better epistemic position, having found out more information about which world is real, so what sense does it make that your decision theory makes you-at-time-0 decide to override you-at-time-1's decision?

(If you believed in something like Tegmark IV but your values constantly change to only care about the subset of worlds that you're in, then time inconsistency, and wanting to override your later selves, would make more sense, as your earlier self and later self would simply have different values. But it seems counterintuitive to be altruistic this way.)

Mark Xu's Shortform

Wei Dai6moΩ307140

Better control solutions make AI more economically useful, which speeds up the AI race and makes it even harder to do an AI pause.

When we have controlled unaligned AIs doing economically useful work, they probably won't be very useful for solving alignment. Alignment will still be philosophically confusing, and it will be hard to trust the alignment work done by such AIs. Such AIs can help solve some parts of alignment problems, parts that are easy to verify, but alignment as a whole will still be bottle-necked on philosophically confusing, hard to verify parts.

Such AIs will probably be used to solve control problems for more powerful AIs, so the basic situation will continue and just become more fragile, with humans trying to control increasingly intelligent unaligned AIs. This seems unlikely to turn out well. They may also persuade some of us to trust their alignment work, even though we really shouldn't.

So to go down this road is to bet that alignment has no philosophically confusing or hard to verify parts. I see some people saying this explicitly in the comments here, but why do they think that? How do they know? (I'm afraid that some people just don't feel philosophically confused about much of anything, and will push forward on that basis.) But you do seem to worry about philosophical problems, which makes me confused about the position you take here.

BTW I have similar objections to working on relatively easy forms of (i.e., unscalable) alignment solutions, and using the resulting aligned AIs to solve alignment for more powerful AIs. But at least there, one might gain some insights into the harder alignment problems from working on the easy problems, potentially producing some useful strategic information or making it easier to verify future proposed alignment solutions. So while I don't think that's a good plan, this plan seems even worse.