Wei Dai

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general purpose open source cryptography programming library (Crypto++, 1995).
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals.
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.

My Home Page

Comments

Wei Dai's Shortform
Wei Dai · 5h

It's based on the idea that Keju (the imperial examination system) created long-term selective pressure for intelligence.

  • The exams selected for heritable cognitive traits.
  • Success led to positions in the imperial government, and therefore power and wealth.
  • Power and wealth allowed for more wives, concubines, food, resources, and many more surviving children than the average person, which was something many Chinese consciously aimed for. (Note that this is very different from today's China or the West, where cultural drift/evolution has much reduced or completely eliminated people's desires to translate wealth into more offspring.)
Wei Dai's Shortform
Wei Dai · 6h

(The following is written by AI (Gemini 2.5 Pro) but I think it correctly captured my position.)

You're right to point out that I'm using a highly stylized and simplified model of "Chinese civilization." The reality, with its dynastic cycles, periods of division, and foreign rule, was far messier and more brutal than my short comment could convey.

My point, however, isn't about a specific, unbroken political entity. It's about a civilizational attractor state. The remarkable thing about the system described in "Romance of the Three Kingdoms" is not that it fell apart, but that it repeatedly put itself back together into a centralized, bureaucratic, agrarian empire, whereas post-Roman Europe fragmented permanently. Even foreign conquerors like the Manchus were largely assimilated by this system, adopting its institutions and governing philosophy (the "sinicization" thesis).

Regarding the Keju, the argument isn't for intentional eugenics, but a de facto one. The mechanism is simple: if (1) success in the exams correlates with heritable intelligence, and (2) success confers immense wealth and reproductive opportunity (e.g., supporting multiple wives and children who survive to adulthood), then over a millennium you have created a powerful, systematic selective pressure for those traits.
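
To make the compounding concrete, here is a toy truncation-selection simulation (a minimal sketch: the heritability, pass rate, fertility advantage, and generation count are all illustrative assumptions, not historical estimates):

```python
import random

# Toy model of exam-based selection on a heritable trait, per the
# mechanism above. All parameters are illustrative assumptions, not
# historical estimates.
H2 = 0.5              # narrow-sense heritability of the trait
PASS_QUANTILE = 0.98  # top 2% pass the exams
FERTILITY_EDGE = 2.0  # passers leave twice as many surviving children
GENERATIONS = 40      # ~1,000 years at ~25 years per generation
POP_SIZE = 100_000

def next_mean(pop):
    """One generation of selection. Passers are weighted by their extra
    fertility; offspring regress toward the population mean according to
    heritability (the breeder's equation: R = h^2 * S)."""
    pop.sort()
    cutoff = pop[int(len(pop) * PASS_QUANTILE)]
    weights = [FERTILITY_EDGE if x >= cutoff else 1.0 for x in pop]
    parental_mean = sum(x * w for x, w in zip(pop, weights)) / sum(weights)
    pop_mean = sum(pop) / len(pop)
    return pop_mean + H2 * (parental_mean - pop_mean)

mean = 0.0  # trait in standard-deviation units
for _ in range(GENERATIONS):
    pop = [random.gauss(mean, 1.0) for _ in range(POP_SIZE)]
    mean = next_mean(pop)

print(f"Shift after {GENERATIONS} generations: {mean:+.2f} SD "
      f"(~{15 * mean:+.0f} points on an IQ-like scale)")
```

With these made-up numbers the shift comes out to roughly one standard deviation (~15 IQ-like points) over forty generations; the point is not the specific figure but that even a weak per-generation response compounds over a millennium.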

The core of the thought experiment remains: is a civilization that structurally, even if unintentionally, prioritizes stability and slow biological enhancement over rapid, disruptive technological innovation better positioned to handle long-term existential risks?

Wei Dai's Shortform
Wei Dai · 9h

Maybe Chinese civilization was (unintentionally) on the right path: discourage or at least don't encourage technological innovation but don't stop it completely, run a de facto eugenics program (Keju, or Imperial Examination System) to slowly improve human intelligence, and centralize control over governance and culture to prevent drift from these policies. If the West hadn't jumped the gun with its Industrial Revolution, by the time China got to AI, human intelligence would be a lot higher and we might be in a much better position to solve alignment.

This was inspired by @dsj's complaint about centralization, which used the example that it's impossible for a centralized power or authority to deal with the Industrial Revolution in a positive way. The contrarian in my mind piped up with "Maybe the problem isn't with centralization, but with the Industrial Revolution!" If the world had had more centralization, such that the Industrial Revolution never started in an uncontrolled way, perhaps it would have been better off in the long run.

One unknown is what the trajectory of philosophical progress would look like in this centralized world, compared to a more decentralized world like ours. The West seems to have better philosophy than China, but it's not universal (e.g., analytic vs. Continental philosophy). (Actually "not universal" is a big understatement given how little attention most people pay to good philosophy, aside from a few exceptional bubbles like LW.) Presumably in the centralized world there would be a strong incentive to stifle philosophical progress (as in historical China), for the sake of stability, but what happens when average human IQ reaches 150 or 200?

Max Harms's Shortform
Wei Dai · 10h

Have you seen/read my A broad basin of attraction around human values?

Christian homeschoolers in the year 3000
Wei Dai · 1d

Yeah I think this outcome is quite plausible, which is in part why I only claimed "some hope". But

  1. It's also quite plausible that it won't be like that: for example, maybe a good solution to meta-philosophy will be fairly attractive to everyone despite invalidating deeply held object-level beliefs, or maybe it only clearly invalidates such beliefs after being applied with a lot of time/compute, which won't be available yet, so people won't reject the meta-philosophy based on such invalidations.
  2. "What should be done if some/many people do reject the meta-philosophy based on it invalidating their beliefs?" is itself a philosophical question which the meta-philosophy could directly help us answer by accelerating philosophical progress, and/or that we can better answer after having a firmer handle on the nature of philosophy and therefore the ethics of changing people's philosophical beliefs. Perhaps the conclusion will be that symmetrical persuasion tactics, or centrally imposed policies, are justified in this case. Or maybe we'll use the understanding to find more effective asymmetrical or otherwise ethical persuasion tactics.

Basically my hope is that things become a lot clearer after we have a better understanding of metaphilosophy, as it seems to be a major obstacle to determining what should be done about the kind of problem described in the OP. I'm still curious whether you have any other solutions or approaches in mind.

Christian homeschoolers in the year 3000
Wei Dai · 1d

I mean greater certainty/clarity than our current understanding of mathematical reasoning, which seems to me far from complete (e.g., realism vs formalism is unsettled, what is the deal with Berry's paradox, etc). By the time we have a good meta-philosophy, I expect our philosophy of math will be much improved too.

It's plausible that there is no good meta-philosophy to find, even in the sense of matching or exceeding our current level of understanding of mathematical reasoning, but that would be a very strange and confusing state of affairs, as it would mean that in all or most fields of philosophy there is no objective or commonly agreed way to determine how good an argument is, or whether some statement is true or false, even given infinite compute or subjective time, including fields that seemingly should have objective answers like philosophy of math or meta-ethics. (Lots of people claim that morality is subjective, but almost nobody claims that "morality is subjective" is itself subjective!)

If after lots and lots of research (ideally with enhanced humans), we just really can't find a good meta-philosophy, I would hope that we can at least find some clues as to why this is the case, or some kind of explanation that makes the situation less confusing, and then use those clues to guide us as to what to do next, as far as how to handle super-persuasion, etc.

Ethical Design Patterns
Wei Dai · 2d

IMO, it’s hard to get a consensus for Heuristic C at the moment even though it kind of seems obvious.

Consider that humanity couldn't achieve a consensus around banning or not using cigarettes, leaded gasoline, or ozone-destroying chemicals until they had done a huge amount of highly visible damage. There must have been plenty of arguments about their potential dangers based on established science, and clear empirical evidence of the damage they actually caused, far earlier, yet consensus still failed to form until catastrophic harm had already occurred. The consensus against drunk driving likewise formed only after extremely clear and undeniable evidence of its danger (based on accident statistics) became available.

I'm skeptical that more intentionally creating ethical design patterns could have helped such consensus form earlier in those cases, or in the case of AI x-safety, as it just doesn't seem to address the main root causes or bottlenecks for the lack of such consensus or governance failures, which IMO are things like:

  1. natural diversity of human opinions, when looking at the same set of arguments/evidence
  2. lack of extremely clear/undeniable evidence of harm
  3. democracy's natural difficulties around concentrated interests imposing diffused harms (due to "rational ignorance" of voters and collective action problems)

Something that's more likely to work is "persuasion design patterns", like what helped many countries pass anti-GMO legislation despite the lack of clear scientific evidence of harm, but I think we're all loath to use such tactics.

Buck's Shortform
Wei Dai · 10d

I've been reading a lot of web content, including this post, after asking my favorite LLM[1] to "rewrite it in Wei Dai's style" which I find tends to make it shorter and easier for me to read, while still leaving most of the info intact (unlike if I ask for a summary). Before I comment, I'll check the original to make sure the AI's version didn't miss a key point (or read the original in full if I'm sufficiently interested), and also ask the AI to double-check that my comment is sensible.
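
For anyone curious, the workflow is only a few lines of code (a minimal sketch assuming Google's google-generativeai package; the model id and prompt wording are my own assumptions, not a canonical recipe):

```python
# Minimal sketch of the rewrite-then-read workflow described above.
# Assumes the google-generativeai package and a GOOGLE_API_KEY env var
# (free via AI Studio); the model name and prompt wording are just what
# I happen to use, not a canonical recipe.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-2.5-pro")

def rewrite_in_style(text: str, author: str = "Wei Dai") -> str:
    """Rewrite a post in a given author's style, keeping the substance
    intact (unlike a lossy summary)."""
    prompt = (
        f"Rewrite the following post in {author}'s style. "
        "Keep all substantive points intact; do not summarize.\n\n"
        + text
    )
    return model.generate_content(prompt).text

# Usage: read rewrite_in_style(post_text), then skim the original for
# anything the rewrite dropped before commenting.
```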


  1. currently Gemini 2.5 Pro because it's free through AI Studio, and the rate limit is high enough that I've never hit it ↩︎

Alignment as uploading with more steps
Wei Dai · 16d

Thanks for the suggested readings.

I’m trying not to die here.

There are lots of ways to cash out "trying not to die", many of which imply that solving AI alignment (or getting uploaded) isn't even the most important thing. For instance under theories of modal or quantum immortality, dying is actually impossible. Or consider that most copies of you in the multiverse or universe are probably living in simulations of Earth rather than original physical entities, so the most important thing from a survival-defined-indexically perspective may be to figure out what the simulators want, or what's least likely to cause them to want to turn off the simulation or most likely to "rescue" you after you die here. Or, why aim for a "perfectly aligned" AI instead of one that cares just enough about humans to keep us alive in a comfortable zoo after the Singularity (which they may already do by default because of acausal trade, or maybe the best way to ensure this is to increase the cosmic resources available to aligned AI so they can do more of this kind of trade)?

And because I don’t believe in “correct” values.

The above was in part trying to point out that even something like not wanting to die is very ill-defined, so if there are no correct values, not even relative to a person or a set of initial fuzzy non-preferences, then that's actually a much more troubling situation than you seem to think.

I don’t know how to build a safe philosophically super-competent assistant/oracle

That's in part why I'd want to attempt this only after a long pause (i.e., at least multiple decades) to develop the necessary ideas, and probably only after enhancing human intelligence.

Christian homeschoolers in the year 3000
Wei Dai · 16d

I've been talking about the same issue in various posts and comments, most prominently in Two Neglected Problems in Human-AI Safety. It feels like an obvious problem that (confusingly) almost no one talks about, so it's great to hear another concerned voice.

A potential solution I've been mooting is "metaphilosophical paternalism", or having AI provide support and/or error correction for humans' philosophical reasoning, based on a true theory of metaphilosophy (i.e., understanding of what philosophy is and what constitutes correct philosophical reasoning), to help them defend against memetic attacks and internal errors. So this is another reason I've been advocating for research into metaphilosophy, and for pausing AI (presumably for at least multiple decades) until metaphilosophy (and not just AI alignment, unless broadly defined to imply a solution to this problem) can be solved.

On your comment about "centrally enforced policy" being "kind of fucked up and illiberal", I think there is some hope that, given enough time and effort, there can be a relatively uncontroversial solution to metaphilosophy[1] that most people can agree on at the end of the AI pause, so central enforcement wouldn't be needed. Failing that, perhaps we should take a look at what the metaphilosophy landscape looks like after a lot of further development, and then collectively make a decision on how to proceed.

I'm curious if this addresses your concern, or if you see a differently shaped potential solution.


  1. similar to how there's not a huge amount of controversy today about what constitutes correct mathematical or scientific reasoning, although I'd want to aim for even greater certainty/clarity than that ↩︎

Posts

  • Wei Dai's Shortform · 10 karma · Ω · 2y · 203 comments
  • Managing risks while trying to do good · 65 karma · 2y · 26 comments
  • AI doing philosophy = AI generating hands? · 46 karma · Ω · 2y · 23 comments
  • UDT shows that decision theory is more puzzling than ever · 224 karma · Ω · 2y · 56 comments
  • Meta Questions about Metaphilosophy · 163 karma · Ω · 2y · 80 comments
  • Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? · 34 karma · Q · 3y · 15 comments
  • How to bet against civilizational adequacy? · 55 karma · Q · 3y · 20 comments
  • AI ethics vs AI alignment · 5 karma · 3y · 1 comment
  • A broad basin of attraction around human values? · 117 karma · Ω · 3y · 18 comments
  • Morality is Scary · 234 karma · Ω · 4y · 116 comments
Wikitag Contributions

  • Carl Shulman · 2 years ago
  • Carl Shulman · 2 years ago · (-35)
  • Human-AI Safety · 2 years ago
  • Roko's Basilisk · 7 years ago · (+3/-3)
  • Carl Shulman · 8 years ago · (+2/-2)
  • Updateless Decision Theory · 12 years ago · (+62)
  • The Hanson-Yudkowsky AI-Foom Debate · 13 years ago · (+23/-12)
  • Updateless Decision Theory · 13 years ago · (+172)
  • Signaling · 13 years ago · (+35)
  • Updateless Decision Theory · 14 years ago · (+22)