Wei Dai

If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

My main "claims to fame":

  • Created the first general purpose open source cryptography programming library (Crypto++, 1995), motivated by AI risk and what's now called "defensive acceleration".
  • Published one of the first descriptions of a cryptocurrency based on a distributed public ledger (b-money, 1998), predating Bitcoin.
  • Proposed UDT, combining the ideas of updatelessness, policy selection, and evaluating consequences using logical conditionals (a rough formal sketch follows this list).
  • First to argue for pausing AI development based on the technical difficulty of ensuring AI x-safety (SL4 2004, LW 2011).
  • Identified current and future philosophical difficulties as core AI x-safety bottlenecks, potentially insurmountable by human researchers, and advocated for research into metaphilosophy and AI philosophical competence as possible solutions.
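
For readers who haven't encountered UDT, here is a minimal, informal sketch of what that bullet is pointing at. The notation is illustrative only (not a definition from this page) and compresses away many subtleties, such as how policies are represented and how the logical conditional is evaluated:

```latex
% Rough UDT-style policy selection (illustrative notation only).
% The agent commits "updatelessly" to a single policy \pi, a mapping from
% possible observation histories to actions, chosen by evaluating the logical
% consequences of its own source code S implementing that policy:
\pi^{*} \;=\; \operatorname*{arg\,max}_{\pi}\;
  \mathbb{E}\left[\, U \;\middle|\; S \text{ implements } \pi \,\right]
```

The "updateless" part is that the policy is selected once, over all observation histories, rather than re-derived after each observation; the conditional is a logical statement about the agent's own code rather than a Bayesian update on observations.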

My Home Page

Posts (sorted by new)
  • Wei Dai's Shortform (10 karma, 2y, 304 comments)
  • Please, Don't Roll Your Own Metaethics (130 karma, 2d, 33 comments)
  • Problems I've Tried to Legibilize (117 karma, 5d, 16 comments)
  • Legible vs. Illegible AI Safety Problems (336 karma, 5d, 92 comments)
  • Trying to understand my own cognitive edge (71 karma, 11d, 17 comments)
  • Managing risks while trying to do good (66 karma, 2y, 28 comments)
  • AI doing philosophy = AI generating hands? (49 karma, 2y, 24 comments)
  • UDT shows that decision theory is more puzzling than ever (228 karma, 2y, 56 comments)
  • Meta Questions about Metaphilosophy (163 karma, 2y, 80 comments)
  • Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID? (34 karma, 3y, 15 comments)

Comments (sorted by newest)

The Charge of the Hobby Horse
Wei Dai · 7h

It appears from this post that the ban was itself based on a misunderstanding of my final comment. Nowhere in my comment did I say anything resembling "Anyway, let's talk about how Y is not true." with Y being "People should have been deferring to Yudkowsky as much as they did."

What I actually did was acknowledge my misunderstanding and then propose a new, related topic I thought might be interesting: the actual root causes of the deference. This was an invitation to a different conversation, which Tsvi was free to ignore.

There is no plausible interpretation of my comment as a refusal to drop the original point. The idea that I was stuck on a hobby horse that could only be stopped by a ban is directly contradicted by the text of the comment itself:

Ok, it looks like part of my motivation for going down this line of thought was based on a misunderstanding. But to be fair, in this post after you asked "What should we have done instead?" with regard to deferring to Eliezer, you didn't clearly say "we should have not deferred or deferred less", but instead wrote "We don't have to stop deferring, to avoid this correlated failure. We just have to say that we're deferring." Given that this is a case where many people could have and should have not deferred, this just seems like a bad example to illustrate "given that to some extent at the end of the day we do have to defer on many things, what can we do to alleviate some of those problems?", leading to the kind of confusion I had.

Also, another part of my motivation is still valid and I think it would be interesting to try to answer why didn't you (and others) just not defer? Not in a rhetorical sense, but what actually caused this? Was it age as you hinted earlier? Was it just human nature to want to defer to someone? Was it that you were being paid by an organization that Eliezer founded and had very strong influence over? Etc.? And also why didn't you (and others) notice Eliezer's strategic mistakes, if that has a different or additional answer?

I think there are other significant misrepresentations in his "gloss" of the thread, which I won't go into. This episode has given me quite a large aversion to engaging with Tsvi, which will inform my future participation on LW.

The Charge of the Hobby Horse
Wei Dai · 8h

Can you please remove the example involving me, or anonymize it and make it a hypothetical example? I think it's a significant misrepresentation of my words (that makes me appear more unreasonable than I was), but don't have the time/energy/interest to debate you to try to get it corrected.

Human Values ≠ Goodness
Wei Dai · 9h

This post was one of several examples of "rolling your own metaethics" that I had in mind when writing Please, Don't Roll Your Own Metaethics, because it's not just proposing or researching a new metaethical idea, but deploying it, in the sense of trying to spread it among people who the author does not expect to reflect carefully about the idea.

Questioning Computationalism
Wei Dai · 10h

The multiple-realizability of computation "cuts the ties" to the substrate. These ties to the substrate are important. This idea leads Sahil to predict, for example, that LLMs will be too "stuck in simulation" to engage very willfully in their own self-defense.

Many copies of me are probably stuck in simulations around the multiverse, and I/we are still "engaging willfully in our own self-defense" e.g. by reasoning about who might be simulating me and for what reasons, and trying to be helpful/interesting to our possible simulators. This is a direct counter-example to Sahil's prediction.

Overall, FGF's side's arguments seem very weak. I generally agree with CGF's counterarguments, but would emphasize more that "Doesn't that seem somehow important?" is not a good argument when there are many differences between a human brain and an LLM. It seems like a classic case of privileging the hypothesis.

I'm curious what it is about Sahil that causes you to pay attention to his ideas (and collaborate in other ways), sometimes (as in this case) in opposition to your own object-level judgment. E.g., what works of his impressed you and might be interesting for me to read?

Wei Dai's Shortform
Wei Dai · 17h

I think when a human gets a negative reward signal, probably all the circuits that contributed to the "episode trajectory" get downweighted, and antagonistic circuits get upweighted, similar to an AI being trained with RL. I can override my subconscious circuits with conscious willpower, but I only have so much conscious processing and willpower to go around. For example, I'm currently feeling a pretty large aversion towards talking with you, but am overriding it because I think it's worth the effort to get this message out; I can't keep the "override" active forever.
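
As a toy illustration of the analogy in the paragraph above (a sketch for illustration only, not anything from the original comment; the action names and numbers are made up), here is a REINFORCE-style update in which a single scalar episode return is applied to every action preference that contributed to the trajectory, so one negative return downweights all of them at once and upweights the untaken alternatives:

```python
import math

# Toy sketch: two action "preferences" stand in for the comment's "circuits".
prefs = {"engage": 0.0, "avoid": 0.0}

def softmax_probs(p):
    """Convert preferences into a probability distribution over actions."""
    exps = {a: math.exp(v) for a, v in p.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def reinforce_update(trajectory, episode_return, lr=0.5):
    """Apply one scalar return to every action taken in the episode.

    With a negative return, each action that appeared in the trajectory is
    downweighted and the untaken alternative is upweighted, mirroring the
    "all contributing circuits get downweighted" picture.
    """
    probs = softmax_probs(prefs)  # action probabilities at rollout time
    for taken in trajectory:
        for a in prefs:
            grad = (1.0 if a == taken else 0.0) - probs[a]  # d log pi(taken) / d pref[a]
            prefs[a] += lr * episode_return * grad

# One "bad" episode in which the agent chose to engage twice:
reinforce_update(trajectory=["engage", "engage"], episode_return=-1.0)
print(softmax_probs(prefs))  # "engage" is now less likely, "avoid" more likely
```

Whether human credit assignment actually works this way is exactly what the comment is hedging about; the sketch only shows the structural point that one negative signal can shift every contributing weight at once.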

Of course I can consciously learn more precise things, if you were to write about them, but that seems unlikely to change the subconscious learning that happened already.

Wei Dai's Shortform
Wei Dai18h2119

I think the cultural slide will include self-censorship. E.g., having had this experience (of being banned out of the blue), in the future I'll probably subconsciously be constantly thinking "am I annoying this author too much with my comments?" and disengage early or change what I say before I get banned, and this will largely be out of my conscious control.

Wei Dai's Shortform
Wei Dai1d1615

(Thanks for reposting without the link/quotes. I added back the karma your comment had, as best I could.)

Previously, the normal way to disengage was to just disengage, or to say that one is disengaging and then stop responding, not to suddenly ban someone without warning based on one thread. I do not recall previously seeing a ban that wasn't based on some long-term pattern of behavior.

Wei Dai's Shortform
Wei Dai · 1d

Today I was author-banned for the first time, without warning and as a total surprise to me, ~8 years after banning power was given to authors, but less than 3 months since @Said Achmiz was removed from LW. It seems to vindicate my fear that LW would slide towards a more censorious culture if the mods went through with their decision.

Has anyone noticed any positive effects, BTW? Has anyone who stayed away from LW because of Said rejoined?

Edit: In addition to the timing, I do not recall previously seeing a ban based on just one interaction/thread rather than some long-term pattern of behavior. Also, I'm not linking the thread because IIUC the mods do not wish to see authors criticized for exercising their mod powers, and I also don't want to criticize the specific author. I'm worried about the overall cultural trend caused by admin policies/preferences, not trying to apply pressure to the author who banned me.

Human Values ≠ Goodness
Wei Dai · 1d

One way you could apply it is by not endorsing so completely/confidently the kind of "rolling your own metaethics" that I argued against (and that I see John as doing here), i.e., by not simply saying "the distinction John is making here is correct, plus his advice on how to approach it." (Of course you wrote that before I posted, but I'm hoping this is one of the takeaways people get from my post.)

Human Values ≠ Goodness
Wei Dai · 1d

Have you also seen https://www.lesswrong.com/posts/KCSmZsQzwvBxYNNaT/please-don-t-roll-your-own-metaethics, which was also partly in response to that thread? BTW, why is my post still in "personal blog"?


Wikitag Contributions

  • Carl Shulman (2 years ago)
  • Carl Shulman (2 years ago, -35)
  • Human-AI Safety (2 years ago)
  • Roko's Basilisk (7 years ago, +3/-3)
  • Carl Shulman (8 years ago, +2/-2)
  • Updateless Decision Theory (12 years ago, +62)
  • The Hanson-Yudkowsky AI-Foom Debate (13 years ago, +23/-12)
  • Updateless Decision Theory (13 years ago, +172)
  • Signaling (13 years ago, +35)
  • Updateless Decision Theory (14 years ago, +22)