Wei Dai

I think I need more practice talking with people in real time (about intellectual topics). (I've gotten much more used to text chat/comments, which I like because it puts less time pressure on me to think and respond quickly, but I feel like I now incur a large cost due to excessively shying away from talking to people, hence the desire for practice.) If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

Posts

Sorted by New

10Wei Dai's Shortform

190

10Wei Dai's Shortform

190

63Managing risks while trying to do good

46AI doing philosophy = AI generating hands?

221UDT shows that decision theory is more puzzling than ever

163Meta Questions about Metaphilosophy

34Why doesn't China (or didn't anyone) encourage/mandate elastomeric respirators to control COVID?

55How to bet against civilizational adequacy?

5AI ethics vs AI alignment

115A broad basin of attraction around human values?

233Morality is Scary

116

Wikitag Contributions

(-35)

(+3/-3)

(+2/-2)

Updateless Decision Theory

12y

(+62)

The Hanson-Yudkowsky AI-Foom Debate

13y

(+23/-12)

Updateless Decision Theory

13y

(+172)

Signaling

13y

(+35)

Updateless Decision Theory

14y

(+22)

Comments

Sorted by

Newest

Pitfalls of Building UDT Agents

Wei Dai13d105

A lot of things you state here with apparent certainty, e.g., "We only care about this universe." are things that I think are potential problems, but am unsure about. E.g. in UDT shows that decision theory is more puzzling than ever I wrote:

Indexical values are not reflectively consistent. UDT "solves" this problem by implicitly assuming (via the type signature of its utility function) that the agent doesn't have indexical values. But humans seemingly do have indexical values, so what to do about that?

which I think is talking about the same or related issue. I think a lot of these (e.g. whether or not we really care or should care only about this universe) seem like hard philosophical problems that can't be solved easily, so directly trying to solve them, or confidently assuming some solution like "We only care about this universe", as part of AI safety/alignment seems like a bad idea to me.

Meta-discussion from "Circling as Cousin to Rationality"

Wei Dai13d75

For now all I can think of is to note that some users, like Wei Dai, ask lots of pointed and clarifying questions and never provoke in me the same kind of frustration that many of Said's comments do.

Reflecting on this, I think there are a few reasons for this:

Generally being less confident that I'm right and my interlocutor is wrong, or that when I don't understand something it's because the writing itself is confused vs that I'm not smart enough or lack enough background to understand it.
Having less interest in commenting on posts that I'm highly skeptical about, like ones that are positive on Circling or religion, thus avoiding getting into conflicts with their authors.
When I do write pointed/skeptical comments, I'm more mindful of my own potential bias to be overconfident or my own status-seeking tendencies, and also because I have a more conflict-avoiding personality, I intentionally "tone down" my skepticism/hostility/derision, or leave threads instead of trying to win every argument.

So I think I understand and can sympathize with those who find Said's style too abrasive/aversive, but also think it's probably good to have one or two people like him around, 1. to criticize posts that I think should be criticized but people more like me have less interest in directly engaging with, and 2. holding open one end of the Overton Window so that people in the community know that occasional harsh criticism should be expected and tolerated (but possibly ignored/downvoted), so that when I (and others like me who are a bit more attuned to social reality and other people's feelings) try to criticize something, we don't feel even more pressure to avoid offending the other person.

(Tagging @habryka since this might be relevant to his decision to ban Said or not.)

[Meta] New moderation tools and moderation guidelines

Wei Dai22d43

I think I have elaborated non-trivially on my reasons in this thread, so I don't really think it's an issue of people not finding it.

It's largely an issue of lack of organization and conciseness (50k+ words is a minus, not a plus in my view), but also clearly an issue of "not finding it", given that you couldn't find an important comment of your own, one that (judging from your description of it) contains a core argument needed to understand your current insistence on authors-as-unilateral-moderators.

[Meta] New moderation tools and moderation guidelines

Wei Dai22d52

I meant pivot in the sense of "this doesn't seem to be working well, we should seriously consider other possibilities" not "we're definitely switching to a new moderation model", but I now get that you disagree with Ray even about this.

Your comment under Ray's post wrote:

We did end up implementing the AI Alignment Forum, which I do actually think is working pretty well and is a pretty good example of how I imagine Archipelago-like stuff to play out. We now also have both the EA Forum and LessWrong creating some more archipelago-like diversity in the online-forum space.

This made me think you were also no longer very focused on the authors-as-unilateral-moderators model and was thinking more about subreddit-like models that Ray mentioned in his post.

BTW I've been thinking for a while that LW needs a better search, as I've also often been in the position being unable to find some comment I've written in the past.

Instead of one-on-one chats (or in addition to them), I think you should collect/organize your thoughts in a post or sequence, for a number of reasons including that you seem visibly frustrated that after having written 50k+ words on the topic, people like me still don't know your reasons for preferring your solution.

[Meta] New moderation tools and moderation guidelines

Wei Dai22d53

It seems to me there are plenty of options aside from centralized control and giving authors unilateral powers, and last I remember (i.e., at the end of this post) the mod team seems to be pivoting to other possibilities, some of which I would find much more reasonable/acceptable. I'm confused why you're now so focused again on the model of authors-as-unilateral-moderators. Where have you explained this?

[Meta] New moderation tools and moderation guidelines

Wei Dai22d71

I mean, mostly we've decided to give the people who complain about moderation a shot

What do you mean by this? Until I read this sentence, I saw you as giving the people who demand unilateral moderation powers a shot, and denying the requests of people like me to reduce such powers.

My not very confident guess at this point is that if it weren't for people like me, you would have pushed harder for people to moderate their own spaces more, perhaps by trying to publicly encourage this? And why did you decide to go against your own judgment on it, given that "people who complain about moderation" have no particular powers, except the power of persuasion (we're not even threatening to leave the site!), and it seems like you were never persuaded?

My guess is LW would be a lot better if more people felt comfortable moderating things, and in the present world, there are a lot of costs born by the site admins that wouldn't be necessary otherwise.

This seems implausible to me given my understanding of human nature (most people really hate to see/hear criticism) and history (few people can resist the temptation to shut down their critics when given the power and social license or cover to do so). If you want a taste of this, try asking DeepSeek some questions about the CCP.

But presumably you also know this (at least abstractly, but perhaps not as viscerally as I do, coming from a Chinese background, where even before the CCP, criticism in many situations was culturally/socially impossible), so I'm confused and curious why you believe what you do.

My guess is that you see a constant stream of bad comments, and wish you could outsource the burden of filtering them to post authors (or combine efforts to do more filtering). But as an occasional post author, my experience is that I'm not a reliable judge of what counts as a "bad comment", e.g., I'm liable to view a critique as a low quality comment, only to change my mind later after seeing it upvoted and trying harder to understand/appreciate its point. Given this, I'm much more inclined to leave the moderation to the karma system, which seems to work well enough in leaving bad comments at low karma/visibility by not upvoting them, and even when it's occasionally wrong, still provides a useful signal to me that many people share the same misunderstanding and it's worth my time to try to correct (or maybe by engaging with it I find out that I still misjudged it).

But if you don't think it works well enough... hmm I recall writing a post about moderation tech proposals in 2016 and maybe there has been newer ideas since then?

[Meta] New moderation tools and moderation guidelines

Wei Dai23d63

If someone finds interacting with you very unpleasant and you don't understand quite why, it's often bad form to loudly complain about it every time they don't want to interact with you any more, even if you have an uncharitable hypothesis as to why.

If I was in this circumstance, I would be pretty worried about my own biases, and ask neutral or potentially less biased parties whether there might be more charitable and reasonable hypotheses why that person doesn't want to interact with me. If there isn't though, why shouldn't I complain and e.g. make it common knowledge that my valuable criticism is being suppressed? (Obviously I would also take into consideration social/political realities, not make enemies I can't afford to make, etc.)

I've seen many spaces degrade due to unwillingness to moderate

But most people aren't using this feature, so to the extent that LW hasn't degraded (and that's due to moderation), isn't it mainly because of the site moderators and karma voters? The benefits of having a few people occasionally moderate their own spaces hardly seems worth the cost (to potential critics and people like me who really value criticism) of not knowing when their critiques might be unilaterally deleted or banned by post authors. I mean aside from the "benefit" of attracting/retaining the authors who demand such unilateral powers.

And, man, this is a lot of moderation discussion.

Aside from the above "benefit", It seems like you're currently getting the worst of both worlds: lack of significant usage and therefore potential positive effects, and lots of controversy when it is occasionally used. If you really thought this was an important feature for the long term health of the community, wouldn't you do something to make it more popular? (Or have done it in the past 7 years since the feature came out?) But instead you (the mod team) seem content that few people use it, only coming out to defend the feature when people explicitly object to it. This only seems to make sense if the main motivation is again to attract/retain certain authors.

I am somewhat wary you will keep asking me a lot of short questions that, due to your inexperience moderating spaces, you will assume have simple answers, and I will have to do lots of work generating all the contexts to show how things play out

It seems like if you actually wanted or expected many people to use this feature, you would have written some guidelines on what people can and can't do, or under what circumstances their moderation actions might be reversed by the site moderators. I don't think I was expecting the answers to my questions to necessarily be simple, but rather that the answers already exist somewhere, at least in the form of general guidelines that might need to be interpreted to answer my specific questions.

[Meta] New moderation tools and moderation guidelines

Wei Dai24d92

the problem is that many bad comments try to make some things low status that I am trying to cultivate on the site

What are these things? Do you have a post about them?

[Meta] New moderation tools and moderation guidelines

Wei Dai24d75

We're arguing that authors on LessWrong should be able to moderate their posts with different norms/standards from one another, and that there should not reliably be retribution or counter-punishment by other commenters for them moderating in that way.

What is currently the acceptable range of moderation norms/standards (according to the LW mod team)? For example if someone blatantly deletes/bans their most effective critics, is that acceptable? What if they instead subtly discourage critics (while being overtly neutral/welcoming) by selectively enforcing rules more stringently against their critics? What if they simply ban all "offensive" content, which as a side effect discourages critics (since as I mentioned earlier, criticism almostly inescapably implies offense)?

And what does "retribution or counter-punishment" mean? If I see an author doing one of the above, and question or criticize that in the comments or elsewhere, is that considered "retribution or counter-punishment" given that my comment/post is also inescapably offensive (status-lowering) toward the author?

Comment on "Four Layers of Intellectual Conversation"

Wei Dai26d70

As a note on intellectual history, I think I was influenced by "Four Layers of Intellectual Conversation" (Yudkowsky 2016) and "AI Safety via Debate" (Irving, Christiano, Amodei 2018) when I wrote "Some Thoughts on Metaphilosophy" (2019), and I wonder if "AI Safety via Debate" itself was influenced by Eliezer's post, even though there's no direct citation, since both emphasize the problem-solving power of large/unlimited number of layers/rounds of debate.