Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
FWIW, I would currently take bets that Musk will pretty unambiguously enact and endorse censorship of things critical of him or the Trump administration more broadly within the next 12 months. I agree this case is ambiguous, but my pretty strong read, based on him calling for criminal prosecution of journalists who say critical things about him or the Trump administration, is that at the moment it's a question of political opportunity, not willingness. I am not totally sure, but sure enough to take a 1:1 bet on this operationalization.
My best guess (which I roughly agree with) is that your comments are too long, likely as a result of base-model use.
Maybe I am confused (and it's been a while since I thought about these parts of decision theory), but I thought Smoking Lesion is usually the test case for showing why EDT is broken, and Newcomb's problem is usually the test case for why CDT is broken, so it makes sense that Smoking Lesion wouldn't convince you that CDT is wrong.
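Roughly, the standard framing as I remember it (my notation): EDT ranks an action $a$ by $\sum_o P(o \mid a)\,U(o)$, while CDT ranks it by $\sum_o P(o \mid \mathrm{do}(a))\,U(o)$. In Smoking Lesion, the lesion makes $P(\text{cancer} \mid \text{smoke}) > P(\text{cancer} \mid \text{don't smoke})$ even though smoking doesn't cause cancer, so EDT tells you to forgo a harmless pleasure (the verdict held against EDT), while intervening with $\mathrm{do}(\cdot)$ breaks the correlation with the lesion and CDT says to smoke. In Newcomb's problem the correlation runs through a prediction of your own choice, so EDT one-boxes and CDT two-boxes (the verdict held against CDT).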
Back in 2022 when our team was getting the ball rolling on the whole dangerous capabilities testing / evals agenda, I was like…
Looks like the rest of the comment got cut off?
(My comments start with a higher vote total since my small-vote strength is 2. Then it looks like one person voted on mine but not yours, but one vote is really just random noise; I would ignore it.)
I mean, one thing base models love to do is to generate biographical details about my life which are not accurate. Once, when I was generating continuations to a thread where Alex Zhu was arguing with me about near-death experiences, the model just claimed that I really knew you don't have any kind of "life flashing before your eyes" experience when you are near death, because actually I had been in 5+ near-death experiences, and so I really would know. This would of course be a great argument to make if it were true, but it of course is not. Smaller variations of this kind of stuff happen to me all the time, and they are easy to miss.
I think you are underestimating the degree to which contributions to LessWrong mostly come from people who have engaged with each other a lot. We review all posts from new users before they go live. We can handle more submissions, and lying to the moderators about whether your content is AI-written is not going to work for that many iterations. And with that policy in place, if we find out you violated the content policies, we feel comfortable banning you.
I think it's a little more complex than that, but not much. Humans can't tell LLM writing from human writing in controlled studies. The question isn't whether you can hide the style, or even whether it's hard; it's just how easy it is.
I am quite confident I can tell LLM writing from human writing. Yes, there are prompts sufficient to fool me, but only for a bit until I pick up on it. Adding "don't write in a standard LLM style" would not be enough, and my guess is nothing that takes less than half an hour to figure out would be enough.
I have also done a lot of writing with base models! (Indeed, we have an admin-only base-model completion feature built into the LW editor that I frequently use).
I think roughly the same guideline applies to base models as for research assistants:
A rough guideline is that if you are using AI for writing assistance, you should spend a minimum of 1 minute per 50 words (enough to read the content several times and perform significant edits), you should not include any information that you can't verify, haven't verified, or don't understand, and you should not use the stereotypical writing style of an AI assistant.
Base models like to make stuff up even more than assistant models do, and they do so in more pernicious ways, so I would probably increase this threshold a bit. They do help me write, but I really do need to read everything 3-4 times to check that they didn't just make up something random about me, or imply something false.
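(For concreteness: at 1 minute per 50 words, a 1,000-word post already implies at least 20 minutes of reading and editing before posting, and with a base model I would budget more than that.)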
We do alt-account detection and mass-voting detection. I am quite confident we would reliably catch any attempts at this, and that this hasn't been happening so far.
Because this would cause people to basically not downvote things, drastically reducing the signal to noise ratio of the site.