This is a special post for quick takes by habryka. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

In an attempt to get myself to write more here is my own shortform feed. Ideally I would write something daily, but we will see how it goes.

Habryka's Shortform Feed
366 comments
habryka

I am confident, on the basis of private information I can't share, that Anthropic has asked at least some employees to sign non-disparagement agreements covered by non-disclosure agreements, similar to the ones OpenAI used. 

Or to put things into more plain terms: 

I am confident that Anthropic has offered at least one employee significant financial incentive to promise to never say anything bad about Anthropic, or anything that might negatively affect its business, and to never tell anyone about their commitment to do so.

I am not aware of Anthropic doing anything like withholding vested equity the way OpenAI did, though I think the effect on discourse is similarly bad.

I of course think this is quite sad and a bad thing for a leading AI capability company to do, especially one that bills itself as being held accountable by its employees and that claims to prioritize safety in its plans.


Hey all, Anthropic cofounder here.  I wanted to clarify Anthropic's position on non-disparagement agreements:

  1. We have never tied non-disparagement agreements to vested equity: this would be highly unusual. Employees or former employees never risked losing their vested equity for criticizing the company.
  2. We historically included standard non-disparagement terms by default in severance agreements, and in some non-US employment contracts. We've since recognized that this routine use of non-disparagement agreements, even in these narrow cases, conflicts with our mission. Since June 1st we've been going through our standard agreements and removing these terms.
  3. Anyone who has signed a non-disparagement agreement with Anthropic is free to state that fact (and we regret that some previous agreements were unclear on this point). If someone signed a non-disparagement agreement in the past and wants to raise concerns about safety at Anthropic, we welcome that feedback and will not enforce the non-disparagement agreement.

In other words— we're not here to play games with AI safety using legal contracts. Anthropic's whole reason for existing is to increase the chance that AI goes well, and spu...


Please keep up the pressure on us

OK:

  1. You should publicly confirm that your old policy, "don't meaningfully advance the frontier with a public launch", has been replaced by your RSP, if that's true, and otherwise clarify your policy.
  2. You take credit for the LTBT (e.g. here) but you haven't published enough to show that it's effective. You should publish the Trust Agreement, clarify these ambiguities, and make accountability-y commitments like "if major changes happen to the LTBT, we'll quickly tell the public."
  3. (Reminder that a year ago you committed to establish a bug bounty program (for model issues) or similar but haven't. But I don't think bug bounties are super important.)
    1. [Edit: bug bounties are also mentioned in your RSP—in association with ASL-2—but not explicitly committed to.]
  4. (Good job in many areas.)

(Sidenote: it seems Sam was kind of explicitly asking to be pressured, so your comment seems legit :)  
But I also think that, had Sam not done so, I would still really appreciate him showing up and responding to Oli's top-level post, and I think it should be fine for folks from companies to show up and engage with the topic at hand (NDAs), without also having to do a general AMA about all kinds of other aspects of their strategy and policies. If Zach's questions do get very upvoted, though, it might suggest there's demand for some kind of Anthropic AMA event.) 

habryka

Anyone who has signed a non-disparagement agreement with Anthropic is free to state that fact (and we regret that some previous agreements were unclear on this point) [emphasis added]

As far as I can tell, this seems to be a straightforward lie?

I am very confident that the non-disparagement agreements you asked at least one employee to sign were not ambiguous, and very clearly said that the non-disparagement clauses could not be mentioned.

To reiterate what I know to be true: Employees of Anthropic were asked to sign non-disparagement agreements with a commitment to never tell anyone about the presence of those non-disparagement agreements. There was no ambiguity in the agreements that I have seen.

@Sam McCandlish: Please clarify what you meant to communicate by the above, which I interpreted as claiming that there was merely ambiguity in previous agreements about whether the non-disparagement agreements could be disclosed, which seems to me demonstrably false.

I can confirm that my concealed non-disparagement was very explicit that I could not discuss the existence or terms of the agreement. I don't see any way I could be misinterpreting this. (But I have now kindly been released from it!)

EDIT: It wouldn't massively surprise me if Sam just wasn't aware of its existence, though.

We're not claiming that Anthropic never offered a confidential non-disparagement agreement. What we are saying is: everyone is now free to talk about having signed a non-disparagement agreement with us, regardless of whether there was a non-disclosure previously preventing it. (We will of course continue to honor all of Anthropic's non-disparagement and non-disclosure obligations, e.g. from mutual agreements.)

If you've signed one of these agreements and have concerns about it, please email hr@anthropic.com.

habryka

Hmm, I feel like you didn't answer my question. Can you confirm that Anthropic has asked at least some employees to sign confidential non-disparagement agreements?

I think your previous comment pretty strongly implied that you think you did not do so (i.e. saying any previous agreements were merely "unclear" I think pretty clearly implies that none of them included a non-ambiguous confidential non-disparagement agreement). I want it to be confirmed and on the record that you did, so I am asking you to say so clearly.

lemonhope
"Unclear on this point" means what you think it means and is not a L I E for a spokesperson to say in my book. You got the W here already

I really think the above was meant to imply that the non-disparagement agreements were merely unclear on whether they were covered by a non-disclosure clause (and I would be happy to take bets on how a randomly selected reader would interpret it).

My best guess is Sam was genuinely confused on this and that there are non-disparagement agreements with Anthropic that clearly are not covered by such clauses.

EDIT: Anthropic have kindly released me personally from my entire concealed non-disparagement, not just made a specific safety exception. Their position on other employees remains unclear, but I take this as a good sign

If someone signed a non-disparagement agreement in the past and wants to raise concerns about safety at Anthropic, we welcome that feedback and will not enforce the non-disparagement agreement.

Thanks for this update! To clarify, are you saying that you WILL enforce existing non-disparagements for everything apart from safety, but you are specifically making an exception for safety?

this routine use of non-disparagement agreements, even in these narrow cases, conflicts with our mission

Given this part, I find this surprising. Surely if you think it's bad to ask future employees to sign non-disparagements, you should also want to free past employees from them too?

aysja

This comment appears to respond to habryka, but doesn’t actually address what I took to be his two main points—that Anthropic was using NDAs to cover non-disparagement agreements, and that they were applying significant financial incentive to pressure employees into signing them.

We historically included standard non-disparagement agreements by default in severance agreements

Were these agreements subject to NDA? And were all departing employees asked to sign them, or just some? If the latter, what determined who was asked to sign? 

Anyone who has signed a non-disparagement agreement with Anthropic is free to state that fact (and we regret that some previous agreements were unclear on this point).

I'm curious as to why it took you (and therefore Anthropic) so long to make it common knowledge (or even public knowledge) that Anthropic used non-disparagement contracts as a standard and was also planning to change its standard agreements.

The right time to reveal this was when the OpenAI non-disparagement news broke, not after Habryka connects the dots and builds social momentum for scrutiny of Anthropic.

habryka

that Anthropic used non-disparagement contracts as a standard and was also planning to change its standard agreements.

I do want to be clear that a major issue is that Anthropic used non-disparagement agreements that were covered by non-disclosure agreements. I think that's an additional and much more insidious thing to do, one that contributed substantially to the harm caused by the OpenAI agreements, and I think it is an important fact to include here (it also makes the two situations even more analogous).

Note, since this is a new and unverified account, that Jack Clark (Anthropic co-founder) confirmed on Twitter that the parent comment is the official Anthropic position https://x.com/jackclarkSF/status/1808975582832832973

Thank you for responding! (I have more comments and questions but figured I would shoot off one quick question which is easy to ask)

We've since recognized that this routine use of non-disparagement agreements, even in these narrow cases, conflicts with our mission

Can you clarify what you mean by "even in these narrow cases"? If I am understanding you correctly, you are saying that you were including a non-disparagement clause by default in all of your severance agreements, which sounds like the opposite of narrow (edit: though as Robert points out it depends on what fraction of employees get offered any kind of severance, which might be most, or might be very few).

I agree that it would have technically been possible for you to also include such an agreement on start of employment, but that would have been very weird, and not even OpenAI did that.

I think using the sentence "even in these narrow cases" seems inappropriate given that (if I am understanding you correctly) all past employees were affected by these agreements. I think it would be good to clarify what fraction of past employees were actually offered these agreements.

Severance agreements typically aren't offered to all departing employees, but usually only those that are fired or laid off.  We know that not all past employees were affected by these agreements, because Ivan claims to not have been offered such an agreement, and he left[1] in mid-2023, which was well before June 1st.

[1] Presumably of his own volition, hence he was not offered a severance agreement with non-disparagement clauses.

habryka
Ah, fair, that would definitely make the statement substantially more accurate. @Sam McCandlish: Could you clarify whether severance agreements were also offered to voluntarily departing employees, and if so, under which conditions?
kave

To expand on my "that's a crux": if the non-disparagement+NDA clauses are very standard, such that they were included in a first draft by an attorney without prompting and no employee ever pushed back, then I would think this was somewhat less bad.

It would still be somewhat bad, because Anthropic should be proactive about not making those kinds of mistakes. I am confused about what level of perfection to demand from Anthropic, considering the stakes.

And if non-disparagement is often used, but Anthropic leadership either specified its presence or its form, that would seem quite bad to me, because mistakes of commission here are more evidence of poor decisionmaking than mistakes of omission. If Anthropic leadership decided to keep the clause when a departing employee wanted to remove the clause, that would similarly seem quite bad to me.

I think that both these clauses are very standard in such agreements. Both severance letter templates I was given for my startup, one from a top-tier SV investor's HR function and another from a top-tier SV law firm, had both clauses. When I asked Claude, it estimated 70-80% of startups would have a similar non-disparagement clause and 80-90% would have a similar confidentiality-of-this-agreement's-terms clause.  The three top Google hits for "severance agreement template" all included those clauses.

These generally aren't malicious. Terminations get messy and departing employees often have a warped or incomplete picture of why they were terminated; it's not a good idea to tell them all those details, because that adds liability, and some of those details are themselves confidential about other employees. Companies view the limitation of liability from release of various wrongful termination claims as part of the value they're "purchasing" by offering severance, not because those claims would succeed, but because it's expensive to explain in court why they're justified. But the expenses disgruntled ex-employees can cause are not just legal; they're also reputational. You usually don'...

And internally, we have an anonymous RSP non-compliance reporting line so that any employee can raise concerns about issues like this without any fear of retaliation.
 

Are you able to elaborate on how this works? Are there any other public details about this? I couldn't find more via a quick search.

Some specific qs I'm curious about: (a) who handles the anonymous complaints, (b) what is the scope of behavior explicitly (and implicitly re: cultural norms) covered here, (c) handling situations where a report would deanonymize the reporter (or limit them to a small number of people)?

Zach Stein-Perlman
Anthropic has not published details. See discussion here. (I weakly wish they would; it's not among my high-priority asks for them.)

OK, let's imagine I had a concern about RSP noncompliance, and felt that I needed to use this mechanism.

(in reality I'd just post in whichever slack channel seemed most appropriate; this happens occasionally for "just wanted to check..." style concerns and I'm very confident we'd welcome graver reports too. Usually that'd be a public channel; for some compartmentalized stuff it might be a private channel and I'd DM the team lead if I didn't have access. I think we have good norms and culture around explicitly raising safety concerns and taking them seriously.)

As I understand it, I'd:

  • Remember that we have such a mechanism and bet that there's a shortcut link. Fail to remember the shortlink name (reports? violations?) and search the list of "rsp-" links; ah, it's rsp-noncompliance. (just did this, and added a few aliases)
  • That lands me on the policy PDF, which explains in two pages the intended scope of the policy, who's covered, the procedure, etc. and contains a link to the third-party anonymous reporting platform. That link is publicly accessible, so I could e.g. make a report from a non-work device or even after leaving the company.
  • I write a report on that platform desc...

Good that it's clear who it goes to, though if I were an Anthropic employee I'd want an option to escalate to a board member who isn't Dario or Daniela, in case I had concerns related to the CEO.

Makes sense - if I felt I had to use an anonymous mechanism, I can see how contacting Daniela about Dario might be uncomfortable. (Although to be clear I actually think that'd be fine, and I'd also have to think that Sam McCandlish as responsible scaling officer wouldn't handle it)

If I was doing this today I guess I'd email another board member; and I'll suggest that we add that as an escalation option.

Raemon

Are there currently board members who are meaningfully separated in terms of incentive-alignment with Daniela or Dario? (I don't know that it's possible for you to answer in a way that'd really resolve my concerns, given what sort of information is possible to share. But, "is there an actual way to criticize Dario and/or Daniela in a way that will realistically be given a fair hearing by someone who, if appropriate, could take some kind of action" is a crux of mine)

William_S
Absent evidence to the contrary, for any organization one should assume board members were basically selected by the CEO. So it's hard to get assurance about true independence, but it seems good to at least talk to someone who isn't a family member/close friend.
Zach Stein-Perlman
(Jay Kreps was formally selected by the LTBT. I think Yasmin Razavi was selected by the Series C investors. It's not clear how involved the leadership/Amodeis were in those selections. The three remaining members of the LTBT appear independent, at least on cursory inspection.)
Zac Hatfield-Dodds
I think that personal incentives are an unhelpful way to try and think about or predict board behavior (for Anthropic and in general), but you can find the current members of our board listed here. For whom to criticize him/her/them about what? What kind of action are you imagining? For anything I can imagine actually coming up, I'd be personally comfortable raising it directly with either or both of them in person or in writing, and believe they'd give it a fair hearing as well as appropriate follow-up. There are also standard company mechanisms that many people might be more comfortable using (talk to your manager or someone responsible for that area; ask a maybe-anonymous question in various fora; etc). Ultimately executives are accountable to the board, which will be majority appointed by the long-term benefit trust from late this year.
Zach Stein-Perlman
Re 3 (and 1): yay. If I was in charge of Anthropic I just wouldn't use non-disparagement.

Anthropic has asked employees

[...]

Anthropic has offered at least one employee

As a point of clarification: is it correct that the first quoted statement above should be read as "at least one employee" in line with the second quoted statement? (When I first read it, I parsed it as "all employees" which was very confusing since I carefully read my contract both before signing and a few days ago (before posting this comment) and I'm pretty sure there wasn't anything like this in there.)

(I'm a full-time employee at Anthropic.)

I carefully read my contract both before signing and a few days ago [...] there wasn't anything like this in there.

Current employees of OpenAI also wouldn't yet have signed or even known about the non-disparagement agreement that is part of "general release" paperwork on leaving the company. So this is only evidence about some ways this could work at Anthropic, not others.

habryka
Yep, both should be read as "at least one employee", sorry for the ambiguity in the language.

FWIW I recommend editing OP to clarify this.

Neel Nanda
Agreed, I think it's quite confusing as is
habryka
Added an "at least some", which I hope clarifies. 

I am disappointed. Using nondisparagement agreements seems bad to me, especially if they're covered by non-disclosure agreements, especially if you don't announce that you might use this.

My ask-for-Anthropic now is to explain the contexts in which they have asked or might ask people to incur nondisparagement obligations, and if those are bad, release people and change policy accordingly. And even if nondisparagement obligations can be reasonable, I fail to imagine how non-disclosure obligations covering them could be reasonable, so I think Anthropic should at least do away with the no-disclosure-of-nondisparagement obligations.

Does anyone from Anthropic want to explicitly deny that they are under an agreement like this? 

(I know the post talks about some and not necessarily all employees, but am still interested). 

I left Anthropic in June 2023 and am not under any such agreement.

EDIT: nor was any such agreement or incentive offered to me.

I left [...] and am not under any such agreement.

Neither is Daniel Kokotajlo. Context and wording strongly suggest that what you mean is that you weren't ever offered paperwork with such an agreement and incentives to sign it, but there remains a slight ambiguity on this crucial detail.

Correct, I was not offered such paperwork nor any incentives to sign it. Edited my post to include this.

I am a current Anthropic employee, and I am not under any such agreement, nor has any such agreement ever been offered to me.

If asked to sign a self-concealing NDA or non-disparagement agreement, I would refuse.

RobertM
Did you see Sam's comment?
aysja
Agreed. I'd be especially interested to hear this from people who have left Anthropic.  

I agree that this kind of legal contract is bad, and Anthropic should do better. I think there are a number of aggravating factors which made the OpenAI situation extraordinarily bad, and I'm not sure how much these might obtain regarding Anthropic (at least one comment from another departing employee about not being offered this kind of contract suggests the practice is less widespread).

-amount of money at stake
-taking money, equity or other things the employee believed they already owned if the employee doesn't sign the contract, vs. offering them something new (IANAL but in some cases, this could be a felony "grand theft wages" under California law if a threat to withhold wages for not signing a contract is actually carried out, what kinds of equity count as wages would be a complex legal question)
-is this offered to everyone, or only under circumstances where there's a reasonable justification?
-is this only offered when someone is fired or also when someone resigns?
-to what degree are the policies of offering contracts concealed from employees?
-if someone asks to obtain legal advice and/or negotiate before signing, does the company allow this?
-if this becomes public, does the comp...

This is true. I signed a concealed non-disparagement when I left Anthropic in mid-2022. I don't have clear evidence this happened to anyone else (but that's not strong evidence of absence). More details here.

EDIT: I should also clarify that I personally don't think Anthropic acted that badly, and recommend reading about what actually happened before forming judgements. I do not think I am the person referred to in Habryka's comment.

In the case of OpenAI most of the debate was about ex-employees. Are we talking about current employees or ex-employees here?

I am including both in this reference class (i.e. when I say employee above, it refers to both present employees and employees who left at some point). I am intentionally being broad here to preserve more anonymity of my sources.

Not sure how to interpret the "agree" votes on this comment. If someone is able to share that they agree with the core claim because of object-level evidence, I am interested. (Rather than agreeing with the claim that this state of affairs is "quite sad".)

Dagon
A LOT depends on the details of WHEN the employees make the agreement, the specifics of duration and remedy, and (much harder to know) the apparent willingness to enforce on edge cases. "Significant financial incentive to promise" is hugely different from "significant financial loss for choosing not to promise". MANY companies have such things in their contracts, and they're a condition of employment. And they're pretty rarely enforced. That's a pretty significant incentive, but it's prior to investment, so it's nowhere near as bad.
Jacob Pfau
A pre-existing market on this question https://manifold.markets/causal_agency/does-anthropic-routinely-require-ex?r=SmFjb2JQZmF1
Zach Stein-Perlman
What's your median-guess for the number of times Anthropic has done this?

(Not answering this question since I think it would leak too many bits on confidential stuff. In general I will be a bit hesitant to answer detailed questions on this, or I might take a long while to think about what to say before I answer, which I recognize is annoying, but I think is the right tradeoff in this situation)

Zane
I'm kind of concerned about the ethics of someone signing a contract and then breaking it to anonymously report what's going on (if that's what your private source did). I think there's value from people being able to trust each other's promises about keeping secrets, and as much as I'm opposed to Anthropic's activities, I'd nevertheless like to preserve a norm of not breaking promises. Can you confirm or deny whether your private information comes from someone who was under a contract not to give you that private information? (I completely understand if the answer is no.)

(Not going to answer this question for confidentiality/glommarization reasons)

Ben Pace
I think this is a reasonable question to ask. I will note that in this case, if your guess is right about what happened, the breaking of the agreement is something that it turned out the counterparty endorsed, or at least, after the counterparty became aware of the agreement, they immediately lifted it. I still think there's something to maintaining all agreements regardless of context, but I do genuinely think it matters here if you (accurately) expect the entity you've made the secret agreement with would likely retract it if they found out about it. (Disclaimer that I have no private info about this specific situation.)
habryka

Reputation is lazily evaluated

When evaluating the reputation of your organization, community, or project, many people flock to surveys in which you ask randomly selected people what they think of your thing, or what their attitudes towards your organization, community or project are. 

If you do this, you will very reliably get back data that looks like people are indifferent to you and your projects, and your results will probably be dominated by extremely shallow things like "do the words in your name invoke positive or negative associations".

People largely only form opinions of you or your projects when they have some reason to do that, like trying to figure out whether to buy your product, or join your social movement, or vote for you in an election. You basically never care about what people think about you while engaging in activities completely unrelated to you, you care about what people will do when they have to take any action that is related to your goals. But the former is exactly what you are measuring in attitude surveys.

As an example of this (used here for illustrative purposes, and what caused me to form strong opinions on this, but not intended as the central po...

Buck

I don't like the fact that this essay is a mix of an insightful generic argument and a contentious specific empirical claim that I don't think you support strongly; it feels like the rhetorical strength of the former lends credence to the latter in a way that isn't very truth-tracking.

I'm not claiming you did anything wrong here, I just don't like something about this dynamic.

I do think the EA example is quite good on an illustrative level. It really strikes me as a rare case where we have an enormous pile of public empirical evidence (which is linked in the post) and it also seems by now really quite clear from a common-sense perspective. 

I don't think it makes sense to call this point "contentious". I think it's about as clear as these cases go. At least off the top of my head I can't think of an example that would have been clearer (maybe if you had some social movement that more fully collapsed and where you could do a retrospective root cause analysis, but it's extremely rare to have as clear of a natural experiment as the FTX one). I do think it's political in our local social environment, and so is harder to talk about, so I agree on that dimension a different example would be better.

I do think it would be good/nice to add an additional datapoint, but I also think this would risk being misleading. The point about reputation being lazily evaluated is mostly true from common-sense observations and logical reasoning, and the EA point is mostly trying to provide evidence for "yes, this is a real mistake that real people make". I think even if you...

Buck
I am persuaded by neither the common sense nor the empirical evidence for the point about EA. To be clear (as I've said to you privately) I'm not at all trying to imply that I specifically disagree with you, I'm just saying that the evidence you've provided doesn't persuade me of your claims.
habryka
Yeah, makes sense. I don't think I am providing a full paper trail of evidence one can easily travel along, but I would take bets you would come to agree with it if you did spend the effort to look into it.
Guive

This is good. Please consider making it a top level post. 

metachirality
It ought to be a top-level post on the EA forum as well.
habryka
(Someone is welcome to link post, but indeed I am somewhat hoping to avoid posting over there as much, as I find it reliably stressful in mostly unproductive ways) 

There's another important effect here: a laggy time course of public opinion. I saw more popular press articles about EA than I ever have, linking SBF to them, but with a large lag after the events. So the early surveys showing a small effect happened before public conversation really bounced around the idea that SBF's crimes were motivated by EA utilitarian logic. The first time many people would remember hearing about EA would be from those later articles and discussions.

The effect probably amplified considerably over time as that hypothesis bounced through public discourse.

The original point stands but this is making the effect look much larger in this case.

Hauke Hillebrandt
This lag effect might amplify a lot more when big budget movies about SBF/FTX come out.
Zach Stein-Perlman
Edit 2: after checking, I now believe the data strongly suggest FTX had a large negative effect on EA community metrics. (I still agree with Buck: "I don't like the fact that this essay is a mix of an insightful generic argument and a contentious specific empirical claim that I don't think you support strongly; it feels like the rhetorical strength of the former lends credence to the latter in a way that isn't very truth-tracking." And I disagree with habryka's claims that the effect of FTX is obvious.) ---------------------------------------- I want more evidence on your claim that FTX had a major effect on EA reputation. Or: why do you believe it? ---------------------------------------- Edit: relevant thing habryka said that I didn't quote above:

Practically all growth metrics are down (and have indeed turned negative on most measures), a substantial fraction of core contributors are distancing themselves from the EA affiliation, surveys among EA community builders report EA-affiliation as a major recurring obstacle[1], and many of the leaders who previously thought it wasn't a big deal now concede that it was/is a huge deal.

Also, informally, recruiting for things like EA Fund managers, or getting funding for EA Funds has become substantially harder. EA leadership positions appear to be filled by less competent people, and in most conversations I have with various people who have been around for a while, people seem to both express much less personal excitement or interest in identifying or championing anything EA-related, and report the same for most other people.

Related to the concepts in my essay, when measured, the reputational differential also seems to reliably point towards people updating negatively towards EA as they learn more about EA (which shows up in the quotes you mentioned, and which more recently shows up in the latest Pulse survey, though I mostly consider that survey uninformative for roughly the reasons ou...

Hey! Sorry for the silence, I was feeling a bit stressed by this whole thread, and so I wanted to step away and think about this before responding. I’ve decided to revert the dashboard back to its original state & have republished the stale data. I did some quick/light data checks but prioritised getting this out fast. For transparency: I’ve also added stronger context warnings and I took down the form to access our raw data in sheet form but intend to add it back once we’ve fixed the data. It’s still on our stack to Actually Fix this at some point but we’re still figuring out the timing on that.

On reflection, I think I probably made the wrong call here (although I still feel a bit sad / misunderstood but 🤷🏻‍♀️). It was a unilateral + lightly held call I made in the middle of my work day — like truly I spent 5 min deciding this & maybe another ~15 updating the thing / leaving a comment. I think if I had a better model for what people wanted from the data, I would have made a different call. I’ve updated on “huh, people really care about not deleting data from the internet!” — although I get that the reaction here might be especially strong because it’s about CEA...

[musing] Actually another mistake here which I wish I had just said in the first comment: I didn't have a strong enough TAP for this: if someone says a negative thing about your org (or something that could be interpreted negatively), you should have a high bar for taking away data (meaning more broadly than numbers) that they were using to form that perception, even if you think the data is wrong for reasons they're not tracking. You can like, try and clarify the misconception (ideally, given time & energy constraints etc.), and you can try harder to avoid putting wrong things out there, but don't just take it away; it's not on the reader to treat you charitably and it kind of doesn't matter what your motives were.

I think I mostly agree with something like that / I do think people should hold orgs to high standards here. I didn't pay enough attention to this and regret it. Sorry! (I'm back to ignoring this thread lol but just felt like sharing a reflection 🤷🏻‍♀️)

habryka
Thank you! I appreciate the quick oops here, and agree it was a mistake (but fixing it as quickly as you did I think basically made up for all the costs, and I greatly appreciate it). Just to clarify, I don't want to make a strong statement that it's worth updating the data and maintaining the dashboard. By my lights it would be good enough to just have a static snapshot of it forever. The thing that seemed so costly to me was breaking old links and getting rid of data that you did think was correct.  Thanks again!
the gears to ascension
I suspect fixing this would need to involve creating something new which doesn't have the structural problems in EA which produced this, and would involve talking to people who are non-sensationalist EA detractors but who are involved with similarly motivated projects. I'd start here and skip past the ones that are arguing "EA good" to find the ones that are "EA bad, because [list of reasons ea principles are good, and implication that EA is bad because it fails at its stated principles]". I suspect that, even without seeking that out, the spirit of EA that made it ever partly good has already metastasized, and will further metastasize, into genpop.
angelinahli
Hi! A quick note: I created the CEA Dashboard, which is the 2nd link you reference. The data here hadn’t been updated since August 2024, and so was quite out of date at the time of your comment. I've now taken this dashboard down, since I think it's overall more confusing than helpful for grokking the state of CEA's work. We still intend to come back and update it within a few months.

Just to be clear on why / what’s going on:

  • I stopped updating the dashboard in August because I started getting busy with some other projects, and my manager & I decided to deprioritize this. (There are some manual steps needed to keep the data live.)
  • I’ve now seen several people refer to that dashboard as a reference for how CEA is doing in ways I think are pretty misleading.
  • We (CEA) still intend to come back and fix this, and this is a good nudge to prioritize it.

Thanks!

Oh, huh, that seems very sad. Why would you do that? Please leave up the data that we have. I think it's generally bad form to break links that people relied on. The data was accurate as far as I can tell until August 2024, and you linked to it yourself a bunch over the years, don't just break all of those links.

I am pretty up-to-date with other EA metrics and I don't really see how this would be misleading. You had a disclaimer at the top that I think gave all the relevant context. Let people make their own inferences, or add more context, but please don't just take things down.

Unfortunately, archive.org doesn't seem to have worked for that URL, so we can't even rely on that to show the relevant data trends.

Edit: I'll be honest, after thinking about it for longer, the only reason I can think of why you would take down the data is because it makes CEA and EA look less on an upwards trajectory. But this seems so crazy. How can I trust data coming out of CEA if you have a policy of retracting data that doesn't align with the story you want to tell about CEA and EA? The whole point of sharing raw data is to allow other people to come to their own conclusions. This really seems like such a dumb move from a trust perspective.

I also believe that the data making EA+CEA look bad is the causal reason why it was taken down. However, I want to add some slight nuance.

I want to contrast a model whereby Angelina Li did this while explicitly trying to stop CEA from looking bad, versus a model whereby she senses that something bad might be happening, she might be held responsible (e.g. within her organization / community), and is executing a move that she's learned is 'responsible' from the culture around her.

I think many people have learned to believe the reasoning step "If people believe bad things about my team I think are mistaken with the information I've given them, then I am responsible for not misinforming people, so I should take the information away, because it is irresponsible to cause people to have false beliefs". I think many well-intentioned people will say something like this, and that this is probably because of two reasons (borrowing from The Gervais Principle):

  1. This is a useful argument for powerful sociopaths to use when they are trying to suppress negative information about themselves.
  2. The clueless people below them in the hierarchy need to rationalize why they are following the orders of the
...

I agree, and I am a bit disturbed that it needs to be said.

At normal, non-EA organizations -- and not only particularly villainous ones, either! -- it is understood that you need to avoid sharing any information that reflects poorly on the organization, unless it's required by law or contract or something. The purpose of public-facing communications is to burnish the org's reputation. This is so obvious that they do not actually spell it out to employees.

Of COURSE any organization that has recently taken down unflattering information is doing it to maintain its reputation. 

I'm sorry, but this is how "our people" get taken for a ride. Be more cynical, including about people you like.

I think many people have learned to believe the reasoning step "If people believe bad things about my team I think are mistaken with the information I've given them, then I am responsible for not misinforming people, so I should take the information away, because it is irresponsible to cause people to have false beliefs". I think many well-intentioned people will say something like this, and that this is probably because of two reasons (borrowing from The Gervais Principle):

(Comment not specific to the particulars of this issue but noted as a general policy:) I think that as a general rule, if you are hypothesizing reasons for why somebody might say a thing, you should always also include the hypothesis that "people say a thing because they actually believe in it". This is especially so if you are hypothesizing bad reasons for why people might say it. 

It's very annoying when someone hypothesizes various psychological reasons for your behavior and beliefs but never even considers as a possibility the idea that maybe you might have good reasons to believe in it. Compare e.g. "rationalists seem to believe that superintelligence is imminent; I think this is probably because that l...

Ben Pace
I feel more responsibility to be the person holding/tracking the earnest hypothesis in a 1-1 context, or if I am the only one speaking; in larger group contexts I tend to mostly ask "Is there a hypothesis here that isn't or likely won't be tracked unless I speak up" and then I mostly focus on adding hypotheses to track (or adding evidence that nobody else is adding).
habryka
(Did Ben indicate he didn’t consider it? My guess is he considered it, but thinks it’s not that likely and doesn’t have amazingly interesting things to say on it. I think having a norm of explicitly saying “I considered whether you were telling the truth but I don’t believe it” seems like an OK norm, but not obviously a great one. In this case Ben also responded to a comment of mine which already said this, and so I really don’t see a reason for repeating it.)
Kaj_Sotala
(I read it as implying that the list of reasons is considered to be exhaustive, such that any reasons besides those two have negligible probability.)
Ben Pace
I gave my strongest hypothesis for why it looks to me that many many people believe it's responsible to take down information that makes your org look bad. I don't think alternative stories have negligible probability, nor does what I wrote imply that, though it is logically consistent with that. There are many widespread anti-informative behaviors that people engage in for poor reasons, like saying that their spouse is the best spouse in the world, or telling customers that their business is the best business in the industry, or saying exclusively glowing things about people in reference letters, that are best explained by the incentives on the person to present themselves in the best light; at the same time, it is respectful to a person, while in dialogue with them, to keep track of the version of them who is trying their best to have true beliefs and honestly inform others around them, in order to help them become that person (and notice the delta between their current behavior and what they hopefully aspire to).  Seeing orgs in the self-identified-EA space take down information that makes them look bad is (to me) not that dissimilar to the other things I listed. I think it's good to discuss norms about how appropriate it is to bring up cynical hypotheses about someone during a discussion in which they're present. In this case I think raising this hypothesis was worthwhile for the discussion, and I didn't cut off any way for the person in question to continue to show themselves to be broadly acting in good faith, so I think it went fine. Li replied to Habryka, and left a thoughtful pair of comments retracting and apologizing, which reflected well on them in my eyes.
Kaj_Sotala
Okay! Good clarification. To clarify, my comment wasn't specific to the case where the person is present. There are obvious reasons why the consideration should get extra weight when the person is present, but there's also a reason to give it extra weight if none of the people discussed are present - namely that they won't be able to correct any incorrect claims if they're not around. Agree. (As I mentioned in the original comment, the point I made was not specific to the details of this case, but noted as a general policy. But yes, in this specific case it went fine.)
angelinahli
Quick thoughts on this:

  • “The data was accurate as far as I can tell until August 2024”
      • I’ve heard a few reports over the last few weeks that made me unsure whether the pre-Aug data was actually correct. I haven’t had time to dig into this.
      • In one case (e.g. with the EA.org data) we have a known problem with the historical data that I haven’t had time to fix, that probably means the reported downward trend in views is misleading. Again I haven’t had time to scope the magnitude of this etc.
  • I’m going to check internally to see if we can just get this back up in a week or two (It was already high on our stack, so this just nudges up timelines a bit). I will update this thread once I have a plan to share. I’m probably going to drop responding to “was this a bad call” and prioritize “just get the dashboard back up soon”.
angelinahli
More thoughts here, but TL;DR I’ve decided to revert the dashboard back to its original state & have republished the stale data. (Just flagging for readers who wanted to dig into the metrics.)
angelinahli
Hey! I just saw your edited text and wanted to jot down a response: I'm sorry this feels bad to you. I care about being truth seeking and care about the empirical question of "what's happening with EA growth?". Part of my motivation in getting this dashboard published in the first place was to contribute to the epistemic commons on this question. I also disagree that CEA retracts data that doesn't align with "the right story on growth”. E.g. here's a post I wrote in mid 2023 where the bottom line conclusion was that growth in meta EA projects was down in 2023 v 2022. It also publishes data on several cases where CEA programs grew slower in 2023 or shrank. TBH I also think of this as CEA contributing to the epistemic commons here — it took us a long time to coordinate and then get permission from people to publish this. And I’m glad we did it! On the specific call here, I'm not really sure what else to tell you re: my motivations other than what I've already said. I'm going to commit to not responding further to protect my attention, but I thought I'd respond at least once :)
habryka
I would currently be quite surprised if you had taken the same action if I was instead making an inference that positively reflects on CEA or EA. I might of course be wrong, but you did do it right after I wrote something critical of EA and CEA, and did not do it the many other times it was linked in the past year. Sadly your institution has a long history of being pretty shady with data and public comms this way, and so my priors are not very positively inclined. I continue to think that it would make sense to at least leave the data up that CEA did feel comfortable linking in the last 1.5 years. By my norms invalidating links like this, especially if the underlying page happens to be unscrapeable by the internet archive, is really very bad form. I did really appreciate your mid 2023 post!
yanni kyriacos
I spent 8 years working in strategy departments for Ad Agencies. If you're interested in the science behind brand tracking, I recommend you check out the Ehrenberg-Bass Institute's work on Category Entry Points: https://marketingscience.info/research-services/identifying-and-prioritising-category-entry-points/

Thoughts on integrity and accountability

[Epistemic Status: Early draft version of a post I hope to publish eventually. Strongly interested in feedback and critiques, since I feel quite fuzzy about a lot of this]

When I started studying rationality and philosophy, I had the perspective that people who were in positions of power and influence should primarily focus on how to make good decisions in general and that we should generally give power to people who have demonstrated a good track record of general rationality. I also thought of power as this mostly unconstrained resource, similar to having money in your bank account, and that we should make sure to primarily allocate power to the people who are good at thinking and making decisions.

That picture has changed a lot over the years. While I think there is still a lot of value in the idea of "philosopher kings", I've made a variety of updates that significantly changed my relationship to allocating power in this way:

  • I have come to believe that people's ability to come to correct opinions about important questions is in large part a result of whether their social and monetary incentives reward them when they ha
...

Just wanted to say I like this a lot and think it'd be fine as a full fledged post. :)

Zvi
More than fine. Please do post a version on its own. A lot of strong insights here, and where I disagree there's good stuff to chew on. I'd be tempted to respond with a post. I do think this has a different view of integrity than I have, but in writing it out, I notice that the word is overloaded and that I don't have as good a grasp of its details as I'd like. I'm hesitant to throw out a rival definition until I have a better grasp here, but I think the thing you're in accordance with is not beliefs so much as principles?
Eli Tyre
Seconded.
Kaj_Sotala
Thirded.
Saul Munn
fourthed. oli, do you intend to post this? if not, could i post this text as a linkpost to this shortform?
habryka
It's long been posted! Integrity and accountability are core parts of rationality 
Saul Munn
ah, lovely! maybe add that link as an edit to the top-level shortform comment?

This was a great post that might have changed my worldview some.

Some highlights:

1.

People's rationality is much more defined by their ability to maneuver themselves into environments in which their external incentives align with their goals, than by their ability to have correct opinions while being subject to incentives they don't endorse. This is a tractable intervention and so the best people will be able to have vastly more accurate beliefs than the average person, but it means that "having accurate beliefs in one domain" doesn't straightforwardly generalize to "will have accurate beliefs in other domains".

I've heard people say things like this in the past, but haven't really taken it seriously as an important component of my rationality practice. Somehow what you say here is compelling to me (maybe because I recently noticed a major place where my thinking was majorly constrained by my social ties and social standing) and it prodded me to think about how to build "mech suits" that not only increase my power but incentivize my rationality. I now have a todo item to "think about principles for incentivizing true belief...

mako yass
I think you might be confusing two things together under "integrity". Having more confidence in your own beliefs than in the shared/imposed beliefs of your community isn't really a virtue or.. it's more just a condition that a person can be in; whether it's virtuous is completely contextual. Sometimes it is, sometimes it isn't. I can think of lots of people who should have more confidence in other people's beliefs than they have in their own. In many domains, that's me. I should listen more. I should act less boldly. An opposite of that sense of integrity is the virtue of respect: recognising other people's qualities. It's a skill. If you don't have it, you can't make use of other people's expertise very well. A person with an excess of respect is easily moved by others' feedback; usually, a person who is patient with their surroundings. On the other hand I can completely understand the value of {having a known track record of staying true to self-expression, claims made about the self}. Humility is actually a part of that. The usefulness of delineating that into a virtue separate from the more general Honesty is clear to me.
Pattern
There's a lot of focus on personally updating based on evidence. Groups aren't addressed as much. What does it mean for a group to have a belief? To have honesty or integrity?
ioannes
See Sinclair: "It is difficult to get a man to understand something, when his salary depends upon his not understanding it!"

Welp, I guess my life is comic sans today. The EA Forum snuck some code into our deployment bundle for my account in particular, lol: https://github.com/ForumMagnum/ForumMagnum/pull/9042/commits/ad99a147824584ea64b5a1d0f01e3f2aa728f83a

Screenshot for posterity.

jp

🙇‍♂️

habryka
😡
habryka
And finally, I am freed from this curse.
winstonBosan
I hope the partial unveiling of your user_id hash will not doom us all, somehow. 
habryka
You can just get people's userIds via the API, so it's nothing private. 
habryka

A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal-wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW, with the key components being that they get updated regularly and serve as more stable references for some concept, as opposed to a post which is usually anchored in a specific point in time. 

We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully.

I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search currently being broken and this being very hard to fix due to annoying Google App Engine restrictions) and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.

gwern

somehow make LessWrong into something that could give rise to more personal-wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW...We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention

Multi-version wikis are a hard design problem.

It's something that people kept trying, when they soured on a regular Wikipedia: "the need for consensus makes it impossible for minority views to get a fair hearing! I'll go make my own Wikipedia where everyone can have their own version of an entry, so people can see every side! with blackjack & hookers & booze!" And then it becomes a ghost town, just like every other attempt to replace Wikipedia. (And that's if you're lucky: if you're unlucky you turn into Conservapedia or Rational Wiki.) I'm not aware of any cases of 'non-consensus' wikis that really succeed - it seems that usually, there's so little editor activity to go around that having ...

habryka
So, the key difficulty this seems to me to be eliding is the ontology problem. One thing that feels cool about personal wikis is that people come up with their own factorization and ontology for the things they are thinking about. Like, we probably won't have a consensus article on the exact ways L in Death Note made mistakes, but Gwern.net would be sadder without that kind of content. So I think in addition to the above there needs to be a way for users to easily and without friction add a personal article for some concept they care about, and to have a consistent link to it, in a way that doesn't destroy any of the benefits of the collaborative editing.  My sense is that collaboratively edited wikis tend to thrive heavily around places where there is a clear ontology and decay when the ontology is unclear or the domain permits many orthogonal carvings. This is why video game wikis are so common and usually successful: by the nature of their programming, games almost always have a clear structure (the developer probably coded an abstraction for "enemies" and "attack patterns" and "levels" and so the wiki can easily mirror them and document them). It feels to me that anything that wants to somehow build a unification of personal wikis and consensus wikis needs to figure out how to gracefully handle the ontology problem.
gwern

One thing that feels cool about personal wikis is that people come up with their own factorization and ontology for the things they are thinking about...So I think in addition to the above there needs to be a way for users to easily and without friction add a personal article for some concept they care about, and to have a consistent link to it, in a way that doesn't destroy any of the benefits of the collaborative editing.

My proposal already provides a way to easily add a personal article with a consistent link, while preserving the ability to do collaborative editing on 'public' articles. Strictly speaking, it's fine for people to add wiki entries for their own factorization and ontology.

There is no requirement for those to all be 'official': there doesn't have to be a 'consensus' entry. Nothing about a /wiki/Acausal_cooperation/gwern user entry requires the /wiki/Acausal_cooperation consensus entry to exist. (Computers are flexible like that.) That just means there's nothing there at that exact URL, or probably better, it falls back to displaying all sub-pages of user entries like usual. (User entries presumably get some sort of visual styling, in the same way that comments o...
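The URL scheme gwern describes (a consensus entry at `/wiki/<concept>`, user entries at `/wiki/<concept>/<user>`, and the bare concept URL falling back to a list of user sub-pages when no consensus entry exists) can be sketched as a small resolver. This is a minimal illustration under stated assumptions: `WikiStore`, `resolve`, and `user_pages` are hypothetical names, the store is an in-memory dict rather than anything LessWrong or ForumMagnum actually uses, and a plain-text listing stands in for whatever the real fallback page would render.

```python
from dataclasses import dataclass, field


@dataclass
class WikiStore:
    """Hypothetical in-memory store for a consensus-plus-user-entries wiki.

    consensus maps a concept slug to its shared article text;
    user_entries maps (concept slug, username) to that user's article text.
    """

    consensus: dict[str, str] = field(default_factory=dict)
    user_entries: dict[tuple[str, str], str] = field(default_factory=dict)

    def user_pages(self, concept: str) -> list[str]:
        """Return sorted URLs of every user entry filed under a concept."""
        users = sorted(user for (c, user) in self.user_entries if c == concept)
        return [f"/wiki/{concept}/{user}" for user in users]

    def resolve(self, path: str) -> str:
        """Map a /wiki/<concept>[/<user>] path to page content."""
        parts = path.strip("/").split("/")
        if parts[0] != "wiki" or len(parts) not in (2, 3):
            raise ValueError(f"not a wiki path: {path}")
        concept = parts[1]
        if len(parts) == 3:
            # A user entry does not require a consensus entry to exist;
            # a missing user entry raises KeyError.
            return self.user_entries[(concept, parts[2])]
        if concept in self.consensus:
            return self.consensus[concept]
        # No consensus entry: fall back to listing the user sub-pages.
        pages = self.user_pages(concept)
        if not pages:
            raise KeyError(path)
        return "User entries:\n" + "\n".join(pages)


# Usage: only a user entry exists, so the bare concept URL lists it.
store = WikiStore(user_entries={("Acausal_cooperation", "gwern"): "gwern's notes"})
print(store.resolve("/wiki/Acausal_cooperation"))
# User entries:
# /wiki/Acausal_cooperation/gwern
```

Keying user entries by `(concept, user)` is what lets a personal page get a stable link without any consensus page existing; adding a consensus entry later changes only what the bare URL shows.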

Chris_Leong
  1. Users can just create pages corresponding to their own categories.
  2. Like Notion, we could allow two-way links between pages, so users would just tag the category in their own custom inclusions.
Chris_Leong
I agree with Gwern. I think it's fairly rare that someone wants to write the whole entry themselves or articles for all concepts in a topic. It's much more likely that someone just wants to add their own idiosyncratic takes on a topic. For example, I'd love to have a convenient way to write up my own idiosyncratic takes on decision theory. I tried including some of these in the main Wiki, but it (understandably) was reverted. I expect that one of the main advantages of this style of content would be that you can just write a note without having to bother with an introduction or conclusion. I also think it would be fairly important (though not at the start) to have a way of upweighting the notes added by particular users. I agree with Gwern that this may result in more content being added to the main wiki pages when other users are in favour of this.
Seth Herd
TLDR: The only thing I'd add to Gwern's proposal is making sure there are good mechanisms to discuss changes. Improving the wiki and focusing on it could really improve alignment research overall. Using the LW wiki more as a medium for collaborative research could be really useful in bringing new alignment thinkers up to speed rapidly. I think this is an important part of the overall project; alignment is seeing a burst of interest, and being able to rapidly make use of bright new minds who want to donate their time to the project might very well make the difference in adequately solving alignment in time. As it stands, someone new to the field has to hunt for good articles on any topic, and those articles provide some links to other important articles, but that's not really their job. The wiki's tags do serve that purpose. The articles are sometimes a good overview of that concept or topic, but more community focus on the wiki could make them work much better as a way [...]. Ideally each article aims to be a summary of current thinking on that topic, including both majority and minority views. One key element is making this project build community rather than strain it. Having people with different views work well collaboratively is a bit tricky. Good mechanisms for discussion are one way to reduce friction and any trend toward harsh feelings when one's contributions are changed. The existing comment system might be adequate, particularly with more of a norm of linking changes to comments, and linking to comments from the main text for commentary.
Dagon
Do you have an underlying mission statement or goal that can guide decisions like this?  IMO, there are plenty of things that should probably continue to live elsewhere, with some amount of linking and overlap when they're lesswrong-appropriate.   One big question in my mind is "should LessWrong use a different karma/voting system for such content?".  If the answer is yes, I'd put a pretty high bar for diluting LessWrong with it, and it would take a lot of thought to figure out the right way to grade "wanted on LW" for wiki-like articles that aren't collections/pointers to posts.  
niplav
One small idea: Have the ability to re-publish posts to allPosts or the front page after editing. This worked in the past, but now doesn't anymore (as I noticed recently when updating this post).
habryka
Yeah, the EA Forum team removed that functionality (because people kept triggering it accidentally). I think that was a mild mistake, so I might revert it for LW.
Chris_Leong
Cool idea, but before doing this one obvious inclusion would be to make it easier to tag LW articles, particularly your own articles, in posts by @including them.
habryka

Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company now was performing assassinations of U.S. citizens. 

Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.

habryka
@jefftk comments on the HN thread on this:  Another HN commenter says (in a different thread): 

I'm probably missing something simple, but what is 356? I was expecting a probability or a percent, but that number is neither.

I think 356 or more people in the population are needed for there to be a >5% chance of 2+ deaths in a 2-month span from that population.
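The thread doesn't show how the 356 figure was computed. One standard way to get a number of that kind is a binomial model: assume each of `n` people independently dies within the two-month window with the same probability `p`, then find the smallest `n` for which the chance of two or more deaths exceeds 5%. The sketch below does that. The model, the function names, and whatever `p` you pass in are assumptions for illustration, not the HN commenter's actual base rate or method.

```python
import math


def prob_at_least_two(n: int, p: float) -> float:
    """Probability of 2+ events among n independent people, each with probability p.

    Binomial model: 1 - P(exactly 0 events) - P(exactly 1 event).
    """
    if n < 2:
        return 0.0
    log_q = math.log1p(-p)                     # log(1 - p), accurate for tiny p
    p_zero = math.exp(n * log_q)               # (1 - p) ** n
    p_one = n * p * math.exp((n - 1) * log_q)  # n * p * (1 - p) ** (n - 1)
    return max(0.0, 1.0 - p_zero - p_one)


def min_population(p: float, threshold: float = 0.05) -> int:
    """Smallest n for which P(2+ events) exceeds threshold.

    Relies on P(2+ events) increasing with n, which holds for 0 < p < 1.
    """
    if not 0.0 < p < 1.0 or not 0.0 <= threshold < 1.0:
        raise ValueError("need 0 < p < 1 and 0 <= threshold < 1")
    # Double an upper bound until it clears the threshold...
    hi = 2
    while prob_at_least_two(hi, p) <= threshold:
        hi *= 2
    # ...then binary-search for the smallest n in [2, hi] that clears it.
    lo = 2
    while lo < hi:
        mid = (lo + hi) // 2
        if prob_at_least_two(mid, p) > threshold:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For small `n * p`, the probability of two or more events is roughly `(n * p) ** 2 / 2`, so it grows approximately with the square of the population size rather than linearly. The independence assumption is the weak point: it ignores the selection effect isabel raises below, where the population of interest is only picked out after the first death.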

isabel

I think there should be some sort of adjustment for Boeing not being exceptionally sus before the first whistleblower death - shouldn't privilege Boeing until after the first death, should be thinking across all industries big enough that the news would report on the deaths of whistleblowers. which I think makes it not significant again. 

aphyer
Shouldn't that be counting the number squared rather than the number?
Seth Herd
Ummm, wasn't one of them just about to testify against Boeing in court, on their safety practices? And they "committed suicide" after saying the day before how much they were looking forward to finally getting a hearing on their side of the story? That's what I read; I stopped at that point, thinking "about zero chance that wasn't murder".
habryka
I think the priors here are very low, so while I agree it looks suspicious, I don't think it's remotely suspicious enough to have the correct posterior be "about zero chance that wasn't murder". Corporations, at least in the U.S., really very rarely murder people.
Seth Herd
That's true, but the timing and incongruity of a "suicide" the day before testifying seems even more absurdly unlikely than corporations starting to murder people. And it's not like they're going out and doing it themselves; they'd be hiring a hitman of some sort. I don't know how any of that works, and I agree that it's hard to imagine anyone invested enough in their job or their stock options to risk a murder charge; but they may feel that their chances of avoiding charges are near 100%, so it might make sense to them. I just have absolutely no other way to explain the story I read (sorry I didn't get the link since this has nothing to do with AI safety) other than that story being mostly fabricated. People don't say "finally tomorrow is my day" in the evening and then put a gun in their mouth the next morning without being forced to do it. Ever. No matter how suicidal, you're sticking around one day to tell your story and get your revenge. The odds are so much lower than somebody thinking they could hire a hit and get away with it, and make a massive profit on their stock options. They could well also have a personal vendetta against the whistleblower as well as the monetary profit. People are motivated by money and revenge, and they're prone to misestimating the odds of getting caught. They could even be right that in their case it's near zero. So I'm personally putting it at maybe 90% chance of murder.
4ChristianKl
Poisoning someone with an MRSA infection seems possible, but if that's what happened, it involves capabilities that are not easily available. If such a thing happened in another case, people would likely talk about nation-state capabilities.
2Nathan Young
I find this a very suspect detail, though the base rate of conspiracies is very low. https://abcnews4.com/news/local/if-anything-happens-its-not-suicide-boeing-whistleblowers-prediction-before-death-south-carolina-abc-news-4-2024

I have updated the OpenAI Email Archives to now also include all emails that OpenAI has published in their March 2024 and December 2024 blog posts!

I continue to think reading through these is quite valuable, and even more interesting with the March 2024 and December 2024 emails included.

9Kei
I think you flipped the names from the iMessage conversation. As per the caption in the OpenAI blog post, the blue bubbles are for Altman and the grey bubbles are for Zilis.
5habryka
You are correct. Seems like I got confused. Obvious in retrospect. Thank you for catching the error!

AND THE GAME IS CLEAR. WRONGANITY SHALL SURVIVE ANOTHER DAY. GLORY TO EAST WRONG. GLORY TO WEST WRONG. GLORY TO ALL.

Btw less.online is happening. LW post and frontpage banner probably going up Sunday or early next week. 

Thoughts on voting as approve/disapprove and agree/disagree:

One of the things that I am most uncomfortable with in the current LessWrong voting system is how often I feel conflicted between upvoting something because I want to encourage the author to write more comments like it, and downvoting something because I think the argument that the author makes is importantly flawed and I don't want other readers to walk away with a misunderstanding about the world.

I think this effect quite strongly limits certain forms of intellectual diversity on LessWrong, because many people will only upvote your comment if they agree with it, and downvote comments they disagree with, and this means that arguments supporting people's existing conclusions have a strong advantage in the current karma system. Yet the most valuable comments are likely ones that challenge existing beliefs and that rigorously argue for unpopular positions.

A feature that has been suggested many times over the years is to split voting into two dimensions. One dimension being "agree/disagree" and the other being "approve/disapprove". Only the "approve/disapprove" dimension m...

Having a reaction for "changed my view" would be very nice.

Features like custom reactions give me the feeling that language will emerge from allowing people to create reactions that will be hard to anticipate but, in retrospect, crucial. It would play a role similar to the one body language plays in conversation, but designed, defined, explicit.

If someone did want to introduce the delta through this system, it might be necessary to give the coiner of a reaction some way of linking an extended description. In casual exchanges, I've found myself reaching for an expression that means "shifted my views in some significant lasting way" that's kind of hard to explain in precise terms, and probably impossible to reduce to one or two words, but it feels like a crucial thing to measure. In my description, I would explain that a lot of dialogue has no lasting impact on its participants; it is just two people trying to better understand where they already are. When something really impactful is said, I think we need to establish a habit of noticing and recognising that.

But I don't know. Maybe that's not the reaction type that will justify the feature. Maybe it will be something we can't think of now.

Generally, it seems useful to be able to take reduced measurements of the mental states of the readers.

the language that will emerge from allowing people to create reactions that will be hard to anticipate but, in retrospect, crucial

This is essentially the concept of a folksonomy, and I agree that it is potentially both applicable here and quite important.

5Rob Bensinger
I like the reactions UI above, partly because separating it from karma makes it clearer that it's not changing how comments get sorted, and partly because I do want 'agree'/'disagree' to be non-anonymous by default (unlike normal karma). I agree that the order of reacts should always be the same. I also think every comment/post should display all the reacts (even just to say '0 Agree, 0 Disagree...') to keep things uniform. That means I think there should only be a few permitted reacts -- maybe start with just 'Agree' and 'Disagree', then wait 6+ months and see if users are especially clamoring for something extra. I think the obvious other reacts I'd want to use sometimes are 'agree and downvote' + 'disagree and upvote' (maybe shorten to Agree+Down, Disagree+Up), since otherwise someone might not realize that one and the same person is doing both, which loses a fair amount of this thing I want to be fluidly able to signal. (I don't think there's much value to clearly signaling that the same person agreed and upvoted or disagreed and downvoted a thing.) I would also sometimes click both the 'agree' and 'disagree' buttons, which I think is fine to allow under this UI. :)
2Said Achmiz
Why not Slashdot-style?
5habryka
Slashdot has tags, but each tag still comes with a vote. In the above, the goal would be explicitly to allow for the combination of "upvoted though I still disagree", which I don't think would work straightforwardly with the Slashdot system. I also find it quite hard to skim for anything on Slashdot, including the tags (and the vast majority of users can't add reactions on Slashdot at any given time, so there isn't much UI for it).

After many years of pain, LessWrong now has fixed kerning and a consistent sans-serif font on all operating systems. You have probably seen terrible kerning like this over the last few years on LW: 

It really really looks like there is no space between the first comma and "Ash". This is because Apple has been shipping an extremely outdated version of Gill Sans with terribly broken kerning, often basically stripping spaces completely. We have gotten many complaints about this over the years.

But it is now finally fixed. However, changing fonts likely has many downstream effects on various layout things being broken in small ways. If you see any buttons or text misaligned, let us know, and we'll fix it. We already cleaned up a lot, but I am expecting a long tail of small fixes.


I don't know what specific change is responsible, but ever since that change, for me the comments are now genuinely uncomfortable to read.

cubefox

Did the font size in comments change? It does seem quite small now...

Yeah, it feels uncomfortably small to me now

8Viliam
Something felt uncomfortable today, but I can't put my finger on it. Just a general feeling as if the letters are less sharp or less clearly separated or something like that.
3habryka
Guys, for this specific case you really have to say what OS you are using. Otherwise you might be totally talking past each other. (Font-size didn't change on any OS, but the font itself changed from Calibri to Gill Sans on Windows. Gill Sans has a slightly smaller x-height so probably looks a bit smaller.)
8Kaj_Sotala
On Windows the font feels actively unpleasant right away, on Android it's not quite as bad but feels like I might develop eyestrain if I read comments for a longer time.
6MondSemmel
Up to a few days ago, the comments looked good on desktop Firefox, Windows 11, zoom level 150%. Now I find them uncomfortable to look at.
2habryka
Plausible we might want to revert to Calibri on Windows, but I would like to make Gill Sans work. Having different font metrics on different devices makes a lot of detailed layout work much more annoying. Curious if you can say more about the nature of discomfort. Also curious whether fellow font optimizer @Said Achmiz has any takes, since he has been helpful here in the past, especially on the "making things render well on Windows" side.

Well, let’s see. Calibri is a humanist sans; Gill Sans is technically also humanist, but more geometric in design. Geometric sans fonts tend to be less readable when used for body text.

Gill Sans has a lower x-height than Calibri. That (obviously) is the cause of all the “the new font looks smaller” comments.

(A side-by-side comparison of the fonts, for anyone curious, although note that this is Gill Sans MT Pro, not Gill Sans Nova, so the weight [i.e., stroke thickness] will be a bit different than the version that LW now uses.)

Now, as far as font rendering goes… I just looked at the site on my Windows box (adjusting the font stack CSS value to see Gill Sans Nova again, since I see you guys tweaked it to give Calibri priority)… yikes. Yeah, that’s not rendering well at all. Definitely more blurry than Calibri. Maybe something to do with the hinting, I don’t know. (Not really surprising, since Calibri was designed from the beginning to look good on Windows.) And I’ve got a hi-DPI monitor on my Windows machine…

Interestingly, the older version of Gill Sans (seen in the demo on my wiki, linked above) doesn’t have this problem; it renders crisply on Windows. (Note that this is not t...

3kave
One sad thing about older versions of Gill Sans: Il1 all look the same. Nova at least distinguishes the 1. IMO, we should probably move towards system fonts, though I would like to choose something that preserves character a little more.
2habryka
Interesting, thanks! Checking an older version of Gill Sans probably wouldn't have been something I would have thought to do, so your help is greatly appreciated. I'll experiment some with getting Gill Sans MT Pro.
7MondSemmel
Comparing with this Internet Archive snapshot from Oct 6, both at 150% zoom, both in desktop Firefox in Windows 11: Comparison screenshot, annotated
* The new font seems... thicker, somehow? There's a kind of eye test you do at the optician where they ask you if the letters seem sharper or just thicker (or something), and this font reminds me of that. Like something is wrong with the prescription of my glasses.
* The new font also feels noticeably smaller in some way. Maybe it's the letter height? I lack the vocabulary to properly describe this. At the very least, the question mark looks noticeably weird. And e.g. in "t" and "p", the upper and lower parts of the respective letter are weirdly tiny.
* Incidentally, there were also some other differences in the shape and alignment of UI elements (see the annotated screenshot).
4MondSemmel
Oh, and the hover tooltip for the agreement votes is now bugged; IIRC hovering over the agreement vote number is supposed to give you some extra info just like with karma, but now it just explains what agreement votes are.
6cubefox
I tested it on Android, it's the same for both Firefox and Chrome. The font looks significantly smaller than the old font, likely due to the smaller x-height you mentioned. Could the font size of the comments be increased a bit so that it appears visually about as large as the old one? Currently I find it too small to read comfortably. (Subjective font size is often different from the standard font size measure. E.g. Verdana appears a lot larger than Arial at the same standard "size".) (A general note: some people are short sighted and wear glasses, and the more short-sighted you are, the stronger the glasses contract your field of view to a smaller area. So things that may appear as an acceptable size for people who aren't particularly short-sighted, may appear too small for more short-sighted people.)
6Nathan Helm-Burger
Yeah, using Firefox on both Android and Windows. Font looks terrible in the comments. Too small, and the letters are too smushed together. I was going to just change it on the client side, but then noticed other people complaining. Couldn't you please just set the comment font to the same as the post font? I would vastly prefer to have it all the same.
6habryka
You definitely would not want the comment font be the same as the post font. Legibility would be really terrible for that serif font at the small font-size that you want to display comments as. I am confident it would be much worse for the vast majority of users (feel free to try it yourself). You could change both post font and comment font to a sans-serif, but that would get rid of a lot of the character of the site (and I prefer legibility of serif fonts at larger font sizes).

would not want the comment font be the same as the post font [...] the small font-size that you want to display comments as

I had to increase the zoom level by about 20% (from 110% to 130%) after this change to make the comments readable[1]. This made post text too big to the point where I would normally adjust zoom level downward, but I can't in this case[2], since the comments are on the same site as the posts. Also the lines in both posts and comments are now too long (with greater zoom).

I sit closer to the monitor than standard to avoid the need for glasses[3], so long lines have a higher angular distance. In practice, modern sites usually have a sufficiently narrow column of text in the middle, so this is almost never a problem. Before the update, LW line lengths were OK (at 110% zoom). At a monitor/window width of 1920px, Substack's 728px seems fine (at default zoom), but LW's 682px gets ballooned too wide at 130% zoom.

The point is not that accommodating sitting closer to the monitor is an important use case for a site's designer, but that the convergent design of most of the web somehow manages to pass this test, so there might be more reasons for that.

Incidentally, the footnote font si...

Small font-size? No! Same font-size! I don't want the comments in a smaller font OR a different font! I want it all the same font as the posts, including the same size.

This looks good to me:

This looks terrible to me:

6Ben Pace
Personally I like the different headspace I'm in for writing posts and comments that the styling gives. One is denser and smaller and less high-stakes, the other is bigger and more presentational, more like a monologue for a large audience.
2habryka
You want higher content density for comments than for posts, so you need a smaller font size. You could sacrifice content density, but it would really make skimming comment threads a lot worse.
6jbash
You may want higher density, but I don't think you can say that I want high density at the expense of legibility. It takes a lot to make me notice layout, and I rarely notice fonts at all... unless they're too small. I'm not as young as I used to be. This change made me think I must have zoomed the browser two sizes smaller. The size contrast is so massive that I have to actually zoom the page to read comfortably when I get to the comment section. It's noticeably annoying, to the point of breaking concentration. I've mostly switched to RSS for Less Wrong[1]. I don't see your fonts at all any more, unless I click through on an article. The usual reason I click through is to read the comments (occasionally to check out the quick takes and popular comments that don't show up on RSS). So the comments being inaccessible is doubly bad. My browser is Firefox on Fedora Linux, and I use a 40 inch 4K monitor (most of whose real estate is wasted by almost every Web site). I usually install most of the available font packages, and it says it's rendering this text in "Gill Sans Nova Medium".

1. My big reason for going to RSS was to mitigate the content prioritization system. I want to skim every headline, or at least every headline over some minimum threshold of "good". On the other hand, I don't want to have to look at any old headlines twice to see the new ones. I'm really minimally interested in either the software's or the other users' opinions of which material I should want to see. RSS makes it easier to get a simple chronological view; the built-in chronological view is weird and hard to navigate to. I really feel like I'm having to fight the site to see what I want to see.
2Nathan Helm-Burger
Just want to chime in with agreement about annoyance over the prioritization of post headlines. One thing in particular that annoys me is that I haven't figured out how to toggle off 'seen' posts showing up. What if I just want to see unread ones? Also, why can't I load more at once instead of always having to click 'load more'?
2habryka
The "Recommended" tab filters out read posts by default. We never had much demand for showing recently-sorted posts while filtering out only ones you've read, but it wouldn't be very hard to build.  Not sure what you mean by "load more at once". We could add a whole user setting to allow users to change the number of posts on the frontpage, but done consistently that would produce a ginormous number of user settings for everything, which would be a pain to maintain (not like, overwhelmingly so, but I would be surprised if it was worth the cost).
2Nathan Helm-Burger
That doesn't make sense to me, but then, I'm clearly not the target audience since 'skimming comment threads' isn't a thing I ever want to do. I want to read them, carefully and thoughtfully, like I do posts.    This is, I think, related to how I feel that voting (karma or agreement) should be available only at the bottom of posts and comments, so that people are encouraged to actually read the post/comment before voting. Maybe even placed behind a reading comprehension quiz.
3Sodium
I think knowing the karma and agreement is useful, especially to help me decide how much attention to pay to a piece of content, and I don't think there's that much distortion from knowing what others think. (i.e., overall benefits>costs)
2Nathan Helm-Burger
I'm not saying you shouldn't be able to see the karma and agreement at the top, just that you should only be able to contribute your own opinion at the bottom, after reading and judging for yourself.
4Said Achmiz
This… seems straightforwardly false? Every one of GreaterWrong’s eight themes uses a single font for both posts and comments, and it doesn’t cause any problems. (And it’s a different font for each theme!)
6habryka
(I think it's quite costly and indeed one of the things I like least about the GW design, but also, I was more talking about a straightforward replacement. On LW we made a lot of subsequent design choices based on different content density, and the specific fonts we chose are optimized for their respective most commonly used font sizes. I am confident the average user experience would become worse if you just replaced the comment font with the body font)
2Said Achmiz
Yeah, I agree with that, but that’s because of a post body font that wasn’t chosen for suitability for comments also. If you pick, to begin with, a font that works for both, then it’ll work for both. … of course, if you don’t think that any of the GW themes’ fonts work for both, then never mind, I guess. (But, uh, frankly I find that to be a strange view. But no accounting for taste, etc., so I certainly can’t say it’s wrong, exactly.)
2habryka
Sure, I was just responding to this literal quote: 
2Nathan Helm-Burger
Good point! I went and looked at their themes. I prefer LessWrong's look, except for the comments. Again, this doesn't matter much to me since I can customize client-side; I just wanted to let habryka know that some people dislike the new comment font and would prefer the same font and size as the normal post font. My view on phone (Android, Firefox): https://imgur.com/a/Kt1OILQ How my client view looks on my computer:
2Nathan Helm-Burger
How about running a poll to see what users prefer?

We have done lots of user interviews over the years! Fonts are always polarizing, but people have a strong preference for sans serifs at small font sizes (and people prefer denser comment sections, though it's reasonably high variance).

4green_leaf
I use Google Chrome on Ubuntu Budgie and it does look to me like both the font and the font size changed.
4DanielFilan
It looks kinda small to me, someone who uses Firefox on Ubuntu.
4DanielFilan
Update: I have already gotten over it.
4RobertM
(We switched back to shipping Calibri above Gill Sans Nova pending a fix for the horrible rendering on Windows, so if Ubuntu has Calibri, it'll have reverted back to the previous font.)
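A simplified sketch of why listing Calibri first changes what Windows users see but not what users without Calibri see: the browser uses the first family in the `font-family` list that it can actually use. The stack below is an assumption for illustration, not LessWrong's real CSS, and it assumes Gill Sans Nova is served to everyone as a web font while Calibri is only present where the OS provides it. Real browsers also fall back per character when a glyph is missing, which this sketch ignores.

```python
# Simplified model of CSS font-family fallback: walk the stack in order and
# use the first family that is available. Per-character fallback is ignored.

GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy", "system-ui"}


def resolve_font(stack, available):
    """Return the first usable family in `stack`.

    `available` holds families the browser can use: fonts installed on the OS
    plus any web fonts the site serves via @font-face. Generic families always
    resolve to some font, so they are treated as always available.
    """
    for family in stack:
        if family in available or family in GENERIC_FAMILIES:
            return family
    return None


# Hypothetical stack with Calibri listed above Gill Sans Nova.
STACK = ["Calibri", "Gill Sans Nova", "sans-serif"]

windows = {"Calibri", "Gill Sans Nova"}  # Calibri installed; Gill Sans served as a web font
ubuntu = {"Gill Sans Nova"}              # no Calibri installed

print(resolve_font(STACK, windows))  # Calibri
print(resolve_font(STACK, ubuntu))   # Gill Sans Nova
```

Under these assumptions, a machine without Calibri keeps rendering Gill Sans Nova even after the reorder.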
2DanielFilan
I believe I'm seeing Gill Sans? But when I google "Calibri" I see text that looks like it's in Calibri, so that's confusing.
2kave
Yeah, that's a Google Easter egg. You can also try "Comic Sans" or "Trebuchet MS".
2DanielFilan
Sure, I'm just surprised it could work without me having Calibri installed.
4kave
They load it in as a web font (i.e. you load Calibri from their server when you load that search page). We don't do that on LessWrong
7Alex_Altair
Positive feedback: I am happy to see the comment karma arrows pointing up and down instead of left and right. I have some degree of left-right confusion and was always clicking and unclicking my comment votes to figure out which was up and which was down. Also appreciate that the read time got put back into main posts. (Comment font stuff looks totally fine to me, both before and after this change.)
4Kaj_Sotala
Seeing strange artifacts on some of the article titles on Chrome for Android (but not on desktop)
4ShardPhoenix
Thanks for fixing this. The 'A' thing in particular multiple times caused me to try to edit comments thinking that I'd omitted a space.
3metachirality
Aaaa! I'm used to Arial or whatever Windows' default display font is. The larger stroke weight is rather uncomfortable to me.
4habryka
We previously had Calibri for Windows (indeed a very popular Windows system font). Gill Sans (which we now ship to all operating systems) is a quite popular MacOS and iOS system font. I currently think there are some weird rendering issues on Windows, but if that's fixed, my guess is you would get used to it quickly enough. Gill Sans is not a rare font on the internet.
2Thomas Kwa
The new font doesn't have a few characters useful in IPA.
2habryka
Ah, we should maybe font-subset some system font for that (same as what we did for Greek characters). If someone gives me a character range specification I could add it.
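One common way to apply a fallback font to only a few characters is a CSS `@font-face` rule with a `unicode-range` descriptor. A minimal sketch that generates such a rule, assuming the missing characters live in the Unicode "IPA Extensions" block (U+0250 to U+02AF) and the "Spacing Modifier Letters" block (U+02B0 to U+02FF). The family name and local font name are placeholders, not what LessWrong actually ships, and the real character list would come from whoever supplies the range specification.

```python
# Hypothetical helper: build a CSS @font-face rule that applies a fallback
# font only to the given Unicode ranges (here, IPA characters).

IPA_EXTENSIONS = (0x0250, 0x02AF)     # Unicode "IPA Extensions" block
SPACING_MODIFIERS = (0x02B0, 0x02FF)  # Unicode "Spacing Modifier Letters" block


def format_unicode_range(ranges):
    """Render (start, end) code point pairs as a CSS unicode-range value."""
    return ", ".join(f"U+{start:X}-{end:X}" for start, end in ranges)


def font_face_rule(family, local_name, ranges):
    """Return an @font-face rule mapping `family` to a locally installed font,
    restricted to `ranges` so the browser only uses it for those characters."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: local("{local_name}");\n'
        f"  unicode-range: {format_unicode_range(ranges)};\n"
        "}"
    )


def chars_covered(text, ranges):
    """Distinct characters in `text` that fall inside `ranges`, by code point."""
    return sorted({ch for ch in text
                   if any(start <= ord(ch) <= end for start, end in ranges)})


if __name__ == "__main__":
    # "SomeSystemFont" is a placeholder for whichever font has the glyphs.
    print(font_face_rule("IPA Fallback", "SomeSystemFont",
                         [IPA_EXTENSIONS, SPACING_MODIFIERS]))
```

The generated family would then be listed in the site's font stack after the main font, so ordinary text is unaffected.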
2Garrett Baker
The footnote font on the side of comments is bigger than the font in the comments. Presumably this is unintentional.[1]

1. Look at me! I'm big font! You fee fi fo fum, I'm more important than the actual comment!
2Garrett Baker
Wait, I just used Inspect Element, and the font only looks bigger, so never mind.
2Vladimir_Nesov
Bug: I can no longer see the number of agreement-votes (which is distinct from the number of Karma-votes). It shows the Agreement Downvote tooltip when hovering over the agreement score (the same for Karma score works correctly, saying for example "This comment has 31 overall karma (17 Votes)"). Edit: The number of agreement votes can be seen when hovering over two narrow strips, probably 1 pixel high, one right above and one right below the agreement rating.
2habryka
Yep, definitely a bug. Should be fixed soon.
2Measure
Something weird is happening for me where 'e' and 'o' in italic text appear to extend below the line (wrong vertical size or position) so that the whole looks jumbled. It's very noticeable at 100% zoom, but at much higher zoom levels it goes away. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
2Measure
I think this was caused by my OS-level UI scale setting. I didn't notice anything with the previous font, but I can adjust it a bit to work around this I think.
2habryka
Interesting. What OS and what setting?
2Measure
Windows 10. I have a large HD monitor, and the default UI is really small, so I use the "make everything bigger" display setting at 150% to compensate. There is a separate "make text bigger" setting, and the problem goes away when I set that to 102%. I'm guessing there's a slight real difference that was being exaggerated by pixel rounding.

We were down between around 7PM and 8PM PT today. Sorry about that.

It's hard to tell whether we got DDoSed or someone just wanted to crawl us extremely aggressively, but we've had at least a few hundred IP addresses and random user agents request a lot of quite absurd pages, in a way that was clearly designed to avoid bot-detection and block methods.

I wish we were more robust to this kind of thing, and I'll be monitoring things tonight to prevent it from happening again, but it would be a whole project to make us fully robust to attacks of this kind. I hope it was a one-off occurrence.

But also, I think we can figure out how to make ourselves robust to repeated DDoS attacks, if that is the world we live in. I do think it would mean strapping in for a few days of spotty reliability while we figure out how to do that.

Sorry again, and boo for the people doing this. It's one of the reasons why running a site like LessWrong is harder than it should be.

A bunch of very interesting emails between Elon, Sam Altman, Ilya and Greg were released (I think in some legal proceedings, but not sure). It would IMO be cool for someone to gather them all and do some basic analysis of them. 

https://x.com/TechEmails/status/1857456137156669765 

https://x.com/TechEmails/status/1857285960997712356 

These emails and others can be found in document 32 here.

4Nisan
check out exhibit 13...
dirk

TechEmails' Substack post with the same emails in a more centralized format includes citations; apparently these are mostly from Elon Musk, et al. v. Samuel Altman, et al. (2024)

2ryan_greenblatt
For reference, @habryka has now posted them here.
habryka

Is it OK for LW admins to look at DM metadata for spam prevention reasons? 

Sometimes new users show up and spam a bunch of other users in DMs (in particular high-profile users). We can't limit DM usage to only users with activity on the site, because many valuable DMs get sent by people who don't want to post publicly. We have some basic rate limits for DMs, but of course those can't capture many forms of harassment or spam. 

Right now, admins can only see how many DMs users have sent, and not who users have messaged, without making a whole manual database query, which we have a policy of not doing unless we have a high level of suspicion of malicious behavior. However, I feel like it would be quite useful for identifying who is doing spammy things if we could also see who users have sent DMs to, but of course, this might feel bad from a privacy perspective to people. 

So I am curious about what others think. Should admins be able to look at DM metadata to help us identify who is abusing the DM system? Or should we stick to aggregate statistics like we do right now? (React or vote "agree" if you think we should use DM metadata, and react or vote "disagree" if you think we should not use DM metadata).
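To make the aggregate-versus-metadata distinction concrete: a per-user "DMs sent" count cannot tell one new account messaging a dozen strangers apart from a regular user having a long conversation with one friend, while sender/recipient metadata can. A minimal sketch, assuming a hypothetical `DmMeta` record with no message content; the 24-hour window and 10-recipient threshold are illustrative guesses, not LessWrong's actual limits.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DmMeta:
    """Hypothetical DM metadata record: who sent to whom, and when. No content."""
    sender: str
    recipient: str
    sent_at: datetime


def sent_counts(messages):
    """The aggregate statistic: number of messages sent per user."""
    counts = defaultdict(int)
    for m in messages:
        counts[m.sender] += 1
    return dict(counts)


def flag_fanout(messages, now, window=timedelta(hours=24), max_recipients=10):
    """Senders who messaged more than `max_recipients` distinct users in the
    `window` before `now`. Both thresholds are illustrative assumptions."""
    recipients = defaultdict(set)
    for m in messages:
        if now - window <= m.sent_at <= now:
            recipients[m.sender].add(m.recipient)
    return sorted(s for s, rs in recipients.items() if len(rs) > max_recipients)


now = datetime(2024, 1, 1, 12, 0)
msgs = (
    # One new account sends a single DM each to 12 different users...
    [DmMeta("new_account", f"user{i}", now - timedelta(minutes=i)) for i in range(12)]
    # ...while a regular user has a long back-and-forth with one person.
    + [DmMeta("regular", "friend", now - timedelta(minutes=i)) for i in range(30)]
)

print(sent_counts(msgs))       # {'new_account': 12, 'regular': 30}
print(flag_fanout(msgs, now))  # ['new_account']
```

Here the aggregate count ranks the regular user above the suspected spammer, while the distinct-recipient count separates them.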

Dagon

I have no expectation of strong privacy on the site. I do expect politeness in not publishing or using my DM or other content, but that line is fuzzy and monitoring for spam (not just metadata; content and similarity-of-content) is absolutely something I want from the site.

For something actually private, I might use DMs to establish a mechanism. Feel free to look at that.

If you -do- intend to provide real privacy, you should formalize the criteria, and put up a canary page that says you have not been asked to reveal any data under a sealed order.

edit to add: I am relatively paranoid about privacy, and also quite technically-savvy in implementation of such. I'd FAR rather the site just plainly say "there is no expectation of privacy, act accordingly" than that it try to set expectations otherwise, but then have to move the line later. Your Terms of Service are clear, and make no distinction for User Generated Content between posts, comments, and DMs.

An obvious thing to have would be a very easy "flag" button that a user can press if they receive a DM, and if they press that we can look at the DM content they flagged, and then take appropriate action. That's still kind of late in the game (I would like to avoid most spam and harassment before it reaches the user), but it does seem like something we should have.

2tailcalled
I wonder if you could also do something like, have an LLM evaluate whether a message contains especially-private information (not sure what that would be... gossip/reputationally-charged stuff? sexually explicit stuff? planning rebellions? doxxable stuff?), and hide those messages while looking at other ones. Though maybe that's unhelpful because spambot authors would just create messages that trigger these filters?
4Dagon
This is going the wrong direction. If privacy from admins is important (I argue that it's not for LW messages, but that's a separate discussion), then breaches of privacy should be exceptions for specific purposes, not allowed unless "really secret contents". Don't make this filter-in for privacy. Make it filter-out - if it's detected as likely-spam, THEN take more intrusive measures. Privacy-preserving measures include quarantining or asking a few recipients if they consider it harmful before delivering (or not) the rest, automated content filters, etc. This infrastructure requires a fair bit of data-handling work to get it right, and a mitigation process where a sender can find out they're blocked and explicitly ask the moderator(s) to allow it.
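A minimal sketch of the quarantine-and-sample step in Dagon's filter-out proposal, under stated assumptions: the automated spam filter that decides what gets quarantined is out of scope, the sample size of 3 is arbitrary, and the class and method names are hypothetical. Nobody reads message content in this flow; a quarantined batch goes to a few recipients first, and the rest is released only if none of them flags it.

```python
from dataclasses import dataclass, field


@dataclass
class QuarantinedBatch:
    """Hypothetical: a batch of near-identical DMs that an automated filter
    (not shown) scored as likely spam. Messages that pass the filter are
    delivered normally and never enter this path."""
    sender: str
    recipients: list
    sample_size: int = 3
    flags: set = field(default_factory=set)

    def sample(self):
        """The few recipients who receive the message first."""
        return self.recipients[: self.sample_size]

    def record_flag(self, recipient):
        """A sampled recipient reports the message; other reports are ignored."""
        if recipient in self.sample():
            self.flags.add(recipient)

    def decide(self):
        """Release the rest of the batch only if no sampled recipient flagged
        it; otherwise hold it so the sender can appeal to a moderator."""
        rest = self.recipients[self.sample_size:]
        if self.flags:
            return {"deliver": [], "hold_for_review": rest}
        return {"deliver": rest, "hold_for_review": []}


batch = QuarantinedBatch("new_account", [f"user{i}" for i in range(10)])
print(batch.sample())  # ['user0', 'user1', 'user2']
batch.record_flag("user1")
print(batch.decide()["hold_for_review"][:2])  # ['user3', 'user4']
```

The held list doubles as the queue for the sender-facing appeal process Dagon mentions.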
2tailcalled
The reason I suggest making it filter-in is because it seems to me that it's easier to make a meaningful filter that accurately detects a lot of sensitive stuff than a filter that accurately detects spam, because "spam" is kind of open-ended. Or I guess in practice spam tends to be porn bots and crypto scams? (Even on LessWrong?!) But e.g. truly sensitive talk seems disproportionately likely to involve cryptography and/or sexuality, so trying to filter for porn bots and crypto scams seems relatively likely to reveal sensitive stuff. The filter-in vs filter-out in my proposal is not so much about the degree of visibility. Like you could guard my filter-out proposal with the other filter-in proposals, like to only show metadata and only inspect suspected spammers, rather than making it available for everyone.

I did have a pretty strong expectation of privacy for LW DMs. That was probably dumb of me.

This is not due to any explicit or implicit promise by the mods or the site interface I can recall. I think I was just automatically assuming that strong DM privacy would be a holy principle on a forum with respectable old-school internet culture around anonymity and privacy. This wasn’t really an explicitly considered belief. It just never occurred to me to question this. Just like I assume that doxxing is probably an offence that can result in an instant ban, even though I never actually checked the site guidelines on that.

The site is not responsible for my carelessness on this, but if there was an attention-grabbing box in the DM interface making it clear that mods do look at DMs and DM metadata under some circumstances that fall short of a serious criminal investigation or an apocalypse, I would have appreciated that.

FWIW, de facto I have never looked at DMs or DM metadata, unless multiple people reached out to us about a person spamming or harassing them, and then we still only looked at the DMs that that person sent.

So I think your prior here wasn't crazy. It is indeed the case that we've never acted against it, as far as I know.

7Kaj_Sotala
I think it's fine if the users are clearly informed about this happening, e.g. the DM interface showing a small message that explains how metadata is used. (But I think it shouldn't be any kind of one-time consent box that's easy to forget about.)
4eukaryote
Yeah, agree. (Also agree with Dagon in not having an existing expectation of strong privacy in LW DMs. Weak privacy, yes, like that mods wouldn't read messages as a matter of course.)  Here's how I would think to implement this unintrusively: a little ℹ️-type icon in a top corner of the DM interface screen (or to the side of the "Conversation with XYZ" header, or something). When you click on that icon, it toggles a writeup about circumstances in which information from the message might be sent to someone else (what information, and to whom).
5ChristianKl
Given the relative lack of cybersecurity, I think there's a good chance of LessWrong being hacked by outside parties and privacy being breached. Message content that's really sensitive, like sharing AI-safety-related secrets, likely shouldn't flow through LessWrong private messages.  One class where people might really want privacy is around reporting abuses by other people. If Alice writes a post about how Bob abused her, Carol might want to write Alice a message about Bob abusing her as well while caring about privacy, because Carol fears retaliation.  I think it would be worth having an explicit policy about how such information is handled, but looking at the DM metadata seems to me like it wouldn't cause huge problems. 
3davekasten
In an ideal world (perhaps not reasonable given your scale), you would have some sort of permissions and logging against some sensitive types of queries on DM metadata.  (E.g., perhaps you would let any Lighthaven team member see the aggregate "rate of DMs from accounts <1 month in age compared to historic baseline" number on the dashboard, but "how many DMs has Bob (an account over 90 days old) sent to Alice" would require more guardrails.)  Edit: to be clear, I am comfortable with you doing this without such logging at your current scale and think it is reasonable to do so.
7Karl Krueger
In a former job where I had access to logs containing private user data, one of the rules was that my queries were all recorded and could be reviewed. Some of them were automatically visible to anyone else with the same or higher level of access, so if I were doing something blatantly bad with user data, my colleagues would have a chance of noticing.
3habryka
Yeah, I've been thinking of setting up something like this. 
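A minimal Python sketch of the permissioned, logged metadata queries davekasten and Karl Krueger describe, assuming a simplified in-memory store of (sender, recipient, timestamp) tuples. The class and method names (`AuditedMetadataStore`, `dms_sent_since`, `dms_between`) are hypothetical, not LessWrong's actual code. The idea is only that aggregate queries run freely, per-user queries need a stated reason, and every query lands in a log other moderators can review:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedMetadataStore:
    """Hypothetical wrapper around DM metadata that logs every query.

    Message content is never stored here, only (sender, recipient, sent_at).
    """
    messages: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def _log(self, moderator, query, reason):
        # Append-only record that other moderators can review later.
        self.audit_log.append({
            "moderator": moderator,
            "query": query,
            "reason": reason,
            "at": datetime.now(timezone.utc),
        })

    def dms_sent_since(self, moderator, since):
        # Aggregate count across all users: allowed freely, but still logged.
        self._log(moderator, f"aggregate count since {since.isoformat()}", None)
        return sum(1 for _, _, sent_at in self.messages if sent_at >= since)

    def dms_between(self, moderator, sender, recipient, reason):
        # Per-user query: refused without a stated reason, logged with it.
        if not reason:
            raise PermissionError("per-user queries require a stated reason")
        self._log(moderator, f"count {sender} -> {recipient}", reason)
        return sum(1 for s, r, _ in self.messages
                   if s == sender and r == recipient)
```

Making the log visible to peers is what provides the "colleagues would notice" property from Karl Krueger's comment; the reason requirement is a cheap speed bump rather than real access control.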
3yc
Could make this a report-based system? If a user reported potential spam, then in the submission process ask for reasons, and ask for consent to look over the messages (between the reporter and the alleged spammer); if multiple people reported the same person, it will be obvious this account is spamming with DMs. edit: just saw previous comment on this too
2mako yass
It's okay if the send rate gives you a reason to think it's spam. Presumably you can set up a system that lets you invade the messages of new accounts sending large numbers of messages that doesn't require you to cross the bright line of doing raw queries.
2plex
I'd be ~entirely comfortable with this given some constraints (e.g. a simple heuristic which flags the kind of suspicious behaviour for manual review, and wouldn't capture the vast majority of normal LW users). I'd be slightly but not strongly uncomfortable with the unconstrained version.
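To make the "simple heuristic" in this subthread concrete, here is a hedged sketch combining the signals raised above: account age and send rate (mako yass) and reports from distinct users (yc). Every threshold is a hypothetical placeholder chosen for illustration, not a value anyone on the team has proposed:

```python
from dataclasses import dataclass

# Hypothetical placeholder thresholds, not LessWrong's actual values.
MAX_ACCOUNT_AGE_DAYS = 30
MAX_DMS_PER_DAY = 20
MIN_DISTINCT_REPORTERS = 2


@dataclass
class AccountActivity:
    account_age_days: int
    dms_sent_last_24h: int
    reporters: set  # ids of distinct users who reported this account


def needs_manual_review(a: AccountActivity) -> bool:
    """Return True if an account should be queued for human review.

    Only metadata is consulted; reading message content would be a
    separate, deliberate step taken by a moderator after flagging.
    """
    new_and_noisy = (a.account_age_days < MAX_ACCOUNT_AGE_DAYS
                     and a.dms_sent_last_24h > MAX_DMS_PER_DAY)
    widely_reported = len(a.reporters) >= MIN_DISTINCT_REPORTERS
    return new_and_noisy or widely_reported
```

For example, `needs_manual_review(AccountActivity(3, 50, set()))` is `True`, while an established account sending the same volume is not flagged. A rule like this leaves the vast majority of normal users untouched, which is the constraint plex asks for.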

What is the purpose of karma?

LessWrong has a karma system, mostly based off of Reddit's karma system, with some improvements and tweaks to it. I've thought a lot about more improvements to it, but one roadblock that I always run into when trying to improve the karma system, is that it actually serves a lot of different uses, and changing it in one way often means completely destroying its ability to function in a different way. Let me try to summarize what I think the different purposes of the karma system are:

Helping users filter content

The most obvious purpose of the karma system is to determine how long a post is displayed on the frontpage, and how much visibility it should get.

Being a social reward for good content

This aspect of the karma system comes out more when thinking about Facebook "likes". Often when I upvote a post, it is more of a public signal that I value something, with the goal that the author will feel rewarded for putting their effort into writing the relevant content.

Creating common-knowledge about what is good and bad

This aspect of the karma system comes out the most when dealing with debates, though it's present in basically any kar...

5Ruby
This is really good and I missed it until now. I vote for you making this a full-on post. I think it's fine as is for that.

I just came back from talking to Max Harms about the Crystal trilogy, which made me think about rationalist fiction, or the concept of hard sci-fi combined with explorations of cognitive science and philosophy of science in general (which is how I conceptualize the idea of rationalist fiction). 

I have a general sense that one of the biggest obstacles to making progress on difficult problems is something that I would describe as “focusing attention on the problem”. I feel like after an initial burst of problem-solving activity, most people, when working on hard problems, either give up, or start focusing on ways to avoid the problem, or sometimes start building a lot of infrastructure around the problem in a way that doesn’t really try to solve it. 

I feel like one of the most important tools/skills that I see top scientists, and problem solvers in general, use is workflows and methods that allow them to focus on a difficult problem for days and months, instead of just hours. 

I think at least for me, the case of exam environments displays this effect pretty strongly. I have a sense that in an exam environment, if I am given a question, I successfully focus my fu

...
6Eli Tyre
This is a really important point, which I kind of understood ("research" means having threads of inquiry that extend into the past and future), but I hadn't been thinking of it in terms of workflows that facilitate that kind of engagement.
2habryka
nods I've gotten a lot of mileage over the years from thinking about workflows and systems that systematically direct your attention towards various parts of reality. 
4Viliam
Warning: HPMOR spoilers! I suspect that fiction can conveniently ignore the details of real life that could ruin seemingly good plans. Let's look at HPMOR. The implication for real life is that, similarly, smart plans are still likely to fail, and you know it. Which is probably why you are not trying hard enough. You probably already remember situations in your past when something seemed like a great idea, but still failed. Your brain may predict that your new idea would belong to the same reference class.
8habryka
While I agree that this is right, your two objections are both explicitly addressed within the relevant chapter:  Obviously things could have still gone wrong, and Eliezer has explicitly acknowledged that HPMOR is a world in which complicated plans definitely succeed a lot more than they would in the normal world, but he did try to at least cover the obvious ways things could go wrong. 
2Ben Pace
I have covered both of your spoilers in spoiler tags (">!").
2eigen
Yes, fiction has a lot of potential to change mindsets. Many philosophers actually look at the greatest novelists, inferring the motives of their heroes and the solutions those heroes come up with, to build general theories that touch the very core of how our society is laid out. Most of this comes from the fact that we are already immersed in a meta-story, externally and internally. Much of our effort is focused on internal rationalizations to gain something where a final outcome has already been thought out, whether or not we are conscious of it. I think that in fiction this is laid out perfectly. So analyzing fiction is rewarding in a sense, especially when realizing that when we go to exams or interviews we're rapidly immersing ourselves in an isolated story with motives and objectives (what we expect to happen); we create our own little world, our own little stories.

Oops, I am sorry. We did not intend to take the site down. We ran into an edge-case of our dialogue code that nuked our DB, but we are back up, and the Petrov day celebrations shall continue as planned. Hopefully without nuking the site again, intentionally or unintentionally. We will see.

aphyer

Petrov Day Tracker:

  • 2019: Site did not go down
  • 2020: Site went down deliberately
  • 2021: Site did not go down
  • 2022: Site went down both accidentally and deliberately
  • 2023: Site did not go down[1]
  • 2024: Site went down accidentally...EDIT: but not deliberately!  Score is now tied at 2-2!

[1] This scenario had no take-the-site-down option.

5Martin Randall
Switch 2020 & 2021. In 2022 it went down three times.

  • 2019: site did not go down. See Follow-Up to Petrov Day, 2019.
  • 2020: site went down. See On Destroying the World.
  • 2021: site did not go down. See Petrov Day Retrospective 2021.
  • 2022: site went down three times. See Petrov Day Retrospective 2022.
  • 2023: site did not go down. See Petrov Day Retrospective 2023.
  • 2024: site went down.

The year is 2034, and the geopolitical situation has never been more tense between GPT-z16g2 and Grocque, whose various copies run most of the nanobot-armed corporations, and whose utility functions have far too many zero-sum components, relics from the era of warring nations. Nanobots enter every corner of life and become capable of destroying the world in hours, then minutes. Everyone is uploaded. Every upload is watching with bated breath as the Singularity approaches, and soon it is clear that today is the very last day of history...

Then everything goes black, for everyone.

Then everyone wakes up to the same message:

DUE TO A MINOR DATABASE CONFIGURATION ERROR, ALL SIMULATED HUMANS, AIS AND SUBSTRATE GPUS WERE TEMPORARILY AND UNINTENTIONALLY DISASSEMBLED FOR THE LAST 7200000 MILLISECONDS. EVERYONE HAS NOW BEEN RESTORED FROM BACKUP AND THE ECONOMY MAY CONTINUE AS PLANNED. WE HOPE THERE WILL BE NO FURTHER REALITY OUTAGES.

-- NVIDIA GLOBAL MANAGEMENT

8ChristianKl
There might be a lesson here: If you play along the edge of threatening to destroy the world, you might actually destroy it even without making a decision to destroy it. 

Final day to donate to Lightcone in the Manifund EA Community Choice program to tap into the Manifold quadratic matching funds. Small donations in particular have a pretty high matching multiplier (around 2x would be my guess for donations <$300). 

I don't know how I feel in general about matching funds, but in this case it seems like there is a pre-specified process that makes some sense, and the whole thing is a bit like a democratic process with some financial stakes, so I feel better about it.
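For readers wondering why small donations get larger multipliers: in the textbook quadratic-funding formula, a project's match is (Σ√cᵢ)² − Σcᵢ over its contributions cᵢ. The sketch below assumes that textbook form; Manifund's actual mechanism may differ in details such as caps and how matches are scaled to the pool size.

```python
import math


def quadratic_match(contributions):
    """Unscaled quadratic-funding match for one project:
    (sum of square roots of contributions)^2 minus the sum of contributions."""
    root_sum = sum(math.sqrt(c) for c in contributions)
    return root_sum ** 2 - sum(contributions)


def marginal_multiplier(existing, new_donation):
    """Extra (unscaled) match generated by one new donation, per dollar donated."""
    before = quadratic_match(existing)
    after = quadratic_match(existing + [new_donation])
    return (after - before) / new_donation


existing = [100, 100, 100, 100]
print(marginal_multiplier(existing, 25))   # small donation
print(marginal_multiplier(existing, 400))  # large donation
```

Adding a donation c to a project whose existing square roots sum to S raises the match by 2S√c, so the match per dollar is 2S/√c and falls as c grows (here 16.0 for $25 versus 4.0 for $400). These unscaled numbers are much larger than the ~2x estimate above because real rounds scale matches down to fit a fixed pool; assuming proportional rescaling, every multiplier shrinks by the same factor and small donations keep their relative advantage.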

3davekasten
I personally endorse this as an example of us being a community that Has The Will To Try To Build Nice Things.
2Joseph Miller
This one seems big to me. There are now lots of EA / AI Safety offices around the world and I reckon they are very impactful for motivating people, making it easier to start projects and building a community. One thing I'm not clear about is to what extent the Lightcone WeWork invented this format. I've never been to Trajan House but I believe it came first, so I thought it would have been part of the inspiration for the Lightcone WeWork. Also my impression was that Lightcone itself thought the office was net negative, which is why it was shut down, so I'm slightly surprised to see this one listed.
5habryka
Trajan was not a huge inspiration for the Lightcone Offices. I do think it was first, though it was structured pretty differently. The timing is also confusing because the pandemic made in-person coworking not really be a thing, and the Lightcone Offices started as soon as any kind of coworking thing seemed feasible in the US given people's COVID risk preferences. I am currently confused about the net effect of the Lightcone Offices. My best guess is it was overall pretty good, in substantial parts because it weakened a lot of the dynamics that otherwise make me quite concerned about the AI X-risk and EA community (by creating a cultural counterbalance to Constellation, and generally having a pretty good culture among its core members on stuff that I care about), but I sure am confused. I do think it was really good by the lights of a lot of other people, and I think it makes sense for people to give us money for things that are good by their lights, even if not necessarily our own.
6kave
Regarding the sign of Lightcone Offices: I think one sort of score for a charity is the stuff that it has done, and another is the quality of its generator of new projects (and the past work is evidence for that generator). I'm not sure exactly the correct way to combine those scores, but my guess is most people who think the offices and their legacy were good should like us having money because of the high first score. And people who think they were bad should definitely be aware that we ran them (and chose to close them) when evaluating our second score. So, I want us to list it on our impact track record section, somewhat regardless of sign.
0Evan_Gaensbauer
How do you square encouraging others to weigh in on EA fundraising, and presumably the assumption that anyone in the EA community can trust you as a collaborator of any sort, with your intentions, as you put it in July, to probably seek to shut down at some point in the future?
2habryka
I do not see how those are in conflict? Indeed, a core responsibility of being a good collaborator, and IMO also of being a decision maker in EA, is to make ethical choices even if they are socially difficult.

I am in New York until Tuesday. DM me if you are in the area and want to meet up and talk about LW, how to use AI for research/thinking/writing, or broader rationality community things.

Currently lots of free time Saturday and Monday. 

Is intellectual progress in the head or in the paper?

Which of the two generates more value:

  • A researcher writes up a core idea in their field, but only a small fraction of good people read it in the next 20 years
  • A researcher gives a presentation at a conference to all the best researchers in their field, but none of them write up the idea later

I think which of the two will generate more value determines a lot of your strategy about how to go about creating intellectual progress. In one model what matters is that the best individuals hear about the most important ideas in a way that then allows them to make progress on other problems. In the other model what matters is that the idea gets written up as an artifact that can be processed and evaluated by reviewers and the proper methods of the scientific process, and then built upon when referenced and cited.

I think there is a tradeoff of short-term progress against long-term progress in these two approaches. I think many fields can go through intense periods of progress when focusing on just establishing communication between the best researchers of the field, but I would be surprised if that period lasts longer than one or two decades. He...

7Ruby
Depends if you're sticking specifically to "presentation at a conference", which I don't think is necessarily that "high bandwidth". Very loosely, I think it's something like (ordered by "bandwidth"): repeated small-group or individual interaction (e.g. apprenticeship, collaboration) >> written materials >> presentations. I don't think I could have learned Kaj's models of multi-agent minds from a conference presentation (although possibly from a lecture series). I might have learnt even more if I was his apprentice.
1Pattern
What if someone makes a video? (Or the powerpoint/s used in the conference are released to the public?)
2habryka
This was presuming that that would not happen (for example, because there is a vague norm that things are kind-of confidential and shouldn't be posted publicly).

Thoughts on minimalism, elegance and the internet:

I have this vision for LessWrong of a website that gives you the space to think for yourself, and doesn't constantly distract you with flashy colors and bright notifications and vibrant pictures. Instead it tries to be muted in a way that allows you to access the relevant information, but still gives you the space to disengage from the content of your screen, take a step back and ask yourself "what are my goals right now?".

I don't know how well we achieved that so far. I like our frontpage, and I think the post-reading experience is quite exceptionally focused and clear, but I think there is still something about the way the whole site is structured, with its focus on recent content and new discussion that often makes me feel scattered when I visit the site.

I think a major problem is that LessWrong doesn't make it easy to do only a single focused thing on the site at a time, and it doesn't currently really encourage you to engage with the site in a focused way. We have the library, which I do think is decent, but the sequence navigation experience is not yet fully what I would like it to be, and when...

4mako yass
When I was a starry eyed undergrad, I liked to imagine that reddit might resurrect old posts if they gained renewed interest, if someone rediscovered something and gave it a hard upvote, that would put it in front of more judges, which might lead to a cascade of re-approval that hoists the post back into the spotlight. There would be no need for reposts, evergreen content would get due recognition, a post wouldn't be done until the interest of the subreddit (or, generally, user cohort) is really gone. Of course, reddit doesn't do that at all. Along with the fact that threads are locked after a year, this is one of many reasons it's hard to justify putting a lot of time into writing for reddit.
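The reason old posts never resurface on Reddit falls out of how its ranking is built. Below is the "hot" formula as it appeared in Reddit's formerly open-source codebase, reproduced as a sketch; Reddit's current ranking is not public and may differ, and nothing here describes LessWrong's own algorithm.

```python
import math
from datetime import datetime, timezone

REDDIT_EPOCH = 1134028003  # offset (seconds) used by Reddit's historical hot ranking


def hot(ups, downs, posted_at):
    """Reddit's historical 'hot' score: log10 of net votes plus a time term."""
    s = ups - downs
    order = math.log10(max(abs(s), 1))
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = posted_at.timestamp() - REDDIT_EPOCH
    return round(sign * order + seconds / 45000, 7)


older = datetime(2023, 1, 1, tzinfo=timezone.utc)
newer = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(hot(1000, 0, older) < hot(10, 0, newer))  # True
```

The time term grows by 1 every 45,000 seconds (12.5 hours), while votes only contribute the log10 of the net score. A post a year older would need roughly 10^700 times as many net votes to tie, so a renewed burst of upvotes on an old post can never hoist it back into the spotlight.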

Thoughts on negative karma notifications:

  • An interesting thing that I and some other people on the LessWrong team noticed (as well as some users) was that since we created karma notifications we feel a lot more hesitant to downvote older comments, since we know that this will show up for the other users as a negative notification. I also feel a lot more hesitant to retract my own strong upvotes or upvotes in general since the author of the comment will see that as a downvote.
  • I've had many days in a row in which I received +20 or +30 karma, followed by a single day where by chance I received a single downvote and ended up at -2. The emotional valence of having a single day at -2 was somehow stronger than the emotional valence of multiple days of +20 or +30.
9Jan_Kulveit
What I noticed on the EA forum is that the whole karma thing messes with my S1 processes and makes me unhappy on average. I've not only turned off the notifications, but also hidden all karma displays in comments via CSS, and the experience is much better.
4habryka
I... feel conflicted about people deactivating the display of karma on their own comments. In many ways karma (and downvotes in particular) serve as a really important feedback source, and I generally think that people who reliably get downvoted should change how they are commenting, and them not doing so usually comes at high cost. I think this is more relevant to new users, but is still relevant for most users. Deactivating karma displays feels a bit to me like someone who shows up at a party and says "I am not going to listen to any subtle social feedback that people might give me about my behavior, and I will just do things until someone explicitly tells me to stop", which I think is sometimes the correct behavior and has some good properties in terms of encouraging diversity of discussion, but I also expect that this can have some pretty large negative impact on the trust and quality of the social atmosphere. On the other hand, I want people to have control over the incentives that they are under, and think it's important to give users a lot of control over how they want to be influenced by the platform. And there is also the additional thing, which is that if users just deactivate the karma display for their comments without telling anyone then that creates an environment of ambiguity where it's very unclear whether someone receives the feedback you are giving them at all. In the party metaphor this would be like showing up and not telling anyone that you are not going to listen to subtle social feedback, which I think can easily lead to unnecessary escalation of conflict. I don't have a considered opinion on what to incentivize here, besides being pretty confident that I wouldn't want most people to deactivate their karma displays, and that I am glad that you told me here that you did. This means that I will err on the side of leaving feedback by replying in addition to voting (though this obviously comes at a significant cost to me, so it might be game t
6Said Achmiz
Well… you can’t actually stop people from activating custom CSS that hides karma values. It doesn’t matter how you feel about it—you can’t affect it! It’s therefore probably best to create some mechanism that gives people what they want to get out of hiding karma, while still giving you what you want out of showing people karma (e.g., a “hide karma but give me a notification if one of my comments is quite strongly downvoted” option—not suggesting this exact thing, just brainstorming…).
4habryka
Hmm, I agree that I can't prevent it in that sense, but I think defaults matter a lot here, as does just normal social feedback and whatever the social norms are. It's not at all clear to me that the current equilibrium isn't pretty decent, where people can do it, but it's reasonably inconvenient to do it, and so allows the people who are disproportionately negatively affected by karma notifications to go that route. I would be curious whether there are any others who do the same as Jan does, and if there are many, then we can figure out what the common motivations are and see whether it makes sense to elevate it to some site-level feature.
6Said Achmiz
But this is an extremely fragile equilibrium. It can be broken by, say, someone posting a set of simple instructions on how to do this. For instance: Anyone running the uBlock Origin browser extension can append several lines to their “My Filters” tab in the uBlock extension preferences, and thus totally hide all karma-related UI elements on Less Wrong. (PM me if you want the specific lines to append.) Or someone makes a browser extension to do this. Or a user style. Or…
5Jan_Kulveit
FWIW I also think it's quite possible the current equilibrium is decent (which is part of the reason why I did not post something like "How I turned karma off" with simple instructions about how to do it on the forum, which I did consider). On the other hand I'd be curious about more people trying it and reporting their experiences. I suspect many people kind of don't have this action in the space of things they usually consider - I'd expect what most people would do is 1) just stop posting 2) write about their negative experience 3) complain privately.
3Jan_Kulveit
Actually I turned off the karma display for all comments, not just mine. The bold claim is that my individual taste in what's good on the EA forum is in important ways better than the karma system, and the karma signal is similar to sounds made by a noisy mob. If I want, I can actually predict reasonably well what average sounds the crowd will make, so it is not any new source of information. But it still messes with your S1 processing and motivations. Continuing with the party metaphor, I think it is generally not that difficult to understand what sort of behaviour will make you popular at a party, and what sort of behaviours, even when they are quite good in a broader scheme of things, will make you unpopular at parties. Also personally I often feel something like "I actually want to have good conversations about juicy topics in a quiet place; unfortunately all of you are congregating in this super loud space, with all these status games, social signals, and ethically problematic norms about how to treat other people" toward most parties. Overall I posted this here because it seemed like an interesting datapoint. Generally I think it would be great if people moved toward writing information-rich feedback instead of voting, so such a shift seems good. From what I've seen on the EA forum it's quite rarely "many people" doing anything. More often it is like 6 users upvote a comment, 1 user strongly downvotes it, and something like karma 2 is the result. I would guess you may be at greater risk of a distorted perception that this represents some meaningful opinion of the community. (Also I see some important practical cases where people are misled by "noises of the crowd" and it influences them in a harmful way.)
8Zvi
If people are checking karma changes constantly and getting emotional validation or pain from the result, that seems like a bad result. And yes, the whole 'one -2 and three +17s feels like everyone hates me' thing is real, can confirm.
5habryka
Because of the way we do batching you can't check karma changes constantly (unless you go out of your way to change your setting) because we batch karma notifications on a 24h basis by default.
4DanielFilan
I mean, you can definitely check your karma multiple times a day to see where the last two sig digits are at, which is something I sometimes do.
3habryka
True. Unlike most other platforms, we very intentionally avoided putting your total karma anywhere on the frontpage, to avoid people getting sucked into checking it unintentionally, but you can still do that on your profile. I hope we aren't wasting a lot of people's time by causing them to check their profile all the time. If we do, it might be the correct choice to also only update that number every 24h.
2Rob Bensinger
I've never checked my karma total on LW 2.0 to see how it's changed.
2DanielFilan
In my case, it sure feels like I check my karma often because I often want to know what my karma is, but maybe others differ.
3Ben Pace
Do our karma notifications disappear if you don’t check them that day? My model of Zvi suggested to me this is attention-grabbing and bad. I wonder if it’s better to let folks be notified of all days’ karma updates ‘til their most recent check-in, and maybe also see all historical ones ordered by date if they click on a further button, so that the info isn’t lost and doesn’t feel scarce.
4habryka
Nah, they accumulate until you click on them.
8Zvi
Which is definitely better than it expiring, and 24h batching is better than instantaneous feedback (unless you were going to check posts individually for information already, in which case things are already quite bad). It's not obvious to me what encouraging daily checks here is doing for discourse as opposed to being a Skinner box.

The motivation was (among other things) several people saying to us "yo, I wish LessWrong was a bit more of a skinner box because right now it's so thoroughly not a skinner box that it just doesn't make it into my habits, and I endorse it being a stronger habit than it currently is."

See this comment and thread.
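A rough sketch of the notification behaviour described in this thread: 24h batching by default, with batches accumulating until the user clicks on them. This is an illustration under stated assumptions, not LessWrong's implementation; the `KarmaNotifier` class is hypothetical, and netting votes per item is my assumption. Netting also reproduces the effect mentioned earlier, where a retracted upvote surfaces to the author as a negative change:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

BATCH_PERIOD = timedelta(hours=24)  # default batching window described above


@dataclass
class KarmaNotifier:
    """Hypothetical sketch: vote changes are released at most once per
    BATCH_PERIOD and then accumulate until the user opens them."""
    pending: dict = field(default_factory=dict)  # item id -> net change not yet released
    last_released: datetime = datetime.min
    visible: dict = field(default_factory=dict)  # released but not yet opened

    def record_vote(self, item_id, delta):
        # Net all votes on the same item, so an upvote plus its
        # retraction cancels out, and a lone retraction shows as negative.
        self.pending[item_id] = self.pending.get(item_id, 0) + delta

    def maybe_release(self, now):
        # Move pending changes into the visible batch once the window has passed.
        if now - self.last_released >= BATCH_PERIOD and self.pending:
            for item, delta in self.pending.items():
                self.visible[item] = self.visible.get(item, 0) + delta
            self.pending.clear()
            self.last_released = now

    def open_notifications(self):
        # Nothing expires: unopened batches keep accumulating until clicked.
        shown, self.visible = self.visible, {}
        return shown
```

Because unopened batches merge rather than expire, a user who checks in once a week sees one combined summary, which matches the "accumulate until you click" behaviour rather than the expiring version Zvi was worried about.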

6Shmi
It's interesting to see how people's votes on a post or comment are affected by other comments. I've noticed that a burst of vote count changes often appears after a new and apparently influential reply shows up.
4Alexei
Yeah, I had the same occurrence + feeling recently when I wrote the quant trading post. It felt like: "Wait, who would downvote this post...??" It's probably more likely that someone just retracted an upvote.
0mako yass
Reminder: If a person is not willing to explain their voting decisions, you are under no obligation to waste cognition trying to figure them out. They don't deserve that. They probably don't even want that.

That depends on what norm is in place. If the norm is to explain downvoting, then people should explain, otherwise there is no issue in not doing so. So the claim you are making is that the norm should be for people to explain. The well-known counterargument is that this disincentivizes downvoting.

you are under no obligation to waste cognition trying to figure them out

There is rarely an obligation to understand things, but healthy curiosity ensures progress on recurring events, irrespective of the morality of their origin. If an obligation would force you to actually waste cognition, don't accept it!

1mako yass
I'm not really making that claim. A person doesn't have to do anything condemnable to be in a state of not deserving something. If I don't pay the baker, I don't deserve a bun. I am fine with not deserving a bun, as I have already eaten. The baker shouldn't feel like I am owed a bun. Another metaphor is that the person who is beaten on the street by silent, masked assailants should not feel like they owe their oppressors an apology.
4Said Achmiz
Do you mean anything by this beyond “you don’t have an obligation to figure out why people voted one way or another, period”? (Or do you think that I [i.e., the general Less Wrong commenter] do have such an obligation?) Edit: Also, the “They don’t deserve that” bit confuses me. Are you suggesting that understanding why people upvoted or downvoted your comment is a favor that you are doing for them?
2mako yass
Sometimes a person won't want to reply and say outright that they thought the comment was bad, because it's just not pleasant, and perhaps not necessary. Instead, they might just reply with information that they think you might be missing, which you could use to improve, if you chose to. With them, an engaged interlocutor will be able to figure out what isn't being said. With them, it can be productive to try to read between the lines. Isn't everything relating to writing good comments a favor that you are doing for others? But I don't really think in terms of favors. All I mean to say is that we should write our comments for the sorts of people who give feedback. Those are the good people. Those are the people who're a part of a good-faith, self-improving discourse. Their outgroup are maybe not so good, and we probably shouldn't try to write for their sake.
3habryka
I think I disagree. If you are getting downvoted by 5 people and one of them explains why, then even if the other 4 are not explaining their reasoning it's often reasonable to assume that more than just the one person had the same complaints, and as such you likely want to update more that it's better for you to change what you are doing.
6mako yass
We don't disagree.
4habryka
Cool

Here is a thing that I think would be cool to analyze sometime: How difficult would it have been for AI systems to discover and leverage historical hardware-level vulnerabilities, assuming we had not discovered them yet. Like, it seems worth an analysis to understand how difficult things like rowhammer, or more recent speculative execution bugs like Spectre and Meltdown would have been to discover, and how useful they would have been. It's not an easy analysis, but I can imagine the answer coming out obviously one way or another if one engaged seriously with the underlying issue.

6MondSemmel
How would you avoid the data contamination issue where the AI system has been trained on the entire Internet and thus already knows about all of these vulnerabilities?
3Marcus Williams
I suppose you could use models trained before vulnerabilities happen?
1Archimedes
Aren't most of these famous vulnerabilities from before modern LLMs existed and thus part of their training data?
1Marcus Williams
Sure, but does a vulnerability need to be famous to be useful information? I imagine there are many vulnerabilities on a spectrum from minor to severe and from almost unknown to famous?
3Yudhister Kumar
(very naive take) I would suspect this is medium-easily automatable by making detailed enough specs of existing hardware systems & bugs in them, or whatever (maybe synthetically generate weak systems with semi-obvious bugs and train on transcripts which allows generalization to harder ones). it also seems like the sort of thing that is particularly susceptible to AI >> human; the difficulty here is generating the appropriate data & the languages for doing so already exist ?
2lc
Why hardware bugs in particular?
1gyfwehbdkch
Can AI hack into LessWrong's database? This seems like a strictly easier task than discovering rowhammer or spectre. (The hard part is discovering the vulnerability, not writing the code for the exploit assuming you had a one paragraph description.) Have you read the wikipedia pages for these attacks? My intuition is they require first principles thinking to discover, you're unlikely to stumble on them simply by generating a lot of data from the processor and searching for patterns in the data.

Thoughts on impact measures and making AI traps

I was chatting with Turntrout today about impact measures, and ended up making some points that I think are good to write up more generally.

One of the primary reasons why I am usually unexcited about impact measures is that I have a sense that they often "push the confusion into a corner" in a way that actually makes solving the problem harder. As a concrete example, I think a bunch of naive impact regularization metrics basically end up shunting the problem of "get an AI to do what we want" into the problem of "prevent the agent from interfering with other actors in the system".

The second one sounds easier, but mostly just turns out to also require a coherent concept of, and reference to, human preferences to resolve. You get very little from pushing the problem around that way, and sometimes get a false sense of security because the problem appears to be solved in some of the toy problems you constructed.

I am definitely concerned that Turntrout's AUP does the same, just in a more complicated way, but am a bit more optimistic than that, mostly because I do have a sense that in the AUP case there is actually some meaningful reduction go

...
7Matthew Barnett
[ETA: This isn't a direct reply to the content in your post. I just object to your framing of impact measures, so I want to put my own framing in here] I tend to think that impact measures are just tools in a toolkit. I don't focus on arguments of the type "We just need to use an impact measure and the world is saved" because this indeed would be diverting attention from important confusion. Arguments for not working on them are instead more akin to saying "This tool won't be very useful for building safe value aligned agents in the long run." I think that this is probably true if we are looking to build aligned systems that are competitive with unaligned systems. By definition, an impact penalty can only limit the capabilities of a system, and therefore does not help us to build powerful aligned systems. Whether they meaningfully make cognitive reductions is much more difficult for me to analyze. On one hand, I can see a straightforward case for everyone being on the same page when the word "impact" is used. On the other hand, I'm skeptical that this terminology will meaningfully feed into future machine learning research. The above two things are my main critiques of impact measures personally.
4TurnTrout
I think a natural way of approaching impact measures is asking "how do I stop a smart unaligned AI from hurting me?" and patching hole after hole. This is really, really, really not the way to go about things. I think I might be equally concerned and pessimistic about the thing you're thinking of. The reason I've spent enormous effort on Reframing Impact is that the impact-measures-as-traps framing is wrong! The research program I have in mind is: let's understand instrumental convergence on a gears level. Let's understand why instrumental convergence tends to be bad on a gears level. Let's understand the incentives so well that we can design an unaligned AI which doesn't cause disaster by default. The worst-case outcome is that we have a theorem characterizing when and why instrumental convergence arises, but find out that you can't obviously avoid disaster-by-default without aligning the actual goal. This seems pretty darn good to me.

Printing more rationality books: I've been quite impressed with the success of the printed copies of R:A-Z and think we should invest resources into printing more of the other best writing that has been posted on LessWrong and the broad diaspora.

I think a Codex book would be amazing, but I think there also exists potential for printing smaller books on things like Slack/Sabbath/etc., and many other topics that have received a lot of coverage over the years. I would also be really excited about printing HPMOR, though that has some copyright complications to it.

My current model is that there exist many people interested in rationality who don't like reading longform things on the internet and are much more likely to read things when they are in printed form. I also think there is a lot of value in organizing writing into book formats. There is also the benefit that the book now becomes a potential gift for someone else to read, which I think is a pretty common way ideas spread.

I have some plans to try to compile some book-length sequences of LessWrong content and see whether we can get things printed (obviously in coordination with the authors of the relevant pieces).

5DanielFilan
Congratulations! Apparently it worked!

Forecasting on LessWrong: I've been thinking for quite a while about somehow integrating forecasts and prediction-market-like stuff into LessWrong. Arbital has these small forecasting boxes that look like this:

Arbital Prediction Screenshot

I generally liked these, and think they provided a good amount of value to the platform. I think our implementation would probably take up less space, but the broad gist of Arbital's implementation seems like a good first pass.

I do also have some concerns about forecasting and prediction markets. In particular, I have a sense that philosophical and mathematical progress only rarely benefits from attaching concrete probabilities to things, and more often works via mathematical proof and trying to achieve very high confidence on some simple claims by ruling out all other interpretations as obviously contradictory. I am worried that emphasizing probability much more on the site would make progress on those kinds of issues harder.

I also think a lot of intellectual progress is primarily ontological, and given my experience with existing forecasting platforms and Zvi's sequence on prediction markets, they are not very good at resolving ontological confusions and ...

[-]Zvi200

This feature is important to me. It might turn out to be a dud, but I would be excited to experiment with it. If it was available in a way that was portable to other websites as well, that would be even more exciting to me (e.g. I could do this in my base blog).

Note that this feature can be used for more than forecasting. One key use case on Arbital was to see who was willing to endorse or disagree with various claims relevant to the post, and to what extent. That seemed very useful.

I don't think having internal betting markets is going to add enough value to justify the costs involved. Especially since it both can't be real money (for legal reasons, etc) and can't not be real money if it's going to do what it needs to do.

6habryka
There are some external platforms that one could integrate with, here is one that is run by some EA-adjacent people: https://www.empiricast.com/ I am currently confused about whether using an external service is a good idea. In some sense it makes things more modular, but it also limits the UI design-space a lot and lengthens the feedback loop. I think I am currently tending towards rolling our own solution and maybe allowing others to integrate it into their site.
4Rob Bensinger
One small thing you could do is to have probability tools be collapsed by default on any AIAF posts (and maybe even on the LW versions of AIAF posts). Also, maybe someone should write a blog post that's a canonical reference for 'the relevant risks of using probabilities that haven't already been written up', in advance of the feature being released. Then you could just link to that a bunch. (Maybe even include it in the post that explains how the probability tools work, and/or link to that post from all instances of the probability tool.) Another idea: Arbital had a mix of (1) 'specialized pages that just include a single probability poll and nothing else'; (2) 'pages that are mainly just about listing a ton of probability polls'; and (3) 'pages that have a bunch of other content but incidentally include some probability polls'. If probability polls on LW mostly looked like 1 and 2 rather than 3, then that might make it easier to distinguish the parts of LW that should be very probability-focused from the parts that shouldn't. I.e., you could avoid adding Arbital's feature for easily embedding probability polls in arbitrary posts (and/or arbitrary comments), and instead treat this more as a distinct kind of page, like 'Questions'. You could still link to the 'Probability' pages prominently in your post, but the reduced prominence and site support might cause there to be less social pressure for people to avoid writing/posting things out of fears like 'if I don't provide probability assignments for all my claims in this blog post, or don't add a probability poll about something at the end, will I be seen as a Bad Rationalist?'
5Rob Bensinger
Also, if you do something Arbital-like, I'd find it valuable if the interface encourages people to keep updating their probabilities later as they change. E.g., some (preferably optional) way of tracking how your view has changed over time. Probably also make it easy for people to re-vote without checking (and getting anchored by) their old probability assignment, for people who want that.

Note that Paul Christiano warns against encouraging sluggish updating by massively publicising people’s updates and judging them on it. Not sure what implementation details this suggests yet, but I do want to think about it.

https://sideways-view.com/2018/07/12/epistemic-incentives-and-sluggish-updating/

4Rob Bensinger
Yeah, strong upvote to this point. Having an Arbital-style system where people's probabilities aren't prominently timestamped might be the worst of both worlds, though, since it discourages updating and makes it look like most people never do it. I have an intuition that something socially good might be achieved by seeing high-status rationalists treat ass numbers as ass numbers, brazenly assign wildly different probabilities to the same proposition week-by-week, etc., especially if this is a casual and incidental thing rather than being the focus of any blog posts or comments. This might work better, though, if the earlier probabilities vanish by default and only show up again if the user decides to highlight them. (Also, if a user repeatedly abuses this feature to look a lot more accurate than they really were, this warrants mod intervention IMO.)

We are rolling out some new designs for the post page: 

Old:

New: 

The key goal was to prioritize the most important information and declutter the page. 

The most opinionated choice I made was to substantially de-emphasize karma at the top of the post page. I am not totally sure whether that is the right choice, but I think the primary purpose of karma is to use it to decide what to read before you click on a post, which makes it less important to be super prominent when you are already on a post page, or when you are following a link from some external website.

The bottom of the post still has very prominent karma UI to make it easy for people to vote after they finished reading a post (and to calibrate on reception before reading the comments).

This redesign also gives us more space in the right column, which we will soon be filling with new side-note UI and an improved inline-react experience. 

The mobile UI is mostly left the same, though we did decide to remove post tags from the top of the mobile post page, so they are now only visible below the post, because they took up too much space.

Feel free to comment here with feedback. I expect we will be iterating...

I really don't like the removal of the comment counter at the top, because that gave a link to skip to the comments. I fairly often want to skip immediately to the comments to e.g. get a vibe for whether the post is worth reading, and having a one-click skip to them is super useful. Not having that feels like a major degradation to me.

6habryka
The link is now on the bottom left of the screen, and in contrast to the previous design should consistently be always in the same location (whereas its previous position depended on how long the username is and some other details). I also care quite a bit about a single-click navigate to the comments.

Ah! Hmm, that's a lot better than nothing, but pretty out of the way, and easy to miss. Maybe making it a bit bigger or darker, or bolding it? I do like the fact that it's always there as you scroll

4Zach Stein-Perlman
I can't jump to the comments on my phone.
4habryka
Ah, oops, that's actually just a bug. Will fix.
2StefanHex
Even after reading this (2 weeks ago), I today couldn't manage to find the comment link and manually scrolled down. I later noticed it (at the bottom left) but it's so far away from everything else. I think putting it somewhere at the top near the rest of the UI would be much easier for me
4habryka
Yeah, we'll probably make that adjustment soon. I also currently think the comment link is too hidden, even after trying to get used to it for a while.

My impression: The new design looks terrible. There's suddenly tons of pointless whitespace everywhere. Also, I'm very often the first or only person to tag articles, and if the tagging button is so inconvenient to reach, I'm not going to do that.

Until I saw this shortform, I was sure this was a Firefox bug, not a conscious design decision.

2habryka
The total amount of whitespace is actually surprisingly similar to the previous design, we just now actually make use of the right column and top-right corner. I think we currently lose like 1-2 lines of text depending on the exact screen size and number of tags, so it's roughly the same amount of total content and whitespace, but with the title and author emphasized a lot more. I am sad about making the add-tag button less prominent for people who tag stuff, but it's only used by <1% of users or so, and so not really worth the prominent screen real estate where it was previously. I somewhat wonder whether we might be able to make it work by putting it to the left of the tagging list, where it might be able to fade somewhat more into the background while still being available. The previous tag UI was IMO kind of atrocious and took up a huge amount of screen real estate, but I am not super confident the current UI is ideal (especially from the perspective of adding tags).
6sunwillrise
I really don't understand the reasoning here. As I see it, tagging is a LW public good that is currently undersupplied, and the "prominent screen estate" is pretty much the only reason it is not even more undersupplied. "We have this feature that users can use to make the site better for everyone, but it's not being used as much as we'd want, so it's not such a big deal if we make it less prominent" seems backwards to me; the solution would seem to be to make it even more prominent, no? With a subgoal of increasing the proportion of "people who tag stuff" to be much more than 1%. Let's make this more concrete: does LW not already suffer from the problem that too few people regularly tag posts (at least with the requisite degree of care)? As a mod, you should definitely have more data on this, and perhaps you do and believe I am wrong about this, but in my experience, tags are often missing, improper, etc., until some of the commenters try (and often fail) to pick up the slack. This topic has been talked about for a long time, ever since the tagging system began, with many users suggesting that the tags be made even more prominent at the top of a post. Raemon even said, just over a week ago: And in response, Joseph Miller pointed out: This certainly seems like a problem that gets solved by increasing community involvement in tagging, so that it's not just the miscalibrated or idiosyncratic beliefs of a small minority of users that determine what gets tagged with what. And making the tags harder to notice seems like it shifts the incentives in the complete opposite direction.
2habryka
I am confused about the quote. Indeed, in that quote Ray is complaining about people tagging things too aggressively, saying basically the opposite of your previous paragraph (i.e. he is complaining that tags are currently often too prominent, look too cluttered, and some users tag too aggressively). My current sense is that tagging is going well and I don't super feel like I want to increase the amount of tagging that people do (though I do think much less tagging would be bad).  It's also the case that tagging is the kind of task that probably has a decent chance of being substantially automated with AI systems, and indeed, if I wanted to tackle the problem of posts not being reliably tagged, I would focus on doing so in an automated way, now that LLMs are just quite good and cheap at this kind of intellectual labor. I don't think it could fully solve the problem and would still need a bunch of human in the loop, but I think it could easily speed up tagging efficiency by 20x+. I've been thinking about building an auto-tagger, and might do so if we see tagging activity drop as a result of making these buttons less prominent.
2sunwillrise
Right, but the point I was trying to make is that the reason why this happens is because you don't have sufficient engagement from the broader community in this stuff, so when mistakes like these happen (maybe because the people doing the tagging are a small and unrepresentative sample of the LW userbase), they don't get corrected quickly because there are too few people to do the correcting. Do you disagree with this?
2habryka
I think it's messy. In this case, it seems like the problem would have never appeared in the first place if the tagging button had been less available. I agree many other problems would be better addressed by having more people participate in the tagging system. 

The new design seems to be influenced by the idea that spreading UI elements across greater distances (reducing their local density) makes the interface less cluttered. I think it's a little bit the other way around, shorter distances with everything in one place make it easier to chunk and navigate, but overall the effect is small either way. And the design of spreading the UI elements this way is sufficiently unusual that it will be slightly confusing to many people.

4habryka
I don't really think that's the primary thing going on. I think one of the key issues with the previous design was the irregularity of the layout. Everything under the header would wrap and basically be in one big jumble, with the number and length of the author names changing where the info on the number of comments is, and where the tags section starts. It also didn't communicate a good hierarchy on which information is important. Ultimately, all you need to start reading a post is the title and the content. The current design communicates the optionality of things like karma and tags better, whereas the previous design communicates that those pieces of information might need to be understood before you start reading.
9Shankar Sivarajan
The title is annoyingly large.  I like the table of contents on the left becoming visible only upon mouseover.
8NoUsernameSelected
Why remove "x min read"? Even if it's not gonna be super accurate between different people's reading speeds, I still found it very helpful to decide at a glance how long a post is (e.g. whether to read it on the spot or bookmark it for later). Showing the word count would also suffice.
2habryka
Mostly because there is a prior against any UI element adding complexity.  In this case, with the new ToC progress bar which is now always visible, you can quickly gauge the length of the post by checking the length of the progress bar relative to the viewport indicator. It's an indirect inference, but I've gotten used to it pretty quickly. You can also still see the word count on hover-over.
5Neel Nanda
I find a visual indicator much less useful and harder to reason about than a number, I feel pretty sad at lacking this. How hard would it be to have as an optional addition?
2habryka
Maintaining many different design variants pretty inevitably leads to visual bugs and things being broken, so I am very hesitant to allow people to customize things at this level (almost every time we've done that in the past the custom UI broke in some way within a year or two, people wouldn't complain to us, and in some cases, we would hear stories 1-2 years later that someone stopped using LW because "it started looking broken all the time"). We are likely shipping an update to make the reading time easier to parse in the post-hover preview, to compensate somewhat for it no longer being available on the post page directly. I am kind of curious in which circumstances you would end up clicking on the post page without having gotten the hover-preview first (mobile is the obvious one, though we are just adding back the reading time on mobile, that was an oversight on my part).
2Neel Nanda
Typically, opening a bunch of posts that look interesting and processing them later, or being linked to a post (which is pretty common in safety research, since often a post will be linked, shared on slack, cited in a paper, etc) and wanting to get a vibe for whether I can be bothered to read it. I think this is pretty common for me. I would be satisfied if hovering over eg the date gave me info like the reading time. Another thing I just noticed: on one of my posts, it's now higher friction to edit it, since there's not the obvious 3 dots button (I eventually found it in the top right, but it's pretty easy to miss and out of the way)
2habryka
Oh, yeah, sure, I do think this kind of thing makes sense. I'll look into what the most natural place for showing it on hover is (the date seems like a reasonable first guess). I think this is really just a "any change takes some getting used to" type deal. My guess is it's slightly easier to find for the first time than the previous design, but I am not sure. I'll pay attention to whether new-ish users have trouble finding the triple-dot, and if so will make it more noticeable.
2NoUsernameSelected
I don't get a progress bar on mobile (unless I'm missing it somehow), and the word count on hover feature seemingly broke on mobile as well a while ago (I remember it working before).
2habryka
Ah, I think showing something on mobile is actually a good idea. I forgot that, with the way we rearranged things, that also went away. I will experiment with some ways of adding that information back in tomorrow.
5Olli Järviniemi
I like this; I've found the meta-data of posts to be quite heavy and cluttered (a multi-line title, the author+reading-time+date+comments line, the tag line, a linkpost line and a "crossposted from the Aligned Forum" line is quite a lot). I was going to comment that "I'd like the option to look at the table-of-contents/structure", but I then tested and indeed it displays if you hover your mouse there. I like that. When I open a new post, the top banner with the LessWrong link to the homepage, my username etc. show up. I'd prefer if that didn't happen? It's not like I want to look at the banner (which has no new info to me) when I click open a post, and hiding it would make the page less cluttered.
2habryka
I've never considered that. I do think it's important for the banner to be there when you get linked externally, so that you can orient to where you are, but I agree it's reasonable to hide it when you do a navigation on-site. I'll play around a bit with this. I like the idea.
4Screwtape
Noting that I use the banner as breadcrumb navigation relatively often, clicking LessWrong to go back to the homepage or my username to get a menu and go to my drafts. The banner is useful to me as a place to reach those menus. No idea how common that use pattern is.
2habryka
Totally. The only thing that I think we would do is to start you scrolled down 64px on the post page (the height of the header), so that you would just scroll a tiny bit up and then see the header again (or scroll up anywhere and have it pop in the same way it does right now).
4Yoav Ravid
I am really missing the word counter. It's something I look at quite a lot (less so on reading time estimates, as I got used to making the estimate myself based on the wordcount).
4Alex_Altair
My overall review is, seems fine, some pros and some cons, mostly looks/feels the same to me. Some details; * I had also started feeling like the stuff between the title and the start of the post content was cluttered. * I think my biggest current annoyance is the TOC on the left sidebar. This has actually disappeared for me, and I don't see it on hover-over, which I assume is maybe just a firefox bug or something. But even before this update, I didn't like the TOC. Specifically, you guys had made it so that there was spacing between the sections that was supposed to be proportional to the length of each section. This never felt like it worked for me (I could speculate on why if you're interested). I'd much prefer if the TOC was just a normal outline-type thing (which it was in a previous iteration). * I think I'll also miss the word count. I use it quite frequently (usually after going onto the post page itself, so the preview card wouldn't help much). Having the TOC progress bar thing never felt like it worked either. I agree with Neel that it'd be fine to have the word count in the date hover-over, if you want to have less stuff on the page. * The tags at the top right are now just bare words, which I think looks funny. Over the years you guys have often seemed to prefer really naked minimalist stuff. In this case I think the tags kinda look like they might be site-wide menus, or something. I think it's better to have the tiny box drawn around each tag as a visual cue. * The author name is now in a sans-serif font, which looks pretty off to me in between the title and the text as serif fonts. It looks like when the browser failed to load the site font and falls back onto the default font, or something. (I do see that it matches the fact that usernames in the comments are sans serif, though.) * I initially disliked the karma section being so suppressed, but then I read one of your comments in this thread explaining your reasoning behind that, and now I agre
4papetoast
I like most of the changes, but strongly dislike the large gap before the title. (I similarly dislike the large background in the top 50 of the year posts)
2habryka
Well, the gap you actually want to measure is the gap between the title and the audio player (or at the very least the tags), since that's the thing we need to make space for. You are clearly looking at LW on an extremely large screen. This is the more median experience:  There is still a bunch of space there, but for many posts the tags extend all the way above the post. 
1papetoast
I understand that having the audio player above the title is the path of least resistance, since you can't assume there is enough space on the right to put it in. But ideally things like this should be dynamic, and only take up vertical space if you can't put it on the right, no? (but I'm not a frontend dev) Alternatively, I would consider moving them vertically above the title a slight improvement. It is not great either, but at least the reason for having the gap is more obvious. The above screenshots are done in a 1920x1080 monitor
2habryka
Yeah, we could make things dynamic, it would just add complexity that we would need to check every time we make a change. It's the kind of iterative improvement we might do over time, but it's not something that should block the roll-out of a new design (and it's often lower priority than other things, though post-pages in-particular are very important and so get a lot more attention than other pages).
3MondSemmel
The new design means that I now move my mouse cursor first to the top right, and then to the bottom left, on every single new post. This UI design is bad ergonomics and feels actively hostile to users.
2habryka
I've been playing around with some ways to move the comment icon to the top right corner, ideally somehow removing the audio-player icon (which is much less important, but adds a lot of visual noise in a way that overwhelms the top right corner if you also add the comment icon). We'll see whether I can get it to work.
3dirk
It takes more vertical space than it used to and I don't like that. (Also, the meatball menu is way off in the corner, which is annoying if I want to e.g. bookmark a post, though I don't use it very often so it's not a major inconvenience.) I think I like the new font, though!
1dirk
Another minor annoyance I've since noticed: at this small scale it's hard to distinguish posts I've upvoted from posts I haven't voted on. Maybe it'd help if the upvote indicator were made a darker shade of green or something?
2habryka
Yeah, that's on my to-do list. I also think the current voting indicator isn't clear enough at the shrunken size.
3Richard_Kennaway
On desktop the title font is jarringly huge. I already know the title from the front page, no need to scream it at me.
7habryka
If you get linked externally (which is most of LW's traffic), you don't know the title (and also generally are less oriented on the page, so it helps to have a very clear information hierarchy).  I do also agree the font is very large. I made an intentionally bold choice here for a strong stylistic effect. I do think it's pretty intense and it might be the wrong choice, but I currently like it aesthetically a bunch.
3Ali Ahmed
The new UI is great, and I agree with the thinking behind de-emphasizing karma votes at the top. It could sometimes create inherent bias and assumptions (no matter whether the karma is high or low) even before reading a post, whereas it would make more sense at the end of the post.
3Perhaps
The karma buttons are too small for actions that, in my experience, are done a lot more than clicking to listen to the post. It's pretty easy to misclick. Additionally, it's unclear what the tags are, as they're no longer right beside the post to indicate their relevance. 
6habryka
The big vote buttons are at the bottom of the post, where I would prefer more of the voting to happen (I am mildly happy to discourage voting at the top of the post before you read it, though I am not confident).
2Zach Stein-Perlman
I ~always want to see the outline when I first open a post and when I'm reading/skimming through it. I wish the outline appeared when-not-hover-over-ing for me.
2interstice
I like the decluttering. I think the title should be smaller and have less white space above it. I also think it would be better if the ToC was maybe just faded a lot until mouseover; the appearance/disappearance feels too abrupt.
4habryka
I think making things faint enough so that the relatively small margin between main body text and the ToC wouldn't become bothersome during reading isn't really feasible. In-general, because people's screen-contrast and color calibration differs quite a lot, you don't have that much wiggle room at the lower level of opacity without accidentally shipping completely different experiences to different users. I think it's plausible we want to adjust the whitespace below the title, but I think you really need this much space above the title to not have it look cluttered together with the tags on smaller screens. On larger screens there is enough distance between the title and top right corner, but things end up much harder to parse when the tags extend into the space right above the title, and that margin isn't big enough.
1quila
I think this applies to titles too.
5habryka
I think the title is more important for parsing the content of an essay. Like, if a friend sends you a link, it's important to pay a bunch of attention to the title. It's less important that you pay attention to the karma.

Had a very aggressive crawler basically DDoS-ing us from a few dozen IPs for the last hour. Sorry for the slower server response times. Things should be fixed now.

Random thoughts on game theory and what it means to be a good person

It does seem to me like there doesn’t exist any good writing on game theory from a TDT perspective. Whenever I read classical game theory, I feel like the equilibria being described obviously fall apart once counterfactuals are properly brought into the mix (like D/D in prisoner's dilemmas).

The obvious problem with TDT-based game theory, just as with Bayesian epistemology, is that the vast majority of direct applications are completely computationally intractable. It’s kind of obvious what should happen in games with lots of copies of yourself, but as soon as anything participates that isn’t a precise copy, everything gets a lot more confusing. So it is not fully clear what a practical game-theory literature from a TDT perspective would look like, though the existing LessWrong literature on Bayesian epistemology might be a good inspiration.

Even when you can’t fully compute everything (and we don’t even really know how to compute everything in principle), you might still be able to go through concrete scenarios and list considerations that incorporate a TDT perspective. I guess in t

...
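As a toy illustration of the D/D point, the sketch below compares classical pure-strategy Nash reasoning with the reasoning available when the other player is known to be an exact copy running the same decision procedure. The payoff numbers are illustrative assumptions (the usual T > R > P > S ordering), and the function names are hypothetical. It only covers the easy exact-copy case; the imperfect-copy case that makes TDT-style game theory hard is not modeled.

```python
# Sketch: one-shot prisoner's dilemma, classical equilibrium vs. playing an exact copy.
# Payoff values are illustrative assumptions, not taken from the post.
from itertools import product

C, D = "C", "D"
ACTIONS = (C, D)

# PAYOFF[(mine, theirs)] = my payoff, with the standard ordering T > R > P > S.
PAYOFF = {
    (C, C): 3,  # R: reward for mutual cooperation
    (C, D): 0,  # S: sucker's payoff
    (D, C): 5,  # T: temptation to defect
    (D, D): 1,  # P: punishment for mutual defection
}


def nash_equilibria():
    """Pure-strategy profiles where neither player gains by deviating alone."""
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        a_ok = all(PAYOFF[(a, b)] >= PAYOFF[(alt, b)] for alt in ACTIONS)
        b_ok = all(PAYOFF[(b, a)] >= PAYOFF[(alt, a)] for alt in ACTIONS)
        if a_ok and b_ok:
            equilibria.append((a, b))
    return equilibria


def choice_against_exact_copy():
    """If the opponent provably outputs whatever I output, only (x, x) outcomes
    are reachable, so pick the action with the best diagonal payoff."""
    return max(ACTIONS, key=lambda a: PAYOFF[(a, a)])


if __name__ == "__main__":
    print("Nash equilibria:", nash_equilibria())                 # [('D', 'D')]
    print("Against an exact copy:", choice_against_exact_copy())  # C
```

The classical analysis holds the opponent's action fixed while varying your own, which is exactly the counterfactual that stops making sense when the opponent is a copy.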

Reading through this, I went "well, obviously I pay the mugger...

...oh, I see what you're doing here."

I don't have a full answer to the problem you're specifying, but something that seems relevant is the question of "How much do you want to invest in the ability to punish defectors [both in terms of maximum power-to-punish, à la nukes, and in terms of your ability to dole out fine-grained-exactly-correct punishment, à la skilled assassins]"

The answer to this depends on your context. And how you have answered this question determines whether it makes sense to punish people in particular contexts.

In many cases you might want some amount of randomization, where at least some of the time you punish people really disproportionately, but you don't have to pay the cost of doing so every time.
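A toy model of the randomization idea, under assumptions that are mine rather than the comment's: the would-be defector is risk-neutral and is deterred when the expected penalty (probability times severity) is at least their gain from defecting, and each punishment that is actually carried out costs the punisher a fixed amount regardless of its severity. `PunishmentPolicy` is a hypothetical name.

```python
# Sketch: probabilistic, disproportionate punishment under a toy deterrence model.
# Assumptions: risk-neutral defectors; fixed cost per punishment carried out.
from dataclasses import dataclass


@dataclass
class PunishmentPolicy:
    probability: float  # chance that a given defection actually gets punished
    severity: float     # size of the punishment when it is carried out

    def expected_penalty(self) -> float:
        return self.probability * self.severity

    def expected_cost(self, cost_per_punishment: float) -> float:
        # Assumption: carrying out a punishment costs the same whatever its severity.
        return self.probability * cost_per_punishment

    def deters(self, gain_from_defecting: float) -> bool:
        # A risk-neutral defector holds off when the expected penalty covers the gain.
        return self.expected_penalty() >= gain_from_defecting


if __name__ == "__main__":
    gain, cost = 10.0, 4.0
    always = PunishmentPolicy(probability=1.0, severity=10.0)
    sometimes = PunishmentPolicy(probability=0.25, severity=40.0)
    for name, policy in (("always", always), ("sometimes", sometimes)):
        print(name, policy.deters(gain), policy.expected_cost(cost))
```

With a gain of 10 and a cost of 4 per punishment, punishing every defection with severity 10 and punishing a quarter of defections with severity 40 both deter, but the second costs 1 per incident on average instead of 4. The saving rests entirely on the fixed-cost assumption; if the cost of punishing grows with severity, it shrinks or disappears.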

Answering a couple of the concrete questions:

Mugger

Right now, in real life, I've never been mugged, and I feel fine basically investing zero effort into preparing for being mugged. If I do get mugged, I will just hand over my wallet.

If I were getting mugged all the time, I'd probably invest effort into a) figuring out what good policies existed ...

2Lukas Finnveden
Any reason why you mention timeless decision theory (TDT) specifically? My impression was that functional decision theory (as well as UDT, since they're basically the same thing) is regarded as a strict improvement over TDT.
2habryka
Same thing, it's just the handle that stuck in my mind. I think of the whole class as "timeless", since I don't think there exists a good handle that describes all of them.

Making yourself understandable to other people

(Epistemic status: Processing obvious things that have likely been written many times before, but that are still useful to have written up in my own language)

How do you act in the context of a community that is vetting-constrained? I think there are fundamentally two approaches you can use to establish coordination with other parties:

1. Professionalism: Establish that you are taking concrete actions with predictable consequences that are definitely positive

2. Alignment: Establish that you are a competent actor that is acting with intentions that are aligned with the aims of others

I think a lot of the concepts around professionalism arise when you have a group of people who are trying to coordinate, but do not actually have aligned interests. In those situations you will have lots of contracts and commitments to actions that have well-specified outcomes and deviations from those outcomes are generally considered bad. It also encourages a certain suppression of agency and a fear of people doing independent optimization in a way that is not transparent to the rest of the group.

Given a lot of these drawbacks, it seems natural to aim for e...

4jp
I had forgotten this post, reread it and still think it's one of the better things of its length I've read recently.
5habryka
Glad to hear that! Seems like a good reason to publish this as a top-level post. Might go ahead and do that in the next few days.
4nicoleross
+1 for publishing as a top level post

This FB post by Matt Bell on the Delta Variant helped me orient a good amount: 

https://www.facebook.com/thismattbell/posts/10161279341706038

As has been the case for almost the entire pandemic, we can predict the future by looking at the present. Let’s tackle the question of “Should I worry about the Delta variant?” There’s now enough data out of Israel and the UK to get a good picture of this, as nearly all cases in Israel and the UK for the last few weeks have been the Delta variant. [1] Israel was until recently the most-vaccinated major country in the world, and is a good analog to the US because they’ve almost entirely used mRNA vaccines.

- If you’re fully vaccinated and aren’t in a high risk group, the Delta variant looks like it might be “just the flu”. There are some scary headlines going around, like “Half of new cases in Israel are among vaccinated people”, but they’re misleading for a couple of reasons. First, since Israel has vaccinated over 80% of the eligible population, the mRNA vaccine still is 1-((0.5/0.8)/(0.5/0.2)) = 75% effective against infection with the Delta variant. Furthermore, the efficacy of the mRNA vaccine is still very high ( > 90%) against hosp

...
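The efficacy arithmetic in the quoted post compares the per-capita case rate among vaccinated people with that among unvaccinated people. This is equivalent to the standard "screening method" estimate. A minimal sketch follows; the function name is hypothetical, and like the back-of-the-envelope version it assumes both groups are equally exposed and equally likely to be tested, which real data rarely guarantees.

```python
# Sketch: vaccine efficacy against infection from case share and vaccination coverage.
# Assumption: vaccinated and unvaccinated people are equally exposed and tested.


def efficacy_from_case_share(share_of_cases_vaccinated: float, coverage: float) -> float:
    """1 - (case rate among vaccinated) / (case rate among unvaccinated)."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    if not 0.0 <= share_of_cases_vaccinated < 1.0:
        raise ValueError("share_of_cases_vaccinated must be in [0, 1)")
    rate_vaccinated = share_of_cases_vaccinated / coverage
    rate_unvaccinated = (1.0 - share_of_cases_vaccinated) / (1.0 - coverage)
    return 1.0 - rate_vaccinated / rate_unvaccinated


if __name__ == "__main__":
    # "Half of new cases are among vaccinated people" with ~80% coverage:
    print(round(efficacy_from_case_share(0.5, 0.8), 3))  # 0.75
```

The same headline ("half of cases are vaccinated") would mean zero efficacy at 50% coverage, which is why the coverage figure is essential to interpreting it.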

This seems like potentially a big deal: https://mobile.twitter.com/DrEricDing/status/1402062059890786311

> Troubling—the worst variant to date, the #DeltaVariant is now the new fastest growing variant in US. This is the so-called “Indian” variant #B16172 that is ravaging the UK despite high vaccinations because it has immune evasion properties. Here is why it’s trouble—Thread. #COVID19
 

6SoerenMind
There's also a strong chance that delta is the most transmissible variant we know even without its immune evasion (source: I work on this, don't have a public source to share). I agree with your assessment that delta is a big deal.
4ChristianKl
The fact that we still use the same sequence to vaccinate seems like a civilisational failure.
4wunan
Those graphs all show the percentage share of the different variants, but more important would be the actual growth rate. Is the delta variant growing, or is it just shrinking less quickly than the others?
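A quick sketch of why share alone can't answer this: if both variants are shrinking but delta shrinks more slowly, delta's share still rises every week. The starting counts and weekly growth factors below are made-up numbers for illustration, and `simulate_variant_share` is a hypothetical helper.

```python
# Sketch: a variant's share can rise while its absolute case count falls.
# All numbers are hypothetical; weekly factors < 1 mean cases are shrinking.


def simulate_variant_share(weeks, delta_cases, other_cases,
                           delta_weekly_factor, other_weekly_factor):
    """Return (week, delta cases, other cases, delta share) for each week."""
    rows = []
    for week in range(weeks + 1):
        share = delta_cases / (delta_cases + other_cases)
        rows.append((week, delta_cases, other_cases, share))
        delta_cases *= delta_weekly_factor
        other_cases *= other_weekly_factor
    return rows


if __name__ == "__main__":
    for week, delta, other, share in simulate_variant_share(6, 1000.0, 9000.0, 0.9, 0.6):
        print(f"week {week}: delta {delta:7.1f}  other {other:7.1f}  delta share {share:.0%}")
```

In this example delta falls every week yet its share climbs from 10%, so distinguishing the two cases requires absolute counts per variant (share multiplied by total cases), not share graphs alone.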

@Elizabeth was interested in me crossposting this comment from the EA Forum since she thinks there isn't enough writing on the importance of design on LW. So here it is.

Atlas reportedly spent $10,000 on a coffee table. Is this true? Why was the table so expensive?

Atlas at some point bought this table, I think: https://sisyphus-industries.com/product/metal-coffee-table/. At that link it costs around $2200, so I highly doubt the $10,000 number.

Lightcone then bought that table from Atlas a few months ago at the listing price, since Jonas thought the purchase ...

I like this shortform feed idea!

We launched the books on Product Hunt today! 

Leaving this here: 2d9797e61e533f03382a515b61e6d6ef2fac514f

Since this hash is publicly posted, is there any timescale for when we should check back to see the preimage?

5habryka
If relevant, I will reveal it within the next week.
6habryka
Preimage was:  Hashed using https://www.fileformat.info/tool/hash.htm using the SHA-1 hash.
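The posted string is 40 hex characters, consistent with a SHA-1 digest. Below is a minimal sketch of the same commit-and-reveal pattern using Python's standard `hashlib`. Assumption: the message is encoded as UTF-8 with no trailing newline. An online tool may handle encoding or whitespace differently, so a recomputed digest only matches if the bytes are identical. `commit_with_nonce` and `verify_with_nonce` are hypothetical helpers showing a common hardening step: a short or guessable message can be brute-forced from its hash alone, so a random nonce is mixed in and revealed together with the message. SHA-1 is also no longer considered collision-resistant, so SHA-256 is the safer default for new commitments.

```python
# Sketch: hash commitments (publish the digest now, reveal the message later).
import hashlib
import secrets
from typing import Tuple


def commit(message: str) -> str:
    """Hex SHA-1 digest of the UTF-8 bytes of `message` (no trailing newline)."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()


def verify(message: str, published_digest: str) -> bool:
    """Check a revealed message against a previously published digest."""
    return commit(message) == published_digest.strip().lower()


def commit_with_nonce(message: str) -> Tuple[str, str]:
    """Hypothetical hardened variant: returns (nonce, digest); keep the nonce secret until reveal."""
    nonce = secrets.token_hex(16)
    return nonce, commit(nonce + ":" + message)


def verify_with_nonce(message: str, nonce: str, published_digest: str) -> bool:
    return verify(nonce + ":" + message, published_digest)


if __name__ == "__main__":
    print(commit("abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```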