I am confident, on the basis of private information I can't share, that Anthropic has asked at least some employees to sign non-disparagement agreements that are themselves covered by non-disclosure agreements, as OpenAI did.
Or to put things in plainer terms:
I am confident that Anthropic has offered at least one employee a significant financial incentive to promise to never say anything bad about Anthropic, or anything that might negatively affect its business, and to never tell anyone about their commitment to do so.
I am not aware of Anthropic doing anything like withholding vested equity the way OpenAI did, though I think the effect on discourse is similarly bad.
I of course think this is quite sad and a bad thing for a leading AI capabilities company to do, especially one that bills itself as being held accountable by its employees and that claims to prioritize safety in its plans.
Hey all, Anthropic cofounder here. I wanted to clarify Anthropic's position on non-disparagement agreements:
In other words— we're not here to play games with AI safety using legal contracts. Anthropic's whole reason for existing is to increase the chance that AI goes well, and spu...
Please keep up the pressure on us
OK:
(Sidenote: it seems Sam was kind of explicitly asking to be pressured, so your comment seems legit :)
But I also think that, had Sam not done so, I would still really appreciate him showing up and responding to Oli's top-level post, and I think it should be fine for folks from companies to show up and engage with the topic at hand (NDAs), without also having to do a general AMA about all kinds of other aspects of their strategy and policies. If Zach's questions do get very upvoted, though, it might suggest there's demand for some kind of Anthropic AMA event.)
Anyone who has signed a non-disparagement agreement with Anthropic is free to state that fact (and we regret that some previous agreements were unclear on this point) [emphasis added]
This seems, as far as I can tell, like a straightforward lie?
I am very confident that the non-disparagement agreements you asked at least one employee to sign were not ambiguous, and very clearly said that the non-disparagement clauses could not be mentioned.
To reiterate what I know to be true: Employees of Anthropic were asked to sign non-disparagement agreements with a commitment to never tell anyone about the presence of those non-disparagement agreements. There was no ambiguity in the agreements that I have seen.
@Sam McCandlish: Please clarify what you meant to communicate by the above, which I interpreted as claiming that there was merely ambiguity in previous agreements about whether the non-disparagement agreements could be disclosed, which seems to me demonstrably false.
I can confirm that my concealed non-disparagement was very explicit that I could not discuss the existence or terms of the agreement, I don't see any way I could be misinterpreting this. (but I have now kindly been released from it!)
EDIT: It wouldn't massively surprise me if Sam just wasn't aware of its existence though
We're not claiming that Anthropic never offered a confidential non-disparagement agreement. What we are saying is: everyone is now free to talk about having signed a non-disparagement agreement with us, regardless of whether there was a non-disclosure previously preventing it. (We will of course continue to honor all of Anthropic's non-disparagement and non-disclosure obligations, e.g. from mutual agreements.)
If you've signed one of these agreements and have concerns about it, please email hr@anthropic.com.
Hmm, I feel like you didn't answer my question. Can you confirm that Anthropic has asked at least some employees to sign confidential non-disparagement agreements?
I think your previous comment pretty strongly implied that you think you did not do so (i.e. saying any previous agreements were merely "unclear" pretty clearly implies that none of them included an unambiguous confidential non-disparagement agreement). I want it to be confirmed and on the record that you did, so I am asking you to say so clearly.
I really think the above was meant to imply that the non-disparagement agreements were merely unclear on whether they were covered by a non-disclosure clause (and I would be happy to take bets on how a randomly selected reader would interpret it).
My best guess is Sam was genuinely confused on this and that there are non-disparagement agreements with Anthropic that clearly are not covered by such clauses.
EDIT: Anthropic have kindly released me personally from my entire concealed non-disparagement agreement, not just made a specific safety exception. Their position on other employees remains unclear, but I take this as a good sign
If someone signed a non-disparagement agreement in the past and wants to raise concerns about safety at Anthropic, we welcome that feedback and will not enforce the non-disparagement agreement.
Thanks for this update! To clarify, are you saying that you WILL enforce existing non-disparagement agreements for everything apart from safety, but are specifically making an exception for safety?
this routine use of non-disparagement agreements, even in these narrow cases, conflicts with our mission
Given this part, I find this surprising. Surely if you think it's bad to ask future employees to sign non-disparagement agreements, you should also want to free past employees from them?
This comment appears to respond to habryka, but doesn’t actually address what I took to be his two main points—that Anthropic was using NDAs to cover non-disparagement agreements, and that they were applying significant financial incentive to pressure employees into signing them.
We historically included standard non-disparagement agreements by default in severance agreements
Were these agreements subject to NDA? And were all departing employees asked to sign them, or just some? If the latter, what determined who was asked to sign?
Anyone who has signed a non-disparagement agreement with Anthropic is free to state that fact (and we regret that some previous agreements were unclear on this point).
I'm curious as to why it took you (and therefore Anthropic) so long to make it common knowledge (or even public knowledge) that Anthropic used non-disparagement contracts as standard practice and was also planning to change its standard agreements.
The right time to reveal this was when the OpenAI non-disparagement news broke, not after Habryka connects the dots and builds social momentum for scrutiny of Anthropic.
that Anthropic used non-disparagement contracts as standard practice and was also planning to change its standard agreements.
I do want to be clear that a major issue is that Anthropic used non-disparagement agreements that were covered by non-disclosure agreements. I think that's an additional, much more insidious thing to do, which contributed substantially to the harm caused by the OpenAI agreements, and which I think is an important fact to include here (and which also makes the two situations even more analogous).
Note, since this is a new and unverified account, that Jack Clark (Anthropic co-founder) confirmed on Twitter that the parent comment is the official Anthropic position https://x.com/jackclarkSF/status/1808975582832832973
Thank you for responding! (I have more comments and questions but figured I would shoot off one quick question which is easy to ask)
We've since recognized that this routine use of non-disparagement agreements, even in these narrow cases, conflicts with our mission
Can you clarify what you mean by "even in these narrow cases"? If I am understanding you correctly, you are saying that you were including a non-disparagement clause by default in all of your severance agreements, which sounds like the opposite of narrow (edit: though as Robert points out it depends on what fraction of employees get offered any kind of severance, which might be most, or might be very few).
I agree that it would have technically been possible for you to also include such an agreement at the start of employment, but that would have been very weird, and not even OpenAI did that.
I think using the sentence "even in these narrow cases" seems inappropriate given that (if I am understanding you correctly) all past employees were affected by these agreements. I think it would be good to clarify what fraction of past employees were actually offered these agreements.
Severance agreements typically aren't offered to all departing employees, but usually only those that are fired or laid off. We know that not all past employees were affected by these agreements, because Ivan claims to not have been offered such an agreement, and he left[1] in mid-2023, which was well before June 1st.
Presumably of his own volition, hence no offered severance agreement with non-disparagement clauses.
To expand on my "that's a crux": if the non-disparagement+NDA clauses are very standard, such that they were included in a first draft by an attorney without prompting and no employee ever pushed back, then I would think this was somewhat less bad.
It would still be somewhat bad, because Anthropic should be proactive about not making those kinds of mistakes. I am confused about what level of perfection to demand from Anthropic, considering the stakes.
And if non-disparagement is often used, but Anthropic leadership either specified its presence or its form, that would seem quite bad to me, because mistakes of commission here are more evidence of poor decisionmaking than mistakes of omission. If Anthropic leadership decided to keep the clause when a departing employee wanted to remove it, that would similarly seem quite bad to me.
I think that both these clauses are very standard in such agreements. Both severance letter templates I was given for my startup, one from a top-tier SV investor's HR function and another from a top-tier SV law firm, had both clauses. When I asked Claude, it estimated 70-80% of startups would have a similar non-disparagement clause and 80-90% would have a similar confidentiality-of-this-agreement's-terms clause. The three top Google hits for "severance agreement template" all included those clauses.
These generally aren't malicious. Terminations get messy and departing employees often have a warped or incomplete picture of why they were terminated–it's not a good idea to tell them all those details, because that adds liability, and some of those details are themselves confidential about other employees. Companies view the limitation of liability from release of various wrongful termination claims as part of the value they're "purchasing" by offering severance–not because those claims would succeed, but because it's expensive to explain in court why they're justified. But the expenses disgruntled ex-employees can cause are not just legal, they're also reputational. You usually don'...
And internally, we have an anonymous RSP non-compliance reporting line so that any employee can raise concerns about issues like this without any fear of retaliation.
Are you able to elaborate on how this works? Are there any other details about this publicly? I couldn't find more via a quick search.
Some specific qs I'm curious about: (a) who handles the anonymous complaints, (b) what is the scope of behavior explicitly (and implicitly re: cultural norms) covered here, (c) how are situations handled where a report would deanonymize the reporter (or narrow them down to a small number of possible people)?
OK, let's imagine I had a concern about RSP noncompliance, and felt that I needed to use this mechanism.
(in reality I'd just post in whichever slack channel seemed most appropriate; this happens occasionally for "just wanted to check..." style concerns and I'm very confident we'd welcome graver reports too. Usually that'd be a public channel; for some compartmentalized stuff it might be a private channel and I'd DM the team lead if I didn't have access. I think we have good norms and culture around explicitly raising safety concerns and taking them seriously.)
As I understand it, I'd:
Good that it's clear who it goes to, though if I was at Anthropic I'd want an option to escalate to a board member who isn't Dario or Daniela, in case I had concerns related to the CEO
Makes sense - if I felt I had to use an anonymous mechanism, I can see how contacting Daniela about Dario might be uncomfortable. (Although to be clear I actually think that'd be fine, and I'd also have to think that Sam McCandlish as responsible scaling officer wouldn't handle it)
If I was doing this today I guess I'd email another board member; and I'll suggest that we add that as an escalation option.
Are there currently board members who are meaningfully separated in terms of incentive-alignment with Daniela or Dario? (I don't know that it's possible for you to answer in a way that'd really resolve my concerns, given what sort of information is possible to share. But "is there an actual way to criticize Dario and/or Daniela in a way that will realistically be given a fair hearing by someone who, if appropriate, could take some kind of action" is a crux of mine)
Anthropic has asked employees
[...]
Anthropic has offered at least one employee
As a point of clarification: is it correct that the first quoted statement above should be read as "at least one employee" in line with the second quoted statement? (When I first read it, I parsed it as "all employees" which was very confusing since I carefully read my contract both before signing and a few days ago (before posting this comment) and I'm pretty sure there wasn't anything like this in there.)
(I'm a full-time employee at Anthropic.)
I carefully read my contract both before signing and a few days ago [...] there wasn't anything like this in there.
Current employees of OpenAI also wouldn't yet have signed or even known about the non-disparagement agreement that is part of "general release" paperwork on leaving the company. So this is only evidence about some ways this could work at Anthropic, not others.
I am disappointed. Using non-disparagement agreements seems bad to me, especially if they're covered by non-disclosure agreements, especially if you don't announce that you might use this.
My ask-for-Anthropic now is to explain the contexts in which they have asked or might ask people to incur non-disparagement obligations, and if those are bad, release people and change policy accordingly. And even if non-disparagement obligations can be reasonable, I fail to imagine how non-disclosure obligations covering them could be reasonable, so I think Anthropic should at least do away with the no-disclosure-of-non-disparagement obligations.
Does anyone from Anthropic want to explicitly deny that they are under an agreement like this?
(I know the post talks about some and not necessarily all employees, but am still interested).
I left Anthropic in June 2023 and am not under any such agreement.
EDIT: nor was any such agreement or incentive offered to me.
I left [...] and am not under any such agreement.
Neither is Daniel Kokotajlo. Context and wording strongly suggest that what you mean is that you weren't ever offered paperwork with such an agreement and incentives to sign it, but there remains a slight ambiguity on this crucial detail.
Correct, I was not offered such paperwork nor any incentives to sign it. Edited my post to include this.
I am a current Anthropic employee, and I am not under any such agreement, nor has any such agreement ever been offered to me.
If asked to sign a self-concealing NDA or non-disparagement agreement, I would refuse.
I agree that this kind of legal contract is bad, and Anthropic should do better. I think there are a number of aggravating factors which made the OpenAI situation extraordinarily bad, and I'm not sure how much these might obtain regarding Anthropic (at least one comment from another departing employee about not being offered this kind of contract suggests the practice is less widespread).
-amount of money at stake
-taking money, equity or other things the employee believed they already owned if the employee doesn't sign the contract, vs. offering them something new (IANAL, but in some cases this could be a felony "grand theft wages" under California law if a threat to withhold wages for not signing a contract is actually carried out; what kinds of equity count as wages would be a complex legal question)
-is this offered to everyone, or only under circumstances where there's a reasonable justification?
-is this only offered when someone is fired or also when someone resigns?
-to what degree are the policies of offering contracts concealed from employees?
-if someone asks to obtain legal advice and/or negotiate before signing, does the company allow this?
-if this becomes public, does the comp...
This is true. I signed a concealed non-disparagement when I left Anthropic in mid 2022. I don't have clear evidence this happened to anyone else (but that's not strong evidence of absence). More details here
EDIT: I should also clarify that I personally don't think Anthropic acted that badly, and recommend reading about what actually happened before forming judgements. I do not think I am the person referred to in Habryka's comment.
In the case of OpenAI most of the debate was about ex-employees. Are we talking about current employees or ex-employees here?
I am including both in this reference class (i.e. when I say employee above, it refers to both present employees and employees who left at some point). I am intentionally being broad here to preserve more anonymity of my sources.
Not sure how to interpret the "agree" votes on this comment. If someone is able to share that they agree with the core claim because of object-level evidence, I am interested. (Rather than agreeing with the claim that this state of affairs is "quite sad".)
(Not answering this question since I think it would leak too many bits on confidential stuff. In general I will be a bit hesitant to answer detailed questions on this, or I might take a long while to think about what to say before I answer, which I recognize is annoying, but I think is the right tradeoff in this situation)
Reputation is lazily evaluated
When evaluating the reputation of your organization, community, or project, many people flock to surveys in which you ask randomly selected people what they think of your thing, or what their attitudes towards your organization, community or project are.
If you do this, you will very reliably get back data that looks like people are indifferent to you and your projects, and your results will probably be dominated by extremely shallow things like "do the words in your name invoke positive or negative associations".
People largely only form opinions of you or your projects when they have some reason to do that, like trying to figure out whether to buy your product, or join your social movement, or vote for you in an election. You basically never care what people think about you while they are engaged in activities completely unrelated to you; you care about what people will do when they have to take some action that is related to your goals. But the former is exactly what you are measuring in attitude surveys.
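To spell out the programming metaphor in the title: in a lazily evaluated language, a value is not computed when it is defined, only when something actually forces it. A minimal Python sketch, purely illustrative (the function name and strings are mine, not anything from the essay):

```python
# Lazy evaluation in miniature: the "opinion" is only computed when first needed.
from functools import lru_cache

@lru_cache(maxsize=None)
def opinion_of(org: str) -> str:
    print(f"(actually thinking hard about {org} now...)")
    return f"considered view on {org}"

# Defining the function above cost nothing. Most people never call it for your org.
# The computation only runs when someone actually has a decision to make:
print(opinion_of("YourProject"))  # forced here, for the first time
print(opinion_of("YourProject"))  # cached; no re-evaluation
```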
As an example of this (used here for illustrative purposes, and what caused me to form strong opinions on this, but not intended as the central po...
I don't like the fact that this essay is a mix of an insightful generic argument and a contentious specific empirical claim that I don't think you support strongly; it feels like the rhetorical strength of the former lends credence to the latter in a way that isn't very truth-tracking.
I'm not claiming you did anything wrong here, I just don't like something about this dynamic.
I do think the EA example is quite good on an illustrative level. It really strikes me as a rare case where we have an enormous pile of public empirical evidence (which is linked in the post) and it also seems by now really quite clear from a common-sense perspective.
I don't think it makes sense to call this point "contentious". I think it's about as clear as these cases go. At least off the top of my head I can't think of an example that would have been clearer (maybe if you had some social movement that more fully collapsed and where you could do a retrospective root-cause analysis, but it's extremely rare to have as clear a natural experiment as the FTX one). I do think it's political in our local social environment, and so is harder to talk about, so I agree that on that dimension a different example would be better.
I do think it would be good/nice to add an additional datapoint, but I also think this would risk being misleading. The point about reputation being lazily evaluated is mostly true from common-sense observations and logical reasoning, and the EA point is mostly trying to provide evidence for "yes, this is a real mistake that real people make". I think even if you...
There's another important effect here: a laggy time course of public opinion. I saw more popular press articles about EA than I ever have, linking SBF to them, but with a large lag after the events. So the early surveys showing a small effect happened before public conversation really bounced around the idea that SBF's crimes were motivated by EA utilitarian logic. The first time many people would remember hearing about EA would be from those later articles and discussions.
The effect probably amplified considerably over time as that hypothesis bounced through public discourse.
The original point stands, but this lag is making the effect look much larger in this case.
Practically all growth metrics are down (and have indeed turned negative on most measures), a substantial fraction of core contributors are distancing themselves from the EA affiliation, surveys among EA community builders report EA-affiliation as a major recurring obstacle[1], and many of the leaders who previously thought it wasn't a big deal now concede that it was/is a huge deal.
Also, informally, recruiting for things like EA Fund managers, or getting funding for EA Funds has become substantially harder. EA leadership positions appear to be filled by less competent people, and in most conversations I have with various people who have been around for a while, people seem to both express much less personal excitement or interest in identifying or championing anything EA-related, and report the same for most other people.
Related to the concepts in my essay, when measured, the reputational differential also seems to reliably point towards people updating negatively towards EA as they learn more about EA (which shows up in the quotes you mentioned, and which more recently shows up in the latest Pulse survey, though I mostly consider that survey uninformative for roughly the reasons ou...
Hey! Sorry for the silence, I was feeling a bit stressed by this whole thread, and so I wanted to step away and think about this before responding. I’ve decided to revert the dashboard back to its original state & have republished the stale data. I did some quick/light data checks but prioritised getting this out fast. For transparency: I’ve also added stronger context warnings and I took down the form to access our raw data in sheet form but intend to add it back once we’ve fixed the data. It’s still on our stack to Actually Fix this at some point but we’re still figuring out the timing on that.
On reflection, I think I probably made the wrong call here (although I still feel a bit sad / misunderstood but 🤷🏻♀️). It was a unilateral + lightly held call I made in the middle of my work day — like truly I spent 5 min deciding this & maybe another ~15 updating the thing / leaving a comment. I think if I had a better model for what people wanted from the data, I would have made a different call. I’ve updated on “huh, people really care about not deleting data from the internet!” — although I get that the reaction here might be especially strong because it’s about CEA...
[musing] Actually another mistake here which I wish I had just said in the first comment: I didn't have a strong enough TAP for, if someone says a negative thing about your org (or something that could be interpreted negatively), you should have a high bar for not taking away data (meaning more broadly than numbers) that they were using to form that perception, even if you think the data is wrong for reasons they're not tracking. You can like, try and clarify the misconception (ideally, given time & energy constraints etc.), and you can try harder to avoid putting wrong things out there, but don't just take it away -- it's not on the reader to treat you charitably and it kind of doesn't matter what your motives were.
I think I mostly agree with something like that / I do think people should hold orgs to high standards here. I didn't pay enough attention to this and regret it. Sorry! (I'm back to ignoring this thread lol but just felt like sharing a reflection 🤷🏻♀️)
Oh, huh, that seems very sad. Why would you do that? Please leave up the data that we have. I think it's generally bad form to break links that people relied on. The data was accurate as far as I can tell until August 2024, and you linked to it yourself a bunch over the years, don't just break all of those links.
I am pretty up-to-date with other EA metrics and I don't really see how this would be misleading. You had a disclaimer at the top that I think gave all the relevant context. Let people make their own inferences, or add more context, but please don't just take things down.
Unfortunately, archive.org doesn't seem to have worked for that URL, so we can't even rely on that to show the relevant data trends.
Edit: I'll be honest, after thinking about it for longer, the only reason I can think of why you would take down the data is because it makes CEA and EA look less on an upwards trajectory. But this seems so crazy. How can I trust data coming out of CEA if you have a policy of retracting data that doesn't align with the story you want to tell about CEA and EA? The whole point of sharing raw data is to allow other people to come to their own conclusions. This really seems like such a dumb move from a trust perspective.
I also believe that the data making EA+CEA look bad is the causal reason why it was taken down. However, I want to add some slight nuance.
I want to contrast a model whereby Angelina Li did this while explicitly trying to stop CEA from looking bad, versus a model whereby she senses that something bad might be happening, she might be held responsible (e.g. within her organization / community), and is executing a move that she's learned is 'responsible' from the culture around her.
I think many people have learned to believe the reasoning step "If people believe bad things about my team I think are mistaken with the information I've given them, then I am responsible for not misinforming people, so I should take the information away, because it is irresponsible to cause people to have false beliefs". I think many well-intentioned people will say something like this, and that this is probably because of two reasons (borrowing from The Gervais Principle):
I agree, and I am a bit disturbed that it needs to be said.
At normal, non-EA organizations -- and not only particularly villainous ones, either! -- it is understood that you need to avoid sharing any information that reflects poorly on the organization, unless it's required by law or contract or something. The purpose of public-facing communications is to burnish the org's reputation. This is so obvious that they do not actually spell it out to employees.
Of COURSE any organization that has recently taken down unflattering information is doing it to maintain its reputation.
I'm sorry, but this is how "our people" get taken for a ride. Be more cynical, including about people you like.
I think many people have learned to believe the reasoning step "If people believe bad things about my team I think are mistaken with the information I've given them, then I am responsible for not misinforming people, so I should take the information away, because it is irresponsible to cause people to have false beliefs". I think many well-intentioned people will say something like this, and that this is probably because of two reasons (borrowing from The Gervais Principle):
(Comment not specific to the particulars of this issue but noted as a general policy:) I think that as a general rule, if you are hypothesizing reasons for why somebody might say a thing, you should always also include the hypothesis that "people say a thing because they actually believe in it". This is especially so if you are hypothesizing bad reasons for why people might say it.
It's very annoying when someone hypothesizes various psychological reasons for your behavior and beliefs but never even considers as a possibility the idea that maybe you might have good reasons to believe in it. Compare e.g. "rationalists seem to believe that superintelligence is imminent; I think this is probably because that l...
Thoughts on integrity and accountability
[Epistemic Status: Early draft version of a post I hope to publish eventually. Strongly interested in feedback and critiques, since I feel quite fuzzy about a lot of this]
When I started studying rationality and philosophy, I had the perspective that people who were in positions of power and influence should primarily focus on how to make good decisions in general and that we should generally give power to people who have demonstrated a good track record of general rationality. I also thought of power as this mostly unconstrained resource, similar to having money in your bank account, and that we should make sure to primarily allocate power to the people who are good at thinking and making decisions.
That picture has changed a lot over the years. While I think there is still a lot of value in the idea of "philosopher kings", I've made a variety of updates that significantly changed my relationship to allocating power in this way:
This was a great post that might have changed my worldview some.
Some highlights:
1.
People's rationality is much more defined by their ability to maneuver themselves into environments in which their external incentives align with their goals, than by their ability to have correct opinions while being subject to incentives they don't endorse. This is a tractable intervention and so the best people will be able to have vastly more accurate beliefs than the average person, but it means that "having accurate beliefs in one domain" doesn't straightforwardly generalize to "will have accurate beliefs in other domains".
I've heard people say things like this in the past, but haven't really taken it seriously as an important component of my rationality practice. Somehow what you say here is compelling to me (maybe because I recently noticed a major place where my thinking was majorly constrained by my social ties and social standing) and it prodded me to think about how to build "mech suits" that not only increase my power but incentivize my rationality. I now have a todo item to "think about principles for incentivizing true belief...
Welp, I guess my life is comic sans today. The EA Forum snuck some code into our deployment bundle for my account in particular, lol: https://github.com/ForumMagnum/ForumMagnum/pull/9042/commits/ad99a147824584ea64b5a1d0f01e3f2aa728f83a
A thing that I've been thinking about for a while has been to somehow make LessWrong into something that could give rise to more personal-wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW, with the key components being that they get updated regularly and serve as more stable references for some concept, as opposed to a post which is usually anchored in a specific point in time.
We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention to them successfully.
I was thinking about this more recently as Arbital is going through another round of slowly rotting away (its search is currently broken, and this is very hard to fix due to annoying Google App Engine restrictions), and thinking about importing all the Arbital content into LessWrong. That might be a natural time to do a final push to enable people to write more wiki-like content on the site.
somehow make LessWrong into something that could give rise to more personal-wikis and wiki-like content. Gwern's writing has a very different structure and quality to it than the posts on LW...We have a pretty good wiki system for our tags, but never really allowed people to just make their personal wiki pages, mostly because there isn't really any place to find them. We could list the wiki pages you created on your profile, but that doesn't really seem like it would allocate attention
Multi-version wikis are a hard design problem.
It's something that people kept trying, when they soured on a regular Wikipedia: "the need for consensus makes it impossible for minority views to get a fair hearing! I'll go make my own Wikipedia where everyone can have their own version of an entry, so people can see every side! with blackjack & hookers & booze!" And then it becomes a ghost town, just like every other attempt to replace Wikipedia. (And that's if you're lucky: if you're unlucky you turn into Conservapedia or Rational Wiki.) I'm not aware of any cases of 'non-consensus' wikis that really succeed - it seems that usually, there's so little editor activity to go around that having ...
One thing that feels cool about personal wikis is that people come up with their own factorization and ontology for the things they are thinking about...So I think in addition to the above there needs to be a way for users to easily and without friction add a personal article for some concept they care about, and to have a consistent link to it, in a way that doesn't destroy any of the benefits of the collaborative editing.
My proposal already provides a way to easily add a personal article with a consistent link, while preserving the ability to do collaborative editing on 'public' articles. Strictly speaking, it's fine for people to add wiki entries for their own factorization and ontology.
There is no requirement for those to all be 'official': there doesn't have to be a 'consensus' entry. Nothing about a /wiki/Acausal_cooperation/gwern user entry requires the /wiki/Acausal_cooperation consensus entry to exist. (Computers are flexible like that.) That just means there's nothing there at that exact URL, or probably better, it falls back to displaying all sub-pages of user entries like usual. (User entries presumably get some sort of visual styling, in the same way that comments o...
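To make the fallback concrete, here is a minimal sketch of that lookup logic. The URL shapes come from the comment above; the `load_page` and `list_user_pages` helpers are hypothetical stand-ins, not actual LessWrong/ForumMagnum APIs:

```python
# Hypothetical sketch of wiki-page resolution with per-user entries and an
# optional consensus entry. The helper callables are made up for illustration.

def resolve_wiki_path(path: str, load_page, list_user_pages):
    parts = path.strip("/").split("/")
    if len(parts) == 3:
        # /wiki/<concept>/<user>: a personal entry, independent of any consensus page
        _, concept, user = parts
        return load_page(concept, owner=user)
    # /wiki/<concept>: show the consensus entry if one exists...
    _, concept = parts
    consensus = load_page(concept, owner=None)
    if consensus is not None:
        return consensus
    # ...otherwise fall back to listing every user's entry for that concept.
    return {"kind": "listing", "entries": list_user_pages(concept)}
```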
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company now was performing assassinations of U.S. citizens.
Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.
I'm probably missing something simple, but what is 356? I was expecting a probability or a percent, but that number is neither.
I think 356 or more people in the population would be needed to make there be a >5% chance of 2+ deaths in a 2-month span from that population.
I think there should be some sort of adjustment for Boeing not being exceptionally sus before the first whistleblower death - shouldn't privilege Boeing until after the first death, should be thinking across all industries big enough that the news would report on the deaths of whistleblowers, which I think makes it not significant again.
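For anyone who wants to reproduce that kind of estimate, here is a rough sketch of the calculation. The baseline death rate is my own assumption (it isn't stated in the thread), and the resulting threshold is quite sensitive to it:

```python
# Rough sketch: smallest population size for which P(2+ deaths in a 2-month window)
# exceeds 5%, using a Poisson approximation. The 0.6%/year baseline death rate is an
# assumed figure for illustration; the answer moves a lot if you change it.
from math import exp

def p_two_or_more_deaths(n_people: int, annual_death_rate: float, months: float) -> float:
    lam = n_people * annual_death_rate * (months / 12)
    return 1 - exp(-lam) * (1 + lam)

annual_rate = 0.006
n = 1
while p_two_or_more_deaths(n, annual_rate, 2) <= 0.05:
    n += 1
print(n)  # ~356 under this assumed rate, in the ballpark of the figure quoted above
```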
I have updated the OpenAI Email Archives to now also include all emails that OpenAI has published in their March 2024 and December 2024 blogposts!
I continue to think reading through these is quite valuable, and even more interesting with the March 2024 and December 2024 emails included.
Btw less.online is happening. LW post and frontpage banner probably going up Sunday or early next week.
Thoughts on voting as approve/disapprove and agree/disagree:
One of the things that I am most uncomfortable with in the current LessWrong voting system is how often I feel conflicted between upvoting something because I want to encourage the author to write more comments like it, and downvoting something because I think the argument that the author makes is importantly flawed and I don't want other readers to walk away with a misunderstanding about the world.
I think this effect quite strongly limits certain forms of intellectual diversity on LessWrong, because many people will only upvote your comment if they agree with it, and downvote comments they disagree with, and this means that arguments supporting people's existing conclusions have a strong advantage in the current karma system. Whereas the most valuable comments are likely ones that challenge existing beliefs and that are rigorously arguing for unpopular positions.
A feature that has been suggested many times over the years is to split voting into two dimensions: one being "agree/disagree" and the other being "approve/disapprove". Only the "approve/disapprove" dimension m...
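For concreteness, a minimal sketch of the data model such a split implies (field names are illustrative, not the actual LessWrong schema): two independent tallies per comment, with only the approve/disapprove axis feeding into ordering.

```python
# Illustrative two-axis vote tally; names are hypothetical, not the real schema.
from dataclasses import dataclass

@dataclass
class CommentVotes:
    karma: int = 0      # approve/disapprove: rewards the author and drives sorting
    agreement: int = 0  # agree/disagree: displayed alongside, but ignored for ranking

def sort_key(votes: CommentVotes) -> int:
    # Only the approve/disapprove axis is used when ordering comments.
    return votes.karma
```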
Having a reaction for "changed my view" would be very nice.
Features like custom reactions give me this feeling that... a language will emerge from allowing people to create reactions, one that will be hard to anticipate but, in retrospect, crucial. It would play a role similar to the one body language plays during conversation, but designed, defined, explicit.
If someone did want to introduce the delta through this system, it might be necessary to give the coiner of a reaction some way of linking an extended description. In casual exchanges... I've found myself reaching for an expression that means "shifted my views in some significant lasting way" that's kind of hard to explain in precise terms, and probably impossible to reduce to one or two words, but it feels like a crucial thing to measure. In my description, I would explain that a lot of dialogue has no lasting impact on its participants; it is just two people trying to better understand where they already are. When something really impactful is said, I think we need to establish a habit of noticing and recognising that.
But I don't know. Maybe that's not the reaction type that will justify the feature. Maybe it will be something we can't think of now.
Generally, it seems useful to be able to take reduced measurements of the mental states of the readers.
a language will emerge from allowing people to create reactions, one that will be hard to anticipate but, in retrospect, crucial
This is essentially the concept of a folksonomy, and I agree that it is potentially both applicable here and quite important.
After many years of pain, LessWrong now has fixed kerning and a consistent sans-serif font on all operating systems. You have probably seen terrible kerning like this over the last few years on LW:
It really really looks like there is no space between the first comma and "Ash". This is because Apple has been shipping an extremely outdated version of Gill Sans with terribly broken kerning, often basically stripping spaces completely. We have gotten many complaints about this over the years.
But it is now finally fixed. However, changing fonts likely has many downstream effects on various layout things being broken in small ways. If you see any buttons or text misaligned, let us know, and we'll fix it. We already cleaned up a lot, but I am expecting a long tail of small fixes.
I don't know what specific change is responsible, but ever since that change, for me the comments are now genuinely uncomfortable to read.
Well, let’s see. Calibri is a humanist sans; Gill Sans is technically also humanist, but much more geometric in design. Geometric sans fonts tend to be less readable when used for body text.
Gill Sans has a lower x-height than Calibri. That (obviously) is the cause of all the “the new font looks smaller” comments.
(A side-by-side comparison of the fonts, for anyone curious, although note that this is Gill Sans MT Pro, not Gill Sans Nova, so the weight [i.e., stroke thickness] will be a bit different than the version that LW now uses.)
Now, as far as font rendering goes… I just looked at the site on my Windows box (adjusting the font stack CSS value to see Gill Sans Nova again, since I see you guys tweaked it to give Calibri priority)… yikes. Yeah, that’s not rendering well at all. Definitely more blurry than Calibri. Maybe something to do with the hinting, I don’t know. (Not really surprising, since Calibri was designed from the beginning to look good on Windows.) And I’ve got a hi-DPI monitor on my Windows machine…
Interestingly, the older version of Gill Sans (seen in the demo on my wiki, linked above) doesn’t have this problem; it renders crisply on Windows. (Note that this is not t...
would not want the comment font be the same as the post font [...] the small font-size that you want to display comments as
I had to increase the zoom level by about 20% (from 110% to 130%) after this change to make the comments readable[1]. This made post text too big to the point where I would normally adjust zoom level downward, but I can't in this case[2], since the comments are on the same site as the posts. Also the lines in both posts and comments are now too long (with greater zoom).
I sit closer to the monitor than standard to avoid the need for glasses[3], so long lines have higher angular distance. In practice modern sites usually have a sufficiently narrow column of text in the middle, so this is almost never a problem. Before the update, LW line lengths were OK (at 110% zoom). At monitor/window width 1920px, Substack's 728px seems fine (at default zoom), but LW's 682px gets ballooned too wide with 130% zoom.
The point is not that accommodating sitting closer to the monitor is an important use case for a site's designer, but that somehow the convergent design of most of the web manages to pass this test, so there might be more reasons for that.
Incidentally, the footnote font si...
Small font-size? No! Same font-size! I don't want the comments in a smaller font OR a different font! I want it all the same font as the posts, including the same size.
This looks good to me:
This looks terrible to me:
We have done lots of user interviews over the years! Fonts are always polarizing, but people have a strong preference for sans serifs at small font sizes (and people prefer denser comment sections, though it's reasonably high variance).
We were down between around 7PM and 8PM PT today. Sorry about that.
It's hard to tell whether we got DDoSed or someone just wanted to crawl us extremely aggressively, but we've had at least a few hundred IP addresses and random user agents request a lot of quite absurd pages, in a way that was clearly designed to avoid bot-detection and blocking methods.
I wish we were more robust to this kind of thing, and I'll be monitoring things tonight to prevent it from happening again, but it would be a whole project to make us fully robust to attacks of this kind. I hope it was a one-off occurrence.
But also, I think we can figure out how to make it so we are robust to repeated DDoS attacks, if that is the world we live in. I do think it would mean strapping in for a few days of spotty reliability while we figure out how to do that.
Sorry again, and boo for the people doing this. It's one of the reasons why running a site like LessWrong is harder than it should be.
A bunch of very interesting emails between Elon, Sam Altman, Ilya and Greg were released (I think in some legal proceedings, but not sure). It would IMO be cool for someone to gather them all and do some basic analysis of them.
TechEmails' substack post with the same emails in a more centralized format includes citations; apparently these are mostly from Elon Musk, et al. v. Samuel Altman, et al. (2024)
LessWrong has a karma system, mostly based off of Reddit's karma system, with some improvements and tweaks to it. I've thought a lot about more improvements to it, but one roadblock that I always run into when trying to improve the karma system, is that it actually serves a lot of different uses, and changing it in one way often means completely destroying its ability to function in a different way. Let me try to summarize what I think the different purposes of the karma system are:
Helping users filter content
The most obvious purpose of the karma system is to determine how long a post is displayed on the frontpage, and how much visibility it should get.
Being a social reward for good content
This aspect of the karma system comes out more when thinking about Facebook "likes". Often when I upvote a post, it is more of a public signal that I value something, with the goal that the author will feel rewarded for putting their effort into writing the relevant content.
Creating common-knowledge about what is good and bad
This aspect of the karma system comes out the most when dealing with debates, though it's present in basically any kar...
I just came back from talking to Max Harms about the Crystal trilogy, which made me think about rationalist fiction, or the concept of hard sci-fi combined with explorations of cognitive science and philosophy of science in general (which is how I conceptualize the idea of rationalist fiction).
I have a general sense that one of the biggest obstacles for making progress on difficult problems is something that I would describe as “focusing attention on the problem”. I feel like after an initial burst of problem-solving activity, most people when working on hard problems, either give up, or start focusing on ways to avoid the problem, or sometimes start building a lot of infrastructure around the problem in a way that doesn’t really try to solve it.
I feel like one of the most important tools/skills that I see top scientists or problem solvers in general use is utilizing workflows and methods that allow them to focus on a difficult problem for days and months, instead of just hours.
I think at least for me, the case of exam environments displays this effect pretty strongly. I have a sense that in an exam environment, if I am given a question, I successfully focus my fu
Oops, I am sorry. We did not intend to take the site down. We ran into an edge-case of our dialogue code that nuked our DB, but we are back up, and the Petrov day celebrations shall continue as planned. Hopefully without nuking the site again, intentionally or unintentionally. We will see.
Petrov Day Tracker:
this scenario had no take-the-site-down option
The year is 2034, and the geopolitical situation has never been more tense between GPT-z16g2 and Grocque, whose various copies run most of the nanobot-armed corporations, and whose utility functions have far too many zero-sum components, relics from the era of warring nations. Nanobots enter every corner of life and become capable of destroying the world in hours, then minutes. Everyone is uploaded. Every upload is watching with bated breath as the Singularity approaches, and soon it is clear that today is the very last day of history...
Then everything goes black, for everyone.
Then everyone wakes up to the same message:
DUE TO A MINOR DATABASE CONFIGURATION ERROR, ALL SIMULATED HUMANS, AIS AND SUBSTRATE GPUS WERE TEMPORARILY AND UNINTENTIONALLY DISASSEMBLED FOR THE LAST 7200000 MILLISECONDS. EVERYONE HAS NOW BEEN RESTORED FROM BACKUP AND THE ECONOMY MAY CONTINUE AS PLANNED. WE HOPE THERE WILL BE NO FURTHER REALITY OUTAGES.
-- NVIDIA GLOBAL MANAGEMENT
Final day to donate to Lightcone in the Manifund EA Community Choice program to tap into the Manifold quadratic matching funds. Small donations in particular have a pretty high matching multiplier (around 2x would be my guess for donations <$300).
I don't know how I feel in general about matching funds, but in this case it seems like there is a pre-specified process that makes some sense, and the whole thing is a bit like a democratic process with some financial stakes, so I feel better about it.
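For intuition on why small donations get a higher multiplier, here is a sketch of the generic quadratic-funding match. This is the textbook formula, not necessarily the exact rules of the Manifund program (real programs also scale matches down to fit a fixed pool, which is why multipliers in practice land closer to the 2x ballpark), and the numbers are purely hypothetical:

```python
# Generic quadratic funding: a project's ideal total is (sum of sqrt(donations))^2,
# so an extra dollar from a small new donation pulls in more matching than the same
# dollar added as part of one large donation.
from math import sqrt

def project_total(donations: list[float]) -> float:
    return sum(sqrt(d) for d in donations) ** 2

def marginal_multiplier(existing: list[float], new_donation: float) -> float:
    """Total-funding increase per dollar of the new donation (before pool scaling)."""
    return (project_total(existing + [new_donation]) - project_total(existing)) / new_donation

existing = [50.0] * 40  # hypothetical: forty $50 donors so far
print(marginal_multiplier(existing, 100.0))   # small extra donation: large multiplier
print(marginal_multiplier(existing, 5000.0))  # large extra donation: much smaller multiplier
```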
I am in New York until Tuesday. DM me if you are in the area and want to meet up and talk about LW, how to use AI for research/thinking/writing, or broader rationality community things.
Currently lots of free time Saturday and Monday.
Is intellectual progress in the head or in the paper?
Which of the two generates more value:
I think which of the two will generate more value determines a lot of your strategy about how to go about creating intellectual progress. In one model what matters is that the best individuals hear about the most important ideas in a way that then allows them to make progress on other problems. In the other model what matters is that the idea gets written up as an artifact that can be processed and evaluated by reviewers and the proper methods of scientific progress, and then built upon when referenced and cited.
I think there is a tradeoff of short-term progress against long-term progress in these two approaches. I think many fields can go through intense periods of progress when focusing on just establishing communication between the best researchers of the field, but would be surprised if that period lasts longer than one or two decades. He...
Thoughts on minimalism, elegance and the internet:
I have this vision for LessWrong of a website that gives you the space to think for yourself, and doesn't constantly distract you with flashy colors and bright notifications and vibrant pictures. Instead it tries to be muted in a way that allows you to access the relevant information, but still gives you the space to disengage from the content of your screen, take a step back and ask yourself "what are my goals right now?".
I don't know how well we achieved that so far. I like our frontpage, and I think the post-reading experience is quite exceptionally focused and clear, but I think there is still something about the way the whole site is structured, with its focus on recent content and new discussion that often makes me feel scattered when I visit the site.
I think a major problem is that LessWrong doesn't make it easy to do only a single focused thing on the site at a time, and it doesn't currently really encourage you to engage with the site in a focused way. We have the library, which I do think is decent, but the sequence navigation experience is not yet fully what I would like it to be, and when...
Thoughts on negative karma notifications:
The motivation was (among other things) several people saying to us "yo, I wish LessWrong was a bit more of a Skinner box, because right now it's so thoroughly not a Skinner box that it just doesn't make it into my habits, and I endorse it being a stronger habit than it currently is."
That depends on what norm is in place. If the norm is to explain downvoting, then people should explain, otherwise there is no issue in not doing so. So the claim you are making is that the norm should be for people to explain. The well-known counterargument is that this disincentivizes downvoting.
you are under no obligation to waste cognition trying to figure them out
There is rarely an obligation to understand things, but healthy curiosity ensures progress on recurring events, irrespective of the morality of their origin. If an obligation would force you to actually waste cognition, don't accept it!
Here is a thing that I think would be cool to analyze sometime: How difficult would it have been for AI systems to discover and leverage historical hardware-level vulnerabilities, assuming we had not discovered them yet. Like, it seems worth an analysis to understand how difficult things like rowhammer, or more recent speculative execution bugs like Spectre and Meltdown would have been to discover, and how useful they would have been. It's not an easy analysis, but I can imagine the answer coming out obviously one way or another if one engaged seriously with the underlying issue.
Thoughts on impact measures and making AI traps
I was chatting with Turntrout today about impact measures, and ended up making some points that I think are good to write up more generally.
One of the primary reasons why I am usually unexcited about impact measures is that I have a sense that they often "push the confusion into a corner" in a way that actually makes solving the problem harder. As a concrete example, I think a bunch of naive impact regularization metrics basically end up shunting the problem of "get an AI to do what we want" into the problem of "prevent the agent from interfering with other actors in the system".
The second one sounds easier, but mostly just turns out to also require a coherent concept and reference of human preferences to resolve, and you got very little from pushing the problem around that way, and sometimes get a false sense of security because the problem appears to be solved in some of the toy problems you constructed.
I am definitely concerned that Turntrout's AUP does the same, just in a more complicated way, but am a bit more optimistic than that, mostly because I do have a sense that in the AUP case there is actually some meaningful reduction go
...Printing more rationality books: I've been quite impressed with the success of the printed copies of R:A-Z and think we should invest resources into printing more of the other best writing that has been posted on LessWrong and the broad diaspora.
I think a Codex book would be amazing, but I think there also exists potential for printing smaller books on things like Slack/Sabbath/etc., and many other topics that have received a lot of other coverage over the years. I would also be really excited about printing HPMOR, though that has some copyright complications to it.
My current model is that there exist many people interested in rationality who don't like reading longform things on the internet and are much more likely to read things when they are in printed form. I also think there is a lot of value in organizing writing into book formats. There is also the benefit that the book now becomes a potential gift for someone else to read, which I think is a pretty common way ideas spread.
I have some plans to try to compile some book-length sequences of LessWrong content and see whether we can get things printed (obviously in coordination with the authors of the relevant pieces).
Forecasting on LessWrong: I've been thinking for quite a while about somehow integrating forecasts and prediction-market like stuff into LessWrong. Arbital has these small forecasting boxes that look like this:
I generally liked these, and think they provided a good amount of value to the platform. I think our implementation would probably take up less space, but the broad gist of Arbital's implementation seems like a good first pass.
I do also have some concerns about forecasting and prediction markets. In particular, I have a sense that philosophical and mathematical progress only rarely benefits from attaching concrete probabilities to things, and works more via mathematical proof and trying to achieve very high confidence in some simple claims by ruling out all other interpretations as obviously contradictory. I am worried that emphasizing probability much more on the site would make it harder to make progress on those kinds of issues.
I also think a lot of intellectual progress is primarily ontological, and given my experience with existing forecasting platforms and Zvi's sequence on prediction markets, they are not very good at resolving ontological confusions and ...
This feature is important to me. It might turn out to be a dud, but I would be excited to experiment with it. If it was available in a way that was portable to other websites as well, that would be even more exciting to me (e.g. I could do this in my base blog).
Note that this feature can be used for more than forecasting. One key use case on Arbital was to see who was willing to endorse or disagree with various claims relevant to the post, and to what extent. That seemed very useful.
I don't think having internal betting markets is going to add enough value to justify the costs involved. Especially since it both can't be real money (for legal reasons, etc) and can't not be real money if it's going to do what it needs to do.
Note that Paul Christiano warns against encouraging sluggish updating by massively publicising people’s updates and judging them on it. Not sure what implementation details this suggests yet, but I do want to think about it.
https://sideways-view.com/2018/07/12/epistemic-incentives-and-sluggish-updating/
We are rolling out some new designs for the post page:
Old:
New:
The key goal was to prioritize the most important information and declutter the page.
The most opinionated choice I made was to substantially de-emphasize karma at the top of the post page. I am not totally sure whether that is the right choice, but I think the primary purpose of karma is to help you decide what to read before you click on a post, which makes it less important for it to be super prominent when you are already on the post page, or when you are following a link from some external website.
The bottom of the post still has very prominent karma UI to make it easy for people to vote after they finished reading a post (and to calibrate on reception before reading the comments).
This redesign also gives us more space in the right column, which we will soon be filling with new side-note UI and an improved inline-react experience.
The mobile UI is mostly left the same, though we did decide to remove post tags from the top of the mobile page and only show them below the post, because they took up too much space.
Feel free to comment here with feedback. I expect we will be iterating...
I really don't like the removal of the comment counter at the top, because it gave a link to skip to the comments. I fairly often want to skip immediately to the comments to e.g. get a vibe for whether the post is worth reading, and having a one-click skip to them is super useful; not having that feels like a major degradation to me.
Ah! Hmm, that's a lot better than nothing, but pretty out of the way and easy to miss. Maybe make it a bit bigger or darker, or bold it? I do like the fact that it's always there as you scroll.
My impression: The new design looks terrible. There's suddenly tons of pointless whitespace everywhere. Also, I'm very often the first or only person to tag articles, and if the tagging button is so inconvenient to reach, I'm not going to do that.
Until I saw this shortform, I was sure this was a Firefox bug, not a conscious design decision.
The new design seems to be influenced by the idea that spreading UI elements across greater distances (reducing their local density) makes the interface less cluttered. I think it's a little bit the other way around: shorter distances, with everything in one place, make it easier to chunk and navigate, but overall the effect is small either way. And the design of spreading the UI elements this way is sufficiently unusual that it will be slightly confusing to many people.
Had a very aggressive crawler basically DDoS-ing us from a few dozen IPs for the last hour. Sorry for the slower server response times. Things should be fixed now.
Random thoughts on game theory and what it means to be a good person
It does seem to me like there doesn't exist any good writing on game theory from a TDT perspective. Whenever I read classical game theory, I feel like the equilibria being described obviously fall apart when counterfactuals are properly brought into the mix (like D/D in prisoner's dilemmas).
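As a toy illustration of that D/D point, here is a minimal sketch (standard payoff numbers, nothing taken from this post) contrasting the classical best-response analysis with the "my exact copy decides the same way I do" reasoning:

```python
# Twin prisoner's dilemma with standard payoffs (higher is better).
# Keys are (my action, other player's action).
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def best_response(their_action):
    # Classical (causal) reasoning: treat the other player's action as fixed.
    return max(["C", "D"], key=lambda mine: PAYOFF[(mine, their_action)])

def twin_choice():
    # TDT-style reasoning against an exact copy: whatever I pick, the copy
    # picks the same thing, so only the diagonal outcomes are on the table.
    return max(["C", "D"], key=lambda a: PAYOFF[(a, a)])

# Defection dominates under the classical analysis...
assert best_response("C") == "D" and best_response("D") == "D"
# ...but an agent that knows it is playing against its own copy cooperates.
assert twin_choice() == "C"
```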
The obvious problem with TDT-based game theory, just as with Bayesian epistemology, is that the vast majority of direct applications are completely computationally intractable. It's kind of obvious what should happen in games with lots of copies of yourself, but as soon as anything participates that isn't a precise copy, everything gets a lot more confusing. So it is not fully clear what a practical game-theory literature from a TDT perspective would look like, though the existing LessWrong literature on Bayesian epistemology might be a good inspiration.
Even when you can't fully compute everything (and we don't even really know how to compute everything in principle), you might still be able to go through concrete scenarios and list considerations and perspectives that incorporate TDT. I guess in t...
...Reading through this, I went "well, obviously I pay the mugger...
...oh, I see what you're doing here."
I don't have a full answer to the problem you're specifying, but something that seems relevant is the question of "How much do you want to invest in the ability to punish defectors [both in terms of maximum power-to-punish, a-la nukes, and in terms of your ability to dole out fine-grained-exactly-correct punishment, a-la skilled assassins]"
The answer to this depends on your context. And how you have answered this question determines whether it makes sense to punish people in particular contexts.
In many cases you might want some amount of randomization, where at least some of the time you punish people really disproportionately, but you don't have to pay the cost of doing so every time.
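To make the randomization point concrete, here is a small sketch I'm adding (illustrative numbers, and it assumes the cost you pay is per act of enforcement rather than proportional to the size of the punishment): punish only a fraction of the time, but scale the punishment up so the expected penalty a defector faces stays constant while your expected enforcement cost shrinks.

```python
# Probabilistic punishment with constant expected deterrence; numbers are illustrative.
base_penalty = 10.0      # penalty that would deter if applied every single time
enforcement_cost = 4.0   # what it costs me each time I actually punish

def randomized_policy(p):
    """Punish with probability p, scaled so the expected penalty is unchanged."""
    scaled_penalty = base_penalty / p
    expected_penalty_to_defector = p * scaled_penalty   # equals base_penalty
    expected_cost_to_me = p * enforcement_cost          # shrinks as p shrinks
    return expected_penalty_to_defector, expected_cost_to_me

for p in (1.0, 0.25, 0.1):
    deterrence, my_cost = randomized_policy(p)
    print(p, deterrence, my_cost)  # deterrence stays at 10.0, my cost drops
```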
Answering a couple of the concrete questions:
Mugger
Right now, in real life, I've never been mugged, and I feel fine basically investing zero effort into preparing for being mugged. If I do get mugged, I will just hand over my wallet.
If I was getting mugged all the time, I'd probably invest effort into a) figuring out what good policies existed ...
Making yourself understandable to other people
(Epistemic status: Processing obvious things that have likely been written many times before, but that are still useful to have written up in my own language)
How do you act in the context of a community that is vetting constrained? I think there are fundamentally two approaches you can use to establish coordination with other parties:
1. Professionalism: Establish that you are taking concrete actions with predictable consequences that are definitely positive
2. Alignment: Establish that you are a competent actor that is acting with intentions that are aligned with the aims of others
I think a lot of the concepts around professionalism arise when you have a group of people who are trying to coordinate but do not actually have aligned interests. In those situations you will have lots of contracts and commitments to actions with well-specified outcomes, and deviations from those outcomes are generally considered bad. It also encourages a certain suppression of agency and a fear of people doing independent optimization in a way that is not transparent to the rest of the group.
Given a lot of these drawbacks, it seems natural to aim for e...
This FB post by Matt Bell on the Delta Variant helped me orient a good amount:
https://www.facebook.com/thismattbell/posts/10161279341706038
...As has been the case for almost the entire pandemic, we can predict the future by looking at the present. Let’s tackle the question of “Should I worry about the Delta variant?” There’s now enough data out of Israel and the UK to get a good picture of this, as nearly all cases in Israel and the UK for the last few weeks have been the Delta variant. [1] Israel was until recently the most-vaccinated major country in the world, and is a good analog to the US because they’ve almost entirely used mRNA vaccines.
- If you’re fully vaccinated and aren’t in a high risk group, the Delta variant looks like it might be “just the flu”. There are some scary headlines going around, like “Half of new cases in Israel are among vaccinated people”, but they’re misleading for a couple of reasons. First, since Israel has vaccinated over 80% of the eligible population, the mRNA vaccine still is 1-((0.5/0.8)/(0.5/0.2)) = 75% effective against infection with the Delta variant. Furthermore, the efficacy of the mRNA vaccine is still very high ( > 90%) against hosp
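For anyone who wants to check the 75% figure in the quote, the arithmetic is just a ratio of attack rates; the 80% coverage and 50%-of-cases numbers come from the quoted post, and everything else follows from them:

```python
# Effectiveness from "share of cases that are vaccinated" plus vaccination coverage:
# compare the per-capita case rate among the vaccinated to the rate among the unvaccinated.
vaccinated_share_of_population = 0.8
vaccinated_share_of_cases = 0.5

rate_ratio = (vaccinated_share_of_cases / vaccinated_share_of_population) / (
    (1 - vaccinated_share_of_cases) / (1 - vaccinated_share_of_population)
)
effectiveness = 1 - rate_ratio
print(f"{effectiveness:.0%}")  # 75%
```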
This seems like potentially a big deal: https://mobile.twitter.com/DrEricDing/status/1402062059890786311
> Troubling—the worst variant to date, the #DeltaVariant is now the new fastest growing variant in US. This is the so-called “Indian” variant #B16172 that is ravaging the UK despite high vaccinations because it has immune evasion properties. Here is why it’s trouble—Thread. #COVID19
@Elizabeth was interested in me crossposting this comment from the EA Forum since she thinks there isn't enough writing on the importance of design on LW. So here it is.
Atlas reportedly spent $10,000 on a coffee table. Is this true? Why was the table so expensive?
Atlas at some point bought this table, I think: https://sisyphus-industries.com/product/metal-coffee-table/. At that link it costs around $2200, so I highly doubt the $10,000 number.
Lightcone then bought that table from Atlas a few months ago at the listing price, since Jonas thought the purchase ...
Since this hash is publicly posted, is there any timescale for when we should check back to see the preimage?
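(For context on how this kind of commitment works in general, and with no knowledge of what was actually hashed here, a minimal sketch: publish the hash now, reveal the preimage later, and anyone can check that the two match.)

```python
import hashlib

# Generic hash-commitment sketch; the actual posted hash and its preimage are
# not known here, so these strings are purely illustrative.
def commit(message):
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

def verify(commitment, revealed_message):
    return commit(revealed_message) == commitment

posted_hash = commit("my secret prediction")        # published today
print(verify(posted_hash, "my secret prediction"))  # True, once revealed
print(verify(posted_hash, "a different message"))   # False
```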
In an attempt to get myself to write more, here is my own shortform feed. Ideally I would write something daily, but we will see how it goes.