All of Daniel's Comments + Replies

I don't think we can engage in much "community-wide introspection" without discussing the object-level issues in question, and I can't think of a single instance of an online discussion of that specific issue going particularly well. 

That's why I'm (mostly) okay with tabooing these sorts of discussions. It's better to deal with the epistemic uncertainty than to risk converging on a false belief.

I do think it implies something about what is happening behind the scenes when their new flagship model is smaller and less capable than what was released a year ago.

jacquesthibs
It’s a free model. Much more likely they have a paid big-boy model coming soon, imo.

I am surprised to hear this, especially “I don't think it has lasting value”. In my opinion, this post has aged incredibly well. Reading it now, knowing that the EA criticism contest utterly failed to do one iota of good with regards to stopping the giant catastrophe on the horizon (FTX), and seeing that the top prizes were all given to long, well-formatted essays providing incremental suggestions on heavily trodden topics while the one guy vaguely gesturing at the actual problem (https://forum.effectivealtruism.org/posts/T85NxgeZTTZZpqBq2/the-effective-... (read more)

Just to pull on some loose strings here, why was it okay for Ben Pace to unilaterally reveal the names of Kat Woods and Emerson Spartz, but not for Roko to unilaterally reveal the names of Alice and Chloe? Theoretically Ben could have titled his post, "Sharing Information About [Pseudonymous EA Organization]", and requested the mods enforce anonymity of both parties, right? Is it because Ben's post was first so we adopt his naming conventions as the default? Is it because Kat and Emerson are "public figures" in some sense? Is it because Alice and Chloe agr... (read more)

Lukas_Gloor
See my comment here. Kat and Emerson were well-known in the community, and they were accused of something that would cause future harm to EA community members as well. By contrast, Chloe isn't particularly likely to make future false allegations even based on Nonlinear's portrayal (I would say). It's different for Alice, since Nonlinear claim she has a pattern. (But with Alice, we'd at least want someone to talk to Nonlinear in private and verify how reliable they seem about negative info they have about Alice, before simply taking their word for it based on an ominous list of redacted names and redacted specifics of accusations.)

Theoretically Ben could have titled his post, "Sharing Information About [Pseudonymous EA Organization]", and requested the mods enforce anonymity of both parties, right?

That would miss the point, rendering the post almost useless. The whole point is to prevent future harm.

Alice and Chloe had Ben, who is a trusted community member, look into their claims. I'd say Ben is at least somewhat "on the hook" for the reliability of the anonymous claims. By contrast, Roko posted a 100-word summary of the Nonlinear incident that got some large number of net downvotes, so he seems to be particularly poorly informed about what even happened.

L'Ésswrong, c'est moi.

I agree that it feels wrong to reveal the identities of Alice and/or Chloe without concrete evidence of major wrongdoing, but I don't think we have a good theoretical framework for why that is.

Ethically (and pragmatically), you want whistleblowers to have the right to anonymity, or else you'll learn of much less wrongdoing than you would otherwise; and because whistleblowers are (usually) in a position of lower social power, anonymity is meant to compensate for that, I suppose.

DanielFilan
Here's my attempt at an answer. Note that nothing in this answer is meant to make any claims about the credibility of Ben's or Nonlinear's accounts.

Ben Pace wrote a post saying "Hey, you know Kat Woods and Emerson Spartz? The people who run Nonlinear? The people listed on Nonlinear's website as running it? Well, here's some info about them qua their role as running Nonlinear." It's an instance of taking a professional identity and relaying claims about behaviour under that known identity. In his post, people using the identities of "Alice" and "Chloe" take the role of Nonlinear employees/contractors/whatever, and talk about stuff they experienced in terms of those roles. In Nonlinear's post, they make claims about how the identities of "Alice" and "Chloe" behaved in their roles as Nonlinear employees/contractors/whatever.

In all of these instances, you're taking an identity/reputation someone has established in a domain, and making claims about behaviour associated with that identity in that domain, so that you can keep the reputation of that identity accurate. So you're not e.g. saying "Hey, you know Joe Bloggs, the person who is publicly identified as CEO of NonCone? He actually secretly has Y weird habit in his personal life" - that would be an instance of cross-domain identification.

So: the way revealing Alice and Chloe's names differs from what Ben did is that it takes an identity established in a domain and links it to cross-domain information. This is bad because it makes it harder to set up identities in domains the way you want, which is valuable. But it could be justified if it turned out that Alice was (e.g.) a famous journalist, and that Alice's claims in Ben's post are totally false - then, knowing that the journalist did sketchy journalist stuff under the name of Alice would be very relevant to judging their reputation as a journalist.

Is it because Kat and Emerson are "public figures" in some sense?

Well, yeah. The whole point of Ben's post was presumably to protect the health of the alignment ecosystem. The integrity/ethical conduct/{other positive adjectives} of AI safety orgs is a public good, and arguably a super important one that justifies hurting individual people. I've always viewed the situation as, having to hurt Kat and Emerson is a tragedy, but it is (or at least can be; obviously it's not if the charges have no merit) justified because of what's at stake. If they weren't working in this space, I don't think Ben's post would be okay.

I agree with asking this question. There's a worthy journalistic norm against naming victims of sexual assault, and a norm in the other direction in favor of naming individuals charged with a crime. You could justify this by arguing that a criminal 'forfeits' the right to remain anonymous, that society has a transparency interest to know who has committed misdeeds. Whereas a victim has not done anything to diminish their default right to privacy.

How you apply these principles to NL depends entirely on who you view as the malefactor (or none/both), and there is demonstrable disagreement from the LW community on this question. So how do you adjudicate which names are ok to post?

Wait, that link goes to an archive page from well after Chloe was hired. When I look back to the screen captures from the period of time that Chloe would have seen, there are no specific numbers given for compensation (would link them myself, but I’m on mobile at the moment).

If the ad that Chloe saw said $60,000 - $100,000 in compensation in big bold letters at the top, then that seems like a bait and switch, but the archives from late 2021 list travel as the first benefit, which seems accurate to what the compensation package actually was.

jefftk
Good catch! That's quite weird -- why would you update a job ad to include compensation information after closing applications? Here are the versions I see:

* 2021-10-22, 2021-11-18, 2021-12-03: "Pay: amount dependent on role fit and employee needs", "The application deadline is November 1st, 2021, midnight UK time"
* 2022-07-03: "Application Deadline: July 21st", "Target Start Date: September", "Compensation: $60,000 - $100,000 / year"

Ben's post has:

So it looks to me like what we were looking at was a post-Chloe version, probably trying to hire her replacement, and the version Chloe would have seen didn't have that information.

Maybe I'm projecting more economic literacy than I should, but anytime I read something like "benefits package worth $X", I always decompose it into its component parts mentally. A benefits package nominally worth $X will provide economic value less than $X, because there is option value lost compared to if you were given liquid cash instead. 

The way I would conceptualize the compensation offered (and the way it is presented in the Nonlinear screenshots) is $1000/month + all expenses paid while traveling around fancy destinations with the family. I ki... (read more)
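
To make the decomposition concrete, here's an illustrative sketch; every number in it, including the liquidity discount, is a hypothetical I'm supplying for the shape of the argument, not a claim about the actual package:

```python
# Illustrative only: all numbers, including the liquidity discount, are
# hypothetical. The point is the shape: a package nominally "worth" $75k that
# arrives mostly in-kind is worth less than $75k of cash to the recipient.
cash_salary = 12_000      # $1,000/month in liquid cash
in_kind_cost = 63_000     # employer's cost of travel, rent, food, etc.
liquidity_discount = 0.7  # assumed: $1 of in-kind spending ~ $0.70 of cash value

nominal_value = cash_salary + in_kind_cost
value_to_employee = cash_salary + in_kind_cost * liquidity_discount
print(nominal_value)      # 75000
print(value_to_employee)  # 56100.0
```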

Yeah, I agree that a compensation package costing $X will be worth less than $X, and as an employee it totally makes sense to adjust for that.

But then I think, separately, it's important that the package did actually cost $X, especially if the $X was supposed to include many of the things that determine your very basic quality of life, like food, toiletries, rent, basic transportation, medical care, etc. I also think it matters how far Chloe got into Nonlinear's hiring process on the assumption that total compensation would be "equivalent to $X", though to be clear, I don't currently know the details of that.

I did notice these. I specifically used the word "load-bearing" because almost all of these either don't matter much or their interpretation is entirely context-dependent. I focused on the salary bullet point because failing to pay an agreed-upon salary is both

1. A big deal, and 

2. Bad in almost any context. 

The other ones that I think are pretty bad are the Adderall smuggling and the driving without a license, but my prior on "what is the worst thing the median EA org has done" is somewhere between willful licensing noncompliance and illegal amphetamine distribution.

Hmm, at least for me many of the quotes above are substantially more load-bearing, but also not totally crazy that this differs between people. I do think in that case it might make sense to say "load bearing for my overall judgement of Nonlinear", since I (and Ben) do think many of the above are on a similar or higher level of being concerning than the salary point, and Ben intended to communicate that.

I also want to highlight that I do currently believe that Alice was asked to smuggle harder drugs across the border than Adderall (though the Adderall one seems confirmed), and that Nonlinear are disputing this because it will be hard to prove, not because it's false (though I am also not, like, 90%+ confident).

Yeah, I've been going back and checking things as they were stated in the original "Sharing Information About Nonlinear" post. Rereading it, I was surprised at how few specific load-bearing factual claims there were at all. Lots of "vibes-based reasoning", as they say. I think the most damning single paragraph with a concrete claim was:

  • Chloe’s salary was verbally agreed to come out to around $75k/year. However, she was only paid $1k/month, and otherwise had many basic things compensated i.e. rent, groceries, travel. This was supposed to make traveling togeth
... (read more)

In terms of relevant factual claims in the post, here are some more: 

  • "Chloe’s and Alice’s finances (along with Kat's and Drew's) all came directly from Emerson's personal funds (not from the non-profit). This left them having to get permission for their personal purchases"
  • "From talking with both Alice and Nonlinear, it turned out that by the end of Alice’s time working there, since the end of February Kat Woods had thought of Alice as an employee that she managed, but that Emerson had not thought of Alice as an employee, primarily just someone who was
... (read more)

I think this is just false. Nonlinear provided enough screenshot evidence to prove that Chloe agreed to exactly the arrangement that she ultimately got. Yes, it was a shitty job, but it was also a shitty job offer, and Chloe seems to have agreed to that shitty job offer. 

I don't think you can describe that paragraph as "straightforwardly false". 

It is correct that Chloe's compensation was verbally agreed to come out to around $70k-$82k a year (the $75k number comes from a conversation with Kat; Kat's job interview transcript seems to suggest the... (read more)

I think what is bugging me about this whole situation is that there doesn't seem to be any mechanism of accountability for the (allegedly) false and/or highly misleading claims made by Alice. You seem to be saying something like, "we didn't make false and/or highly misleading claims, we just repeated the false and/or highly misleading claims that Alice told us, then we said that Alice was maybe unreliable," as if this somehow makes the responsibility (legal, ethical, or otherwise) to tell the truth disappear. 

Here is what Ben said in his post, Closing... (read more)

I think there is totally some shared responsibility for any claims that Ben endorsed, and I also think the post could have done a better job at making many things more explicit quotes, so that they would seem less endorsed, where Ben's ability to independently verify them was limited.

I don't think all retaliation against Alice would be unacceptable. I think if Alice did indeed make important accusatory claims that were inaccurate, she should face some consequences. I think Ben and Lightcone should also lose points for anything that seems endorsed in the post, or... (read more)

Spencer sent us a screenshot about the vegan food stuff 2 hours before publication, which Ben didn't get around to editing in before the post went live, but that's all the evidence I know about that you could somehow argue we had but didn't include. It is not accurate that Nonlinear sent credible signals of having counterevidence before the post went live.

Uh, actually I do think that being sent screenshots showing that claims made in the post are false 2 hours before publication is a credible signal that Nonlinear has counterevidence.

I can’t believe... (read more)

habryka
Sorry, can you please explain what you would have liked us to do at this point? It's 2 hours to publication, which is a major undertaking: we're basically launching something that has been worked on for hundreds of hours. The screenshots relate to one claim in a post with many dozens of claims, and do not directly falsify what is said in the post, but seem to relate to it in a somewhat complicated manner (see this discussion on the post). At the same time, we are getting dozens of calls from Nonlinear who are, from our perspective, using a bunch of really quite aggressive tactics to prevent publication of this post.

Please specify concretely what you would have liked us to do instead? Completely halt publication of the post, against the direct promises we made to our sources, who have shown us credible evidence that they are worried about retaliation? I think the right thing to do is to leave a comment with the evidence, which we were indeed going to do if Kat hadn't already done that within an hour of publication of the post.

Please be concrete about what you would have liked us to do instead? I don't think the screenshots were some kind of major smoking gun or whatever; they were a piece of evidence that was definitely related to one of the claims, but definitely not the kind of thing that would cause me to immediately update and throw out or delay the whole post.

This is a better response than I was expecting. Definitely a few non-sequiturs (Ex: you can’t just add travel expenses onto a $1000/month salary and call that $70,000-$75,000 in compensation. The whole point of money is that it’s fungible and can be spent however you like), but the major accusations appear refuted.

The tone is combative, but if the facts are what Nonlinear alleges then a combative tone seems… appropriate? I’m not sure how I feel about the “Sharing Information About Ben Pace” section, but I do think it was a good idea to mention the “elephant in the room” about Ben possibly white-knighting for Alice, since that’s the only way I can get this whole saga to make sense.

major accusations appear refuted

Note that the accusations Nonlinear lists in the document, with quote marks, are sometimes quite different than what Ben Pace put in his post. So even if you think they've strongly refuted a particular accusation, that doesn't necessarily mean they've refuted something Ben said. 

If the factions were Altman-Brockman-Sutskever vs. Toner-McCauley-D'Angelo, then even assuming Sutskever was an Altman loyalist, any vote to remove Toner would have been tied 3-3.

A 3-3 tie between the founder-CEO of the company, the founder-president of the company, and the chief scientist of the company vs. three people with completely separate day jobs who never interact with rank-and-file employees is not a stable equilibrium. There are ways to leverage this sort of soft power into breaking the formal deadlock, as we saw last week.

It reminds me of the loyalty successful generals like Caesar and Napoleon commanded from their men. The engineers building GPT-X weren't loyal to The Charter, and they certainly weren't loyal to the board. They were loyal to the projects they were building and to Sam, because he was the one providing them resources to build and pumping the value of their equity-based compensation.

Sune
They were not loyal to the board, but it is not clear if they were loyal to The Charter since they were not given any concrete evidence of a conflict between Sam and the Charter.
dr_s
Feels like an apt comparison, given that we now get to see what happens when some kind of Senate tries to cut the upstart general down to size and the latter basically goes "you and what army?".
Tristan Wegner
From your last link:

As the company was doing well recently, with ongoing talks about an investment implying a market cap of $90B, this would mean many employees might have hit their 10x already. That is the highest payout they would ever get. So there is every incentive to cash out now (or as soon as the 2-year lock allows), and zero financial incentive to care about long-term value. This seems worse at aligning employee interests with the long-term interests of the company, even compared to regular (unlimited-growth) equity, where each employee might hope that the valuation could get even higher.

Also:

So it seems the growth cap actually encourages short-term thinking, which seems against their long-term mission. Do you also understand these incentives this way?
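
A toy sketch of the payoff structure being described (the 10x cap is from the linked reporting; the stake and growth numbers are hypothetical):

```python
# Toy payoff for capped profit-participation units: returns are capped at a
# fixed multiple (reportedly 10x for early employees), so once the implied
# multiple reaches the cap, further company growth adds nothing for the holder.
def capped_payout(stake: float, growth_multiple: float, cap: float = 10.0) -> float:
    return stake * min(growth_multiple, cap)

print(capped_payout(100_000, 8.0))   # 800000.0: below the cap, upside still matters
print(capped_payout(100_000, 12.0))  # 1000000.0: the cap binds
print(capped_payout(100_000, 50.0))  # 1000000.0: same as 12x, zero marginal incentive
```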

I think it's almost always fine for criticized authors to defend themselves in the comments, even if their defense isn't very good.

I think that's true, but also: When people ask the authors for things (edits to the post, time-consuming engagement), especially if the request is explicit (as in this thread), it's important for third parties to prevent authors from suffering unreasonable costs by pushing back on requests that shouldn't be fulfilled.

In my original answers I address why this is not the case (private communication serves this purpose more naturally).

This stood out to me as strange. Are you referring to this comment?

And regardless of these resources you should of course visit a nutritionist (even if very sporadically, or even just once when you start being vegan) so that they can confirm the important bullet points, whether what you're doing broadly works, and when you should worry about anything. (And again, anecdotally this has been strongly stressed and acknowledged as necessary by

... (read more)
Martín Soto
No, I was referring to this one, and the ones in that thread, all part of an exchange in which Elizabeth reached out to me for clarification. In the one you quoted I still wasn't going into that much detail. I'll answer your comment nonetheless.

No, what I was saying wasn't as extreme. I was just saying that it's good general practice to visit a nutritionist at least once, learn some of the nutritional basics, and perform blood tests periodically (every 1 or 2 years). That's not contradictory with the fact that most vegans won't need to pour a noticeable number of hours into all this (or, better said, they will have to do that for the first 1-2 months, but mostly not afterwards). Also, there is no one-page be-all end-all for any kind of nutrition, not only veganism. But there certainly exist a lot of fast and easy basic resources.

Yes, of course, we were talking about veganism. But in the actual comment I was referring to, I did talk about epistemic implications, not only implications for animal ethics (as big as they already are). What I meant is: "if there is something that worries me even more than the animal ethics consequences of this (which are big), it is breeding a community that shies away from basic ethical responsibility at the earliest possibility and rationalizes the choice (because of the consequences this can have for navigating the precipice)".

The real reason why it's enraging is that it rudely and dramatically implies that Eliezer's time is much more valuable than the OP's

It does imply that, but it's likely true that Eliezer's time is more valuable (or at least in more demand) than the OP's. I also don't think Eliezer (or anyone else) should have to spend all that much effort worrying about whether what they're about to say might come off as impolite or uncordial.

If he actually wanted to ask OP what the strongest point was he should have just DMed him instead of engineering this public spectacl

... (read more)

It seems you and Paul are correct. I still think this suggests that there is something deeply wrong with RLHF, but less in the “intentionally deceives humans” sense, and more in the “this process consistently writes false data to memory” sense.

Perhaps I am misunderstanding Figure 8? I was assuming that they asked the model for the answer, then asked the model what probability it thinks that that answer is correct. Under this assumption, it looks like the pre-trained model outputs the correct probability, but the RLHF model gives exaggerated probabilities because it thinks that will trick you into giving it higher reward.

 

In some sense this is expected. The RLHF model isn't optimized for helpfulness; it is optimized for perceived helpfulness. It is still disturbing that "alignment" has made the model objectively worse at giving correct information.

Perhaps I am misunderstanding Figure 8? I was assuming that they asked the model for the answer, then asked the model what probability it thinks that that answer is correct.

Yes, I think you are misunderstanding figure 8. I don't have inside information, but without explanation "calibration" would almost always mean reading it off from the logits. If you instead ask the model to express its uncertainty I think it will do a much worse job, and the RLHF model will probably perform similarly to the pre-trained model. (This depends on details of the human feedb... (read more)

kyleherndon
I was also thinking the same thing as you, but after reading paulfchristiano's reply, I now think it means you can use the model to generate probabilities of next tokens, and that those next tokens are correct about as often as those probabilities indicate. That is to say, it's not referring to the main way of interfacing with GPT-n (wherein a temperature schedule determines how often it picks something other than the option with the highest assigned probability), i.e., not asking the model "in words" for its predicted probabilities.
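
For concreteness, here's a minimal sketch of the "reading probabilities off the logits" interpretation, using GPT-2 via Hugging Face transformers as a stand-in (the models studied in the paper aren't public, so this illustrates the mechanism, not the paper's exact setup):

```python
# Minimal sketch: score answer options by the model's next-token probabilities
# instead of asking the model "in words". GPT-2 here is a stand-in only.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Q: Is the Earth round? Answer yes or no.\nA:"
options = [" yes", " no"]  # leading spaces matter to GPT-2's tokenizer

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # logits for the next token
probs = torch.softmax(next_token_logits, dim=-1)

for option in options:
    token_id = tokenizer.encode(option)[0]  # first token of each option
    print(f"P({option!r}) = {probs[token_id].item():.3f}")

# A model is well calibrated in this sense if, across many questions, options
# assigned probability p are correct about p of the time. Greedy (temperature-0)
# decoding would instead always emit the argmax option.
```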

My guess is that RLHF is unwittingly training the model to lie.

If I ask a question and the model thinks there is an 80% the answer is "A" and a 20% chance the answer is "B," I probably want the model to always say "A" (or even better: "probably A"). I don't generally want the model to say "A" 80% of the time and "B" 20% of the time.

In some contexts that's worse behavior. For example, if you ask the model to explicitly estimate a probability it will probably do a worse job than if you extract the logits from the pre-trained model (though of course that totally goes out the window if you do chain of thought). But it's n... (read more)

There are reasonable and coherent forms of moral skepticism in which the statement, "It is morally wrong to eat children and mentally disabled people," is false or at least meaningless. The disgust reaction upon hearing the idea of eating children is better explained by the statement, "I don't want to live in a society where children are eaten," which is much more well-grounded in physical reality.

What is disturbing about the example is that this seems to be a person who believes that objective morality exists, but that it wouldn't entail that eating children is wrong. This is indeed a red flag that something in the argument has gone seriously wrong.

MSRayne
My problem is more the lack of moral realism to begin with. I apparently need to work on a post about this. I am sick and tired of the lack of belief in objective morality around here; it leads people to entertain such insane thoughts to begin with, and it needs some pushback.

While many of these claims are "old news" to those communities, many of these claims are fresh.

Can you clarify which specific claims are new? A claim which hasn’t been previously reported in a mainstream news article might still be known to people who have been following community meta-drama.

The baseline rate reasoning is flawed because a) sexual assault remains the most underreported crime, so there is likely instead an iceberg effect,

I’m not sure how this refutes the base rate argument. The iceberg effect exists for both the rationalist community ... (read more)

It is appropriate to minimize things which are in fact minimal. The majority of these issues have been litigated (metaphorically) before. The fact that they are being brought up over and over again in media articles does not ipso facto mean that the incident has not been adequately dealt with. You can make the argument that these incidents are part of a larger culture problem, but you have to actually make the argument. We're all Bayesians here, so look at the base rates.

 

The one piece of new information which seems potentially important is the part where Sonia Joseph says, "he followed her home and insisted on staying over." I would like to see that incident looked into a bit more.

pmk
Given the gender ratio in EA and rationality, it would be surprising if women in EA/rationality didn't experience more harassment than women in other social settings with more even gender ratios.

Consider a simplified case: suppose 1% of guys harass women and EA/rationality events are 10% women. Then in a group of 1000 EAs/rationalists there would be 9 harassers targeting 100 women. But if the gender ratio were even, there would be 5 harassers targeting 500 women. So the probability of each woman being targeted by a harasser is lower in a group with a more even gender ratio. For women in EA/rationality to experience the same amount of harassment as women in other social settings, the men in EA/rationality would need to be less likely to harass women than the average man in other social settings.

It is also possible that the average man in EA/rationality is more likely to harass women than the average man in other social settings. I can think of some reasons for this (being socially clumsy, open to breaking social norms, etc.) and some against (being too shy to make advances, aspiring to high moral standards in EA, etc.).
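
The same toy model, made explicit as a sketch (the 1% rate is the comment's illustrative assumption, not an empirical estimate):

```python
# Per-woman exposure to harassers as a function of the group's gender ratio.
# The harassment rate is the comment's illustrative assumption only.
def harassers_per_woman(group_size: int, frac_women: float, harass_rate: float = 0.01) -> float:
    men = group_size * (1 - frac_women)
    women = group_size * frac_women
    return (men * harass_rate) / women

print(harassers_per_woman(1000, 0.10))  # 0.09: 9 harassers per 100 women
print(harassers_per_woman(1000, 0.50))  # 0.01: 5 harassers per 500 women
```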
whistleblower67
While many of these claims are "old news" to those communities, many of these claims are fresh. The baseline rate reasoning is flawed because a) sexual assault remains the most underreported crime, so there is likely instead an iceberg effect, and b) women who were harassed/assaulted have left the movement, which changes your distribution, and c) women who would enter your movement otherwise now stay away due to whisper networks and bad vibes.