Small edges are why there's so much money gambled in poker. 

It's hard to reach a skill level where you make money on 50% of nights, but it's not that hard to reach a point where you're "only" losing 60% of the time. (That's still significantly worse than playing roulette, but compared to chess competitions, where hobbyists never win any sort of prize, you've at least got a chance.)
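
To put rough numbers on that intuition, here's a minimal Monte Carlo sketch (my own toy model with made-up parameters, not from the comment): it estimates how often a night ends in profit for a casual roulette player making even-money bets on a European wheel, versus a slightly losing poker player whose per-hand results have a small negative mean and high variance.

```python
import random

def roulette_night(num_bets=20, bet=1.0):
    """Net result of one night of European roulette, betting on red each spin
    (18 of 37 pockets win, so each bet carries a ~2.7% house edge)."""
    return sum(bet if random.random() < 18 / 37 else -bet for _ in range(num_bets))

def poker_night(num_hands=200, mean_per_hand=-0.05, sd_per_hand=5.0):
    """Toy model of a slightly losing poker player: each hand's result is drawn
    from a normal distribution with a small negative mean and high variance.
    Units and parameters are purely illustrative."""
    return sum(random.gauss(mean_per_hand, sd_per_hand) for _ in range(num_hands))

def winning_night_rate(night_fn, trials=50_000):
    """Fraction of simulated nights that end with a profit."""
    return sum(night_fn() > 0 for _ in range(trials)) / trials

if __name__ == "__main__":
    print("Roulette, 20 even-money bets :", round(winning_night_rate(roulette_night), 3))
    print("Roulette, 200 even-money bets:",
          round(winning_night_rate(lambda: roulette_night(num_bets=200)), 3))
    print("Slightly losing poker player :", round(winning_night_rate(poker_night), 3))
```

The point of the sketch is just that a small per-hand deficit still leaves a decent fraction of winning nights, whereas the roulette player's winning-night rate falls the longer they play.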

You criticize Altman for pushing ahead with dangerous AI tech, but then most of what you'd spend the money on is pushing ahead with tech that isn't directly dangerous. Sure, that's better. But it doesn't solve the issue that we're headed into an out-of-control future. Where's the part where we use money to improve the degree to which thoughtful high-integrity people (or prosocial AI successor agents with those traits) are able to steer where this is all going? 
(Not saying there are easy answers.) 

I mean, personality disorders are all about problems in close interpersonal relationships (or a lack of interest in such relationships, in schizoid personality disorder), and trust is always really relevant in such relationships, so I think this could be a helpful lens for looking at things. At the same time, I'd be very surprised if you could derive new, helpful treatment approaches from this sort of armchair reasoning (even just at the level of hypothesis generation to be subjected to further testing).

Also, some of these seem a bit strained: 

  • Narcissistic personality disorder seems to be more about superiority and entitlement than expecting others to be trusting. And narcissism is correlated with Machiavellianism, where a feature of that is having a cynical worldview (i.e., thinking people in general aren't trustworthy). If I had to frame narcissism in trust terms, I'd maybe say it's an inability to value or appreciate trust?
  • Histrionic personality disorder has a symptom criterion of "considers relationships to be more intimate than they actually are." I guess maybe you could say "since (by your hypothesis) they expect people to not care, once someone cares, a person with histrionic personality disorder is so surprised that they infer that the relationship must be deeper than it is." A bit strained, but maybe can be made to fit.
  • Borderline: I think there's more of a pattern to splitting than randomness (e.g., you rarely have splitting in the early honeymoon stage of a relationship), so maybe something like "fluctuating" would fit better. But also, I'm not sure that what fluctuates is always about trust. Sure, sometimes splitting manifests in accusing the partner of cheating out of nowhere, but in other cases, the person may feel really annoyed at the partner in a way that isn't related to trust. (Or it could be related to trust, but going in a different direction: they may resent the partner for trusting them because they have such a low view of themselves that anyone who trusts them must be unworthy.)
  • Dependent: To me the two things you write under it seem to be in tension with each other.

Edit:

Because it takes eight problems currently considered tied up with personal identity and essentially unsolvable [...]

I think treatment success probabilities differ between personality disorders. For some, calling them "currently considered essentially unsolvable" seems wrong.

And I'm not sure how much of OCPD is explained by calling it a persistent form of OCD – they seem very different. You'd expect "persistent" to make something worse, but OCPD tends to be less of an issue for the person who has it (though it can be difficult for others around them). Also, some symptoms seem to be non-overlapping: with OCPD, I don't think intrusive thoughts play a role (I might be wrong?), whereas intrusive thoughts are a distinct and telling feature of some presentations of OCD.

Dilemma:

  • If the Thought Assessors converge to 100% accuracy in predicting the reward that will result from a plan, then a plan to wirehead (hack into the Steering Subsystem and set reward to infinity) would seem very appealing, and the agent would do it.
  • If the Thought Assessors don’t converge to 100% accuracy in predicting the reward that will result from a plan, then that’s the very definition of inner misalignment!

    [...]

    The thought “I will secretly hack into my own Steering Subsystem” is almost certainly not aligned with the designer’s intention. So a credit-assignment update that assigns more positive valence to “I will secretly hack into my own Steering Subsystem” is a bad update. We don’t want it. Does it increase “inner alignment”? I think we have to say “yes it does”, because it leads to better reward predictions! But I don’t care. I still don’t want it. It’s bad bad bad. We need to figure out how to prevent that particular credit-assignment Thought Assessor update from happening.

    [...]

    I think there’s a broader lesson here. I think “outer alignment versus inner alignment” is an excellent starting point for thinking about the alignment problem. But that doesn’t mean we should expect one solution to outer alignment, and a different unrelated solution to inner alignment. Some things—particularly interpretability—cut through both outer and inner layers, creating a direct bridge from the designer’s intentions to the AGI’s goals. We should be eagerly searching for things like that.

Yeah, there definitely seems to be something off about that categorization. I've thought a bit about how this stuff works in humans, particularly in this post of my moral anti-realism sequence. To give some quotes from that:

One of many takeaways I got from reading Kaj Sotala’s multi-agent models of mind sequence (as well as comments by him) is that we can model people as pursuers of deep-seated needs. In particular, we have subsystems (or “subagents”) in our minds devoted to various needs-meeting strategies. The subsystems contribute behavioral strategies and responses to help maneuver us toward states where our brain predicts our needs will be satisfied. We can view many of our beliefs, emotional reactions, and even our self-concept/identity as part of this set of strategies. Like life plans, life goals are “merely” components of people’s needs-meeting machinery.[8]

Still, as far as components of needs-meeting machinery go, life goals are pretty unusual. Having life goals means to care about an objective enough to (do one’s best to) disentangle success on it from the reasons we adopted said objective in the first place. The objective takes on a life of its own, and the two aims (meeting one’s needs vs. progressing toward the objective) come apart. Having a life goal means having a particular kind of mental organization so that “we” – particularly the rational, planning parts of our brain – come to identify with the goal more so than with our human needs.[9]

[...]

There’s a normative component to something as mundane as choosing leisure activities. [E.g., going skiing in the cold, or spending the weekend cozily at home.] In the weekend example, I’m not just trying to assess the answer to empirical questions like “Which activity would contain fewer seconds of suffering/happiness” or “Which activity would provide me with lasting happy memories.” I probably already know the answer to those questions. What’s difficult about deciding is that some of my internal motivations conflict. For example, is it more important to be comfortable, or do I want to lead an active life? When I make up my mind in these dilemma situations, I tend to reframe my options until the decision seems straightforward. I know I’ve found the right decision when there’s no lingering fear that the currently-favored option wouldn’t be mine, no fear that I’m caving to social pressures or acting (too much) out of akrasia, impulsivity or some other perceived weakness of character.[21]

We tend to have a lot of freedom in how we frame our decision options. We use this freedom, this reframing capacity, to become comfortable with the choices we are about to make. In case skiing wins out, then “warm and cozy” becomes “lazy and boring,” and “cold and tired” becomes “an opportunity to train resilience / apply Stoicism.” This reframing ability is a double-edged sword: it enables rationalizing, but it also allows us to stick to our beliefs and values when we’re facing temptations and other difficulties.

[...]

Visualizing the future with one life goal vs. another

Whether a given motivational pull – such as the need for adventure, or (e.g.,) the desire to have children – is a bias or a fundamental value is not set in stone; it depends on our other motivational pulls and the overarching self-concept we’ve formed.

Lastly, we also use “planning mode” to choose between life goals. A life goal is a part of our identity – just like one’s career or lifestyle (but it’s even more serious).

We can frame choosing between life goals as choosing between “My future with life goal A” and “My future with life goal B” (or “My future without a life goal”). (Note how this is relevantly similar to “My future on career path A” and “My future on career path B.”)

[...]

It’s important to note that choosing a life goal doesn’t necessarily mean that we predict ourselves to have the highest life satisfaction (let alone the most increased moment-to-moment well-being) with that life goal in the future. Instead, it means that we feel the most satisfied about the particular decision (to adopt the life goal) in the present, when we commit to the given plan, thinking about our future. Life goals inspired by moral considerations (e.g., altruism inspired by Peter Singer’s drowning child argument) are appealing despite their demandingness – they can provide a sense of purpose and responsibility.

So, it seems like we don't want "perfect inner alignment," at least not if inner alignment is about accurately predicting reward and then forming the plan of doing whatever gives you the most reward. Also, there's a concept of "lock-in," or identifying more with the long-term planning part of your brain than with the underlying needs-meeting machinery. Lock-in can be dangerous (if you lock in something that isn't automatically corrigible), but it might also be dangerous not to lock in anything (because that means you don't know what other goals will form later on).

Idk, the whole thing seems to me like brewing a potion in Harry Potter, except that you don't have a recipe book and there's luck involved, too. "Outer alignment," in a minimally sufficient degree (as in: the agent tends to get rewards when it takes actions towards the intended goal), increases the likelihood that the agent gets broadly pointed in the right direction, so that the intended goal is at least among the things the internal planner considers reinforcing itself around / orienting itself towards. But whether the intended goal gets picked over the alternatives (instrumental requirements for general intelligence, or alien motivations the AI might initially have), who knows. Like with raising a child, sometimes they turn out the way the parents intend, sometimes not at all. There's probably a science to figuring out how to make some outcomes more likely, but even if we could do that for human children developing into adults with fixed identities, there's still the question of how to find analogous patterns in (brain-like) AI. Tough job.

Conditioned Taste Aversion (CTA) is a phenomenon where, if I get nauseous right now, it causes an aversion to whatever tastes I was exposed to a few hours earlier—not a few seconds earlier, not a few days earlier, just a few hours earlier. (I alluded to CTA above, but not its timing aspect.) The evolutionary reason for this is straightforward: a few hours is presumably how long it typically takes for a toxic food to induce nausea.

That explains why my brother no longer likes mushrooms. When we were little, he liked them; we ate mushrooms at a restaurant and were then driven along curvy mountain roads later that day with the family. He got carsick and vomited, and afterwards he had an intense hatred of mushrooms.
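
To make the timing-window point concrete, here's a minimal toy sketch (my own illustration with made-up window values, not anything from the quoted post): a credit-assignment rule that only reaches back a few hours blames the restaurant mushrooms rather than a snack eaten minutes before the nausea or breakfast much earlier in the day.

```python
from dataclasses import dataclass

@dataclass
class TasteEvent:
    food: str
    time_h: float  # hours since the start of the day

def cta_update(taste_events, nausea_time_h, window=(1.0, 6.0)):
    """Toy conditioned-taste-aversion rule: a nausea event creates an aversion
    to any taste experienced between window[0] and window[1] hours earlier.
    The window endpoints are illustrative guesses, not measured biology."""
    aversions = []
    for event in taste_events:
        delay = nausea_time_h - event.time_h
        if window[0] <= delay <= window[1]:
            aversions.append(event.food)
    return aversions

if __name__ == "__main__":
    day = [
        TasteEvent("toast", 8.0),        # breakfast: too long before the nausea
        TasteEvent("mushrooms", 12.5),   # lunch at the restaurant
        TasteEvent("candy", 15.9),       # snack in the car: too close to the nausea
    ]
    print(cta_update(day, nausea_time_h=16.0))  # -> ['mushrooms']
```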

Is that sort of configuration even biologically possible (or realistic)? I have no deep understanding of immunology, but I think bad reactions to vaccines have little to nothing to do with whether you're up to date on previous vaccines. So far, I don't think we're good at predicting who will react with more severe side effects than average (and even if we could, it's not like it's easy to tweak the vaccine, apart from tradeoff-y things like lowering the vaccination dose).

My point is that I have no evidence that he ended up reading most of the relevant posts in their entirety. I don't think people who read all the posts in their entirety should just go ahead and unilaterally dox discussion participants, but I feel like people who have only read parts of them (or only secondhand sources) should do so even less.

Also, at the time, I interpreted Roko's "request for a summary" more as a way for him to sneer at people. His "summary" had a lot of loaded terms and subjective judgments in it. Maybe this is a style thing, but I find that people should only (at most) write summaries like that if they're already well-informed. (E.g., Zvi's writing style can be like that, and I find it fine because he's usually really well-informed. But if I see him make a half-assed take on something he doesn't seem to be well-informed on, I'd downvote.) 

See my comment here.

Kat and Emerson were well-known in the community and they were accused of something that would cause future harm to EA community members as well. By contrast, Chloe isn't particularly likely to make future false allegations even based on Nonlinear's portrayal (I would say). It's different for Alice, since Nonlinear claim she has a pattern. (But with Alice, we'd at least want someone to talk to Nonlinear in private and verify how reliable they seem about negative info they have about Alice, before simply taking their word for it based on an ominous list of redacted names and redacted specifics of accusations.)

Theoretically Ben could have titled his post, "Sharing Information About [Pseudonymous EA Organization]", and requested the mods enforce anonymity of both parties, right?

That would miss the point, rendering the post almost useless. The whole point is to prevent future harm. 

but not for Roko to unilaterally reveal the names of Alice and Chloe?

Alice and Chloe had Ben, who is a trusted community member, look into their claims. I'd say Ben is at least somewhat "on the hook" for the reliability of the anonymous claims.

By contrast, Roko posted a 100-word summary of the Nonlinear incident that got a large number of net downvotes, so he seems particularly poorly informed about what even happened.

Some conditions for when I think it's appropriate for an anonymous source to make a critical post about a named person on the forum:

  • Is the accused a public person or do they run an organization in the EA or rationality ecosystem?
  • Or: Is the type of harm the person is accused of something that the community benefits from knowing?
  • Did someone who is non-anonymous and trusted in the community talk to the anonymous accuser and verify claims and (to some degree*) stake their reputation for them?

*I think there should be a role of "investigative reporter": someone who verifies that the anonymous person is not obviously unreliable. I don't think the investigative reporter is 100% on the hook for anything that turns out to be false or misleading, but they are on the hook for things like doing a poor job of verifying claims or of making sure there aren't any red flags about a person.

(It's possible for anonymous voices to make claims without the help of an "investigative reporter"; however, in that case, I think the appropriate community reaction should be to give little-to-no credence to such accusations. After all, they could be made by someone whose reputation has already been justifiably tarnished.)

On de-anonymizing someone (and preventing an unfair first-mover advantage): 

  • In situations where the accused parties are famous and have lots of influence, we can view anonymity protection as evening the playing field rather than conferring an unfair advantage. (After all, famous and influential people already have a lot of advantages on their side – think of Sam Altman in the conflict with the OpenAI board.)
  • If some whistleblower displays a pattern/history of making false accusations, that implies potential for future harm, so it seems potentially appropriate to warn others about them (but you'd still want to be cautious, take your time to evaluate evidence carefully, and not fall prey to a smear campaign by the accused parties – see DARVO).
  • If there's no pattern/history of false accusations, but the claims by a whistleblower turn out to be misleading in more ways than one would normally expect in the heat of things (but not egregiously so), then the situation is going to be unsatisfying, but personally I'd err on the side of protecting anonymity. (I think the case for this is strongest when the accused parties are more powerful/influential than the accusers.) I'd definitely protect anonymity if the accusations continue to seem plausible but are impossible to prove / there remains lots of uncertainty.
  • I think de-anonymization, if it makes sense under some circumstances, should only be done after careful investigation, and never in the heat of the moment. In conflicts that are fought publicly, it's very common for different sides to gain momentum temporarily and then lose it again, depending on who had the last word.

Very thoughtful post. I liked that you delved into this out of interest even though you aren't particularly involved in this community, but then instead of just treating it as fun but unproductive gossip, you used your interest to make a high-value contribution! 

It changed my mind in some places (I had a favorable reaction to the initial post by Ben; also, I still appreciate what Ben tried to do). 

I will comment on two points that I didn't like, but I'm not sure to what degree this changes your recommended takeaways (more on this below).

They [Kat and Emerson] made a major unforced tactical error in taking so long to respond and another in not writing in the right sort of measured, precise tone that would have allowed them to defuse many criticisms.

I don't like that this makes it sound like the issue is only (or mostly) about tone.

I did update towards the lawsuit threat being more about tone than I initially thought. I used to think that any threat of a lawsuit is strong evidence that someone is a bad actor; I now think it's sometimes okay to mention the last resort of a lawsuit if you think you're about to be defamed.

At the same time, I'd say it was hard for Lightcone to come away with that interpretation when Emerson used phrases like 'maximum damages permitted by law' (a phrasing optimized for intimidation). Emerson did so in the context where one of the things he was accused of was unusually hostile negotiation and intimidation tactics! So, given the context and "tone" of the lawsuit threat, I feel like it made a lot of sense for Lightcone to see their worst concerns about Emerson "confirmation-boosted" when he made that lawsuit threat.

In any case, and more to my point about tone vs. other things, I want to speak about the newer update by Nonlinear that came three months after the original post by Ben. Criticizing tone there is like saying "they lack expert skills at defusing tensions; not ideal, but also let's not be pedantic." It makes it sound like all they need to become great bosses is a bit of tactfulness training. However, I think there are more fundamental things to improve on, and these things lend a bunch of credibility to why someone might have a bad time working with them. (Also, they had three months to write that post, and it's quite optimized for presentation in several ways, so it's not like we should apply low standards to it.) I criticized some aspects of their post here and here.

In short, I feel like they reacted by (1) conceding essentially nothing they could have done differently and (2) going on the attack with outlier-y, black-and-white framings against not just Alice but also Chloe, in a way that I think is probably more unfair/misleading/uncharitable about Chloe than what Chloe said about them. (I say "probably" because I didn't spend a lot of time re-reading Ben's original post to separate which claims were made by Alice vs. Chloe, doing the same for Nonlinear's reply, and checking whether their quotes-that-aren't-quotes ascribe statements to Chloe that she didn't actually say.) I think that's a big deal because their reaction pattern-matches to how someone would react if they did indeed have a "malefactor" pattern of frequently causing interpersonal harm.

Just like it's not okay to make misleading statements about others solely because you struggled with negative emotions in their presence, it's also (equally) not okay to make misleading statements solely because someone is accusing you of being a bad boss or leader. It can be okay to see red in the heat of battle, but it's an unfortunate dynamic because it blurs the line between people who are merely angry and hurt and people who are character-wise incapable of reacting appropriately to appropriate criticism. (This also ties into the topic of "adversarial epistemology": if you think the existence of bad actors is a sufficiently big problem, you want to create social pressure for good-but-misguided actors to get their shit together and stop acting in a way that lends cover to bad actors.)

Eliezer recently re-tweeted this dismissive statement about DARVO. I think it misses the point. Sure, if the person who accuses you is a malicious liar, or deluded to a point where it has massively destructive effects and is a pattern, then, yeah, you're forced to fight back. So, point taken: sometimes the person who initially appears to be the victim isn't actually the victim. However, other times the truth is at least somewhat towards the middle, i.e., the person accusing you of something may have some points. In that case, you can address what happened without character-assassinating them in return, especially if you feel like you had a lot of responsibility for them having had a bad time.

Defending Alice is not the hill I want to die on (although I'm not saying I completely trust Nonlinear's picture of her), but I really don't like the turn things took towards Chloe. I feel like it's messed up that several commenters (at one point my comment here had 9 votes and -5 overall karma, and high disagreement votes) came away with the impression that it might be appropriate to issue a community-wide warning about Chloe as someone with a pattern of being destructive (and to de-anonymize her, which would further send the signal that the community considers her a toxic person). I find that a really scary outcome for whistleblower norms in the community. Note that this isn't because I think it's never appropriate to de-anonymize someone.

Here is the list of values that are important to me in this whole affair and its context:

  • I want whistleblower-type stuff to come to light because I think the damage bad leaders can do is often very large
  • I want investigations to be fair. In many cases, this means giving accused parties time to respond
  • I understand that there’s a phenotype of personality where someone has a habit of bad-talking others through false/misleading/distorted claims, and I think investigations (and analysis) should be aware of that

(FWIW, I assume that most people who vehemently disagree with me about some of the things I say in this comment and elsewhere would still endorse these above values.)

So, again, I'm not saying I find this a scary outcome because I have an "always believe the victim" mentality. (Your post fortunately doesn't strawman others like that, but there were comments on Twitter and Facebook that pushed this point, which I thought was uncalled for.)

Instead, consider for a moment the world where I'm right that:

  • Chloe isn't a large outlier on any relevant personality dimension, except that she was perhaps significantly below average at standing up for her interests/voicing her boundaries (something that may even have been selected for in the Nonlinear hiring process)

This is what I find most plausible based on a number of data points. In that world, I think something about the swing of the social pendulum went wrong if the result of Chloe sharing her concerns is that things end up worse for her. (I'm not saying this is currently the case – I'm saying it would be the case if we fully bought into Nonlinear's framing, or into the people making the most negative comments about both Chloe and Alice, without flagging that many people familiar with the issue thought Alice was a less reliable narrator than Chloe, etc.)

Of course, I've focused a lot on a person who is currently anonymized. It's fair to say that this is unfair, given that Nonlinear have their reputation at stake all out in the open. Like I said elsewhere, it's not like I think they deserved the full force of this.

These are tough tradeoffs to make. Unfortunately, we need some sort of policy to react to people who might be bad leaders. Among all the criticisms about Ben's specific procedure, I don't want this part to be de-emphasized.

The community mishandled this so badly and so comprehensively that inasmuch as Nonlinear made mistakes in their treatment of Chloe or Alice, for the purposes of the EA/LW community, the procedural defects have destroyed the case.

I'm curious what you mean by the clause "for the purposes of the EA/LW community." I don't want to put words into your mouth, but I'd be sympathetic to a claim that goes as follows: from a purely procedural perspective on what a fair process should look like for a community deciding that a particular group should be cut out of the community's talent pipeline (or whatever harsh measure people want to consider), it would be unfair to draw that sort of conclusion against Nonlinear given the many flaws in the process that was used. If that's what you're saying, I'm sympathetic to it, at the very least in the sense of "seems like a defensible view to me." (And maybe also overall – but I find it hard to think about this stuff, and I'm a bit tired of the affair.)

At the same time, I feel like, as a private individual, it's okay to come away from this whole thing with confident beliefs (one way or the other). It takes a higher bar of evidence (and assured fairness of procedure) to decide "the community should act as though x is established consensus" than it takes to believe x yourself.
