Comments

I really liked this post! I will probably link to it in the future.

Edit: It just occurred to me that these are things I tend to think of under the heading "considerateness" rather than kindness, but either way, it's something I really appreciate in people (and the concepts are definitely linked).

FWIW, one thing I really didn't like about how he came across in the interview is that he seemed to be framing the narrative one-sidedly in an underhanded way, sneakily rather than out in the open. (Everyone tries to frame the narrative in some way, but it becomes problematic when people don't point out where their interpretation differs from others', because then listeners won't easily realize that there are claims they still need to evaluate and think about, rather than take for granted as something everyone else already agrees on.)

He was not highlighting the possibility that the other side's perspective still has validity; instead, he was sweeping that possibility under the carpet. He talked as though (implicitly, not explicitly) it's now officially established or obviously true that the board acted badly (Lex contributed to this by asking easy questions and not pushing back on anything too much). He focused a lot on the support he got during this hard time and on people saying good things about him (the eulogy-while-still-alive comparison, highlighting that he thinks there's no doubt about his character), said somewhat condescending things about the former board (about how he thinks they had good intentions, said in that slow voice and thoughtful tone, almost as though they had committed a crime), and then emphasized their lack of experience.

For contrast, here are things he could have said that would have made it easier for listeners to come to the right conclusions. (I think anyone who is morally scrupulous about whether they're in the right, in situations where many others speak up against them, would have highlighted these points a lot more, so the absence of these bits in Altman's interview is telling.)

  • Instead of just saying that he believes the former board members came from a place of good intentions, also say whether he believes that some of the things they were concerned about weren't totally unreasonable from their perspective. E.g., acknowledge things he did wrong, or things that, while not wrong, would understandably lead to misunderstandings.
  • Acknowledge that just because the review committee has made a decision, the matter of his character and suitability for OpenAI's charter is not now settled (esp. given that the review maybe had a somewhat limited scope?). He could point out that it's probably rational for listeners of the YouTube interview to keep an eye on him (or, if he thinks this isn't necessarily mandated, at least flag that he'd understand if some people now feel that way), while explaining how he intends to prove that the review committee came to the right decision.
  • He said the board was inexperienced, but he'd say that in any case, whether or not they were onto something. Why is he talking so much about their lack of experience rather than zooming in on their ability to assess someone's character? It could totally be true that the former board was both inexperienced and right about Altman's unsuitability. Pointing out this possibility himself would be a clarifying contribution, but instead, he chose to distract from that entire theme and muddy the waters by making it seem like all that happened was that the board did something stupid out of inexperience, and that's all there was to it.
  • Acknowledge that it wasn't just an outpouring of support for him; there were also some people who used the occasion to voice critical takes about him (and the Y Combinator thing came to light).

(Caveat that I didn't actually listen to the full interview, so I may have missed it if he did more signposting, perspective-taking, and "acknowledging that for-him-inconvenient hypotheses are now out there, important if true, and hard to dismiss entirely, at least for people without private info" than I would've thought from skipping through segments of the interview and Zvi's summary.)

In reaction to what I wrote here, maybe it's a defensible stance to say, "ah, but that's just Altman being good at PR; it would be bad PR for him to give any air of legitimacy to the former board's concerns."

I concede that, in some cases when someone accuses you of something, they're just playing dirty, and your best way to make sure it doesn't stick is not to engage with low-quality criticism. However, there are also situations where concerns have enough legitimacy that sweeping them under the carpet doesn't help you seem trustworthy. In those cases, I find it extra suspicious when someone sweeps the concerns under the carpet and thereby misses the opportunity to add clarity to the discussion, make themselves more trustworthy, and help people form better views on what's the case.

Maybe that's a high standard, but I'd feel more reassured if the frontier of AI research were steered by someone who could talk about difficult topics, and about uncertainty around their own suitability, in a more transparent and illuminating way.

There are realistic beliefs Altman could have about what's good or bad for AI safety that would not allow Zvi to draw that conclusion. For instance: 

  • Maybe Altman thinks it's really bad for companies' momentum to go through CEO transitions (and we know that he believes OpenAI having a lot of momentum is good for safety, since he sees them as both adequately concerned about safety and more concerned about it than competitors).
  • Maybe Altman thinks OpenAI would be unlikely to find another CEO who understands the research landscape well enough while also being good at managing, who is at least as concerned about safety as Altman is.
  • Maybe Altman was sort of willing to "put that into play," in a way, but his motivation wasn't a desire for power, nor a calculated strategic ploy, but more the understandable human tendency to hold a grudge (esp. in the short term) against the people who had just rejected and humiliated him, so he understandably didn't feel much motivational pull to want to help them look better about the coup they had just attempted for what seemed to him like unfair/bad reasons. (This still makes Altman look suboptimal, but it's a lot different from "Altman prefers power so much that he'd calculatedly put the world at risk for his short-term enjoyment of power.")
  • Maybe the moments where Altman thought things would go sideways were only very brief, and for the most part, when he was taking actions towards further escalation, he was already very confident that he'd win.

Overall, the point is that it seems a bit reckless/uncharitable to make strong inferences about someone's ranking of priorities based just on one remark they made being in tension with the direction they pushed in a complicated political struggle.

Small edges are why there's so much money gambled in poker. 

It's hard to reach a skill level where you make money on 50% of nights, but it's not that hard to reach a point where you're "only" losing 60% of the time. (That's still significantly worse than playing roulette, but compared to chess competitions, where hobbyists never win any sort of prize, you've at least got chances.)
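To illustrate the intuition (this is my own back-of-the-envelope sketch, not something from the comment, and every parameter below is an assumed number picked purely for illustration): if a night of poker is modeled as a normal draw whose mean is a small per-hand edge, a slightly winning player only books a profit a bit over half of their sessions, and a mildly losing player still wins a sizable minority of nights.

```python
# Illustrative sketch only; the edge and variance figures are assumptions, not data.
import math
import random

def session_result(hands=200, edge_bb_per_hand=0.02, stdev_bb_per_hand=5.0):
    """Net result of one session in big blinds, modeled as one normal draw
    (the approximate sum of many roughly independent hands)."""
    return random.gauss(hands * edge_bb_per_hand, stdev_bb_per_hand * math.sqrt(hands))

def winning_session_rate(sessions=20_000, **kwargs):
    """Fraction of simulated sessions that end in profit."""
    return sum(session_result(**kwargs) > 0 for _ in range(sessions)) / sessions

if __name__ == "__main__":
    print("small winner (+0.02 bb/hand):", winning_session_rate(edge_bb_per_hand=0.02))
    print("mild loser   (-0.05 bb/hand):", winning_session_rate(edge_bb_per_hand=-0.05))
```

Under these assumed numbers, the winner profits on roughly 52% of nights and the loser still profits on roughly 44%, which is exactly the kind of noisy feedback that keeps hobbyists at the table.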

You criticize Altman for pushing ahead with dangerous AI tech, but then most of what you'd spend the money on is pushing ahead with tech that isn't directly dangerous. Sure, that's better. But it doesn't solve the issue that we're headed into an out-of-control future. Where's the part where we use money to improve the degree to which thoughtful high-integrity people (or prosocial AI successor agents with those traits) are able to steer where this is all going? 
(Not saying there are easy answers.) 

I mean, personality disorders are all about problems in close interpersonal relationships (or a lack of interest in such relationships, in the case of schizoid personality disorder), and trust is always really relevant in such relationships, so I think this could be a helpful lens for looking at things. At the same time, I'd be very surprised if you could derive new, helpful treatment approaches from this sort of armchair reasoning (even just at the level of generating hypotheses to be subjected to further testing).

Also, some of these seem a bit strained: 

  • Narcissistic personality disorder seems to be more about superiority and entitlement than about expecting others to be trusting. And narcissism is correlated with Machiavellianism, a feature of which is having a cynical worldview (i.e., thinking people in general aren't trustworthy). If I had to frame narcissism in trust terms, I'd maybe say it's an inability to value or appreciate trust?
  • Histrionic personality disorder has a symptom criterion of "considers relationships to be more intimate than they actually are." I guess maybe you could say, "since (by your hypothesis) they expect people not to care, once someone does care, a person with histrionic personality disorder is so surprised that they infer that the relationship must be deeper than it is." A bit strained, but maybe it can be made to fit.
  • Borderline: I think there's more of a pattern to splitting than randomness (e.g., you rarely have splitting in the early honeymoon stage of a relationship), so maybe something like "fluctuating" would fit better. But also, I'm not sure what fluctuates is always about trust. Sure, sometimes splitting manifests as accusing the partner of cheating out of nowhere, but in other cases, the person may feel really annoyed at the partner in a way that isn't related to trust. (Or it could be related to trust, but going in a different direction: they may resent the partner for trusting them, because they have such a low view of themselves that anyone who trusts them must be unworthy.)
  • Dependent: To me the two things you write under it seem to be in tension with each other.

Edit:

Because it takes eight problems currently considered tied up with personal identity and essentially unsolvable [...]

I think treatment success probabilities differ between personality disorders. For some, calling them "currently considered essentially unsolvable" seems wrong.

And I'm not sure how much of OCPD is explained by calling it a persistent form of OCD – they seem very different. You'd expect "persistent" to make something worse, but OCPD tends to be less of an issue for the person who has it (though it can be difficult for others around them). Also, some symptoms seem non-overlapping: with OCPD, I don't think intrusive thoughts play a role (I might be wrong?), whereas intrusive thoughts are a distinct and telling feature of some presentations of OCD.

Dilemma:

  • If the Thought Assessors converge to 100% accuracy in predicting the reward that will result from a plan, then a plan to wirehead (hack into the Steering Subsystem and set reward to infinity) would seem very appealing, and the agent would do it.
  • If the Thought Assessors don’t converge to 100% accuracy in predicting the reward that will result from a plan, then that’s the very definition of inner misalignment!

    [...]

    The thought “I will secretly hack into my own Steering Subsystem” is almost certainly not aligned with the designer’s intention. So a credit-assignment update that assigns more positive valence to “I will secretly hack into my own Steering Subsystem” is a bad update. We don’t want it. Does it increase “inner alignment”? I think we have to say “yes it does”, because it leads to better reward predictions! But I don’t care. I still don’t want it. It’s bad bad bad. We need to figure out how to prevent that particular credit-assignment Thought Assessor update from happening.

    [...]

    I think there’s a broader lesson here. I think “outer alignment versus inner alignment” is an excellent starting point for thinking about the alignment problem. But that doesn’t mean we should expect one solution to outer alignment, and a different unrelated solution to inner alignment. Some things—particularly interpretability—cut through both outer and inner layers, creating a direct bridge from the designer’s intentions to the AGI’s goals. We should be eagerly searching for things like that.

Yeah, there definitely seems to be something off about that categorization. I've thought a bit about how this stuff works in humans, particularly in this post of my moral anti-realism sequence. To give some quotes from that:

One of many takeaways I got from reading Kaj Sotala’s multi-agent models of mind sequence (as well as comments by him) is that we can model people as pursuers of deep-seated needs. In particular, we have subsystems (or “subagents”) in our minds devoted to various needs-meeting strategies. The subsystems contribute behavioral strategies and responses to help maneuver us toward states where our brain predicts our needs will be satisfied. We can view many of our beliefs, emotional reactions, and even our self-concept/identity as part of this set of strategies. Like life plans, life goals are “merely” components of people’s needs-meeting machinery.[8]

Still, as far as components of needs-meeting machinery go, life goals are pretty unusual. Having life goals means to care about an objective enough to (do one’s best to) disentangle success on it from the reasons we adopted said objective in the first place. The objective takes on a life of its own, and the two aims (meeting one’s needs vs. progressing toward the objective) come apart. Having a life goal means having a particular kind of mental organization so that “we” – particularly the rational, planning parts of our brain – come to identify with the goal more so than with our human needs.[9]

[...]

There’s a normative component to something as mundane as choosing leisure activities. [E.g., going skiing in the cold, or spending the weekend cozily at home.] In the weekend example, I’m not just trying to assess the answer to empirical questions like “Which activity would contain fewer seconds of suffering/happiness” or “Which activity would provide me with lasting happy memories.” I probably already know the answer to those questions. What’s difficult about deciding is that some of my internal motivations conflict. For example, is it more important to be comfortable, or do I want to lead an active life? When I make up my mind in these dilemma situations, I tend to reframe my options until the decision seems straightforward. I know I’ve found the right decision when there’s no lingering fear that the currently-favored option wouldn’t be mine, no fear that I’m caving to social pressures or acting (too much) out of akrasia, impulsivity or some other perceived weakness of character.[21]

We tend to have a lot of freedom in how we frame our decision options. We use this freedom, this reframing capacity, to become comfortable with the choices we are about to make. In case skiing wins out, then “warm and cozy” becomes “lazy and boring,” and “cold and tired” becomes “an opportunity to train resilience / apply Stoicism.” This reframing ability is a double-edged sword: it enables rationalizing, but it also allows us to stick to our beliefs and values when we’re facing temptations and other difficulties.

[...]

Visualizing the future with one life goal vs. another

Whether a given motivational pull – such as the need for adventure, or (e.g.,) the desire to have children – is a bias or a fundamental value is not set in stone; it depends on our other motivational pulls and the overarching self-concept we’ve formed.

Lastly, we also use “planning mode” to choose between life goals. A life goal is a part of our identity – just like one’s career or lifestyle (but it’s even more serious).

We can frame choosing between life goals as choosing between “My future with life goal A” and “My future with life goal B” (or “My future without a life goal”). (Note how this is relevantly similar to “My future on career path A” and “My future on career path B.”)

[...]

It’s important to note that choosing a life goal doesn’t necessarily mean that we predict ourselves to have the highest life satisfaction (let alone the most increased moment-to-moment well-being) with that life goal in the future. Instead, it means that we feel the most satisfied about the particular decision (to adopt the life goal) in the present, when we commit to the given plan, thinking about our future. Life goals inspired by moral considerations (e.g., altruism inspired by Peter Singer’s drowning child argument) are appealing despite their demandingness – they can provide a sense of purpose and responsibility.

So, it seems like we don't want "perfect inner alignment," at least not if inner alignment is about accurately predicting reward and then forming the plan of doing whatever gives you the most reward. Also, there's a concept of "lock-in," or "identifying more with the long-term-planning part of your brain than with the underlying needs-meeting machinery." Lock-in can be dangerous (if you lock in something that isn't automatically corrigible), but it might also be dangerous not to lock in anything (because then you don't know what other goals will form later on).

Idk, the whole thing seems to me like brewing a potion in Harry Potter, except that you don't have a recipe book and there's luck involved, too. A minimally sufficient degree of "outer alignment" (as in: the agent tends to get rewards when it takes actions towards the intended goal) increases the likelihood that you get broadly pointed in the right direction, so that the intended goal is at least among the things the internal planner considers reinforcing itself around / orienting itself towards. But then, whether the intended goal gets picked over other alternatives (instrumental requirements for general intelligence, or alien motivations the AI might initially have), who knows. Like with raising a child, sometimes they turn out the way the parents intended, sometimes not at all. There's probably a science to figuring out how the desired outcomes become more likely, but even if we could do that for human children developing into adults with fixed identities, there's still the question of how to find analogous patterns in (brain-like) AI. Tough job.

Conditioned Taste Aversion (CTA) is a phenomenon where, if I get nauseous right now, it causes an aversion to whatever tastes I was exposed to a few hours earlier—not a few seconds earlier, not a few days earlier, just a few hours earlier. (I alluded to CTA above, but not its timing aspect.) The evolutionary reason for this is straightforward: a few hours is presumably how long it typically takes for a toxic food to induce nausea.

That explains why my brother no longer likes mushrooms. When we were little, he liked them; we ate mushrooms at a restaurant and were driven along curvy mountain roads with the family later that day. He got carsick and vomited, and afterwards he had an intense hatred of mushrooms.

Is that sort of configuration even biologically possible (or realistic)? I have no deep immunology understanding, but I think bad reactions to vaccines have little to nothing to do with whether you're up to date on previous vaccines. So far, I'm not sure we're good at predicting who will react with more severe side effects than average (and even if we could, it's not like it's easy to tweak the vaccine, except for tradeoff-y things like lowering the vaccination dose).

My point is that I have no evidence that he ended up reading most of the relevant posts in their entirety. I don't think people who have read all the posts in their entirety should just go ahead and unilaterally dox discussion participants, but I feel like people who have only read parts of them (or only secondhand sources) should do it even less.

Also, at the time, I interpreted Roko's "request for a summary" more as a way for him to sneer at people. His "summary" had a lot of loaded terms and subjective judgments in it. Maybe this is a style thing, but I think people should only write summaries like that (if at all) when they're already well-informed. (E.g., Zvi's writing style can be like that, and I find it fine because he's usually really well-informed. But if I saw him make a half-assed take on something he didn't seem well-informed about, I'd downvote.)
