All of orthonormal's Comments + Replies

My current candidate definitions, with some significant issues in the footnotes:

A fair environment is a probabilistic function $F(x_1,\dots,x_N)=[X_1,\dots,X_N]$ from an array of actions to an array of payoffs.

An agent $A$ is a random variable

$$A(F, A_1, \dots, A_{i-1}, A_i=A, A_{i+1}, \dots, A_N)$$

which takes in a fair environment $F$[1] and a list of agents (including itself), and outputs a mixed strategy over its available actions in $F$.[2]

A fair agent is one whose mixed strategy is a function of subjective probabilities[3] that... (read more)

How do you formalize the definition of a decision-theoretically fair problem, even when abstracting away the definition of an agent as well as embedded agency? 

I've failed to find anything in our literature.

It's simple to define a fair environment, given those abstractions: a function E from an array of actions to an array of payoffs, with no reference to any other details of the non-embedded agents that took those actions and received those payoffs.

However, fair problems are more than just fair environments: we want a definition of a fair problem (an... (read more)

2Gurkenglas
It sounds like you're trying to define unfair as evil.
2Vladimir_Nesov
It's an essential aspect of decision making for an agent to figure out where it might be. Thought experiments try to declare the current situation, but they don't necessarily need to be able to convincingly succeed. Algorithmic induction, such as updating from Solomonoff prior, is the basic way an agent figures out which situations it should care about, and declaring that we are working with a particular thought experiment doesn't affect the prior. In line with updatelessness, an agent should be ready for observations in general (according to which of them it cares about more), rather than particular "fair" observations, so distinguishing observations that describe "fair" thought experiments doesn't seem right either.
2orthonormal
My current candidate definitions, with some significant issues in the footnotes:

A fair environment is a probabilistic function $F(x_1,\dots,x_N)=[X_1,\dots,X_N]$ from an array of actions to an array of payoffs.

An agent $A$ is a random variable $A(F, A_1, \dots, A_{i-1}, A_i=A, A_{i+1}, \dots, A_N)$ which takes in a fair environment $F$[1] and a list of agents (including itself), and outputs a mixed strategy over its available actions in $F$.[2]

A fair agent is one whose mixed strategy is a function of subjective probabilities[3] that it assigns to [the actions of some finite collection of agents in fair environments, where any agents not appearing in the original problem must themselves be fair].

Formally, if $A$ is a fair agent with a subjective probability estimator $P$, then $A$'s mixed strategy in a fair environment $F$, $A(F, A_1, \dots, A_{i-1}, A_i=A, A_{i+1}, \dots, A_N)$, should depend only on a finite collection of $A$'s subjective probabilities about outcomes $\{P(F_k(A_1,\dots,A_N,B_1,\dots,B_M)=[X_1,\dots,X_{N+M}])\}_{k=1}^{K}$ for a set of fair environments $F_1,\dots,F_K$ and an additional set of fair[4] agents[5] $B_1,\dots,B_M$ if needed (note that not all agents need to appear in all environments).

A fair problem is a fair environment with one designated player, where all other agents are fair agents.

1. ^ I might need to require every $F$ to have a default action $d_F$, so that I don't need to worry about axiom-of-choice issues when defining an agent over the space of all fair environments.
2. ^ I specified a probabilistic environment and mixed strategies because I think there should be a unique fixed point for agents, such that this is well-defined for any fair environment $F$. (By analogy to reflective oracles.) But I might be wrong, or I might need further restrictions on $F$.
3. ^ Grossly underspecified. What kinds of properties are required for subjective probabilities here? You can obviously cheat by writing BlueEyedBot into your probability estimator.
4. ^ This is an infinite recursion, of cour
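To make the shape of these definitions concrete, here is a minimal Python type sketch, not the comment's own formalism: the names (`FairEnvironment`, `Agent`, `matching_pennies`, `uniform_agent`) are mine, and the environment is deterministic rather than probabilistic for brevity.

```python
from typing import Callable, Dict, Sequence

# Illustrative types only; actions and payoffs are kept abstract.
Action = str
MixedStrategy = Dict[Action, float]   # action -> probability
Payoffs = Sequence[float]             # one payoff per player

# A fair environment: a map from an array of actions to an array of payoffs,
# with no access to the agents themselves -- only to the actions they chose.
FairEnvironment = Callable[[Sequence[Action]], Payoffs]

# An agent: takes a fair environment and the full list of agents (including
# itself, at index i) and returns a mixed strategy over its actions in F.
Agent = Callable[[FairEnvironment, Sequence["Agent"], int], MixedStrategy]


def matching_pennies(actions: Sequence[Action]) -> Payoffs:
    """A toy fair environment: payoffs depend only on the submitted actions."""
    a, b = actions
    return (1.0, -1.0) if a == b else (-1.0, 1.0)


def uniform_agent(env: FairEnvironment,
                  agents: Sequence["Agent"],
                  i: int) -> MixedStrategy:
    """A trivially 'fair' agent: it ignores which other agents it is facing."""
    return {"heads": 0.5, "tails": 0.5}
```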

Over the past three years, as my timelines have shortened and my hopes for alignment or coordination have dwindled, I've switched over to consumption. I just make sure to keep a long runway, so that I could pivot if AGI progress is somehow halted or sputters out on its own or something.

The fault does not lie with Jacob, but wow, this post aged like an open bag of bread.

habryka*6864

It… was the fault of Jacob?

The post was misleading when it was written, and I think was called out as such by many people at the time. I think we should have some sympathy with Jacob being naive and being tricked, but surely a substantial amount of blame accrues to him for going to bat for OpenAI when that turned out to be unjustified in the end (and at least somewhat predictably so).

I suggest a fourth default question for these reading groups:

How did this post age?

Soon the two are lost in a maze of words defined in other words, the problem that Stevan Harnad once described as trying to learn Chinese from a Chinese/Chinese dictionary.

Of course, it turned out that LLMs do this just fine, thank you.

5gwern
I don't think LLMs do the equivalent of that. It's more like, learning Chinese from a Chinese/Chinese dictionary stapled to a Chinese encyclopedia. It is not obvious to me that using a Chinese/Chinese dictionary, purged of example sentences, would let you learn, even in theory, even things a simple n-grams or word2vec model trained on a non-dictionary corpus does and encodes into embeddings. For example, would a Chinese/Chinese dictionary let you plot cities by longitude & latitude? (Most dictionaries do not try to list all names, leaving that to things like atlases or gazetteers, because they are about the language, and not a specific place like China, after all.) Note that the various examples from machine translation you might think of, such as learning translation while having zero parallel sentences/translations, are usually using corpuses much richer than just an intra-language dictionary.
2Adele Lopez
I don't doubt that LLMs could do this, but has this exact thing actually been done somewhere?

intensional terms

Should probably link to Extensions and Intensions; not everyone reads these posts in order.

Mati described himself as a TPM since September 2023 (after being PM support since April 2022), and Andrei described himself as a Research Engineer from April 2023 to March 2024. Why do you believe either was not a FTE at the time?

And while failure to sign isn't proof of lack of desire to sign, the two are heavily correlated—otherwise it would be incredibly unlikely for the small Superalignment team to have so many members who signed late or not at all.

orthonormalΩ32812

With the sudden simultaneous exits of Mira Murati, Barret Zoph, and Bob McGrew, I thought I'd update my tally of the departures from OpenAI, collated with how quickly the ex-employee had signed the loyalty letter to Sam Altman last November. 

The letter was leaked at 505 signatures, 667 signatures, and finally 702 signatures; in the end, it was reported that 737 of 770 employees signed. Since then, I've been able to verify 56 departures of people who were full-time employees (as far as I can tell, contractors were not allowed to sign, but all FTEs were... (read more)

There are a few people in this list who I think are being counted incorrectly as FTEs (Mati and Andrei, for example).

I would also be careful about making inferences based on timing of supposed signature: I have heard that the signature Google Doc had crashed and so the process for adding names was slow and cumbersome. That is, the time at which someone’s name was added may have been significantly after they expressed desire to sign.

EDIT: On reflection, I made this a full Shortform post.

With the sudden simultaneous exits of Mira Murati, Barret Zoph, and Bob McGrew, I thought I'd do a more thorough scan of the departures. I still think I'm missing some, so these are lower bounds (modulo any mistakes I've made).

Headline numbers:

  • Attrition for the 505 OpenAI employees who signed before the letter was first leaked: at least 24/505 = 4.8%
  • Attrition for the next 197 to sign (it was leaked again at 667 signatures, and one last time at 702): at least 13/197 = 6.6%
  • Attrition for the (reported) 68
... (read more)

CDT agents respond well to threats

Might want to rephrase this as "CDT agents give in to threats"

If families are worried about the cost of groceries, they should welcome this price discrimination. The AI will realize you are worried about costs. It will offer you prime discounts to win your business. It will know you are willing to switch brands to get discounts, and use this to balance inventory.

Then it will go out and charge other people more, because they can afford to pay. Indeed, this is highly progressive policy. The wealthier you are, the more you will pay for groceries. What’s not to love?

A problem is that this is not only a tax on indifferenc... (read more)

Re: experience machine, Past Me would have refused it and Present Me would take it. The difference is due to a major (and seemingly irreversible) deterioration in my wellbeing several years ago, but not only because that makes the real world less enjoyable. 

Agency is another big reason to refuse the experience machine; if I think I can make a difference in the base-level world, I feel a moral responsibility towards it. But I experience significantly less agency now (and project less agency in the future), so that factor is diminished for me.

The main f... (read more)

  1. Thanks—I didn't recall the content of Yglesias' tweet, and I'd noped out of sorting through his long feed. I suspect Yglesias didn't understand why the numbers were weird, though, and people who read his tweet were even less likely to get it. And most significantly, he tries to draw a conclusion from a spurious fact!
  2. Allowing explicitly conditional markets with a different fee structure (ideally, all fees refunded on the counterfactual markets) could be an interesting public service on Manifold's part.
  3. The only part of my tone that worries me in retrospect i
... (read more)
2Nathan Young
I think the post moved me in your direction, so I think it was fine. 

I'm really impressed with your grace in writing this comment (as well as the one you wrote on the market itself), and it makes me feel better about Manifold's public epistemics.

Yes, and I gained some easy mana from such markets; but the market that got the most attention by far was the intrinsically flawed conditional market.

Real-money markets do have stronger incentives for sharps to scour for arbitrage, so the 1/1/26 market would have been more likely to be noticed before months had gone by.

However (depending on the fee structure for resolving N/A markets), real-money markets have even stronger incentives for sharps to stay away entirely from spurious conditional markets, since they'd be throwing away cash and not just Internet points. Never ever ever cite out-of-the-money conditional markets.
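As a toy illustration of that incentive gap (all numbers, and the assumption that fees are not refunded on an N/A resolution, are hypothetical rather than Manifold's actual fee schedule):

```python
# Toy expected-value sketch for a sharp eyeing a mispriced conditional market.
# Assumptions (not Manifold's real fee structure): fees are lost on N/A,
# and the stake is simply locked up until resolution.
p_condition = 0.10   # probability the market resolves at all (else N/A)
edge = 0.10          # 10-point mispricing the sharp could capture if it resolves
stake = 100          # capital tied up until resolution
fee = 1              # fee paid even if the market resolves N/A (assumed)

expected_profit = p_condition * edge * stake - fee
print(expected_profit)  # 0.0 -- the edge is eaten before even counting the cost of locked capital
```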

Broke: Prediction markets are an aggregation of opinions, weighted towards informed opinions by smart people, and are therefore a trustworthy forecasting tool on any question.

Woke: Prediction markets are MMOs set in a fantasy world where, if someone is Wrong On The Internet, you can take their lunch money.

Can you share any strong evidence that you're an unusually trustworthy person in regard to confidential conversations? People would in fact be risking a lot by talking to you.

(This is sincere btw; I think this service should absolutely exist, but the best version of it is probably done by someone with a longstanding public reputation of circumspection.)

Good question.

I can't really make this legible, no.

On the whistleblowing part, you should be able to get good advice without trusting me. It's publicly known that Kelsey Piper plus iirc one or two of the ex-OpenAI folks are happy to talk to potential whistleblowers. I should figure out exactly who that is and put their (publicly verifiable) contact info in this post (and, note to self, clarify whether or in-what-domains I endorse their advice vs merely want to make salient that it's available). Thanks.

[Oh, also ideally maybe I'd have a real system for anon... (read more)

Buck2416

I trust Zach a lot and would be shocked if he maliciously or carelessly leaked info. I don’t believe he’s very experienced at handling confidential information, but I expect him to seek out advice as necessary and overall do a good job here. Happy to say more to anyone interested.

Good question! I picked it up from a friend at a LW meetup a decade ago, so it didn't come with all the extra baggage that vipassana meditation seems to usually carry. So this is just going to be the echo of it that works for me.

Step 1 is to stare at your index finger (a very sensitive part of your body) and gently, patiently try to notice that it's still producing a background level of sensory stimulus even when it's not touching anything. That attention to the background signal, focused on a small patch of your body, is what the body scan is based on.

Ste... (read more)

orthonormalΩ462

I have to further compliment my past self: this section aged extremely well, prefiguring the Shoggoth-with-a-smiley-face analogies several years in advance.

GPT-3 is trained simply to predict continuations of text. So what would it actually optimize for, if it had a pretty good model of the world including itself and the ability to make plans in that world?

One might hope that because it's learning to imitate humans in an unsupervised way, that it would end up fairly human, or at least act in that way. I very much doubt this, for the following reason:

  • Two hum
... (read more)

The spun-off agent foundations team seems to have less reason than most AI safety orgs to be in the Bay Area, so moving to NZ might be worth considering for them.

Note on current methodology:

  • I am, for now, not doing further research when the spreadsheet lists a person whose name appears on the final leaked letter; so it's possible that some of the 23 departures among the 702 names on the final leaked letter are spurious. (I will be more thorough when I resolve the market after November.)
  • I am counting only full-time employees and not counting contractors, as I currently believe that the 770 figure refers only to full-time employees. So far, I've seen no contractors among those who signed, but I've only checked a few;
... (read more)

Counterpoint: other labs might become more paranoid that SSI is ahead of them. I think your point is probably more correct than the counterpoint, but it's worth mentioning.

Elon diversifies in the sense of "personally micromanaging more companies", not in the sense of "backing companies he can't micromanage".

By my assessment, the employees who failed to sign the final leaked version of the Altman loyalty letter have now been literally decimated.

I'm trying to track the relative attrition for a Manifold market: of the 265 OpenAI employees who hadn't yet signed the loyalty letter by the time it was first leaked, what percent will still be at OpenAI on the one-year anniversary?

I'm combining that first leaked copy with 505 signatures, the final leaked copy with 702 signatures, the oft-repeated total headcount of 770, and this spreadsheet tracking OpenAI departures ... (read more)

4orthonormal
EDIT: On reflection, I made this a full Shortform post.

With the sudden simultaneous exits of Mira Murati, Barret Zoph, and Bob McGrew, I thought I'd do a more thorough scan of the departures. I still think I'm missing some, so these are lower bounds (modulo any mistakes I've made).

Headline numbers:

  • Attrition for the 505 OpenAI employees who signed before the letter was first leaked: at least 24/505 = 4.8%
  • Attrition for the next 197 to sign (it was leaked again at 667 signatures, and one last time at 702): at least 13/197 = 6.6%
  • Attrition for the (reported) 68 who had not signed by the last leak: at least 19/68 = 27.9%

Reportedly, 737 out of the 770 signed in the end, and many of the Superalignment team chose not to sign at all.

Below are my current tallies of some notable subsets. Please comment with any corrections!

People from the Superalignment team who never signed as of the 702 leak (including some policy/governance people who seem to have been closely connected) and are now gone:

  • Carroll Wainwright
  • Collin Burns
  • Cullen O'Keefe
  • Daniel Kokotajlo
  • Jan Leike (though he did separately Tweet that the board should resign)
  • Jeffrey Wu
  • Jonathan Uesato
  • Leopold Aschenbrenner
  • Mati Roy
  • William Saunders
  • Yuri Burda

People from the Superalignment team (and close collaborators) who did sign before the final leak but are now gone:

  • Jan Hendrik Kirchner (signed between 668 and 702)
  • Steven Bills (signed between 668 and 702)
  • John Schulman (signed between 506 and 667)
  • Sherry Lachman (signed between 506 and 667)
  • Ilya Sutskever (signed by 505)
  • Pavel Izmailov (signed by 505)
  • Ryan Lowe (signed by 505)
  • Todor Markov (signed by 505)

Others who didn't sign as of the 702 leak (some of whom may have just been AFK for the wrong weekend, though I doubt that was true of Karpathy) and are now gone:

  • Andrei Alexandru (Research Engineer)
  • Andrej Karpathy (Co-Founder)
  • Austin Wiseman (Finance/Accounting)
  • Girish Sas
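A minimal sketch of how the headline rates above fall out of the signing buckets (counts copied from this tally):

```python
# Departures / bucket size, per the tally above.
buckets = {
    "signed by the first leak (505)":      (24, 505),
    "signed between the leaks (next 197)": (13, 197),
    "had not signed as of the last leak":  (19, 68),
}
for label, (departed, total) in buckets.items():
    print(f"{label}: {departed}/{total} = {departed / total:.1%}")
# -> 4.8%, 6.6%, 27.9%
```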
2orthonormal
Note on current methodology:

  • I am, for now, not doing further research when the spreadsheet lists a person whose name appears on the final leaked letter; so it's possible that some of the 23 departures among the 702 names on the final leaked letter are spurious. (I will be more thorough when I resolve the market after November.)
  • I am counting only full-time employees and not counting contractors, as I currently believe that the 770 figure refers only to full-time employees. So far, I've seen no contractors among those who signed, but I've only checked a few; if the letter includes some categories of contractors, this gets a lot harder to resolve.
  • I am counting nontechnical employees (e.g. recruiting, marketing) as well as technical staff, because such employees were among those who signed the letter.
Linch1813

"decimate" is one of those relatively rare words where the literal meaning is much less scary than the figurative meaning.

I'm not even angry, just disappointed.

habryka5855

I am angry and disappointed.

The diagnosis is roughly correct (I would say "most suffering is caused by an internal response of fleeing from pain but not escaping it"), but IMO the standard proffered remedy (Buddhist-style detachment from wanting) goes too far and promises too much.

Re: the diagnosis, three illustrative ways the relationship between pain, awareness, and suffering has manifested for me:

  • Migraines: I get them every few weeks, and they're pretty bad. After a friend showed me how to do a vipassana body scan, on a lark I tried moving my attention to the spot inside my skull
... (read more)
5jbkjr
The message of Buddhism isn’t “in order to not suffer, don’t want anything”; not craving/being averse doesn’t mean not having any intentions or preferences. Sure, if you crave the satisfaction of your preferences, or if you’re averse to their frustration, you will suffer, but intentions and preferences remain when craving/aversion/clinging is gone. It’s like a difference between “I’m not ok unless this preference is satisfied” and “I’d still like this preference to be satisfied, but I’ll ultimately be ok either way.”
4Ben
Not directly related, but I also get bad migraines. Would you kindly be able to point me towards somewhere I can read something useful about the vipassana body scan? Maybe I am overthinking it but the top hits when I searched looked low value.
orthonormal1713

Go all-in on lobbying the US and other governments to fully prohibit the training of frontier models beyond a certain level, in a way that OpenAI can't route around (so probably block Altman's foreign chip factory initiative, for instance).

The chess example is meant to make specific points about RL*F concealing a capability that remains (or is even amplified); I'm not trying to claim that the "put up a good fight but lose" criterion is analogous to current RL*F criteria. (Though it does rhyme qualitatively with "be helpful and harmless".)

I agree that "helpful-only" RL*F would result in a model that scores higher on capabilities evals than the base model, possibly much higher. I'm frankly a bit worried about even training that model.

Thank you! I'd forgotten about that.

Certainly RLHF can get the model to stop talking about a capability, but usually this is extremely obvious because the model gives you an explicit refusal?

How certain are you that this is always true (rather than "we've usually noticed this even though we haven't explicitly been checking for it in general"), and that it will continue to be so as models become stronger?

It seems to me like additionally running evals on base models is a highly reasonable precaution.

4Rohin Shah
My probability that (EDIT: for the model we evaluated) the base model outperforms the finetuned model (as I understand that statement) is so small that it is within the realm of probabilities that I am confused about how to reason about (i.e. model error clearly dominates). Intuitively (excluding things like model error), even 1 in a million feels like it could be too high. My probability that the model sometimes stops talking about some capability without giving you an explicit refusal is much higher (depending on how you operationalize it, I might be effectively-certain that this is true, i.e. >99%) but this is not fixed by running evals on base models. (Obviously there's a much much higher probability that I'm somehow misunderstanding what you mean. E.g. maybe you're imagining some effort to elicit capabilities with the base model (and for some reason you're not worried about the same failure mode there), maybe you allow for SFT but not RLHF, maybe you mean just avoid the safety tuning, etc)

I responded to this conversation in this comment on your corresponding post.

Oh wait, I misinterpreted you as using "much worse" to mean "much scarier", when instead you mean "much less capable".

I'd be glad if it were the case that RL*F doesn't hide any meaningful capabilities existing in the base model, but I'm not sure it is the case, and I'd sure like someone to check! It sure seems like RL*F is likely in some cases to get the model to stop explicitly talking about a capability it has (unless it is jailbroken on that subject), rather than to remove the capability.

(Imagine RL*Fing a base model to stop explicitly talking about arithmetic; are we sure it would un-learn the rules?)

2Rohin Shah
Oh yes, sorry for the confusion, I did mean "much less capable". Certainly RLHF can get the model to stop talking about a capability, but usually this is extremely obvious because the model gives you an explicit refusal? Certainly if we encountered that we would figure out some way to make that not happen any more.

That's exactly the point: if a model has bad capabilities and deceptive alignment, then testing the post-tuned model will return a false negative for those capabilities in deployment. Until we have the kind of interpretability tools that we could deeply trust to catch deceptive alignment, we should count any capability found in the base model as if it were present in the tuned model.
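A minimal sketch of the bookkeeping that policy implies; the eval harness (`eval_score`) and model handles here are hypothetical, purely to illustrate the "credit the tuned model with anything the base model can do" rule:

```python
from typing import Callable, Dict, List

def conservative_capability_scores(
    eval_score: Callable[[str, str], float],  # (model_name, task) -> score; hypothetical harness
    base_model: str,
    tuned_model: str,
    tasks: List[str],
) -> Dict[str, float]:
    """Credit the tuned model with any capability its base model demonstrates.

    Rationale from the comment above: RL*F may hide a capability rather than
    remove it, so an eval that only sees the tuned model can return a false
    negative for that capability.
    """
    scores = {}
    for task in tasks:
        base = eval_score(base_model, task)
        tuned = eval_score(tuned_model, task)
        scores[task] = max(base, tuned)  # count the capability if either model shows it
    return scores
```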

I'd like to see evals like DeepMind's run against the strongest pre-RL*F base models, since that actually tells you about capability.

2Rohin Shah
Surely you mean something else, e.g. models without safety tuning? If you run them on base models the scores will be much worse.

The WWII generation is negligible in 2024. The actual effect is partly the inverted demographic pyramid (older population means more women than men even under normal circumstances), and partly that even young Russian men die horrifically often:

At 2005 mortality rates, for example, only 7% of UK men but 37% of Russian men would die before the age of 55 years

And for that, a major culprit is alcohol (leading to accidents and violence, but also literally drinking oneself to death).

Among the men who don't self-destruct, I imagine a large fraction have already b... (read more)

That first statistic, that it swiped right 353 times and got to talk to 160 women, is completely insane. I mean, that’s almost a 50% match rate, whereas estimates in general are 4% to 14%.

Given Russia's fucked-up gender ratio (2.5 single women for every single man), I don't think it's that unreasonable!

Generally, the achievement of "guy finds a woman willing to accept a proposal" impresses me far less in Russia than it would in the USA. Let's see if this replicates in a competitive dating pool.

4Ben Pace
I am extremely surprised to read that Russia has such a harsh gender ratio (86 men for every 100 women), that's way more aggressive than even China (105 men to 100 women). I wanted to know why and so I interrogated ChatGPT for a bit, it explained the following:

  • The Soviet Union lost around 14% of its population during WWII whereas the UK and France each only lost around 2%.
  • Also (somehow) life expectancy for Russian men is 68, whereas for UK and French men it's like 79, and yet the women of Russia is like 78 (much closer to the UK and France's 83 and 85 respectively).

I am surprised I haven't seen more thinkpieces written about gender dynamics in Russia, which I expect would sway heavily in the men's favor as they're the minority. I also generally update down on Russia's health and competence at war.

In high-leverage situations, you should arguably either be playing tic-tac-toe (simple, legible, predictable responses) or playing 4-D chess to win. If you're making really nonstandard and surprising moves (especially in PR), you have no excuse for winding up with a worse outcome than you would have if you'd acted in bog-standard normal ways.

(This doesn't mean suspending your ethics! Those are part of winning! But if you can't figure out how to win 4-D chess ethically, then you need to play an ethical tic-tac-toe strategy instead.)

Ah, I'm talking about introspection in a therapy context and not about exhorting others.

For example:

Internal coherence: "I forgive myself for doing that stupid thing".

Load-bearing but opaque: "It makes sense to forgive myself, and I want to, but for some reason I just can't".

Load-bearing and clear resistance: "I want other people to forgive themselves for things like that, but when I think about forgiving myself, I get a big NOPE NOPE NOPE".

P.S. Maybe forgiving oneself isn't actually the right thing to do at the moment! But it will also be easier to learn that in the third case than in the second.

"I endorse endorsing X" is a sign of a really promising topic for therapy (or your preferred modality of psychological growth).

If I can simply say "X", then I'm internally coherent enough on that point.

If I can only say "I endorse X", then not-X is psychologically load-bearing for me, but often in a way that is opaque to my conscious reasoning, so working on that conflict can be slippery.

But if I can only say "I endorse endorsing X", then not only is not-X load-bearing for me, but there's a clear feeling of resistance to X that I can consciously hone in on, connect with, and learn about.

2Dagon
I'd understand this better (and perhaps even agree) if there were a few examples and a few counter-examples to find the boundaries of when this is effective. For myself, without more words like "I endorse endorsing X under Y conditions because X is good for those who are hearing the endorsement and not necessarily for the endorser", I don't see how it works. The direct, unconditional form just makes me notice my dissonance and worry at it until I either endorse X or not-X (or neither - I'm allowed to be uncertain or ambivalent or just "context-dependent").

Re: Canadian vs American health care, the reasonable policy would be:

"Sorry, publicly funded health care won't cover this, because the expected DALYs are too expensive. We do allow private clinics to sell you the procedure, though unless you're super wealthy I think the odds of success aren't worth the cost to your family."

(I also approve of euthanasia being offered as long as it's not a hard sell.)

I think MIRI is correct to call it as they see it, both on general principles and because if they turn out to be wrong about genuine alignment progress being very hard, people (at large, but also including us) should update against MIRI's viewpoints on other topics, and in favor of the viewpoints of whichever AI safety orgs called it more correctly.

2Wei Dai
I wrote this top level comment in part as a reply to you.

Yep, before I saw orthonormal's response I had a draft-reply written that says almost literally the same thing:

we just call 'em like we see 'em

[...]

insofar as we make bad predictions, we should get penalized for it. and insofar as we think alignment difficulty is the crux for 'why we need to shut it all down', we'd rather directly argue against illusory alignment progress (and directly acknowledge real major alignment progress as a real reason to be less confident of shutdown as a strategy) rather than redirect to something less cruxy

I'll also add: Nate (u... (read more)

I mean, I don't really care how much e.g. Facebook AI thinks they're racing right now. They're not in the game at this point.

The race dynamics are not just about who's leading. FB is 1-2 years behind (looking at LLM metrics), and it doesn't seem like they're getting further behind OpenAI/Anthropic with each generation, so I expect that the lag at the end will be at most a few years.

That means that if Facebook is unconstrained, the leading labs have only that much time to slow down for safety (or prepare a pivotal act) as they approach AGI before Facebook g... (read more)

I'm surprised that nobody has yet brought up the development that the board offered Dario Amodei the position as a merger with Anthropic (and Dario said no!).

(There's no additional important content in the original article by The Information, so I linked the Reuters paywall-free version.)

Crucially, this doesn't tell us in what order the board made this offer to Dario and the other known figures (GitHub CEO Nat Friedman and Scale AI CEO Alex Wang) before getting Emmett Shear, but it's plausible that merging with Anthropic was Plan A all along. Moreover, I s... (read more)

7Lukas_Gloor
Having a "plan A" requires detailed advance-planning. I think it's much more likely that their decision was reactive rather than plan-based. They felt strongly that Altman had to go based on stuff that happened, and so they followed procedures – appoint an interim CEO and do a standard CEO search. Of course, it's plausible – I'd even say likely – that an "Anthropic merger" was on their mind as something that could happen as a result of this further down the line. But I doubt (and hope not) that this thought made a difference to their decision. Reasoning: * If they had a detailed plan that was motivating their actions (as opposed to reacting to a new development and figuring out what to do as things go on), they would probably have put in a bit more time gathering more potentially incriminating evidence or trying to form social alliances.  For instance, even just, in the months or weeks before, visiting OpenAI and saying hi to employees, introducing themselves as the board, etc., would probably have improved staff's perception of how this went down. Similarly, gathering more evidence by, e.g., talking to people close to Altman but sympathetic to safety concerns, asking whether they feel heard in the company, etc, could have unearthed more ammunition. (It's interesting that even the safety-minded researchers at OpenAI basically sided with Altman here, or, at the very least, none of them came to the board's help speaking up against Altman on similar counts. [Though I guess it's hard to speak up "on similar counts" if people don't even really know their primary concerns apart from the vague "not always candid."]) * If the thought of an Anthropic merge did play a large role in their decision-making (in the sense of "making the difference" to whether they act on something across many otherwise-similar counterfactuals), that would constitute a bad kind of scheming/plotting. People who scheme like that are probably less likely than baseline to underestimate power pol

Has this one been confirmed yet? (Or is there more evidence than this reporting that something like this happened?)

No, I don't think the board's motives were power politics; I'm saying that they failed to account for the kind of political power moves that Sam would make in response.

In addition to this, Microsoft will exert greater pressure to extract mundane commercial utility from models, compared to pushing forward the frontier. Not sure how much that compensates for the second round of evaporative cooling of the safety-minded.

If they thought this would be the outcome of firing Sam, they would not have done so.

The risk they took was calculated, but man, are they bad at politics.

dr_s3115

I keep being confused by them not revealing their reasons. Whatever they are, there's no way that saying them out loud wouldn't give some ammo to those defending them, unless somehow between Friday and now they swung from "omg this is so serious we need to fire Altman NOW" to "oops looks like it was a nothingburger, we'll look stupid if we say it out loud". Do they think it's a literal infohazard or something? Is it such a serious accusation that it would involve the police to state it out loud?

0Chess3D
Interesting! Bad at politics is a good way to put it. So you think this was purely a political power move to remove Sam, and they were so bad at projecting the outcomes, all of them thought Greg would stay on board as President and employees would largely accept the change. 
  1. The quote is from Emmett Shear, not a board member.
  2. The board is also following the "don't say anything literally false" policy by saying practically nothing publicly.
  3. Just as I infer from Shear's qualifier that the firing did have something to do with safety, I infer from the board's public silence that their reason for the firing isn't one that would win back the departing OpenAI members (or would only do so at a cost that's not worth paying). 
  4. This is consistent with it being a safety concern shared by the superalignment team (who by and large didn't
... (read more)
5ryan_b
Ah, oops! My expectations are reversed for Shear; him I strongly expect to be as exact as humanly possible. With that update, I'm inclined to agree with your hypothesis.
5dr_s
That's the part that confuses me most. An NDA wouldn't be strong enough reason at this point. As you say, safety concerns might, but that seems pretty wild unless they literally already have AGI and are fighting over what to do with it. The other thing is anything that if said out loud might involve the police, so revealing the info would be itself an escalation (and possibly mutually assured destruction, if there's criminal liability on both sides). I got nothing.