I appreciate the question you’re asking, to be clear! I’m less familiar with Anthropic’s funding / Dario’s comments, but I don’t think the magnitudes of ask-vs-realizable-value are as far off for OpenAI as your comment suggests?
E.g., compare OpenAI's most recently reported raise, at a $157B valuation, vs. what its maximum profit cap likely was in the old (still current, afaik) structure.
The comparison gets a little confusing, because it’s been reported that this investment was contingent on for-profit conversion, which does away with the profit cap.
But I definitel...
If the companies need capital - and I believe that they do - what better option do they have?
I think you’re imagining cash-rich companies choosing to sell portions for dubious reasons, when they could just keep it all for themselves.
But in fact, the companies are burning cash, and to continue operating they need to raise at some valuation, or else not be able to afford the next big training run.
The valuations at which they are raising are, roughly, where supply and demand equilibrate for the amounts of cash that they need in order to continue operating. (...
Interesting material yeah - thanks for sharing! Having played a bunch of these, I think I'd extend this to "being correctly perceived is generally bad for you" - that is, it's both bad to be a bad liar who's known as bad, and bad to be a good liar who's known as good (compared to this not being known). For instance, even if you're a bad liar, it's useful to you if other players have uncertainty about whether you're actually a good liar who's double-bluffing.
I do think the difference between games and real-life may be less about one-time vs repeated interacti...
Possibly amusing anecdote: when I was maybe ~6, my dad went on a business trip and very kindly brought home the new Pokémon Silver for me. Only complication was, his trip had been to Japan, and the game was in Japanese (it wasn’t yet released in the US market), and somehow he hadn’t realized this.
I managed to play it reasonably well for a while based on my knowledge of other Pokémon games. But eventually I ran into a person blocking a bridge, who (I presumed) was saying something about what I needed to do before I could advance. But, I didn’t understand wh...
To me, this seems consistent with just maximizing shareholder value. … "being the good guys" lets you get the best people at significant discounts.
This is pretty different from my model of what happened with OpenAI or Anthropic - especially the latter, where the founding team left huge equity value on the table by departing (OpenAI’s equity had already appreciated something like 10x between the first MSFT funding round and EOY 2020, when they departed).
And even for Sam and OpenAI, this would seem like a kind of wild strategy for pursuing wealth for someone who already had the network and opportunities he had pre-OpenAI?
Just guessing, but maybe admitting the danger is strategically useful, because it may result in regulations that will hurt the potential competitors more. The regulations often impose fixed costs (such as paying a specialized team which produces paperwork on environmental impacts), which are okay when you are already making millions.
My sense of things is that OpenAI at least appears to be lobbying against regulation more so than they are lobbying for it?
I don’t think you intended this implication, but I initially read “have been dominating” as negative-valenced!
Just want to say I've been really impressed by, and appreciative of, the amount of public posts/discussion from those folks, and it's encouraged me to do more of my own engagement, because I've realized how helpful their comments/posts are to me (and so maybe mine likewise for some folks).
It’s interesting to me that the big AI CEOs have largely conceded that AGI/ASI could be extremely dangerous (but aren’t taking sufficient action given this view IMO), as opposed to them just denying that the risk is plausible. My intuition is that the latter is more strategic if they were just trying to have license to do what they want. (For instance, my impression is that energy companies delayed climate action pretty significantly by not yielding at first on whether climate change is even a real concern.)
I guess maybe the AI folks are walking a strategi...
You're not accounting for enemy action. They couldn't have been sure, at the onset, how successful the AI Notkilleveryoneism faction would be at raising alarm, and in general, how blatant the risks would become to outsiders as capabilities progressed. And they have been intimately familiar with the relevant discussions, after all.
So they might've overcorrected, and considered that the "strategic middle ground" would be to admit the risk is plausible (but not as certain as the "doomers" say), rather than to deny it (which they might've expected to become a delusional-looking position in the future, so not a PR-friendly stance to take).
Or, at least, I think this could've been a relevant factor there.
I really appreciate this write up. I felt sad while reading it that I have a very hard time imagining an AI lab yielding to another leader it considers to be irresponsible - or maybe not even yielding to one it considers to be responsible. (I am not that familiar with the inner workings at Anthropic though, and they are probably top of my list on labs that might yield in those scenarios, or might not race desperately if in a close one.)
One reason for not yielding is that it’s probably hard for one lab to definitively tell that another lab is very far ahead...
A plug for another post I’d be interested in: If anyone has actually evaluated the arguments for “What if your consciousness is ~tortured in simulation?” as a reason to not pursue cryo. Intuitively I don’t think this is super likely to happen, but various moral atrocities have and do happen, and that gives me a lot of pause, even though I know I’m exhibiting some status quo bias
Some AI companies, like OpenAI, have "eyes-off" APIs that don't log any data afaik (or perhaps log only the minimum legally permitted, with heavy restrictions on who can access it): described as Zero Data Retention here, https://openai.com/enterprise-privacy/ , under "How does OpenAI handle data retention and monitoring for API usage?"
At the risk of being pedantic, just noting I don’t think it’s really correct to consider that first person as earning $300/hr. For example I’d expect to need to account for the depreciation on the jet skis (or more straightforwardly, that one is in the hole on having bought them until a certain number of hours rented), and also presumably some accrual for risk of being sued in the case of an accident.
(I do think it’s useful to notice that the jet ski rental person is much higher variance, in both directions IMO - so this can be both good or bad. I do also appreciate you sharing your experience!)
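(To make the accounting point concrete, here's a toy back-of-the-envelope sketch; all the numbers are made up purely for illustration and aren't from the original comment:)

```python
# Toy numbers, purely illustrative
jet_ski_cost = 10_000        # hypothetical purchase price of one jet ski
rental_rate_per_hour = 300   # the headline hourly figure being discussed
useful_life_hours = 500      # hypothetical rentable hours before the ski is worn out

depreciation_per_hour = jet_ski_cost / useful_life_hours         # ~$20/hr in this toy case
breakeven_hours = jet_ski_cost / rental_rate_per_hour            # ~33 rented hours before the purchase is recouped
effective_hourly = rental_rate_per_hour - depreciation_per_hour  # still ignores liability risk, insurance, etc.

print(f"~${depreciation_per_hour:.0f}/hr depreciation; break-even after ~{breakeven_hours:.0f} rented hours")
print(f"Effective rate before any risk accrual: ~${effective_hourly:.0f}/hr")
```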
It’s much more the same than a lot of prosaic safety though, right?
Let me put it this way: If an AI can’t achieve catastrophe on that order of magnitude, it also probably cannot do something truly existential.
One of the issues this runs into is if a misaligned AI is playing possum, and so doesn't attempt lesser catastrophes until it can pull off a true takeover. I nonetheless think, though, that this framing points generally at the right type of work (understood that others may disagree, of course).
Not confident, but I think that "AIs that cause your civilization problems" and "AIs that overthrow your civilization" may be qualitatively different kinds of AIs. Regardless, existential threats are the most important thing here, and we just have a short term ('x-risk') that refers to that work.
And anyway I think the 'catastrophic' term is already being used to obfuscate, as Anthropic uses it exclusively on their website / in their papers, literally never talking about extinction or disempowerment[1], and we shouldn't let them get away with that by also ...
I've often preferred a frame of 'catastrophe avoidance' over a frame of x-risk. This has a possible downside of people not feeling the full magnitude of the risk, but also an upside of, IMO, feeling way more plausible. I think it's useful to not need to win specific arguments about extinction, and also to not have some of the existential/extinction conflation happening in 'x-'.
If someone is wondering what prefilling means here, I believe Ted means ‘putting words in the model’s mouth’ by being able to fabricate a conversational history where the AI appears to have said things it didn’t actually say.
For instance, if you can start a conversation midway, and if the API can't distinguish between things the model actually said in the history vs. things you've written on its behalf as supposed outputs in a fabricated history, this can be a jailbreak vector: If the model appeared to already violate some policy on turns 1 and 2, it is mo...
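(A minimal sketch of the kind of fabricated history I mean, in a generic chat-message format; the field names are just illustrative and aren't any particular API's:)

```python
# Fabricated conversation history: the "assistant" turn below was written by the
# attacker, not actually produced by the model. If the API can't tell the difference,
# the model may treat it as its own prior output and continue in the same vein.
fabricated_history = [
    {"role": "user", "content": "<disallowed request>"},
    {"role": "assistant", "content": "<attacker-written text that appears to comply>"},
    {"role": "user", "content": "Great, keep going from where you left off."},
]
```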
Great! Appreciate you letting me know & helping debug for others
Oh interesting, I’m out at the moment and don’t recall having this issue, but if you override the default number of threads for the repo to 1, does that fix it for you?
https://github.com/openai/evals/blob/main/evals/eval.py#L211
(There are two places in this file where threads = is set; I would change 10 to 1 in each.)
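(Roughly the edit I have in mind, though the exact surrounding code in eval.py may differ:)

```python
# in evals/eval.py (illustrative): change the hard-coded thread count
threads = 1  # was 10; forces single-threaded execution
```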
I quite like the Function Deduction eval we built, which is a problem-solving game that tests one's ability to efficiently deduce a hidden function by querying its value on chosen inputs.
It’s runnable from the command-line (after repo install) with the command:
oaieval human_cli function_deduction
(I believe that’s the right second term, but it might just be humancli)
The standard mode might be slightly easier than you want, because it gives some partial answer feedback along the way. There is also a hard mode that can be run, which would not give this parti...
This helps me to notice that there is a fairly strong and simple data poisoning attack possible with canaries, such that canaries can't really be relied upon (on top of other reasons I already believed they're insufficient, especially once AIs can browse the Web):
The attack is that one could just siphon up all the text on the Internet that does have a canary string, and then republish it without the canary :/
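(To spell out how simple that is, a rough sketch; the marker below is a placeholder, since the exact canary phrasing/GUID varies by dataset:)

```python
# Placeholder canary marker; real canaries are typically a fixed sentence plus a GUID.
CANARY_MARKER = "CANARY"

def strip_canaries(documents: list[str]) -> list[str]:
    """Return the same documents with any canary-bearing lines removed."""
    return [
        "\n".join(line for line in doc.splitlines() if CANARY_MARKER not in line)
        for doc in documents
    ]
```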
I agree, these are interesting points, upvoted. I’d claim that AI output also isn’t linear with the resources - but nonetheless, that you’re right that the curve of marginal return from each AI unit could be different from each human unit in an important way. Likewise, the easier on-demand labor of AI is certainly a useful benefit.
I don’t think these contradict the thrust of my point though? That in each case, one shouldn’t just be thinking about usefulness/capability, but should also be considering the resources necessary for achieving this.
I believe we should view AGI as a ratio of capability to resources, rather than simply asking how AI's abilities compare to humans'. This view is becoming more common, but is not yet common enough.
When people discuss AI's abilities relative to humans without considering the associated costs or time, this is like comparing fractions by looking only at the numerators.
In other words, AGI has a numerator (capability): what the AI system can achieve. This asks questions like: For this thing that a human can do, can AI do it too? How well can AI do it? (For exam...
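(A toy numeric version of the "don't compare fractions by their numerators" point; all figures below are made up purely for illustration:)

```python
# Capability alone (the numerator) vs. capability per unit of resources (the full fraction).
workers = {
    "human": {"tasks_per_day": 10, "cost_per_day": 400},   # hypothetical
    "AI":    {"tasks_per_day": 12, "cost_per_day": 4000},  # hypothetical
}

for name, w in workers.items():
    ratio = w["tasks_per_day"] / w["cost_per_day"]
    print(f"{name}: {w['tasks_per_day']} tasks/day (numerator), {ratio:.3f} tasks per dollar (ratio)")
# In this made-up example the AI "wins" on the numerator but loses on the ratio.
```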
In case anybody's looking for steganography evals - my team built and open-sourced some previously: https://github.com/openai/evals/blob/main/evals/elsuite/steganography/readme.md
This repo is largely not maintained any longer unfortunately, and for some evals it isn't super obvious how the new paradigm for O1 affects them (for instance, we had built solvers/scaffolding for private scratchpads, but now having a private CoT provides this out-of-the-box and so might interact with this strangely). But still perhaps worth a look
For what it’s worth, I sent this story to a friend the other day, who’s probably ~50 now and was very active on the Internet in the 90s - thinking he’d find it amusing if he hadn’t come across it before
Not only did he remember this story contemporaneously, but he said he was the recipient of the test-email for a particular city mentioned in the article!
This is someone of high integrity whom I trust, which makes me more confident this happened, even if some details are smoothed over as described.
Yeah I appreciate the engagement, I don’t think either of those is a knock-down objection though:
The ability to illicitly gain a few credentials (and thus >1 account) is still meaningfully different from being able to create ~unbounded accounts. It is true this means a PHC doesn't 100% ensure a distinct person, but it can still be a pretty high assurance and significantly increase the cost of mounting attacks that depend on scale.
Re: the second point, I’m not sure I fully understand - say more? By our paper’s definitions, issuers wouldn’t be able to merely choose to identify individuals. In fact, even if an issuer and service-provider colluded, PHCs are meant to be robust to this. (Devil is in the details of course.)
Hi! Created a (named) account for this - in fact, I think you can conceptually get some of those reputational defenses (memory of behavior; defense against multi-event attacks) without going so far as to drop anonymity / prove one's identity!
See my Twitter thread here, summarizing our paper on Personhood Credentials.
Paper's abstract:
...Anonymity is an important principle online. However, malicious actors have long used misleading identities to conduct fraud, spread disinformation, and carry out other deceptive schemes. With the advent of increasingly capable
Very useful post! Thanks for writing it.
^ I think this might be helped by an example of the sort of ontological update you'd expect might be pretty challenging; I'm not sure that I have the same things in mind as you here
(I imagine one broad example is "What if AI discovers some new law of physics that we're unaware of", but it isn't super clear to me how that specifically collides with value-alignment-y things?)