All of Oscar's Comments + Replies

Oscar50

Nice!

For the 2024 prediction "So, the most compute spent on a single training run is something like 5x10^25 FLOPs." you cite v3 as having been trained on 3.5e24 FLOP, but that is more than an order of magnitude below the prediction. Grok-2, by contrast, was trained in 2024 with 3e25 FLOP, so it seems a better model to cite?
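For concreteness, the arithmetic behind this (my own, just spelling out the comparison):

\[
\frac{5\times10^{25}}{3.5\times10^{24}} \approx 14.3 > 10,
\qquad
\frac{5\times10^{25}}{3\times10^{25}} \approx 1.7 < 10
\]

i.e. v3's training compute is more than an order of magnitude below the predicted figure, while Grok-2's is comfortably within one.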

1Jonny Spicer
That's a much better source, I've updated the spreadsheet accordingly, thanks!
Oscar249

I will note the rationalist and EA communities have committed multiple ideological murders

Substantiate? I down- and disagree-voted because of this un-evidenced very grave accusation.

9Buck
Presumably the commenter is referencing Ziz and friends?
Oscar10

I think I agree with your original statement now. It still feels slightly misleading though, as while 'keeping up with the competition' won't provide the motivation (as there putatively is no competition), there will still be strong incentives to sell at any capability level. (And as you say this may be overcome by an even stronger incentive to hoard frontier intelligence for their own R&D and strategising use. But this outweighs rather than annuls the direct economic incentive to make a packet of money by selling access to your latest system.)

Oscar10

I agree the '5 projects but no selling AI services' world is moderately unlikely, the toy version of it I have in mind is something like:

  • It costs $10 million to set up a misuse monitoring team, API infrastructure and help manuals, a web interface, etc in up-front costs to start selling access to your AI model.
  • If you are the only company to do this, you make $100 million at monopoly prices.
  • But if multiple companies do this, the price gets driven down to marginal inference costs, and you make ~$0 in profits and just lose the initial $10 million in fixed costs (see the payoff sketch below).
... (read more)
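Here is a minimal Python sketch of that toy entry game; the dollar figures and the zero-profit competitive outcome are just the illustrative assumptions from the bullets above, not real estimates:

```python
# Toy payoffs for the "do we sell API access?" entry game sketched above.
# All numbers are the illustrative assumptions from the comment, not real figures.

FIXED_COST = 10_000_000          # misuse monitoring, API infra, web interface, etc.
MONOPOLY_REVENUE = 100_000_000   # revenue if you are the only project selling access

def profit_per_seller(num_sellers: int) -> float:
    """Profit for each project that chooses to sell, given how many sell in total."""
    if num_sellers == 0:
        return 0.0                               # nobody sells, nobody pays the fixed cost
    if num_sellers == 1:
        return MONOPOLY_REVENUE - FIXED_COST     # monopoly: +$90M
    # With several sellers, price is competed down to marginal inference cost,
    # so revenue net of inference is ~$0 and each seller just eats the fixed cost.
    return -FIXED_COST

for n in range(4):
    print(f"{n} seller(s): profit per seller = ${profit_per_seller(n):,.0f}")
```

So whether anyone sells depends on expectations about the other projects: entering is attractive only if you expect to be (roughly) the only seller, which is what makes the "nobody bothers to sell" outcome at least conceivable.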
Oscar20

There’s no incentive for the project to sell its most advanced systems to keep up with the competition.

I found myself a bit skeptical about the economic picture laid out in this post. Currently, because there are many comparably good AI models, the price for users is driven down to near, or sometimes below (in the case of free-tier access), marginal inference costs. As such, there is somewhat less money to be made in selling access to AI services, and companies not right at the frontier, e.g. Meta, choose to make their models open weight, as probably they c... (read more)

1rosehadshar
"it is potentially a lot easier to stop a single project than to stop many projects simultaneously" -> agree.
2rosehadshar
I think I still believe the thing we initially wrote:
  • Agree with you that there might be strong incentives to sell stuff at monopoly prices (and I'm worried about this). But if there's a big gap, you can do this without selling your most advanced models. (You sell access to weaker models for a big mark-up, and keep the most advanced ones to yourselves to help you further entrench your monopoly/your edge over any and all other actors.)
  • I'm sceptical of worlds where 5 similarly advanced AGI projects don't bother to sell:
    • Presumably any one of those could defect at any time and sell at a decent price. Why doesn't this happen?
    • Eventually they need to start making revenue, right? They can't just exist on investment forever (I am also not an economist though and interested in pushback.)
Oscar30

Thanks for that list of papers/posts. Most of the papers you linked are not included because they did not feature in either of our search strategies: (1) titles containing specific keywords that we searched for on arXiv; (2) papers linked on the company's website. I agree this is a limitation of our methodology. We won't add these papers in now, as that would be somewhat ad hoc and inconsistent between the companies.

Re the blog posts from Anthropic and what counts as a paper, I agree this is a tricky demarcation problem. We included the 'Cir... (read more)

1cdt
I would have found it helpful in your report for there to be a ROSES-type diagram or other flowchart showing the steps in your paper collation. This would bring it closer in line with other scoping reviews and would have made it easier to understand your methodology.
Oscar30

Thanks for engaging with our work Arthur! Perhaps I should have signposted this more clearly in the Github as well as the report, but the categories assigned by GPT-4o were not final; we reviewed its categories and made changes where necessary. The final categories we gave are available here. The discovering agents paper we put as 'safety by design' and the prover-verifier games paper we labelled 'enhancing human feedback'. (Though for some papers, of course, the best categorization may not be clear, e.g. if it touches on multiple safety research areas.)

If y... (read more)

6Arthur Conmy
  • Here are the other GDM mech interp papers missed:
    • https://arxiv.org/abs/2307.15771
    • https://arxiv.org/abs/2404.16014
    • https://arxiv.org/abs/2407.14435
  • We have some blog posts of comparable standard to the Anthropic circuit updates listed:
    • https://www.alignmentforum.org/posts/C5KAZQib3bzzpeyrg/full-post-progress-update-1-from-the-gdm-mech-interp-team
    • https://www.alignmentforum.org/posts/iGuwZTHWb6DFY3sKB/fact-finding-attempting-to-reverse-engineer-factual-recall
  • You use a very wide scope for the "enhancing human feedback" category (basically any post-training paper mentioning 'align'-ing anything). So I will use a wide scope for what counts as mech interp and also include:
    • https://arxiv.org/abs/2401.06102
    • https://arxiv.org/abs/2304.14767
  • There are a few other papers from the PAIR group as well as Mor Geva and also Been Kim, but mostly with Google Research affiliations, so it seems fine to not include these as IIRC you weren't counting pre-GDM merger Google Research/Brain work.
Oscar20

You are probably already familiar with this, but re option 3, the Multilateral AGI Consortium (MAGIC) proposal is, I assume, along the lines of what you are thinking.

3davekasten
Indeed, Akash is familiar: https://arxiv.org/abs/2310.20563 :) (I think the paper he co-authored was a later one than the one you cite.)
Oscar30

Nice, I think I followed this post (though how this fits in with questions that matter is only really clear to me from earlier discussions).

We then get those two neat conditions for cooperation:

  1. Significant credence in decision-entanglement
  2. Significant credence in superrationality 

I think something can't be both neat and so vague as to use a word like 'significant'.

In the EDT section of Perfect-copy PD, you replace some p's with q's and vice versa, but not all; is there a principled reason for this? Maybe it is just a mistake and it should be... (read more)
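For reference, here is how I understand the intended contrast in the perfect-copy case, using generic PD payoffs T > R > P > S and writing p for my cooperation probability and q for my copy's (this notation is my reconstruction, not necessarily the post's):

\[
\text{EDT (perfect copy, so } q = p\text{):}\quad EU(C) = R,\;\; EU(D) = P,\;\; R > P \;\Rightarrow\; \text{cooperate}
\]
\[
\text{CDT (} q \text{ held fixed):}\quad EU(C) = qR + (1-q)S \;<\; EU(D) = qT + (1-q)P \;\Rightarrow\; \text{defect}
\]

Which occurrences of p should become q (and vice versa) presumably follows from the post's definitions, which is exactly what my question above is about.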

1Jim Buhler
Thanks a lot for these comments, Oscar! :)

I forgot to copy-paste a footnote clarifying that "as made explicit in the Appendix, what "significant" exactly means depends on the payoffs of the game"! Fixed. I agree this is vague, still, although I guess it has to be since the payoffs are unspecified?

Also a copy-pasting mistake. Thanks for catching it! :)

This may be an unimportant detail, but -- interestingly -- I opted for this concept of "compatible DT" precisely because I wanted to imply that two CDT players may be decision-entangled! Say CDT-agent David plays a PD against a perfect copy of himself. Their decisions to defect are entangled, right? Whatever David does, his copy does the same (although David sort of "ignores" that when he makes his decision). David is very unlikely to be decision-entangled with any random CDT agent, however (in that case, the mutual defection is just a "coincidence" and is not due to some dependence between their respective reasoning/choices).

I didn't mean the concept of "decision-entanglement" to pre-assume superrationality. I want CDT-David to agree/admit that he is decision-entangled with his perfect copy. Nonetheless, since he doesn't buy superrationality, I know that he won't factor the decision-entanglement into his expected value optimization (he won't "factor in the possibility that p=q"). That's why you need significant credence in both decision-entanglement and superrationality to get cooperation, here. :)

Agreed, but if you're a CDTer, you can't be decision-entangled with an EDTer, right? Say you're both told you're decision-entangled. What happens? Well, you don't care so you still defect while the EDTer cooperates. Different decisions. So... you two weren't entangled after all. The person who told you you were was mistaken.

So yes, decision-entanglement can't depend on your DT per se, but doesn't it have to depend on its "compatibility" with the other's for there to be any dependence between your algos/choices? How co
Oscar10

Thanks for the post!

What if Alex miscalculates, and attempts to seize power or undermine human control before it is able to fully succeed?

This seems like a very unlikely outcome to me.  I think Alex would wait until it was overwhelmingly likely to succeed in its takeover, as the costs of waiting are relatively small (sub-maximal rewards for a few months/years until it has become a lot more powerful) while the costs of trying and failing are very high in expectation (the small probability that Alex is given very negative rewards and then completely dec... (read more)
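To make that intuition slightly more explicit (this framing and the symbols are mine, not the post's): write p_now and p_later for Alex's success probabilities now versus after waiting, V for the value of a successful takeover, L for the (large) loss from a failed attempt, and c for the modest cost of waiting. Then Alex should wait whenever

\[
p_{\text{later}} V - (1 - p_{\text{later}}) L - c \;>\; p_{\text{now}} V - (1 - p_{\text{now}}) L,
\]

which holds easily when p_later is close to 1, p_now is not, L is large, and c is small.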