
mishka11

if one is after VC funding, one needs to show those VCs that there is some secret sauce which remains proprietary

IMO software/algorithmic moat is pretty impossible to keep.

Indeed.

That is, unless the situation is highly non-stationary, with algorithms and methods being modified rapidly without pause (a foom would be one such situation, but I can imagine a more pedestrian "rapid-fire" evolution of methods which proceeds at a good clip without accelerating beyond reason).

mishka11

And I doubt that Microsoft or Google have a program dedicated to "trying everything that looks promising", even though it is true that they have the manpower and hardware to do just that. But would they choose to do that?

Actually, I'm under the impression that a lot of what they do is just sharing papers in a company Slack and reproducing stuff at scale.

I'd love to have a better feel for how many of the promising things they try to reproduce at scale...

Unfortunately, I don't have enough inside access for that...

My mental model of the hardware-poor is that they want to publicize their results as fast as they can so that they get more clout, VC funding, or just get essentially acquired by big tech. Academic recognition in the form of citations drives the researchers. Getting rich drives the founders.

There are all kinds of people. I think Schmidhuber's group might be happy to deliberately create an uncontrollable foom, if they can (they have Saudi funding, so I have no idea how much hardware they actually have, or how many options for more hardware they have contingent on preliminary results). Some other people just don't think their methods are strong enough to be that unsafe. Some people do care about safety (but still want to go ahead; some of those say "this is potentially risky, but in the future, not right now", and they might be right or wrong). Some people feel their approach does increase safety (they might be right or wrong). A number of people are ideological (they feel that their preferred approach is not getting a fair shake from the research community, and they want to make a strong attempt to show that the community is wrong and myopic)...

I think most places tend to publish some of their results for the reasons you've stated, but they are also likely to hold some of the stronger things back (at least for a while); after all, if one is after VC funding, one needs to show those VCs that there is some secret sauce which remains proprietary...

mishka10

It's certainly true that having a lot of hardware is super-useful. One can try more things, pull more resources towards the things deemed more important, and do longer runs if a training scheme does not saturate but keeps improving.

:-) And yes, I don't think a laptop with a 4090 is existentially dangerous (yet), and even a single installation with 8 H100s is probably not enough (at the current and near-future state of the algorithmic art) :-)

But take a configuration worth a few million dollars, and one starts having some chances...

Of course, if a place with more hardware decides to adopt a non-standard scheme invented by a relatively hardware-poor place, the place with more hardware would win. But a non-standard scheme might be non-public, and even if it is public, people often have strong opinions about what to try and what not to try, and those opinions might interfere with a timely attempt.

I think that non-standard architectural and algorithmic breakthroughs can easily make smaller players competitive, especially as inertia of adherence to "what has been proven before" will inhibit the largest players.

Do these exist?

Yes, of course, we are seeing a rich stream of promising new things, ranging from evolutionary schemes (many of which tend towards open-endedness and therefore might be particularly unsafe, while being very promising) to various derivatives of Mamba, to potentially more interpretable architectures (such as Kolmogorov-Arnold networks or the recent Memory Mosaics, an academic collaboration with Meta which has not been a consumer of significant compute yet), to GFlowNet motifs from Bengio's group, and so on.

These things are mostly coming from places which seem to have "medium compute" (although we don't have exact knowledge about their compute): Schmidhuber's group, Sakana AI, Zyphra AI, Liquid AI, and so on. And I doubt that Microsoft or Google have a program dedicated to "trying everything that looks promising", even though it is true that they have the manpower and hardware to do just that. But would they choose to do that?

OK, so we are likely to have that (I don't think he is over-optimistic here), and the models are already very capable of discussing AI research papers and exhibit good comprehension of those papers (that's one of my main use cases for LLMs: to help me understand an AI research paper better and faster). And they will get better at that as well.

This really does not sound like AGI to me (or at least it highly depends on what a coding project means here)

If it's an open-ended AI project, it sounds like "foom before AGI", with AGI-level capability appearing at some point along the trajectory as a side effect.

The key here is that when people discuss "foom", they usually tend to focus on a (rather strong) argument that AGI is likely to be sufficient for "foom". But AGI is not necessary for "foom": one can have "foom" fully in progress before full AGI is achieved ("the road to superintelligence goes not via human equivalence, but around it").

mishka26

I think this post might suffer from the lack of a distinction between karma and agreement/disagreement at the level of posts. I don't think it deserves negative karma, but with this range of topics, it is certain to elicit a lot of disagreement.


Of course, one meta-issue is the diversity of opinion, both in the AI community and in the AI existential safety community.

The diversity of opinion in the AI community is huge, but it is somewhat obfuscated by "money, compute, and SOTA success" effects, which tend to create an artificial impression of consensus when one looks from the outside. But people often leave the leading orgs to pursue less standard approaches, in particular because large orgs are often not so friendly to those non-standard approaches.

The diversity of opinion in the AI existential safety community is at least as big (and is probably even larger, which is natural given that the field is much younger and its progress much less certain), but, in addition, that diversity is less obfuscated, because the field has nothing resembling the highly successful Transformer-based-LLM center around which people can consolidate.

I doubt that the diversity of opinion in the AI existential safety community is likely to decrease, and I doubt that such a decrease would be desirable.


Another meta-issue is how much we should agree on the super-importance of compute. On this meta-issue, the consensus in the AI community and in the AI existential safety community is very strong (and in the case of the AI existential safety community, the reason for this consensus is that compute is, at least, a lever one could plausibly hope to regulate).

But is this consensus actually that unquestionable? Even with Microsoft backing OpenAI, Google should always have been ahead of OpenAI if it were just a matter of raw compute.

The Llama-3-70B training run took only a few million GPU-hours, so the cost of training can't much exceed 10 million dollars, and it is a model roughly equivalent in power to early GPT-4.
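
A rough back-of-the-envelope sketch of that estimate (my own inputs, not numbers from the post: the roughly 6.4 million GPU-hours Meta reported for the Llama-3-70B pretraining run, and an assumed at-cost rate of about $1.5 per H100-hour):

```python
# Back-of-the-envelope training-cost estimate; both inputs are assumptions:
# ~6.4M GPU-hours (Meta's reported figure for Llama-3-70B pretraining) and
# an assumed at-cost rate of roughly $1.5 per H100-hour.
gpu_hours = 6.4e6          # reported pretraining compute, approximate
usd_per_gpu_hour = 1.5     # assumed at-cost rate for an H100

estimated_cost_usd = gpu_hours * usd_per_gpu_hour
print(f"~${estimated_cost_usd / 1e6:.0f}M")  # on the order of $10M
```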

I think that non-standard architectural and algorithmic breakthroughs can easily make smaller players competitive, especially as inertia of adherence to "what has been proven before" will inhibit the largest players.


Then, finally, there is all this focus of conversations around "AGI", both in the AI community and in the AI existential safety community.

But for the purpose of existential safety, we should not focus on "AGI" (whatever that might be). We should focus on the much narrower ability of AI systems to accelerate AI research and development.

Here we are very close. E.g., John Schulman, in his latest podcast with Dwarkesh, said:

Even in one or two years, we'll find that the models can do a lot more involved tasks than they can do now. For example, you could imagine having the models carry out a whole coding project instead of it giving you one suggestion on how to write a function. You could imagine the model taking high-level instructions on what to code and going out on its own, writing any files, and testing it, and looking at the output. It might even iterate on that a bit. So just much more complex tasks.

OK, so we are likely to have that (I don't think he is over-optimistic here), and the models are already very capable of discussing AI research papers and exhibit good comprehension of those papers (that's one of my main use cases for LLMs: to help me understand an AI research paper better and faster). And they will get better at that as well.

This combination of the coming ability of LLMs to do end-to-end software projects on their own and their increasing competence in comprehending AI research sounds like a good reason to anticipate a rapidly intensifying phenomenon of AI systems accelerating AI research and development faster and faster in the very near future. Hence the anticipation of very short timelines by many people (although this is still a minority view, even in AI existential safety circles).

mishka109

The podcast is here: https://www.dwarkeshpatel.com/p/john-schulman?initial_medium=video

From reading the first 29 minutes of the transcript, my impression is: he is strong enough to lead an org to an AGI (it seems many people are strong enough to do this from our current level; the conversation does seem to show that we are pretty close), but I don't get the feeling that he is strong enough to deal with issues related to AI existential safety. At least, that's my initial impression :-(

mishka112

Jan Leike confirms: https://twitter.com/janleike/status/1790603862132596961

Dwarkesh is supposed to release his podcast with John Schulman today, so we will be able to evaluate the quality of his thinking more closely (he is mostly known for reinforcement learning, https://scholar.google.com/citations?user=itSa94cAAAAJ&hl=en, although he also has some track record of safety-related publications, including "Unsolved Problems in ML Safety", 2021-2022, https://arxiv.org/abs/2109.13916, and "Let's Verify Step by Step", https://arxiv.org/abs/2305.20050, which includes Jan Leike and Ilya Sutskever among its co-authors).

No confirmation of him becoming the new head of Superalignment yet...

mishka159

Ilya's departure is momentous.

What do we know about those other departures? The NYT article has this:

Jan Leike, who ran the Super Alignment team alongside Dr. Sutskever, has also resigned from OpenAI. His role will be taken by John Schulman, another company co-founder.

I have not been able to find any other traces of this information yet.

We do know that Pavel Izmailov has joined xAI: https://izmailovpavel.github.io/

Leopold Aschenbrenner still lists OpenAI as his affiliation everywhere I see. The only recent traces of his activity seem to be likes on Twitter: https://twitter.com/leopoldasch/likes

mishka20

Thanks!

Interesting. I see a lot of people reporting that their coding experience has improved compared to GPT-4, but it looks like this is not uniform; the experience differs for different people (perhaps depending on what they are doing)...

mishka10

What's your setup? Are you using it via the ChatGPT interface, or via the API and a wrapper?

mishka141

This also points out that Arena tells you which model is Model A and which is Model B. That is unfortunate, and potentially taints the statistics.

No, https://chat.lmsys.org/ says this:

  • Ask any question to two anonymous models (e.g., ChatGPT, Claude, Llama) and vote for the better one!
  • You can chat for multiple turns until you identify a winner.
  • Votes won't be counted if model identities are revealed during the conversation.

So one can choose to know the names of the models one is talking with, but then one's votes will not be counted in the statistics.
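
A minimal sketch of that rule (purely illustrative code of my own, not lmsys's actual implementation): a vote is kept for the leaderboard statistics only if the model identities were never revealed during that conversation.

```python
# Illustrative sketch of the Arena voting rule described above (not lmsys's code).
def vote_counts(identities_revealed: bool) -> bool:
    """Return True if the vote should be included in the statistics."""
    return not identities_revealed

# Example: three hypothetical votes; only those cast in fully anonymous chats are kept.
votes = [("model_a", False), ("model_b", True), ("tie", False)]
counted = [winner for winner, revealed in votes if vote_counts(revealed)]
print(counted)  # ['model_a', 'tie']
```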
