I'm currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.
In general, I'm happy to see people work to be more explicit about their viewpoints.
It seems like the team behind this disagrees with much of the rest of the AI safety field, in that you think other safety approaches and strategies are unlikely to succeed. Most of this argument seems to be in the AI Safety section. Arguably that section provides a basic summary for readers who aren't very familiar with the area, but for those who disagree with you on these points, I suspect it isn't close to long or detailed enough to convince them.
I find this section to be very hand-wavy and metaphorical. I currently believe that AI oversight mechanisms, control, and careful scaling have a decent chance of maintaining reasonable alignment, if handled reasonably intelligently.
For example, this piece says,
> The appropriate analogy is not one researcher reviewing another, but rather a group of preschoolers reviewing the work of a million Einsteins. It might be easier and faster than doing the research itself, but it will still take years and years of effort and verification to check any single breakthrough.
I think it's likely we won't get a discrete step like that. It would be more like some smart scientists reviewing the work of some smarter scientists, but in a situation where the latter team is forced to reveal all of its in-progress thinking, is evaluated across a very large range of extreme situations, and is subject to other clever strategies for incentive-setting and oversight.
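To gesture at what I mean, here's a toy sketch of an oversight setup along those lines: the stronger model must disclose its full working trace and is stress-tested across many extreme scenarios before its output is trusted. All objects and method names here are hypothetical, not any real API.

```python
# Toy sketch of the oversight setup gestured at above. `strong_model`,
# `scenarios`, and `reviewers` are made-up objects for illustration only.

def oversee(strong_model, task, scenarios, reviewers):
    # The stronger model is not allowed hidden reasoning: it must return a full trace.
    trace = strong_model.solve_with_full_trace(task)
    # Stress-test the same model across a large battery of extreme situations.
    stress_results = [strong_model.solve_with_full_trace(s) for s in scenarios]
    # Weaker (human or AI) reviewers inspect the disclosed traces, not just final outputs.
    verdicts = [reviewer.review(trace, stress_results) for reviewer in reviewers]
    return all(verdicts), trace
```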
There also seems to be an implicit assumption that scaling will happen incredibly quickly after rough benchmarks of AGI are achieved, like a major discontinuous jump. I think this is possible, but I'm very unsure.
Yep, that makes sense to me.
One tiny point - I think the phrase "synthetic data" arguably breaks down at some point. "Synthetic data" sounds to me like "we're generating fake data made to come from a similar distribution to 'real data.'" But I assume that a lot of the data we'll get from inference will be more like straightforward reasoning.
For example, we get o1 to solve a bunch of not-yet-recorded mathematical lemmas, then train the next model on those. Technically this is "synthetic data," but I don't see why this data is fundamentally different from similar mathematics that humans do. This data is typically the synthesis or distillation of much longer search and reasoning processes.
As such, it seems very sensible to me to expect "synthetic data" to be a major deal.
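To make that concrete, here's a minimal illustrative sketch of what "synthetic data as distilled reasoning" could look like. The model, verifier, and method names are entirely hypothetical, not any real API.

```python
# Hypothetical sketch: use a strong reasoning model to settle open lemmas via
# long inference-time search, verify the results, and keep only the distilled
# (statement, answer) pairs as training data.

def generate_distilled_examples(lemmas, reasoning_model, verifier):
    """Turn long reasoning/search traces into compact training examples."""
    examples = []
    for lemma in lemmas:
        # Expensive step: extended chain-of-thought / search at inference time.
        trace = reasoning_model.solve(lemma, max_steps=10_000)
        if verifier.check(lemma, trace.final_answer):
            # Keep only the distilled conclusion, not the intermediate search steps.
            examples.append({"input": lemma, "target": trace.final_answer})
    return examples
```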
Another interesting takeaway for me - I didn't realize that Microsoft was doing much training of its own. It makes a lot of sense that they'd want their own teams making their own models, in part to hedge around OpenAI.
I'm curious what their strategy will be in the next few years.
This is neat, thanks for highlighting.
>The implication: If you don't have access to a 2024-frontier AI, you're going to have a hard time training the next frontier model. That gap will likely widen with each subsequent iteration.
This doesn't seem super clear to me. Without synthetic data, you need to scrape large parts of the web and manage a lot of storage infrastructure, which can either be done illegally or through complex negotiations (especially as companies are catching on to this).
In comparison, it would be very useful if you could train a near-SOTA model with just synthetic data, say from an open-source model. This might not bring you all the way to SOTA, but close might be good enough for many things.
> Caroline Jeanmaire
This seems like an error. From their page, it seems like the CEO is Sanjana Kashyap.
https://www.intelligencerising.org/about
Minor point, but I'd be happy if LessWrong/Lightcone had various (popular) subscriptions for perks, like Patreon.
Some potential perks:
I realize these can be a pain to set up though.
(I'd want this only if it helped Lightcone's total profit.)
Donated $300 now, intend to donate more (after more thinking).
My impression is that if you read LessWrong regularly, it could easily be worth $10-$30/month to you. If you've attended Lighthaven, there's an extra benefit there, which could be worth much more. So I think it's very reasonable for people in our community to donate somewhere between $100 (roughly a year of a ~$10/month Substack subscription) and $10k (a fancy club membership) per person, depending on the person, just from the standpoint of thinking of it as communal/local infrastructure.
One potential point of contention: I believe some of the team is working on future, more experimental projects beyond the LessWrong/Lighthaven basics. But I feel pretty good about this in general; it's just higher-risk and more difficult to recommend.
I also think it's just good practice for community-focused projects to get donations from the community; this helps keep incentives aligned. And on this front, the Lighthaven team is about as relevant as things get right now.
> I am not particularly sympathetic to your argument, which amounts to 'the public might pressure them to train away the inconvenient thoughts, so they shouldn't let the public see the inconvenient thoughts in the first place.'
I was attempting to make a descriptive claim about the challenges they would face, not a normative claim that it would be better for them not to expose this information.
From a stance of global morality, it seems quite scary for one company to oversee and then hide all the epistemic reasoning of its tools.
I'd also guess that the main issue I raised should rarely be the main problem with o1. I think there is some limit to the epistemic quality you can reach without offending users, but this mainly applies to questions like "How likely are different religions to be true?", not "What is the best way of coding this algorithm?", which is what o1 seems more targeted toward now.
So I'd imagine that most cases in which o1's reasoning steps look objectionable would be straightforward technical problems, like the system lying in some steps or reasoning in weird ways.
Also, knowledge of these steps might just make it easier to crack/hack o1.
If I were a serious API user of an o1-type system, I'd very much want to see the reasoning steps, at the very least. I imagine that over time, API users will be able to get a lot of this from these sorts of systems.
If we do hit a point where the vast majority of objectionable-looking steps are due to genuine epistemic disagreements, then I think there's a different discussion to be had. It seems very safe to me to at least ensure that the middle steps are exposed to academic and government researchers. I'm less sure about the implications of revealing this data to the public; that generally seems like a really hard question to me. While I'm generally pro-transparency, if I were convinced that fully transparent reasoning would force these models to hold incorrect beliefs at a deeper level, I'd be worried.
I'm in support of this sort of work. Generally, I like the idea of dividing up LLM architectures into many separate components that could be individually overseen / aligned.
Separating "Shaggoth" / "Face" seems like a pretty reasonable division to me.
At the same time, there are definitely a lot of significant social+political challenges here.
I suspect that one reason OpenAI doesn't expose all of o1's thinking is that this thinking would upset some users, especially journalists and such. It's hard enough to make sure that the final outputs are sufficiently unobjectionable to go out to the public at large scale; it seems harder to make sure the full set of intermediate steps is also unobjectionable.
One important thing that smart intellectuals do is to have objectionable/unpopular beliefs, but still present unobjectionable/popular outputs. For example, I'm sure many of us have beliefs that could get us cancelled by some group or other.
If the entire reasoning process is exposed, this might pressure it to be unobjectionable, even if that trades off against accuracy.
In general, I'm personally very much in favor of transparent thinking and argumentation. It's just that I've noticed this as one fundamental challenge with intellectual activity, and I also expect to see it here.
One other challenge to flag - I imagine that the Shoggoth and Face layers would require (or at least greatly benefit from) some communication back and forth. An intellectual analysis can vary heavily depending on the audience; it's not enough to do all the intellectual work in one pass and then match it to the audience afterward.
For example, if an AI were tasked with designing changes to New York City, it might matter a lot whether the audience is a religious zealot.
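A rough sketch of that kind of back-and-forth, again with made-up `shoggoth` and `face` objects rather than any real API:

```python
# Hypothetical sketch: the Face drafts for a specific audience and can hand
# concerns back to the Shoggoth, which revises its analysis.

def run_with_feedback(query, audience, shoggoth, face, max_rounds=3):
    analysis = shoggoth.think(query)
    draft = None
    for _ in range(max_rounds):
        draft, concerns = face.draft_for(audience, analysis)
        if not concerns:  # the presentation layer has nothing to push back on
            break
        analysis = shoggoth.revise(analysis, concerns)  # redo analysis with audience context
    return draft
```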
One last tiny point - in future work, I'd lean against using "Shoggoth" as the name for the reasoning step. It sort of makes sense to this audience, but I think it's a mediocre fit here. For one, I assume the "Face" could have some Shoggoth-like properties.
Separately, I'd flag that I'm not a huge fan of bold names like "The Compendium." I get that the team might think the document justifies the grandiosity, but from my perspective, I'm more skeptical. (I feel similarly about other names like "Situational Awareness.")