Leon Lang

I'm a final-year PhD student at the University of Amsterdam working on AI Safety and Alignment, specifically on safety risks of Reinforcement Learning from Human Feedback (RLHF). Previously, I also worked on abstract multivariate information theory and equivariant deep learning. https://langleon.github.io/


Only the output! I thought Mikhail was referring to the output here, as this is what we see for the IMO problems.

But as I see it now, the consensus seems to be something like "The chain of thought of new models does look like the IMO problem solutions, and if you don't train the model to produce final answers that look nice to humans, then they will look like the chain of thought. Probably the experimental model's answers were not yet trained to look nice".

Is this your position? I think that's pretty plausible. 

Here is a screenshot of a chain of thought from the blog post you link:

This looks different from the IMO solutions to me and doesn't have the patterns I mentioned. E.g., the sentences are grammatically complete.

Fwiw, I've recently used o3 a lot for requesting proofs, and it writes very differently.

Could you give an example of an RLed LLM that writes like these examples?

Though I agree with Rauno's comment that it does look like the chain of thought examples from the Baker et al. paper. 

As I understand it, we don’t actually see the chain of thought here but only the final submitted solution. And I don’t think that a pressure to save tokens would apply to that.

The proofs look very different from how LLMs typically write, and I wonder how that emerged. Much more concise. Most sentences are not fully grammatically complete. A bit like how a human would write if they don't care about form and only care about content and being logically persuasive. 

Thanks for the comment Stepan!

I think it's right that the distinction between "lots of data" and "less data" doesn't really carve reality at its natural joints. I feel like your distinction between "discrete" and "continuous" also doesn't fully do this, since you could imagine a discrete case where we have only one $y$ for each $x$ in the dataset, and thus need regression, too (at least in principle).

I think the real distinction is probably whether we have several $y$'s for each $x$ in the dataset, or not. The twin dataset case has that, and so even though it's not a lot of data (only 32 pairs, or 64 total samples), we can essentially apply what I called the "lots of data" case.
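To illustrate the "several $y$'s per $x$" point with code (my own sketch with simulated numbers, not taken from the post; the ANOVA-style variance decomposition and all variable names are my assumptions): with repeated measurements per group, one can estimate the fraction of variance explained by group membership by comparing between-group and within-group variance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 32 groups (e.g. pairs sharing the same x), 2 samples of y each.
n_groups, n_per_group = 32, 2
group_signal = rng.normal(size=n_groups)  # part of y determined by x
y = group_signal[:, None] + rng.normal(scale=0.5, size=(n_groups, n_per_group))

# Within-group variance estimates the noise; the variance of the group means
# overestimates the signal by (noise variance / n_per_group), so we subtract that.
within_var = y.var(axis=1, ddof=1).mean()
between_var = y.mean(axis=1).var(ddof=0) - within_var / n_per_group
explained = between_var / (between_var + within_var)
print(round(explained, 2))
```

With the simulated signal-to-noise ratio above (signal variance 1, noise variance 0.25), the estimate should come out somewhere around 0.8.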

Now, I have to admit that by this point I'm somewhat attached to the imperfect state of this post and won't edit it anymore. But I've strongly upvoted your comment and weakly agreed with it, and I hope some confused readers will find it. 

Thanks, I've replaced the word "likelihood" with "probability" in the comment above and in the post itself!

Thanks, I think this is an excellent comment that gives lots of useful context.

To summarize briefly what foorforthought has already expressed: what I meant by platonic variance explained is the explained variance independent of a specific sample or statistical model. But as you rightly point out, this still depends on lots of context, such as crucial details of the study design or the population one studies.

> what is a measurable space?

I'm not sure if clarifying this is most useful for the purpose of understanding this post specifically, but for what it's worth: A measurable space is a set together with a set of subsets that are called "measurable". Those measurable sets are the sets to which we can then assign probabilities once we have a probability measure $\mathbb{P}$ (which in the post we assume to be derived from a density $p$, see my other comment under your original comment).
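For intuition, here is a toy sketch of my own (not from the post): on a finite set, the power set is always a valid collection of measurable sets, and the defining closure properties of a sigma-algebra can be checked exhaustively:

```python
from itertools import chain, combinations

def power_set(s):
    """All subsets of s, each as a frozenset."""
    s = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))}

omega = frozenset({1, 2, 3})
sigma = power_set(omega)  # the largest sigma-algebra on omega

# Defining properties: contains the whole set, closed under complement and union
# (on a finite set, countable unions reduce to finite unions).
assert omega in sigma
assert all(omega - a in sigma for a in sigma)
assert all(a | b in sigma for a in sigma for b in sigma)
print(len(sigma))  # 8 measurable sets, i.e. 2^3
```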

"the function  is constant," you mean its just one outcome like a die that always lands on one side?

I think that's what the commenter you replied to means, yes. (They don't seem to be active anymore.)

> what makes a function measurable?

This is another technicality that might not be too useful to think about for the purpose of this post. A function is measurable if the preimages of all measurable sets are measurable. I.e., $f\colon X \to Y$, for two measurable spaces $X$ and $Y$, is measurable if $f^{-1}(A)$ is measurable for all measurable $A \subseteq Y$. For practical purposes, you can think of continuous functions or, in the discrete case, just any functions.
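In a finite toy setting, measurability can be checked exhaustively. This is my own illustration, not from the post; the particular sets and the function are made up:

```python
# Toy measurable spaces: X carries a small (coarse) sigma-algebra,
# Y carries its full power set.
X = frozenset({1, 2, 3, 4})
sigma_X = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}
Y = frozenset({0, 1})
sigma_Y = {frozenset(), frozenset({0}), frozenset({1}), Y}

def f(x):
    return 0 if x <= 2 else 1  # maps {1,2} -> 0 and {3,4} -> 1

def preimage(b):
    """Preimage of the set b under f."""
    return frozenset(x for x in X if f(x) in b)

# f is measurable iff every preimage of a measurable set lands in sigma_X.
is_measurable = all(preimage(b) in sigma_X for b in sigma_Y)
print(is_measurable)  # True
```

A function that split the block {1, 2} (say, mapping 1 and 2 to different values) would fail this check, since its preimages are not in the coarse sigma-algebra on X.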

I'm sorry that the terminology of random variables caused confusion!
If it helps, you can basically ignore the formalism of random variables and instead simply talk about the probability of certain events. For a random variable $X$ with values in $\mathcal{X}$ and density $p$, an event is (up to technicalities that you shouldn't care about) any subset $E \subseteq \mathcal{X}$. Its probability is given by the integral
$$P(X \in E) = \int_E p(x) \, dx.$$

In the case that $\mathcal{X}$ is discrete and not continuous (e.g., in the case that it is the set of all possible human DNA sequences), one would take a sum instead of an integral:
$$P(X \in E) = \sum_{x \in E} p(x).$$

The connection to reality is that if we sample $x$ from the random variable $X$, then the probability of it being in the event $E$ is modeled as being precisely $P(X \in E)$. I think with these definitions, it should be possible to read the post again without getting into the technicalities of what a random variable is.
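The discrete case is easy to play with in code. Here is a sketch with made-up numbers (the outcome labels and probabilities are purely illustrative, not from the post): the probability of an event is just the sum of the density over its elements:

```python
# Hypothetical discrete density over a small outcome space; values sum to 1.
p = {"AA": 0.2, "AG": 0.5, "GG": 0.3}

def prob(event):
    """P(X in event) = sum of p(x) over x in the event."""
    return sum(p[x] for x in event)

print(round(prob({"AA", "AG"}), 10))  # 0.2 + 0.5 = 0.7
```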

> I think this post would be much easier to learn from if it was a jupyter notebook with python code intermixed or R markdown.

At the end of the article I link to this piece of code showing how to do the twin study analysis. I hope that's somewhat helpful.
