All of arielroth's Comments + Replies

Some of the Chinese food samples looked nauseating to me.

Filtering for difficulty like that is tricky. In particular, the most difficult samples tend to be random noise, Chinese text, or something else the model can't begin to comprehend.

Some approaches I would consider:

Curriculum learning -- Keep a set of checkpoints from a smaller GPT. Say the big GPT currently has an LM loss of 3. Then show it the examples on which the smaller GPT's loss improved most rapidly when its average loss was around 3 (see the sketch after this list).

Quality -- Put more effort into filtering out garbage and upsampling high-quality corpora like Wikipedia.

Retrieval -- Let the model look things up when it's confused, like MARGE from Pretraining via Paraphrasing does.
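
Here's a minimal sketch of the curriculum idea, assuming you've already computed the small GPT's per-example losses at each checkpoint; the names (`avg_losses`, `per_example_losses`, `pick_curriculum_examples`) are hypothetical, not from any existing library.

```python
import torch

def pick_curriculum_examples(avg_losses, per_example_losses, big_model_loss, top_k):
    """Pick the examples the small GPT learned fastest at a comparable loss level.

    avg_losses: list of the small GPT's average LM loss at each checkpoint,
        ordered by training step.
    per_example_losses: list of 1-D tensors, per-example loss at each checkpoint.
    big_model_loss: the big GPT's current average LM loss (e.g. 3.0).
    """
    # Find the small-GPT checkpoint whose average loss is closest to the
    # big model's current loss.
    i = min(range(len(avg_losses)), key=lambda k: abs(avg_losses[k] - big_model_loss))
    i = max(i, 1)  # need a previous checkpoint to measure improvement
    # How much each example's loss dropped between consecutive checkpoints.
    improvement = per_example_losses[i - 1] - per_example_losses[i]
    # Feed the big model the examples the small model improved on most rapidly.
    return torch.topk(improvement, top_k).indices
```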

Answer by arielroth

No, the number of iterations is irrelevant. You can derive Kelly by maximizing your expected log wealth for a single bet. If you care about expected wealth instead of expected log wealth, then just bet the house at every opportunity you get.
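
As a quick sketch of that single-bet derivation (standard notation, not part of the original comment): with win probability $p$, loss probability $q = 1 - p$, net odds $b$, and a fraction $f$ of wealth staked,

$$\mathbb{E}[\log W] = p \log(1 + f b) + q \log(1 - f),$$

and setting the derivative with respect to $f$ to zero gives

$$\frac{p b}{1 + f b} - \frac{q}{1 - f} = 0 \quad\Longrightarrow\quad f^{*} = \frac{p b - q}{b},$$

which is the Kelly fraction, with no appeal to repeated bets.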

A bigger issue with Kelly is that it doesn't account for future income and debt streams. There should be an easy fix for that, but I need to think a bit.

abramdemski
It's important that we can derive Kelly that way, but if that were the only derivation, it would not be so interesting. It raises the question: why log wealth? The derivation that does something interesting to pin down Kelly in particular is the one where we take the limit in the number of iterations.
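A one-line version of that limiting argument (the standard statement, not abramdemski's wording): after $n$ independent bets at fraction $f$, with $X_i$ the growth factor of bet $i$,

$$\frac{1}{n}\log \frac{W_n}{W_0} = \frac{1}{n}\sum_{i=1}^{n}\log X_i \;\xrightarrow{\text{a.s.}}\; \mathbb{E}[\log X],$$

so in the limit the Kelly fraction maximizes the almost-sure growth rate; log wealth falls out of the limit rather than being assumed up front.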

I think people generally use zero-sum to refer to zero-sum (or constant-sum) rewards, e.g. one seat in Congress or one minute of a viewer's attention. Even rock-paper-scissors would be negative-sum if someone tried to disturb their opponent's sleep, spent a million dollars bribing the ref, or fanatically practiced for a million games.

abramdemski
Even if we use a framework of rewards, it doesn't make sense to distinguish between zero-sum, negative-sum, positive-sum, constant-sum, etc. without (a) assuming that we can compare rewards across people (so you find the Congress seat as rewarding as I would, etc.) and (b) having a baseline to compare to (the two of us arm-wrestling for a candy bar is zero-sum compared to a baseline where we somehow have to split the candy bar no matter what, positive-sum compared to a baseline where we wouldn't get any candy bar, and negative-sum if we would both have gotten a candy bar otherwise).