All of Jakub Halmeš's Comments + Replies

I wonder if you could take the R1-Zero training regime, penalize or restrict the use of existing words from any language (maybe only in the scratchpad, not the final response), and obtain a model that can solve math problems by reasoning in a non-existent language.

Milan W
It may just use l33tc0d3 or "palabres nonexistentes interpolatas d'idioms prossimos" (mock-Romance for "nonexistent words interpolated from neighboring languages").

Quoting the paper:

"During the training process, we observe that CoT often exhibits language mixing, particularly when RL prompts involve multiple languages. To mitigate the issue of language mixing, we introduce a language consistency reward during RL training, which is calculated as the proportion of target language words in the CoT. Although ablation experiments show that such alignment results in a slight degradation in the model's performance, this reward aligns with human preferences, making it more readable."

I also found this trade-off between human readability and performance noteworthy.
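The paper gives the definition of the reward (the proportion of target-language words in the CoT) but not the word-level language detector. As a minimal sketch, assuming English as the target language and a crude "pure Latin-script token" heuristic in place of a real detector, the reward might look like this:

```python
import re

# Heuristic (my assumption, not the paper's implementation): a token made
# entirely of Latin letters, apostrophes, and hyphens counts as a
# target-language (English) word.
LATIN_WORD = re.compile(r"^[A-Za-z'-]+$")

def language_consistency_reward(cot: str) -> float:
    """Fraction of tokens in the chain of thought that look like
    target-language words."""
    tokens = [t.strip(".,;:!?()[]\"") for t in cot.split()]
    tokens = [t for t in tokens if t]  # drop empty tokens
    if not tokens:
        return 0.0
    target = sum(1 for t in tokens if LATIN_WORD.match(t))
    return target / len(tokens)

print(language_consistency_reward("Therefore the answer is forty-two"))  # 1.0
print(language_consistency_reward("Therefore 答案 is forty-two"))         # 0.75
```

A mixed-language CoT scores below 1.0, so adding this term to the RL objective pushes the model toward single-language reasoning, at the cost of the slight performance degradation the ablations report.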

Milan W
Side note: Claude 3.5 Sonnet does CoT language-mixing after a bit of prompting and convincing. I'm not sure about the effects on performance. Also, the closeness narratively implied by having it imitate the idiosyncratic mixture I was using to talk to it probably exacerbated sycophancy.

Yes, fair here means that their subjective EVs are equal. The post referenced in the sibling comment calls it "Even Odds", which is probably better.

I did not realize that. Thank you for the reference! 

If Alice thinks X happens with a probability of 20% while Bob thinks it's 40%, what would be a fair bet between them? 

I created a Claude Artifact that calculates a bet such that the subjective expected value is the same for both players.

In this case, Bob wins if X happens (he thinks it's more likely). If Alice bets $100, he should bet $42.86, and the EV of such a bet for both players (according to their beliefs) is $14.29.

EDIT: I updated the calculator to correctly handle the case where A's probability is higher than B's.
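For anyone who wants to reproduce the numbers, here is a minimal sketch of the equal-subjective-EV calculation (my reconstruction; the Artifact's actual code may differ):

```python
def fair_bet(p_a: float, p_b: float, stake_a: float = 100.0):
    """Alice believes X happens with probability p_a, Bob with p_b > p_a,
    so Bob bets that X happens. Returns Bob's stake and each player's
    subjective EV when Alice stakes stake_a.

    Equal-EV condition:
        Bob:   p_b * stake_a - (1 - p_b) * stake_b
        Alice: (1 - p_a) * stake_b - p_a * stake_a
    Setting these equal gives
        stake_b = stake_a * (p_a + p_b) / (2 - p_a - p_b).
    """
    stake_b = stake_a * (p_a + p_b) / (2 - p_a - p_b)
    ev = (1 - p_a) * stake_b - p_a * stake_a  # equal for both by construction
    return stake_b, ev

stake_b, ev = fair_bet(0.20, 0.40)
print(f"Bob stakes ${stake_b:.2f}; each player's subjective EV is ${ev:.2f}")
# Bob stakes $42.86; each player's subjective EV is $14.29
```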

Dagon
The assumption that "equal monetary EV" is the definition of "fair" is questionable. In fact, any wager at odds between 21% and 39% (narrower if transaction costs and risk of ruin are included) is fair from the standpoint of "all participants prefer making the bet to declining". If you do want to make it "fair" in terms of equal benefit to both, you probably need their utility-of-marginal-money calculations. If Alice really needs the money, it's not "fair" for Bob to demand half of the monetary expectation. There's also the fairness question of whether they are equally rational and well calibrated and have the same relevant information (hint: Aumann proved they don't).
Unnamed

This is a bet at 30% probability, as 42.86/142.86 = .30001.

That is the average of Alice's probability and Bob's probability. The fair bet according to equal subjective EV is at the average of the two probabilities; previous discussion here.
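A short derivation of why the equal-EV stake lands at the average of the two probabilities, using the same setup as the example above (Alice stakes a believing P(X) = p_A, Bob stakes b believing P(X) = p_B > p_A and bets that X happens):

```latex
\underbrace{p_B\,a - (1 - p_B)\,b}_{\text{Bob's EV}}
  = \underbrace{(1 - p_A)\,b - p_A\,a}_{\text{Alice's EV}}
\;\Longrightarrow\; a\,(p_A + p_B) = b\,(2 - p_A - p_B)
\;\Longrightarrow\; \frac{b}{a + b} = \frac{p_A + p_B}{2}.
```

With p_A = 0.2 and p_B = 0.4, the implied probability is 0.3, matching 42.86/142.86 above.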

I wrote this mostly for personal purposes. I wanted to organize my thoughts about the problem while reading the paper, and publishing the notes, even if no one reads them, forces me to write more clearly and precisely.

I would like to get some feedback on whether posts like this one are valuable to other people. Please let me know! Thank you.