It seems to me that, if the above description of how RLHF systems work is accurate, then the people doing this are not doing what they think they're doing at all. They are doing exactly what Sam Ringer says: taking 100 dogs, killing all the ones that don't do what they want, and breeding from the ones that do.
In order for reinforcement learning to work at all, the model has to have a memory that persists between trials. I'd encourage readers to look at the work of the cognitive scientist John Vervaeke. One of the many wise things he s...
I'm a big fan of the work of John Vervaeke, particularly the role of Relevance Realisation in helping (and hindering) us make good decisions. In this case, the prescient alien is just a distraction from the salient fact: in 100 trials out of 100, the best choice was to take the opaque box.
In fact, let's simplify the thought experiment:
I show you a coin. I tell you that it's a normal coin. I toss it 100 times. Every time it lands heads. The next time I toss it, what is the chance of it landing tails?
For those of you who said 50%, let me phrase the question another way: Given I have tossed a coin 100 times and it's landed heads every time, what is the probability that the coin is unbiased?
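To put rough numbers on the coin question, here is a minimal sketch of a Bayesian update. It rests on assumptions that are not part of the thought experiment itself: only two hypotheses are considered (a fair coin and a hypothetical two-headed coin), and the prior of 999,999 in 1,000,000 that the coin is fair is an illustrative value, not a measured one.

```python
from fractions import Fraction


def posterior_fair(n_heads, prior_fair):
    """P(coin is fair | n_heads heads in a row) under a two-hypothesis model.

    Assumed hypotheses (illustrative only):
      - fair coin: P(heads) = 1/2
      - two-headed coin: P(heads) = 1
    """
    like_fair = Fraction(1, 2) ** n_heads  # chance a fair coin gives this run
    like_biased = Fraction(1)              # a two-headed coin always gives it
    numerator = prior_fair * like_fair
    return numerator / (numerator + (1 - prior_fair) * like_biased)


# Assumed prior: we start out almost certain the coin is fair.
prior = Fraction(999_999, 1_000_000)

post = posterior_fair(100, prior)
p_tails_next = post * Fraction(1, 2)  # tails is only possible if the coin is fair

print(f"P(fair | 100 heads)  = {float(post):.3g}")
print(f"P(tails on next toss) = {float(p_tails_next):.3g}")
```

Even starting from near-certainty that the coin is fair, 100 heads in a row leaves the "fair" hypothesis with a vanishingly small posterior under these assumptions, so the chance of tails on the next toss is nowhere near 50%.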
Apologies if this argument has been made before - I’ve had a quick scan through the comments and can’t see it so here goes: The rational choice is to one-box. The two-boxers are throwing away a critical piece of evidence: in 100 cases out of 100 so far, one-boxing is the right strategy. Therefore, based upon the observable evidence, there’s less than a 1% chance of two-boxing being the correct strategy. It’s irrational to argue that you should two-box. This argument maps onto the real world. In the real world you are never certain about the mechanism behi...
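One way to arrive at a figure of "less than 1%" is Laplace's rule of succession. The sketch below is an assumption about how such a figure could be justified, not necessarily the reasoning behind the comment: it treats each past trial as an independent draw with an unknown success rate and starts from a uniform prior over that rate.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Laplace's rule: P(next trial succeeds) = (s + 1) / (n + 2).

    Assumes independent trials and a uniform prior on the success rate.
    """
    return Fraction(successes + 1, trials + 2)


# 100 observed trials, and one-boxing was the better choice in all of them.
p_one_box_right = rule_of_succession(100, 100)
p_two_box_right = 1 - p_one_box_right

print(p_two_box_right, float(p_two_box_right))
```

Under those assumptions the estimate for two-boxing being the better choice next time is 1/102, which is just under 1%.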
The above is, I'm afraid, an example of survivor bias. Famous people have biographies written about them. These books concentrate on aspects of their lives that are salient (though not necessarily relevant). There are probably thousands (if not millions) of people who had similar upbringings but who never got famous enough for someone to write a book about them. Other examples are: