Sorry to be blunt, but any distraction filter that can be disabled through the Chrome extension menu is essentially worthless. Speaking from experience, for most people this will work for exactly 3 days, until they find a website they really want to visit and just "temporarily" disable the extension to see it.
For #5, I think the answer would be to have the AI produce AI safety ideas that not only solve alignment but also yield capabilities growth along some axis the big players care about, in a way where the capabilities are not easily separable from the alignment. I can imagine this happening if the AI safety idea somehow makes the AI much better at following the spirit of an instruction (which is, after all, what we care about). The big players do care about having instruction-following AIs, and if the way to get them is to use the AI safety book, they will use it.
Do you think LeCun was assuming that the entire field of RL would stop existing so that everyone could focus on his specific vision?
Very many things wrong with all of that:
This is very dumb. LeCun should know better, and I'm sure he *would* know better if he spent 5 minutes thinking about any of this.
The word "privilege" has become so tainted by its association with guilt that, at this point, it's almost an infohazard to think you have privilege: it makes you lower your head in shame at having more than others and brings about a self-flagellating attitude. It elicits an instinct to lower yourself rather than to bring others up. The proper reactions to all the things you've listed are gratitude for your circumstances and compassion towards those who don't share them. And everyone should certainly be wary of any instinct to publicly "acknowledge their privilege"... it's probably your status-raising instincts having found a good opportunity to boast about your intelligence, appearance and good looks while appearing modest.
A weird side effect of retinoids to beware of: they make dry eyes worse, and in my experience this can significantly decrease your quality of life, especially if it prevents you from sleeping well.
Basically, this shows that every term in a standard Bayesian inference, including the prior ratio, can be re-cast as a likelihood term in a setting where you start off unsure about what words mean, and have a flat prior over which set of words is true.
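A minimal numerical sketch of the identity this relies on, assuming two discrete hypotheses and made-up numbers (it is not the specific word-meaning construction under discussion). Starting from a flat prior and updating on a hypothetical observation `W` whose likelihoods are proportional to the original prior gives exactly the same posterior as ordinary inference with that prior. The helper names `normalize` and `update` are illustrative.

```python
from fractions import Fraction

def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

def update(prior, likelihood):
    """One Bayesian update: posterior is proportional to prior * likelihood."""
    return normalize({h: prior[h] * likelihood[h] for h in prior})

# Made-up numbers for illustration only.
prior = {"H1": Fraction(3, 4), "H2": Fraction(1, 4)}
likelihood_E = {"H1": Fraction(1, 3), "H2": Fraction(2, 3)}

# Standard inference: informative prior, then update on evidence E.
standard = update(prior, likelihood_E)

# Recast: flat prior, then update on a hypothetical observation W whose
# likelihood ratio equals the old prior ratio, then update on E.
flat = {h: Fraction(1, len(prior)) for h in prior}
likelihood_W = dict(prior)
recast = update(update(flat, likelihood_W), likelihood_E)

print(standard)  # {'H1': Fraction(3, 5), 'H2': Fraction(2, 5)}
print(recast == standard)  # True
```

Exact fractions are used so the two routes can be compared with `==` rather than a floating-point tolerance.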
If the possible meanings of your words are a continuous one-dimensional variable x, a flat prior over x will generally not be flat if you change variables to y = f(x) for an arbitrary bijection f (it stays flat only when f is affine), so the construction would be sneaking in a specific choice of function f.
Say the words are utterances about the probability of a coin landing heads: why should the flat prior be over the probability p instead of over the log-odds log(p/(1-p))?
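To make the coin example concrete, here is a small sketch (the function names are made up for illustration) of what a uniform prior on p looks like once you change variables to the log-odds y = log(p/(1-p)). By the change-of-variables rule, the induced density is |dp/dy| = s(y)(1 - s(y)), where s is the logistic function, which peaks at y = 0 and decays in the tails. "Flat in p" is therefore far from "flat in log-odds".

```python
import math

def logistic(y):
    """Inverse of the log-odds map: p = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def induced_density(y):
    """Density over y = log(p / (1 - p)) when p is uniform on (0, 1).

    Change of variables: density_y(y) = density_p(p) * |dp/dy|
    = 1 * s(y) * (1 - s(y)), with s the logistic function.
    """
    s = logistic(y)
    return s * (1.0 - s)

def induced_mass(a, b):
    """Prior mass that a uniform prior on p assigns to log-odds in (a, b)."""
    return logistic(b) - logistic(a)

for y in (-4, -2, 0, 2, 4):
    print(f"log-odds {y:+d}: density {induced_density(y):.4f}")

# Two log-odds intervals of equal width receive very different prior mass.
print(f"mass on (-1, 1): {induced_mass(-1, 1):.3f}")
print(f"mass on (3, 5): {induced_mass(3, 5):.3f}")
```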
Most of the weird stuff involving priors arises when you want posteriors over a continuous hypothesis space. There you get in trouble because reparametrizing the space changes the form of your prior, so a uniform "natural" prior is really a particular choice of parametrization. Using a discrete hypothesis space avoids big parts of the problem.
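A sketch of the discrete case, assuming a hypothetical 5-point grid of values for p. A uniform prior puts mass on the hypotheses themselves, not on intervals, so relabeling each hypothesis by its log-odds leaves every mass at 1/N with no Jacobian factor. The sketch also shows why this avoids only *part* of the problem: a grid evenly spaced in p is unevenly spaced in log-odds, so choosing which hypotheses to include is still a choice.

```python
import math

N = 5
# Hypothetical discrete hypotheses about the coin's heads probability p.
grid_p = [(i + 0.5) / N for i in range(N)]
prior = {p: 1.0 / N for p in grid_p}  # uniform mass on each hypothesis

# Relabel each hypothesis by its log-odds; the mass moves with the label.
relabeled = {math.log(p / (1 - p)): mass for p, mass in prior.items()}

for y, mass in sorted(relabeled.items()):
    print(f"log-odds {y:+.3f}: prior mass {mass:.2f}")

# Still uniform over the hypothesis set, but the log-odds labels are not
# evenly spaced, which reflects the original choice of grid.
gaps = [b - a for a, b in zip(sorted(relabeled), sorted(relabeled)[1:])]
print("spacing in log-odds:", [round(g, 3) for g in gaps])
```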
Wait, why doesn't the entropy of your posterior distribution capture this effect? In the basic example where we get to see samples from a Bernoulli process, the posterior is a Beta distribution that gets ever sharper around the truth. If you compute the entropy of the posterior, you might say something like "I'm unlikely to change my mind about this; my posterior only has 0.2 bits to go until zero entropy". That's already a quantity which estimates how much future evidence will influence your beliefs.
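A rough sketch of the Bernoulli example, assuming a uniform prior discretized onto a 1000-point grid of p values and data arriving in a fixed 70/30 ratio of heads to tails. The grid is a deliberate choice: Shannon entropy of a discrete distribution is bounded below by zero, so "bits to go until zero entropy" is meaningful. For the continuous Beta posterior, differential entropy keeps falling below zero instead. The function names are illustrative.

```python
import math

def grid_posterior(heads, tails, n_grid=1000):
    """Posterior over grid points p_i, starting from a uniform prior on the grid.

    Unnormalized log posterior at p is heads*log(p) + tails*log(1 - p);
    grid midpoints avoid log(0) at p = 0 or p = 1.
    """
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    log_w = [heads * math.log(p) + tails * math.log(1 - p) for p in grid]
    top = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(x - top) for x in log_w]
    total = sum(w)
    return [x / total for x in w]

def entropy_bits(dist):
    """Shannon entropy in bits, skipping zero-probability points."""
    return -sum(q * math.log2(q) for q in dist if q > 0)

counts = [(0, 0), (7, 3), (70, 30), (700, 300)]
entropies = [entropy_bits(grid_posterior(h, t)) for h, t in counts]
for (h, t), e in zip(counts, entropies):
    print(f"{h + t:5d} flips: {e:.2f} bits of posterior entropy")
```

The falling entropy tracks how concentrated the posterior is, which is the quantity the comment proposes as a measure of how much future evidence could still move you.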
I suspect the expert judges would need to resort to known jailbreaking techniques to distinguish LLMs from humans. A fairer and more interesting test might use judges who are experts, but not in ML.