Does p=1 mean that all features are always on?
If yes, how did it fail to get perfect loss in this case?
johnswentworth's post Fixing The Good Regulator Theorem has the same definition of the Good Regulator Theorem.
That is enough for me to confirm that this is indeed what it says. Not because I trust John more than Alfred, but because there are now two sufficiently independent claims on LW of the same definition of the theorem, which would be very surprising if the definition were wrong.
If a regulator is 'good' (in the sense described by the two criteria in the previous section), then the variable R can be described as a deterministic function of S.
Really! This is the theorem?
Is there anyone else who understands the Good Regulator Theorem that can confirm this?
The reasons I'm surprised/confused are:
I think what is going on is something like what Scott describes in Can Things Be Both Popular And Silenced?
6. Celebrity helps launder taboo ideology. If you believe Muslim immigration is threatening, you might not be willing to say that aloud – especially if you’re an ordinary person who often trips on their tongue, and the precise words you use are the difference between “mainstream conservative belief” and “evil bigot who must be fired immediately”. Saying “I am really into Sam Harris” both leaves a lot of ambiguity, and lets you outsource the not-saying-the-wrong-word-and-getting-fired work to a professional who’s good at it. In contrast, if your belief is orthodox and you expect it to win you social approval, you want to be as direct as possible.
I don't think that admitting to believing in AI X-risk would get you labelled as evil, but it could possibly get you labelled as crazy (which is maybe worse than evil for intellectuals?). To be able to publicly admit to that view, you either have to be able to argue for it really well, or be able to outsource that arguing to someone else.
I don't know why this is happening with this book in particular though, since there are currently lots of books, blogposts, YouTube videos, etc., that explain AI X-risk. Maybe none of the previous work was good enough? Or respectable enough? Or advertised enough?
I think that currently the highest-leverage action anyone (with the right skills for this) can take is to increase awareness of AI risk. Given that you're the sort of person who succeeded in getting a few hundred thousand followers, I'm guessing you would be good at this.
Who do you have in mind, and what work? The line between safety and capabilities is blurry, and everyone disagrees about where it is.
Other reasons could be:
My theory is that safety ai folk are taught that a rules framework is how to provide oversight over the ai...like the idea that you can define constraints, logic gates, or formal objectives, and keep the system within bounds, like a classic control theory...
I don't know anyone in AI safety who has missed the fact that NNs are not GOFAI.
The LW discussion norm is that you're supposed to say what you mean, and not leave people to guess, because this leads to more precise communication. E.g. I guessed that you did not mean what you literally wrote, because that would be dumb, but I don't know exactly what statement you're arguing for.
I know this is not standard communication practice in most places, but it is actually very valuable; you should try it.
I did some quick calculations for what the MSE per feature should be for compressed storage, i.e. storing T features in D dimensions where T > D.
I assume every feature is on with probability p. An on feature equals 1, an off feature equals 0. MSE is the mean squared error for a linear readout of the features.
For random embeddings (superposition):
mse_r = Tp/(T+D)
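As a sanity check, here is a rough numpy sketch of this setup as I understand it: random unit embedding vectors, features sampled on with probability p, and a linear readout by dot product with the same vectors, scaled by roughly D/(T+D). The scale factor and the specific T, D, p values are my assumptions for illustration, not something derived in the post.

```python
import numpy as np

# Monte-Carlo sketch of the random-embedding (superposition) case,
# to compare against mse_r = T*p/(T+D).
rng = np.random.default_rng(0)
T, D, p, N = 100, 50, 0.2, 20_000               # assumed numbers, just for illustration

W = rng.normal(size=(T, D))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # one random unit vector per feature

f = (rng.random(size=(N, T)) < p).astype(float) # each feature on with probability p
x = f @ W                                       # store T features in D dimensions
f_hat = (D / (T + D)) * (x @ W.T)               # scaled linear readout (my guess at the near-optimal gain)

print(np.mean((f_hat - f) ** 2))                # empirical mse per feature
print(T * p / (T + D))                          # formula above, ~0.133 for these numbers
```

Both numbers should land around 0.13 if I've set this up the way the formula assumes.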
If instead using the D neurons to store D of the T features exactly, and outputting the constant value p for the rest:
mse_d = p(1-p)(T-D)/T
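And the same kind of check for the dedicated-neuron case, assuming it means: D of the features get their own neuron and are read out exactly, and the remaining T-D features are just predicted as the constant p.

```python
import numpy as np

# Sketch of the dedicated-neuron baseline, to compare against
# mse_d = p*(1-p)*(T-D)/T.
rng = np.random.default_rng(0)
T, D, p, N = 100, 50, 0.2, 20_000      # same assumed numbers as above

f = (rng.random(size=(N, T)) < p).astype(float)
f_hat = f.copy()
f_hat[:, D:] = p                       # features without a dedicated neuron get the constant p

print(np.mean((f_hat - f) ** 2))       # empirical mse per feature
print(p * (1 - p) * (T - D) / T)       # formula above, 0.08 for these numbers
```

The error is zero on the D stored features and p(1-p) on each of the rest, which is where the (T-D)/T factor comes from.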
This suggests we should see a transition between these two types of embeddings where mse_r = mse_d, i.e. when T^2/D^2 = (1-p)/p.
For T=100 and D=50, this means p=0.2.
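Spelling out the arithmetic for that last step:

$$\frac{T^2}{D^2} = \frac{100^2}{50^2} = 4 = \frac{1-p}{p} \quad\Rightarrow\quad p = \frac{1}{1+4} = 0.2$$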
The model in this post is doing a bit more than just embedding features. But I don't think it can do better than the most efficient embedding of the T output features in the D neurons?
mse_r only depends on E[(u dot v)^2]=1/D where v and u are different embedding vectors. Lots of embeddings have this property, e.g. embedding features along random basis vectors, i.e. assigning each feature to a random single neuron. This will result in some embedding vectors being exactly identical. But the MSE (L2) loss is equally happy with this as with random (almost orthogonal) feature directions.
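Here's a quick numeric check of that claim about E[(u dot v)^2], for both the random-direction embedding and the random single-neuron (one-hot) embedding. The dimension and sample count are arbitrary choices of mine.

```python
import numpy as np

# Check that E[(u·v)^2] ≈ 1/D for two embedding schemes:
# (1) independent random unit vectors, (2) random one-hot (single-neuron) vectors.
rng = np.random.default_rng(0)
D, n_pairs = 50, 200_000

# (1) random directions
u = rng.normal(size=(n_pairs, D)); u /= np.linalg.norm(u, axis=1, keepdims=True)
v = rng.normal(size=(n_pairs, D)); v /= np.linalg.norm(v, axis=1, keepdims=True)
print(np.mean(np.sum(u * v, axis=1) ** 2), 1 / D)

# (2) one-hot vectors: (u·v)^2 is 1 exactly when both features got the same neuron
i = rng.integers(D, size=n_pairs)
j = rng.integers(D, size=n_pairs)
print(np.mean((i == j).astype(float)), 1 / D)
```

Both should come out close to 1/D = 0.02, which is the point: the mse_r formula can't tell these two schemes apart.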