Ah, I missed that it was a generative model. If you don't mind, I'd like to extend this discussion a bit. I think it's valuable (and fun).
I do still think it can go wrong. The joint distribution can shift after training due to confounding factors and effect modification. The latter is more dangerous, because for the purposes of reporting the confounder matters less (I think), but effect modification can move you outside any distribution you've seen in training. And it can be something really stupid you forgot in your training set, like the action to turn of...
Hmm, why would it require additional computation? The counterexample does not need to be an exact human imitator, only a non-translator that performs well in training. In the worst case, there exist multiple parts of the predictor's activations that correlate with "diamond", so you can get multiple 'good results' just by shifting parameters in the model.
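To make that last worry concrete, here's a toy sketch (my own construction, nothing from the report): if several activation channels all track "diamond", several distinct reporters fit the training set equally well while reading different parts of the model.

```python
# Toy sketch: several correlated activation channels -> several zero-loss reporters.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
diamond = rng.integers(0, 2, size=n)                 # ground truth on the training set
noise = rng.normal(0, 0.01, size=(n, 3))
acts = np.stack([diamond, diamond, diamond], axis=1) + noise  # three correlated channels

# Three "reporters", each reading a different channel (a parameter shift away from each other).
for channel in range(3):
    preds = acts[:, channel] > 0.5
    acc = (preds == diamond).mean()
    print(f"reporter using channel {channel}: training accuracy = {acc:.3f}")
# All three score ~1.0 on the training data, yet they can disagree off-distribution.
```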
Is this precise enough?
As I read this, your proposal seems to hinge on a speed prior (or simplicity prior) over F such that F generalizes well from the simple training set to the complex world. I think you could be more precise if you explained how the speed prior (enforced by C) selects direct translation over simulation. Reading your discussion, the disagreement seems to stem from a disagreement about the effect of the speed prior, i.e. are translators faster (or less complex) than imitators?
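To illustrate how I'm reading the role of C: a minimal sketch of a speed prior as a compute-cost penalty on the reporter's loss. The function name and the numbers are hypothetical, purely to make the comparison concrete; the disagreement is then just over which candidate is cheaper.

```python
# Minimal sketch: a speed prior as a compute-cost penalty added to the training loss.

def penalized_loss(train_loss: float, compute_cost: float, lam: float = 0.1) -> float:
    """Training loss plus a speed-prior penalty proportional to compute cost."""
    return train_loss + lam * compute_cost

# Hypothetical candidates, both with zero training loss; only their compute cost differs.
candidates = {
    "direct translator": {"train_loss": 0.0, "compute_cost": 3.0},
    "human imitator":    {"train_loss": 0.0, "compute_cost": 5.0},
}

best = min(candidates, key=lambda k: penalized_loss(**candidates[k]))
print(best)  # whichever is cheaper under the penalty wins; the question is which one that is
```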
Thanks for your reply!
You are right, of course. The argument was more about building up evidence.
But, after thinking about it some more, I see now that any evidence gathered with a single known feature of priorless models (like the one I mentioned) would be so minuscule (approaching 0[1]) that you'd need to combine an unlikely number of features[2]. You'd end up in (a worse version of) an arms race akin to the 'science it' example/counterexample scenario mentioned in the report, and thus it's a dead end. By extension, all priorless models with o...
I agree with you (both). It's a framing difference iff you can translate back and forth. My thinking was that the problem might be set up so that it's "easy" to recognize but difficult to implement. If you can define a strategy which sets it up to be easy to recognize, that is.
Another way I thought about it is that you can use your 'meta' knowledge about human imitators versus direct translators to give you a probability over all reporters. Approaching the problem not with certainty of a solution but with recognized uncertainty (I refrain from using ...
Congratulations to all the winners!
And, as said before: bravo on the swift processing of all the proposals! 197 proposals, wow!
Question to those who had to read all those proposals:
How about the category of proposals* that tried to identify the good reporter from the bad one by hand (with some strategy) after training multiple reporters (post hoc instead of a priori)? Were there any good ones? Or can "identification by looking at it" (either in a simple or a complex way) be definitively ruled out (and as a result, require all future efforts in resear...
I really like that you're putting up your proposal for all to read. It lets stumps like me learn a lot. So, thanks for showing it!
You asked for some ideas about things you want clarity on. I can't help you yet, but I'd like to ask some questions first, if I can.
Some are about the technical implementation and some more about the logic behind your proposal.
First some technical questions:
1) What if the original AI is not some recurrent 'thinking' AI using compute steps, but something that learns a single function mapping from input to output? Like a neural net us...
As before, I am behind the curve. Above, I concluded by saying that I can form no prior belief about G as a function of I. I cannot, but we can learn a function to create our prior. Paul Christiano already wrote an article about learning the prior (https://www.lesswrong.com/posts/SL9mKhgdmDKXmxwE4/learning-the-prior).
So in conclusion, in the worst case no single function mapping I to G exists; there are multiple, reducing down to either the translator camp or the human-imitator camp. Without context we can form no strong prior due to the complexity of A and I, ...
RNNs break the Markov property in the sense that they depend on more than just the previous element in the sequence they are modelling. But I don't see why that would be relevant to ELK.
You're right in that RNNs don't have anything to do with ELK, but I came back to it because the Markov property was part of the lead-up to saying that all parts of I are correlated.
So with your help, I have to change my reasoning to:
...In the worst case our reporter needs to learn the function between highly correlated I and our target G.
Thank you very much for your reply!
I'll concede that the Markov property does not make all nodes indistinguishable. I'll go further and say that not all algorithms have to have the Markov property. A Google search taught me that an RNN breaks the Markov property. But then again, we are dealing with the worst-case game, so with our luck, it'll probably be some highly correlated thing.
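For concreteness, here is a toy sketch (my own, not anything specific to the predictor) of what "breaks the Markov property" means here: the RNN's hidden state is a function of the whole history, not just the previous element.

```python
# Toy sketch: two sequences with the same last element give different RNN hidden states,
# so the hidden state is not determined by the previous element alone (non-Markov).
import numpy as np

W_h, W_x = 0.9, 0.5   # toy recurrent and input weights

def rnn_hidden(sequence):
    h = 0.0
    for x in sequence:
        h = np.tanh(W_h * h + W_x * x)   # h depends on every earlier x, not only the last one
    return h

seq_a = [1, 0, 0, 1]
seq_b = [0, 1, 0, 1]   # same last element, different history
print(rnn_hidden(seq_a), rnn_hidden(seq_b))  # different hidden states
```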
You suggest using some strong prior belief. I assume you mean a prior belief about I or about I -> G? I thought, but correct me if I'm wrong, that the opaqueness of the in...
tl;dr as of 18/2/2022
The goal is to educate me and maybe others. I make some statements, you tell me how wrong I am (please).
After input from P. (many thanks) and an article by Paul Christiano, this statement stands as yet uncorrected:
In the worst case, the internal state of the predictor is highly correlated within itself, and multiple zero-loss mappings from the internal state to the desired information exist. The only solution is to work with some prior belief about how the internal state maps to the desired information. But as by design o...
Ah, a conditional VAE! Small question: am I the only one who reserves 'z' for the latent variables of the autoencoder? You seem to be using it as the 'predictor state' input. Or am I reading it wrong?
Now I understand your P(z|Q,A) better, as it's just the conditional generator. But how do you get P(A|Q)? That distribution need not be the same for the human-known set and the total set.
I was wondering what happens in deployment when you meet a z that's not in your P(z,Q,A) (i.e. very small probability). Would you be sampling P(z|Q,A) forever?
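To spell out that worry in code (a toy sketch with made-up names, not your actual architecture): if the observed z has vanishingly small probability under the learned conditional generator, a naive "sample until it matches" loop may effectively never terminate.

```python
# Toy sketch: rejection-style matching against an off-distribution z rarely, if ever, succeeds.
import random

def sample_z_given(q, a):
    """Stand-in for the conditional generator P(z|Q,A); here just a coarse toy."""
    return random.randint(0, 10**6)

def find_consistent_answer(z_observed, q, answers, max_tries=10_000):
    for a in answers:
        for _ in range(max_tries):
            if sample_z_given(q, a) == z_observed:   # practically never for an off-distribution z
                return a
    return None  # gave up: z_observed is (nearly) outside the learned distribution

print(find_consistent_answer(z_observed=123456789, q="is the diamond there?", answers=["yes", "no"]))
```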