All of Thomas's Comments + Replies

Ah, a conditional VAE! Small question: Am I the only one that reserves 'z' for the latent variables of the autoencoder? You seem to be using it as the 'predictor state' input. Or am I reading it wrong?

Now I understand your P(z|Q,A) better, as it's just the conditional generator. But, how do you get P(A|Q)? That distribution need not be the same for the human known set and the total set.

I was wondering what happens in deployment when you meet a z that's not in your P(z,Q,A) (ie very small p). Would you be sampling P(z|Q,A) forever?

1P.
You aren’t the only one, z is usually used for the latent, I just followed Paul’s notation to avoid confusion. P(A|Q) comes just from training on the QA pairs. But I did say “set any reasonable prior over answers”, because I expect P(z|Q,A) to be orders of magnitude higher for the right answer. Like I said in another comment, an image generator (that isn’t terrible) is incredibly unlikely to generate a cat from the input “dog”, so even big changes to the prior probably won’t matter much. That being said, machine learning rests on the IID assumption, regular reporters are no exception, they also incorporate P(A|Q), it’s just that here it is explicit. The whole point of VAEs is that the estimation of the probability of a sample is efficient (see section 2.1 here: https://arxiv.org/abs/1606.05908v1), so I don’t expect it to be a problem.

Could be! Though, in my head I see it as a self centering monte carlo sampling of a distribution mimicking some other training distribution, GANs not being the only one in that group. The drawback is that you can never leave that distribution; if your training is narrow, your model is narrow.

Ah, I missed that it was a generative model. If you don't mind I'd like to extend this discussion a bit. I think it's valuable (and fun).

I do still think it can go wrong. The joint distribution can shift after training by confounding factors and effect modification. And the latter is more dangerous, because for the purposes of reporting the confounder matters less (I think), but effect modification can move you outside any distribution you've seen in training. And it can be something really stupid you forgot in your training set, like the action to turn of... (read more)

Hmm, why would it require additional computation? The counterexample does not need to be an exact human imitator, only a not-translator that performs well in training. In the worst case there exist multiple parts of the activations of the predictor that correlate to "diamond", so multiple 'good results' by just shifting parameters in the model. 

2P.
A generative model isn’t like a regression model. If we have two variables that are strongly correlated and want to predict another variable, then we can shift the parameters to either of them and get very close to what we would get by using both. In a generative model on the other hand, we need to predict both, no value is privileged, we can’t just shift the parameters. See my reply to interstice on what I think would happen in the worst case scenario where we have failed with the implementation details and that happens. The result wouldn’t be that bad. If you can think of another algorithm as simple as a direct generator that performs well in training, say so. I think that almost by definition the direct generator is the simplest one. And if we make a good enough but still human level dataset (although this isn’t a requirement for my approach to work) the only robust and simple correlation that remains is the one we are interested in.

Is this precise enough? 

As I read this, your proposal seems to hinge on a speed prior (or simplicity prior) over F such that F has good generalization from the simple training set to the complex world. I think you could be more precise if you'd explain how the speed prior (enforced by C) chooses direct translation over simulation. Reading your discussion, your disagreement seems to stem from a disagreement about the effect of the speed prior i.e. are translators faster or less complex than imitators?

3P.
The disagreement stems from them not understanding my proposal, I hope that now it is clear what I meant. I explained in my submission why the speed prior works, but in a nutshell it is because both the “direct generator” and the “generator of worlds that a human thinks are consistent with a QA pair” need to generate the activations that represent the world, but the “direct generator” does it directly (e.g. “diamond” -> representation of a diamond), while the other one performs additional computation to determine what worlds a human thinks are consistent with a QA pair (e.g. “diamond” -> it might be a diamond, but it might also be just a screen in front of the camera and what is really in the vault is…), in the training set where the labels are always correct the direct generator is preferred.

Thanks for your reply!
You are right of course. The argument being more about building up evidence. 

But, after thinking about it some more , I see now that any evidence gathered with a single known  feature of priorless models (like the one I mentioned) would be so minuscule (approaching 0[1]) that you'd need to combine unlikely amounts of features[2]. You'd end up in (a worse version of) an arms race akin to the 'science it' example/counterexample scenario mentioned in the report and thus it's a dead end. By extension, all priorless models with o... (read more)

I agree with you (both). It's a framing difference iff you can translate back and forth. My thinking was that the problem might be setup so that it's "easy" to recognize but difficult to implement. If you can define a strategy which sets it up to be easy to recognize, that is. 
Another way I thought about it, is that you can use your 'meta' knowledge about human imitators versus direct translators to give you a probability over all reporters. Approaching the problem not with certainty of a solution but with recognized uncertainty (I refrain from using ... (read more)

3paulfchristiano
It seems like there are a ton of clusters though---there are exponentially many ways of modifying the human simulator to give different behavior off distribution, each of which is more likely than the direct translator (enough more likely that there are a whole cluster of equivalent variations each of which is still individually more likely than all of the direct translators put together).

Congratulations to all the winners! 
And, as said before: bravo to the swift processing of all the proposals! 197 proposals, wow!

Question to those who had to read all those proposals: 
How about the category of proposals* that tried to identify the good from the bad reporter by hand (with some strategy) after training multiple reporters (post hoc instead of a priori)? Were there any good ones? Or can "identification by looking at it" (either in a simple or a complex way) be definitively ruled out (and as result, require all future efforts in resear... (read more)

3paulfchristiano
I think we'd basically declare victory if a human could look at 2 reporters (or a bunch of reporters), one of which is a direct translator, and tell which it was The action is just in "how does the human tell?" And the counterexample is "what if the human doesn't think of anything clever, so they just have to use one of the current proposals (each of which has counterexamples)."
2Erik Jenner
I didn't see the proposals, but I think that almost all of the difficulty will be in how you can tell good from bad reporters by looking at them. If you have a precise enough description of how to do that, you can also use it as a regularizer. So the post hoc vs a priori thing you mention sounds more like a framing difference to me than fundamentally different categories. I'd guess that whether a proposal is promising depends mostly on how it tries to distinguish between the good and bad reporter, not whether it does so via regularization or via selection after training (since you can translate back and forth between those anyway). (Though as a side note, I'd usually expect regularization to be much more efficient in practice, since if your training process has a bias towards the bad reporter, it might be hard to get any good reporters at all.) If I'm mistaken, I'd be very interested to hear an example of a strategy that fundamentally only works once you have multiple trained models, rather than as a regularizer!

I really like you're putting up your proposal for all to read. It lets stumps like me learn a lot. So, thanks for showing it!


You asked for some ideas about things you want clarity on. I can't help you yet, but I'd like to ask some questions first, if I can.
Some about the technical implementation and some more to do with the logic behind your proposal.

First some technical questions:
1) What if the original AI is not some recurrent 'thinking' AI using compute steps, but something that learns a single function mapping from input to output? Like a neural net us... (read more)

2Charlie Steiner
I think the proposal is not to force the AI to funnel all its state through the human, but instead to make the human-simulation available as extra computing resources that the most-effective AIs should take advantage of.

As before I am behind the curve. Above I concluded saying that I can form no prior belief about G as a function of I. I cannot, but we can learn a function to create our prior. Paul Christiano already wrote  an article about learning the prior (https://www.lesswrong.com/posts/SL9mKhgdmDKXmxwE4/learning-the-prior).

So in conclusion, in the worst case no single function mapping I to G exists, as there are multiple reducing down to either camp translator or camp human-imitator. Without context we can form no strong prior due to the complexity of A and I, ... (read more)

Updating my first line of thought

RNNs break the Markov property in the sense that they depend on more than just the previous element in the sequence they are modelling. But I don't see why that would be relevant to ELK.

You're right in that RNNs don't have anything to do with ELK, but I came back to it because the Markov property was part of the lead up to saying that all parts of I are correlated. 

So with your help, I have to change my reasoning to:

In the worst case our reporter needs to learn the function between highly correlated I and our target G.

... (read more)
1Thomas
As before I am behind the curve. Above I concluded saying that I can form no prior belief about G as a function of I. I cannot, but we can learn a function to create our prior. Paul Christiano already wrote  an article about learning the prior (https://www.lesswrong.com/posts/SL9mKhgdmDKXmxwE4/learning-the-prior). So in conclusion, in the worst case no single function mapping I to G exists, as there are multiple reducing down to either camp translator or camp human-imitator. Without context we can form no strong prior due to the complexity of A and I, but as Paul described in his article we can learn a prior from for example in our case the dataset containing G as a function of A. I'll add a tl;dr in my first post to shorten the read about how I slowly caught up to everyone else. Corrections are of course still welcome!

Thank you very much for your reply!

I'll concede that the markov property does not make all nodes indistinguishable. I'll go further and say that not all algorithm's have to have the markov property. A google-search learned me that an RNN breaks the markov property. But then again, we are dealing with the worst-case-game, so with our luck, it'll probably be some highly correlated thing.

You suggest using some strong prior belief. I assume you mean a prior belief about I or about I -> G? I thought, but correct me if I'm wrong, that the opaqueness of the in... (read more)

2P.
RNNs break the Markov property in the sense that they depend on more than just the previous element in the sequence they are modelling. But I don't see why that would be relevant to ELK. When I say that a strong prior is needed I mean the same thing that Paul means when he writes: "We suspect you can’t solve ELK just by getting better data—you probably need to 'open up the black box' and include some term in the loss that depends on the structure of your model and not merely its behaviour.". Which is a very broad class of strategies. I also don't understand what you mean by having a strong idea about A->G, we of course have pairs of [A, G] in our training data but what we need to know is how to compute G from A given these pairs.
Thomas*Ω0120

tl;dr as of 18/2/2022
The goal is to educate me and maybe others. I make some statements, you tell me how wrong I am (please).

After input from P. (many thanks) and an article by Paul Christiano this statement stands yet uncorrected:

In the worst case, the internal state of the predictor is highly correlated within itself and multiple mappings with zero loss from the internal state to the desired extraction of information exist. The only solution is to work with some prior belief about how the internal state maps to the desired information. But as by design o... (read more)

2P.
The Markov property doesn't imply that we can't determine what variable we care about using some kind of "correlation". Some part of the information in some node in the chain might disappear when computing the next node, so we might be able to distinguish it from its successors. And it might also have been gained when randomly computing its value from the previous node, so it might be possible to distinguish it from its predecessors. In the worst case scenario where all variables are in fact correlated to G what we need to do is to use a strong prior so that it prefers the correct computational graph over the wrong ones. This might be hard but it isn't impossible. But you can also try to create a dataset that makes the problem easier to solve, or train a wrong reporter and only reply when the predictions made when using each node are the same so we don't care what node it actually uses (as long as it can use the nodes properly, instead of computing other node and using it to get the answer, or something like that).