Thomas — LessWrong

Ah, a conditional VAE! Small question: am I the only one who reserves 'z' for the latent variables of the autoencoder? You seem to be using it as the 'predictor state' input. Or am I reading it wrong?

Now I understand your P(z|Q,A) better, as it's just the conditional generator. But how do you get P(A|Q)? That distribution need not be the same for the human-known set and the total set.

I was wondering what happens in deployment when you meet a z that's not in your P(z,Q,A) (i.e. one with very small p). Would you be sampling P(z|Q,A) forever?
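To put a rough number on the "sampling forever" worry, here is a minimal sketch. It assumes (this is my assumption, not something stated in the proposal) that the scheme repeatedly draws candidates and accepts each one independently with some probability p. The names `expected_draws` and `draws_until_accept` are hypothetical helpers for illustration only.

```python
import random


def expected_draws(p_accept: float) -> float:
    """Mean number of draws until the first acceptance.

    Assumes independent draws that each succeed with probability p_accept,
    so the count follows a geometric distribution with mean 1 / p_accept.
    """
    return 1.0 / p_accept


def draws_until_accept(p_accept: float, rng: random.Random, max_draws: int = 1_000_000):
    """Simulate draws until one is accepted; return None if the cap is hit."""
    for n in range(1, max_draws + 1):
        if rng.random() < p_accept:
            return n
    return None  # gave up: in practice this is the "sampling forever" case


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.25, 1e-3, 1e-6):
        print(p, expected_draws(p), draws_until_accept(p, rng))
```

Under that assumption the expected cost grows as 1/p, so a z with tiny probability under the training joint does not loop literally forever, but it may as well unless the sampler has a cap and a fallback.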

Replying to ELK prize results

Thomas 4y

Could be! Though in my head I see it as a self-centering Monte Carlo sampling of a distribution mimicking some other training distribution, GANs not being the only ones in that group. The drawback is that you can never leave that distribution; if your training is narrow, your model is narrow.

Replying to ELK prize results

Thomas 4y

Ah, I missed that it was a generative model. If you don't mind, I'd like to extend this discussion a bit. I think it's valuable (and fun).

I do still think it can go wrong. The joint distribution can shift after training through confounding factors and effect modification. The latter is more dangerous: for the purposes of reporting the confounder matters less (I think), but effect modification can move you outside any distribution you've seen in training. And it can be something really stupid you forgot in your training set, like the action of turning off the lights causing some sensors to work while others do not.

You might say, "ah, but... (read more)

Replying to ELK prize results

Thomas 4y

Hmm, why would it require additional computation? The counterexample does not need to be an exact human imitator, only a non-translator that performs well in training. In the worst case there exist multiple parts of the predictor's activations that correlate with "diamond", so you get multiple 'good results' just by shifting parameters in the model.
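A toy illustration of that worst case, as a sketch only: the two-component "activations", the training set, and both reporters below are hypothetical and not taken from the report. Both reporters score perfectly in training because the two activation components are redundant there, yet they disagree as soon as the components come apart.

```python
# Hypothetical toy: each predictor activation is a pair (a, b).
# In training, both components equal the true "diamond" label,
# so either one alone is enough to answer correctly.
train = [((1, 1), 1), ((0, 0), 0), ((1, 1), 1), ((0, 0), 0)]


def reporter_a(activation):
    """Reads only the first component."""
    return activation[0]


def reporter_b(activation):
    """Reads only the second component."""
    return activation[1]


def accuracy(reporter, data):
    """Fraction of examples where the reporter matches the label."""
    return sum(reporter(x) == y for x, y in data) / len(data)


# Off-distribution input where the two components disagree.
shifted = (1, 0)

if __name__ == "__main__":
    print(accuracy(reporter_a, train), accuracy(reporter_b, train))
    print(reporter_a(shifted), reporter_b(shifted))
```

Training loss cannot separate the two reporters here, and neither needs extra computation relative to the other; only the prior or the training distribution could break the tie.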

Replying to ELK prize results

Thomas 4y

Is this precise enough?

As I read this, your proposal seems to hinge on a speed prior (or simplicity prior) over F such that F generalizes well from the simple training set to the complex world. I think you could be more precise if you explained how the speed prior (enforced by C) chooses direct translation over simulation. Reading your discussion, your disagreement seems to stem from a disagreement about the effect of the speed prior, i.e. are translators faster or less complex than imitators?

Replying to ELK prize results

Thomas 4y

Thanks for your reply!
You are right, of course; the argument was more about building up evidence.

But after thinking about it some more, I see now that any evidence gathered with a single known feature of priorless models (like the one I mentioned) would be so minuscule (approaching 0^[1]) that you'd need to combine unlikely amounts of features^[2]. You'd end up in (a worse version of) an arms race akin to the 'science it' example/counterexample scenario mentioned in the report, and thus it's a dead end. By extension, all priorless models with or without a good training regime^[3], with or without a good loss function^[4], with or without some form of regularization^[4], all... (read more)

Replying to ELK prize results

Thomas 4y

I agree with you (both). It's a framing difference iff you can translate back and forth. My thinking was that the problem might be set up so that it's "easy" to recognize but difficult to implement. If you can define a strategy which sets it up to be easy to recognize, that is.
Another way I thought about it is that you can use your 'meta' knowledge about human imitators versus direct translators to give you a probability over all reporters. That means approaching the problem not with certainty of a solution but with recognized uncertainty (I refrain from using 'known uncertainty' here, because knowing how much you don't know something is hard).

I obviously don't have... (read more)

Replying to ELK prize results

Thomas 4y*

Congratulations to all the winners!
And, as said before: bravo for the swift processing of all the proposals! 197 proposals, wow!

Question to those who had to read all those proposals:
How about the category of proposals* that tried to identify the good from the bad reporter by hand (with some strategy) after training multiple reporters (post hoc instead of a priori)? Were there any good ones? Or can "identification by looking at it" (either in a simple or a complex way) be definitively ruled out (and, as a result, require all future research efforts to be about creating a strategy that converges to a single point on the ridge of good solutions in parameter/function space)?

Edit: * to which "Use the reporter to define causal interventions on the predictor" sort of belongs

Replying to ELK Proposal: Thinking Via A Human Imitator

Thomas 4y

I really like that you're putting up your proposal for all to read. It lets stumps like me learn a lot. So, thanks for showing it!

You asked for some ideas about things you want clarity on. I can't help you yet, but I'd like to ask some questions first, if I can.
Some are about the technical implementation and some have more to do with the logic behind your proposal.

First some technical questions:
1) What if the original AI is not some recurrent 'thinking' AI using compute steps, but something that learns a single function mapping from input to output, like a neural net with one hidden layer? Would this still work?
2) When do we exactly train... (read 510 more words →)

Replying to Prizes for ELK proposals

Thomas 4y

As before, I am behind the curve. Above I concluded by saying that I can form no prior belief about G as a function of I. I cannot, but we can learn a function to create our prior. Paul Christiano already wrote an article about learning the prior (https://www.lesswrong.com/posts/SL9mKhgdmDKXmxwE4/learning-the-prior).

So in conclusion: in the worst case no single function mapping I to G exists, as there are multiple, each reducing to either camp translator or camp human-imitator. Without context we can form no strong prior due to the complexity of A and I, but as Paul described in his article, we can learn a prior from, in our case for example, the dataset containing G as a function of A.

I'll add a tl;dr to my first post to shorten the read about how I slowly caught up to everyone else. Corrections are of course still welcome!