Hi, thanks for the response! I apologize, the "Left as an exercise" line was mine, and written kind of tongue-in-cheek. The rough sketch of the proposition we had in the initial draft did not spell out sufficiently clearly what it was I want to demonstrate here and was also (as you point out correctly) wrong in the way it was stated. That wasted people's time and I feel pretty bad about it. Mea culpa.
I think/hope the current version of the statement is more complete and less wrong. (Although I also wouldn't be shocked if there are mistakes in there). Regar...
Hmm there was a bunch of back and forth on this point even before the first version of the post, with @Michael Oesterle and @metasemi arguing what you are arguing. My motivation for calling the token the state is that A) the math gets easier/cleaner that way and B) it matches my geometric intuitions. In particular, if I have a first-order dynamical system x' = f(x), then x is the state, not the trajectory of states (x_0, x_1, ..., x_t). In this situation, the dynamics of the system only depend on the current state (that's because it's ...
Thanks for pointing this out! This argument made it into the revised version. I think because of finite precision it's reasonable to assume that such an always exists in practice (if we also assume that the probability gets rounded to something < 1).
Technically correct, thanks for pointing that out! This comment (and the ones like it) was the motivation for introducing the "non-degenerate" requirement into the text. In practice, the proposition holds pretty well - although I agree it would be nice to have a deeper understanding of when to expect the transition rule to be "non-degenerate".
Thanks for sharing your thoughts Shos! :)
Hmmm good point. I originally made that decision because loading the image from the server was actually kind of slow. But then I figured out asynchronicity, so could totally change it... I'll see if I find some time later today to push an update! (to make an 'all vs all' mode in addition to the 'King of the hill')
Hi Jennifer!
Awesome, thank you for the thoughtful comment! The links are super interesting, reminds me of some of the research in empirical aesthetics I read forever ago.
On the topic of circular preferences: It turns out that the type of reward model I am training here handles non-transitive preferences in a "sensible" fashion. In particular, if you're "non-circular on average" (i.e. you only make accidental "mistakes" in your rating) then the model averages that out. And if you consistently have a loopy utility function, then the reward model will map all ...
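To make the "averages that out" claim concrete, here is a minimal Bradley-Terry-style sketch (plain Python; a toy setup of my own, not the actual training code from the post): with a perfectly symmetric preference cycle the learned rewards stay tied, while a majority preference pulls them apart.

```python
import math

def train_bradley_terry(n_items, comparisons, lr=0.1, steps=1000):
    """Fit scalar rewards r[i] to pairwise wins with a logistic (Bradley-Terry) loss."""
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for winner, loser in comparisons:
            p_win = 1 / (1 + math.exp(r[loser] - r[winner]))  # P(winner beats loser)
            grad[winner] += 1 - p_win
            grad[loser] -= 1 - p_win
        r = [ri + lr * g for ri, g in zip(r, grad)]
    return r

# perfectly circular preferences A>B, B>C, C>A: rewards stay tied
rewards_cycle = train_bradley_terry(3, [(0, 1), (1, 2), (2, 0)])

# "non-circular on average": A>B twice, B>A once -> A ends up above B
rewards_noisy = train_bradley_terry(2, [(0, 1), (0, 1), (1, 0)])
```

In the circular case the gradients cancel exactly, so all rewards stay equal - the cycle is "averaged out" instead of producing an unbounded rating.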
Hi Erik! Thank you for the careful read, this is awesome!
Regarding proposition 1 - I think you're right, that counter-example disproves the proposition. The proposition we were actually going for was , i.e. the probability without the end of the bridge! I'll fix this in the post.
Regarding proposition II - janus had the same intuition and I tried to explain it with the following argument: When the distance between tokens becomes large enough, then eventually all bridges between the first token and an arbitrary second...
Uhhh exciting! Thanks for sharing!
Huh, thanks for spotting that! Yes, should totally be ELK 😀 Fixed it.
This work by Michael Aird and Justin Shovelain might also be relevant: "Using vector fields to visualise preferences and make them consistent"
And I have a post where I demonstrate that reward modeling can extract utility functions from non-transitive preference orderings: "Inferring utility functions from locally non-transitive preferences"
(Extremely cool project ideas btw)
Hey Ben! :) Thanks for the comment and the careful reading!
Yes, we only added the missing arXiv papers after clustering, but then we repeat the dimensionality reduction and show that the original clustering still holds up even with the new papers (Figure 4, bottom right). I think that's pretty neat (especially since the dimensionality reduction doesn't "know" about the clustering), but of course the clusters might look slightly different if we also re-run k-means on the extended dataset.
There's an important caveat here:
The visual stimuli are presented 8 degrees over the visual field for 100ms followed by a 100ms grey mask as in a standard rapid serial visual presentation (RSVP) task.
I'd be willing to bet that if you give the macaques more than 100ms they'll get it right - that's at least how it is for humans!
(Not trying to shift the goalpost, it's a cool result! Just pointing at the next step.)
Great points, thanks for the comment! :) I agree that there are potentially some very low-hanging fruits. I could even imagine that some of these methods work better in artificial networks than in biological networks (less noise, more controlled environment).
But I believe one of the major bottlenecks might be that the weights and activations of an artificial neural network are just so difficult to access? Putting the weights and activations of a large model like GPT-3 under the microscope requires impressive hardware (running forward passes, storing the ac...
Great point! And thanks for the references :)
I'll change your background to Computational Cognitive Science in the table! (unless you object or think a different field is even more appropriate)
Thank you for the comment and the questions! :)
This is not clear from how we wrote the paper but we actually do the clustering in the full 768-dimensional space! If you look closely at the clustering plot you can see that the clusters are slightly overlapping - that would be impossible with k-means in 2D, since in that setting membership is determined by distance from the 2D centroid.
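For illustration, here is a numpy-only sketch of that pipeline (synthetic data standing in for the paper abstracts; only the 768 dimensions are taken from the post): cluster membership is decided in the full space, and the 2D map is computed afterwards purely for plotting.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy stand-in for the 768-d paper embeddings: two synthetic "topic" clusters
X = np.concatenate([rng.normal(0.0, 1.0, (50, 768)),
                    rng.normal(1.0, 1.0, (50, 768))])

# plain Lloyd's k-means, run in the FULL 768-d space
k = 2
centroids = X[rng.choice(len(X), size=k, replace=False)]
for _ in range(20):
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = dists.argmin(axis=1)
    centroids = np.stack([X[labels == j].mean(axis=0) if (labels == j).any()
                          else centroids[j] for j in range(k)])

# the 2-d map is computed afterwards, only for visualization
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X2d = Xc @ Vt[:2].T
```

Because the projection is computed independently of the cluster assignments, nothing forces the clusters to be separated by distance in the 2D plane - hence the slight overlap in the plot.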
Oh true, I completely overlooked that! (if I keep collecting mistakes like this I'll soon have enough for a "My mistakes" page)
Yes, good point! I had that in an earlier draft and then removed it for simplicity and for the other argument you're making!
This sounds right to me! In particular, I just (re-)discovered this old post by Yudkowsky and this newer post by Alex Flint that both go a lot deeper on the topic. I think the optimal control perspective is a nice complement to those posts and if I find the time to look more into this then that work is probably the right direction.
As part of the AI Safety Camp our team is preparing a research report on the state of AI safety! Should be online within a week or two :)
Interesting, I added a note to the text highlighting this! I was not aware of that part of the story at all. That makes it more of a Moloch-example than a "mistaking adversarial for random"-example.
Yes, that's a pretty fair interpretation! The macroscopic/folk psychology notion of "surprise" of course doesn't map super cleanly onto the information-theoretic notion. But I tend to think of it as: there is a certain "expected surprise" about what future possible states might look like if everything evolves "as usual", . And then there is the (usually larger) "additional surprise" about the states that the AI might steer us into, . The delta between those two is the "excess surprise" that the AI needs to be able to bri...
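One way to write that delta down (notation is mine, a rough sketch rather than a precise definition):

```latex
% excess surprise = surprisal of the steered state under the business-as-usual
% distribution P, minus the surprise you expected to incur anyway (the entropy of P)
S_{\mathrm{excess}}
  = \underbrace{-\log P(x_{\mathrm{steered}})}_{\text{surprise of the state the AI steers us into}}
  \;-\;
  \underbrace{\mathbb{E}_{x \sim P}\!\left[-\log P(x)\right]}_{\text{expected surprise if everything evolves as usual}}
```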
Thank you for your comment! You are right, these things are not clear from this post at all and I did not do a good job at clarifying that. I'm a bit low on time atm, but hopefully, I'll be able to make some edits to the post to set the expectations for the reader more carefully.
The short answer to your question is: Yep, X is the space of events. In Vanessa's post it has to be compact and metric, I'm simplifying this to an interval in R. And can be derived from by plugging in g=0 and replacing the measure by the...
Cool paper, great to see the project worked out! (:
One question: How do you know the contractors weren't just answering randomly (or were confused about the task) in your "quality after filtering" experiments (Table 4)? Is there agreement across contractors about the quality of completions (in case they saw the same completions)?
Fascinating! Thanks for sharing!
Cool experiment! I could imagine that the tokenizer handicaps GPT's performance here (reversing the characters leads to completely different tokens). With a character-level tokenizer GPT should/might be able to handle that task better!
I was slightly surprised to find that even fine-tuning GPT-Neo-125M for a long time on many sequences of letters followed by spaces, followed by a colon, followed by the same sequence in reverse, was not enough to get it to pick up the pattern - probably because the positional encoding vectors make the difference between e.g. "18 tokens away" and "19 tokens away" a rather subtle difference. However, I then tried fine-tuning on a similar dataset with numbers in between (e.g. "1 W 2 O 3 R 4 D 5 S : 5 S 4 D 3 R 2 O 1 W") (or similar representation -- can't remember exactly, but something roughly like that) and it picked up the pattern right away. Data representation matters a lot!
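A small helper to generate both data representations (reconstructed from memory, so the exact format may well differ from what I actually used):

```python
def plain_reversal(word):
    """'WORDS' -> 'W O R D S : S D R O W'"""
    letters = list(word)
    return " ".join(letters) + " : " + " ".join(reversed(letters))

def indexed_reversal(word):
    """'WORDS' -> '1 W 2 O 3 R 4 D 5 S : 5 S 4 D 3 R 2 O 1 W'"""
    pairs = list(enumerate(word, start=1))
    fwd = " ".join(f"{i} {c}" for i, c in pairs)
    bwd = " ".join(f"{i} {c}" for i, c in reversed(pairs))
    return fwd + " : " + bwd
```

With explicit position markers the model no longer has to resolve "k tokens back" from the positional encodings alone; the index tokens give it an explicit key to copy by.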
Interesting, thank you! I guess I was thinking of deception as characterized by Evan Hubinger, with mesa-optimizers, bells, whistles, and all. But I can see how a sufficiently large competence-vs-performance gap could also count as deception.
Thanks for the comment! I'm curious about the Anthropic Codex code-vulnerability prompting, is this written up somewhere? The closest I could find is this, but I don't think that's what you're referencing?
I was not aware of this, thanks for pointing this out! I made a note in the text. I guess this is not an example of "advanced AI with an unfortunately misspecified goal" but rather just an example of the much larger class of "system with an unfortunately misspecified goal".
Thanks for the comment, I did not know this! I'll put a note in the essay to highlight this comment.
Iiinteresting! Thanks for sharing! Yes, the choice of how to measure this affects the outcome a lot.
Hmm, fair, I think you might get along fine with my coworker from footnote 6 :) I'm not even sure there is a better way to write these titles - but they can still be very intimidating for an outsider.
Yes, I agree, a model can really push intuition to the next level! There is a failure mode where people just throw everything into a model and hope that the result will make sense. In my experience that just produces a mess, and you need some intuition for how to properly set up the model.
Hi! :) Thanks for the comment! Yes, that's on purpose, the idea is that a lot of the shorthand in molecular neuroscience is very hard to digest. So since the exact letters don't matter, I intentionally garbled them with a Glitch Text Generator. But perhaps that isn't very clear without explanation, I'll add something.
This word Ǫ̵͎͊G̶̦̉̇l̶͉͇̝̽͆̚i̷͔̓̏͌c̷̱̙̍̂͜k̷̠͍͌l̷̢̍͗̃n̷̖͇̏̆å̴̤c̵̲̼̫͑̎̆ is, f.e., a garbled version of O-GLicklnac, which in turn is the phonetic version of "O-GlcNAc".
Theory #4 appears very natural to me, especially in light of papers like Chen et al 2006 or Cuntz et al 2012. Another supporting intuition from developmental neuroscience is that development is a huge mess and that figuring out where to put a long-range connection is really involved. And while there can be a bunch of circuit remodeling on a local scale, once you've established a long-range connection there is little hope of substantially rewiring it.
In case you want to dive deeper into this (and you don't want to read all those papers), I'd be happy ...
I've been meaning to dive into this for-e-ver and only now find the time for it! This is really neat stuff, haven't enjoyed a framework this much since logical induction. Thank you for writing this!
Yep, I agree, SLIDE is probably a dud. Thanks for the references! And my inside view is also that current trends will probably continue and most interesting stuff will happen on AI-specialized hardware.
Thank you for the comment! You are right, that should be a ReLU in the illustration, I'll fix it :)
Great explanation, I feel substantially less confused now. And thank you for adding two new shoulder advisors to my repertoire :D
Thank you for the thoughtful reply!
3. I agree with your point, especially that should be true.
But I think I can salvage my point by making a further distinction. When I write I actually mean where is a semantic embedding that takes sentences to vectors. Already at the level of the embedding we probably have ...
Awesome, thanks for the feedback Eric! And glad to hear you enjoyed the post!
I'm confused why you're using a neural network
Good point, for the example post it was total overkill. The reason I went with a NN was to demonstrate the link with the usual setting in which preference learning is applied. And in general, NNs generalize better than the table-based approach (see also my response to Charlie Steiner).
happy to chat about that
I definitely plan to write a follow-up to this post, will come back to your offer when that follow-up reaches the front of my q...
Thanks for the comment! (:
Hey Charlie!
Good comment, gave me a feeling of "oh, oops, why didn't I?" for a while. I think having the Elo-like algorithm as a baseline to compare to would have been a good thing to have in any case. But there is something that the NN can do that the Elo-like algorithm can't: generalization. Every "new" element (or even an interpolation of older elements) will get the "initial score" (like 1500 in chess) in Elo, while the NN can exploit similarities between the new element and older elements.
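For concreteness, a minimal sketch of the Elo-like baseline I have in mind (standard update rule; the K-factor and initial score are the conventional chess-style values, not tuned): any element without a match history is stuck at the initial score.

```python
def elo_expected(r_a, r_b):
    """Expected score of a against b under the standard Elo logistic curve."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(ratings, winner, loser, k=32, initial=1500):
    """Update `ratings` in place after `winner` beats `loser`; unseen items start at `initial`."""
    r_w = ratings.get(winner, initial)
    r_l = ratings.get(loser, initial)
    e_w = elo_expected(r_w, r_l)
    ratings[winner] = r_w + k * (1 - e_w)
    ratings[loser] = r_l - k * (1 - e_w)

ratings = {}
elo_update(ratings, "A", "B")  # A beats B once: A -> 1516, B -> 1484
# a brand-new element "C" would enter at 1500 no matter how similar
# it is to A or B - that's exactly the generalization the NN can add
```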
Fantastic, thank you for the pointer, learned something new today! A unique and explicit representation would be very neat indeed.
I'm pretty confused here.
Yeah, the feeling's mutual 😅 But the discussion is also very rewarding for me, thank you for engaging!
I am in favor of learning-from-scratch, and I am also in favor of specific designed inductive biases, and I don't think those two things are in opposition to each other.
A couple of thoughts:
Yes please, would be excited to see that!
Here's an operationalization. Suppose someday we write computer code that can do the exact same useful computational things that the neocortex (etc.) does, for the exact same reason. My question is: Might that code look like a learning-from-scratch algorithm?
Hmm, I see. If this is the crux, then I'll put all the remaining nitpicking at the end of my comment and just say: I think I'm on board with your argument. Yes, it seems conceivable to me that a learning-from-scratch program ends up in a (functionally) very similar state to the brain. The trajectory of...
Hey Steve! Thanks for writing this, it was an interesting and useful read! After our discussion in the LW comments, I wanted to get a better understanding of your thinking and this sequence is doing the job. Now I feel I can better engage in a technical discussion.
I can sympathize well with your struggle in section 2.6. A lot of the "big picture" neuroscience is in the stage where it's not even wrong. That being said, I don't think you'll find a lot of neuroscientists who nod along with your line of argument without raising objections here and there (neuro...
Thanks!
I don't think anyone except for Jeff Hawkins believes in literal cortical uniformity.
Not even him! Jeff Hawkins: "Mountcastle’s proposal that there is a common cortical algorithm doesn’t mean there are no variations. He knew that. The issue is how much is common in all cortical regions, and how much is different. The evidence suggests that there is a huge amount of commonality."
I mentioned "non-uniform neural architecture and hyperparameters". I'm inclined to put different layer thicknesses (including agranularity) in the category of "non-uniform hyp...
I enjoyed the footnote a lot :) And the entire story, of course. Thanks for writing!
Neuroscience and Natural Abstractions
Similarities in structure and function abound in biology; individual neurons that activate exclusively to particular oriented stimuli exist in animals from Drosophila (Strother et al. 2017) via pigeons (Li et al. 2007) and turtles (Ammermueller et al. 1995) to macaques (De Valois et al. 1982). The universality of major functional response classes in biology suggests that the neural systems underlying information processing in biology might be highly stereotyped (Van Hooser, 2007, Scholl et al. 2013). In line with this h...