All of Jan's Comments + Replies

Jan*120

Neuroscience and Natural Abstractions

Similarities in structure and function abound in biology; individual neurons that activate exclusively to particular oriented stimuli exist in animals from Drosophila (Strother et al. 2017) via pigeons (Li et al. 2007) and turtles (Ammermueller et al. 1995) to macaques (De Valois et al. 1982). The universality of major functional response classes in biology suggests that the neural systems underlying information processing in biology might be highly stereotyped (Van Hooser, 2007, Scholl et al. 2013). In line with this h... (read more)

JanΩ330

Hi, thanks for the response! I apologize, the "Left as an exercise" line was mine, and written kind of tongue-in-cheek. The rough sketch of the proposition we had in the initial draft did not spell out sufficiently clearly what it was I want to demonstrate here and was also (as you point out correctly) wrong in the way it was stated. That wasted people's time and I feel pretty bad about it. Mea culpa.

I think/hope the current version of the statement is more complete and less wrong. (Although I also wouldn't be shocked if there are mistakes in there). Regar... (read more)

JanΩ110

Hmm there was a bunch of back and forth on this point even before the first version of the post, with @Michael Oesterle and @metasemi arguing what you are arguing. My motivation for calling the token the state is that A) the math gets easier/cleaner that way and B) it matches my geometric intuitions. In particular, if I have a first-order dynamical system x_{t+1} = f(x_t), then x_t is the state, not the trajectory of states (x_0, x_1, ..., x_t). In this situation, the dynamics of the system only depend on the current state (that's because it's ... (read more)

JanΩ110

Thanks for pointing this out! This argument made it into the revised version. I think because of finite precision it's reasonable to assume that such an  always exists in practice (if we also assume that the probability gets rounded to something < 1).

JanΩ110

Technically correct, thanks for pointing that out! This comment (and the ones like it) was the motivation for introducing the "non-degenerate" requirement into the text. In practice, the proposition holds pretty well - although I agree it would be nice to have a deeper understanding of when to expect the transition rule to be "non-degenerate".

Jan10

Hmmm good point. I originally made that decision because loading the image from the server was actually kind of slow. But then I figured out asynchronicity, so I could totally change it... I'll see if I find some time later today to push an update! (to make an 'all vs all' mode in addition to the 'King of the hill')

Jan30

Hi Jennifer!

Awesome, thank you for the thoughtful comment! The links are super interesting, reminds me of some of the research in empirical aesthetics I read forever ago.

On the topic of circular preferences: It turns out that the type of reward model I am training here handles non-transitive preferences in a "sensible" fashion. In particular, if you're "non-circular on average" (i.e. you only make accidental "mistakes" in your rating) then the model averages that out. And if you consistently have a loopy utility function, then the reward model will map all ... (read more)
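To make the "averages it out" claim a bit more concrete, here is a minimal sketch (not the code from the post, and a plain table of scores rather than the neural reward model used there) of a Bradley–Terry-style fit on a perfectly circular set of comparisons:

```python
import torch

n_items = 3
scores = torch.randn(n_items, requires_grad=True)  # one scalar "reward" per item
opt = torch.optim.Adam([scores], lr=0.1)

# (winner, loser) pairs forming a perfectly circular preference: 0 > 1, 1 > 2, 2 > 0
comparisons = [(0, 1), (1, 2), (2, 0)]

for _ in range(500):
    opt.zero_grad()
    loss = torch.zeros(())
    for winner, loser in comparisons:
        # Bradley-Terry / logistic likelihood of the observed preference
        loss = loss - torch.nn.functional.logsigmoid(scores[winner] - scores[loser])
    loss.backward()
    opt.step()

# The cycle cannot be represented by any scalar reward, so the fitted scores
# end up (approximately) equal -- the inconsistency gets averaged away.
print(scores.detach())
```

With mostly consistent preferences plus a few accidental flips, the same fit recovers the underlying ordering and the flips wash out.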

2JenniferRM
Interesting! I'm fascinated by the idea of a way to figure out the transitive relations via a "non-circular on average" assumption and might go hunt down the code to see how it works. I think humans (and likely dogs and maybe pigeons) have preference learning stuff that helps them remember and abstract early choices and early outcomes somehow, to bootstrap into skilled choosers pretty fast, but I've never really thought about the algorithms that might do this. It feels like stumbling across a whole potential microfield of cognitive science that I've never heard of before that is potentially important to friendliness research! (I have sent the DM. Thanks <3)  
Jan10

Hi Erik! Thank you for the careful read, this is awesome!

Regarding proposition 1 - I think you're right, that counter-example disproves the proposition. The proposition we were actually going for was  ,  i.e. the probability without the end of the bridge! I'll fix this in the post.

Regarding proposition II - janus had the same intuition and I tried to explain it with the following argument: When the distance between tokens becomes large enough, then eventually all bridges between the first token and an arbitrary second... (read more)

2Erik Jenner
In that case, I agree the monotonically decreasing version of the statement is correct. I think the limit still isn't necessarily zero, for the reasons I mention in my original comment. (Though I do agree it will be zero under somewhat reasonable assumptions, and in particular for LMs) One crux here is the "appropriately normalized": why should the normalization be linear, i.e. just B + 1? I buy that there are some important systems where this holds, and maybe it even holds for LMs, but it certainly won't be true in general (e.g. sometimes you need exponential normalization). Even modulo that issue, the claim still isn't obvious to me, but that may be a good point to start (i.e. an explanation of where the normalization factor comes from would plausibly also clear up my remaining skepticism).
Jan20

Huh, thanks for spotting that! Yes, should totally be ELK 😀 Fixed it.

JanΩ230

This work by Michael Aird and Justin Shovelain might also be relevant: "Using vector fields to visualise preferences and make them consistent"

And I have a post where I demonstrate that reward modeling can extract utility functions from non-transitive preference orderings: "Inferring utility functions from locally non-transitive preferences"

(Extremely cool project ideas btw)

Jan20

Hey Ben! :) Thanks for the comment and the careful reading!

Yes, we only added the missing arXiv papers after clustering, but then we repeat the dimensionality reduction and show that the original clustering still holds up even with the new papers (Figure 4, bottom right). I think that's pretty neat (especially since the dimensionality reduction doesn't "know" about the clustering) but of course the clusters might look slightly different if we also re-run k-means on the extended dataset.

Jan30

There's an important caveat here:

The visual stimuli are presented 8 degrees over the visual field for 100ms followed by a 100ms grey mask as in a standard rapid serial visual presentation (RSVP) task.

I'd be willing to bet that if you give the macaque more than 100ms they'll get it right - that's at least how it is for humans!

(Not trying to shift the goalpost, it's a cool result! Just pointing at the next step.)

Jan20

Great points, thanks for the comment! :) I agree that there are potentially some very low-hanging fruits. I could even imagine that some of these methods work better in artificial networks than in biological networks (less noise, more controlled environment).

But I believe one of the major bottlenecks might be that the weights and activations of an artificial neural network are just so difficult to access? Putting the weights and activations of a large model like GPT-3 under the microscope requires impressive hardware (running forward passes, storing the ac... (read more)
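For the small-model regime, the kind of access I have in mind is cheap. Here is a rough sketch (assuming the Hugging Face transformers library, with GPT-2 as a stand-in) of caching per-layer activations with forward hooks; doing the same for a GPT-3-scale model is where the hardware requirements bite:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # output[0] is the hidden-state tensor of a GPT-2 block
        activations[name] = output[0].detach()
    return hook

# register one hook per transformer block to cache its residual-stream output
for i, block in enumerate(model.h):
    block.register_forward_hook(make_hook(f"block_{i}"))

with torch.no_grad():
    inputs = tokenizer("Interpretability is easier with full access.", return_tensors="pt")
    model(**inputs)

print({k: v.shape for k, v in activations.items()})  # one (1, seq_len, 768) tensor per block
```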

Jan30

Great point! And thanks for the references :) 

I'll change your background to Computational Cognitive Science in the table! (unless you object or think a different field is even more appropriate)

Jan30

Thank you for the comment and the questions! :)

This is not clear from how we wrote the paper but we actually do the clustering in the full 768-dimensional space! If you look closely at the clustering plot you can see that the clusters are slightly overlapping - that would be impossible with k-means in 2D, since in that setting membership is determined by distance from the 2D centroid.
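In pseudocode, the pipeline looks roughly like this (illustrative sketch, with random vectors standing in for the 768-dimensional abstract embeddings):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))  # stand-in for the abstract embeddings

# k-means runs in the full 768-dimensional space
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(embeddings)

# the 2D map is produced afterwards, for visualization only
coords_2d = PCA(n_components=2).fit_transform(embeddings)

# A scatter plot of coords_2d colored by labels can show overlapping clusters,
# because cluster membership was never determined by 2D distances.
```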

1jacopo
Ahh sorry! Going back to read it was pretty clear from the text. I was tricked by the figure where the embedding is presented first. Again, good job! :)
Jan30

Oh true, I completely overlooked that! (if I keep collecting mistakes like this I'll soon have enough for a "My mistakes" page)

Jan20

Yes, good point! I had that in an earlier draft and then removed it for simplicity and for the other argument you're making!

Jan20

This sounds right to me! In particular, I just (re-)discovered this old post by Yudkowsky and this newer post by Alex Flint that both go a lot deeper on the topic. I think the optimal control perspective is a nice complement to those posts and if I find the time to look more into this then that work is probably the right direction.

Answer by Jan40

As part of the AI Safety Camp our team is preparing a research report on the state of AI safety! Should be online within a week or two :)

7Cedar
Yooo! That sounds amazing. Please do let me know once that report is up!
Jan30

Interesting, I added a note to the text highlighting this! I was not aware of that part of the story at all. That makes it more of a Moloch-example than a "mistaking adversarial for random"-example.

4gwern
Yes, it is a cautionary lesson, just not the one people are taking it for; however, given the minimal documentation or transparency, there are a lot of better examples of short-term risk-heavy investments or blowing up (which are why Wall Street strictly segregates the risk-control departments from the trader and uses clawbacks etc) to tell, so the Zillow anecdote isn't really worth telling in any context (except perhaps, like the other stories, as an example of low epistemic standards and how leprechauns are born).
Jan20

Yes, that's a pretty fair interpretation! The macroscopic/folk psychology notion of "surprise" of course doesn't map super cleanly onto the information-theoretic notion. But I tend to think of it as: there is a certain "expected surprise" about what future possible states might look like if everything evolves "as usual", . And then there is the (usually larger) "additional surprise" about the states that the AI might steer us into, . The delta between those two is the "excess surprise" that the AI needs to be able to bri... (read more)

5Adam Jermyn
Thanks for clarifying! Maybe the 'actions -> nats' mapping can be sharpened if it's not an AI but a very naive search process? Say the controller can sample k outcomes at random before choosing one to actually achieve. I think that lets it get ~ln(k) extra nats of surprise, right? Then you can talk about the AI's ability to control things in terms of 'the number of random samples you'd need to draw to achieve this much improvement'.
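A quick Monte Carlo sanity check of the ~ln(k) estimate (a sketch under the assumption that outcomes are judged by a scalar score drawn from the default distribution, so "surprise" is how unlikely the default process is to do at least as well):

```python
import numpy as np

rng = np.random.default_rng(0)
for k in [1, 4, 16, 64, 256]:
    best = rng.uniform(size=(100_000, k)).max(axis=1)  # best-of-k "search"
    tail_prob = (1.0 - best).mean()                    # P(a default draw beats the chosen outcome)
    print(f"k={k:4d}  excess surprise ~ {-np.log(tail_prob):.2f} nats  (ln(k+1) = {np.log(k + 1):.2f})")
```

For uniform scores the chosen outcome sits in a tail of default probability ~1/(k+1), i.e. about ln(k+1) extra nats.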
Jan10

Thank you for your comment! You are right, these things are not clear from this post at all and I did not do a good job at clarifying that. I'm a bit low on time atm, but hopefully, I'll be able to make some edits to the post to set the expectations for the reader more carefully.

The short answer to your question is: Yep, X is the space of events. In Vanessa's post it has to be compact and metric, I'm simplifying this to an interval in R. And  can be derived from  by plugging in g=0 and replacing the measure  by the... (read more)

Jan20

Cool paper, great to see the project worked out! (:

One question: How do you know the contractors weren't just answering randomly (or were confused about the task) in your "quality after filtering" experiments (Table 4)? Is there agreement across contractors about the quality of completions (in case they saw the same completions)?

6dmz
Thanks! :) Good question. Surge ran some internal auditing processes for all our quality data collection. We also checked 100 random comparisons ourselves for an earlier round of data and they seemed reasonable: we only disagreed with 5 of them, and 4 of those were a disagreement between "equally good" / "equally bad" and an answer one way or the other. (There were another 10 that seemed a bit borderline.) We don't have interrater reliability numbers here, though - that would be useful.
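For reference, the interrater numbers in question would be something like this (illustrative sketch with made-up labels, not the actual Surge data): percent agreement plus a chance-corrected statistic for two raters who labeled the same comparisons.

```python
from sklearn.metrics import cohen_kappa_score

rater_1 = ["A", "A", "equal", "B", "A", "B", "equal", "A", "B", "A"]
rater_2 = ["A", "B", "equal", "B", "A", "B", "A",     "A", "B", "A"]

agreement = sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)
kappa = cohen_kappa_score(rater_1, rater_2)  # chance-corrected agreement

print(f"raw agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```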
Jan10

Fascinating! Thanks for sharing!

Jan130

Cool experiment! I could imagine that the tokenizer handicaps GPT's performance here (reversing the characters leads to completely different tokens). With a character-level tokenizer GPT should/might be able to handle that task better!

9gwern
For the similar anagram task, I found space-separating (to avoid the BPE inconsistency/nondeterminism by forcing it to encode individual letters) seemed like it helped: https://gwern.net/GPT-3-nonfiction#anagrams For this task, I think a worthwhile followup would be to experiment with the new edit mode.

I was slightly surprised to find that even fine-tuning GPT-Neo-125M for a long time on many sequences of letters followed by spaces, followed by a colon, followed by the same sequence in reverse, was not enough to get it to pick up the pattern - probably because the positional encoding vectors make the difference between e.g. "18 tokens away" and "19 tokens away" a rather subtle difference. However, I then tried fine-tuning on a similar dataset with numbers in between (e.g. "1 W 2 O 3 R 4 D 5 S : 5 S 4 D 3 R 2 O 1 W", or a similar representation - can't remember exactly, but something roughly like that) and it picked up the pattern right away. Data representation matters a lot!
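A small sketch of the two data formats being contrasted here (my reconstruction - the comment itself notes the exact representation is only approximately remembered):

```python
def plain_example(word: str) -> str:
    # space-separated letters, then the reversed sequence
    letters = list(word.upper())
    return " ".join(letters) + " : " + " ".join(reversed(letters))

def indexed_example(word: str) -> str:
    # each letter tagged with its position, which makes the reversal pattern explicit
    letters = list(word.upper())
    forward = " ".join(f"{i} {c}" for i, c in enumerate(letters, start=1))
    backward = " ".join(f"{i} {c}" for i, c in zip(range(len(letters), 0, -1), reversed(letters)))
    return forward + " : " + backward

print(plain_example("words"))    # W O R D S : S D R O W
print(indexed_example("words"))  # 1 W 2 O 3 R 4 D 5 S : 5 S 4 D 3 R 2 O 1 W
```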

3Stuart_Armstrong
Possibly! Though it did seem to recognise that the words were spelt backwards. It must have some backwards spelt words in its training data, just not that many.
Jan20

Interesting, thank you! I guess I was thinking of deception as characterized by Evan Hubinger, with mesa-optimizers, bells, whistles, and all. But I can see how a sufficiently large competence-vs-performance gap could also count as deception.

Jan20

Thanks for the comment! I'm curious about the Anthropic Codex code-vulnerability prompting, is this written up somewhere? The closest I could find is this, but I don't think that's what you're referencing?

8gwern
https://arxiv.org/pdf/2107.03374.pdf#page=27
Jan10

I was not aware of this, thanks for pointing this out! I made a note in the text. I guess this is not an example of "advanced AI with an unfortunately misspecified goal" but rather just an example of the much larger class of "system with an unfortunately misspecified goal".

Jan20

Thanks for the comment, I did not know this! I'll put a note in the essay to highlight this comment.

Jan20

Iiinteresting! Thanks for sharing! Yes, the choice of how to measure this affects the outcome a lot...

Jan10

Hmm, fair, I think you might get along fine with my coworker from footnote 6 :) I'm not even sure there is a better way to write these titles - but they can still be very intimidating for an outsider.

Jan20

Yes, I agree, a model can really push intuition to the next level! There is a failure mode where people just throw everything into a model and hope that the result will make sense. In my experience that just produces a mess, and you need some intuition for how to properly set up the model.

2Nathan Helm-Burger
Absolutely. In fact, I think the critical impediment to machine learning being able to learn more useful things from the current amassed neuroscience knowledge is: "but which of these many complicated bits are even worth including in the model?" There's just too much, and so much is noise, or incompletely understood such that our models of it are incomplete enough to be worse-than-useless. 
Jan50

Hi! :) Thanks for the comment! Yes, that's on purpose, the idea is that a lot of the shorthand in molecular neuroscience is very hard to digest. So since the exact letters don't matter I intentionally garbled them with a Glitch Text Generator. But perhaps that isn't very clear without explanation, I'll add something.

This word, Ǫ̵͎͊G̶̦̉̇l̶͉͇̝̽͆̚i̷͔̓̏͌c̷̱̙̍̂͜k̷̠͍͌l̷̢̍͗̃n̷̖͇̏̆å̴̤c̵̲̼̫͑̎̆, is for example a garbled version of "O-GLicklnac", which in turn is the phonetic version of "O-GlcNAc".

Jan60

Theory #4 appears very natural to me, especially in the light of papers like Chen et al 2006 or Cuntz et al 2012. And another supporting intuition from developmental neuroscience is that development is a huge mess and figuring out where to put a long-range connection is really involved. And while there can be a bunch of circuit remodeling on a local scale, once you have established a long-range connection, there is little hope of substantially rewiring it.

In case you want to dive deeper into this (and you don't want to read all those papers), I'd be happy ... (read more)

4Lucius Bushnaq
Yes, a chat could definitely be valuable. I'll pm you. I agree that connection costs definitely look like a real, modularity promoting effect. Leaving aside all the empirical evidence, I have some trouble imagining how it could plausibly not be. If you put a ceiling on how many connections there can be, the network has got to stick to the most necessary ones. And since some features of the world/input data are just more "interlinked" than others, it's hard to see how the network wouldn't be forced to reflect that in some capacity. I just don't think it's the only modularity promoting effect.
Jan20

I've been meaning to dive into this for-e-ver and only now find the time for it! This is really neat stuff, haven't enjoyed a framework this much since logical induction. Thank you for writing this!

Jan10

Yep, I agree, SLIDE is probably a dud. Thanks for the references! And my inside view is also that current trends will probably continue and most interesting stuff will happen on AI-specialized hardware.

Jan20

Thank you for the comment! You are right, that should be a ReLU in the illustration, I'll fix it :)

Jan20

Great explanation, I feel substantially less confused now. And thank you for adding two new shoulder advisors to my repertoire :D

Jan20

Thank you for the thoughtful reply!

3. I agree with your point, especially that  should be true.

But I think I can salvage my point by making a further distinction. When I write u(X) I actually mean u(emb(X)), where emb is a semantic embedding that takes sentences to vectors. Already at the level of the embedding we probably have ... (read more)

1Kaarel
I still disagree / am confused. If it's indeed the case that emb(chocolate ice cream and vanilla ice cream)≠emb(chocolate ice cream)+emb(vanilla ice cream), then why would we expect u(emb(chocolate ice cream and vanilla ice cream))=u(emb(chocolate ice cream))+u(emb(vanilla ice cream))? (Also, in the second-to-last sentence of your comment, it looks like you say the former is an equality.) Furthermore, if the latter equality is true, wouldn't it imply that the utility we get from [chocolate ice cream and vanilla ice cream] is the sum of the utility from chocolate ice cream and the utility from vanilla ice cream? Isn't u(emb(X)) supposed to be equal to the utility of X? My current best attempt to understand/steelman this is to accept emb(chocolate ice cream and vanilla ice cream)≠emb(chocolate ice cream)+emb(vanilla ice cream), to reject u(emb(chocolate ice cream and vanilla ice cream))=u(emb(chocolate ice cream))+u(emb(vanilla ice cream)), and to try to think of the embedding as something slightly strange. I don't see a reason to think utility would be linear in current semantic embeddings of natural language or of a programming language, nor do I see an appealing other approach to construct such an embedding. Maybe we could figure out a correct embedding if we had access to lots of data about the agent's preferences (possibly in addition to some semantic/physical data), but it feels like that might defeat the idea of this embedding in the context of this post as constituting a step that does not yet depend on preference data. Or alternatively, if we are fine with using preference data on this step, maybe we could find a cool embedding, but in that case, it seems very likely that it would also just give us a one-step solution to the entire problem of computing a set of rational preferences for the agent. A separate attempt to steelman this would be to assume that we have access to a semantic embedding pretrained on preference data from a bunch of other agents, and
Jan10

Awesome, thanks for the feedback Eric! And glad to hear you enjoyed the post!

I'm confused why you're using a neural network

Good point, for the example post it was total overkill. The reason I went with an NN was to demonstrate the link with the usual setting in which preference learning is applied. And in general, NNs generalize better than the table-based approach (see also my response to Charlie Steiner).

happy to chat about that

I definitely plan to write a follow-up to this post, will come back to your offer when that follow-up reaches the front of my q... (read more)

Jan20

Thanks for the comment! (:

  1. True, fixed it! I was confused there for a bit.
  2. This is also true. I wrote it like this because the proof sketch on Wikipedia included that step. And I guess if step 3 can't be executed (complicated), then it's nice to have the sorted list as a next-best-thing.
  3. Those are interesting points and I'm not sure I have a good answer (because the underlying problems are quite deep, I think). My statement about linearity in semantic embeddings is motivated by something like the famous "King – Man + Woman = Queen" from word2vec. Regarding li
... (read more)
1Kaarel
3. Ahh okay thanks, I have a better picture of what you mean by a basis of possibility space now. I still doubt that utility interacts nicely with this linear structure though. The utility function is linear in lotteries, but this is distinct from being linear in possibilities. Like, if I understand your idea on that step correctly, you want to find a basis of possibility-space, not lottery space. (A basis on lottery space is easy to find -- just take all the trivial lotteries, i.e. those where some outcome has probability 1.) To give an example of the contrast: if the utility I get from a life with vanilla ice cream is u_1 and the utility I get from a life with chocolate ice cream is u_2, then the utility of a lottery with 50% chance of each is indeed 0.5 u_1 + 0.5 u_2. But what I think you need on that step is something different. You want to say something like "the utility of the life where I get both vanilla ice cream and chocolate ice cream is u_1+u_2". But this still seems just morally false to me. I think the mistake you are making in the derivation you give in your comment is interpreting the numerical coefficients in front of events as both probabilities of events or lotteries and as multiplication in the linear space you propose. The former is fine and correct, but I think the latter is not fine. So in particular, when you write u(2A), in the notation of the source you link, this can only mean "the utility you get from a lottery where the probability of A is 2", which does not make sense assuming you don't allow your probabilities to be >1. Or even if you do allow probabilities >1, it still won't give you what you want. In particular, if A is a life with vanilla ice cream, then in their notation, 2A does not refer to a life with twice the quantity of vanilla ice cream, or whatever.  4. I think the gradient part of the Hodge decomposition is not (in general) the same as the ranking with the minimal number of incorrect pairs. Fun stuff
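To make point 4 concrete, here is a toy version of the decomposition (an illustrative sketch, not code from either post): the least-squares "ranking" part of a pairwise preference flow is its gradient component, and whatever is left over is cyclic.

```python
import itertools
import numpy as np

# Y[i, j] = strength with which item i is preferred over item j (skew-symmetric).
Y = np.array([
    [ 0.0,  1.0, -1.0],
    [-1.0,  0.0,  1.0],
    [ 1.0, -1.0,  0.0],
])  # a pure 3-cycle: 0 > 1, 1 > 2, 2 > 0

n = Y.shape[0]
edges = list(itertools.combinations(range(n), 2))

# Solve min_s sum_{(i,j)} (s_i - s_j - Y[i, j])^2  -> gradient component of the flow.
A = np.zeros((len(edges), n))
b = np.zeros(len(edges))
for row, (i, j) in enumerate(edges):
    A[row, i], A[row, j] = 1.0, -1.0
    b[row] = Y[i, j]
s, *_ = np.linalg.lstsq(A, b, rcond=None)

gradient_part = A @ s           # per-edge flow explained by the scalar ranking s
cyclic_part = b - gradient_part

print("ranking scores:", s)             # all equal: the cycle has no gradient component
print("cyclic residual:", cyclic_part)  # the whole flow is cyclic in this example
```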
Jan30

Hey Charlie! 

Good comment, gave me a feeling of "oh, oops, why didn't I?" for a while. I think having the Elo-like algorithm as a baseline to compare to would have been a good thing to have in any case. But there is something that the NN can do that the Elo-like algorithm can't: generalization. Every "new" element (or even an interpolation of older elements) will get the "initial score" (like 1500 in chess) in Elo, while the NN can exploit similarities between the new element and older elements.
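Roughly, the contrast looks like this (illustrative sketch, not the code from the post):

```python
# Elo only adjusts ratings of items it has already seen; anything new starts at the
# default 1500. A feature-based model can score an unseen item from its features.
def elo_update(r_winner: float, r_loser: float, k: float = 32.0):
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

ratings = {"A": 1500.0, "B": 1500.0}
ratings["A"], ratings["B"] = elo_update(ratings["A"], ratings["B"])  # A beats B
print(ratings)                 # A above 1500, B below
print(ratings.get("C", 1500))  # brand-new element: Elo has nothing better than the default

# A feature-based reward model instead computes score(features(C)), so similarity
# between C and previously rated elements can move its score away from the default
# before C has ever appeared in a comparison.
```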

JanΩ010

Fantastic, thank you for the pointer, learned something new today! A unique and explicit representation would be very neat indeed.

JanΩ370

I'm pretty confused here.

Yeah, the feeling's mutual 😅 But the discussion is also very rewarding for me, thank you for engaging!

I am in favor of learning-from-scratch, and I am also in favor of specific designed inductive biases, and I don't think those two things are in opposition to each other.

A couple of thoughts:

  • Yes, I agree that the inductive bias (/genetically hardcoded information) can live in different components: the learning rule, the network architecture, or the initialization of the weights. So learning-from-scratch is logically compatible with
... (read more)
JanΩ590

Here's an operationalization. Suppose someday we write computer code that can do the exact same useful computational things that the neocortex (etc.) does, for the exact same reason. My question is: Might that code look like a learning-from-scratch algorithm?

Hmm, I see. If this is the crux, then I'll put all the remaining nitpicking at the end of my comment and just say: I think I'm on board with your argument. Yes, it seems conceivable to me that a learning-from-scratch program ends up in a (functionally) very similar state to the brain. The trajectory of... (read more)

7Steven Byrnes
Thanks for your interesting comments! I'm pretty confused here. To me, that doesn't seem to support your point, which suggests that one of us is confused, or else I don't understand your point. Specifically: If I switch from a fully-connected DNN to a ConvNet, I'm switching from one learning-from-scratch algorithm to a different learning-from-scratch algorithm. I feel like your perspective is that {inductive biases, non-learning-from-scratch} are a pair that go inexorably together, and you are strongly in favor of both, and I am strongly opposed to both. But that's not right: they don't inexorably go together. The ConvNet example proves it. I am in favor of learning-from-scratch, and I am also in favor of specific designed inductive biases, and I don't think those two things are in opposition to each other. I think you're misunderstanding me. Random chunks of matter do not learn language, but the neocortex does. There's a reason for that—aspects of the neocortex are designed by evolution to do certain computations that result in the useful functionality of learning language (as an example). There is a reason that these particular computations, unlike the computations performed by random chunks of matter, are able to learn language. And this reason can be described in purely computational terms—"such-and-such process performs a kind of search over this particular space, and meanwhile this other process breaks down the syntactic tree using such-and-such algorithm…", I dunno, whatever. The point is, this kind of explanation does not talk about subplates and synapses, it talks about principles of algorithms and computations. Whatever that explanation is, it's a thing that we can turn into a design spec for our own algorithms, which, powered by the same engineering principles, will do the same computations, with the same results. In particular, our code will be just as data-efficient as the neocortex is, and it will make the same types of mistakes in the same type
JanΩ8180

Hey Steve! Thanks for writing this, it was an interesting and useful read! After our discussion in the LW comments, I wanted to get a better understanding of your thinking and this sequence is doing the job. Now I feel I can better engage in a technical discussion.

I can sympathize well with your struggle in section 2.6. A lot of the "big picture" neuroscience is in the stage where it's not even wrong. That being said, I don't think you'll find a lot of neuroscientists who nod along with your line of argument without raising objections here and there (neuro... (read more)

Thanks!

I don't think anyone except for Jeff Hawkins believes in literal cortical uniformity.

Not even him! Jeff Hawkins: "Mountcastle’s proposal that there is a common cortical algorithm doesn’t mean there are no variations. He knew that. The issue is how much is common in all cortical regions, and how much is different. The evidence suggests that there is a huge amount of commonality."

I mentioned "non-uniform neural architecture and hyperparameters". I'm inclined to put different layer thicknesses (including agranularity) in the category of "non-uniform hyp... (read more)

Jan30

I enjoyed the footnote a lot :) And the entire story, of course. Thanks for writing!

3lsusr
Thanks. Wordplay based around double-meanings is hard to write in a language which distinguishes between the different meanings.