All of davidl123's Comments + Replies

>This type of paper reading, where I gather tools to engineer with, initially seems less relevant for fundamental concepts research like alignment. However, your general relativity example suggests that Einstein also had a tool gathering phase leading up to relativity, so shrugs.

An advisor used to remark that working on applications can lead to directions related to more fundamental research. How it can happen is something like this: 1. Try to apply a method to a domain; 2. Realize shortcomings of the method; 3. Find & attempt solutions to address shortc... (read more)

davidl123

Great write-up. It inspired me to see how much further ICL could go beyond "simpler" mappings (the OP shows pretty nice results for two linear and two quadratic functions). As such, I tried a damped sinusoid with the prompt:

x=3.984, y=6.68
x=2.197, y=-2.497
x=0.26, y=-7.561
x=6.025, y=-1.98
x=7.126, y=-4.879
x=8.584, y=-0.894
x=9.97, y=3.403
x=11.1, y=2.45
x=12.09, y=-0.452
x=13.72, y=-2.48
x=14.81, y=-0.606
x=10, y=

but didn't have any luck. Maybe I need more points, especially around the peaks and troughs.
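
For reference, here's a minimal sketch of how points like these could be generated and formatted into an ICL prompt, assuming a damped sinusoid of the form y = A·e^(−λx)·sin(ωx + φ); the parameters and point count below are placeholders, not the ones behind the numbers above:

```python
# Minimal sketch: sample (x, y) pairs from an assumed damped sinusoid
# y = A * exp(-lam * x) * sin(w * x + phi) and format them as an ICL prompt.
# Parameters are illustrative placeholders, not the ones used above.
import numpy as np

rng = np.random.default_rng(0)
A, lam, w, phi = 8.0, 0.05, 1.0, 0.5

xs = np.sort(rng.uniform(0, 15, size=12))
ys = A * np.exp(-lam * xs) * np.sin(w * xs + phi)

lines = [f"x={x:.3f}, y={y:.3f}" for x, y in zip(xs, ys)]
lines.append("x=10, y=")  # query point left for the model to complete
print("\n".join(lines))
```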

Is Conjecture open to the idea of funding PhD fellowships for research in alignment and related topics? I think society will look back and see work in alignment as crucial to getting machines (which are growing more intelligent impressively quickly) to cooperate with humans.

Excited to hear that some at EleutherAI are working on alignment next (the GPT-J & GPT-Neo work was quite awesome).

I'm going slightly off-topic, but I couldn't help but notice that your website says you're doing this in your spare time. I'm surprised that you've covered so much ground. If you don't mind the question -- how do you keep abreast of the AI field with so many papers published every year? Do you attend periodic meet-ups with your circle of friends/colleagues to discuss such matters? Do you read summaries of papers instead of the full papers?

Steven Byrnes
Oh, there are infinity papers on AI per month. I couldn't dream of reading them all. Does anyone? I think I'm at least vaguely aware of what the ML people are all talking about and excited about, through twitter. Mainstream ML papers tend to be pretty low on my priority list compared to computational neuroscience, and neuroscience in general, and of course AGI safety and strategy. Learning is easier when you have specific questions you're desperately trying to answer :-) Beyond that, I dunno. I don't watch TV, and I quit my other hobbies to clear more time for this one. It is a bit exhausting sometimes. Maybe I should take a break. Oh but I do really want to write up that next blog post! And the one after that... :-)

>It's all about mashing together compositional generative models. Like: "I need to put this book into my bag. Will it fit?" Well, you have a generative model of all the ways that the book can be oriented, and you have a generative model of all the ways that the bag can be reshaped and that its current contents can be shuffled around, and you try to mix and match all those models until you fit them together into a plausible composite model wherein the book slides easily into the bag. Then you reshape the bag, shuffle the contents, and orient the book, and it

... (read more)
Steven Byrnes
Well, I think it's premature to say what is or isn't important for an AGI until we build one.

Yeah I agree that capsule networks capture part of that, even if it's not the whole story.

Sure, I was talking about digging into the gory details of the neocortical algorithm. What do the layer 2 neurons calculate and how? That kind of thing. Plenty of people are doing that all the time, of course, and making rapid progress in my opinion. I find that fact a bit nerve-wracking, but oh well, what can you do? Hope for the best, I suppose, and meanwhile work as fast as possible on the "what if we succeed" question. I mean, I do actually have idiosyncratic opinions about what layer 2 neurons are calculating and how, or whatever, but wouldn't think to blog about them, on the off chance that I'm actually right. :-P

Bigger-picture thinking, like you're talking about, is more likely to be a good thing, I figure, although the details matter. Like, creating common knowledge that a certain path will imminently lead to AGI could lead to a frantic race between teams around the world where safety gets thrown out the window. But some big-picture knowledge is necessary for "what if we succeed". Of course I blog on big-picture stuff myself. I think I'm pushing things in the right direction, but who knows :-/

Yes, I also think that memory and generative models could be “different forms” of the same thing. A generative model seems like compressed memory. Perhaps, to a biological organism, memory could be like short-term memory (representations currently being attended to, plus recent history; contents readily retrievable), and generative models could be like long-term memory (effort needed to retrieve compressed contents). However, a machine with large memory capacities might have less need of generative models solely for the sake of memo... (read more)

Steven Byrnes
Oh yeah, definitely, and also planning, reasoning, and so on. It's all about mashing together compositional generative models. Like: "I need to put this book into my bag. Will it fit?" Well, you have a generative model of all the ways that the book can be oriented, and you have a generative model of all the ways that the bag can be reshaped and that its current contents can be shuffled around, and you try to mix and match all those models until you fit them together into a plausible composite model wherein the book slides easily into the bag. Then you reshape the bag, shuffle the contents, and orient the book, and it slides in, just like you imagined!

No, I haven't had time. Also, I think that safely using AGI systems remains an unsolved problem. If we had a complete neocortex simulator right now, I think we would be able to quickly (years not decades) turn it into an extremely powerful system with superhuman cross-domain reasoning and common sense and a drive to accomplish goals. But we would have only sketchy and unreliable methods to "steer it" towards trying to do what we want it to try to do, or to even know what it's trying to do at any given time. So, such a system would be a very dangerous thing, and it would get rapidly and unpredictably more dangerous as we optimized the hyperparameters and scaled the algorithms etc. And I don't think that having these systems right in front of us would make it much easier to figure out how to reliably control them. (It would make it easier to find unreliable control methods, which work for a while then suddenly fail.) So that's why I'm not inclined to be part of the project to reverse-engineer the neocortex—not until we have a better plan for "what if we succeed". I feel like I understand the neocortex about as well as I need to in order to think about how and whether a future neocortex-like AGI can be controlled, or more generally how to make sure it's really a step towards the awesome post-AGI utopia we're all hoping

Thanks for writing back.

I asked about the memory and generative models because I feel uncertain about the differences, if any, between storing information in memory versus storing it in generative models. An example of storing in memory would be something like a knowledge graph. An example of retrieving info from a generative model would be something like inputting a vector into a deconvolutional NN so that it outputs an image (models have capacity, making them function like memory). One question on my mind is: are there things that are better suited (or “more naturally”) stored in a generative model versus in memory?
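
To make the deconvolutional example concrete, here's a minimal, untrained sketch (PyTorch; the layer sizes and latent dimension are arbitrary) of the mapping I mean, where the network's weights play the role of the "memory" that an image is retrieved from:

```python
# Minimal sketch: a latent vector is decoded into an image by a stack of
# transposed convolutions; after training, the weights act as the "memory"
# the image is reconstructed from. All sizes here are arbitrary.
import torch
import torch.nn as nn

decoder = nn.Sequential(
    nn.ConvTranspose2d(64, 128, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.ReLU(),
    nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),   # 8x8 -> 16x16
    nn.ReLU(),
    nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),    # 16x16 -> 32x32
    nn.Sigmoid(),
)

z = torch.randn(1, 64, 1, 1)  # input vector acting as the retrieval "key"
image = decoder(z)            # (1, 1, 32, 32) generated image
print(image.shape)
```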

Steven Byrnes
Thanks for explaining where you're coming from.

Hmm. For the brain, I disagree that there's a distinction—the generative models, semantic memory, and a knowledge graph are three different descriptions of the same thing. Like, say you know that "if you push the button on the toy, it goes Beep". You could call that part of a knowledge graph—some relationship between a certain toy, its affordance of pressing the button, and the beep sound—but you could also call it a generative model—a little kinda movie in your head, in which you picture the button being pressed, and then you hear the sound Beep. Right?

For images, it gets a bit trickier to visualize what's going on, but I think the Dileep George vision model is probably a better starting point than a deconvolutional NN if you're thinking about the brain. You don't normally think of your visual knowledge as organized into a "knowledge graph", but I do think that there is in fact a giant repository of known, um, snippets of aspects of images, with known relations between them—like how known contours fit together into known shapes, and how known subcomponents fit together into known assemblies, etc.—and in this sense your visual memory can in fact be treated as a knowledge graph, and formalized as some elaborate variant of a probabilistic graphical model. By contrast, I don't think I would describe a deconvolutional NN as implicitly encoding a "knowledge graph". I mean, it's a different type of generative model, it's not structured that way, it doesn't seem very knowledge-graph-ish to me...

Good day Steve,

This post says, “Since generative models are simpler (less information content) than reverse / discriminative models, they can be learned more quickly.” Is this true? I’ve always had the impression that it’s the opposite. It’s easier to tell apart, say, cats and dogs (discriminative model) than it is to draw cats and dogs (generative model). Most children first learn to discriminate between different objects before learning how to draw/create/generate them.

Would you have an opinion on how memory and generative models interact? To jog the... (read more)

Steven Byrnes
Did you see the part where I linked that post? Here's the quote: I have a 1-sentence take on working memory here. I haven't thought about it beyond that...

Probably ... that seems like the kind of thing that people would be trying to make benchmarks for. But I don't know.

Well, my claim (and the claim of Jeff Hawkins, Yann LeCun, the “predictive coding” people like Friston, etc.) is that the neocortex is constantly predicting what input signals it will receive next, and updating its models when the predictions are wrong. So when humans tell apart cats and dogs, I would call that generative. ResNet image classifiers are discriminative, but they need a lot of samples to learn to do so, so I'm not sure whether that counts as "easy" / "low-information-content". Drawing on paper is kinda hard for humans (at least for me), but that's not really a fair comparison, the generative part is just imagining a cat you've never seen before, and then the hard part (for me) is copying that imagined image onto a piece of paper. Of course I don't put too much stock in that comparison either. Maybe generative models are hard but feel easy because that's how our brains are designed.

Anyway, I feel good about the math or engineering example: it's low-information-content to be able to answer the question "what would happen if I do steps X, Y, Z" and higher information-content to be able to answer the question "what steps do I take in what order to prove the theorem / invent the gadget?". The case of imagery is less obvious, now that you mention it. But it at least seems plausible to me that a probabilistic program to draw any possible eye (combining shape, textures, lighting, certain types of noise, reflections, etc.) has less complexity (fewer bits) than an equally good eye-detecting discriminative model. Note that our discriminative models (like ResNet image classifiers) are not usually "equally good", because they are more prone to failing out-of-distribution—e.g. they will incorrect