All of Bill Benzon's Comments + Replies

Beating benchmarks, even very difficult ones, is all fine and dandy, but we must remember that those tests, no matter how difficult, are at best only a limited measure of human ability. Why? Because they present the test-taker with a well-defined situation to which they must respond. Life isn't like that. It's messy and murky. Perhaps the most difficult step is to wade into the mess and the murk and impose a structure on it – perhaps by simply asking a question – so that one can then set about dealing with that situation in terms of the imposed structure. T... (read more)

Yes, the matching of "mental content" between one mind and another is perhaps the central issue in semantics. You might want to take a look at Warglien and Gärdenfors, Semantics, conceptual spaces, and the meeting of minds:

Abstract: We present an account of semantics that is not construed as a mapping of language to the world but rather as a mapping between individual meaning spaces. The meanings of linguistic entities are established via a “meeting of minds.” The concepts in the minds of communicating individuals are modeled as convex regions in conceptua

... (read more)
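As a toy illustration of that picture (mine, not the paper's): in a Gärdenfors-style conceptual space, a concept can be modeled as the convex region that nearest-prototype (Voronoi) classification carves out of a quality space. The dimensions, prototype points, and names below are invented for the sketch.

```python
# A toy sketch of concepts as convex regions in a conceptual space.
# The quality dimensions and prototypes are made up for illustration.
import numpy as np

# A 2-D "color-ish" quality space: (hue, brightness), both scaled 0-1.
prototypes = {
    "red":    np.array([0.00, 0.5]),
    "yellow": np.array([0.17, 0.8]),
    "blue":   np.array([0.60, 0.4]),
}

def concept_of(point):
    # Nearest-prototype classification; the cells this induces are convex,
    # which is the formal sense in which each concept is a convex region.
    return min(prototypes, key=lambda c: np.linalg.norm(point - prototypes[c]))

print(concept_of(np.array([0.12, 0.7])))  # -> "yellow"
```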

YES. 

At the moment the A.I. world is dominated by an almost magical belief in large language models. Yes, they are marvelous, a very powerful technology. By all means, let's understand and develop them. But they aren't the way, the truth and the light. They're just a very powerful and important technology. Heavy investment in them has an opportunity cost: less money to invest in other architectures and ideas. 

And I'm not just talking about software, chips, and infrastructure. I'm talking about education and training. It's not good to have a whol... (read more)

Whatever one means by "memorize" is by no means self-evident. If you prompt ChatGPT with "To be, or not to be," it will return the whole soliloquy. Sometimes. Other times it will give you an opening chunk and then an explanation that that's the well known soliloquy, etc. By poking around I discovered that I could elicit the soliloquy by giving it prompts consisting of syntactically coherent phrases, but if I gave it prompts that were not syntactically coherent, it didn't recognize the source, that is, not until a bit more prompting. I've never found the i... (read more)

I was assuming lots of places widely spread. What I was curious about was a specific connection in the available data between the terms I used in my prompts and the levels of language. gwern's comment satisfies that concern.

By labeled data I simply mean that children's stories are likely to be identified as such in the data. Children's books are identified as children's books. Otherwise, how is the model to "know" what language is appropriate for children? Without some link between the language and a certain class of people it's just more text. My prompt specifies 5-year olds. How does the model connect that prompt with a specific kind of language?

Of course, but it does need to know what a definition is. There are certainly lots of dictionaries on the web. I'm willing to assume that some of them made it into the training data. And it needs to know that people of different ages use language at different levels of detail and abstraction. I think that requires labeled data, like children's stories labeled as such.

1metachirality
It doesn't and the developers don't label the data. The LLM learns that these categories exist during training because they can and it helps minimize the loss function.

"Everyone" has known about holography since "forever." That's not the point of the article. Yevick's point is that there are two very different kinds of objects in the world and two very different kinds of computing regimes. One regime is well-suited for one kind of object while the other is well-suited for the other kind of object. Early AI tried to solve all problems with one kind of computing. Current AI is trying to solve all problems with a different kind of computing. If Yevick was right, then both approaches are inadequate. She may have been on to something and she may not have been. But as far as I know, no one has followed up on her insight. 

First I should say that I have little interest in the Frankenstein approach to AI, that is, AI as autonomous agents. I'm much more attracted to AI as intelligence augmentation (as advocated by Berkeley's Michael Jordan). For the most part I've been treating ChatGPT as an object of research and so my interactions have been motivated by having it do things that give me clues about how it works, perhaps distant clues, but clues nonetheless. But I do other things with it, and on a few occasions I've gotten into a zone where some very interesting interactive st... (read more)

Thanks, I'll check it out.

I listened off and on to much of the interview, while also playing solitaire (why I do that I do not know, but I do), but I paid close attention at two points during the talk about GPT-4, once following about 46:00 where Altman was talking about using it as a brainstorming partner and later at about 55:00 where Fridman mentioned collaboration and said: "I'm not sure where the magic is if it's in here [gestures to his head] or if it's in there [points toward the table] or if it's somewhere in between." I've been in a kind of magical collaborative zone with ... (read more)

2trevor
Have you read Janus's Cyborgism post? It looks like you'd be pretty interested.

Interesting. #4 looks like a hallucination.

Thanks.

2Radford Neal
These ideas weren't unfamiliar to Hinton.  For example, see the following paper on "Holographic Reduced Representations" by a PhD student of his from 1991: https://www.ijcai.org/Proceedings/91-1/Papers/006.pdf

I thought some more about your comment and decided to try again, this time retaining the medieval setting. Here's what happened. My prompts are in bold-face.

_________

I’m going to tell you a short story from the Middle Ages. After I tell you the story, I’m going to ask you a question. Here’s the story:

It is New Year’s Eve at King Arthur’s court. The knights are gathered at the round table, prepared for a holiday meal. But before the meal begins, tradition dictates that one knight must stand up and tell a tale of daring and adventure. Arthur asks for a volun... (read more)

Well, OK. I know about the chivalric code, etc. For that matter, I've published an article about the poem, though not about the beheading game. I was interested in the exchanges that take place in the 4th part of the poem. But the fact that Gawain was bound by a code of honor which simply didn't exist in the West isn't what interests me. If it interests you, read the O'Neill article I link to in the OP. That's what he discusses and his discussion is a very interesting one.

What interests me is that any reasonable adult who hears that challenge, no matter w... (read more)

Thanks. That is, your prompt directed it to think first and then answer. Mine didn't do that. It seems that it needs to be told. Very interesting.

Though it's a bit beyond me, those folks are doing some interesting work. Here's an informal introduction from Jan. 27, 2023: Bob Coecke, Vincent Wang-Mascianica, Jonathon Liu, Our quest for finding the universality of language.

Memory needs to be developed. The ability to develop memory didn't disappear with the advent of writing, though some of the motivation may have. Still, the ancient Greeks and Romans developed a technique for memorizing long strings of pretty much anything. It's generally known as the method of loci and it continues in use to this day.  Here's the opening of the Wikipedia entry:

The method of loci is a strategy for memory enhancement, which uses visualizations of familiar spatial environments in order to enhance the recall of information. The method of

... (read more)

Thanks for catching the broken link. It's now fixed.

Beyond that, good lord! I know that it's not a good definition of tragedy; I pointed that out in my introductory remarks. This is not about what tragedy is. It's about whether or not ChatGPT can apply a simple definition to simple examples. It did that. 

On the other hand, I suppose I could dock it some points for getting overly chatty, as in its response in Trial Two, but I think that would be asking too much of it. I don't know what OpenAI had in mind during the fine-tuning and RLHFing, but the result is a somewhat pointlessly helpful busybody of a Chatbot. 

Since it got all six correct, it's doing pretty good already.

Interesting, yes. Sure. But keep in mind that what I was up to in that paper is much simpler. I wasn't really interested in organizing my tag list. That's just a long list that I had available to me. I just wanted to see how ChatGPT would deal with the task of coming up with organizing categories. Could it do it at all? If so, would its suggestions be reasonable ones? Further, since I didn't know what it would do, I decided to start first with a shorter list. It was only when I'd determined that it could do the task in a reasonable way with the shorter lis... (read more)

I don't know what these mean: "sort a list of 655 topics into a linear order," "sorting along a single axis." The lists I'm talking about are already in alphabetical order.  The idea is to come up with a set of categories which you can use to organize the list into thematically coherent sublists. It's like you have a library of 1000 books. How are you going to put them on shelves? You could group them alphabetically by title or author's (last) name. Or you could group them by subject matter. In doing this you know what the subjects are and have a sense of ... (read more)

3gwern
You could do it by embedding the text of each post, and then averaging all the embeddings of each tag's posts into a single 'tag embedding', which summarizes the gestalt of all the posts with a given tag. Then you could do my sort trick, or just use a standard clustering algorithm to cluster the tags into 12 clusters, and ask GPT to label each cluster using the list of titles, say. This would address your points about GPT being unable to 'plan' or being misled by idiosyncratic uses of words like 'jasmine'. It would also produce a much more even distribution over the 12 clusters, unless there truly was an extremely skewed distribution (as well as the other advantages I mentioned like not forgetting or skipping any entries or confusing item counts or whatever).
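For concreteness, here is a minimal sketch of that pipeline in Python. It assumes an `embed(text)` function from some sentence-embedding model; `posts_by_tag`, `cluster_tags`, and the other names are hypothetical, invented for illustration rather than taken from gwern's own code.

```python
# Sketch: average post embeddings into tag embeddings, then cluster the tags.
import numpy as np
from sklearn.cluster import KMeans

def tag_embedding(post_texts, embed):
    # Average the embeddings of a tag's posts into a single 'tag embedding'.
    return np.mean([embed(t) for t in post_texts], axis=0)

def cluster_tags(posts_by_tag, embed, k=12):
    tags = list(posts_by_tag)
    X = np.stack([tag_embedding(posts_by_tag[t], embed) for t in tags])
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    clusters = {}
    for tag, label in zip(tags, labels):
        clusters.setdefault(int(label), []).append(tag)
    # Each value is a list of tags; hand each list to GPT and ask for a label.
    return clusters
```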

I don't know quite how to respond to that. Without having read the piece that took me, I don't know, say 30-40 hours to write spread over two or three weeks (including the hour or so I spent with ChatGPT), you're telling me that it couldn't possibly be worth more than a tweet. How do you know that? Have you thought about what the task involves? If you had a list of 50 topics to organize, how would you do it manually? What about 655 topics? How would you do that manually? 

How would you do it using a computer? Sure, given well defined items and clear so... (read more)

4gwern
If I was going to sort a list of 655 topics into a linear order and I didn't have a well-defined hierarchy or pre-existing list to work from, I might use one of two approaches:

1. For manual sorting along a single axis, I can probably not give any sort of cardinal value, but I can do comparisons of the form 'A is more/less X than is B'. Then I can use my resorter utility to lighten the burden of an obvious approach like trying to herd them all in a spreadsheet or text buffer.
2. If I prefer to automate it, I can embed them (presumably they have text descriptions or titles, or even abstracts if they are things like papers or URLs) with a neural net, and then I can 'sort them' by simply picking one to start with, finding the 'nearest' by embedding, and repeating until they are all in a giant list. I call this 'sort by magic' or 'sorting by semantic similarity'. (Note that this embedding approach, while a lot more work up front than simply tossing a list into a ChatGPT text box, has many advantages beyond just producing the list: it avoids any issues with GPT-4 hallucinating, forgetting, being very expensive to call on long lists, lists not fitting in context, the API being down, etc.)

This produces some interesting effects: because such lists have contents that naturally cluster, reading through the sorted list will often reveal 'obvious' clusters as the list transitions from cluster to cluster. There is no a priori way to decide how many clusters there 'actually' are, but I found that roughly k = sqrt(n) worked well to pick out a reasonably evenly populated & meaningful set of clusters. Once you have defined k and they are picked out like that, it's easy to grab a cluster and make it a sublist, for example, and to give it a name. (In fact, I even have a feature where I feed a cluster into GPT-4 as a list, and ask it for a descriptive name.) Or you can start editing it to fix up problems, or you can specify where to start to get
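A rough sketch of the 'sort by semantic similarity' step and the k = sqrt(n) rule of thumb, again in Python and again with invented names. It assumes the 655 items have already been embedded as the rows of a matrix; it is not the actual resorter utility.

```python
# Greedy nearest-neighbor ordering over precomputed embeddings.
import numpy as np

def sort_by_similarity(embeddings, start=0):
    """Pick one item to start with, then repeatedly hop to the closest
    not-yet-placed item (cosine similarity), until all items are ordered."""
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    order = [start]
    remaining = set(range(len(X))) - {start}
    while remaining:
        last = X[order[-1]]
        nearest = max(remaining, key=lambda i: float(last @ X[i]))
        order.append(nearest)
        remaining.remove(nearest)
    return order

# The sqrt(n) rule of thumb for how many clusters to cut the sorted list into.
n = 655
k = round(np.sqrt(n))  # ~26 clusters for 655 topics
```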

Well, when Walter Freeman was working on the olfactory cortex of rodents he was using a surface-mounted 8x8 matrix of electrodes. I assume that's measured in millimeters. In his 1999 paper Consciousness, Intentionality, and Causality (paragraphs 36-43) he proposes a hemisphere-wide global operator (42): 

I propose that the globally coherent activity, which is an order parameter, may be an objective correlate of awareness through preafference, comprising expectation and attention, which are based in prior proprioceptive and exteroceptive feedback of the sensory con

... (read more)

ryan_greenblatt – By mech interp I mean "A subfield of interpretability that uses bottom-up or reverse engineering approaches, generally by corresponding low-level components such as circuits or neurons to components of human-understandable algorithms and then working upward to build an overall understanding."

That makes sense to me, and I think it is essential that we identify those low-level components. But I’ve got problems with the “working upward” part. 

The low-level components of a gothic cathedral, for example, consist of things like stone block... (read more)

I've lost the thread entirely. Where have I ever said or implied that odors are not location specific, or that anything else is not location specific? And how specific are you about location? Are we talking about centimeters (or more), millimeters, individual cortical columns?

What's so obscure about the idea that consciousness is a process that can take place pretty much anywhere? Maybe it's confined to interaction within the cortex and between subcortical areas; I've not given that one much thought. BTW, I take my conception of consciousness from William Powers, who didn't speculate about its location in the brain.

1Ilio
Nothing at all. I’m a big fan of these kinds of ideas and I’d love to present yours to some friends, but I’m afraid they’ll get dismissive if I can’t translate your thoughts into their usual frame of reference. But I get that you didn’t work on this aspect specifically; there are many fields in cognitive science. About how much specificity, it’s up to interpretation. A (1k by 1k by frame by cell type by density) tensor representing the cortical columns within the granular cortices is indeed a promising interpretation, although it’d probably be short of an extrapyramidal tensor (and maybe an agranular one).

"You said: what matters is temporal dynamics"

You mean this: "We're not talking about some specific location or space in the brain; we're talking about a process."

If so, all I meant was a process that can take place pretty much anywhere. Consciousness can pretty much 'float' to wherever it's needed.

Since you asked for more, why not this: Direct Brain-to-Brain Thought Transfer: A High Tech Fantasy that Won't Work.

1Ilio
You mean there’s some key difference in meaning between your original formulation and my reformulation? Care to elaborate and formulate some specific prediction? As an example, I once gave a try at interpreting data from the olfactory system for a friend who was wondering if we could find signs of a chaotic attractor. If you ever toy with the Lorenz model, one key feature is: you either see the attractor by plotting x vs y vs z, or you can see it by plotting one of these variables only, vs itself at t+delta, vs itself at t+2*delta (for many deltas). In other words, that gives a precise feature you can look for (I didn’t find any, and nowadays it seems accepted that odors are location specific, like every other sense). Do you have a better idea, or is it more or less what you’d have tried?
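For what it's worth, a small sketch of that delay-embedding trick, assuming the standard Lorenz parameters and a delta chosen by eye; it reconstructs the attractor from the x variable alone. The integration scheme and parameter values here are illustrative, not anything Ilio specified.

```python
# Delay-embedding sketch: the Lorenz attractor reappears when one variable
# is plotted against delayed copies of itself.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d  # registers the '3d' projection

def lorenz(n_steps=20000, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # Simple Euler integration of the Lorenz system, good enough for a picture.
    xyz = np.empty((n_steps, 3))
    xyz[0] = (1.0, 1.0, 1.0)
    for i in range(n_steps - 1):
        x, y, z = xyz[i]
        xyz[i + 1] = xyz[i] + dt * np.array([sigma * (y - x),
                                             x * (rho - z) - y,
                                             x * y - beta * z])
    return xyz

x = lorenz()[:, 0]   # keep only one variable
delta = 8            # delay in time steps, chosen by eye

ax = plt.figure().add_subplot(projection="3d")
ax.plot(x[:-2 * delta], x[delta:-delta], x[2 * delta:], lw=0.3)
ax.set_xlabel("x(t)"); ax.set_ylabel("x(t+delta)"); ax.set_zlabel("x(t+2*delta)")
plt.show()
```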

Is accessing the visual cartesian theater physically different from accessing the visual cortex? Granted, there's a lot of visual cortex, and different regions seem to have different functions. Is the visual cartesian theater some specific region of visual cortex?

I'm not sure what your question about ordering in sensory areas is about.

As for backprop, that gets the distribution done, but that's only part of the problem. In LLMs, for example, it seems that syntactic information is handled in the first few layers of the model. Given the way texts are structu... (read more)

1Ilio
In my view: yes, no. To put some flesh on the bone, my working hypothesis is: what’s conscious is gamma activity within an isocortex connected to the claustrum (because that’s the information which will get selected for the next conscious frame/can be considered as in working memory). You said: what matters is temporal dynamics. I said: why so many maps if what matters is timing? The closer to the input, the more sensory. The closer to the output, the more motor. The closer to the restrictions, the easier to interpret activity as latent space. Is there any regularity that you find hard to interpret this way? Thanks, I’ll go read. Don’t hesitate to add other links that can help understand your vision.

In a paper I wrote awhile back I cite the late Walter Freeman as arguing that "consciousness arises as discontinuous whole-hemisphere states succeeding one another at a "frame rate" of 6 Hz to 10 Hz" (p. 2). I'm willing to speculate that that's your 'one-shot' refresh rate. BTW, Freeman didn't believe in a Cartesian theater and neither do I; the imagery of the stage 'up there' and the seating area 'back here' is not at all helpful. We're not talking about some specific location or space in the brain; we're talking about a process.

Well, of course, "the dis... (read more)

2Ilio
It’s possible. I don’t think there was relevant human data in Walter Freeman’s time, so I’m willing to speculate that’s indeed the frame rate in mice. But I didn’t check the literature he had access to, so just a wild guess. I agree there’s no seating area. I still find the concept of a cartesian theater useful. For example, it allows knowing where to plant electrodes if you want to access the visual cartesian theater for rehabilitation purposes. I guess you’d agree that can be helpful. 😉 I have friends who believe that, but they can’t explain why the brain needs that much ordering in the sensory areas. What’s your own take? You know the backprop algorithm? That’s a mathematical model for the distributed way. It was recently shown that it produces networks that explain (statistically speaking) most of the properties of the BOLD cortical response in our visual systems. So, whatever the biological cortices actually do, it turns out equivalent for the « distributed memory » aspect. I wonder if that’s too flattering for connectionism, which mostly stalled until the early breakthroughs in computer vision suddenly attracted every lab. BTW

Oh, I didn't mean to imply that using GPUs was sequential, not at all. What I meant was that the connectionist alternative didn't really take off until GPUs were used, making massive parallelism possible. 

Going back to Yevick, in her 1975 paper she often refers to holographic logic as 'one-shot' logic, meaning that the whole identification process takes place in one operation, the illumination of the hologram (i.e. the holographic memory store) by the reference beam. The whole memory 'surface' is searched in one unitary operation.

In an LLM, I'm th... (read more)
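To make 'one-shot' concrete, here is a minimal sketch in the spirit of Plate's Holographic Reduced Representations mentioned above: several role/filler pairs are superposed into a single memory trace, and a single unbinding operation (plus a clean-up comparison) queries the whole trace at once, rather than searching items sequentially. The vector dimensionality, roles, and fillers are invented for illustration.

```python
# One-shot retrieval from a superposed (holographic-style) memory trace.
import numpy as np

rng = np.random.default_rng(0)
d = 1024  # dimensionality of the holographic vectors

def bind(a, b):
    # Circular convolution: the HRR binding operation.
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(trace, cue):
    # Circular correlation: approximate inverse of binding.
    return np.real(np.fft.ifft(np.fft.fft(trace) * np.conj(np.fft.fft(cue))))

def vec():
    # Random vectors with elements ~ N(0, 1/d), so norms are roughly 1.
    return rng.normal(0, 1 / np.sqrt(d), d)

roles   = {name: vec() for name in ["agent", "action", "object"]}
fillers = {name: vec() for name in ["Gawain", "beheads", "GreenKnight"]}

# Superpose all bound role/filler pairs into one trace: the whole "memory".
trace = sum(bind(roles[r], fillers[f]) for r, f in zip(roles, fillers))

# A single unbinding interrogates the entire trace at once; a clean-up
# comparison against the known fillers identifies the noisy result.
noisy = unbind(trace, roles["action"])
best = max(fillers, key=lambda name: np.dot(noisy, fillers[name]))
print(best)  # expected: "beheads"
```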

2Ilio
A few comments before later. 😉 Thanks for the clarification! I guess you already noticed how research centers in cognitive science seem to have a failure mode over a specific value question: Do we seek excellence at the risk of overfitting funding agency criteria, or do we seek fidelity to our interdisciplinary mission at the risk of compromising growth? I certainly agree that, before the GPUs, the connectionist approach had a very small share of the excellence tokens. But it was already instrumental in providing a common conceptual framework beyond cognitivism. As an example, even the first PCs were enough to run toy examples of double dissociation using networks structured by sensory type rather than by cognitive operation. From a neuropsychological point of view, that was already a key result. And for the neuroscientist in me, toy models like Kohonen maps were already key to making sense of why we need so many short inhibitory neurons in grid-like cortical structures. Like a refresh rate? That would fit the evidence for a 3-7 Hz refresh rate of our cartesian theater, or the way LLMs go through prompt/answer cycles. Do you see other potential uses for this concept? What’s wrong with « the distributed way »?
1Ilio
When I hear « conventional, sequential, computational regime », my understanding is « the way everyone was trying before parallel computation revolutionized computer vision ». What’s your definition, so that using GPUs feels sequential?

Miriam Lipshutz Yevick was born in 1924 and died in 2018, so we can't ask her these questions. She fled Europe with her family in 1940 for the same reason many Jews fled Europe and ended up in Hoboken, NJ. Seven years later she got a PhD in math from MIT; she was only the 5th woman to get that degree from MIT. But, as both a woman and a Jew, she had almost no chance of an academic post in 1947. She eventually got an academic gig, but it was at a college oriented toward adult education. Still, she managed to do some remarkable mathematical work.

The two pap... (read more)

1Ilio
Thanks, I didn’t know this perspective on the history of our science. The stories I most heard were indeed more about the HH model, Hebb rule, Kohonen maps, RL, and then connectionism became deep learning... …but neural networks did refute that idea! I feel like I’m missing something here, especially since you then mention GPUs. Was sequential a typo?

I'll get back to you tomorrow. I don't think it's a matter of going back to the old ways. ANNs are marvelous; they're here to stay. The issue is one of integrating some symbolic ideas. It's not at all clear how that's to be done. If you wish, take a look at this blog post: Miriam Yevick on why both symbols and networks are necessary for artificial minds.

2Ilio
Fascinating paper! I wonder how much they would agree that holography means sparse tensors and convolution, or that the intuitive versus reflexive thinking basically amount to visuo-spatial versus phonological loop. Can’t wait to hear which other idea you’d like to import from this line of thought.

LOL! Plus he's clearly lost in a vast system he can't comprehend. How do you comprehend a complex network of billions upon billions of weights? Is there any way you can get on top of the system to observe its operations, to map them out?

I did a little checking. It's complicated. In 2017 Hassabis published an article entitled "Neuroscience-Inspired Artificial Intelligence" in which he attributes the concept of episodic memory to a review article that Endel Tulving published in 2002, "EPISODIC MEMORY: From Mind to Brain." That article has quite a bit to say about the brain. In the 2002 article Tulving dates the concept to an article he published in 1972. That article is entitled "Episodic and Semantic Memory." As far as I know, while there are precedents – everything can be fobbed off on Pl... (read more)

2Ilio
Well that’s a problem, don’t you think? Yes, as a cognitive neuroscientist myself, you’re right that many within my generation tend to dismiss symbolic approaches. We were students during a winter that many of us thought was caused by the over-promising and under-delivering of the symbolic approach, with Minsky as the main reason for the slow start of neural networks. I bet you have a different perspective. What are your three best points for changing the view of my generation?

Scott Alexander has started a discussion of the monosemanticity paper over at Astral Codex Ten. In a response to a comment by Hollis Robbins I offered these remarks:

Though it is true, Hollis, that the more sophisticated neuroscientists have long ago given up any idea of a one-to-one relationship between neurons and percepts and concepts (the so-called "grandmother cell") I think that Scott is right that "polysemanticity at the level of words and polysemanticity at the level of neurons are two totally different concepts/ideas."  I think the idea of dis... (read more)

Yeah, he's talking about neuroscience. I get that. But "episodic memory" is a term of art and the idea behind it didn't come from neuroscience. It's quite possible that he just doesn't know the intellectual history and is taking "episodic memory" as a term that's in general use, which it is. But he's also making claims about intellectual history. 

Because he's using that term in that context, I don't know just what claim he's making. Is he also (implicitly) claiming that neuroscience is the source of the idea? If he thinks that, then he's wrong. If he'... (read more)

1Ilio
Your point is « Good AIs should have a working memory, a concept that comes from psychology ». DH's point is « Good AIs should have a working memory, and the way to implement it was based on concepts taken from neuroscience ». Those are indeed orthogonal notions, if you will.

My confidence in this project has just gone up. It seems that I now have a collaborator. That is, he's familiar with my work in general and my investigations of ChatGPT in particular, we've had some email correspondence, and a couple of Zoom conversations. During today's conversation we decided to collaborate on a paper on the theme of 'demystifying LLMs.' 

A word of caution. We haven't written the paper yet, so who knows? But all the signs are good. He's an expert on computer vision systems on the faculty of Goethe University in Frankfurt: Visvanathan... (read more)

Yes. It's more about the structure of language and cognition than about the mechanics of the models. The number of parameters and layers and functions assigned to layers shouldn't change things, nor going multi-modal, either. Whatever the mechanics of the models, they have to deal with language as it is, and that's not changing in any appreciable way.

At the beginning of the year I thought a decent model of how LLMs work was 10 years or so out. I’m now thinking it may be five years or less. What do I mean? 

In the days of classical symbolic AI, researchers would use a programming language, often some variety of LISP, but not always, to implement a model of some set of linguistic structures and processes, such as those involved in story understanding and generation, or question answering. I see a similar division of conceptual labor in figuring out what’s going on inside LLMs. In this analogy I see m... (read more)

1Bill Benzon
My confidence in this project has just gone up. It seems that I now have a collaborator. That is, he's familiar with my work in general and my investigations of ChatGPT in particular, we've had some email correspondence, and a couple of Zoom conversations. During today's conversation we decided to collaborate on a paper on the theme of 'demystifying LLMs.'  A word of caution. We haven't written the paper yet, so who knows? But all the signs are good. He's an expert on computer vision systems on the faculty of Goethe University in Frankfurt: Visvanathan Ramesh.  These are my most important papers on ChatGPT:
* ChatGPT tells stories, and a note about reverse engineering: A Working Paper
* Discursive Competence in ChatGPT, Part 2: Memory for Texts
* ChatGPT tells 20 versions of its prototypical story, with a short note on method
* ChatGPT's Ontological Landscape: A Working Paper
-2rotatingpaguro
To clarify: do you think that in about 5 years we will be able to do such a thing to then-state-of-the-art big models?

#14: If there have indeed been secret capability gains, so that Altman was not joking about reaching AGI internally (it seems likely that he was joking, though given the stakes, it's probably not the sort of thing to joke about), then the way I read their documents, the board should make that determination:

Fifth, the board determines when we've attained AGI. Again, by AGI we mean a highly autonomous system that outperforms humans at most economically valuable work. Such a system is excluded from IP licenses and other commercial terms with Microsoft, which

... (read more)

Honestly this does seem... possible. A disagreement on whether GPT-5 counts as AGI would have this effect. The most safety-minded would go "ok, this is AGI, we can't give it to Microsoft". The more business-oriented and less conservative would go "no, this isn't AGI yet, it'll make us a fuckton of money though". There would be conflict. But then, for example, seeing how everyone might now switch to Microsoft and simply rebuild the thing from scratch there, Ilya despairs and decides to do a 180, because at least this way he gets to supervise the work somehow.

But in assertions such as "beagles are dogs" and "eagles are birds" etc. we're moving UP from specific to general, not down.

2Anna Krusenstern
Sorry, I meant that moving from specific to general corresponds to moving from a state characterized by less entropy to a state with higher entropy. 

And asserting that you saw something is different from asserting what something is. You can do the latter without ever having seen that something yourself, but you know about it because you read it in a book or someone told you about it. So it's not semantically equivalent. As you say, it works only as a clause, not as a free-standing sentence.
