All of Jonathan Marcus's Comments + Replies

Which to me seems like a huge improvement to its capabilities

This was actually my position when I started writing this post. My instincts told me that "thinking out loud" was a big enhancement to its capabilities. But then I started thinking about what I saw. I watched it spend tens of trillions of FLOPs to write out, in English, how to do a 3x3 matrix multiplication. It was so colossally inefficient, like building a humanoid robot and teaching it to use an abacus.
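For scale: done directly, a 3x3 matrix multiply is 9 dot products of length 3, i.e. 27 multiplications and 18 additions, about 45 FLOPs. A minimal sketch:

```python
# Direct 3x3 matrix multiplication: 9 output entries, each needing
# 3 multiplications and 2 additions -- 45 FLOPs in total.
def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(matmul3(A, B))  # [[30, 24, 18], [84, 69, 54], [138, 114, 90]]
```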

Then again, your analogy to humans is valid. We do a huge amount of processing internally, an…

2 Kaj_Sotala
There's also the case where it's allowed to call other services that are more optimized for the specific use case in question, such as querying Wolfram Alpha: [screenshot not reproduced]
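Schematically, the dispatch pattern looks like this (a hedged sketch, not OpenAI's actual plugin API; `query_wolfram_alpha` is a hypothetical stub):

```python
# Hedged sketch of the tool-calling pattern: the model emits a structured
# request, and the host routes it to a service optimized for the job.
# `query_wolfram_alpha` is a hypothetical stand-in, not a real client.
def query_wolfram_alpha(expression: str) -> str:
    raise NotImplementedError("stand-in for a real Wolfram Alpha client")

def handle_model_output(output: dict) -> str:
    if output.get("tool") == "wolfram_alpha":
        return query_wolfram_alpha(output["query"])  # delegate the math
    return output["text"]  # otherwise, return the model's own text

print(handle_model_output({"text": "2+2 is 4 (no tool needed)"}))
```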

Okay, you raise a very good point. To analogize to my own brain: it's like noticing that I can multiply integers 1-20 in my head in one step, but for larger numbers I need to write it out. Does that mean that my neural net can do multiplication? Well, as you say, it depends on n.
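To make "it depends on n" concrete: the written-out procedure is just grade-school long multiplication, and the number of single-digit steps grows with the digit count. A toy sketch:

```python
# Grade-school long multiplication: decompose into single-digit products.
# The step count grows with the number of digits, so "one step in my head"
# only works up to some n; past that, the work has to be written out.
def long_multiply(x, y):
    total, steps = 0, 0
    for i, xd in enumerate(reversed(str(x))):
        for j, yd in enumerate(reversed(str(y))):
            total += int(xd) * int(yd) * 10 ** (i + j)
            steps += 1
    return total, steps

print(long_multiply(17, 19))      # (323, 4)      -- 4 single-digit products
print(long_multiply(1234, 5678))  # (7006652, 16) -- 16 single-digit products
```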

it's easy to imagine a huge LLM capable of doing 500 iterations of SHA1 of small strings in one shot

Nitpick: for SHA1 (and any other cryptographic hash function) I can't fathom how an LLM could learn it through SGD, as opposed to being hand-coded. To do SHA1 correctly you need to impl…
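For contrast: expressed as ordinary code rather than learned weights, 500 iterations of SHA1 is a short loop around a hand-coded primitive (here Python's hashlib):

```python
import hashlib

# 500 iterations of SHA-1, each round hashing the previous digest.
# Trivial to hand-code; the question is whether SGD could ever learn
# the exact bitwise round function these calls implement.
data = b"small string"
for _ in range(500):
    data = hashlib.sha1(data).digest()
print(data.hex())
```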

I think your definition of LLM is the common one. For example, https://www.lesswrong.com/posts/KJRBb43nDxk6mwLcR/ai-doom-from-an-llm-plateau-ist-perspective is on the front page right now, and it uses LLM to refer to a big neural net with a transformer topology, trained on a lot of data. This is how I was intending to use it as well. Note the difference between "language model" as Christopher King used it and "large language model" as I am using it. I plan to keep using LLM for now, especially as GPT refers to OpenAI's product and not the general class of things.

Thank you for trying so hard to replicate this little experiment of mine.

Perhaps you sent the prompt in the middle of a conversation rather than at the beginning? If the same list was also sent earlier in the conversation, I can imagine it managed to get the answer right because it had more time to 'take in' the numbers, or otherwise establish a context that guided it to the right answer.

Yes, this is exactly what I did. I made sure to use a new list of numbers for each question – I'd noticed that it would remember previous answers if I didn't – but I didn'…

This makes sense. Another thought I had was that I picked 50 numbers from 1-100, so it could've just gotten lucky. If I were to do this again, I'd use 1-1000 to decrease the odds of this.
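Both answers that came up in this thread ([64, 91, 39, 47] and [64, 6, 96, 75]) sum to 241, so I'll assume the task was of the form "find four numbers in the list that sum to a target"; that's a reconstruction, not the original prompt. Under that assumption, it's easy to estimate how likely a blind guess is to be right:

```python
import itertools
import random

# Assumed task (reconstructed, not the original prompt): find four numbers
# in a 50-number list that sum to a target. Count the fraction of 4-subsets
# satisfying the constraint -- i.e., the odds of a lucky blind guess.
random.seed(0)
nums = random.sample(range(1, 101), 50)  # 50 distinct numbers from 1-100
target = 241

combos = list(itertools.combinations(nums, 4))
hits = sum(1 for c in combos if sum(c) == target)
print(f"{hits} of {len(combos)} 4-subsets hit {target} "
      f"({hits / len(combos):.2%})")
```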

Oh, this is funny. It told me that it ran the code and got the answer [64, 91, 39, 47]. I checked that these satisfied the problem, but I didn't check (until reviewing other comments) whether that's actually what the code outputted. It's not. Technically, the code doesn't output anything; it saves the result to a variable instead. And if I print that variable, it found [64, 6, 96, 75].

Lesson 1: I was not careful enough in checking its output, even when I thought I was being careful.

Lesson 2: It is indeed not running code, even if it tells me it is.
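For completeness, here is a hedged reconstruction of code with that failure mode (the variable names and the list are stand-ins; I don't have the original). The generated code stopped at the assignment and never printed:

```python
from itertools import combinations

# Hypothetical reconstruction of the generated code. The real list had 50
# numbers; this stand-in just contains both quadruples from the thread.
numbers = [64, 6, 96, 75, 91, 39, 47]
target = 241  # both quadruples in the thread sum to 241

result = next(c for c in combinations(numbers, 4) if sum(c) == target)
print(result)  # (64, 6, 96, 75) -- the print the original code was missing
```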

5 Kshitij Sachan
I am confused: how did it get the answer correct without running code?

Thanks for the deep and thoughtful comment. I hadn't considered that "In general, you can always get a bigger and bigger constant factor to solve problems with higher n."

I'm going to think carefully and then see how this changes my thinking. I'll try to reply again soon.

0 Rudi C
This doesn’t matter much, as the constant factor needed still grows as fast as the asymptotic bound. GPT does not have a big enough constant factor. (This objection has always been true of asymptotic bounds.)
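To spell the point out (my gloss, not Rudi C's wording): if one forward pass performs at most $C$ serial steps, and the problem family needs $\Omega(f(n))$ serial steps, then handling every input up to size $n$ forces

$$C \;\ge\; k \cdot f(n) \quad\Longrightarrow\quad C = \Omega(f(n)),$$

so the "constant" has to grow at the same rate as the bound it was supposed to absorb.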

Thanks, this is exactly the kind of feedback I was hoping for. 

Nomenclature-wise: I was using LLM to mean "deep neural nets in the style of GPT-3" but I should be more precise. Do you know of a good term for what I meant?

More generally, I should learn about other styles of LLM. I've gotten some good leads from these comments and some DMs.

3 Christopher King
Hmm, how about GPT (generative pre-trained transformer)?

Interesting! Yes, I am using ChatGPT with GPT-4. It printed out the code, then *told me that it ran it*, then printed out a correct answer. I didn't think to fact-check it; instead I assumed that OpenAI had been adding some impressive/scary new features.

8 Jonathan Marcus
Oh, this is funny. It told me that it ran the code and got the answer [64, 91, 39, 47]. I checked that these satisfied the problem, but I didn't check (until reviewing other comments) whether that's actually what the code outputted. It's not. Technically, the code doesn't output anything; it saves the result to a variable instead. And if I print that variable, it found [64, 6, 96, 75].

Lesson 1: I was not careful enough in checking its output, even when I thought I was being careful.

Lesson 2: It is indeed not running code, even if it tells me it is.