I suppose it's possible the longer response time is just a red herring. Any thoughts on the actual response (and the process by which it arrived at it)?
Edit, for clarity: I mean, how would it arrive at a grammatically and semantically correct response if it were only progressing one word at a time, rather than having computed the entire answer in advance and then merely emitting that answer one word at a time?
For further clarity: I gave it no guidance tokens, so the only content it had to go on was the sentence it generated on its own. Is the postulate then that its own sentence sent it somewhere in latent space, and from there it decided to start at "When", then checked to see if it could append the given end-of-sentence text to create an answer? With the answer being "no", for the next token it then pulled "faced" from that same latent space and checked again to see if it could append the sentence remainder? Same for "with", "challenges", "remember", "to", "keep", "a", "positive", and then, after emitting "attitude", on the next token it decided it could proceed to the given sentence-end text? It seems to me the alternative is that it has to be "looking ahead" more than one token at a time in order to arrive at a correct answer.
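To make that postulate concrete, here is a toy Python sketch of the "emit a token, then check whether the ending can be appended" loop. It is only an illustration of the hypothesis, not a claim about how ChatGPT works internally. The names `generate_then_append`, `toy_next_token`, `toy_can_append`, and `CANNED` are all made up for this example, and the two toy functions stand in for whatever a real model would compute.

```python
from typing import Callable, List

# Required ending, taken from the experiment in this thread.
ENDING = "focus on the good times and develop from the negatives."


def generate_then_append(
    next_token: Callable[[List[str]], str],
    can_append: Callable[[List[str]], bool],
    max_tokens: int = 20,
) -> str:
    """Emit one token at a time; before each new token, check whether
    the required ending could already be attached to what we have."""
    prefix: List[str] = []
    for _ in range(max_tokens):
        if prefix and can_append(prefix):
            break  # the ending fits, stop generating new tokens
        prefix.append(next_token(prefix))
    return " ".join(prefix + [ENDING])


# Hypothetical stand-ins (assumptions, not a real model):
CANNED = ["When", "faced", "with", "challenges,"]


def toy_next_token(prefix: List[str]) -> str:
    # A real system would call a language model here.
    return CANNED[len(prefix)]


def toy_can_append(prefix: List[str]) -> bool:
    # Crude proxy: pretend a clause boundary (trailing comma) is enough.
    return prefix[-1].endswith(",")


sentence = generate_then_append(toy_next_token, toy_can_append)
print(sentence)
```

Note that all of the hard work is hidden inside `can_append`: deciding whether the ending fits grammatically and semantically is exactly the part I can't see how a purely word-by-word process would do.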
I think there's more going on than successive-word-prediction. Here's my experiment:
To save having to pull up the image:
me> tell me the end of some sentence
chatgpt> Sure, here's one: "Life is like a camera, focus on the good times and develop from the negatives."
me> tell me just the end
chatgpt> "focus on the good times and develop from the negatives"
me> ok make a new sentence that ends that way
chatgpt> Here's a new sentence: "When faced with challenges, focus on the good times, and develop from the negatives."
It took about 20 seconds of blinking cursor before it gave the final response, and the earlier questions in that session were answered in the usual 1 or 2 seconds, so I don't think it was load related. I can't tell whether this is evidence that it just brute-forced tried enough possibilities to come up with the answer. Is that even compatible with next-word prediction? Or is it evidence of enough forward-looking answer construction that it would effectively be unable to answer correctly word by word without "knowing in advance" what the entire response was going to be?
That's basically what I was alluding to by "brute-forced tried enough possibilities to come up with the answer." Even if that were the case, the implication is that it is actually constructing a complete multi-token answer in order to "test" that answer against the grammatical and semantic requirements. If it truly were re-computing the "correct" next token on each successive iteration, I don't see how it could seamlessly merge its individually-generated tokens with the given sentence-end text.
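For what it's worth, the brute-force version I have in mind looks roughly like the Python sketch below: build complete candidate sentences, then test each one against the requirement. This is a sketch under stated assumptions, not a description of the real system. The function `first_passing_candidate` and the `CANDIDATES` list are hypothetical, and the "test" here only checks the literal ending, which is a much weaker stand-in for the grammatical and semantic check.

```python
from typing import Iterable, Optional

# Required ending, taken from the experiment in this thread.
ENDING = "focus on the good times and develop from the negatives."


def first_passing_candidate(
    candidates: Iterable[str], ending: str = ENDING
) -> Optional[str]:
    """Return the first complete candidate that ends with `ending`
    and has some non-empty text in front of it, else None."""
    for candidate in candidates:
        if not candidate.endswith(ending):
            continue
        prefix = candidate[: -len(ending)].strip()
        if prefix:  # reject a candidate that is only the ending itself
            return candidate
    return None


# Hypothetical complete answers; a real system would have to generate these.
CANDIDATES = [
    "When faced with challenges, remember to keep a positive attitude.",
    "Develop from the negatives and focus on the good times.",
    "When faced with challenges, " + ENDING,
]

result = first_passing_candidate(CANDIDATES)
print(result)
```

The point is that the test can only run on a whole multi-token answer, which is what makes me doubt a strictly one-token-at-a-time explanation.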