I suppose it's certainly possible the longer response time is just a red herring. Any thoughts on the actual response (and the process it used to arrive at it)?
Edit: for clarity, I mean how would it arrive at a grammatically and semantically correct response if it were only generating one word at a time, rather than computing the entire answer in advance and then merely emitting that answer one word at a time?
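To make the question concrete, here is a minimal sketch of what "one word at a time" means in autoregressive generation. It uses a hypothetical toy lookup table (`TOY_SCORES`) in place of a real language model, and greedy selection (always take the highest-scoring next word). No full answer exists in advance: each word is picked from scores computed on the prefix generated so far. A real model conditions on the whole context (prompt plus everything already generated), not just the previous word as this toy does.

```python
from typing import Dict, List

# Hypothetical toy "model": scores for candidate next words, keyed by the previous word.
# A real LLM conditions on the entire context; using only the last word is a simplification.
TOY_SCORES: Dict[str, Dict[str, float]] = {
    "<start>": {"life": 0.9, "when": 0.1},
    "life": {"is": 1.0},
    "is": {"like": 0.8, "good": 0.2},
    "like": {"a": 1.0},
    "a": {"camera": 0.7, "box": 0.3},
    "camera": {"<end>": 1.0},
}


def next_word(context: List[str]) -> str:
    """Greedy step: return the highest-scoring next word given the current context."""
    scores = TOY_SCORES.get(context[-1], {"<end>": 1.0})
    return max(scores, key=scores.get)


def generate(max_words: int = 10) -> List[str]:
    """Build the output one word at a time; nothing beyond the next word is planned."""
    context = ["<start>"]
    for _ in range(max_words):
        word = next_word(context)
        if word == "<end>":
            break
        context.append(word)  # the new word becomes part of the context for the next step
    return context[1:]


print(" ".join(generate()))  # life is like a camera
```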
For further clarity: I gave it no guidance tokens, so the only content it had to go off is the sentence it generated on its ow...
I think there's more going on than successive-word prediction. Here's my experiment:
To save having to pull up the image:
me> tell me the end of some sentence
chatgpt> Sure, here's one: "Life is like a camera, focus on the good times and develop from the negatives."
me> tell me just the end
chatgpt> "focus on the good times and develop from the negatives"
me> ok make a new sentence that ends that way
chatgpt> Here's a new sentence: "When faced with challenges, focus on the good times, and develop from the negatives."
It t...
That's basically what I was alluding to by "brute-force tried enough possibilities to come up with the answer." Even if that were the case, the implication is that it is actually constructing a complete multi-token answer in order to "test" that answer against the grammatical and semantic requirements. If it truly were re-computing the "correct" next token on each successive iteration, I don't see how it could seamlessly merge its individually generated tokens with the given sentence-end text.
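For comparison, one known decoding technique that does score complete multi-word candidates before committing is beam search. The sketch below contrasts it with greedy one-word-at-a-time decoding on a hypothetical toy probability table (`TOY_PROBS`). Nothing in this thread establishes which decoding strategy ChatGPT actually uses; this only illustrates the difference between committing word by word and comparing whole candidate sequences.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy next-word probabilities, keyed by the previous word.
TOY_PROBS: Dict[str, Dict[str, float]] = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"lion": 1.0},
    "cat": {"<end>": 1.0},
    "dog": {"<end>": 1.0},
    "lion": {"<end>": 1.0},
}


def greedy(max_len: int = 5) -> List[str]:
    """Commit to the single most likely next word at every step."""
    seq = ["<start>"]
    for _ in range(max_len):
        probs = TOY_PROBS.get(seq[-1], {"<end>": 1.0})
        word = max(probs, key=probs.get)
        if word == "<end>":
            break
        seq.append(word)
    return seq[1:]


def beam_search(beam_width: int = 2, max_len: int = 5) -> List[str]:
    """Keep the `beam_width` best partial sequences (by total log-probability) at each step,
    then return the best sequence that reached <end>."""
    beams: List[Tuple[float, List[str]]] = [(0.0, ["<start>"])]
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        candidates: List[Tuple[float, List[str]]] = []
        for score, seq in beams:
            for word, p in TOY_PROBS.get(seq[-1], {"<end>": 1.0}).items():
                extended = (score + math.log(p), seq + [word])
                (finished if word == "<end>" else candidates).append(extended)
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    pool = finished or beams  # fall back to unfinished beams if nothing reached <end>
    best = max(pool, key=lambda c: c[0])[1]
    return [w for w in best if w not in ("<start>", "<end>")]


print(greedy())       # ['the', 'cat']
print(beam_search())  # ['a', 'lion']
```

Greedy decoding commits to "the" because it is the likeliest first word and ends with "the cat" (probability 0.6 × 0.5 = 0.3), while beam search with width 2 keeps "a" alive long enough to find "a lion" (probability 0.4).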