Apparently, with the reflection technique (answer-critique-improve), GPT-4 is capable of giving much better answers. But that implies it should be capable of essentially AlphaGo Zero-style learning! It can't do complete self-play from zero, as there is no ground truth for it to learn from, but this basically burns all hopes of further progress being bottlenecked by data. Also, while still severely lacking, it constitutes a limited self-improvement capability. Not only is it unclear whether GPT-4 is an AGI or not, it also has some slight self-improvement capability! We really are the boiling frog, aren't we?
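For concreteness, the reflection loop I mean looks roughly like this (a minimal sketch; `generate` is a hypothetical stand-in for a single model call, nothing here is a real API):

```python
# Rough sketch of the answer-critique-improve ("reflection") loop.
# `generate` is a hypothetical placeholder for one call to the model
# (e.g. a chat-completion endpoint); plug in a real call to use it.

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in an actual model call here")

def reflect(question: str, rounds: int = 1) -> str:
    answer = generate(question)
    for _ in range(rounds):
        # Critique: ask the model to find flaws in its own answer.
        critique = generate(
            f"Question: {question}\nAnswer: {answer}\n"
            "List the concrete errors or weaknesses in this answer."
        )
        # Improve: ask it to rewrite the answer using that critique.
        answer = generate(
            f"Question: {question}\nAnswer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer, fixing the issues above."
        )
    return answer
```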
I am not an AI researcher. (I have a feeling you may have mistaken me for one?)
I don't expect much gain after the first iteration of reflection (I don't know if that was attempted). When calling it recursive, I was referring to the AlphaGo Zero style of distillation and amplification: we have a model producing Q->A; reflect on A to get A'; and update the model in the direction Q->A'. We have reached the state where A' is better than A; if this had been tried before, the result would probably have been distilled stupidity instead of distilled intelligence.
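A sketch of one such iteration, under the assumptions above (`reflect` is the answer-critique-improve loop sketched earlier, and `fine_tune` is a hypothetical placeholder for whatever supervised update the training stack provides; neither is a real API):

```python
from typing import List, Tuple

# Hypothetical placeholders: `reflect` is the answer-critique-improve
# loop sketched earlier; `fine_tune` is whatever supervised-update
# routine the training stack provides.

def reflect(question: str) -> str:
    raise NotImplementedError("the answer-critique-improve loop from above")

def fine_tune(model, dataset: List[Tuple[str, str]]):
    raise NotImplementedError("plug in an actual training step here")

def amplify_and_distill(model, questions: List[str]):
    # Amplification: run each Q through reflection to get the improved A'.
    dataset = [(q, reflect(q)) for q in questions]
    # Distillation: supervised update pushing the model toward Q -> A'.
    return fine_tune(model, dataset)
```

The point of the loop is exactly the ordering constraint in the comment: it only distills intelligence if A' is reliably better than A.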
Such a process is, in my opinion, a significant step in the direction of "capable of building a new, better AI", worth explicitly noticing before we take such a capability for granted.
Only if you apply the "distillation and amplification" part, and I hope that if you push too hard in the absence of some kind of reality anchoring, it goes off the rails and results in distilled weirdness. And hopefully you need a bigger model anyway.