GPT-4 is known to pretty good at chess (see I played chess against ChatGPT-4 and lost! for one example). However, GPT-4 does not seem to be very good at strategic reasoning in general (it only really can do it if there is a greedy search algorithm).
I tried Hex and Connect4, it failed at both despite being able to explain the rules and even display the board with ASCII art. I was wondering if maybe it just has bad spatial reasoning, so I tried puzzles in natural language based on logical constraints. It failed these as well unless they were quite simple.
I even made a variant of chess up on the spot where the goal is to get any piece to the bank rank instead of capturing the King. It didn't stop me from "sacking" my queen by moving it to the bank rank as soon as their was a gap. So if it has an internal model of chess, it didn't figure out how to apply it to new objectives.
So I think GPT-4 must've learned a rudimentary chess engine; it is not applying general strategic reasoning to chess.
This doesn't necessarily mean GPT-4 can't be agentic, but it does suggest it is either a narrow one or a dumb one (or it's hiding its abilities).
It's hard to apply general strategic reasoning to anything in a single forward pass, isn't it? If your LLM has to come up with an answer that begins with the next token, you'd better hope the next token is right. IIRC this is the popular explanation for why LLM output seems to be so much better when you just add something like "Let's think step by step" to the prompt.
Is anyone trying to incorporate this effect into LLM training yet? Add an "I'm thinking" and an "I'm done thinking" to the output token set, and only have the main "predict the next token in a way that matches the training data" loss function grade on tokens that aren't in between those brackets. Then when you hit "What is 45235 + 259719? 304954" in the training set, optimization doesn't have to discourage multi-step reasoning to reproduce that, because "<thinking>5+9=14, so we carry the 1" ... "</thinking>304954" is still worth just as much as an ex nihilo "304954". Chess algorithms could do a brief tree search before outputting their final decision.
Add whatever regularization is needed to keep the "train of thought" in English rather than in whatever-cypher-the-optimizer-hits-on, and this would be an increase in safety, not just in capabilities. The more "internal reasoning" is human-readable text rather than maybe-a-posteriori-interpretable activation patterns, the better. You could even expose it to the end user: ask ChatGPT a question and you get a succinct answer, click on the "expose thoughts" button and you get the chain of support for the answer.
This is also basically an idea I had - I actually made a system design and started coding it, but haven't made much progress due to lack of motivation... Seems like it should work, though