You can call it a "gut claim" if that makes you feel better. But the actual reason is that I did some very simple math (about the attention window size required, given the quadratic scaling of transformers) and concluded that, practically speaking, it was impossible.
If you're talking about this:
...Now imagine trying to implement a serious backtracking algorithm. Stockfish checks millions of positions per turn of play. The attention window for your "backtracking transformer" is going to have to be at least {size of chess board state} * {number of positions evaluated}.
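The back-of-envelope version of that math, in Python. Every number here is an assumed round figure for illustration (roughly 80 tokens for a FEN-style board encoding, ~10^6 positions per move for a Stockfish-like search), not a measurement:

```python
# Rough arithmetic behind the "window size" claim.
tokens_per_position = 80          # assumed: ~one FEN-style board encoding
positions_per_move = 1_000_000    # assumed: order of a serious engine search
window = tokens_per_position * positions_per_move
print(f"required context: ~{window:.0e} tokens")            # ~8e+07
print(f"quadratic attention cost: ~{window**2:.0e} pairs")  # ~6e+15
```

On those assumptions, the attention matrix alone is on the order of 10^15 entries per move, which is the "practically impossible" part.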
Have you never figured something out by yourself? The way I learned to do Sudoku was: I was given a book of Sudoku puzzles and told "have fun".
So, few-shot + scratchpad?
I didn't say it was impossible to train an LLM to play chess. I said it was impossible for an LLM to teach itself to play a game of similar difficulty to chess if that game is not in its training data.
More gut claims.
What they do not do is teach themselves things that aren't in their training data via trial and error, which is the primary way humans learn things.
Setting up...
Sure. 4000 words (~8000 tokens) to play a 9-square, 9-turn game, with the entire strategy written out by a human.
Ok? That's how you teach anybody anything.
Now extrapolate that to chess, go, or any serious game.
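Taking the numbers above at face value, the extrapolation looks like this (a crude sketch: it assumes token cost scales linearly with game length, when strategy density obviously grows too):

```python
# Extrapolating the ~8000-token tic-tac-toe walkthrough to longer games.
tokens_per_turn = 8000 / 9            # from the tic-tac-toe transcript above
for game, turns in [("chess", 80), ("go", 250)]:
    print(f"{game}: ~{tokens_per_turn * turns:,.0f} tokens")
# chess: ~71,111 tokens;  go: ~222,222 tokens -- and that is with strategy
# density held constant, which understates the real cost.
```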
LLMs can play chess and poker just fine. gpt-3.5-turbo-instruct plays at about 1800 Elo, consistently making legal moves. - https://github.com/adamkarvonen/chess_gpt_eval
Then there is this grandmaster-level chess transformer - https://arxiv.org/abs/2402.04494
Poker - https://arxiv.org/abs/2308.12466
...And this doesn't address my actual point at all.
GPT-4 can play tic-tac-toe
https://chat.openai.com/share/75758e5e-d228-420f-9138-7bff47f2e12d
Not sure what you mean by 100 percent accuracy, and of course you probably already know this, but GPT-3.5 Turbo Instruct plays chess at about 1800 Elo, fulfilling your constraints (with about 5 illegal moves, possibly fewer, in 8205 games): https://github.com/adamkarvonen/chess_gpt_eval
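For reference, the kind of harness the linked repo implements can be sketched in a few lines with python-chess. This is my own minimal sketch, not the repo's actual code; `query_model` is a hypothetical stand-in for whatever LLM API call you use:

```python
import chess  # pip install python-chess


def query_model(moves_so_far: str) -> str:
    """Hypothetical placeholder for an LLM call returning the next move in SAN."""
    raise NotImplementedError


def play_one_game() -> tuple[int, int]:
    """Returns (full moves reached, illegal attempts) -- the 5-in-8205-style stat."""
    board, moves, illegal = chess.Board(), "", 0
    while not board.is_game_over():
        move = query_model(moves).strip()
        try:
            board.push_san(move)   # raises ValueError on illegal/unparseable moves
        except ValueError:
            illegal += 1
            break
        moves += f" {move}"
    return board.fullmove_number, illegal
```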
They can compute a state prior to each generated token, and they can choose a token that signals preservation of this state.
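A toy illustration of what that means — this is not a transformer, just the control flow: the state is re-derived from the visible prefix before every token rather than carried in hidden memory, and the chosen token keeps that state consistent:

```python
def recompute_state(prefix: str) -> int:
    # Toy "state": whose turn it is, derived purely from the tokens emitted so far.
    return prefix.count("X") - prefix.count("O")

def next_token(prefix: str) -> str:
    state = recompute_state(prefix)   # state computed prior to each token
    return "O" if state > 0 else "X"  # token choice preserves legal alternation

prefix = ""
for _ in range(6):
    prefix += next_token(prefix)
print(prefix)  # XOXOXO -- the alternation is maintained with no hidden memory
```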
They had access to and tested the base un-RLHF'd model. It doesn't change much: the RLHF'd model has slightly higher misalignment and deception rates (which is a bit notable), but otherwise similar behavior.
Optimal tic-tac-toe requires explaining the game in excruciating detail. https://chat.openai.com/share/75758e5e-d228-420f-9138-7bff47f2e12d
Optimal play requires explaining the game in detail. See here:
https://chat.openai.com/share/75758e5e-d228-420f-9138-7bff47f2e12d
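For scale, it's worth contrasting that prose explanation with how compact the optimal strategy itself is: a standard minimax for tic-tac-toe fits in ~30 lines. My own sketch, for comparison only:

```python
# Complete optimal tic-tac-toe strategy via standard minimax.
WIN_LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in WIN_LINES:
        if b[i] and b[i] == b[j] == b[k]:
            return b[i]
    return None

def minimax(b, player):
    """Returns (score, best move): +1 if X wins, -1 if O wins, 0 for a draw."""
    w = winner(b)
    if w:
        return (1 if w == "X" else -1), None
    moves = [i for i, c in enumerate(b) if not c]
    if not moves:
        return 0, None  # draw
    best = None
    for m in moves:
        b[m] = player
        score, _ = minimax(b, "O" if player == "X" else "X")
        b[m] = None
        if best is None or (player == "X") == (score > best[0]):
            best = (score, m)
    return best

print(minimax([None] * 9, "X"))  # (0, 0): perfect play from both sides draws
```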
I don't understand your position. Are you saying that if we generated protein sequences by picking letters independently and uniformly at random from "ILVFMCAGPTSYWQNHEDKR", and then trained an LLM to predict those uniform random strings, it would end up with internal structure representing how biology works? Because that's obviously wrong to me, and I don't see why you'd believe it.
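(For concreteness, the "null" corpus described here is one line to generate — sequence length and corpus size are arbitrary assumptions:)

```python
import random

AMINO = "ILVFMCAGPTSYWQNHEDKR"  # the 20 standard amino-acid letters
corpus = ["".join(random.choices(AMINO, k=200)) for _ in range(100_000)]
# A model trained on `corpus` has nothing to learn beyond "every letter is
# equally likely" -- that is the whole thought experiment.
```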
Ah no. I misunderstood you here. You're right.
What I was trying to get at is the notion that something in particular (humans, evolution, etc.) has to have "figured s...
>They find functions that fit the results. Most such functions are simple and therefore generalize well. But that doesn't mean they generalize arbitrarily well.
You have no idea how simple the functions they are learning are.
>Not really any different from the human language LLM, it's just trained on stuff evolution has figured out rather than stuff humans have figured out. This wouldn't work if you used random protein sequences instead of evolved ones.
It would work just fine. The model would predict random arbitrary sequences and the structure w...
Large language models gain their capabilities from self-supervised learning on humans performing activities, or from reinforcement learning from human feedback about how to achieve things, or from internalizing human-approved knowledge into their motivation. In all of these cases, you rely on humans figuring out how to do stuff in order to make the AI able to do stuff, so it is of course logical that this would tightly integrate capabilities and alignment in the way Simplicia says.
No. Language Models aren't relying on humans figuring anything out. How ...
Not really. The majority of your experiences and interactions are forgotten and discarded, the few that aren't are recalled and triggered by the right input when necessary and not just sitting there in your awareness at all times. Those memories are also modified at every recall.
And that's really beside the point. However you want to spin it, evaluating that many positions is not necessary for backtracking or for playing chess. If that's the basis of your "impossible" rhetoric, then it's a poor one.