
My guess is that he’s referring to the fact that Blackwell offers much larger world sizes than Hopper, which makes LLM training/inference more efficient. Semianalysis has argued something similar here: https://semianalysis.com/2024/12/25/nvidias-christmas-present-gb300-b300-reasoning-inference-amazon-memory-supply-chain

anaguma

> No, at some point you "jump all the way" to AGI, i.e. AI systems that can do any length of task as well as professional humans -- 10 years, 100 years, 1000 years, etc.
Isn’t the quadratic cost of context length a constraint here? Naively you’d expect that acting coherently over 100 years would require 10x the context, and therefore ~100x the attention compute, of acting over 10 years.
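The scaling claim above can be sketched with a back-of-the-envelope calculation (my numbers, not from the comment): the self-attention score matrix is n x n for context length n, so attention FLOPs grow roughly quadratically while KV-cache memory grows linearly.

```python
# Rough cost ratios for scaling context length from n_short to n_long tokens.
def attention_cost_ratio(n_long: int, n_short: int) -> tuple[float, float]:
    flops_ratio = (n_long / n_short) ** 2   # score matrix is n x n -> quadratic
    memory_ratio = n_long / n_short         # KV cache grows linearly with n
    return flops_ratio, memory_ratio

# 10x the context -> ~100x the attention FLOPs, 10x the KV-cache memory.
flops, mem = attention_cost_ratio(1_000_000, 100_000)
```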

anaguma

I would guess that the reason it hasn’t devolved into full neuralese is that there is a KL divergence penalty, similar to how RLHF works.
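For readers unfamiliar with the mechanism: a minimal sketch of the KL-penalty term used in RLHF-style objectives, where the reward is reduced by a term proportional to the divergence from a frozen reference policy, discouraging drift into unreadable tokens. The function name and beta value are illustrative, not from any particular implementation.

```python
import math

def kl_penalized_reward(reward: float,
                        logp_policy: float,
                        logp_ref: float,
                        beta: float = 0.1) -> float:
    # Per-token Monte Carlo estimate: KL ~ log pi(a|s) - log pi_ref(a|s).
    # Penalizing this keeps the policy close to the reference model.
    kl_estimate = logp_policy - logp_ref
    return reward - beta * kl_estimate

# If the policy gives the sampled token 4x the reference probability,
# the penalty is beta * ln(4) ~ 0.139.
r = kl_penalized_reward(1.0, math.log(0.4), math.log(0.1), beta=0.1)
```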

anaguma

I gave the model both the PGN and the FEN on every move with this in mind. Why do you think conditioning on high level games would help? I can see why for the base models, but I expect that the RLHFed models would try to play the moves which maximize their chances of winning, with or without such prompting.

anaguma

Do you know if there are scaling laws for DLGNs?

anaguma

“Let's play a game of chess. I'll be white, you will be black. On each move, I'll provide you my move, and the board state in FEN and PGN notation. Respond with only your move.”
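The per-move message described in the prompt above can be assembled with a small helper; this is a hypothetical harness I wrote to illustrate the format (the function name and FEN/PGN strings are examples, not the commenter's code).

```python
def make_move_prompt(my_move: str, fen: str, pgn: str) -> str:
    """Format one turn's message: our move plus the full board state."""
    return (f"My move: {my_move}\n"
            f"FEN: {fen}\n"
            f"PGN: {pgn}\n"
            "Respond with only your move.")

# Example: the position after 1. e4.
msg = make_move_prompt(
    "e4",
    "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1",
    "1. e4",
)
```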

anaguma

GPT-4.5 is a very tricky model to play chess against. It tricked me in the opening and got a much better position; then I managed to recover and reach a winning endgame. And then it tried to trick me again by suggesting illegal moves that would have put it back in a winning position!

anaguma

How large an advantage do you think OpenAI gets relative to its competitors from Stargate?

anaguma

This is interesting. Can you say more about these experiments?

anaguma

How do Anthropic’s and xAI’s compute compare over this period?
