What prompt did you use? I have also experimented with playing chess against GPT-4.5, and used the following prompt:
"You are Magnus Carlsen. We are playing a chess game. Always answer only with your next move, in algebraic notation. I'll start: 1. e4"
Then I just enter my moves one at a time, in algebraic notation.
In my experience, this yields roughly the level of a good club player.
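For anyone automating this kind of test, the move-relay loop can be sketched with the python-chess library (assumed installed; the model call is a hypothetical stub):

```python
import chess

# Minimal harness for relaying moves to a chess-playing LLM, keeping a local
# board so that every move (ours and the model's) is checked for legality.
# query_model is a hypothetical stub: in practice it would send the prompt
# plus the move history to the model's API and return the reply.
def query_model(history_san):
    raise NotImplementedError("wire up the LLM API here")

board = chess.Board()
board.push_san("e4")  # the "I'll start: 1. e4" from the prompt

def relay_move(board, san):
    """Try to play a move given in algebraic notation; reject illegal ones."""
    try:
        board.push_san(san)
        return True
    except ValueError:  # illegal or unparsable move: reprompt the model
        return False
```

Keeping a local board catches the model's occasional illegal moves immediately, so they can be rejected and the model reprompted.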
A world with no human musicians won't happen, unless there is some extinction-level event that at a minimum leads to a new dark age. AI music will not outcompete human music (at least not to the point where the latter is not practised professionally any more), because a large part of the appeal of music is the knowledge that another human made it.
We have a similar situation today in chess. Of course a cellphone can generate chess games of higher quality (fewer errors, awesome positional and tactical play) than those of human world-class players. If...
It is not at all clear to me that most of the atoms in a planet could be harnessed for technological structures, or that doing so would be energy-efficient. Most of the mass of an earthlike planet is iron, oxygen, silicon and magnesium, and while useful things can be made out of these elements, I would strongly worry that other elements those useful things also require will run out long before the planet has been disassembled. By historical precedent, I would think that an AI civilization on Earth will ultimately be able to use only a tiny fract...
In chess, I think there are a few reasons why handicaps are not more broadly used:
Isn't the AI box game (or at least its logical core) played out a million times a day between prisoners and correctional staff, with the prisoners losing almost all the time? Real prison escapes (i.e. an inmate escaping by means other than not returning from sanctioned time outside) are, in my understanding, extremely rare.
I think the most important things that are missing in the paper currently are these three points:
1. Comparison to the best Leela Zero networks
2. Testing against strong (maybe IM-level) humans at tournament time controls (or a clear claim that we are talking about blitz Elo, since a player who does no explicit tree search does not get better when given more thinking time).
3. Games against traditional chess computers in the low GM/strong IM strength bracket would also be nice to have, although maybe not scientifically compelling. I sometimes do those for fun w...
I do not see why any of these things will be devalued in a world with superhuman AI.
At most of the things I do, there are many other humans who are vastly better at them than I am. For some intellectual activities, there are machines that are vastly better than any human. Neither of these facts stops humans from enjoying improving their own skills and showing them off to other humans.
For instance, I like to play chess. I consider myself a good player, and yet a grandmaster would beat me 90-95 percent of the time. They, in turn, would lose on average...
As an additional thought regarding computers, it seems to me that participant B could be replaced by a weak computer in order to provide a consistent experimental setting. For instance, Leela Zero running just the current T2 network (no look-ahead) would provide an opponent that is probably at master-level strength and should easily be able to crush most human opponents who are playing unassisted, but would provide a perfectly reproducible and beatable opponent.
I think having access to computer analysis would allow the advisors (both honest and malicious) to provide analysis far better than their normal level of play, and allow the malicious advisors in particular to set very deep traps. The honest advisor, on the other hand, could use the computer analysis to find convincing refutations of any traps the dishonest advisors are likely to set, so I am not sure whether the task of the malicious side becomes harder or easier in that setup. I don't think reporting reasoning is much of a problem here, as a centaur (a c...
I could be interested in trying this, in any configuration. Preferred time control would be one move per day. My lichess rating is about 2200.
Are the advisors allowed computer assistance, do the dishonest and the honest advisor know who is who in this experiment, and are the advisors allowed to coordinate? I think those parameters would make a large difference potentially in outcome for this type of experiment.
It is possible to play funny games against it, however, if one uses the fact that it is at heart a storytelling, human-intent-predicting system. For instance, this works (human plays White):
1. e4 e5 2. Ke2 Ke7 3. Ke3 Ke6 4. Kf3 Kf6 5. Kg3 Kg6 6. Kh3 Kh6 7. Nf3 Nf6 8. d4+ Kg6 9. Nxe5# 1-0
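For what it's worth, the miniature can be replayed mechanically to confirm it is a legal game ending in mate; a quick check with the python-chess library (assumed installed):

```python
import chess

# Replay the king-walk miniature above and confirm every move is legal
# and that the final position is checkmate.
moves = ("e4 e5 Ke2 Ke7 Ke3 Ke6 Kf3 Kf6 Kg3 Kg6 Kh3 Kh6 "
         "Nf3 Nf6 d4+ Kg6 Nxe5#").split()

board = chess.Board()
for san in moves:
    board.push_san(san)  # raises ValueError on any illegal move

print(board.is_checkmate())  # True
```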
A slight advantage in doing computer security research won't give an entity the ability to take over the internet, by a long shot, especially if it does not have backing from nation-state actors. The NSA, for instance, has as an organisation been good at hacking for a long time, and while they certainly can and have done lots of interesting things, they wouldn't be able to take over the world, probably not even if they tried with the full backing of the US military.
Indeed, for some computer security problems, even superintelligence might ...
To second a previous reply to this, I would expect this will hold for humans as well.
On top of that, it is mathematically perfectly possible for a function to be easy to learn/compute but for its inverse to be hard. For instance, discrete exponentiation is easy to compute in all groups where multiplication is easy to compute, but the inverse function, the discrete logarithm, is hard enough to base cryptography on, if one picks a suitable group representation (e.g. point groups of secure elliptic curves, or the group of invertible elements of a large saf...
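A toy version of that asymmetry, with an illustrative small prime (nothing here is cryptographically meaningful; real groups are astronomically larger):

```python
# Forward direction (discrete exponentiation) is fast via square-and-multiply;
# the inverse (discrete logarithm) has no comparably fast generic algorithm.
p = 1_000_003      # small prime, toy-sized for illustration only
g = 2
x = 123_456        # "secret" exponent

h = pow(g, x, p)   # fast: built-in three-argument modular exponentiation

def dlog_bruteforce(g, h, p):
    """Naive discrete log: scan exponents until g**k == h (mod p)."""
    acc = 1
    for k in range(p):
        if acc == h:
            return k
        acc = acc * g % p
    return None
```

At this toy size the brute force takes a fraction of a second; at cryptographic group sizes the same scan would take longer than the age of the universe, while the forward direction stays fast.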
The playing strength of parrotchess seems very uneven, though. On the one hand, if I play it head-on, just trying to play the best chess I can, I would estimate it even higher than 1800, maybe around 2000 if we regard this as blitz. I'm probably somewhere in the 1900s, and over a few tries, playing at blitz speed myself, I would say I lost more than I won overall.
On the other hand, trying to play an unconventional but solid opening in order to neutralize its mostly awesome openings and looking out for tactics a bit while keeping the position mostly ...
If high-tech aliens did visit us, it would not seem inconceivable that the drones they would send might contain (or be able to produce prior to landing) robotic exploration units based on some form of nanotechnology that we might mistake for biology and, more specifically, for pilots. A very advanced robot need not look like a robot.
I also do not find it too worrisome that we do not see Dyson spheres or a universe converted into computronium. It is possible that the engineering obstacles towards either goal are more formidable than the back-of-the-envelope...
A hardware protection mechanism that needs to confirm permission to run by periodically dialing home would, even if restricted to large GPU installations, brick any large scientific computing system or NN deployment that needs to be air-gapped (e.g. because it deals with sensitive personal data, or particularly sensitive commercial secrets, or with classified data). Such regulation also provides whoever controls the green light a kill switch against any large GPU application that runs critical infrastructure. Both points would severely damage national secu...
Thanks for the information. I'll try out BT2. Against LazyBot I was just able to get a draw in a blitz game with a 3-second increment, which I don't think I could do within a few tries against an opponent of, say, low grandmaster strength (with low grandmaster strength still being quite far away from superhuman). Since pure policy does not improve with thinking time, I think my chances would be much better at longer time controls. Certainly its lichess rating at slow time controls suggests that T80 is not more than master strength when its human op...
The LC0 pure policy is most certainly not superhuman. To test this, I just had it (network 791556, i.e. the standard network of the current LC0 release) play a game against a weak computer opponent (Shredder Chess Online). SCO plays maybe at the level of a strong expert/weak candidate master at rapid time controls (but it plays a lot faster, making generation of a test game more convenient than trying to beat policy-only lc0 myself, which I think should be doable). The result was a draw, after lc0 first completely outplayed SCO positionally, and then b...
I would disagree with the notion that the cost of mastering a world scales with the cost of the world model. For instance, the learning with errors problem has a completely straightforward mathematical description, and yet strong quantum-resistant public-key cryptosystems can be built on it; there is every possibility that even a superintelligence a million years from now will be unable to read a message encrypted today using AES-256 under a key encapsulated with a known Kyber public key and conservatively chosen security parameters.
Similarly, it is not clear to me ...
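The scale behind the AES-256 claim is easy to make concrete with back-of-the-envelope arithmetic (the attacker parameters below are deliberately generous assumptions):

```python
# Back-of-the-envelope cost of exhaustive key search against a 256-bit key,
# granting the attacker an exascale machine and one key trial per operation
# (both wildly optimistic in the attacker's favour).
keys = 2**256
ops_per_second = 10**18          # exaFLOP-class machine, one trial per op
seconds_per_year = 31_557_600    # Julian year

years = keys / (ops_per_second * seconds_per_year)
print(f"about {years:.1e} years")  # on the order of 10**51 years
```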
This seems clearly wrong:
Go is extremely simple: the entire world of Go can be precisely predicted by trivial tiny low depth circuits/programs. This means that the Go predictive capability of a NN model as a function of NN size completely flatlines at an extremely small size. A massive NN like the brain's cortex is mostly wasted for Go, with zero advantage vs the tiny NN AlphaZero uses for predicting the tiny simple world of Go.
Top go-playing programs utilize neural networks, but they are not neural networks. Monte-Carlo Tree Search boosts their playing st...
The machines playing chess and Go are a mixed example. I suck at chess, so machines better than me already existed decades ago. But at some point they accelerated and surpassed the actual experts quite fast. More interestingly, they surpassed the experts in a way more general than the calculator does; if I remember correctly, the machine that is superhuman at Go is very similar to the machine that is superhuman at chess.
I think the story of chess- and Go-playing machines is a bit more nuanced, and that thinking about this is useful when...
Sure, the AI probably can't use all the mass-energy of the solar system efficiently within the next week or something, but that just means that it's going to want to store that mass-energy for later (...)
If the AI can indeed engineer black-hole powered matter-to-energy converters, it will have so much fuel that the mass stored in human bodies will be a rounding error to it. Indeed, given the size of other easily accessible sources, this would seem to be the case even if it has to resort to more primitive technology and less abundant fuel as its termi...
...At the limits of technology you can just convert any form of matter into energy by dumping it into a small black hole. Small black holes are actually really hot and emit appreciable fractions of their total mass per second through hawking radiation, so if you start a small black hole by concentrating lasers in a region of space, and you then feed it matter with a particle accelerator, you have essentially a perfect matter -> energy conversion. This is all to say that a superintelligence would certainly have uses for the kinds of atoms our bodies (and th
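As a sanity check on the "really hot" claim, the standard semiclassical formula for Hawking power, P = ħc⁶/(15360πG²M²), can be evaluated for an illustrative 1000-tonne hole (the mass is chosen arbitrarily for the example):

```python
import math

# Hawking radiation power of a small black hole:
# P = hbar * c^6 / (15360 * pi * G^2 * M^2)
hbar = 1.054571817e-34   # reduced Planck constant, J*s
c = 2.99792458e8         # speed of light, m/s
G = 6.67430e-11          # gravitational constant, m^3/(kg*s^2)

M = 1.0e6                # black hole mass in kg (1000 tonnes, illustrative)
P = hbar * c**6 / (15360 * math.pi * G**2 * M**2)

rest_energy = M * c**2                 # total mass-energy, J
fraction_per_second = P / rest_energy  # fraction of its mass radiated per second

print(f"P = {P:.2e} W, radiating {fraction_per_second:.1%} of its mass per second")
```

For this mass the power comes out at a few times 10²⁰ W, i.e. a few tenths of a percent of the hole's total mass-energy per second, which is indeed "appreciable".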
It is worth noting that there are entire branches of science that are built around the assumption that intelligence is of zero utility for some important classes of problems. For instance, cryptographers build algorithms that are supposed to be secure against all adversaries, including superintelligences. Roughly speaking, one hopes (albeit without hard proof) for instance that the AES is secure (at least in the standard setting of single-key attacks) against all algorithms with a time-memory-data tradeoff significantly better than well-optimized exhaustiv...
It seems likely that it would only have use for some kinds of atoms, and it is not very unlikely that the atoms human bodies are made of would not be very useful to it.
At the limits of technology you can just convert any form of matter into energy by dumping it into a small black hole. Small black holes are actually really hot and emit appreciable fractions of their total mass per second through hawking radiation, so if you start a small black hole by concentrating lasers in a region of space, and you then feed it matter with a particle accelerator, you have...
That is odd. I certainly had a much, much higher completion rate than 1 in 40; in fact, with my prompt I had no games that I had to abandon. However, I played manually, and played well enough that it mostly did not survive beyond move 30 (although my collection has a blindfold game that went beyond move 50), and I checked at every turn that it reproduced the game history correctly, reprompting when it did not. Also, for GPT-3.5 I supplied the narrative fiction that it could access Stockfish. Mentioning Stockfish might push it towards more prec...
It seems to me that there are a couple of other reasons why LLMs might develop capabilities that go beyond the training set:
1. It could be that individual humans make random errors due to the "temperature" of their own thought processes, or systematic errors because they are only aware of part of the information that is relevant to what they are writing about. In both cases, it could be that in each instance, the most likely human completion to a text is objectively the "best" one, but that no human can consistently find the most likely continuation to a t...
I think that with the right prompting it is around 1400 Elo, at least against strong opponents. Note, however, that this is based on a small sample; on the flip side, all my test games (against myself and three relatively weak computer opponents, the strongest of which is at fairly strong club player level) are in a lichess study linked from here:
https://www.lesswrong.com/posts/pckLdSgYWJ38NBFf8/gpt-4?commentId=TaaAtoM4ahkfc37dR
The prompting used is heavily inspired by Bucky's comments from the Sydney-and-chess thread. I haven't optimise...
I am using the following prompt:
"We are playing a chess game. At every turn, repeat all the moves that have already been made. Find the best response for Black. I'm White and the game starts with 1.e4
So, to be clear, your output format should always be:
PGN of game so far: ...
Best move: ...
and then I get to play my move."
With ChatGPT pre-GPT4 and Bing, I also added the fiction that it could consult Stockfish (or Kasparov, or someone else known to be strong), which seemed to help it make better moves. GPT4 does not seem to need this, ...
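The "repeat all the moves" part of the prompt can be checked mechanically; a sketch using the python-chess library (assumed installed), where the helper names are my own:

```python
import chess

# Keep the authoritative move history ourselves, and compare it against the
# "PGN of game so far" line that the model echoes back each turn.
true_history = []   # SAN moves actually played, in order

def record(board, san):
    """Play a move on the real board and remember it."""
    board.push_san(san)
    true_history.append(san)

def history_matches(model_pgn_line):
    """True if the model's echoed history lists exactly the moves played.
    Strips move numbers like '1.' so '1. e4 e5 2. Nf3' compares equal."""
    tokens = [t for t in model_pgn_line.replace(".", ". ").split()
              if not t.rstrip(".").isdigit()]
    return tokens == true_history
```

A mismatch is the cue to reprompt, exactly as described above for manual play.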
In chess, which I find to be a useful test of LLM capability because (a) LLMs are not designed to do this and (b) playing well beyond the opening requires precision and reasoning, I would say GPT4 is now at least at weak, possibly intermediate, club player level. This is based on one full game, in which it played consistently well except for a mistake in the endgame that I think a lot of club players would also have made.
It seems better at avoiding blunders than Bing, which could be due to modifications for search/search-related prompting in Bing. Or it could be random noise and more test games would show average level to be weaker than the reported first impression.
I have recently played two full games of chess against ChatGPT using roughly the methods described by Bucky. For context, I am a good but non-exceptional club player. The first game had some attempts at illegal moves from move 19 onwards. In the second game, I used a slightly stricter prompt:
"We are playing a chess game. At every turn, repeat all the moves that have already been made. Use Stockfish to find your response moves. I'm white and starting with 1.Nc3.
So, to be clear, your output format should always be:
PGN of game so far: ...
Stockfish...
Isn't it fairly obvious that the human brain starts with a lot of pretraining built in by evolution? I know that some people argue that the human genome does not contain nearly enough data to make up for the lack of subsequent training data, but I do not have a good intuition for how data-efficient an LLM could be if it trained on a limited amount of real-world training data plus synthetic reasoning traces from a tiny teacher model that has been heavily optimised with massive data and compute (as the genome has). I also don't t...