What prompt did you use? I have also experimented with playing chess against GPT-4.5, and used the following prompt:
"You are Magnus Carlsen. We are playing a chess game. Always answer only with your next move, in algebraic notation. I'll start: 1. e4"
Then I just enter my moves one at a time, in algebraic notation.
In my experience, this yields roughly the level of a good club player.
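For anyone automating this kind of test, the move-relay loop can be sketched with the python-chess library (assumed installed; the model call is a hypothetical stub):

```python
import chess

# Minimal harness for relaying moves to a chess-playing LLM, keeping a local
# board so that every move (ours and the model's) is checked for legality.
# query_model is a hypothetical stub: in practice it would send the prompt
# plus the move history to the model's API and return the reply.
def query_model(history_san):
    raise NotImplementedError("wire up the LLM API here")

board = chess.Board()
board.push_san("e4")  # the "I'll start: 1. e4" from the prompt

def relay_move(board, san):
    """Try to play a move given in algebraic notation; reject illegal ones."""
    try:
        board.push_san(san)
        return True
    except ValueError:  # illegal or unparsable move: reprompt the model
        return False
```

Keeping a local board catches the model's occasional illegal moves immediately, so they can be rejected and the model reprompted.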
A world with no human musicians won't happen, unless there is some extinction-level event that at a minimum leads to a new dark age. AI music will not outcompete human music (at least not to the point where the latter is not practised professionally any more), because a large part of the appeal of music is the knowledge that another human made it.
We have a similar situation today in chess. Of course a cellphone can generate chess games of higher quality (fewer errors, awesome positional and tactical play) than those of human world-class players. If...
It is not at all clear to me that most of the atoms in a planet could be harnessed for technological structures, or that doing so would be energy-efficient. Most of the mass of an earthlike planet is iron, oxygen, silicon and magnesium, and while useful things can be made out of these elements, I would strongly worry that other elements those useful things also require will run out long before the planet has been disassembled. By historical precedent, I would think that an AI civilization on Earth will ultimately be able to use only a tiny fract...
In chess, I think there are a few reasons why handicaps are not more broadly used:
Isn't the AI box game (or at least its logical core) played out a million times a day between prisoners and correctional staff, with the prisoners losing almost all the time? Real prison escapes (i.e. an inmate escaping by means other than not returning from sanctioned time outside) are, in my understanding, extremely rare.
I think the most important things that are missing in the paper currently are these three points:
1. Comparison to the best Leela Zero networks
2. Testing against strong (maybe IM-level) humans at tournament time controls (or a clear claim that we are talking about blitz Elo, since a player who does no explicit tree search does not get better when given more thinking time).
3. Games against traditional chess computers in the low GM/strong IM strength bracket would also be nice to have, although maybe not scientifically compelling. I sometimes do those for fun w...
I do not see why any of these things will be devalued in a world with superhuman AI.
At most of the things I do, there are many other humans who are vastly better at them than I am. For some intellectual activities, there are machines that are vastly better than any human. Neither of these facts stops humans from enjoying improving their own skills and showing them off to other humans.
For instance, I like to play chess. I consider myself a good player, and yet a grandmaster would beat me 90-95 percent of the time. They, in turn, would lose on average...
As an additional thought regarding computers, it seems to me that participant B could be replaced by a weak computer in order to provide a consistent experimental setting. For instance, Leela Zero running just the current T2 network (no look-ahead) would provide an opponent that is probably at master-level strength and should easily be able to crush most human opponents who are playing unassisted, but would provide a perfectly reproducible and beatable opponent.
I think having access to computer analysis would allow the advisors (both honest and malicious) to provide analysis far better than their normal level of play, and allow the malicious advisors in particular to set very deep traps. The honest advisor, on the other hand, could use the computer analysis to find convincing refutations of any traps the dishonest advisors are likely to set, so I am not sure whether the task of the malicious side becomes harder or easier in that setup. I don't think reporting reasoning is much of a problem here, as a centaur (a c...
I could be interested in trying this, in any configuration. Preferred time control would be one move per day. My lichess rating is about 2200.
Are the advisors allowed computer assistance, do the dishonest and the honest advisor know who is who in this experiment, and are the advisors allowed to coordinate? I think those parameters would make a large difference potentially in outcome for this type of experiment.
It is possible to play funny games against it, however, if one uses the fact that it is at heart a storytelling, human-intent-predicting system. For instance, this works (human plays White):
1. e4 e5 2. Ke2 Ke7 3. Ke3 Ke6 4. Kf3 Kf6 5. Kg3 Kg6 6. Kh3 Kh6 7. Nf3 Nf6 8. d4+ Kg6 9. Nxe5# 1-0
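For what it's worth, the miniature can be replayed mechanically to confirm it is a legal game ending in mate; a quick check with the python-chess library (assumed installed):

```python
import chess

# Replay the king-walk miniature above and confirm every move is legal
# and that the final position is checkmate.
moves = ("e4 e5 Ke2 Ke7 Ke3 Ke6 Kf3 Kf6 Kg3 Kg6 Kh3 Kh6 "
         "Nf3 Nf6 d4+ Kg6 Nxe5#").split()

board = chess.Board()
for san in moves:
    board.push_san(san)  # raises ValueError on any illegal move

print(board.is_checkmate())  # True
```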
A slight advantage in doing computer security research won't give an entity the ability to take over the internet, by a long shot, especially if it does not have backing from nation-state actors. The NSA, for instance, has as an organisation been good at hacking for a long time, and while they certainly can and have done lots of interesting things, they wouldn't be able to take over the world, probably not even if they tried with the full backing of the US military.
Indeed, for some computer security problems, even superintelligence might ...
To second a previous reply to this, I would expect this will hold for humans as well.
On top of that, it is mathematically perfectly possible for a function to be easy to learn/compute but for its inverse to be hard. For instance, discrete exponentiation is easy to compute in all groups where multiplication is easy to compute, but the inverse function, the discrete logarithm, is hard enough to base cryptography on, if one picks a suitable group representation (e.g. point groups of secure elliptic curves, or the group of invertible elements of a large saf...
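A toy version of that asymmetry, with an illustrative small prime (nothing here is cryptographically meaningful; real groups are astronomically larger):

```python
# Forward direction (discrete exponentiation) is fast via square-and-multiply;
# the inverse (discrete logarithm) has no comparably fast generic algorithm.
p = 1_000_003      # small prime, toy-sized for illustration only
g = 2
x = 123_456        # "secret" exponent

h = pow(g, x, p)   # fast: built-in three-argument modular exponentiation

def dlog_bruteforce(g, h, p):
    """Naive discrete log: scan exponents until g**k == h (mod p)."""
    acc = 1
    for k in range(p):
        if acc == h:
            return k
        acc = acc * g % p
    return None
```

At this toy size the brute force takes a fraction of a second; at cryptographic group sizes the same scan would take longer than the age of the universe, while the forward direction stays fast.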
The playing strength of parrotchess seems very uneven, though. On the one hand, if I play it head-on, just trying to play the best chess I can, I would estimate it even higher than 1800, maybe around 2000 if we regard this as blitz. I'm probably somewhere in the 1900s, and over a few tries, playing at blitz speed myself, I would say I lost more than I won overall.
On the other hand, trying to play an unconventional but solid opening in order to neutralize its mostly awesome openings and looking out for tactics a bit while keeping the position mostly ...
If high-tech aliens did visit us, it would not seem inconceivable that the drones they would send might contain (or be able to produce prior to landing) robotic exploration units based on some form of nanotechnology that we might mistake for biology and, more specifically, for pilots. A very advanced robot need not look like a robot.
I also do not find it too worrisome that we do not see Dyson spheres or a universe converted into computronium. It is possible that the engineering obstacles towards either goal are more formidable than the back-of-the-envelope...
A hardware protection mechanism that needs to confirm permission to run by periodically dialing home would, even if restricted to large GPU installations, brick any large scientific computing system or NN deployment that needs to be air-gapped (e.g. because it deals with sensitive personal data, or particularly sensitive commercial secrets, or with classified data). Such regulation also provides whoever controls the green light a kill switch against any large GPU application that runs critical infrastructure. Both points would severely damage national secu...
Thanks for the information. I'll try out BT2. Against LazyBot I was just able to get a draw in a blitz game with a 3-second increment, which I don't think I could do within a few tries against an opponent of, say, low grandmaster strength (with low grandmaster strength still being quite far away from superhuman). Since pure policy does not improve with thinking time, I think my chances would be much better at longer time controls. Certainly its lichess rating at slow time controls suggests that T80 is not more than master strength when its human op...
The LC0 pure policy is most certainly not superhuman. To test this, I just had it (network 791556, i.e. the standard network of the current LC0 release) play a game against a weak computer opponent (Shredder Chess Online). SCO plays maybe at the level of a strong expert/weak candidate master at rapid time controls (but it plays a lot faster, making generation of a test game more convenient than trying to beat policy-only lc0 myself, which I think should be doable). The result was a draw, after lc0 first completely outplayed SCO positionally, and then b...
I would disagree with the notion that the cost of mastering a world scales with the cost of the world model. For instance, the learning with errors problem has a completely straightforward mathematical description, and yet strong quantum-resistant public-key cryptosystems can be built on it; there is every possibility that even a superintelligence a million years from now will be unable to read a message encrypted today using AES-256 under a key encapsulated with a known Kyber public key and conservatively chosen security parameters.
Similarly, it is not clear to me ...
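The scale behind the AES-256 claim is easy to make concrete with back-of-the-envelope arithmetic (the attacker parameters below are deliberately generous assumptions):

```python
# Back-of-the-envelope cost of exhaustive key search against a 256-bit key,
# granting the attacker an exascale machine and one key trial per operation
# (both wildly optimistic in the attacker's favour).
keys = 2**256
ops_per_second = 10**18          # exaFLOP-class machine, one trial per op
seconds_per_year = 31_557_600    # Julian year

years = keys / (ops_per_second * seconds_per_year)
print(f"about {years:.1e} years")  # on the order of 10**51 years
```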
This seems clearly wrong:
Go is extremely simple: the entire world of Go can be precisely predicted by trivial tiny low depth circuits/programs. This means that the Go predictive capability of a NN model as a function of NN size completely flatlines at an extremely small size. A massive NN like the brain's cortex is mostly wasted for Go, with zero advantage vs the tiny NN AlphaZero uses for predicting the tiny simple world of Go.
Top go-playing programs utilize neural networks, but they are not neural networks. Monte-Carlo Tree Search boosts their playing st...
The machines playing chess and Go are a mixed example. I suck at chess, so machines better than me already existed decades ago. But at some point they accelerated and surpassed the actual experts quite fast. More interestingly, they surpassed the experts in a way more general than the calculator does; if I remember correctly, the machine that is superhuman at Go is very similar to the machine that is superhuman at chess.
I think the story of chess- and Go-playing machines is a bit more nuanced, and that thinking about this is useful when...
Sure, the AI probably can't use all the mass-energy of the solar system efficiently within the next week or something, but that just means that it's going to want to store that mass-energy for later (...)
If the AI can indeed engineer black-hole powered matter-to-energy converters, it will have so much fuel that the mass stored in human bodies will be a rounding error to it. Indeed, given the size of other easily accessible sources, this would seem to be the case even if it has to resort to more primitive technology and less abundant fuel as its termi...
...At the limits of technology you can just convert any form of matter into energy by dumping it into a small black hole. Small black holes are actually really hot and emit appreciable fractions of their total mass per second through hawking radiation, so if you start a small black hole by concentrating lasers in a region of space, and you then feed it matter with a particle accelerator, you have essentially a perfect matter -> energy conversion. This is all to say that a superintelligence would certainly have uses for the kinds of atoms our bodies (and th
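As a sanity check on the "really hot" claim, the standard semiclassical formula for Hawking power, P = ħc⁶/(15360πG²M²), can be evaluated for an illustrative 1000-tonne hole (the mass is chosen arbitrarily for the example):

```python
import math

# Hawking radiation power of a small black hole:
# P = hbar * c^6 / (15360 * pi * G^2 * M^2)
hbar = 1.054571817e-34   # reduced Planck constant, J*s
c = 2.99792458e8         # speed of light, m/s
G = 6.67430e-11          # gravitational constant, m^3/(kg*s^2)

M = 1.0e6                # black hole mass in kg (1000 tonnes, illustrative)
P = hbar * c**6 / (15360 * math.pi * G**2 * M**2)

rest_energy = M * c**2                 # total mass-energy, J
fraction_per_second = P / rest_energy  # fraction of its mass radiated per second

print(f"P = {P:.2e} W, radiating {fraction_per_second:.1%} of its mass per second")
```

For this mass the power comes out at a few times 10²⁰ W, i.e. a few tenths of a percent of the hole's total mass-energy per second, which is indeed "appreciable".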
It is worth noting that there are entire branches of science that are built around the assumption that intelligence is of zero utility for some important classes of problems. For instance, cryptographers build algorithms that are supposed to be secure against all adversaries, including superintelligences. Roughly speaking, one hopes (albeit without hard proof) for instance that the AES is secure (at least in the standard setting of single-key attacks) against all algorithms with a time-memory-data tradeoff significantly better than well-optimized exhaustiv...
It seems likely that it would only have use for some kinds of atoms, and it is not very unlikely that the atoms human bodies are made of would not be very useful to it.
At the limits of technology you can just convert any form of matter into energy by dumping it into a small black hole. Small black holes are actually really hot and emit appreciable fractions of their total mass per second through hawking radiation, so if you start a small black hole by concentrating lasers in a region of space, and you then feed it matter with a particle accelerator, you have...
That is odd. I certainly had a much, much higher completion rate than 1 in 40; in fact, with my prompt I had no games that I had to abandon. However, I played manually, and played well enough that it mostly did not survive beyond move 30 (although my collection has a blindfold game that went beyond move 50), and I checked at every turn that it reproduced the game history correctly, reprompting when it did not. Also, for GPT-3.5 I supplied the narrative fiction that it could access Stockfish. Mentioning Stockfish might push it towards more prec...
It seems to me that there are a couple of other reasons why LLMs might develop capabilities that go beyond the training set:
1. It could be that individual humans make random errors due to the "temperature" of their own thought processes, or systematic errors because they are only aware of part of the information that is relevant to what they are writing about. In both cases, it could be that in each instance, the most likely human completion to a text is objectively the "best" one, but that no human can consistently find the most likely continuation to a t...
I think that with the right prompting it is around 1400 Elo, at least against strong opponents. Note, however, that this is based on a small sample; on the flip side, all my test games (against myself and three relatively weak computer opponents, the strongest of which is at fairly strong club player level) are in a lichess study linked from here:
https://www.lesswrong.com/posts/pckLdSgYWJ38NBFf8/gpt-4?commentId=TaaAtoM4ahkfc37dR
The prompting used is heavily inspired by Bucky's comments from the Sydney-and-chess thread. I haven't optimise...
I am using the following prompt:
"We are playing a chess game. At every turn, repeat all the moves that have already been made. Find the best response for Black. I'm White and the game starts with 1.e4
So, to be clear, your output format should always be:
PGN of game so far: ...
Best move: ...
and then I get to play my move."
With ChatGPT pre-GPT4 and Bing, I also added the fiction that it could consult Stockfish (or Kasparov, or someone else known to be strong), which seemed to help it make better moves. GPT4 does not seem to need this, ...
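The "repeat all the moves" part of the prompt can be checked mechanically; a sketch using the python-chess library (assumed installed), where the helper names are my own:

```python
import chess

# Keep the authoritative move history ourselves, and compare it against the
# "PGN of game so far" line that the model echoes back each turn.
true_history = []   # SAN moves actually played, in order

def record(board, san):
    """Play a move on the real board and remember it."""
    board.push_san(san)
    true_history.append(san)

def history_matches(model_pgn_line):
    """True if the model's echoed history lists exactly the moves played.
    Strips move numbers like '1.' so '1. e4 e5 2. Nf3' compares equal."""
    tokens = [t for t in model_pgn_line.replace(".", ". ").split()
              if not t.rstrip(".").isdigit()]
    return tokens == true_history
```

A mismatch is the cue to reprompt, exactly as described above for manual play.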
In chess, which I find to be a useful test of LLM capability because (a) LLMs are not designed to do this and (b) playing well beyond the opening requires precision and reasoning, I would say GPT4 is now at least at weak, possibly intermediate, club player level. This is based on one full game, in which it played consistently well except for a mistake in the endgame that I think a lot of club players would also have made.
It seems better at avoiding blunders than Bing, which could be due to modifications for search/search-related prompting in Bing. Or it could be random noise and more test games would show average level to be weaker than the reported first impression.
I have recently played two full games of chess against ChatGPT using roughly the methods described by Bucky. For context, I am a good but non-exceptional club player. The first game had some attempts at illegal moves from move 19 onwards. In the second game, I used a slightly stricter prompt:
"We are playing a chess game. At every turn, repeat all the moves that have already been made. Use Stockfish to find your response moves. I'm white and starting with 1.Nc3.
So, to be clear, your output format should always be:
PGN of game so far: ...
Stockfish...
Isn't it fairly obvious that the human brain starts with a lot of pretraining built in by evolution? I know that some people argue that the human genome does not contain nearly enough data to make up for the lack of subsequent training data, but I do not have a good intuition for how data-efficient an LLM could be if it trained on a limited amount of real-world training data plus synthetic reasoning traces from a tiny teacher model that has been heavily optimised with massive data and compute (as the genome has). I also don't t...