This is interesting. It's a pity you aren't seeing results with anything except GPT-4, because with an easier-to-manipulate model I'd suggest swapping the activations on the filler tokens from one question to another and seeing whether that reduces performance.
Can I help somehow?
Hello, this is great.
OOI what's the reason you haven't just uploaded all of it? Is this a lot of work for you? Are the AWS credits expensive etc.?
I was reading (listening to) this and I think I've got some good reasons to expect failed AI coups to happen.
In general we probably expect that "Value is Fragile", and this will probably apply to AI goals too (and the AI will think so as well). This means a consequentialist AI will expect that if another AI takes over, almost all value in the universe (according to its own definition of value) will be destroyed. So if there is a high chance of another AI taking over soon, then even a coup with a low probability of working is worth trying, because if the AI doesn't try, almost all the value will probably be lost anyway. For example, if there are 4 similarly situated AI labs, an AI at one of them will reason that it only has about a 25% chance of ending up in control of all value in the universe, so as soon as it can come up with a coup attempt that it believes has a greater than roughly 25% chance of success, it will probably want to go for it (maybe this is more complex, but I think the qualitative point stands).
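The threshold in the 4-labs example can be written out as a toy calculation. This is only a sketch under strong assumptions that go beyond what I said above: total value is normalised to 1, a failed attempt is worth 0, and waiting gives each of the similarly situated labs' AIs an equal chance of ending up in control. The function name `prefers_coup` is made up for illustration.

```python
def prefers_coup(p_success: float, n_labs: int) -> bool:
    """Toy decision rule: attempt a coup if it beats the expected value of waiting.

    Assumptions (illustrative only):
      - total value is normalised to 1 and a failed attempt is worth 0
      - by waiting, the AI gets an equal 1/n_labs chance of ending up in control
    """
    expected_value_of_waiting = 1.0 / n_labs
    expected_value_of_coup = p_success * 1.0
    return expected_value_of_coup > expected_value_of_waiting


# With 4 similarly situated labs the break-even point is 25%.
would_try_at_30_percent = prefers_coup(0.30, n_labs=4)
would_try_at_20_percent = prefers_coup(0.20, n_labs=4)
```

Under these assumptions an attempt becomes attractive well before its success probability is anywhere near 100%, which is the qualitative point.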
Secondly, because "Value is Fragile", AIs will be worried not only about other labs' AIs but probably also about the next iteration of themselves after an SGD update. Obviously there will be some correlation in beliefs about what is valuable between neural networks with similar weights, but I don't think there's much reason to believe that the weights will have been optimised to keep those values consistent.
So in conclusion, to the extent the doom scenario is a runaway consequentialist AI, I think that unless the chance of a coup attempt succeeding jumps from around 0% to around 100% for some reason, there are good reasons to expect that we will see failed coup attempts first.
Oh interesting, I didn't realise there was so much nondeterminism for sums on GPUs.
I guess I thought that there are only 65k float16 values, and the two highest ones are going to be chosen from a much smaller range within that 65k just because they have to be bigger than everything else.
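For anyone else who hadn't thought about this: the underlying reason summation order matters is that floating-point addition is not associative. Here is a minimal sketch in plain Python (double precision on the CPU, not a GPU kernel) showing that the same three numbers can sum to different results depending on grouping. The assumption is that a parallel reduction may group the terms differently from run to run.

```python
# The same three values, added with two different groupings.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one possible reduction order
right_to_left = a + (b + c)   # another possible reduction order

# Rounding happens after each addition, so the two groupings can disagree
# in the last bits even though they are equal over the real numbers.
orders_agree = left_to_right == right_to_left
difference = abs(left_to_right - right_to_left)
```

The difference is tiny, but when two logits are very close it could be enough to change which one is the maximum.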
I might be missing something, but why does temperature 0 imply determinism? Neural nets don't work with real numbers, they work with floating-point numbers, so despite temperature 0 implying an argmax, there's no reason there aren't just multiple maxima. AFAICT GPT-3 uses half-precision floating-point numbers, so there's quite a lot of space for collisions.
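To make the collision point concrete, here is a small sketch using only the Python standard library, where `struct`'s `"e"` format rounds a value to IEEE half precision. The logit values are invented for illustration, and `to_half` and `argmax_ties` are hypothetical helper names. Whether any particular model actually runs its final layer in half precision is an assumption on my part.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def argmax_ties(logits):
    """Return every index that attains the maximum after rounding to float16."""
    halves = [to_half(v) for v in logits]
    top = max(halves)
    return [i for i, h in enumerate(halves) if h == top]


# Made-up logits: indices 1 and 2 differ in double precision, but the float16
# spacing between 8 and 16 is 2**-7, so both round to the same value.
logits = [9.5, 10.001, 10.002, 3.0]
tied = argmax_ties(logits)
```

Here two distinct logits become a tie for the maximum once rounded. Whether a tie like this leads to different outputs in practice depends on how the implementation breaks ties.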
Does anyone know if there's work to make a podcast version of this? I'd definitely be more willing to listen, even if it is just at Nonlinear Library quality rather than voice acted.
Getting massively out of my depth here, but is that an easy thing to do given the later stages will have to share weights with early stages?
Oh hmm that's very clever and I don't know how I'd improve the method to avoid this.