All of snimu's Comments + Replies

snimu10

Yeah, I was kind of rambling, sorry. 

My main point is twofold (I'll just write GPU when I mean GPU / AI accelerator):

1. Destroying all GPUs is a stalling tactic, not a winning strategy. While CPUs are clearly much worse for AI than GPUs, they, and AI algorithms, should keep improving over time. State-of-the-art models from less than ten years ago can be run on CPUs today, with little loss in accuracy. If this trend continues, GPUs vs CPUs only seems to be of short-term importance. Regarding your point about having to train a dense net on GPUs before s...

3gwern
I didn't read Eliezer as suggesting a single GPU burn and then the nanobots all, I dunno, fry themselves and never exist again. More as a persistent thing. And burning all GPUs persistently does seem quite pivotal: maybe if the AGI confined itself to solely that and never did anything again, eventually someone would accumulate enough CPUs and spend so much money as to create a new AGI using only hardware which doesn't violate the first AGI's definition of 'GPU' (presumably they know about the loophole otherwise who would ever even try?), but that will take a long time and is approaching angels-on-pinheads sorts of specificity. (If a 'pivotal act' needs to guarantee safety until the sun goes red giant in a billion years, this may be too stringent a definition to be of any use. We don't demand that sort of solution for anything else.) CPUs are improving slowly, and are fundamentally unsuited to DL right now, so I'm doubtful that waiting a decade is going to give us amazing CPUs which can do DL at the level of, say, an Nvidia H100 (itself potentially still very underpowered compared to the GPUs you'd need for AGI). By AI algorithm progress, I assume you mean something like the Hernandez progress law? It's worth pointing out that the Hernandez experience curve is still pretty slow compared to the CPU vs GPU gap. A GPU is like 20x better, and Hernandez is a halving of cost every 16 months due to hardware+software improvement; even at face-value, you'd need at least 5 halvings to catch up, taking at least half a decade. Worse, 'hardware' here means 'GPU', of course, so Hernandez is an overestimate of a hypothetical 'CPU' curve, so you're talking more like decades. Actually, it's worse than that, because 'software' here means 'all of the accelerated R&D enabled by GPUs being so awesome and letting us try out lots of things by trial-and-error'; experience curves are actually caused by the number of cumulative 'units', and not by mere passage of time (progress doesn't just
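As a rough check of the halving arithmetic in the comment above, here is a minimal sketch. It takes the 20x GPU advantage and the 16-month halving time at face value, as the comment does; the function names are illustrative, not from any library.

```python
import math


def halvings_needed(gap: float) -> int:
    """Smallest n such that n cost halvings (a factor of 2**n) cover `gap`."""
    return math.ceil(math.log2(gap))


def years_to_close(gap: float, months_per_halving: float) -> float:
    """Years required if each halving takes `months_per_halving` months."""
    return halvings_needed(gap) * months_per_halving / 12


GAP = 20.0      # assumed GPU advantage over CPU, from the comment
MONTHS = 16.0   # assumed halving time, from the comment

print(halvings_needed(GAP), round(years_to_close(GAP, MONTHS), 1))  # prints: 5 6.7
```

So 5 halvings at 16 months each come to about 6.7 years, consistent with "at least half a decade", and that is before the comment's caveat that the curve itself was measured on GPUs.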
snimu10

I realize that destroying all GPUs (or all AI-Accelerators in general) as a solution to AGI Doom is not realistically alignable, but I wonder whether it would be enough even if it were. It seems like the Lottery-Ticket Hypothesis would likely foil this plan:

dense, randomly-initialized, feed-forward networks contain subnetworks ("winning tickets") that - when trained in isolation - reach test accuracy comparable to the original network in a similar number of iterations.

Seeing how Neuralmagic successfully sparsifies models to run on CPUs with minimal los...

7gwern
I don't follow. While it's plausible that sparsification may scale better (maybe check Rosenfeld to see if his scaling laws cover that, I don't recall offhand EDIT: hm no, while it varies dataset size by subsampling it doesn't seem to do compute-optimal scaling or report things easily enough for me to tell anything - although the larger models do prune differently, so they should be either better or worse with scale), you still have to train the largest model in the first place before you can sparsify it, and regardless of size, it remains the case that CPUs are much worse for training large NNs than GPUs.
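To make the ordering concrete, here is a minimal sketch of one-shot magnitude pruning, a common sparsification technique (an assumption for illustration, not necessarily what Neuralmagic or Rosenfeld use). The function name and the toy weights are hypothetical. The point is that its input is the weights of an already-trained dense model.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of trained weights.

    `weights` is a flat list of floats taken from a dense model that has
    already been trained; pruning happens after that training cost is paid.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to drop
    # Indices of the k weights with the smallest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Toy "trained" weights (made up for illustration).
trained = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
print(magnitude_prune(trained, 0.5))  # prints: [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

The sparse result may be cheap enough to run on a CPU, but producing `trained` in the first place is the expensive step.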