All of Jose Miguel Cruz y Celis's Comments + Replies

Ok, let's examine a more conservative scenario using solely visual input. If we take 10 megabits/s as the base and deduct 30% to account for sleep time, we'll end up with roughly 0.78 petabytes accumulated over 30 years. This translates to approximately 157 trillion tokens in 30 years, or around 5.24 trillion tokens annually. Interestingly, even under these conservative conditions, the estimate significantly surpasses the training data of LLMs (~1 trillion tokens) by two orders of magnitude.
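As a rough check of the storage arithmetic above, here is a minimal sketch. The function name and the awake-fraction parameter are illustrative assumptions, not part of the original estimate. The result is sensitive to how much is deducted for sleep: a flat 30% deduction gives a bit over 0.8 petabytes, while deducting a third (8 hours of sleep per day) lands closer to the 0.78 figure quoted.

```python
# Back-of-the-envelope check of the conservative visual-only scenario.
# Assumptions (labeled, not from the original comment): decimal petabytes
# (1 PB = 1e15 bytes) and 365-day years.

SECONDS_30_YEARS = 30 * 365 * 24 * 3600  # 946,080,000 seconds


def visual_petabytes(bits_per_second, awake_fraction, seconds=SECONDS_30_YEARS):
    """Total visual data in petabytes for a given input rate and awake fraction."""
    bytes_per_second = bits_per_second * awake_fraction / 8
    return bytes_per_second * seconds / 1e15


# 10 megabits/s with a flat 30% deduction for sleep
pb_flat_30 = visual_petabytes(10e6, 0.70)

# 10 megabits/s with one third of the day asleep (8 hours)
pb_one_third = visual_petabytes(10e6, 2 / 3)

print(f"30% deduction:  {pb_flat_30:.2f} PB")
print(f"1/3 deduction:  {pb_one_third:.2f} PB")
```

Either way the total stays just under one petabyte, so the order-of-magnitude comparison does not depend on the exact sleep deduction.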

Where's Nick Bostrom? I've been wondering about this. I haven't seen anything published by him recently or heard him talk, besides that small New York Times piece. It would be great to hear his in-depth take on this recent AI progress.

You mention "I would point out that your calculations are based on the incident data our senses pick up, whereas what we learn is based on the information received by our brain. Almost all of the incident data is thrown away much closer to the source."

Wouldn't this be similar to how a neural network "disregards" training data that it has already seen? I.e., if it has already learned that pattern, there's no gradient, so the loss wouldn't go down. Maybe there's another mechanism that we're missing in current neural nets' online training, that would increase tr...

AnthonyC
I don't know how that's done, sorry. Does it literally throw away the data without using it for anything whatsoever (and does it do this with on the order of 99.9% of the training data set)? Or does it process the data, but because the data is redundant, it has little or no effect on the model weights? I'm talking about the former, since the vast majority of our visual data never makes it from the retina to the optic nerve. The latter would be something more like how looking at my bedroom wall yet again has little to no effect on my understanding of any aspect of the world.

And to your second point, yeah, I was pretty unclear, sorry. I meant that your original calculation was that a human at age 30 has ~31,728 T tokens' worth of data, compared to 1 T for GPT4. The human has 31,728 times as much, and log10(31,728) is about 4.5, meaning the human has 4.5 OOMs more training data. But if I'm right that you should cut down your human training data amounts by ~1000x because of throwing it away before it gets processed in the brain at all, then we're left with a human at age 30 having only 31.728x as much. log10(31.728) ≈ 1.5, i.e. the human has 1.5 OOMs more training data. The rest of that comment was me indicating that that's just how much data gets to the brain in any form, not how much is actually being processed for training purposes.
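The orders-of-magnitude arithmetic in that reply can be sketched as follows. The constant names are illustrative; the values are the ones quoted in the comment, and the ~1000x discard factor is the commenter's own rough assumption.

```python
import math

# Values as quoted in the comment (in trillions of tokens).
HUMAN_TOKENS_T = 31_728   # estimated "tokens" for a 30-year-old human
GPT4_TOKENS_T = 1         # training tokens attributed to GPT4 in the thread
DISCARD_FACTOR = 1_000    # rough share of sensory data dropped before the brain


def orders_of_magnitude(ratio):
    """Base-10 logarithm of a ratio, i.e. how many OOMs apart two quantities are."""
    return math.log10(ratio)


raw_ratio = HUMAN_TOKENS_T / GPT4_TOKENS_T
filtered_ratio = raw_ratio / DISCARD_FACTOR

raw_ooms = orders_of_magnitude(raw_ratio)
filtered_ooms = orders_of_magnitude(filtered_ratio)

print(f"Raw estimate:      {raw_ratio:,.0f}x, about {raw_ooms:.1f} OOMs")
print(f"After ~1000x cut:  {filtered_ratio:.3f}x, about {filtered_ooms:.1f} OOMs")
```

Dividing by 1000 always removes exactly 3 OOMs, which is why the gap drops from roughly 4.5 to roughly 1.5.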

I did some calculations with a bunch of assumptions and simplifications, but here's a high-end, back-of-the-envelope estimate of the data and "tokens" a 30-year-old human would have "trained" on:

  • Visual data: 130 million photoreceptor cells firing at 10 Hz = 1.3 Gbit/s = 162.5 MB/s; over 30 years (approx. 946,080,000 seconds) = 153 petabytes
  • Auditory data: Humans can hear frequencies up to 20,000 Hz, and high-quality audio is sampled at 44.1 kHz, satisfying the Nyquist-Shannon sampling theorem. If we assume 16-bit (CD quality) * 2 (channels for stere...
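The visual-data line of that estimate can be reproduced with a short sketch. It assumes one bit per photoreceptor firing (which is what the 1.3 Gbit/s figure implies), decimal units, and 365-day years; the variable names are illustrative.

```python
# Reproduce the visual-data estimate from the list above.
# Assumption (implied by the figures, not stated outright): 1 bit per firing.

PHOTORECEPTORS = 130_000_000
FIRING_RATE_HZ = 10
BITS_PER_FIRING = 1
SECONDS_30_YEARS = 30 * 365 * 24 * 3600  # 946,080,000 seconds

bits_per_second = PHOTORECEPTORS * FIRING_RATE_HZ * BITS_PER_FIRING  # 1.3 Gbit/s
bytes_per_second = bits_per_second / 8                               # 162.5 MB/s
total_petabytes = bytes_per_second * SECONDS_30_YEARS / 1e15

print(f"Rate:  {bits_per_second / 1e9:.1f} Gbit/s = {bytes_per_second / 1e6:.1f} MB/s")
print(f"Total over 30 years: {total_petabytes:.1f} PB")
```

This is an upper bound on raw incident data, not on what reaches the brain, which is the distinction raised elsewhere in the thread.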

I'm curious where you get that "models trained mostly on English text are still pretty good at Spanish". Do you have a reference?

I'm very much aligned with the version of utilitarianism that Bostrom and Ord generally put forth, but a question came up in a conversation regarding this philosophy and its view of sustainability. As a thought experiment, what would be consistent with this philosophy if we discovered that a very clear way to minimize existential risk due to X requires a genocide of half, or a significant subset, of the population?

Here we are now. What would you say about the progress of C. elegans emulation in general, and of your particular approach?