All of Darcey's Comments + Replies

Darcey

I'd be interested to see links to those papers!

Lech Mazur
I've messaged you the links. Basically MLPs.
Darcey

I'm a little bit skeptical of the argument in "Transformers are not special" -- it seems like, if there were other architectures which had slightly greater capabilities than the Transformer, and which were relatively low-hanging fruit, we would have found them already.

I'm in academia, so I can't say for sure what is going on at big companies like Google. But I assume that, following the 2017 release of the Transformer, they allocated different research teams to different directions: some teams for scaling, and others for the development o...

Lech Mazur
There have been a few papers with architectures whose performance matches transformers on smaller datasets, with scaling that looks promising. I can tell you that I've switched from attention to an architecture loosely based on one of these papers, because it performed better on a smallish dataset in my project. I haven't tested it on any standard vision or language datasets, though, so I don't have any concrete evidence yet. Nevertheless, my guess is that indeed there is nothing special about transformers.
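
For concreteness, here is a minimal sketch of what a "basically MLPs", attention-free block can look like, in the style of MLP-Mixer. This is illustrative only: the papers in question aren't named, so this is not necessarily the architecture being discussed, and all the dimensions below are made up.

```python
# Minimal MLP-Mixer-style block: no attention anywhere.
# Illustrative sketch, not the specific architecture from this thread.
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, num_tokens: int, dim: int, hidden: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Token-mixing MLP: does the cross-position communication
        # that self-attention does in a transformer block.
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, hidden), nn.GELU(), nn.Linear(hidden, num_tokens)
        )
        self.norm2 = nn.LayerNorm(dim)
        # Channel-mixing MLP: the same per-position feed-forward
        # layer a transformer block already has.
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):  # x: (batch, num_tokens, dim)
        # Transpose so the first MLP acts along the token axis.
        y = self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + y
        return x + self.channel_mlp(self.norm2(x))

x = torch.randn(2, 196, 256)                        # (batch, tokens, channels)
block = MixerBlock(num_tokens=196, dim=256, hidden=512)
print(block(x).shape)                               # torch.Size([2, 196, 256])
```

The only piece replaced relative to a transformer block is the attention layer; the residual connections, layer norms, and per-position feed-forward are unchanged, which is part of why architectures like this can be competitive.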
porby

I think what's going on is something like:

  1. Being slightly better isn't enough to unseat an entrenched option that is well understood. It would probably have to be very noticeably better, particularly in scaling.
  2. I expect the way the internal structures are used will usually dominate the details of the internal structure (once you're already at the pretty good frontier).
  3. If you're already extremely familiar with transformers, and you can simply change how you use transformers for possible gains, you're more likely to do that than to explore a from-scratch technique...
Darcey

Thanks for this post! More than anything I've read before, it captures the visceral horror I feel when contemplating AGI, including some of the supposed FAIs I've seen described (though I'm not well-read on the subject).

One thought though: the distinction between wrapper-minds and non-wrapper-minds does not feel completely clear-cut to me. For instance, consider a wrapper-mind whose goal is to maximize the number of paperclips, but rather than being given a hard-coded definition of "paperclip", it is instructed to go out into the world, interact with human...

Darcey

Aha, thanks for clarifying this; I was going to ask this too. :)