All of Sophie Y's Comments + Replies

How does GPT-3 spend its 175B parameters?

The architecture shown for "Not in GPT" seems to be wrong? GPT is decoder-only, and the part labeled "Not in GPT" is the decoder part.

1 · Robert_AIZI · 2y

I think both of these statements are true. Despite this, I think the architecture shown in "Not in GPT" is correct, because (as I understand it) "encoder" and "decoder" are interchangeable unless both are present. That's what I was trying to get at here: See this comment for more discussion of the terminology.
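One way to make the terminology point concrete is a minimal NumPy sketch, assuming single-head attention, no layer norm, and random weights (the names `block`, `self_attention`, and `params` are hypothetical, not from the post). In the original Transformer, a decoder block adds a causal mask and a cross-attention sublayer that reads the encoder's output. With no encoder present there is nothing to cross-attend to, so what is left is self-attention plus an MLP, the same shape as an encoder block. In this sketch the only remaining difference is the causal mask, which GPT uses.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv, causal):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if causal:
        # Position i may only attend to positions <= i.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v

def block(x, params, causal):
    # Self-attention + MLP with residual connections; no cross-attention,
    # because in this sketch there is no second stack to attend to.
    Wq, Wk, Wv, W1, W2 = params
    x = x + self_attention(x, Wq, Wk, Wv, causal)
    return x + np.maximum(x @ W1, 0) @ W2

rng = np.random.default_rng(0)
d, n = 8, 5  # illustrative sizes only
shapes = [(d, d)] * 3 + [(d, 4 * d), (4 * d, d)]
params = [rng.normal(size=s) * 0.1 for s in shapes]
x = rng.normal(size=(n, d))

encoder_style = block(x, params, causal=False)      # every token sees every token
decoder_only_style = block(x, params, causal=True)  # GPT-style: no peeking ahead
```

The two calls share all weights and all code paths except the mask, which is one reading of why the two labels blur when only one stack is present.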