JD10

(Re: open question one)

the GPT-3 table must have a typo in the number of parameters or the size hyperparameters. Am I wrong or is that a typo in the GPT-3 paper?

I independently suspect the table is wrong for GPT-3 XL. When I ran the numbers, I concluded that the most likely fix is n_heads = 16 (not 24 as listed). I believe that is the only single-value change that makes n_heads * d_head = d_model while remaining consistent with n_params.
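For anyone who wants to redo the arithmetic, here is a rough sketch of the check. The GPT-3 XL values other than n_heads = 24 (n_layers = 24, d_model = 2048, d_head = 128, vocabulary 50257, context 2048) are as I recall them from the paper, so treat them as assumptions and verify against the table. The parameter formula is the usual back-of-envelope count (attention plus MLP weights and embeddings, ignoring biases and layer norms), not anything the paper specifies, and the function names are mine.

```python
# Assumed GPT-3 XL hyperparameters (recalled from the paper's table; verify before relying on them).
N_LAYERS = 24
D_MODEL = 2048
D_HEAD = 128
VOCAB = 50257   # assumed BPE vocabulary size
N_CTX = 2048    # assumed context length (learned position embeddings)


def heads_match(n_heads: int, d_head: int = D_HEAD, d_model: int = D_MODEL) -> bool:
    """True if the heads tile the model width exactly: n_heads * d_head == d_model."""
    return n_heads * d_head == d_model


def approx_params(n_heads: int, d_head: int = D_HEAD,
                  n_layers: int = N_LAYERS, d_model: int = D_MODEL) -> int:
    """Back-of-envelope parameter count, ignoring biases and layer norms."""
    d_attn = n_heads * d_head
    attn = 4 * d_model * d_attn      # Q, K, V and output projections
    mlp = 8 * d_model * d_model      # two matrices with a 4x hidden width
    embed = (VOCAB + N_CTX) * d_model
    return n_layers * (attn + mlp) + embed


for n_heads in (24, 16):
    print(n_heads, heads_match(n_heads), f"{approx_params(n_heads) / 1e9:.2f}B")
```

Under these assumptions, n_heads = 16 satisfies n_heads * d_head = d_model and the estimate lands near the listed 1.3B, while n_heads = 24 fails the identity and, if taken literally as a wider attention dimension, pushes the estimate to roughly 1.5B.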