Juraj Vitko

Here's a list of resources that may be of use to you. The GPT-3 paper isn't very specific on implementation details, because the changes that led to it were rather incremental (especially from GPT-2, and increasingly so the farther back you look in the Transformer lineage). So the reading needed to understand GPT-3 covers more ground than one might expect.

  • https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/nlp/01_Exploring_Word_Embeddings.ipynb
  • http://www.peterbloem.nl/blog/transformers
  • http://jalammar.github.io/illustrated-transformer/
  • https://amaarora.github.io/2020/02/18/annotatedGPT2.html
  • http://jalammar.github.io/illustrated-gpt2/
  • http://jalammar.github.io/how-gpt3-works-visualizations-animations/
  • https://arxiv.org/pdf/1409.0473.pdf Attention (initial)
  • https://arxiv.org/pdf/1706.03762.pdf Attention Is All You Need
  • http://nlp.seas.harvard.edu/2018/04/03/attention.html (The Annotated Transformer)
  • https://www.arxiv-vanity.com/papers/1904.02679/ Visualizing Attention
  • https://stats.stackexchange.com/questions/421935/what-exactly-are-keys-queries-and-values-in-attention-mechanisms (see the first sketch after this list)
  • https://arxiv.org/pdf/1807.03819.pdf Universal Transformers
  • https://arxiv.org/pdf/2007.14062.pdf Big Bird (see appendices)
  • https://www.reddit.com/r/MachineLearning/comments/hxvts0/d_breaking_the_quadratic_attention_bottleneck_in/ (see the second sketch after this list)
  • https://www.tensorflow.org/tutorials/text/transformer
  • https://www.tensorflow.org/tutorials/text/nmt_with_attention
  • https://cdn.openai.com/blocksparse/blocksparsepaper.pdf
  • https://openai.com/blog/block-sparse-gpu-kernels/
  • https://github.com/pbloem/former/blob/master/former/transformers.py
  • https://github.com/openai/blocksparse/blob/master/examples/transformer/enwik8.py
  • https://github.com/google/trax/blob/master/trax/models/transformer.py
  • https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_gpt2.py
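
Since the keys/queries/values question is where most people get stuck, here's a minimal NumPy sketch of scaled dot-product attention as defined in "Attention Is All You Need". The dimensions and the random projection matrices are toy values of my own choosing, not anything taken from the papers above:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k). Each query is scored against every key,
    # the scores are scaled by sqrt(d_k) and softmaxed, and the output
    # is a weighted average of the values: softmax(Q K^T / sqrt(d_k)) V.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (seq_len, d_k)

# Toy usage: 4 tokens with 8-dimensional embeddings. The Q/K/V
# projections are random here; in a real Transformer they are learned.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8)
```

Multi-head attention is just this computation run h times with different learned projections, with the results concatenated and projected once more.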
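
And for the quadratic-bottleneck thread and the block-sparse links: the common trick is to mask the attention score matrix so each query only attends to a fixed number of keys. Below is my own toy illustration of a causal local-window mask; the real block-sparse kernels operate on blocks of the matrix and fuse the masking into the GPU kernel, which is what the OpenAI code above implements:

```python
import numpy as np

def causal_local_mask(seq_len, window):
    # True where query i may attend to key j: causal (j <= i) and
    # within a fixed-width band (j > i - window). Full attention has
    # seq_len**2 active scores; the band keeps only ~seq_len * window,
    # so cost grows linearly in sequence length for a fixed window.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = causal_local_mask(seq_len=8, window=3)
scores = np.zeros((8, 8))
# Disallowed positions get -inf so they receive zero softmax weight.
masked = np.where(mask, scores, -np.inf)
print(mask.astype(int))
```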