Juraj Vitko

Here's a list of resources that may be of use to you. The GPT-3 paper isn't very specific on implementation details, because the changes that led to it were rather incremental (especially from GPT-2, and increasingly so the farther back you look in the Transformer lineage). So the reading needed to understand GPT-3 covers more ground than one might expect.

  • https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/nlp/01_Exploring_Word_Embeddings.ipynb
  • http://www.peterbloem.nl/blog/transformers
  • http://jalammar.github.io/illustrated-transformer/
  • https://amaarora.github.io/2020/02/18/annotatedGPT2.html
  • http://jalammar.github.io/illustrated-gpt2/
  • http://jalammar.github.io/how-gpt3-works-visualizations-animations/
  • https://arxiv.org/pdf/1409.0473.pdf Attention (initial)
  • https://arxiv.org/pdf/1706.03762.pdf Attention Is All You Need
  • http://nlp.seas.harvard.edu/2018/04/03/attention.html (The Annotated Transformer)
  • https://www.arxiv-vanity.com/papers/1904.02679/ Visualizing Attention
  • https://stats.stackexchange.com/questions/421935/what-exactly-are-keys-queries-and-values-in-attention-mechanisms (see the first sketch after this list)
  • https://arxiv.org/pdf/1807.03819.pdf Universal Transformers
  • https://arxiv.org/pdf/2007.14062.pdf Big Bird (see appendices)
  • https://www.reddit.com/r/MachineLearning/comments/hxvts0/d_breaking_the_quadratic_attention_bottleneck_in/ (see the second sketch after this list)
  • https://www.tensorflow.org/tutorials/text/transformer
  • https://www.tensorflow.org/tutorials/text/nmt_with_attention
  • https://cdn.openai.com/blocksparse/blocksparsepaper.pdf
  • https://openai.com/blog/block-sparse-gpu-kernels/
  • https://github.com/pbloem/former/blob/master/former/transformers.py
  • https://github.com/openai/blocksparse/blob/master/examples/transformer/enwik8.py
  • https://github.com/google/trax/blob/master/trax/models/transformer.py
  • https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_gpt2.py
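
Since the keys/queries/values question is where most people get stuck, here's a minimal NumPy sketch of scaled dot-product attention as defined in "Attention Is All You Need". The dimensions and the random projection matrices are toy values of my own choosing, not anything taken from the papers above:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k). Each query is scored against every key,
    # the scores are scaled by sqrt(d_k) and softmaxed, and the output
    # is a weighted average of the values: softmax(Q K^T / sqrt(d_k)) V.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # (seq_len, d_k)

# Toy usage: 4 tokens with 8-dimensional embeddings. The Q/K/V
# projections are random here; in a real Transformer they are learned.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8)
```

Multi-head attention is just this computation run h times with different learned projections, with the results concatenated and projected once more.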
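
And for the quadratic-bottleneck thread and the block-sparse links: the common trick is to mask the attention score matrix so each query only attends to a fixed number of keys. Below is my own toy illustration of a causal local-window mask; the real block-sparse kernels operate on blocks of the matrix and fuse the masking into the GPU kernel, which is what the OpenAI code above implements:

```python
import numpy as np

def causal_local_mask(seq_len, window):
    # True where query i may attend to key j: causal (j <= i) and
    # within a fixed-width band (j > i - window). Full attention has
    # seq_len**2 active scores; the band keeps only ~seq_len * window,
    # so cost grows linearly in sequence length for a fixed window.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = causal_local_mask(seq_len=8, window=3)
scores = np.zeros((8, 8))
# Disallowed positions get -inf so they receive zero softmax weight.
masked = np.where(mask, scores, -np.inf)
print(mask.astype(int))
```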