All of toph's Comments + Replies

toph10

Late to the party, but thanks for writing this up! I'm confused about two points in the calculation in the Theory section:

  • The FLOP needed to compute the term "δ3@A2R" (and similar)
    • I understand this to be the outer product of two vectors: δ3, of length #output, and A2R, of length #hidden2
    • If that's the case, shouldn't this require only #output*#hidden2*#batch FLOP (without the factor of two in the table), since it's just one multiplication per pair of numbers?
  • Do the parameter updates need to be accumulated for each example in the batch?
    • If this is the ...