4 comments

I think this post could use a summary of your takeaways, or of why it's relevant to LW. (It does indeed seem relevant, but it's generally good practice to include that in linkposts so people can get a rough sense of an article via the hoverover.)

"I have heard that they get the details wrong though, and the fact that they [Groq] are still adversing their ResNet-50 performance (a 2015 era network) speaks to that."

I'm not sure I fully get this criticism: ResNet-50 is the most standard image recognition benchmark, and unsurprisingly it's the only (?) architecture that NVIDIA lists in their benchmarking stats for image recognition as well: https://developer.nvidia.com/deep-learning-performance-training-inference.
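For context, "ResNet-50 performance" in these vendor comparisons usually means inference (or training) throughput in images/second on 224x224 inputs. A minimal sketch of how such a throughput number is typically measured, assuming PyTorch and torchvision are available (the batch size, iteration counts, and lack of mixed precision here are illustrative choices, not any vendor's actual methodology):

```python
import time
import torch
import torchvision.models as models

# Build a ResNet-50 with random weights; accuracy is irrelevant for a throughput test.
model = models.resnet50(weights=None).eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

batch = torch.randn(32, 3, 224, 224, device=device)  # standard 224x224 input, batch size 32 (illustrative)

with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        model(batch)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    iters = 50
    for _ in range(iters):
        model(batch)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.time() - start

print(f"~{iters * batch.shape[0] / elapsed:.0f} images/sec")
```

Published numbers are of course gathered with far more careful setups (tuned batch sizes, TensorRT or other compiled runtimes, mixed precision), so a loop like this only gives a rough lower bound for comparison.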

You are of course aware that Xilinx has its own flavour of ML tooling that can be pushed onto its FPGAs. I believe it is mostly geared towards inference, but have you considered checking the plausibility of your 'as good as a 3090' estimate against the published performance numbers of the first-party solutions?

I did not write this post. Just thought it was interesting/relevant for LessWrong.