Many can write faster asm than the compiler, yet don't. Why?
There's a take I've seen going around, which goes approximately like this:

> It used to be the case that you had to write assembly to make computers do things, but then compilers came along. Now we have optimizing compilers, and those optimizing compilers can write assembly better than pretty much any human. Because of that, basically nobody writes assembly anymore. The same is about to be true of regular programming.

I 85% agree with this take.

However, I think there's one important inaccuracy: even today, finding places where your optimizing compiler failed to produce optimal code is often pretty straightforward, and once you've identified those places, 10x+ speedups for that specific program on that specific hardware are often possible[1]. The reason nobody writes assembly anymore is the difficulty of mixing hand-written assembly with machine-generated assembly.

The issue is that it's easy to have the compiler write all of the assembly in your project, and it's easy from a build perspective to have the compiler write none of the assembly in your project, but having the compiler write most but not all of the assembly in your project is hard. As with many things in programming, having two sources of truth leads to sadness.

You have many choices for what to do if you spot an optimization the compiler missed, and all of them are bad:

1. Hope there's a pragma or compiler flag. If one exists, great! Add it and pray that your codebase doesn't change such that your pragma now hurts perf.
2. Inline assembly. Now you're maintaining two mental models: the C semantics the rest of your code assumes, and the register/memory state your asm block manipulates. The compiler can't optimize across inline asm boundaries. Lots of other pitfalls as well - using inline asm feels to me like a knife except the handle has been replaced by a second blade so you can have twice as much knife per knife.
3. Factor the hot path into a separate .s file, write an ABI-compliant assembly functi