FlashAttention vs FlashAttention-3

The same exact-attention algorithm, rebuilt for a new generation of GPUs: first IO-aware tiling, then Hopper-era asynchrony and FP8.

A clear, side-by-side comparison with examples — part of Rudrite Research.