Scaling Laws vs Chinchilla

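Two readings of the same power laws: the original scaling laws (Kaplan et al., 2020) prescribed spending extra compute mostly on bigger models, while Chinchilla (Hoffmann et al., 2022) showed that compute-optimal training needs far more data per parameter, roughly 20 training tokens per parameter.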
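To make the compute-optimal reading concrete, here is a minimal sketch of how a fixed training budget might be split between parameters and tokens. It rests on two common rules of thumb that are assumptions here, not statements from this page: training cost is approximated as C ≈ 6·N·D FLOPs (N parameters, D tokens), and the compute-optimal ratio is taken as about 20 tokens per parameter. The function name `compute_optimal_split` and its parameters are hypothetical.

```python
import math


def compute_optimal_split(flops, tokens_per_param=20.0, flops_per_param_token=6.0):
    """Split a training budget into (parameters, tokens).

    Assumptions (rules of thumb, not exact laws):
      * training cost C ~= flops_per_param_token * N * D
      * compute-optimal data size D ~= tokens_per_param * N
    Substituting the second into the first and solving for N gives
      N = sqrt(C / (flops_per_param_token * tokens_per_param)).
    """
    if flops <= 0:
        raise ValueError("flops must be positive")
    params = math.sqrt(flops / (flops_per_param_token * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


if __name__ == "__main__":
    # A budget of about 5.76e23 FLOPs, close to the one used for Chinchilla.
    params, tokens = compute_optimal_split(5.76e23)
    print(f"{params:.3g} parameters, {tokens:.3g} tokens")
```

Under these assumptions, a budget of about 5.76e23 FLOPs works out to roughly 70 billion parameters trained on roughly 1.4 trillion tokens, which matches the published Chinchilla configuration. Both defaults are approximations, so treat the result as an order-of-magnitude guide rather than a precise prescription.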

A clear, side-by-side comparison with examples — part of Rudrite Research.