Scaling Laws vs Chinchilla
Two readings of the same power laws: the original scaling laws prescribed bigger models, while Chinchilla showed that compute-optimal training needs far more data per parameter.
A clear, side-by-side comparison with examples — part of Rudrite Research.