BERT vs GPT vs T5

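Three ways to pretrain a transformer: fill in masked tokens using context from both directions (BERT, encoder-only), predict the next token from the tokens before it (GPT, decoder-only), or cast every task as text-to-text (T5, encoder-decoder).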

A clear, side-by-side comparison with examples — part of Rudrite Research.
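To make the three objectives concrete, here is a minimal sketch of how each one turns the same token sequence into a model input and a prediction target. It rests on simplifying assumptions. Tokenization is plain whitespace splitting, and the masked positions and corrupted spans are passed in by hand, whereas real BERT picks about 15% of tokens at random and applies an 80/10/10 replace/random/keep rule, and real T5 samples its spans. The sentinel names follow the `<extra_id_N>` convention of the Hugging Face T5 tokenizer. The function names are hypothetical, written for illustration only.

```python
from typing import List, Tuple

MASK = "[MASK]"  # BERT-style mask token


def bert_mlm_example(tokens: List[str], mask_positions: List[int]) -> Tuple[List[str], List[str]]:
    """Masked LM (hypothetical helper): hide chosen tokens; the model predicts
    them using context on BOTH sides. Loss is computed only at masked positions."""
    inputs = list(tokens)
    targets = []
    for i in mask_positions:
        targets.append(tokens[i])  # the original token is the label
        inputs[i] = MASK           # simplification: always replace with [MASK]
    return inputs, targets


def gpt_causal_example(tokens: List[str]) -> Tuple[List[str], List[str]]:
    """Causal LM (hypothetical helper): at every position, predict the next
    token from the tokens to its LEFT. Targets are the inputs shifted by one."""
    return tokens[:-1], tokens[1:]


def t5_span_corruption_example(
    tokens: List[str], spans: List[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Span corruption (hypothetical helper): replace each span [start, end) with
    a sentinel in the encoder input; the decoder target lists each sentinel
    followed by the tokens it replaced, closed by one final sentinel.
    Assumes spans are sorted and non-overlapping."""
    inputs: List[str] = []
    targets: List[str] = []
    prev = 0
    for k, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{k}>"
        inputs.extend(tokens[prev:start])  # keep the uncorrupted text
        inputs.append(sentinel)            # one sentinel stands in for the whole span
        targets.append(sentinel)
        targets.extend(tokens[start:end])  # the dropped tokens become the target
        prev = end
    inputs.extend(tokens[prev:])
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel ends the target
    return inputs, targets


if __name__ == "__main__":
    toks = "the cat sat on the mat".split()
    print(bert_mlm_example(toks, [1, 4]))
    print(gpt_causal_example(toks))
    print(t5_span_corruption_example(toks, [(1, 3), (4, 5)]))
```

The same six tokens yield three different training signals: BERT predicts only the two hidden words but sees the whole sentence, GPT predicts every next word but never looks ahead, and T5's encoder reads a shortened sentence while its decoder generates just the missing spans.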