BERT vs GPT vs T5
Three ways to pretrain the same transformer — read both directions, predict the next token, or cast every task as text-to-text.
A clear, side-by-side comparison with examples — part of Rudrite Research.