T5 (Encoder-Decoder)

T5, short for Text-to-Text Transfer Transformer, is a model from Google that uses a full encoder–decoder Transformer, the same overall layout as the original Transformer. Its central idea is to frame every task as a text-to-text problem: translation, summarisation, classification, and question answering all become "text in, text out," handled by a single model in a single format.

💡 In one line: T5 uses both encoder and decoder, and turns every task into the same shape — text in, text out.

Encoder–Decoder Architecture

T5 keeps both Transformer stacks:

  • The encoder reads and understands the input bidirectionally (like BERT).
  • The decoder generates the output token by token, using cross-attention to look back at the encoder (like the original Transformer).

This combines two strengths: an encoder that sees the whole input at once and a decoder that generates freely. That suits sequence-to-sequence tasks, where the output is a transformed version of the input.
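
As a rough illustration of the two stacks, the sketch below writes the attention patterns described above as plain 0/1 matrices (1 means "may attend"). The function names are made up for this example, and real implementations apply such masks inside the attention layers rather than building Python lists.

```python
def encoder_mask(n):
    """Bidirectional: each of the n input positions may attend to every input position."""
    return [[1] * n for _ in range(n)]


def decoder_self_mask(m):
    """Causal: output position i may attend only to output positions 0..i."""
    return [[1 if j <= i else 0 for j in range(m)] for i in range(m)]


def cross_attention_mask(m, n):
    """Cross-attention: each of the m output positions may attend to all n encoder states."""
    return [[1] * n for _ in range(m)]


# A 3-token output: row i shows what output position i is allowed to see.
for row in decoder_self_mask(3):
    print(row)
```

The lower-triangular decoder mask is what makes generation token by token possible, while the all-ones encoder and cross-attention masks let every output step consult the full input.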

The Text-to-Text Framework

T5's core innovation is treating every task identically: the input is text, the output is text. You tell the model what to do with a task prefix in the input:

Input (with prefix)                        | Output
-------------------------------------------|----------------
translate English to German: That is good  | Das ist gut
summarize: <long article>                  | <short summary>
cola sentence: He are happy                | unacceptable
stsb sentence1: … sentence2: …             | 3.8

Notice that even classification and scoring become text outputs ("unacceptable", "3.8"). One model, one format, many tasks.
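
A minimal sketch of the idea in Python, using the prefixes shown above. The helper names (build_input, parse_cola, parse_stsb) are hypothetical, and the "acceptable" label is assumed here as the counterpart of "unacceptable". The point is only that every task reduces to building one string and reading one string back.

```python
def build_input(prefix: str, text: str) -> str:
    """Prepend the task prefix so the model knows which task to perform."""
    return f"{prefix}: {text}"


def parse_cola(output_text: str) -> bool:
    """Map the text label back to a boolean.

    Assumption: the two labels are "acceptable" and "unacceptable".
    """
    return output_text.strip() == "acceptable"


def parse_stsb(output_text: str) -> float:
    """A similarity score arrives as text, so convert it to a number."""
    return float(output_text)


print(build_input("translate English to German", "That is good"))
print(build_input("cola sentence", "He are happy"))
print(parse_cola("unacceptable"))
print(parse_stsb("3.8"))
```

Nothing about the model changes between tasks. Only the string going in and the interpretation of the string coming out differ.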

How T5 is Pre-trained: Span Corruption

T5's pre-training is a twist on masked language modeling called span corruption: instead of masking single tokens, it masks whole spans of text and trains the model to reconstruct them. This objective suits the encoder–decoder setup, where the decoder regenerates the missing pieces. After pre-training, T5 is fine-tuned on many tasks — all in the same text-to-text format.
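
To make the objective concrete, here is a small sketch of how one training pair could be built. It assumes the <extra_id_N> sentinel naming used by the Hugging Face T5 tokenizer, and it takes the spans as explicit arguments, whereas actual pre-training samples them at random. span_corrupt is a name invented for this example.

```python
def span_corrupt(tokens, spans):
    """Build (input, target) token lists for span corruption.

    tokens: list of string tokens.
    spans:  sorted, non-overlapping (start, end) pairs, end exclusive.
    Each span is replaced in the input by one sentinel; the target lists
    each sentinel followed by the tokens it replaced.
    """
    inputs, targets = [], []
    cursor = 0
    for k, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{k}>"  # assumed sentinel naming
        inputs.extend(tokens[cursor:start])  # keep text before the span
        inputs.append(sentinel)              # one sentinel stands in for the whole span
        targets.append(sentinel)
        targets.extend(tokens[start:end])    # the decoder must regenerate these
        cursor = end
    inputs.extend(tokens[cursor:])           # keep text after the last span
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel ends the target
    return inputs, targets


tokens = "Thank you for inviting me to your party last week .".split()
inp, tgt = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(inp))  # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(" ".join(tgt))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

The encoder reads the corrupted input, and the decoder is trained to produce the target, which contains only the dropped spans rather than the full sentence.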

Why Encoder–Decoder for T5?

The encoder–decoder design shines when the output is a rewritten or transformed version of the input:

  • Translation — input in one language, output in another.
  • Summarisation — long input, short output.
  • Question answering — passage + question in, answer out.

The encoder attends over the entire source before any output is produced, and the decoder can emit an output whose length does not depend on the input's length.

T5 vs. BERT vs. GPT

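Model | Stack           | Reads         | Best for
------|-----------------|---------------|-----------------------
BERT  | Encoder-only    | Bidirectional | Understanding
GPT   | Decoder-only    | Left-to-right | Generation
T5    | Encoder–Decoder | Both          | Text-to-text (seq2seq)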

This trio captures the three ways to use the Transformer: understand (BERT), generate (GPT), or transform (T5).

Code Example
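
A minimal sketch of running T5 with the Hugging Face transformers library, assuming PyTorch is installed and the public t5-small checkpoint can be downloaded. The wrapper name run_t5 and its default arguments are choices made for this example, not part of the library.

```python
def run_t5(prompt: str, model_name: str = "t5-small", max_new_tokens: int = 40) -> str:
    """Send one prefixed prompt through T5 and return the generated text."""
    # Imported inside the function so that defining it does not require
    # the libraries to be installed.
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)

    # The encoder reads the whole prompt; generate() runs the decoder step by step.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (downloads the checkpoint on first call):
#   print(run_t5("translate English to German: That is good"))
```

The task is selected entirely by the prefix in the prompt string, so the same function serves translation, summarisation, and the other tasks the checkpoint was trained on.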


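T5 handles other tasks the same way via the text2text-generation pipeline, where that pipeline is available in your transformers version: change the prefix, for example "summarize: ...". (Requires pip install transformers plus a backend such as PyTorch.)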

Applications

  • Translation between languages
  • Summarisation of long documents
  • Question answering
  • Classification (expressed as text labels)

…all through the same text-to-text interface.

Summary

  • T5 uses the full encoder–decoder Transformer.
  • Its key idea is the text-to-text framework: every task is text in → text out, selected by a task prefix.
  • It's pre-trained with span corruption and fine-tuned across many tasks in one format.
  • The encoder–decoder design is ideal for transforming input into output (translation, summarisation).
  • The three variants together: BERT understands, GPT generates, T5 transforms.