
The paper "Universal Language Model Fine-tuning for Text Classification" by Jeremy Howard and Sebastian Ruder introduces ULMFiT, an effective transfer learning method for Natural Language Processing (NLP). While transfer learning had already revolutionized computer vision, NLP models still required significant task-specific modifications or training from scratch. ULMFiT enables ImageNet-like inductive transfer for any NLP task using a single, 3-layer LSTM architecture.
The ULMFiT process consists of three main stages:
• General-domain LM pretraining: The language model is first trained on a large, general corpus (WikiText-103) to capture broad features of the language.
• Target task LM fine-tuning: The pretrained model is adapted to the specific target task data, even if the dataset is small.
• Target task classifier fine-tuning: The model is augmented with linear blocks to perform the final classification task.
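The three stages form a simple sequential pipeline. The sketch below is a hypothetical skeleton of that flow; the function names and placeholder bodies are illustrative, not the authors' implementation.

```python
# Hypothetical skeleton of the three ULMFiT stages. Each function is a
# placeholder standing in for a real training loop over an LSTM language model.

def pretrain_lm(model, general_corpus):
    # Stage 1: train the LM on a large general corpus (e.g. WikiText-103).
    return model

def finetune_lm(model, task_corpus):
    # Stage 2: adapt the pretrained LM to the target task's unlabeled text.
    return model

def finetune_classifier(model, labeled_data):
    # Stage 3: augment the LM with linear classifier blocks and train on labels.
    return model

def ulmfit(model, general_corpus, task_corpus, labeled_data):
    # The full pipeline: each stage starts from the previous stage's weights.
    model = pretrain_lm(model, general_corpus)
    model = finetune_lm(model, task_corpus)
    return finetune_classifier(model, labeled_data)
```

The key design point is that stages 2 and 3 always start from the previous stage's weights, which is what the fine-tuning tricks below are meant to protect.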
To ensure robust learning and prevent catastrophic forgetting (where the model loses its pretrained knowledge during fine-tuning), the authors propose three key techniques:
1. Discriminative fine-tuning: Using different learning rates for different layers of the model.
2. Slanted triangular learning rates (STLR): A learning rate schedule that first linearly increases and then decays.
3. Gradual unfreezing: A process of slowly unfreezing the model layers starting from the last one to preserve low-level representations.
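The three techniques above can be expressed in a few lines each. This is a minimal pure-Python sketch: the STLR schedule follows the formula in the paper (defaults cut_frac=0.1, ratio=32, lr_max=0.01, and the per-layer factor 2.6 are the values the paper reports), while the helper names themselves are my own.

```python
def discriminative_lrs(base_lr, n_layers, factor=2.6):
    """Per-layer learning rates: each lower layer's lr is divided by `factor`.

    Returns rates ordered from lowest layer (index 0) to top layer.
    """
    lrs = [base_lr]
    for _ in range(n_layers - 1):
        lrs.append(lrs[-1] / factor)
    return lrs[::-1]

def stlr(t, T, cut_frac=0.1, ratio=32, lr_max=0.01):
    """Slanted triangular learning rate at iteration t of T total iterations.

    Linearly increases from lr_max/ratio to lr_max over the first
    cut_frac fraction of training, then linearly decays back.
    """
    cut = int(T * cut_frac)
    if t < cut:
        p = t / cut
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))
    return lr_max * (1 + p * (ratio - 1)) / ratio

def gradual_unfreeze(n_layers, epoch):
    """Trainable mask per layer: at epoch 0 only the last layer trains,
    and one more layer (from the top down) is unfrozen each epoch."""
    n_trainable = min(epoch + 1, n_layers)
    return [i >= n_layers - n_trainable for i in range(n_layers)]
```

For example, `stlr(0, 1000)` gives the warm-up floor `lr_max / 32`, the rate peaks at `lr_max` after the first 10% of iterations, and `gradual_unfreeze(3, 0)` trains only the top layer. In a real training loop these would feed per-parameter-group learning rates to the optimizer at every step.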
Key Results and Impact:
• ULMFiT significantly outperformed the state-of-the-art on six representative text classification tasks, reducing the error rate by 18-24% on most datasets.
• The method is extremely sample-efficient; with only 100 labeled examples, it can match the performance of training a model from scratch on 100 times more data.
• It provides a universal approach that works across different document sizes and label types without requiring custom feature engineering.
By Yun Wu