


This document investigates why bidirectional language models perform better than unidirectional models on natural language understanding tasks. The authors propose a new framework called Flow Neural Information Bottleneck (FlowNIB), which uses the Information Bottleneck principle to analyze the flow of information during training. FlowNIB dynamically balances maximizing information about the input and information relevant to the output. The study shows that bidirectional models preserve more mutual information from the input and exhibit higher effective dimensionality in their internal representations compared to unidirectional models. Experiments across various models and tasks validate these findings, suggesting that this enhanced information processing capacity contributes to their superior performance.
By Enoch H. Kang
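
A minimal sketch of the underlying math, assuming the standard Information Bottleneck formulation rather than the paper's exact notation: the classical objective trades off compressing the input against preserving label-relevant information,

\[ \min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y), \]

where \(Z\) is a hidden representation and \(\beta\) controls the trade-off. The dynamic balancing described above can be read as a time-varying mixture, for example

\[ \max \; \alpha(t)\, I(X; Z) + \bigl(1 - \alpha(t)\bigr)\, I(Z; Y), \]

with \(\alpha(t)\) hypothetical notation for a schedule that shifts emphasis between input information and output-relevant information over training; the actual FlowNIB objective and schedule are defined in the paper itself.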