This research explores whether transformers, a type of neural network architecture, can learn to reason implicitly over knowledge stored in their parameters. The authors find that they can, but only through a phenomenon called grokking, in which generalization emerges only after training continues far beyond the point of overfitting. The study investigates two reasoning types:
composition and
comparison. They find that while the transformers generalize well on in-distribution examples for both types, they struggle with out-of-distribution generalization for composition but succeed for comparison. Through mechanistic analysis of the model’s internals, they discover that
different circuits are formed during grokking for each reasoning type, which explains why systematic generalization emerges for comparison but not for composition. The authors also demonstrate the potential of
parametric memory for complex reasoning tasks with large search spaces, showing that a fully grokked transformer can achieve near-perfect accuracy, while state-of-the-art LLMs with non-parametric memory fail.
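To make the two task types concrete, the sketch below constructs synthetic examples in the spirit of the setup described above. It is an illustrative assumption, not the authors' actual data pipeline: the entity and relation names, the `Example` container, and the "older?" query token are all hypothetical. Composition asks the model to chain two memorized facts; comparison asks it to compare two memorized attribute values.

```python
import random
from dataclasses import dataclass

# Hypothetical example container; the paper's real data format may differ.
@dataclass
class Example:
    query: tuple   # tokens the model would see
    answer: str    # target token

entities = [f"e{i}" for i in range(20)]
relations = ["r1", "r2"]
ages = {e: random.randint(0, 100) for e in entities}  # one attribute per entity

# Atomic facts: (head entity, relation) -> tail entity, memorized during training.
atomic = {(h, r): random.choice(entities) for h in entities for r in relations}

def composition_example(h, r1="r1", r2="r2"):
    """Two-hop composition: answering requires chaining (h, r1) -> b and then (b, r2) -> t."""
    bridge = atomic[(h, r1)]
    tail = atomic[(bridge, r2)]
    return Example(query=(h, r1, r2), answer=tail)

def comparison_example(e1, e2):
    """Comparison: answering requires retrieving two stored attribute values and comparing them."""
    older = e1 if ages[e1] >= ages[e2] else e2
    return Example(query=(e1, e2, "older?"), answer=older)

print(composition_example("e0"))        # e.g. Example(query=('e0', 'r1', 'r2'), answer='e7')
print(comparison_example("e0", "e1"))   # answer is whichever entity has the larger stored age
```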
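The grokking regime itself amounts to continuing optimization long after training accuracy saturates and watching held-out accuracy for delayed generalization. The PyTorch-style loop below is a minimal sketch of that recipe under stated assumptions, not the authors' training code: the `model`, `train_loader`, and `eval_loader` objects, the step budget, and the use of weight decay (often reported as helpful for grokking) are all illustrative choices.

```python
import torch

def train_until_grokking(model, train_loader, eval_loader, steps=500_000, device="cpu"):
    """Keep optimizing far past the point where training accuracy saturates,
    logging held-out accuracy to watch for delayed (grokked) generalization."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.1)  # weight decay: an assumption
    loss_fn = torch.nn.CrossEntropyLoss()
    model.to(device)

    step = 0
    while step < steps:
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)          # (batch, vocab): prediction of the answer token
            loss = loss_fn(logits, targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1

            if step % 10_000 == 0:
                train_acc = accuracy(model, train_loader, device)
                test_acc = accuracy(model, eval_loader, device)
                # Grokking shows up as test accuracy rising long after train accuracy is ~1.0.
                print(f"step {step}: train={train_acc:.3f} test={test_acc:.3f}")
            if step >= steps:
                break

@torch.no_grad()
def accuracy(model, loader, device):
    correct = total = 0
    for inputs, targets in loader:
        preds = model(inputs.to(device)).argmax(dim=-1).cpu()
        correct += (preds == targets).sum().item()
        total += targets.numel()
    return correct / total
```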