
Revisiting Superficial Alignment Hypothesis

- The paper revisits the Superficial Alignment Hypothesis.
- It studies how post-training performance scales with the number of finetuning examples.
- Performance follows a power law in the number of finetuning examples.
- Model performance correlates with reasoning ability, not just style.
- Language models can integrate new knowledge after pre-training, during post-training.
- Results suggest the hypothesis is an oversimplification.
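
To make the power-law claim concrete, here is a minimal sketch of how such a relationship could be fitted. It assumes the form error ≈ a · n^(−b), where n is the number of finetuning examples. The data below is synthetic and the function name `fit_power_law` is hypothetical; neither comes from the paper.

```python
import numpy as np

def fit_power_law(n_examples, error):
    """Fit error ~ a * n**(-b) by least squares in log-log space.

    Taking logs gives log(error) = log(a) - b * log(n), a straight line,
    so an ordinary linear fit recovers both parameters.
    """
    log_n = np.log(np.asarray(n_examples, dtype=float))
    log_err = np.log(np.asarray(error, dtype=float))
    slope, intercept = np.polyfit(log_n, log_err, deg=1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, noise-free illustration (NOT results from the paper):
# error falls as a power law in the number of finetuning examples.
n = np.array([100.0, 300.0, 1000.0, 3000.0, 10000.0])
true_a, true_b = 2.0, 0.3
err = true_a * n ** (-true_b)

a_hat, b_hat = fit_power_law(n, err)
print(f"fitted a = {a_hat:.3f}, fitted b = {b_hat:.3f}")
```

On a log-log plot a power law appears as a straight line, which is why the fit is done on the logarithms. With real evaluation data the points would be noisy and the fitted exponent only approximate.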