New #languagemodeling #nlp #ai #paper, led by Angelica Chen! We break the steepest MLM training loss drop into *2* phase changes: first in internal grammatical structure, then external capabilities. Big implications for emergence, simplicity bias, and interpretability! https://arxiv.org/abs/2309.07311
Genuinely, if you work on anything even tangentially related to the science of deep learning, you should check it out. It touches on grammar, epistemology, causal interpretability, latent structure, phase transitions, early training dynamics, the information bottleneck hypothesis, and simplicity bias.