The newFormer is introduced,
but what do we really know about it?
@ari and others
imagine a new large-scale architecture &
ask how you would interpret its abilities and behaviours
https://arxiv.org/abs/2308.00189
#deepRead #NLProc #MachineLearning
@ari We NLPers have become a complex systems science (like studying the brain or the weather)
We won't examine every part or understand the math of every module; instead, we try to make sense of the system at different levels of granularity
@ari Once, they say, we might have trained many variants of such networks, but that era is long past; it is too costly now
(I am not yet convinced this is true; network dynamics, BabyLM-scale models, etc. might still tell us some things without full pretraining, but it remains to be seen)
For more on BabyLM
http://babylm.github.io
Or network dynamics
https://arxiv.org/abs/2109.06096
@ari This leaves us to discover (not design!) what those models are capable of.
To do that, we must start by testing what newFormer does, and build up from there to the how and why
Specifically, we need to test the function by trying out inputs and checking the outputs
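A minimal sketch of what "trying out inputs and checking the outputs" could look like as black-box behavioural probing. The `model` function here is a hypothetical stand-in (a toy string transform, since newFormer is imaginary); in practice it would wrap the real system's inference API:

```python
# Black-box behavioural probing sketch: we treat the model purely as a
# function from inputs to outputs, and check which behaviours hold.

def model(prompt: str) -> str:
    # Hypothetical placeholder for newFormer inference; a toy transform
    # so this sketch actually runs.
    return prompt[::-1]

def probe(model, cases):
    """Run named (input, predicate) probes and record which behaviours hold."""
    results = {}
    for name, prompt, predicate in cases:
        output = model(prompt)
        results[name] = predicate(output)
    return results

# Each probe names a behaviour we hypothesize, without peeking inside.
cases = [
    ("returns_text", "hello", lambda out: isinstance(out, str)),
    ("length_preserved", "hello", lambda out: len(out) == len("hello")),
]

print(probe(model, cases))  # e.g. {'returns_text': True, 'length_preserved': True}
```

The point of the design is that probes only ever see the input/output behaviour, which is exactly the level of granularity we are stuck with when we discover rather than design.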
@ari Fortunately, we have some uncommon advantages that complex systems rarely offer
First, we know every detail of it, already an advantage over other large-scale efforts (e.g. the brain)
Second, we can run experiments
@ari How do you think we can tackle this challenge? We all know those models grow; what are the right questions to ask?
And what would generalize? (e.g., I am quite sure our recent work below would not stay true forever; how would we know it is wrong in the new model, or make long-lasting deductions?)