Two works from
@dieuwke
on stability or consistency of outputs\metrics or if you want (and really, I want) reliability
Datasets for compositional generalizations do not agree with each other. It means that different models are good at different things. But that the metrics don't measure what we thought...
@Adinawilliams
@_dieuwke_
@dieuwke @Adinawilliams How consistent is in context learning?
Across many ICL inputs, they find results vary a lot, ICL training improves model consistency, and bigger models are only more consistent if trained for ICL
No paper yet? couldn't find...
#EMNLP2023
#ML #LLMs #evaluation #NLP #NLProc