I don’t train from scratch, I use RoBERTa
Wait…
Why not cross-encoder/stsb-roberta? Or facebook/muppet-roberta?
We automatically (and periodically) identify the best models on the @huggingface hub
Just pick the best one
and finetune it on your task
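In code, that swap is tiny. Here is a minimal sketch (not our exact pipeline): load a recommended finetuned checkpoint instead of the vanilla pretrained one and finetune it on your task. The checkpoint name, task, and hyperparameters below are illustrative assumptions; check the ranking site for the current best model for your architecture and size.

```python
# Minimal sketch: finetune starting from a recycled (already-finetuned) checkpoint.
# "facebook/muppet-roberta-base" is only an illustrative stand-in; swap in
# whatever currently tops the ranking for your architecture.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base = "facebook/muppet-roberta-base"  # instead of "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(base)
# ignore_mismatched_sizes reuses the body and re-initializes the old task head
model = AutoModelForSequenceClassification.from_pretrained(
    base, num_labels=2, ignore_mismatched_sizes=True
)

dataset = load_dataset("glue", "sst2")  # stand-in for "your task"

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3, per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```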
Finetuned models are known to sometimes be better starting points than the pretrained model.
We have found that while ~5/6 of finetuned models are not good starting points,
strong models outperform the pretrained model consistently and often
(Figure: gains of each finetuned model over the pretrained model; each row is a model)
Yet in practice, this knowledge is rarely applied. We tested T5, RoBERTa, and BERT finetuned models (ours and @huggingface's), and the best ones are... THE BEST
Hence, we decided to share it with the world, so people can test it for themselves:
Reposting this from the birdsite, but we are already working on improvements and will share updates soon. Please keep in touch if you have any ideas, questions, or comments.
Paper: https://arxiv.org/abs/2211.00107
Related work:
https://sigmoid.social/@LChoshen/109291730087194880
We test models from the @huggingface hub,
rank them efficiently (linear probing on one task),
and finetune the best ones on 36 different datasets, shared with you here:
https://ibm.github.io/model-recycling/
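For the curious, here is a rough sketch of the ranking idea (a linear probe), under illustrative assumptions: freeze each candidate model, embed one small probe dataset with it, train only a linear classifier on top, and rank candidates by probe accuracy. The candidate names, probe task, and sizes below are placeholders, not our exact setup.

```python
# Linear-probe ranking sketch: score frozen candidate models by how well a
# linear classifier does on their embeddings of a single probe task.
import numpy as np
import torch
from datasets import load_dataset
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

probe = load_dataset("glue", "sst2", split="train[:2000]")  # small probe set (placeholder task)
texts = probe["sentence"]
labels = np.array(probe["label"])

def embed(model_name, texts, batch_size=32):
    """Return frozen [CLS] embeddings of `texts` from `model_name`."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    feats = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            enc = tok(texts[i:i + batch_size], truncation=True, padding=True,
                      max_length=128, return_tensors="pt")
            feats.append(model(**enc).last_hidden_state[:, 0].numpy())
    return np.concatenate(feats)

candidates = ["roberta-base", "facebook/muppet-roberta-base"]  # any hub checkpoints
scores = {}
for name in candidates:
    X = embed(name, texts)
    n = int(0.8 * len(X))  # simple train/test split of the probe set
    clf = LogisticRegression(max_iter=1000).fit(X[:n], labels[:n])
    scores[name] = clf.score(X[n:], labels[n:])  # probe accuracy = ranking signal

print(sorted(scores.items(), key=lambda kv: -kv[1]))
```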
Next time you finetune, just pick the best one
Why use a worse model?
#NLProc #nlp #MachineLearning #ml
Want to learn more about intertraining?