I don’t train from scratch, I use RoBERTa🧐
Wait…
Why not cross-encoder/stsb-roberta? Or facebook/muppet-roberta?

We automatically identify the best models on 🤗 (periodically)

Just pick the best one
and finetune on your task

ibm.github.io/model-recycling/
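For example, here is a minimal sketch of what "pick the best one and finetune" might look like with 🤗 transformers. The model id and dataset are placeholders, not the leaderboard's actual recommendation: swap in the current best checkpoint for your architecture and your own task.

```python
# Minimal sketch: finetune a recycled base model instead of the vanilla one.
# "roberta-base" is a placeholder; check ibm.github.io/model-recycling/
# for the current best checkpoint per architecture.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "roberta-base"  # swap in the leaderboard's recommended model id

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(
    base,
    num_labels=2,
    ignore_mismatched_sizes=True,  # recycled checkpoints may ship a head sized for another task
)

# Your target task goes here; IMDB is just a stand-in binary classification task.
ds = load_dataset("imdb")
ds = ds.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3),
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    tokenizer=tokenizer,  # default collator pads each batch dynamically
)
trainer.train()
```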

Model Recycling: the best model per architecture. Comparing finetuned models from HF as base models for future finetuning.

Finetuned models are known to sometimes be better base models than the vanilla pretrained model.

We have found that
🟪 while 5/6 of finetuned models are not good base models,
🟩 strong models outperform the pretrained model consistently and often

(Fig.: gain of each finetuned model over the pretrained baseline; each row is a model)

Yet in practice, this knowledge is rarely applied. We tested T5, RoBERTa, and BERT finetuned models (ours and @huggingface's) and the best are... THE BEST

Hence, we decided to share it with the world, so people would test it for themselves:

Reposting it from the birdsite, but we are already working on improvements and will share updates soon. Please keep in touch if you have any ideas, questions, or comments.

Paper: arxiv.org/abs/2211.00107

Related work:
sigmoid.social/@LChoshen/10929

arXiv.org: Where to start? Analyzing the potential value of intermediate models

Previous studies observed that finetuned models may be better base models than the vanilla pretrained model. Such a model, finetuned on some source dataset, may provide a better starting point for a new finetuning process on a desired target dataset. Here, we perform a systematic analysis of this intertraining scheme over a wide range of English classification tasks. Surprisingly, our analysis suggests that the potential intertraining gain can be analyzed independently for the target dataset under consideration and for a base model being considered as a starting point. This is in contrast to the current perception that the alignment between the target dataset and the source dataset used to generate the base model is a major factor in determining intertraining success. We analyze different aspects that contribute to each. Furthermore, we leverage our analysis to propose a practical and efficient approach to determine if and how to select a base model in real-world settings. Last, we release an updating ranking of the best models in the HuggingFace hub per architecture: https://ibm.github.io/model-recycling/

We test models from the @huggingface hub and
rank them efficiently (linear probing on one task)
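For intuition, here is a rough sketch of the linear-probing idea: freeze the encoder, fit a linear classifier on its features, and rank models by the probe's held-out accuracy. This illustrates the approach under our own assumptions, not the exact evaluation code, and the helper name is made up.

```python
# Rough sketch of linear probing to rank candidate base models cheaply:
# freeze each encoder, extract its [CLS] features once, fit a linear
# classifier, and rank models by held-out accuracy. Illustrative only.
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from transformers import AutoModel, AutoTokenizer

def probe_score(model_id, train_texts, train_labels, dev_texts, dev_labels):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    encoder = AutoModel.from_pretrained(model_id).eval()  # frozen: no finetuning

    @torch.no_grad()
    def embed(texts):
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        return encoder(**batch).last_hidden_state[:, 0].numpy()  # [CLS] features

    clf = LogisticRegression(max_iter=1000).fit(embed(train_texts), train_labels)
    return accuracy_score(dev_labels, clf.predict(embed(dev_texts)))

# Rank candidates by their probe score on a single held-out task:
# scores = {m: probe_score(m, tr_x, tr_y, dev_x, dev_y) for m in candidates}
```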

The best ones, we finetune on 36 different datasets and share with you here:
ibm.github.io/model-recycling/

Next time you finetune, just pick the best one

Why use a worse model?


Fixed link: ibm.github.io/model-recycling/
