Max Vladymyrov 🇺🇦

ML algorithms need lots of data and are prone to catastrophic forgetting. We present a new method for continual few-shot learning, bringing us closer to the way humans learn: sample efficient, while maintaining long-term retention.
📜arxiv.org/abs/2301.04584

🧵 below:

arXiv.org
Continual HyperTransformer: A Meta-Learner for Continual Few-Shot Learning
We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined using a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. In order to learn from a continual sequence of tasks, we propose to recursively re-use the generated weights as input to the HT for the next task. This way, the generated CNN weights themselves act as a representation of previously learned tasks, and the HT is trained to update these weights so that the new task can be learned without forgetting past tasks. This approach is different from most continual learning algorithms that typically rely on using replay buffers, weight regularization or task-dependent architectural changes. We demonstrate that our proposed Continual HyperTransformer method equipped with a prototypical loss is capable of learning and retaining knowledge about past tasks for a variety of scenarios, including learning from mini-batches, and task-incremental and class-incremental learning scenarios.
#AI #CV #NewPaper

Consider the problem of learning from a sequence of T tasks, each described by only a few labeled samples, without forgetting previous tasks while learning new ones.
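
For concreteness, here is one way to represent such a stream (an illustrative sketch; the names are placeholders, not from our code):

```python
from typing import List, NamedTuple
import numpy as np

class Episode(NamedTuple):
    """One few-shot task: a small labeled support set plus a query set."""
    support_x: np.ndarray  # [n_way * k_shot, ...] support inputs
    support_y: np.ndarray  # [n_way * k_shot] support labels
    query_x: np.ndarray    # [n_query, ...] query inputs
    query_y: np.ndarray    # [n_query] query labels

# A continual few-shot problem is an ordered sequence of T such episodes.
# After learning episode t, the model is evaluated on the query sets of
# every episode seen so far (0..t), so forgetting is measured directly.
TaskSequence = List[Episode]
```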

To solve this problem, we used the recently proposed HyperTransformer (HT, arxiv.org/abs/2201.04182), a Transformer-based hypernetwork that generates CNN weights directly from the few-shot task description.
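
As a rough intuition only (this toy stand-in is heavily simplified and is not the HT itself: the real model attends over support-set tokens and emits weights for every CNN layer, while here we just pool the support set and emit a linear head):

```python
import jax
import jax.numpy as jnp

def toy_hypernetwork(params, support_emb, support_y, num_classes):
    # Pool the labeled support set into a single task descriptor, then map it
    # through a small MLP to the weights of a linear classifier head.
    # The actual HyperTransformer instead uses attention over support tokens
    # and generates the weights of each CNN layer.
    one_hot = jax.nn.one_hot(support_y, num_classes)                     # [S, C]
    task_vec = jnp.concatenate([support_emb, one_hot], axis=-1).mean(0)  # [D + C]
    hidden = jax.nn.relu(task_vec @ params["w1"] + params["b1"])
    flat = hidden @ params["w2"] + params["b2"]
    return flat.reshape(support_emb.shape[-1], num_classes)  # generated head weights
```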

The idea we propose is simple: recursively reuse the weights generated for the previous task as input to the HT for the next one. By doing this, the CNN weights themselves act as a representation of previously learned tasks.
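
In sketch form, the recursion is just a fold over the task sequence (placeholder names; `generate_weights` stands in for one forward pass of the HT, and `Episode` is the structure from the sketch above):

```python
from typing import Any, Callable, List

def continual_weight_generation(generate_weights: Callable,
                                ht_params: Any,
                                episodes: List[Episode],
                                init_weights: Any) -> List[Any]:
    # generate_weights(ht_params, support_x, support_y, prev_weights) -> weights
    # stands in for one forward pass of the hypernetwork.
    weights = init_weights
    per_task_weights = []
    for ep in episodes:
        # The weights from the previous task are part of the input, so the
        # newly generated weights carry the memory of everything seen so far.
        weights = generate_weights(ht_params, ep.support_x, ep.support_y, weights)
        per_task_weights.append(weights)
    return per_task_weights
```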

Continual HyperTransformer is trained to update these weights in a way that enables the new task to be learned w/o forgetting past tasks. Unlike other continual learning methods, we do not rely on replay buffers, special regularization, or task-dependent architectural changes.
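
Schematically, the training loop looks roughly like this (a simplified sketch, reusing the placeholders above): after generating weights for task t, evaluate them on the query sets of all tasks seen so far, so the hypernetwork is explicitly penalized for forgetting.

```python
def continual_training_loss(generate_weights, loss_fn, ht_params,
                            episodes, init_weights):
    # loss_fn(weights, query_x, query_y) -> scalar loss on one query set.
    weights, total = init_weights, 0.0
    for t, ep in enumerate(episodes):
        weights = generate_weights(ht_params, ep.support_x, ep.support_y, weights)
        # Accumulate the loss over the current task *and* all earlier tasks.
        for past in episodes[: t + 1]:
            total += loss_fn(weights, past.query_x, past.query_y)
    return total
```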

We have also replaced the fixed-dimensional cross-entropy loss with a more flexible Prototypical loss. This allows us to project an increasing number of classes from different tasks to the same embedding space without changing the architecture of the generated CNN.
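
For reference, the standard prototypical loss looks like this (a minimal sketch of the textbook formulation; the exact variant in the paper may differ in details):

```python
import jax
import jax.numpy as jnp

def prototypical_loss(support_emb, support_y, query_emb, query_y, num_classes):
    # Prototype = mean support embedding of each class.
    one_hot = jax.nn.one_hot(support_y, num_classes)                  # [S, C]
    counts = one_hot.sum(axis=0)[:, None]                             # [C, 1]
    prototypes = (one_hot.T @ support_emb) / counts                   # [C, D]
    # Classify each query with a softmax over negative squared distances to
    # the prototypes; adding classes only adds prototypes, so the generated
    # CNN's architecture never has to change.
    d2 = jnp.sum((query_emb[:, None, :] - prototypes[None]) ** 2, axis=-1)  # [Q, C]
    logp = jax.nn.log_softmax(-d2, axis=-1)
    return -jnp.mean(jnp.take_along_axis(logp, query_y[:, None], axis=-1))
```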

We tested the model on three different use cases. First, when classes do not change between the tasks (i.e. tasks = mini-batches), we showed that running a 5-shot problem continually (i.e. running 5 tasks, one example at a time) is comparable to running the same 5-shot problem as a single task!

Second, when classes for all the tasks come from the same distribution, we observed positive backward transfer, where the accuracy on past tasks improves after learning a subsequent task.

Finally, our model can also learn when each task has its own semantic meaning and comes from a separate distribution. It performs much better than the baseline that trains a single embedding for all the tasks.

This work is done with my wonderful colleagues Andrey Zhmoginov and Mark Sandler from Google Research. Please check out the paper at arxiv.org/abs/2301.04584 and don’t hesitate to reach out if you have any questions.
