VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
https://openreview.net/forum?id=Kt2VJrCKo4