michabbb<p>🔍 New large-scale open dataset and model for multimodal AI research:</p><p><a href="https://social.vivaldi.net/tags/InfinityMM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>InfinityMM</span></a> dataset launches with 43.4M entries across 4 categories: 10M image descriptions, 24.4M visual instructions, 6M high-quality instructions & 3M <a href="https://social.vivaldi.net/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a>-generated samples</p><p>🧠 Technical highlights:</p><p>New <a href="https://social.vivaldi.net/tags/AquilaVL2B" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AquilaVL2B</span></a> model uses the <a href="https://social.vivaldi.net/tags/LLaVA" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLaVA</span></a> architecture with a <a href="https://social.vivaldi.net/tags/Qwen25" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Qwen25</span></a> language model & <a href="https://social.vivaldi.net/tags/SigLIP" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SigLIP</span></a> as the image encoder<br>Despite having only 2B parameters, it achieves state-of-the-art results among similarly sized models on multiple benchmarks<br>Benchmark scores: <a href="https://social.vivaldi.net/tags/MMStar" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MMStar</span></a> (54.9%), <a href="https://social.vivaldi.net/tags/MathVista" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MathVista</span></a> (59%), <a href="https://social.vivaldi.net/tags/MMBench" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MMBench</span></a> (75.2%)</p><p>🚀 Training innovation:</p><p>4-stage training process with increasing complexity<br>Combines image recognition, instruction classification & response generation<br>Uses <a href="https://social.vivaldi.net/tags/opensource" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>opensource</span></a> models like RAM++ for data generation</p><p>💡 Industry impact:</p><p>Model trained on both <a href="https://social.vivaldi.net/tags/Nvidia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Nvidia</span></a> A100 GPUs & Chinese chips<br>Complete dataset & model available to the research community<br>Shows promising results compared to commercial systems like <a href="https://social.vivaldi.net/tags/GPT4V" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPT4V</span></a></p><p><a href="https://arxiv.org/abs/2410.18558" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/abs/2410.18558</span><span class="invisible"></span></a></p>