sigmoid.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A social space for people researching, working with, or just interested in AI!

#mathvista

michabbb<p>🔍 Major breakthrough in multimodal AI research:</p><p>The <a href="https://social.vivaldi.net/tags/InfinityMM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>InfinityMM</span></a> dataset launches with 43.4M entries across 4 categories: 10M image descriptions, 24.4M visual instructions, 6M high-quality instructions &amp; 3M <a href="https://social.vivaldi.net/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a>-generated entries</p><p>🧠 Technical highlights:</p><p>The new <a href="https://social.vivaldi.net/tags/AquilaVL2B" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AquilaVL2B</span></a> model uses the <a href="https://social.vivaldi.net/tags/LLaVA" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLaVA</span></a> architecture with the <a href="https://social.vivaldi.net/tags/Qwen25" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Qwen25</span></a> language model &amp; <a href="https://social.vivaldi.net/tags/SigLIP" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SigLIP</span></a> for image processing<br>Despite having only 2B parameters, it achieves state-of-the-art results on multiple benchmarks<br>Exceptional performance: <a href="https://social.vivaldi.net/tags/MMStar" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MMStar</span></a> (54.9%), <a href="https://social.vivaldi.net/tags/MathVista" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MathVista</span></a> (59%), <a href="https://social.vivaldi.net/tags/MMBench" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MMBench</span></a> (75.2%)</p><p>🚀 Training innovation:</p><p>4-stage training process with increasing complexity<br>Combines image recognition, instruction classification &amp; response generation<br>Uses <a href="https://social.vivaldi.net/tags/opensource" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>opensource</span></a> models like RAM++ for data generation</p><p>💡 Industry impact:</p><p>The model was trained on both <a href="https://social.vivaldi.net/tags/Nvidia" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Nvidia</span></a> A100 GPUs &amp; Chinese-made chips<br>The complete dataset &amp; model are available to the research community<br>It shows promising results compared to commercial systems like <a href="https://social.vivaldi.net/tags/GPT4V" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPT4V</span></a></p><p><a href="https://arxiv.org/abs/2410.18558" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/abs/2410.18558</span><span class="invisible"></span></a></p>
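The dataset composition quoted in the post can be sanity-checked with a few lines of Python — a minimal sketch using only the category sizes given above (in millions of entries):

```python
# InfinityMM category sizes as quoted in the post, in millions of entries.
categories = {
    "image descriptions": 10.0,
    "visual instructions": 24.4,
    "high-quality instructions": 6.0,
    "AI-generated data": 3.0,
}

# The four categories should add up to the 43.4M headline figure.
total = sum(categories.values())
print(f"total: {total:.1f}M entries")

# Share of each category in the full dataset.
for name, size in categories.items():
    print(f"{name}: {size / total:.1%}")
```

Visual instructions dominate at just over half the dataset, which is consistent with the post's framing of InfinityMM as primarily an instruction-tuning corpus.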
michabbb<p><a href="https://social.vivaldi.net/tags/TechNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TechNews</span></a>: <a href="https://social.vivaldi.net/tags/Qwen" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Qwen</span></a> Releases New <a href="https://social.vivaldi.net/tags/VisionLanguage" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>VisionLanguage</span></a> <a href="https://social.vivaldi.net/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> Qwen2-VL 🖥️👁️</p><p>After a year of development, <a href="https://social.vivaldi.net/tags/Qwen" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Qwen</span></a> has released Qwen2-VL, its latest <a href="https://social.vivaldi.net/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> system for interpreting visual and textual information. 🚀</p><p>Key Features of Qwen2-VL:</p><p>1. 🖼️ Image Understanding:</p><p> Qwen2-VL delivers strong performance on <a href="https://social.vivaldi.net/tags/VisualUnderstanding" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>VisualUnderstanding</span></a> benchmarks including <a href="https://social.vivaldi.net/tags/MathVista" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MathVista</span></a>, <a href="https://social.vivaldi.net/tags/DocVQA" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DocVQA</span></a>, <a href="https://social.vivaldi.net/tags/RealWorldQA" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RealWorldQA</span></a>, and <a href="https://social.vivaldi.net/tags/MTVQA" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MTVQA</span></a>.</p><p>2. 🎬 Video Analysis:</p><p> Qwen2-VL can analyze videos over 20 minutes in length. 
This is achieved through online streaming capabilities, enabling video-based <a href="https://social.vivaldi.net/tags/QuestionAnswering" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>QuestionAnswering</span></a>, <a href="https://social.vivaldi.net/tags/Dialog" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Dialog</span></a>, and <a href="https://social.vivaldi.net/tags/ContentCreation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ContentCreation</span></a>. <a href="https://social.vivaldi.net/tags/VideoAnalysis" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>VideoAnalysis</span></a></p><p>3. 🤖 Device Integration:</p><p> The <a href="https://social.vivaldi.net/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> can be integrated with <a href="https://social.vivaldi.net/tags/mobile" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>mobile</span></a> phones, <a href="https://social.vivaldi.net/tags/robots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robots</span></a>, and other devices, using its reasoning and decision-making abilities to interpret visual environments and text instructions for device control. <a href="https://social.vivaldi.net/tags/AIAssistants" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIAssistants</span></a> 📱</p><p>4. 🌍 Multilingual Capabilities:</p><p> Qwen2-VL understands text in images across multiple languages. In addition to English and Chinese, it supports most European languages as well as Japanese, Korean, Arabic, and Vietnamese, among others. 
<a href="https://social.vivaldi.net/tags/MultilingualAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>MultilingualAI</span></a></p><p>This release represents an advancement in <a href="https://social.vivaldi.net/tags/ArtificialIntelligence" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ArtificialIntelligence</span></a>, combining visual perception and language understanding. 🧠 Potential applications include <a href="https://social.vivaldi.net/tags/education" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>education</span></a>, <a href="https://social.vivaldi.net/tags/healthcare" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>healthcare</span></a>, <a href="https://social.vivaldi.net/tags/robotics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotics</span></a>, and <a href="https://social.vivaldi.net/tags/contentmoderation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>contentmoderation</span></a>.</p><p><a href="https://github.com/QwenLM/Qwen2-VL" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">github.com/QwenLM/Qwen2-VL</span><span class="invisible"></span></a></p>
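Qwen2-VL ships with Hugging Face transformers support, and its processor expects multimodal prompts as a list of role/content messages whose content mixes image and text parts (the format shown in the QwenLM/Qwen2-VL README). A minimal sketch of assembling such a payload — the image URL, question, and helper function are illustrative, and no model is downloaded here:

```python
# Build a Qwen2-VL-style chat payload: a user turn that pairs an image
# part with a text part. The helper and its arguments are illustrative.

def build_vision_message(image_url: str, question: str) -> dict:
    """Return one user turn mixing an image part and a text part."""
    return {
        "role": "user",
        "content": [
            {"type": "image", "image": image_url},
            {"type": "text", "text": question},
        ],
    }

messages = [
    build_vision_message(
        "https://example.com/chart.png",          # illustrative URL
        "What trend does this chart show?",
    )
]

# For actual inference you would (per the model card, not executed here)
# load the model and processor and render this payload with the chat
# template, e.g.:
#   from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
#   processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
#   text = processor.apply_chat_template(messages, add_generation_prompt=True)
print(messages[0]["role"])
```

Keeping image and text as separate typed parts in one turn is what lets the processor interleave visual tokens with the prompt text, and the same structure extends to video parts and multi-image turns.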