#ngram
Karsten Schmidt<p>Recently I've combined various functions which I've been using in other projects (e.g. my personal PKM toolchain) and published them as a new library <a href="https://thi.ng/text-analysis" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">thi.ng/text-analysis</span><span class="invisible"></span></a> for better re-use:</p><p>- customizable, composable &amp; extensible tokenization (transducer based)<br>- ngram generation<br>- Porter-stemming &amp; stopword removal<br>- vocabulary (bi-directional index) creation<br>- dense &amp; sparse multi-hot vector encoding/decoding<br>- histograms (incl. sorted versions)<br>- tf-idf (term frequency &amp; inverse document frequency), multiple strategies<br>- k-means clustering (with k-means++ initialization &amp; customizable distance metrics)<br>- similarity/distance functions (dense &amp; sparse versions)<br>- central terms extraction</p><p>The attached code example (also in the project readme) uses this package to create a clustering of all ~210 <a href="https://mastodon.thi.ng/tags/ThingUmbrella" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ThingUmbrella</span></a> packages, based on their assigned tags/keywords...</p><p>The library is not intended to be a full-blown NLP solution, but I keep on finding myself running into these functions/concepts quite often, and maybe you'll find them useful too...</p><p><a href="https://mastodon.thi.ng/tags/Text" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Text</span></a> <a href="https://mastodon.thi.ng/tags/Analysis" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Analysis</span></a> <a href="https://mastodon.thi.ng/tags/Cluster" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Cluster</span></a> <a href="https://mastodon.thi.ng/tags/KMeans" class="mention hashtag" rel="nofollow noopener" 
target="_blank">#<span>KMeans</span></a> <a href="https://mastodon.thi.ng/tags/TFIDF" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TFIDF</span></a> <a href="https://mastodon.thi.ng/tags/Ngram" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Ngram</span></a> <a href="https://mastodon.thi.ng/tags/Vector" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Vector</span></a> <a href="https://mastodon.thi.ng/tags/TypeScript" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TypeScript</span></a> <a href="https://mastodon.thi.ng/tags/JavaScript" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>JavaScript</span></a></p>
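<p>To make two of the listed concepts concrete, here is a minimal standalone sketch of n-gram generation and one common tf-idf variant (tf = count / document length, idf = ln(N / df)). It does not use thi.ng/text-analysis, and the names <code>tokenize</code>, <code>ngrams</code>, <code>countTerms</code> and <code>tfidf</code> are hypothetical helpers chosen for illustration, not that library's API.</p>

```typescript
// Illustrative sketch only: these helpers are assumptions,
// NOT the API of thi.ng/text-analysis.

type Weights = Record<string, number>;

/** Lowercase the text and split it into alphanumeric word tokens. */
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

/** Return every contiguous run of `n` tokens. */
function ngrams(tokens: string[], n: number): string[][] {
  const result: string[][] = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

/** Count how often each term occurs in one document. */
function countTerms(doc: string[]): Weights {
  const counts: Weights = Object.create(null);
  for (const term of doc) counts[term] = (counts[term] || 0) + 1;
  return counts;
}

/** One tf-idf variant: tf = count / doc length, idf = ln(N / df). */
function tfidf(docs: string[][]): Weights[] {
  const perDoc = docs.map(countTerms);
  // Document frequency: number of documents containing each term.
  const df: Weights = Object.create(null);
  for (const counts of perDoc) {
    for (const term of Object.keys(counts)) df[term] = (df[term] || 0) + 1;
  }
  return perDoc.map((counts, i) => {
    const weights: Weights = Object.create(null);
    for (const term of Object.keys(counts)) {
      const tf = counts[term] / docs[i].length;
      const idf = Math.log(docs.length / df[term]);
      weights[term] = tf * idf;
    }
    return weights;
  });
}

// Usage: bigrams of a sentence, then tf-idf over three tiny keyword "documents".
const bigrams = ngrams(tokenize("the quick brown fox"), 2);
const docs = ["vector math", "vector clustering", "text tokenization"].map(tokenize);
const weights = tfidf(docs);
console.log(bigrams.map((g) => g.join(" ")));
console.log(weights[0]);
```

<p>In this toy corpus "vector" appears in two of three documents, so it gets a lower weight than "math", which appears in only one. Other tf and idf strategies (smoothing, log-scaled tf) would give different numbers.</p>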
petersuber<p>Fellow finicky writers: Do you prefer "advance notice" or "advanced notice"?</p><p>Both are attested. But FYI, <a href="https://fediscience.org/tags/ngram" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ngram</span></a> says that "advance notice" is much more common, even if it's in decline.<br><a href="https://books.google.com/ngrams/graph?content=advance+notice%2C+advanced+notice&amp;year_start=1800&amp;year_end=2022&amp;corpus=en&amp;smoothing=3" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">books.google.com/ngrams/graph?</span><span class="invisible">content=advance+notice%2C+advanced+notice&amp;year_start=1800&amp;year_end=2022&amp;corpus=en&amp;smoothing=3</span></a></p>
François Renaville 🇺🇦🇪🇺<p><a href="https://mastodon.social/tags/Google" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Google</span></a> Books Is Indexing <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a>-Generated Books</p><p>👉 <a href="https://mastodon.social/tags/GoogleBooks" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GoogleBooks</span></a> is indexing low-quality, AI-generated books that will turn up in search results and could affect the Google <a href="https://mastodon.social/tags/Ngram" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Ngram</span></a> Viewer, an important tool used by researchers to track <a href="https://mastodon.social/tags/language" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>language</span></a> use throughout history.</p><p><a href="https://timesofindia.indiatimes.com/technology/tech-news/google-books-important-source-for-academics-may-have-a-bot-problem/articleshow/109089043.cms" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">timesofindia.indiatimes.com/te</span><span class="invisible">chnology/tech-news/google-books-important-source-for-academics-may-have-a-bot-problem/articleshow/109089043.cms</span></a> </p><p><a href="https://mastodon.social/tags/GoogleNgram" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GoogleNgram</span></a> <a href="https://mastodon.social/tags/NgramViewer" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>NgramViewer</span></a> <a href="https://mastodon.social/tags/linguistics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>linguistics</span></a> <a href="https://mastodon.social/tags/diachrony" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>diachrony</span></a> <a 
href="https://mastodon.social/tags/diachroniclinguistics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>diachroniclinguistics</span></a> <a href="https://mastodon.social/tags/research" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>research</span></a> <a href="https://mastodon.social/tags/languages" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>languages</span></a> <a href="https://mastodon.social/tags/aigeneratedcontent" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>aigeneratedcontent</span></a> <a href="https://mastodon.social/tags/AIgeneratedBooks" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIgeneratedBooks</span></a></p>
Harald Sack<p>One of the basic questions we tackle when working towards statistical language models is &quot;Can we predict a word?&quot;<br />This was also one of the intro questions to the students last Wednesday in our <a href="https://sigmoid.social/tags/ise2023" class="mention hashtag" rel="tag">#<span>ise2023</span></a> lecture no.4, when we were introducing simple n-gram language models.</p><p><a href="https://sigmoid.social/tags/nlp" class="mention hashtag" rel="tag">#<span>nlp</span></a> <a href="https://sigmoid.social/tags/lecture" class="mention hashtag" rel="tag">#<span>lecture</span></a> <a href="https://sigmoid.social/tags/ngram" class="mention hashtag" rel="tag">#<span>ngram</span></a> <a href="https://sigmoid.social/tags/languagemodels" class="mention hashtag" rel="tag">#<span>languagemodels</span></a> <a href="https://sigmoid.social/tags/language" class="mention hashtag" rel="tag">#<span>language</span></a> <a href="https://sigmoid.social/tags/aiart" class="mention hashtag" rel="tag">#<span>aiart</span></a> <a href="https://sigmoid.social/tags/stablediffusion" class="mention hashtag" rel="tag">#<span>stablediffusion</span></a> <a href="https://sigmoid.social/tags/creativeAI" class="mention hashtag" rel="tag">#<span>creativeAI</span></a> <span class="h-card" translate="no"><a href="https://sigmoid.social/@fizise" class="u-url mention">@<span>fizise</span></a></span> <span class="h-card" translate="no"><a href="https://mastodon.social/@KIT_Karlsruhe" class="u-url mention">@<span>KIT_Karlsruhe</span></a></span> <span class="h-card" translate="no"><a href="https://troet.cafe/@nfdi4ds" class="u-url mention">@<span>nfdi4ds</span></a></span> <span class="h-card" translate="no"><a href="https://nfdi.social/@nfdi4culture" class="u-url mention">@<span>nfdi4culture</span></a></span></p>
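<p>As a rough illustration of the "Can we predict a word?" question, the following sketch trains a bigram model with maximum-likelihood estimates, P(next | prev) = count(prev, next) / count(prev, *). It is not taken from the lecture; the function names, the toy corpus and the <code>&lt;s&gt;</code>/<code>&lt;/s&gt;</code> boundary markers are assumptions made for the example, and no smoothing is applied.</p>

```typescript
// Illustrative bigram language model; names and corpus are assumptions.

type BigramCounts = Record<string, Record<string, number>>;

/** Count how often each word follows each other word. */
function trainBigrams(sentences: string[]): BigramCounts {
  const counts: BigramCounts = Object.create(null);
  for (const sentence of sentences) {
    const words = sentence.toLowerCase().split(/\s+/).filter(Boolean);
    // Add sentence-boundary markers so starts and ends are modelled too.
    const tokens = ["<s>"].concat(words, ["</s>"]);
    for (let i = 0; i + 1 < tokens.length; i++) {
      const prev = tokens[i];
      const next = tokens[i + 1];
      if (!counts[prev]) counts[prev] = Object.create(null);
      counts[prev][next] = (counts[prev][next] || 0) + 1;
    }
  }
  return counts;
}

/** Maximum-likelihood estimate of P(next | prev); 0 for unseen contexts. */
function bigramProbability(counts: BigramCounts, prev: string, next: string): number {
  const followers = counts[prev];
  if (!followers) return 0;
  let total = 0;
  for (const word of Object.keys(followers)) total += followers[word];
  return (followers[next] || 0) / total;
}

/** Most frequent follower of `prev`, or undefined if `prev` was never seen. */
function predictNext(counts: BigramCounts, prev: string): string | undefined {
  const followers = counts[prev];
  if (!followers) return undefined;
  let best: string | undefined;
  let bestCount = 0;
  for (const word of Object.keys(followers)) {
    if (followers[word] > bestCount) {
      best = word;
      bestCount = followers[word];
    }
  }
  return best;
}

// Usage on a tiny toy corpus.
const model = trainBigrams(["the cat sat", "the cat ran", "the dog sat"]);
console.log(predictNext(model, "the"));
console.log(bigramProbability(model, "the", "cat"));
```

<p>With this corpus "the" is followed by "cat" twice and "dog" once, so the model predicts "cat" with probability 2/3. Unseen word pairs get probability 0, which is the problem smoothing techniques are meant to address.</p>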