#Embedding


I’m excited to share my newest blog post, "Don't use cosine similarity carelessly":

p.migdal.pl/blog/2025/01/dont-

We often rely on cosine similarity to compare embeddings; it's the "duct tape" of vector comparisons. But just like duct tape, it can quietly mask deeper problems. Sometimes embeddings pick up the "wrong kind" of similarity, matching questions to other questions instead of to their answers, or keying on formatting quirks and typos rather than the text's real meaning.
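To make that failure mode concrete, here's a minimal sketch. The sentence-transformers library and the all-MiniLM-L6-v2 model are my illustrative choices, not necessarily what the post uses:

```python
# Minimal sketch of the "wrong kind of similarity" pitfall.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

question = "How do I reset my password?"
similar_question = "How can I change my password?"
actual_answer = "Go to Settings > Security and click 'Reset password'."

emb = model.encode([question, similar_question, actual_answer])

# Cosine similarity often ranks the paraphrased question above the real
# answer, because the embedding encodes "this is a password question"
# rather than "this text answers the query".
print(util.cos_sim(emb[0], emb[1]))  # question vs. similar question
print(util.cos_sim(emb[0], emb[2]))  # question vs. actual answer
```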

In my post, I discuss what can go wrong with off-the-shelf cosine similarity and share practical alternatives. If you’ve ever wondered why your retrieval system returns oddly matched items or how to refine your embeddings for more meaningful results, this is for you!
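One commonly used fix, shown here as my own illustration rather than the post's specific recommendation, is to re-score retrieved candidates with a cross-encoder, which reads the query and document together instead of comparing two independently computed vectors:

```python
# Illustrative alternative: rerank candidates with a cross-encoder.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I reset my password?"
candidates = [
    "How can I change my password?",
    "Go to Settings > Security and click 'Reset password'.",
]

# Higher score means "better answer to the query",
# not merely "similar-looking text".
scores = reranker.predict([(query, c) for c in candidates])
print(list(zip(candidates, scores)))
```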
I want to thank Max Salamonowicz and Grzegorz Kossakowski for their feedback after my flash talk at the Warsaw AI Breakfast, Rafał Małanij for inviting me to speak at the Python Summit, and everyone who asked curious questions at the conference and on LinkedIn.

p.migdal.pl · Don't use cosine similarity carelessly
"Cosine similarity - the duct tape of AI. Convenient but often misused. Let's find out how to use it better."

🆕 Encoder-only model that's a direct drop-in replacement for existing BERT models (see the sketch below)
- First major upgrade to BERT-style models in six years
- Significantly reduced processing costs for large-scale applications
- Enables longer document processing without chunking
- Better performance on retrieval tasks
- Suitable for consumer-grade GPU deployment
#llm #ai #embedding
huggingface.co/blog/modernbert

huggingface.co · Finally, a Replacement for BERT: Introducing ModernBERT
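To illustrate the drop-in claim, here's a minimal fill-in-the-mask sketch, assuming the answerdotai/ModernBERT-base checkpoint on Hugging Face and a recent transformers release:

```python
# Minimal sketch: ModernBERT used through the same masked-LM interface
# as a classic BERT checkpoint.
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "answerdotai/ModernBERT-base"  # swap in "bert-base-uncased" to compare
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "Paris is the [MASK] of France."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Find the masked position and read off the top prediction.
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
predicted_id = outputs.logits[0, mask_index].argmax(-1)
print(tokenizer.decode(predicted_id))
```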