Matthias Samwald

We curated and analysed thousands of benchmarks -- to better understand the (mis)measurement of AI! 📏🤖🔬

We cover all of and .

Now live at Nature Communications: nature.com/articles/s41467-022

Benchmarks are crucial to measuring and steering AI progress.

Their number has become astounding.

Each has unique patterns of activity, improvement and eventual stagnation/saturation. Together they form the intricate story of global progress in AI. 🌐

We found that a sizable portion of benchmarks have reached saturation ("can't get better than this") or stagnation ("could get better, but we don't know how / nobody tries"). But there are still plenty of dynamic benchmarks as well!
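A minimal sketch of how such a saturation/stagnation split could be operationalised from a benchmark's (year, score) result history. The thresholds (score ceiling, recency window, minimum gain) and the function itself are illustrative assumptions, not the paper's actual criteria:

```python
# Hypothetical sketch: classify a benchmark's SOTA trajectory as
# "saturated", "stagnant", or "dynamic" from (year, score) result pairs.
# Thresholds are illustrative assumptions, not the paper's definitions.

def classify_trajectory(results, ceiling=0.99, window_years=3, min_gain=0.01):
    """results: list of (year, score), score normalised to [0, 1]."""
    results = sorted(results)
    best = max(score for _, score in results)
    if best >= ceiling:
        return "saturated"   # effectively at the performance ceiling
    last_year = results[-1][0]
    recent = [s for y, s in results if y > last_year - window_years]
    earlier = [s for y, s in results if y <= last_year - window_years]
    if earlier and (not recent or max(recent) - max(earlier) < min_gain):
        return "stagnant"    # headroom remains, but no recent improvement
    return "dynamic"
```

For example, a benchmark whose best score only crept from 0.85 to 0.853 over several years would come out "stagnant" under these toy thresholds.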

How do benchmark activity and improvement develop over time and across domains? We mapped all data into an / ontology and devised novel, highly condensed visualisation methods.
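The kind of condensation involved can be sketched as a simple aggregation: collapsing individual result records into a domain-by-year activity table, the raw material for a compact heatmap-style view. The record layout and toy data here are assumptions for illustration:

```python
# Hypothetical sketch: condense per-benchmark result records into a
# (domain, year) -> activity count table. Toy data, illustrative only.
from collections import defaultdict

records = [  # (domain, benchmark, year)
    ("vision", "ImageNet", 2017), ("vision", "ImageNet", 2019),
    ("nlp", "SQuAD", 2018), ("nlp", "GLUE", 2019), ("nlp", "GLUE", 2019),
]

activity = defaultdict(int)
for domain, _benchmark, year in records:
    activity[(domain, year)] += 1

for (domain, year), n in sorted(activity.items()):
    print(f"{domain:8s} {year}: {n} result(s)")
```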

Most benchmark datasets are unpopular. 🥲

Traits correlated with popularity:

- being versatile (covering more tasks, having more sub-benchmarks)
- having a dedicated leaderboard
- being created by people from top institutions

The biggest obstacle and limitation for our work is data availability.

This analysis was only possible using data from the fabulous Papers with Code project... (shout-out to @rstojnic)

As a community, we should do more to incentivize depositing results in Papers with Code! There's lots of potential added value.