We curated and analysed thousands of benchmarks -- to better understand the (mis)measurement of AI!
We cover all of #NLProc and #ComputerVision.
Now live at Nature Communications: https://nature.com/articles/s41467-022-34591-0
Benchmarks are crucial to measuring and steering AI progress.
Their number has become astounding.
Each has unique patterns of activity, improvement and eventual stagnation/saturation. Together they form the intricate story of global progress in AI.
We found that a sizable portion of benchmarks has effectively reached saturation ("can't get better than this") or stagnation ("could get better, but we don't know how / nobody tries"). Still, plenty of benchmarks remain dynamic!
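To make that concrete, here is a minimal Python sketch of how a benchmark's SOTA trajectory could be flagged as saturated, stagnant or dynamic. The thresholds and field names are illustrative assumptions, not the paper's exact criteria:

```python
from datetime import date

def classify_benchmark(sota_history, ceiling=100.0,
                       near_ceiling=0.99, stale_years=3, today=None):
    """Toy classifier for a benchmark's state of progress.

    sota_history: list of (date, score) pairs, one per new state-of-the-art
    result, with scores on a 0..ceiling scale (e.g. accuracy in %).
    Thresholds are illustrative, not the paper's exact criteria.
    """
    today = today or date.today()
    history = sorted(sota_history)
    last_improvement = history[-1][0]
    best = max(score for _, score in history)

    if best >= near_ceiling * ceiling:
        return "saturated"   # essentially no headroom left
    if (today - last_improvement).days > stale_years * 365:
        return "stagnant"    # headroom left, but no recent SOTA improvement
    return "dynamic"         # still being actively improved

# e.g. last SOTA jump in 2019, far from the ceiling -> "stagnant"
print(classify_benchmark([(date(2017, 5, 1), 61.0), (date(2019, 3, 1), 64.2)]))
```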
How do benchmark activity and improvement develop over time and across domains? We mapped all the data into an #RDF #KnowledgeGraph / ontology and devised novel, highly condensed visualisation methods.
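Roughly what the graph view enables, as a tiny rdflib sketch. The namespace and property names here are made up for illustration; they are not our actual ontology:

```python
from rdflib import Graph, Literal, Namespace, RDF, XSD

# Illustrative namespace and properties only; not the ontology from the paper.
EX = Namespace("http://example.org/ai-benchmarks#")

g = Graph()
g.bind("ex", EX)

# One benchmark result: a model scoring 90.1 on SQuAD in 2019.
result = EX["result/squad-xlnet-2019"]
g.add((result, RDF.type, EX.BenchmarkResult))
g.add((result, EX.onBenchmark, EX["benchmark/SQuAD"]))
g.add((result, EX.byModel, Literal("XLNet")))
g.add((result, EX.score, Literal(90.1, datatype=XSD.double)))
g.add((result, EX.year, Literal(2019)))

# SPARQL lets us ask cross-benchmark questions, e.g. best score per benchmark.
q = """
PREFIX ex: <http://example.org/ai-benchmarks#>
SELECT ?benchmark (MAX(?score) AS ?best)
WHERE {
  ?r a ex:BenchmarkResult ;
     ex:onBenchmark ?benchmark ;
     ex:score ?score .
}
GROUP BY ?benchmark
"""
for row in g.query(q):
    print(row.benchmark, row.best)
```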
Most benchmark datasets are unpopular.
Traits correlated with popularity (see the sketch after this list):
- versatility (covering more tasks, having more sub-benchmarks)
- having a dedicated leaderboard
- being created by people from top institutions
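A rough sketch of how such trait/popularity associations could be checked. The column names and the use of Spearman rank correlation are illustrative assumptions, not the paper's exact methodology:

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical per-benchmark table; columns and values are illustrative only.
df = pd.DataFrame({
    "n_results":       [120, 4, 35, 2, 310],   # proxy for popularity
    "n_tasks":         [3, 1, 2, 1, 5],        # versatility
    "has_leaderboard": [1, 0, 1, 0, 1],
    "top_institution": [1, 0, 0, 0, 1],
})

for trait in ["n_tasks", "has_leaderboard", "top_institution"]:
    rho, p = spearmanr(df[trait], df["n_results"])
    print(f"{trait}: Spearman rho={rho:.2f} (p={p:.3f})")
```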
The biggest obstacle and limitation for our work is data availability.
This analysis was only possible thanks to data from the fabulous Papers with Code project (shoutout to @rstojnic )
As a community, we should do more to incentivize depositing results in Papers with Code. There's a lot of potential added value!