I enjoyed this paper, "The Problem with Metrics is a Fundamental Problem for AI". In ML, we often glamorize algorithmic or model improvements and ignore what surrounds them, such as the data used, the metrics for success, or a good UI for the humans trying to get something done with our models. It is important that ML systems optimize for the right thing, and they often do not, because metrics for success are chosen too hastily and sloppily.
https://arxiv.org/abs/2002.08512
@glinden It's an interesting perspective from econ to relate Goodhart's Law to the machine learning community. As researchers, we need to care more about the true impact of our tasks than about metric-based optimization itself.
@glinden This, plus I firmly believe that sampling val/test i.i.d. from train is a bad idea nowadays. Val/test sets need to be annotated *much* more carefully than train, because models are now too damn good - better than an average annotator.
I came to this conclusion after our deep dive into the ImageNet benchmark in https://arxiv.org/abs/2006.07159, and I now usually check for this when looking at new datasets - so far, I am almost always disappointed.