I enjoyed this paper, "The Problem with Metrics is a Fundamental Problem for AI". In ML, we often glamorize algorithmic or model improvements and ignore what surrounds them, such as the data used, the metrics for success, or a good UI for the humans trying to get something done with our models. It is important that ML systems optimize for the right thing, and they often do not, because metrics for success are chosen too hastily and sloppily.
https://arxiv.org/abs/2002.08512
@glinden It's an interesting perspective from econ to relate Goodhart's Law to the machine learning community. As researchers, we need to care more about the true impact of our tasks than about metric-based optimization itself.
@glinden This, plus I firmly believe that sampling val/test i.i.d. from train is a bad idea nowadays. Val/test sets need to be annotated *much* more carefully than train, because models are now too damn good - better than an average annotator.
I came to this conclusion after our deep dive into the ImageNet benchmark in https://arxiv.org/abs/2006.07159, and I now usually check for this when looking at new datasets - so far, I am almost always disappointed.