@jesper I've thought a lot about random undersampling, and my intuition is that we should preserve the Zipf/Pareto shape of the class distribution but make it less pronounced. I think this can be done by choosing a target line in log-log space with a lower intercept and a shallower slope than the natural fit. We then compute the ratio of the target line to the natural fit in real space and use it as the probability of keeping each data point of that class.
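A minimal sketch of the idea, assuming NumPy and a 1-D array of class labels. The function name `zipf_undersample` and the knobs `intercept_shift` and `slope_scale` are hypothetical. The natural fit is `log c = a + b log r` over class rank `r`. The target line uses `a + intercept_shift` and `b * slope_scale`. The keep probability per class is `exp((a_t - a) + (b_t - b) log r)`. Because the target slope is shallower, this ratio grows with rank and can exceed 1 for tail classes, so it is clipped to 1. This is an assumption on my part, and it means rare classes are kept whole.

```python
import numpy as np

def zipf_undersample(labels, intercept_shift=-1.0, slope_scale=0.5, seed=0):
    """Return a boolean keep-mask that flattens a Zipf-like class distribution.

    Hypothetical helper: fits log(count) vs log(rank), defines a target line with
    a lower intercept and shallower slope, and keeps each sample with probability
    target/natural evaluated at its class rank (clipped to [0, 1]).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)

    # Rank classes by frequency: rank 1 = most frequent class.
    order = np.argsort(-counts)
    ranks = np.empty(len(classes), dtype=float)
    ranks[order] = np.arange(1, len(classes) + 1)

    # Natural fit in log-log space: log c = a + b * log r (b is negative for Zipf).
    b, a = np.polyfit(np.log(ranks), np.log(counts), 1)

    # Target line: lower intercept, shallower (less negative) slope.
    a_t = a + intercept_shift
    b_t = b * slope_scale

    # Ratio target/natural in real space, used as a per-class keep probability.
    keep_prob = np.exp((a_t - a) + (b_t - b) * np.log(ranks))
    keep_prob = np.clip(keep_prob, 0.0, 1.0)

    # Map each sample to its class probability and draw independently.
    return rng.random(len(labels)) < keep_prob[inverse]
```

With `slope_scale=0.5` and a natural slope near -1, the head classes are thinned while classes past a certain rank get probability 1, so the ranking (and a Zipf-like shape) survives with a flatter tail.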