When Dimensionality Hurts: The Role of #LLM Embedding Compression for Noisy Regression Tasks https://d.repec.org/n?u=RePEc:arx:papers:2502.02199&r=&r=cmp
"… suggest that the optimal dimensionality is dependent on the signal-to-noise ratio, exposing the necessity of feature compression in high noise environments. The implication of the result is that researchers should consider the #noise of a task when making decisions about the dimensionality of text.
… findings indicate that sentiment and emotion-based representations do not provide inherent advantages over learned latent features, implying that their previous success in similar tasks may be attributed to #regularisation effects rather than intrinsic informativeness."
#ML #autoencoders #Overfitting