Feature selection (or model selection in more general terms) is one of the most critical, and perhaps one of the most opaque, components of the predictive workflow.1 Burnham and Anderson (2004) frame the problem in the familiar language of a bias-variance trade-off: on the one hand, a more parsimonious model has fewer parameters and hence a lower risk of overfitting; on the other hand, more features increase the amount of information incorporated into the fitting process. How to select the appropriate features remains a matter of some debate, with an almost unmanageable host of different algorithms to navigate.
In this analysis, I throw the proverbial kitchen sink at a macroeconomic feature selection problem. The lineup runs from correlation filtering to Bayesian model averaging, from lasso regression to random forest importance, and from genetic algorithms to Laplacian scores. The aim is to explore relationships and (dis)agreements among a multidisciplinary array of 23 feature selection algorithms drawn from several of what Molnar (2022) calls “modeling mindsets”, and to compare the robustness, breadth and out-of-sample relevance of the information they select.
As it turns out, there are a few things to learn, particularly from the way the algorithms partition naturally into four distinct clusters. In the next section, I outline key findings.