An Empirical Map of Feature Selection Algorithms

Feature selection (or, more generally, model selection) is one of the most critical, and perhaps most opaque, components of the predictive workflow. Burnham and Anderson (2004) frame the problem in the familiar language of a bias-variance trade-off: on the one hand, a more parsimonious model has fewer parameters and hence reduces the risk of overfitting; on the other hand, more features increase the amount of information incorporated into the fitting process. How to select the appropriate features remains a matter of some debate, with an almost unmanageable host of different algorithms to navigate.

In this analysis, I throw the proverbial kitchen sink at a macroeconomic feature selection problem: from correlation filtering all the way to Bayesian model averaging, from lasso regression to random forest importance, from genetic algorithms to Laplacian scores. The aim is to explore relationships and (dis)agreements among a multidisciplinary array of feature selection algorithms (23 in total), drawn from several of what Molnar (2022) calls “modeling mindsets”, and to examine the comparative robustness, breadth and out-of-sample relevance of the selected information.
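As a purely illustrative sketch (not the code used in the analysis), two of these selection criteria, a simple correlation filter and the lasso, could be compared side by side in R as follows. The data below is a simulated placeholder for the macroeconomic data set used in the post.

```r
# Illustrative sketch: compare the features chosen by a correlation filter
# with those retained by a cross-validated lasso regression.
library(glmnet)

set.seed(123)
X <- matrix(rnorm(200 * 10), 200, 10, dimnames = list(NULL, paste0("x", 1:10)))
y <- X[, 1] - 0.5 * X[, 3] + rnorm(200)

# Correlation filter: keep the k features most correlated with the target
k <- 3
cor_rank <- sort(abs(cor(X, y)[, 1]), decreasing = TRUE)
filter_selection <- names(cor_rank)[1:k]

# Lasso: keep features with nonzero coefficients at the cross-validated lambda
cv_fit <- cv.glmnet(X, y, alpha = 1)
betas <- coef(cv_fit, s = "lambda.min")[-1, 1]
lasso_selection <- names(betas)[betas != 0]

# Agreement between the two selection methods
intersect(filter_selection, lasso_selection)
```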

As it turns out, there are a few things to learn, particularly in the way the algorithms naturally partition into four distinct clusters. In the next section, I outline the key findings.

Inference in Neural Networks using an Explainable Parameter Encoder Network

A Parameter Encoder Neural Network (PENN; Pfitzinger 2021) is an explainable machine learning technique that solves two problems associated with traditional XAI algorithms:

  1. It permits the calculation of local parameter distributions. Parameter distributions are often more interesting than feature contributions, particularly in economic and financial applications, since the parameters disentangle the effect from the observation (roughly, the contribution is the demeaned product of effect and observation; see the sketch below this list).
  2. It solves a problem of biased contributions that is inherent in many traditional XAI algorithms. Precisely in the setting where neural networks are most powerful, namely interactive, dependent processes, traditional XAI can be biased because it attributes an effect to each feature independently.
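In loose notation (mine, not the paper's), the relationship between a feature's contribution and its local parameter can be sketched as

$$\phi_i(x) \;\approx\; \beta_i(x)\,x_i - \mathbb{E}\bigl[\beta_i(x)\,x_i\bigr],$$

so that recovering the local parameters $\beta_i(x)$ separates the effect from the level of the observed feature.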

By the end of the tutorial, I will have estimated highly nonlinear parameter functions for a simulated regression with three variables.
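The exact data-generating process is defined in the tutorial itself; as a purely illustrative stand-in, a simulation of this kind could look as follows (the coefficient functions below are assumptions, not those used in the post):

```r
# Illustrative simulation only: three features whose local regression
# parameters vary nonlinearly with the features themselves.
set.seed(42)
n <- 1000
x1 <- runif(n, -2, 2)
x2 <- runif(n, -2, 2)
x3 <- runif(n, -2, 2)

# Hypothetical nonlinear parameter functions (not the tutorial's)
beta1 <- sin(pi * x1)
beta2 <- x2^2 - 1
beta3 <- tanh(2 * x3)

y <- beta1 * x1 + beta2 * x2 + beta3 * x3 + rnorm(n, sd = 0.25)
df <- data.frame(y, x1, x2, x3)
```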

A GitHub version of the code can be found here.

tidyfit: Benchmarking regularized regression methods

This workflow demonstrates how tidyfit can be used to easily compare a large number of regularized regression methods in R. Using the Boston house prices data set, the analysis shows that Bayesian methods strongly outperform most alternatives.
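A minimal sketch of what such a comparison could look like is given below. The method identifiers passed to m() and the .cv argument are assumptions based on the package documentation, not code from the post.

```r
# Sketch: compare several regularized regressions on the Boston data
# using tidyfit's regress() interface with cross-validated tuning.
library(dplyr)
library(tidyfit)

boston <- MASS::Boston

fit <- boston %>%
  regress(medv ~ .,
          m("lasso"), m("ridge"), m("enet"), m("bayes"),  # assumed method names
          .cv = "vfold_cv")                               # assumed CV argument

coef(fit)             # tidy coefficient estimates per method
predict(fit, boston)  # tidy fitted values per method
```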

tidyfit: Extending the tidyverse with AutoML

tidyfit is an R package that facilitates and automates linear regression and classification modeling in a tidy environment. The package includes several methods, such as Lasso, PLS and ElasticNet regressions, and can be augmented with custom methods. tidyfit builds on the tidymodels suite but emphasizes automated modeling with a focus on the linear regression and classification coefficients, which are its primary output.
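The same tidy workflow carries over to classification. The sketch below is an assumption based on the package documentation (method identifiers and the .cv argument included), rather than an official example.

```r
# Sketch: classification with tidyfit's classify() interface.
library(dplyr)
library(tidyfit)

df <- iris %>%
  mutate(is_virginica = factor(Species == "virginica")) %>%
  select(-Species)

fit <- df %>%
  classify(is_virginica ~ .,
           m("lasso"), m("ridge"),  # assumed method identifiers
           .cv = "vfold_cv")

coef(fit)  # tidy tibble of coefficients for each method
```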

hfr: An R-Package for Hierarchical Regression Shrinkage

hfr is an R package that implements a novel graph-based regularized regression estimator: the Hierarchical Feature Regression (HFR). The method mobilizes insights from the domains of machine learning and graph theory to estimate robust parameters for a linear regression, constructing a supervised feature graph that decomposes parameters along its edges. The graph adjusts first for common variation and successively incorporates idiosyncratic patterns into the fitting process.

The result is group shrinkage of the parameters, where the extent of shrinkage is governed by a hyperparameter kappa that represents the size of the feature graph. At kappa = 1 the regression is unregularized, yielding the OLS parameters. At kappa < 1 the graph is shrunken, reducing the effective model size and regularizing the regression.
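A minimal usage sketch, assuming the hfr(x, y, kappa = ...) interface described in the package documentation:

```r
# Sketch: fit the HFR at different levels of graph shrinkage and compare
# the unregularized solution with OLS.
library(hfr)

set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- as.vector(x %*% c(1, 0.5, 0, 0, -0.5)) + rnorm(100)

fit_full     <- hfr(x, y, kappa = 1)    # unregularized: parameters coincide with OLS
fit_shrunken <- hfr(x, y, kappa = 0.5)  # shrunken feature graph, group shrinkage

coef(fit_full)      # should be close to coef(lm(y ~ x))
coef(fit_shrunken)  # regularized, group-shrunken parameters
```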