# Stacked Generalization of Statistical Learners – A Case Study with Soil Iron Content in Brazil

### Abstract

When modeling soil-landscape relationships, we generally test a handful of statistical learners and, having limited data, use cross-validation to select the best-performing one. In this study, we evaluate the benefits of combining learners for soil prediction using stacked generalization. The procedure consists of calibrating multiple learners and submitting them to 10-fold cross-validation. The cross-validation predictions are then used as covariates in an interceptless linear regression of the target variable. Constrained to be non-negative, the estimated regression coefficients are the stacking weights, which express the importance of each learner. When making predictions, each learner is used in turn, and the weights are used to optimally combine the multiple predictions into a single prediction. The data were downloaded from the national database maintained by Embrapa. The target variable was the soil iron content (g kg^-1^). Covariates (p = 7) were constructed using soil profile data. The n = 22 981 records remaining after data cleaning were split into calibration (n~cal~ = 16 086) and validation (n~val~ = 6895) sets. Six learners were used: linear regression with stepwise selection (lm), multivariate adaptive regression splines (mars), regression random forest (rf), single-hidden-layer neural network (nnet), weighted k-nearest neighbor regression (knn), and support vector machine with polynomial kernel (svm). rf and knn severely overfitted the data, while lm, mars and svm were the most stable learners. Nevertheless, rf and knn yielded the lowest absolute and squared errors (RMSE < 45 g kg^-1^) and explained the most variance (AVE ~ 0.6). mars, nnet and lm were the least biased learners (ME ~ -0.1 g kg^-1^), while svm was the most biased (ME = -5.14 g kg^-1^). lm explained the smallest amount of variance (AVE = 0.49).
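The stacking procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three learners (`fit_mean`, `fit_linear`, `fit_knn`) are hypothetical toy stand-ins for lm, knn and the others, the data are synthetic, and the non-negative, interceptless least-squares fit is solved by projected coordinate descent in a hand-written helper (`nnls`), which is not a library call. Every learner is cross-validated on the same folds, its out-of-fold predictions become one column of the covariate matrix, and the fitted non-negative coefficients serve as stacking weights for the learners recalibrated on all the data.

```python
import random

# Hypothetical toy learners (stand-ins for lm, knn, ...): fit() returns a predict(x) function.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    # Simple least-squares line with intercept for a single covariate.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def fit_knn(xs, ys, k=3):
    pts = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def out_of_fold(fit, xs, ys, k=10, seed=0):
    """k-fold cross-validation predictions; a fixed seed gives every learner the same folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(xs)
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])
    return preds

def nnls(Z, y, iters=500):
    """Interceptless least squares with w >= 0, by projected coordinate descent."""
    n, m = len(Z), len(Z[0])
    w = [0.0] * m
    r = list(y)  # residuals y - Z w (w starts at zero)
    norms = [sum(Z[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(iters):
        for j in range(m):
            if norms[j] == 0.0:
                continue
            step = sum(Z[i][j] * r[i] for i in range(n)) / norms[j]
            new_wj = max(0.0, w[j] + step)  # exact 1-D minimizer, clipped at zero
            delta = new_wj - w[j]
            for i in range(n):
                r[i] -= delta * Z[i][j]
            w[j] = new_wj
    return w

def stack(learners, xs, ys, k=10):
    # One column of Z per learner: its out-of-fold predictions.
    cols = [out_of_fold(fit, xs, ys, k) for fit in learners]
    Z = [list(row) for row in zip(*cols)]
    weights = nnls(Z, ys)
    # Recalibrate each learner on all data, then combine predictions with the weights.
    models = [fit(xs, ys) for fit in learners]
    def predict(x):
        return sum(w * model(x) for w, model in zip(weights, models))
    return weights, predict

# Usage on synthetic data (not the Embrapa data).
rng = random.Random(1)
xs = [i / 10 for i in range(60)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.3) for x in xs]
weights, predict = stack([fit_mean, fit_linear, fit_knn], xs, ys)
print([round(w, 3) for w in weights], round(predict(2.5), 3))
```

Sharing the same folds across learners keeps the columns of `Z` comparable. In practice, a dedicated solver such as `scipy.optimize.nnls` would replace the hand-written loop; it is written out here only to show that the weights come from a non-negative, interceptless least-squares fit.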
rf received the largest stacking weight (w = 0.55), knn and svm received moderate weights (w ~ 0.2), and nnet and mars received the smallest weights (w < 0.1); lm was dropped from the stack (w = 0). Combining learners lowered all absolute and squared errors (RMSE = 43.23 g kg^-1^), yielded a small bias (ME = 0.53 g kg^-1^), and explained as much variance as rf (AVE = 0.61). Stacking learners was more beneficial than using the single best-performing learner because it reduced generalization errors. The magnitude of the benefits seems to depend on the diversity of the learners (overfitting and underfitting, biased and unbiased). Moreover, because the stacking weights are computed by least-squares regression, we can estimate the prediction error variance of any combination of learners.
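The learners and their stack are compared by mean error (ME), absolute and squared errors (RMSE), and amount of variance explained (AVE) on the validation set. The sketch below assumes common definitions that the abstract does not spell out: errors are taken as predicted minus observed, the absolute error is summarized as the mean absolute error (MAE), and AVE = 1 − SSE/SST. `validation_metrics` is a hypothetical helper, and the numbers in the example are made up for illustration.

```python
import math

def validation_metrics(observed, predicted):
    """ME, MAE, RMSE and AVE of predictions on a validation set.

    Assumed definitions: error = predicted - observed; AVE = 1 - SSE / SST.
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    sse = sum(e * e for e in errors)                  # sum of squared errors
    sst = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sse / n),
        "AVE": 1.0 - sse / sst,
    }

# Made-up iron contents (g/kg) and predictions, for illustration only.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 39.0]
print(validation_metrics(obs, pred))
# ME = 0.5, MAE = 2.0, RMSE ≈ 2.121, AVE ≈ 0.964
```

The same function can score each learner's validation predictions and the stacked prediction (the weighted sum of the learners' predictions), so that all candidates are compared on identical terms.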

Date
2017-06-29 15:15–15:30
Venue
Hotel Hof van Wageningen
Lawickse Allee, 9, Wageningen, GE 6701