A Note on Knowledge Discovery and Machine Learning in Digital Soil Mapping


In digital soil mapping, machine learning (ML) techniques are used to infer a relationship between a soil property and a set of environmental covariates. The information derived from this process is often translated into pedological knowledge, a mechanism referred to as knowledge discovery. This study shows that knowledge discovery based on ML must be treated with caution. Using a hypothetical case study, we show how pseudo-covariates can be used to accurately predict soil organic carbon. We demonstrate that ML methods can find relevant patterns even when the covariates are meaningless and unrelated to soil-forming factors and processes. We argue that pattern recognition for prediction should not be equated with knowledge discovery, which requires more than the recognition of patterns and successful prediction: it requires the pre-selection and preprocessing of pedologically relevant environmental covariates, followed by the interpretation and evaluation of the recognized patterns. We argue that the covariates an ML model identifies as important can instead provide elements for postulating hypotheses about soil processes that, once validated through experiments, could result in new pedological knowledge.
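The central claim, that accurate prediction does not imply pedological meaning, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the "soil organic carbon" surface is simulated, the pseudo-covariates are distances to arbitrary random anchor points, and the learner is a simple k-nearest-neighbour regressor rather than whatever ML method the study used.

```python
import math
import random

random.seed(0)

# Hypothetical smooth "soil organic carbon" surface on the unit square
# (simulated for illustration; not real data).
def soc_true(x, y):
    return 2.0 + math.sin(3 * x) + math.cos(4 * y)

n = 400
pts = [(random.random(), random.random()) for _ in range(n)]
soc = [soc_true(x, y) + random.gauss(0, 0.1) for x, y in pts]

# Pseudo-covariates: distances to arbitrary anchor points.
# They carry spatial structure but no pedological meaning.
anchors = [(random.random(), random.random()) for _ in range(5)]

def pseudo_covariates(x, y):
    return [math.hypot(x - ax, y - ay) for ax, ay in anchors]

X = [pseudo_covariates(x, y) for x, y in pts]

# Simple hold-out split: first 300 points for training, last 100 for testing.
train_idx = range(0, 300)
test_idx = range(300, 400)

def knn_predict(query, k=5):
    """Average the target over the k nearest training points in covariate space."""
    neighbours = sorted((math.dist(query, X[i]), soc[i]) for i in train_idx)
    return sum(value for _, value in neighbours[:k]) / k

preds = [knn_predict(X[i]) for i in test_idx]
obs = [soc[i] for i in test_idx]

# Coefficient of determination on the hold-out set.
mean_obs = sum(obs) / len(obs)
ss_res = sum((o - p) ** 2 for o, p in zip(obs, preds))
ss_tot = sum((o - mean_obs) ** 2 for o in obs)
r2 = 1 - ss_res / ss_tot

print(f"Hold-out R^2 using meaningless covariates: {r2:.2f}")
```

In this toy setting the model predicts well because the pseudo-covariates encode location and the target is spatially smooth, not because distance to a random point says anything about soil formation. A high hold-out score therefore cannot, on its own, justify a pedological interpretation of the covariates.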

Alessandro Samuel-Rosa
Professor Adjunto