How to handle missing values in practice?
Missing values occur in almost every domain that handles data, so most data sets contain them, and some of them may be informative. The question is how to handle these missing values when we want to do estimation, inference, prediction, etc. There is no single standard answer, since the appropriate analysis depends on several aspects of the data. Several questions have to be considered before choosing a way to deal with the missing values:
- What do we know about the mechanism behind the missing values?
- Do the missing values contain information?
- What happens if we ignore the missing values (i.e. if we drop the incomplete observations or if we ignore the missingness mechanism)?
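The last question can be made concrete with a small simulation. The following is a minimal sketch, not taken from the workflows on this website: it assumes two standard mechanism labels (MCAR, missing completely at random, and MNAR, missing not at random), and the function names, the 30% rate and the logistic form are all illustrative choices. It compares the mean of the observed values with the mean of the full data after the incomplete observations are dropped.

```python
import math
import random


def make_missing(values, mechanism, rng, rate=0.3):
    """Return a copy of `values` where some entries are replaced by None.

    mechanism="MCAR": every entry is missing with the same probability `rate`.
    mechanism="MNAR": the probability of being missing grows with the value
    itself (illustrative logistic form, an assumption of this sketch).
    """
    out = []
    for v in values:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MNAR":
            p = 1.0 / (1.0 + math.exp(-2.0 * v))
        else:
            raise ValueError("unknown mechanism: " + mechanism)
        out.append(None if rng.random() < p else v)
    return out


def complete_case_mean(values):
    """Mean of the observed entries only (incomplete observations are dropped)."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


rng = random.Random(0)
full = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # true mean is 0
full_mean = sum(full) / len(full)

mcar_mean = complete_case_mean(make_missing(full, "MCAR", rng))
mnar_mean = complete_case_mean(make_missing(full, "MNAR", rng))

print("full data mean:      ", round(full_mean, 3))
print("complete cases, MCAR:", round(mcar_mean, 3))
print("complete cases, MNAR:", round(mnar_mean, 3))
```

Under the MCAR setting the complete-case mean stays close to the full-data mean, whereas under this MNAR setting large values are removed more often, so the complete-case mean is pulled clearly below zero. Dropping incomplete observations is therefore only harmless for some mechanisms.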
In the different sections of this website you will find numerous resources (tutorials, publications, etc.) that cover the majority of existing methods to handle missing values in various contexts. For a straightforward application of some of the most common methods, we propose several workflows in the form of R Markdown documents. These workflows aim to provide a direct implementation of the methods and a template that can be reused on other data sets. We believe that a common template, for instance for generating missing values, makes comparisons between methods more transparent and experimental results easier to replicate.
How to …
- … generate missing values? (PDF and related R source code)
- … estimate with missing values? (PDF)
- … impute missing values? (PDF)
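To give a flavour of the kind of question the imputation workflow addresses, here is a minimal sketch of mean imputation, one of the simplest imputation strategies. It is not the code of the linked workflows: the helper names `mean_impute` and `variance`, the 40% missingness rate and the simulated data are assumptions made for illustration only.

```python
import random


def mean_impute(values):
    """Replace every None by the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(1)
complete = [rng.gauss(10.0, 2.0) for _ in range(5000)]

# Remove about 40% of the entries completely at random (illustrative rate).
incomplete = [None if rng.random() < 0.4 else v for v in complete]
imputed = mean_impute(incomplete)

print("variance of complete data:", round(variance(complete), 2))
print("variance after imputation:", round(variance(imputed), 2))
```

The imputed data set has no gaps and keeps the mean of the observed values, but its variance is noticeably smaller than that of the complete data, because every imputed entry sits exactly at the mean. This is one reason why the choice of imputation method matters for subsequent estimation and inference.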