A resource website on missing values - Methods and references for managing missing data

How to handle missing values in practice?

Missing values occur almost inevitably in most domains that handle data. As a result, most data sets contain missing values, and some of these may themselves be informative. The question is then how to handle these missing values when we want to do estimation, inference, prediction, etc. There is no single standard answer, since the appropriate analysis depends on several aspects of the data, and several questions have to be considered before choosing a way to deal with the missing values:

  • What do we know about the mechanism behind the missing values?
  • Do the missing values contain information?
  • What happens if we ignore the missing values (i.e. if we drop the incomplete observations or if we ignore the missingness mechanism)?
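The last question can be made concrete with a small sketch. The Python snippet below is only an illustration, not one of the workflows from this website: the data are made-up toy values, and the helper names `mask` and `complete_case_mean` are hypothetical. It compares the mean computed after dropping incomplete observations when the missing entries are unrelated to the values, and when the largest values are the ones that are missing. The missing positions are picked by hand, so this shows one possible outcome rather than a general result.

```python
import math
from statistics import mean

# Hypothetical toy data (assumption for illustration only).
full = [20.0, 25.0, 30.0, 35.0, 40.0, 80.0, 90.0, 100.0]


def mask(values, missing_idx):
    """Return a copy of `values` with NaN at the positions in `missing_idx`."""
    return [math.nan if i in missing_idx else v for i, v in enumerate(values)]


def complete_case_mean(values):
    """Mean over the observed entries only, i.e. after dropping missing ones."""
    observed = [v for v in values if not math.isnan(v)]
    return mean(observed)


# Case 1: the missing positions are unrelated to the values themselves.
unrelated = mask(full, {1, 6})

# Case 2: the two largest values are missing, so missingness is informative.
informative = mask(full, {6, 7})

true_mean = mean(full)
error_unrelated = abs(complete_case_mean(unrelated) - true_mean)
error_informative = abs(complete_case_mean(informative) - true_mean)

print("error when missingness is unrelated to the values:", error_unrelated)
print("error when the largest values are missing:", error_informative)
```

In this toy example the complete-case mean is much further from the full-data mean when the missing entries are the largest values, which is why the mechanism behind the missing values matters before dropping incomplete observations.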

In the different sections of this website you will find numerous resources (tutorials, publications, etc.) that cover the majority of existing methods to handle missing values in various contexts. For a straightforward application of some of the most common methods, we propose several workflows in the form of R Markdown documents. These workflows aim to provide a direct implementation of the methods and a template that can be reused directly on other data sets. We believe that using a common template, for instance for generating missing values, makes comparisons between methods more transparent and experimental results easier to replicate.
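To illustrate why a shared template for generating missing values helps with replicability, here is a minimal sketch. It is written in Python for illustration and is not taken from the R Markdown workflows; the function names `generate_mcar` and `missing_mask` and their parameters are hypothetical. It assumes the simplest setting, where a fixed proportion of cells is removed uniformly at random, and uses an explicit seed so that every method under comparison sees the same missing cells.

```python
import math
import random


def generate_mcar(rows, prop_missing, seed):
    """Return a copy of `rows` with a share of cells set to NaN.

    Cells are chosen uniformly at random, independently of their values.
    The same seed always selects the same cells.
    """
    if not 0.0 <= prop_missing <= 1.0:
        raise ValueError("prop_missing must be between 0 and 1")
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    n_missing = round(prop_missing * len(cells))
    chosen = set(rng.sample(cells, n_missing))
    return [
        [math.nan if (i, j) in chosen else value for j, value in enumerate(row)]
        for i, row in enumerate(rows)
    ]


def missing_mask(rows):
    """Return a table of booleans, True where a cell is missing."""
    return [[math.isnan(value) for value in row] for row in rows]


# Hypothetical complete data set: 4 observations, 3 variables.
complete = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]

# Two runs with the same seed remove exactly the same cells.
run_a = generate_mcar(complete, prop_missing=0.25, seed=42)
run_b = generate_mcar(complete, prop_missing=0.25, seed=42)
print(missing_mask(run_a) == missing_mask(run_b))
```

Fixing the generation step and its seed in one place means that differences between results can be attributed to the methods being compared rather than to different patterns of missing values.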

How to …

If you have suggestions for improving these workflows or if you discover bugs in any of them, please feel free to contact us via the Contact form or to submit changes directly on our GitHub repo.