R-miss-tastic

A resource website on missing data

This page is still under construction.

Here you will find a constantly growing list of interesting datasets which are frequently used in the R community working on missing values. These datasets can be useful to get familiar with different concepts in handling missing values and to assess the quality and performance of new methods.

If you have suggestions on other datasets which might be of interest to others, please feel free to contact us via the Contact form.


Complete data

If you wish to evaluate a missing data method on real (or simulated) data, it can be useful to first generate missing values in a complete dataset. This lets you control the response mechanism and evaluate the method under different response mechanisms. A useful tool for this is the ampute function in the mice R package. Rianne Schouten and her colleagues wrote a self-contained tutorial on how to ampute data.
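As a minimal sketch of this workflow, the snippet below simulates a small complete dataset and amputes it under three response mechanisms. The simulated variables (x1, x2, x3), the sample size, and the 30% proportion of incomplete rows are illustrative assumptions, not recommendations; ampute is left at its default missing data patterns.

```r
library(mice)

set.seed(123)

# Hypothetical complete dataset: three independent standard normal variables.
n <- 500
complete <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))

# Ampute the same complete data under three response mechanisms.
# prop = 0.3 asks for roughly 30% incomplete rows (an arbitrary choice).
mechanisms <- c("MCAR", "MAR", "MNAR")
amputed <- lapply(mechanisms, function(m) {
  ampute(complete, prop = 0.3, mech = m)$amp  # $amp holds the incomplete data
})
names(amputed) <- mechanisms

# Share of incomplete rows actually obtained under each mechanism.
sapply(amputed, function(d) mean(!complete.cases(d)))
```

Each element of amputed can then be passed to the method under evaluation, and its output compared against the known values in complete.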


Incomplete data

The datasets listed below are either widely used in the missing data community or used to illustrate methods for handling missing values in the tutorials from the Tutorials and R packages sections.


This data set contains daily air quality measurements in New York (May to September 1973) and has missing values in some variables. It can be loaded in R by calling data(airquality).
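A quick way to get a first impression of the missingness in this data set is to count the missing values per variable. The following is only a sketch using base R; the variable names na_counts and incomplete_share are illustrative.

```r
# airquality ships with base R in the datasets package.
data(airquality)

# Number of missing values in each variable.
na_counts <- colSums(is.na(airquality))
na_counts

# Proportion of rows with at least one missing value.
incomplete_share <- mean(!complete.cases(airquality))
incomplete_share
```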

The R package NHANES contains data from the US National Health and Nutrition Examination Survey (NHANES, 1999-2004). The data comprise body shape and related measurements.

This data set contains geographic and atmospheric measures and has missing values in all variables. It can be loaded in R by calling data(ozone).
More information on the data exposition (1983-2013).



Some R-packages on missing values also contain interesting data sets. See for instance naniar, tsImpute and VIM.

