vignettes/pkgdown/inspect_na_examples.Rmd
inspect_na_examples.Rmdstarwars
The examples below make use of the starwars and
storms data from the dplyr package
For illustrating comparisons of dataframes, use the
starwars data and produce two new dataframes
star_1 and star_2 that randomly sample the
rows of the original and drop a couple of columns.
inspect_na() for a single dataframe
inspect_na() summarises the prevalence of missing values
by each column in a data frame. A tibble containing the count
(cnt) and the overall percentage (pcnt) of
missing values is returned.
library(inspectdf)
inspect_na(starwars)## # A tibble: 14 × 3
## col_name cnt pcnt
## <chr> <int> <dbl>
## 1 birth_year 44 50.6
## 2 mass 28 32.2
## 3 homeworld 10 11.5
## 4 height 6 6.90
## 5 hair_color 5 5.75
## 6 sex 4 4.60
## 7 gender 4 4.60
## 8 species 4 4.60
## 9 name 0 0
## 10 skin_color 0 0
## 11 eye_color 0 0
## 12 films 0 0
## 13 vehicles 0 0
## 14 starships 0 0
A barplot can be produced by passing the result to
show_plot():
inspect_na(starwars) %>% show_plot()
inspect_na() for two dataframes
When a second dataframe is provided, inspect_na()
returns a tibble containing counts and percentage missingness by column,
with summaries for the first and second data frames are show in columns
with names appended with _1 and _2,
respectively. In addition, a \(p\)-value is calculated which provides a
measure of evidence of whether the difference in missing values is
significantly different.
inspect_na(star_1, star_2)## # A tibble: 14 × 6
## col_name cnt_1 pcnt_1 cnt_2 pcnt_2 p_value
## <chr> <int> <dbl> <int> <dbl> <dbl>
## 1 birth_year 24 48 23 46 1
## 2 mass 14 28 12 24 0.820
## 3 homeworld 5 10 3 6 0.712
## 4 height 4 8 NA NA NA
## 5 hair_color 3 6 3 6 1
## 6 sex 3 6 2 4 1
## 7 gender 3 6 2 4 1
## 8 species 3 6 2 4 1
## 9 name 0 0 NA NA NA
## 10 skin_color 0 0 0 0 NA
## 11 eye_color 0 0 0 0 NA
## 12 films 0 0 0 0 NA
## 13 vehicles 0 0 0 0 NA
## 14 starships 0 0 0 0 NA
inspect_na(star_1, star_2) %>% show_plot()
Notes:
height appears in star_1 but nor
star_2, then the corresponding pcnt_,
cnt_ and p_value columns will contain
NA
p_value is
NA.p_value cannot be calculated, no coloured bar is
shown.alpha
argument to inspect_na(). The default is
alpha = 0.05.