
[Stable]

inspect() scans a data.frame object for errors that may affect the use of functions in metan. By default, every variable is checked for its class (numeric or factor), missing values, and possible outliers. The function issues a warning if the data appear to be unbalanced or contain missing values or possible outliers.

Usage

inspect(.data, ..., plot = FALSE, threshold = 15, verbose = TRUE)

Arguments

.data

The data to be analyzed

...

The variables in .data to check. If no variable is specified, all variables in .data are used.

plot

Create a plot to show the check? Defaults to FALSE.

threshold

Maximum number of levels allowed in a character / factor column to produce a plot. Defaults to 15.

verbose

Logical argument. If TRUE (default), the results of the checks are printed to the console.
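
A minimal sketch of how the plot and threshold arguments might be combined, using only the arguments listed under Usage. The requireNamespace() guard and the has_metan name are assumptions added here so the snippet does nothing when metan is not installed.

```r
# Guard so the sketch is a no-op when metan is unavailable (assumption of this sketch)
has_metan <- requireNamespace("metan", quietly = TRUE)

if (has_metan) {
  library(metan)
  # In data_ge the factor with the most levels is ENV (14 levels, per the
  # Levels column in the Examples output), which is below the default threshold of 15.
  inspect(data_ge, plot = TRUE, threshold = 15)
}
```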

Value

A tibble with the following variables:

  • Variable: the name of the variable

  • Class: the class of the variable

  • Missing: whether the variable contains missing values

  • Levels: the number of levels of a factor variable

  • Valid_n: the number of valid (non-missing) observations

  • Outlier: whether the variable contains possible outliers
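
Because the result is a tibble, it can be filtered like any other data frame. The sketch below lists the variables flagged as having missing values, assuming the Missing column holds the strings "Yes" and "No" as in the Examples output. The names has_metan, res, and flagged are illustrative only.

```r
# Guard so the sketch is a no-op when metan is unavailable (assumption of this sketch)
has_metan <- requireNamespace("metan", quietly = TRUE)

if (has_metan) {
  library(metan)
  # Suppress console output and keep only the returned tibble
  res <- inspect(data_ge, verbose = FALSE)
  # Names of variables whose Missing column is "Yes"
  flagged <- res$Variable[res$Missing == "Yes"]
  print(flagged)
}
```

For data_ge this should select no variables, since the Examples output reports "No" in the Missing column for every row.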

Author

Tiago Olivoto tiagoolivoto@gmail.com

Examples

# \donttest{
library(metan)
inspect(data_ge)
#> # A tibble: 5 × 10
#>   Variable Class   Missing Levels Valid_n   Min Median   Max Outlier Text 
#>   <chr>    <chr>   <chr>   <chr>    <int> <dbl>  <dbl> <dbl>   <dbl> <lgl>
#> 1 ENV      factor  No      14         420 NA     NA    NA         NA NA   
#> 2 GEN      factor  No      10         420 NA     NA    NA         NA NA   
#> 3 REP      factor  No      3          420 NA     NA    NA         NA NA   
#> 4 GY       numeric No      -          420  0.67   2.61  5.09       0 NA   
#> 5 HM       numeric No      -          420 38     48    58          0 NA   
#> No issues detected while inspecting data.

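# Create a toy example with messy data:
# drop four rows (unbalanced data) and keep the first five columns
df <- data_ge2[-c(2, 30, 45, 134), c(1:5)] %>% as.data.frame()
# Introduce missing values in column 5 (EH)
df[c(1, 20, 50), 5] <- NA
# Insert a malformed number in column 4 (PH), which coerces it to character
df[40, 4] <- "2..814"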

inspect(df)
#> # A tibble: 5 × 10
#>   Variable Class     Missing Levels Valid_n   Min Median   Max Outlier Text     
#>   <chr>    <chr>     <chr>   <chr>    <int> <dbl>  <dbl> <dbl>   <dbl> <chr>    
#> 1 ENV      factor    No      4          152 NA     NA    NA         NA NA       
#> 2 GEN      factor    No      13         152 NA     NA    NA         NA NA       
#> 3 REP      factor    No      3          152 NA     NA    NA         NA NA       
#> 4 PH       character No      0          152 NA     NA    NA         NA Line(s):…
#> 5 EH       numeric   Yes     -          149  0.75   1.41  1.88       0 NA       
#> Warning: Considering the levels of factors, .data should have 156 rows, but it has 152. Use 'as_factor()' for coercing a variable to a factor.
#> Warning: Missing values in variable(s) EH.
#> Warning: Possible text fragments in variable(s) PH.
# }
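
The numbers in the warnings above can be reproduced with base R. This is only an illustrative sketch, not metan's internal implementation: toy is a hypothetical stand-in with the same factor structure as df (4 environments, 13 genotypes, 3 replicates), and ph is a made-up character vector containing the malformed entry.

```r
# Hypothetical balanced layout with the same number of factor levels as df
full <- expand.grid(
  ENV = factor(seq_len(4)),
  GEN = factor(seq_len(13)),
  REP = factor(seq_len(3))
)

# Drop four rows, as in the toy example; factor levels are retained
toy <- full[-c(2, 30, 45, 134), ]

# Expected rows in a balanced data set: product of the factor level counts
expected_rows <- prod(vapply(toy, nlevels, integer(1)))
expected_rows  # 4 * 13 * 3 = 156
nrow(toy)      # 152, so the data are unbalanced

# A text fragment such as "2..814" cannot be parsed as a number
ph <- c("2.814", "2..814", "1.75")
bad <- which(is.na(suppressWarnings(as.numeric(ph))))
bad  # position of the unparseable entry: 2
```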