  • desc_stat() computes the most commonly used measures of central tendency, position, and dispersion.

  • desc_wider() reshapes the output of desc_stat() so that the variables are in columns and the grouping variables in rows. The table is filled with the statistic chosen with the argument which.

desc_stat(
  .data = NULL,
  ...,
  by = NULL,
  stats = "main",
  hist = FALSE,
  level = 0.95,
  digits = 4,
  na.rm = FALSE,
  verbose = TRUE,
  plot_theme = theme_metan()
)

desc_wider(.data, which)

Arguments

.data

The data to be analyzed. It can be a data frame (possibly with grouped data passed from group_by()) or a numeric vector. For desc_wider(), .data is an object of class desc_stat.

...

A single variable name or a comma-separated list of unquoted variable names. If no variable is supplied, all the numeric variables in .data are used. Select helpers are allowed.

by

One variable (factor) to compute the function by. It is a shortcut to group_by(). To compute the statistics by more than one grouping variable, use group_by() instead.

stats

The descriptive statistics to show. This is used to filter the output after computation. Defaults to "main" (cv, max, mean, median, min, sd.amo, se, ci). Other allowed values are "all" to show all the statistics, "robust" to show robust statistics, "quantile" to show quantile statistics, or choose one (or more) of the following:

  • "av.dev": average deviation.

  • "ci": confidence interval of the mean (95 percent by default; see the argument level).

  • "cv": coefficient of variation.

  • "iqr": interquartile range.

  • "gmean": geometric mean.

  • "hmean": harmonic mean.

  • "kurt": kurtosis.

  • "mad": median absolute deviation.

  • "max": maximum value.

  • "mean": arithmetic mean.

  • "median": median.

  • "min": minimum value.

  • "n": the length of the data.

  • "q2.5", "q25", "q75", "q97.5": the 2.5th percentile, first quartile, third quartile, and 97.5th percentile.

  • "range": the range of the data.

  • "sd.amo", "sd.pop": the sample and population standard deviation.

  • "se": the standard error of the mean.

  • "skew": skewness.

  • "sum": the sum of the values.

  • "sum.dev": the sum of the absolute deviations.

  • "sum.sq.dev": the sum of the squared deviations.

  • "valid.n": the number of valid (non-NA) observations.

  • "var.amo", "var.pop": the sample and population variance.

Use the names above to select the statistics. For example, stats = c("median, mean, cv, n"). The statistic names are not case-sensitive, and either a comma or a space can be used as a separator.
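To make a few of the names listed above concrete, the base R sketch below computes them by hand for the vector used in Example 3. It does not call metan. The formulas for cv (expressed as a percentage) and ci (the t-based half-width of the interval) are assumptions, chosen because they reproduce the values printed in the examples; the object names are illustrative only.

```r
# Base R sketch (not metan code): hand-computed versions of a few statistics.
x <- c(12, 13, 19, 21, 8, NA, 23, NA)
level <- 0.95

n       <- length(x)          # "n": length of the data, NAs included
valid   <- x[!is.na(x)]       # what na.rm = TRUE would keep
valid_n <- length(valid)      # "valid.n": non-missing observations

m      <- mean(valid)                          # "mean"
sd_amo <- sd(valid)                            # "sd.amo": divisor n - 1
sd_pop <- sqrt(sum((valid - m)^2) / valid_n)   # "sd.pop": divisor n
se     <- sd_amo / sqrt(valid_n)               # "se"
cv     <- sd_amo / m * 100                     # "cv", assumed to be a percentage
# "ci", assumed to be the t-based half-width of the interval at `level`
ci     <- qt(1 - (1 - level) / 2, df = valid_n - 1) * se

round(c(mean = m, se = se, cv = cv, n = n, valid.n = valid_n), 2)
```

Rounded to three significant digits, the mean, se and cv obtained this way agree with the row printed for Example 3 (16, 2.39 and 36.7).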

hist

Logical. Defaults to FALSE. If hist = TRUE, a histogram is created for each selected variable.

level

The confidence level used to compute the confidence interval of the mean. Defaults to 0.95.
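The effect of level can be sketched in base R. The helper ci_half_width() below is hypothetical and is not part of metan; it assumes that the reported ci is the half-width of a two-sided t interval with n - 1 degrees of freedom, which is consistent with the se and ci values printed in the examples. The inputs are the rounded values printed for TKW.

```r
# Base R sketch (not metan code): half-width of the confidence interval of the mean.
se <- 3.77   # standard error of TKW, as printed in Example 1
n  <- 156    # number of observations of TKW, as printed in Example 2

# Hypothetical helper: two-sided t quantile with n - 1 degrees of freedom
# (an assumption about how "ci" is obtained).
ci_half_width <- function(se, n, level = 0.95) {
  qt(1 - (1 - level) / 2, df = n - 1) * se
}

ci95 <- ci_half_width(se, n)                # close to the 7.44 printed for TKW
ci99 <- ci_half_width(se, n, level = 0.99)  # a higher level widens the interval
c(ci95 = ci95, ci99 = ci99)
```

Because se is taken from rounded output, ci95 matches the printed value only approximately.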

digits

The number of significant digits.

na.rm

Logical. Should missing values be removed? Defaults to FALSE.

verbose

Logical argument. If verbose = FALSE the code is run silently.

plot_theme

The graphical theme of the plot. Default is plot_theme = theme_metan(). For more details, see theme.

which

A statistic to fill the table.

Value

  • desc_stat() returns a tibble with the statistics in the columns and variables (with possible grouping factors) in rows.

  • desc_wider() returns a tibble with variables in columns and grouping factors in rows.
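The difference between the two shapes can be illustrated with a small base R sketch. It does not use metan and is not how desc_wider() is implemented; it only mimics the long-to-wide step with tapply(), using a few maximum values copied from the Example 4 output. The object names long and wide are illustrative.

```r
# Base R sketch (not metan code): long-to-wide reshaping of one statistic.
# Maximum values for genotypes H1 and H10, copied from the Example 4 output.
long <- data.frame(
  GEN      = rep(c("H1", "H10"), each = 4),
  variable = rep(c("CD", "ED", "EH", "EL"), times = 2),
  max      = c(17.9, 53.3, 1.88, 16.9, 17.5, 54.1, 1.71, 16.7)
)

# One row per grouping level, one column per variable, filled with `max`.
# Each GEN/variable cell holds exactly one value, so FUN simply returns it.
wide <- tapply(long$max,
               list(GEN = long$GEN, variable = long$variable),
               FUN = function(v) v)
wide
```

The result is a numeric matrix rather than a tibble, but the layout is the same: grouping levels in rows and variables in columns.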

Examples

# \donttest{
library(metan)
#===============================================================#
# Example 1: main statistics (coefficient of variation, maximum,#
# mean, median, minimum, sample standard deviation, standard    #
# error and confidence interval of the mean) for all numeric    #
# variables in data                                             #
#===============================================================#
desc_stat(data_ge2)
#> # A tibble: 15 x 9
#>    variable    cv    max   mean median    min sd.amo     se     ci
#>    <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#>  1 CD        7.34   18.6   16.0     16   12.9   1.17 0.0939  0.186
#>  2 CDED      5.71  0.694  0.586  0.588  0.495 0.0334 0.0027 0.0053
#>  3 CL        7.95   34.7   29.0   28.7   23.5   2.31  0.185  0.365
#>  4 CW        25.2   38.5   24.8   24.5   11.1   6.26  0.501   0.99
#>  5 ED        5.58   54.9   49.5   49.9   43.5   2.76  0.221  0.437
#>  6 EH        21.2   1.88   1.34   1.41  0.752  0.284 0.0228  0.045
#>  7 EL        8.28   17.9   15.2   15.1   11.5   1.26  0.101  0.199
#>  8 EP        10.5  0.660  0.537  0.544  0.386 0.0564 0.0045 0.0089
#>  9 KW        18.9   251.   173.   175.   106.   32.8   2.62   5.18
#> 10 NKE       14.2   697.   512.   509.   332.   72.6   5.82   11.5
#> 11 NKR       10.7     42   32.2     32   23.2   3.47  0.277  0.548
#> 12 NR        10.2   21.2   16.1     16   12.4   1.64  0.131  0.259
#> 13 PERK      2.17   91.8   87.4   87.5   81.2   1.90  0.152  0.300
#> 14 PH        13.4   3.04   2.48   2.52   1.71  0.334 0.0267 0.0528
#> 15 TKW       13.9   452.   339.   342.   218.   47.1   3.77   7.44
#===============================================================#
# Example 2: robust statistics using a numeric vector as input  #
# data                                                          #
#===============================================================#
vect <- data_ge2$TKW
desc_stat(vect, stats = "robust")
#> # A tibble: 1 x 4
#>   variable     n median   iqr
#>   <chr>    <dbl>  <dbl> <dbl>
#> 1 val        156   342.  57.8
#===============================================================#
# Example 3: Select specific statistics. In this example, NAs   #
# are removed before analysis with a warning message            #
#===============================================================#
desc_stat(c(12, 13, 19, 21, 8, NA, 23, NA),
          stats = c('mean, se, cv, n, valid.n'),
          na.rm = TRUE)
#> # A tibble: 1 x 6
#>   variable  mean    se    cv     n valid.n
#>   <chr>    <dbl> <dbl> <dbl> <dbl>   <dbl>
#> 1 val         16  2.39  36.7     8       6
#===============================================================#
# Example 4: Select specific variables and compute statistics by#
# levels of a factor variable (GEN)                             #
#===============================================================#
stats <- desc_stat(data_ge2, EP, EL, EH, ED, PH, CD, by = GEN)
stats
#> # A tibble: 78 x 10
#>    GEN   variable    cv    max   mean median    min sd.amo     se     ci
#>    <fct> <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#>  1 H1    CD        6.44   17.9   15.7   15.7   14.5   1.01  0.292  0.643
#>  2 H1    ED        2.66   53.3   51.2   50.8   49.2   1.36  0.393  0.864
#>  3 H1    EH        19.5   1.88   1.50   1.56   1.05  0.294 0.0848  0.187
#>  4 H1    EL        6.27   16.9   15.1   15.1   13.7  0.947  0.273  0.602
#>  5 H1    EP        9.91  0.658  0.570  0.574  0.492 0.0565 0.0163 0.0359
#>  6 H1    PH        11.7   3.00   2.62   2.70   2.11  0.307 0.0885  0.195
#>  7 H10   CD        6.32   17.5   15.9   15.7   14.4   1.00  0.290  0.638
#>  8 H10   ED        7.70   54.1   48.4   47.7   43.7   3.73   1.08   2.37
#>  9 H10   EH        23.2   1.71   1.26   1.25  0.888  0.293 0.0845  0.186
#> 10 H10   EL        6.83   16.7   15.1   14.9   13.6   1.03  0.298  0.656
#> # ... with 68 more rows
# To get a 'wide' format with the maximum values for all variables
desc_wider(stats, max)
#> # A tibble: 13 x 7
#>    GEN      CD    ED    EH    EL    EP    PH
#>    <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 H1     17.9  53.3  1.88  16.9 0.658  3.00
#>  2 H10    17.5  54.1  1.71  16.7 0.660  2.83
#>  3 H11    18.0  52.3  1.67  17.4 0.600  2.77
#>  4 H12    16.2  52.7  1.58  15.7 0.616  2.79
#>  5 H13    17.8  54.0  1.77  16.3 0.615  2.93
#>  6 H2     17.0  53.6  1.87  16.1 0.615  3.03
#>  7 H3     18.0  52.2  1.80  17.6 0.640  3.04
#>  8 H4     17.7  52.8  1.82  16.8 0.617  3.02
#>  9 H5     17.4  52.7  1.76  16.6 0.632  2.90
#> 10 H6     18.3  54.9  1.69  17.9 0.631  2.94
#> 11 H7     18.6  52.1  1.67  17.5 0.617  2.87
#> 12 H8     18.4  53.3  1.57  17.7 0.585  2.76
#> 13 H9     18.1  53.6  1.71  17.5 0.630  3.00
#===============================================================#
# Example 5: Compute the main statistics for all numeric        #
# variables by two or more factors. Note that group_by() was    #
# used to pass grouped data to the function desc_stat()         #
#===============================================================#
data_ge2 %>%
  group_by(ENV, GEN) %>%
  desc_stat()
#> # A tibble: 780 x 11
#>    ENV   GEN   variable    cv    max   mean median    min sd.amo     se
#>    <fct> <fct> <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#>  1 A1    H1    CD        6.91   16.4   15.7   16.3   14.5   1.09  0.627
#>  2 A1    H1    CDED      2.04  0.561  0.550  0.551  0.538 0.0112 0.0065
#>  3 A1    H1    CL        1.48   28.4   28.1   28.1   27.6  0.415  0.239
#>  4 A1    H1    CW        7.93   25.1   23.5   24.0   21.4   1.86   1.08
#>  5 A1    H1    ED        1.98   52.2   51.1   50.7   50.3   1.01  0.583
#>  6 A1    H1    EH        5.36   1.76   1.68   1.71   1.58 0.0902 0.0521
#>  7 A1    H1    EL        7.15   16.1   15.4   16.0   14.2   1.10  0.637
#>  8 A1    H1    EP        5.34  0.658  0.626  0.628  0.591 0.0334 0.0193
#>  9 A1    H1    KW        8.31   217.   203.   208.   184.   16.8   9.72
#> 10 A1    H1    NKE       6.80   565.   527.   521.   494.   35.8   20.7
#> # ... with 770 more rows, and 1 more variable: ci <dbl>
# }