Skip to contents

[Stable]

Perform a (multi)collinearity diagnostic of a correlation matrix of predictor variables using several indicators, as shown by Olivoto et al. (2017).

Usage

colindiag(.data, ..., by = NULL, n = NULL)

Arguments

.data

The data to be analyzed. It must be a symmetric correlation matrix, or a data frame, possible with grouped data passed from dplyr::group_by().

...

Variables to use in the correlation. If ... is null then all the numeric variables from .data are used. It must be a single variable name or a comma-separated list of unquoted variables names.

by

One variable (factor) to compute the function by. It is a shortcut to dplyr::group_by(). To compute the statistics by more than one grouping variable use that function.

n

If a correlation matrix is provided, then n is the number of objects used to compute the correlation coefficients.

Value

If .data is a grouped data passed from dplyr::group_by()

then the results will be returned into a list-column of data frames.

  • cormat A symmetric Pearson's coefficient correlation matrix between the variables

  • corlist A hypothesis testing for each of the correlation coefficients

  • evalevet The eigenvalues with associated eigenvectors of the correlation matrix

  • VIF The Variance Inflation Factors, being the diagonal elements of the inverse of the correlation matrix.

  • CN The Condition Number of the correlation matrix, given by the ratio between the largest and smallest eigenvalue.

  • det The determinant of the correlation matrix.

  • ncorhigh Number of correlation greather than |0.8|.

  • largest_corr The largest correlation (in absolute value) observed.

  • smallest_corr The smallest correlation (in absolute value) observed.

  • weight_var The variables with largest eigenvector (largest weight) in the eigenvalue of smallest value, sorted in decreasing order.

References

Olivoto, T., V.Q. Souza, M. Nardino, I.R. Carvalho, M. Ferrari, A.J. Pelegrin, V.J. Szareski, and D. Schmidt. 2017. Multicollinearity in path analysis: a simple method to reduce its effects. Agron. J. 109:131-142. doi:10.2134/agronj2016.04.0196

Author

Tiago Olivoto tiagoolivoto@gmail.com

Examples

# \donttest{
# Using the correlation matrix
library(metan)

cor_iris <- cor(iris[,1:4])
n <- nrow(iris)

col_diag <- colindiag(cor_iris, n = n)


# Using a data frame
col_diag_gen <- data_ge2 %>%
                group_by(GEN) %>%
                colindiag()

# Diagnostic by levels of a factor
# For variables with "N" in variable name
col_diag_gen <- data_ge2 %>%
                group_by(GEN) %>%
                colindiag(contains("N"))
# }