

Select a set of predictors with minimal multicollinearity, using the variance inflation factor (VIF) as the criterion for removing collinear variables. The algorithm: (i) computes the VIF of each variable from the correlation matrix of the variables selected in ...; (ii) ranks the VIF values and removes the variable with the highest VIF; and (iii) repeats steps (i) and (ii) until the highest remaining VIF is less than or equal to max_vif.
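The pruning loop described above can be sketched in a few lines of base R. This is a minimal illustration, not the function's actual implementation: the helper name `vif_prune` and its return structure are hypothetical, and it assumes each VIF is taken as the corresponding diagonal element of the inverse correlation matrix.

```r
# Hypothetical helper (not the package's code): iterative VIF pruning.
vif_prune <- function(x, max_vif = 10, use = "pairwise.complete.obs") {
  x <- as.data.frame(x)
  removed <- character(0)
  repeat {
    r <- stats::cor(x, use = use)
    # Assumption: VIF_j is the j-th diagonal element of the inverse
    # correlation matrix. solve() fails if r is exactly singular.
    vif <- diag(solve(r))
    names(vif) <- colnames(x)
    if (max(vif) <= max_vif) break
    # Drop the variable with the highest VIF, then recompute.
    worst <- names(which.max(vif))
    removed <- c(removed, worst)
    x <- x[, setdiff(colnames(x), worst), drop = FALSE]
  }
  list(selected = colnames(x), removed = removed, vif = vif)
}

# Toy data: column c is almost exactly a + b, so the three columns are collinear.
set.seed(1)
a <- rnorm(100)
b <- rnorm(100)
d <- data.frame(a = a, b = b, c = a + b + rnorm(100, sd = 0.01))
res <- vif_prune(d, max_vif = 5)
res$selected
res$removed
```

Recomputing the correlation matrix after each removal matters because dropping one variable changes the VIF of every variable that remains.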


non_collinear_vars(
  .data,
  ...,
  max_vif = 10,
  missingval = "pairwise.complete.obs"
)



.data
The data set containing the variables.


...
Variables to submit to selection, given as a single variable name or a comma-separated list of unquoted variable names. If ... is empty, all numeric variables in .data are used.


max_vif
The maximum VIF (threshold) accepted in the set of selected predictors.


missingval
How to deal with missing values when computing correlations. For more information, see the use argument of stats::cor().
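As a rough illustration of what the missing-value option controls, the sketch below compares two values of the `use` argument of `stats::cor()` on a small made-up data frame. It assumes that `missingval` is forwarded to that argument, which the reference to `stats::cor()` suggests but the text does not state outright.

```r
# Toy data with missing values in different rows of x and z.
d <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(2, 4, 5, 4, 5),
  z = c(NA, 1, 3, 2, 5)
)

# Each correlation uses every row where that particular pair is observed.
r_pair <- cor(d, use = "pairwise.complete.obs")

# Every correlation uses only the rows with no missing value at all.
r_comp <- cor(d, use = "complete.obs")

# The default (use = "everything") propagates NA instead.
r_default <- cor(d)
```

Pairwise deletion keeps more data per correlation, but the resulting matrix is not guaranteed to be positive semi-definite, which can matter when it is inverted to obtain VIF values.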


A data frame showing the number of selected predictors, maximum VIF value, condition number, determinant value, selected predictors and removed predictors from the original set of variables.
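The diagnostics listed in the return value can be approximated by hand from a correlation matrix. The sketch below is an assumption-laden illustration: it takes the condition number to be the ratio of the largest to the smallest eigenvalue of the correlation matrix (a common definition, though the text does not say which one the function uses), and it uses the built-in `mtcars` data purely as a stand-in.

```r
# Correlation matrix of four illustrative variables from a built-in data set.
r <- cor(mtcars[, c("mpg", "disp", "hp", "wt")])

# Eigenvalues of the symmetric correlation matrix.
ev <- eigen(r, symmetric = TRUE, only.values = TRUE)$values

# Assumed definition: largest eigenvalue divided by smallest eigenvalue.
cond_number <- max(ev) / min(ev)

# Determinant of the correlation matrix; values near 0 signal collinearity.
determinant <- det(r)

# Largest VIF, taken from the diagonal of the inverse correlation matrix.
largest_vif <- max(diag(solve(r)))
```

A determinant close to 1 and a small condition number both point to weakly correlated predictors, while a determinant near 0 indicates that at least one variable is close to a linear combination of the others.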


# \donttest{
library(metan)
# All numeric variables
non_collinear_vars(data_ge2)
#>          Parameter                                       values
#> 1       Predictors                                           10
#> 2              VIF                                         7.16
#> 3 Condition Number                                       56.797
#> 4      Determinant                                 0.0008810515
#> 5         Selected PERK, EP, CDED, NKR, PH, NR, TKW, EL, CD, ED
#> 6          Removed                          EH, CL, CW, KW, NKE

# Select variables and set the VIF threshold to 5
non_collinear_vars(data_ge2, EH, CL, CW, KW, NKE, max_vif = 5)
#>          Parameter          values
#> 1       Predictors               4
#> 2              VIF           2.934
#> 3 Condition Number          11.248
#> 4      Determinant    0.2400583901
#> 5         Selected NKE, EH, CL, CW
#> 6          Removed              KW
# }