`path_coeff()`

computes a path analysis using a data frame as input data.`path_coeff_seq()`

computes a sequential path analysis using primary and secondary traits.`path_coeff_mat()`

computes a path analysis using correlation matrices as input data.

## Usage

```
path_coeff(
.data,
resp,
pred = everything(),
by = NULL,
exclude = FALSE,
correction = NULL,
knumber = 50,
brutstep = FALSE,
maxvif = 10,
missingval = "pairwise.complete.obs",
plot_res = FALSE,
verbose = TRUE,
...
)
path_coeff_mat(cor_mat, resp, correction = NULL, knumber = 50, verbose = TRUE)
path_coeff_seq(.data, resp, chain_1, chain_2, by = NULL, verbose = TRUE, ...)
```

## Arguments

- .data
The data. Must be a data frame or a grouped data passed from

`dplyr::group_by()`

- resp
<

`tidy-select`

> The dependent trait.- pred
<

`tidy-select`

> The predictor traits. set to`everything()`

, i.e., the predictor traits are all the numeric traits in the data except that in`resp`

. To select multiple traits, use a comma-separated vector of names, (e.g.,`pred = c(V1, V2, V2)`

), an interval of trait names, (e.g.,`pred = c(V1:V3)`

), or even a select helper (e.g.,`pred = starts_with("V")`

).- by
One variable (factor) to compute the function by. It is a shortcut to

`dplyr::group_by()`

. To compute the statistics by more than one grouping variable use that function.- exclude
Logical argument, set to false. If

`exclude = TRUE`

, then the traits in`pred`

are deleted from the data, and the analysis will use as predictor those that remained, except that in`resp`

.- correction
Set to

`NULL`

. A correction value (k) that will be added into the diagonal elements of the**X'X**matrix aiming at reducing the harmful problems of the multicollinearity in path analysis (Olivoto et al., 2017)- knumber
When

`correction = NULL`

, a plot showing the values of direct effects in a set of different k values (0-1) is produced.`knumber`

is the number of k values used in the range of 0 to 1.- brutstep
Logical argument, set to

`FALSE`

. If true, then an algorithm will select a subset of variables with minimal multicollinearity and fit a set of possible models. See the**Details**section for more information.- maxvif
The maximum value for the Variance Inflation Factor (cut point) that will be accepted. See the

**Details**section for more information.- missingval
How to deal with missing values. For more information, please see

`stats::cor()`

.- plot_res
If

`TRUE`

, create a scatter plot of residual against predicted value and a normal Q-Q plot.- verbose
If

`verbose = TRUE`

then some results are shown in the console.- ...
Depends on the function used:

For

`path_coeff()`

additional arguments passed on to`stats::plot.lm()`

.For

`path_coeff_seq()`

additional arguments passed on to path_coeff.

- cor_mat
Matrix of correlations containing both dependent and independent traits.

- chain_1, chain_2
<

`tidy-select`

> The traits used in the first (primary) and second (secondary) chain.

## Value

Depends on the function used:

`path_coeff()`

, returns a list with the following items:**Corr.x**A correlation matrix between the predictor variables.**Corr.y**A vector of correlations between each predictor variable with the dependent variable.**Coefficients**The path coefficients. Direct effects are the diagonal elements, and the indirect effects those in the off-diagonal elements (lines).**Eigen**Eigenvectors and eigenvalues of the`Corr.x.`

**VIF**The Variance Inflation Factors.**plot**A ggplot2-based graphic showing the direct effects in 21 different k values.**Predictors**The predictor variables used in the model.**CN**The Condition Number, i.e., the ratio between the highest and lowest eigenvalue.**Det**The matrix determinant of the`Corr.x.`

.**R2**The coefficient of determination of the model.**Residual**The residual effect of the model.**Response**The response variable.**weightvar**The order of the predictor variables with the highest weight (highest eigenvector) in the lowest eigenvalue.

`path_coeff_seq()`

returns a list with the following objects**resp_fc**an object of class`path_coeff`

with the results for the analysis with dependent trait and first chain predictors.**resp_sc**an object of class`path_coeff`

with the results for the analysis with dependent trait and second chain predictors.**resp_sc2**The path coefficients of second chain predictors and the dependent trait through the first chain predictors**fc_sc_list**A list of objects with the path analysis using each trait in the first chain as dependent and second chain as predictors.**fc_sc_coef**The coefficients between first- and second-chain traits.**cor_mat**A correlation matrix between the analyzed traits. If`.data`

is a grouped data passed from`dplyr::group_by()`

then the results will be returned into a list-column of data frames.

## Details

In `path_coeff()`

, when `brutstep = TRUE`

, an algorithm to
select a set of predictors with minimal multicollinearity and high
explanatory power is implemented. first, the algorithm will select a set of
predictors with minimal multicollinearity. The selection is based on the
variance inflation factor (VIF). An iterative process is performed until
the maximum VIF observed is less than `maxvif`

. The variables selected
in this iterative process are then used in a series of stepwise-based
regressions. The first model is fitted and p-1 predictor variables are
retained (p is the number of variables selected in the iterative process.
The second model adjusts a regression considering p-2 selected variables,
and so on until the last model, which considers only two variables. Three
objects are created. `Summary`

, with the process summary,
`Models`

, containing the aforementioned values for all the adjusted
models; and `Selectedpred`

, a vector with the name of the selected
variables in the iterative process.

## References

Olivoto, T., V.Q. Souza, M. Nardino, I.R. Carvalho, M. Ferrari, A.J. Pelegrin, V.J. Szareski, and D. Schmidt. 2017. Multicollinearity in path analysis: a simple method to reduce its effects. Agron. J. 109:131-142. doi:10.2134/agronj2016.04.0196

Olivoto, T., M. Nardino, I.R. Carvalho, D.N. Follmann, M. Ferrari, et al. 2017. REML/BLUP and sequential path analysis in estimating genotypic values and interrelationships among simple maize grain yield-related traits. Genet. Mol. Res. 16(1): gmr16019525. doi:10.4238/gmr16019525

## Author

Tiago Olivoto tiagoolivoto@gmail.com

## Examples

```
library(metan)
# Using KW as the response variable and all other ones as predictors
pcoeff <- path_coeff(data_ge2, resp = KW)
#> Severe multicollinearity.
#> Condition Number: 7865.84
#> Consider using a correction factor with 'correction' argument.
#> Consider identifying collinear traits with `non_collinear_vars()`
# The same as above, but using the correlation matrix
cor_mat <- cor(data_ge2 %>% select_numeric_cols())
pcoeff2 <- path_coeff_mat(cor_mat, resp = KW)
#> Severe multicollinearity.
#> Condition Number = 7865.84
#> Please, consider using a correction factor, or use 'brutstep = TRUE'.
# Declaring the predictors
# Create a residual plot with 'plot_res = TRUE'
pcoeff3<- path_coeff(data_ge2,
resp = KW,
pred = c(PH, EH, NKE, TKW),
plot_res = TRUE)
#> Weak multicollinearity.
#> Condition Number: 40.232
#> You will probably have path coefficients close to being unbiased.
# Selecting a set of predictors with minimal multicollinearity
# Maximum variance Inflation factor of 5
pcoeff4 <- path_coeff(data_ge2,
resp = KW,
brutstep = TRUE,
maxvif = 5)
#> --------------------------------------------------------------------------
#> The algorithm has selected a set of 8 predictors with largest VIF = 3.346.
#> Selected predictors: NR PERK EP CDED EL NKR TKW PH
#> A forward stepwise-based selection procedure will fit 6 models.
#> --------------------------------------------------------------------------
#> Adjusting the model 1 with 7 predictors (16.67% concluded)
#> Adjusting the model 2 with 6 predictors (33.33% concluded)
#> Adjusting the model 3 with 5 predictors (50% concluded)
#> Adjusting the model 4 with 4 predictors (66.67% concluded)
#> Adjusting the model 5 with 3 predictors (83.33% concluded)
#> Adjusting the model 6 with 2 predictors (100% concluded)
#> Done!
#> --------------------------------------------------------------------------
#> Summary of the adjusted models
#> --------------------------------------------------------------------------
#> Model AIC Numpred CN Determinant R2 Residual maxVIF
#> MODEL_1 1127 7 13.67 0.0841 0.933 0.259 2.59
#> MODEL_2 1125 6 12.26 0.1383 0.933 0.259 2.46
#> MODEL_3 1126 5 12.05 0.1989 0.932 0.261 2.31
#> MODEL_4 1251 4 6.66 0.4016 0.846 0.393 1.98
#> MODEL_5 1308 3 3.05 0.7438 0.774 0.475 1.34
#> MODEL_6 1329 2 2.23 0.8555 0.738 0.512 1.17
#> --------------------------------------------------------------------------
#>
# When one analysis should be carried out for each environment
# Using the forward-pipe operator %>%
pcoeff5 <- path_coeff(data_ge2, resp = KW, by = ENV)
#> Severe multicollinearity.
#> Condition Number: 13958.438
#> Consider using a correction factor with 'correction' argument.
#> Consider identifying collinear traits with `non_collinear_vars()`
#> Severe multicollinearity.
#> Condition Number: 8139.667
#> Consider using a correction factor with 'correction' argument.
#> Consider identifying collinear traits with `non_collinear_vars()`
#> Severe multicollinearity.
#> Condition Number: 11334.047
#> Consider using a correction factor with 'correction' argument.
#> Consider identifying collinear traits with `non_collinear_vars()`
#> Severe multicollinearity.
#> Condition Number: 12981.917
#> Consider using a correction factor with 'correction' argument.
#> Consider identifying collinear traits with `non_collinear_vars()`
# sequential path analysis
# KW as dependent trait
# NKE and TKW as primary predictors
# PH, EH, EP, and EL as secondary traits
pcoeff6 <-
path_coeff_seq(data_ge2,
resp = KW,
chain_1 = c(NKE, TKW),
chain_2 = c(PH, EH, EP, EL))
#> ========================================================
#> Collinearity diagnosis of first chain predictors
#> ========================================================
#> Weak multicollinearity.
#> Condition Number: 1.139
#> You will probably have path coefficients close to being unbiased.
#> ========================================================
#> Collinearity diagnosis of second chain predictors
#> ========================================================
#> Severe multicollinearity.
#> Condition Number: 1047.993
#> Consider using a correction factor with 'correction' argument.
#> Consider identifying collinear traits with `non_collinear_vars()`
pcoeff6$resp_sc$Coefficients
#> PH EH EP EL linear
#> PH 1.4800514 -1.3237892 0.4282428 0.1689388 0.7534439
#> EH 1.3791537 -1.4206365 0.5832858 0.1611440 0.7029469
#> EP 0.9448831 -1.2353087 0.6707935 0.1170515 0.4974193
#> EL 0.5627096 -0.5151991 0.1767029 0.4443467 0.6685601
pcoeff6$resp_sc2
#> trait effect NKE TKW residual total
#> 1 PH direct 1.45585270 -0.07780424 -0.102002953 1.4800514
#> 2 PH indirect_EH -1.70571124 0.54287910 0.160957033 -1.3237892
#> 3 PH indirect_EP 0.48951552 -0.12747469 -0.066202002 0.4282428
#> 4 PH indirect_EL 0.09405361 0.07201754 -0.002867693 0.1689388
#> 5 PH total 0.33371058 0.40961771 -0.010115614 0.7534439
#> 6 EH direct -1.83049968 0.58259569 0.172732518 -1.4206365
#> 7 EH indirect_PH 1.35660461 -0.07250018 -0.095049229 1.3791537
#> 8 EH indirect_EP 0.66674189 -0.17362620 -0.090170068 0.5832858
#> 9 EH indirect_EL 0.08971396 0.06869464 -0.002735378 0.1611440
#> 10 EH total 0.28256079 0.40516395 -0.015222157 0.7029469
#> 11 EP direct 0.76677015 -0.19967455 -0.103697874 0.6707935
#> 12 EP indirect_PH 0.92943432 -0.04967118 -0.065119943 0.9448831
#> 13 EP indirect_EH -1.59170362 0.50659374 0.150198865 -1.2353087
#> 14 EP indirect_EL 0.06516626 0.04989828 -0.001986918 0.1170515
#> 15 EP total 0.16966711 0.30714629 -0.020605871 0.4974193
#> 16 EL direct 0.24738189 0.18942212 -0.007542670 0.4443467
#> 17 EL indirect_PH 0.55350939 -0.02958086 -0.038781116 0.5627096
#> 18 EL indirect_EH -0.66383754 0.21128050 0.062642093 -0.5151991
#> 19 EL indirect_EP 0.20198546 -0.05259902 -0.027316482 0.1767029
#> 20 EL total 0.33903919 0.31852275 -0.010998175 0.6685601
```