Cross-validation procedure

Cross-validation for BLUP prediction.

This function provides a cross-validation procedure for mixed models using replicate-based data. By default, complete blocks are randomly selected within each environment. In each iteration, the original dataset is split into two datasets: training and validation. The training set contains all genotype x environment combinations with R - 1 replications, where R is the number of replications in the trial; the validation set contains the remaining replication. The values predicted from the training set are compared with the validation data, and the Root Mean Square Prediction Difference (RMSPD; Olivoto et al. 2019) is computed. After all nboot iterations, a list is returned.
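
The resampling step described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the internal code of cv_blup(): the toy data frame, the helper names split_once() and rmspd(), and the use of training-set cell means as a stand-in for BLUP predictions are all hypothetical.

```r
# Toy balanced trial: 2 environments x 3 genotypes x 3 replicates (hypothetical data).
set.seed(1)
toy <- expand.grid(ENV = c("E1", "E2"),
                   GEN = c("G1", "G2", "G3"),
                   REP = 1:3,
                   stringsAsFactors = FALSE)
toy$Y <- rnorm(nrow(toy), mean = 10)

# Hypothetical helper: within each environment, hold out one randomly chosen
# complete replicate for validation and keep the other R - 1 for training.
split_once <- function(df) {
  held_out <- sapply(split(df$REP, df$ENV), function(r) {
    u <- unique(r)
    u[sample.int(length(u), 1)]
  })
  is_valid <- unname(df$REP == held_out[df$ENV])
  list(training = df[!is_valid, ], validation = df[is_valid, ])
}

# Root mean square prediction difference between predictions and held-out values.
rmspd <- function(predicted, observed) sqrt(mean((predicted - observed)^2))

parts <- split_once(toy)

# Stand-in predictor (assumption): the training mean of each GEN x ENV cell.
# cv_blup() would use predictions from the fitted mixed model instead.
cell_means <- aggregate(Y ~ GEN + ENV, data = parts$training, FUN = mean)
names(cell_means)[names(cell_means) == "Y"] <- "pred"
compared <- merge(parts$validation, cell_means, by = c("GEN", "ENV"))
rmspd(compared$pred, compared$Y)
```

Repeating split_once() and the RMSPD computation nboot times would give one estimate per resample, which corresponds to the vector described under Value.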

Usage

cv_blup(
  .data,
  env,
  gen,
  rep,
  resp,
  block = NULL,
  nboot = 200,
  random = "gen",
  verbose = TRUE
)

Arguments

.data: The dataset containing the columns related to Environments, Genotypes, replication/block and response variable(s).
env: The name of the column that contains the levels of the environments.
gen: The name of the column that contains the levels of the genotypes.
rep: The name of the column that contains the levels of the replications/blocks. AT LEAST THREE REPLICATES ARE REQUIRED TO PERFORM THE CROSS-VALIDATION.
resp: The response variable.
block: Defaults to NULL. In this case, a randomized complete block design is considered. If block is supplied, a resolvable alpha-lattice design (Patterson and Williams, 1976) is employed. For how fixed and random effects are treated, see the Details section.
nboot: The number of resamples to be used in the cross-validation. Defaults to 200.
random: The effects of the model assumed to be random. One of "gen" (the default), "env", or "all". See Details for more information.
verbose: A logical argument to define if a progress bar is shown. Default is TRUE.

Value

An object of class cv_blup with the following items:

* RMSPD: A vector with nboot estimates of the root mean square prediction difference between predicted and validation data.
* RMSPDmean: The mean of the RMSPD estimates.

Details

Six models may be fitted, depending on the values of the block and random arguments.

Model 1: block = NULL and random = "gen" (the default option). This model considers a Randomized Complete Block Design in each environment assuming genotype and genotype-environment interaction as random effects. Environments and blocks nested within environments are assumed to be fixed factors.
Model 2: block = NULL and random = "env". This model considers a Randomized Complete Block Design in each environment treating environment, genotype-environment interaction, and blocks nested within environments as random factors. Genotypes are assumed to be fixed factors.
Model 3: block = NULL and random = "all". This model considers a Randomized Complete Block Design in each environment assuming a random-effect model, i.e., all effects (genotypes, environments, genotype-environment interaction, and blocks nested within environments) are assumed to be random factors.
Model 4: block is not NULL and random = "gen". This model considers an alpha-lattice design in each environment assuming genotype, genotype-environment interaction, and incomplete blocks nested within complete replicates as random to make use of inter-block information (Mohring et al., 2015). Complete replicates nested within environments and environments are assumed to be fixed factors.
Model 5: block is not NULL and random = "env". This model considers an alpha-lattice design in each environment assuming genotype as fixed. All other sources of variation (environment, genotype-environment interaction, complete replicates nested within environments, and incomplete blocks nested within replicates) are assumed to be random factors.
Model 6: block is not NULL and random = "all". This model considers an alpha-lattice design in each environment assuming all effects, except the intercept, as random factors.
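
The six combinations above can be written as lme4-style formulas. The sketch below is an assumption about how such models might be specified, not the code used internally by cv_blup(); the helper model_formula(), its has_block argument, and the placeholder column names Y, ENV, GEN, REP, and BLOCK are hypothetical. Only base R is needed to build the formulas; fitting them would require a mixed-model package such as lme4.

```r
# Hypothetical helper mapping the `random` choice and the presence of a block
# column to an lme4-style formula. Terms of the form (1 | x) denote random effects.
model_formula <- function(random = c("gen", "env", "all"), has_block = FALSE) {
  random <- match.arg(random)
  rhs <- switch(random,
    # Models 1 and 4: genotype and interaction random; ENV and REP-in-ENV fixed.
    gen = c("ENV", "ENV:REP", "(1 | GEN)", "(1 | GEN:ENV)"),
    # Models 2 and 5: genotype fixed; everything else random.
    env = c("GEN", "(1 | ENV)", "(1 | ENV:REP)", "(1 | GEN:ENV)"),
    # Models 3 and 6: all effects except the intercept random.
    all = c("1", "(1 | GEN)", "(1 | ENV)", "(1 | ENV:REP)", "(1 | GEN:ENV)")
  )
  # Alpha-lattice (models 4 to 6): incomplete blocks within replicates are random.
  if (has_block) rhs <- c(rhs, "(1 | ENV:REP:BLOCK)")
  as.formula(paste("Y ~", paste(rhs, collapse = " + ")))
}

model_formula("gen")                    # model 1
model_formula("all", has_block = TRUE)  # model 6
```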

IMPORTANT: An error is returned if the number of replications is not the same for every genotype-environment combination in the trial.
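
One way to check this requirement before calling the function is to count the plots in each genotype-environment cell. The base R sketch below is only an illustration; the helper is_balanced() and the toy data frames are hypothetical and are not part of metan.

```r
# Hypothetical helper: TRUE when every genotype x environment cell has the
# same number of replications (empty cells count as zero and fail the check).
is_balanced <- function(df, env = "ENV", gen = "GEN") {
  counts <- table(df[[gen]], df[[env]])
  length(unique(as.vector(counts))) == 1
}

balanced <- expand.grid(ENV = c("E1", "E2"), GEN = c("G1", "G2"), REP = 1:3)
unbalanced <- balanced[-1, ]  # drop one plot so that one cell has only 2 replications

is_balanced(balanced)    # TRUE
is_balanced(unbalanced)  # FALSE
```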

References

Olivoto, T., A.D.C. Lúcio, J.A.G. da Silva, V.S. Marchioro, V.Q. de Souza, and E. Jost. 2019. Mean performance and stability in multi-environment trials I: Combining features of AMMI and BLUP techniques. Agron. J. 111:2949-2960. doi:10.2134/agronj2019.03.0220

Patterson, H.D., and E.R. Williams. 1976. A new class of resolvable incomplete block designs. Biometrika 63:83-92.

Mohring, J., E. Williams, and H.-P. Piepho. 2015. Inter-block information: to recover or not to recover it? Theor. Appl. Genet. 128:1541-1554. doi:10.1007/s00122-015-2530-0

Author

Tiago Olivoto tiagoolivoto@gmail.com

Examples


library(metan)

# Cross-validate the default model (block = NULL, random = "gen") on the
# data_ge example dataset, using 5 resamples to keep the run short.
model <- cv_blup(data_ge,
                 env = ENV,
                 gen = GEN,
                 rep = REP,
                 resp = GY,
                 nboot = 5)
#> Validating 1 of 5 sets |========                                 | 20% 00:00:01
#> Validating 2 of 5 sets |================                         | 40% 00:00:02
#> Validating 3 of 5 sets |=========================                | 60% 00:00:04
#> Validating 4 of 5 sets |=================================        | 80% 00:00:05
#> Validating 5 of 5 sets |=========================================| 100% 00:00:06