Getting started

In this section, we will use the data in data_ge. For more information, please, see ?data_ge. Other data sets can be used provided that the following columns are in the dataset: environment, genotype, block/replicate and response variable(s).

The first step is to inspect the data with the function inspect().

Individual and joint ANOVA

It is suggested to check if genotype-vs-environment interaction is significant before proceeding with the AMMI analysis. A within-environment ANOVA considering a fixed-effect model is computed with the function anova_ind(). For each environment the Mean Squares for block, genotypes and error are shown. Estimated F-value and the probability error are also shown for block and genotype effects. Some measures of experimental precision are calculated, namely, coefficient of variation, \(CV = (\sqrt{MS_{res}}/Mean) \times 100\); the heritability, \(h2 = (MS_{gen} - MS_{res})/MS_{gen}\), and the accuracy of selection, \(As = \sqrt{h2}\).

The joint ANOVA is performed with the function anova_joint().

library(metan)
joint <- anova_joint(data_ge, ENV, GEN, REP, GY, verbose = FALSE)
print_table(joint$GY$anova)

The genotype-vs-environment interaction was highly significant. So we’ll proceed with the AMMI analysis.

The AMMI model

The estimate of the response variable for the ith genotype in the jth environment using The Additive Main Effect and Multiplicative interaction (AMMI) model, is given as follows:

\[ {y_{ij}} = \mu + {\alpha_i} + {\tau_j} + \sum\limits_{k = 1}^p {{\lambda _k}{a_{ik}}} {t_{jk}} + {\rho _{ij}} + {\varepsilon _{ij}} \]

where \({\lambda_k}\) is the singular value for the k-th interaction principal component axis (IPCA); \(a_{ik}\) is the i-th element of the k-th eigenvector; \(t_{jk}\) is the jth element of the kth eigenvector. A residual \(\rho _{ij}\) remains, if not all p IPCA are used, where \(p \le min(g - 1; e - 1)\).

The AMMI model is fitted with the performs_ammi() function. The first argument is the data, in our example data_ge. The second argument (resp) is the response variable to be analyzed. The function allow a single variable (in this case GY) or a vector of response variables. The arguments (gen, env, and rep) are the name of the columns that contains the levels for genotypes, environments, and replications, respectively. The last argument (verbose) control if the code will run silently.

Note that using the arguments in the correct order, the model above may be fitted cleanly with:

AMMI_model <- performs_ammi(data_ge, ENV, GEN, REP, GY, verbose = FALSE)

The AMMI table

The following comand creates the well-known ANOVA table for the AMMI model. Note that since

Nine interaction principal component axis (IPCA) were fitted and four were significant at 5% probability error. Based on this result, the AMMI4 model would be the best model to predict the yielding of the genotypes in the studied environments.

Estimating the response variable based on significant IPCA axes

The response variable of a two-way table (for example, the yield of m genotypes in n environments) may be estimated using the S3 method predict() applyed to an object of class waas. This estimation is based on the number of multiplicative terms declared in the function. If naxis = 1, the AMMI1 (with one multiplicative term) is used for estimating the response variable. If naxis = min(g - 1; e - 1), the AMMIF is fitted. A summary of all possible AMMI models is presented below.

Member of AMMI family Espected response of the i-th genotype in the jth environment
AMMI0 \(\hat{y}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{..}\)
AMMI1 \(\hat{y}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{..} +\lambda_1 a_{i1}t_{j1}\)
AMMI2 \(\hat{y}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{..} +\lambda_1 a_{i1}t_{j1}+\lambda_2 a_{i2}t_{j2}\)
AMMIF \(\hat{y}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{..} +\lambda_1 a_{i1}t_{j1}+\lambda_2 a_{i2}t_{j2}+...+\lambda_p a_{ip}t_{jp}\)

Procedures based on postdictive success, such as Gollobs’s test (Gollob 1968) or predictive success, such as cross-validation (Piepho 1994) should be used to define the number of IPCA used for estimating the response variable in AMMI analysis. This package provides both. The waas() function compute traditional AMMI analysis showing the number of significant axes according to Gollobs’s test. On the other hand, cv_ammif() function provides cross-validation of AMMI-model family, considering a completely randomized design (CRD) or a randomized complete block design (RCBD).

The following values are presented: ENV is the environment; GEN is the genotype; Y is the response variable; resOLS is the residual (\(\hat{z}_{ij}\)) estimated by the Ordinary Least Square (OLS), where \(\hat{z}_{ij} = y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{ij}\); Ypred is the predicted value by OLS (\(\hat{y}_{ij} = y_{ij} -\hat{z}_{ij}\)); ResAMMI is the residual estimated by the AMMI model (\(\hat{a}_{ij}\)) considering the number of multiplicative terms informed in the function (in this case 5), where \(\hat{a}_{ij} = \lambda_1\alpha_{i1}\tau_{j1}+...+\lambda_5\alpha_{i5}\tau_{j5}\); YpredAMMI is the predicted value by AMMI model \(\hat{ya}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{ij}+\hat{a}_{ij}\); and AMMI0 is the predicted value when no multiplicative terms are used, i.e., \(\hat{y}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{ij}\).

Estimating the WAAS index

The waas() function computes the Weighted Average of Absolute Scores (Olivoto, Lúcio, Da silva, Marchioro, et al. 2019) considering (i) all principal component axes that were significant (\(p < 0.05\) by default); or (ii) declaring a specific number of axes to be used, according to the following equation:

\[ WAAS_i = \sum_{k = 1}^{p} |IPCA_{ik} \times EP_k|/ \sum_{k = 1}^{p}EP_k \]

where \(WAAS_i\) is the weighted average of absolute scores of the ith genotype; \(PCA_{ik}\) is the score of the ith genotype in the kth IPCA; and \(EP_k\) is the explained variance of the kth IPCA for \(k = 1,2,..,p\), considering p the number of significant PCAs, or a declared number of PCAs. The following functions may be used to do that.

waas_index <- waas(data_ge,  ENV, GEN, REP, GY, verbose = FALSE)
# New names:
# * `` -> ...15

Number of axes based on F-test

In this example only IPCAs with P-value < 0.05 will be considered in the WAAS estimation. This is the default setting and the model was already fitted and stored into AMMI_model>GY>model.

The output generated by the waas() function shows the following results: type, genotype (GEN) or environment (ENV); Code, the code attributed to each level of the factors; Y, the response variable (in this case the grain yield); WAAS the weighted average of the absolute scores, estimated with all PCA axes with P-value \(\le\) 0.05; PctWAAS and PctResp that are the percentage values for the WAAS and Y, respectively; OrResp and OrWAAS that are the ranks attributed to the genotype and environment regarding the Y or WAAS, respectively; WAASY is the weighted average of absolute scores and response variable. In this case, considering equal weights for PctResp and PctWAAS, the WAASY for G1 is estimated by: \(WAAS_{G1} = [(24.87\times50)+(91.83\times50)]/50+50 = 58.35\). Then the OrWAASY is the rank for the WAASY value. The genotype (or environment) with the largest WAASY value has the first ranked.

Number of axes declared manually

The second option to compute the WAAS is by manually declaring a specific number of multiplicative terms. In this case, the number of terms declared is used independently of its significance. Let us, for the moment, assume that after a cross-validation procedure the AMMI7 was the most predictively accurate AMMI model and the researcher will use this model. The additional argument naxis in the function waas is then used to overwrite the default chose of significant terms.

The only difference in this output is that here we declared that seven IPCA axes should be used for computing the WAAS value. Thus, only the values of WAAS, OrWAAS, WAASY and OrWAASY may have significant changes.

Biplots

Provided that an object of class waas or performs_ammi is available in the global environment, the graphics may be obtained using the function plot_scores(). To do that, we will revisit the previusly fitted model AMMI_model . Please, refer to plot_scores() for more details.

biplot type 2: PC1 x PC2

biplot type 3: GY x WAAS

The quadrants proposed by Olivoto, Lúcio, Da silva, Marchioro, et al. (2019) in the following biplot represent four classifications regarding the joint interpretation of mean performance and stability. The genotypes or environments included in quadrant I can be considered unstable genotypes or environments with high discrimination ability, and with productivity below the grand mean. In quadrant II are included unstable genotypes, although with productivity above the grand mean. The environments included in this quadrant deserve special attention since, in addition to providing high magnitudes of the response variable, they present a good discrimination ability. Genotypes within quadrant III have low productivity, but can be considered stable due to the lower values of WAASB. The lower this value, the more stable the genotype can be considered. The environments included in this quadrant can be considered as poorly productive and with low discrimination ability. The genotypes within the quadrant IV are highly productive and broadly adapted due to the high magnitude of the response variable and high stability performance (lower values of WAASB). . To obtain this biplot must use an object of class waas (in our example, waas_index).

biplot type 4 : nominal yield and environment IPCA1

g <- plot_scores(AMMI_model, type = 4)
h <- plot_scores(AMMI_model,
                 type = 4,
                 color = FALSE)
arrange_ggplot(g, h, labels = letters[7:8])

Simultaneous selection for mean performance and stability

The WAASY index (Olivoto, Lúcio, Da silva, Sari, et al. 2019) is used for genotype ranking considering both the stability (WAAS) and mean performance based on the following model:

\[ WAASY{_i} = \frac{{\left( {r{G_i} \times {\theta _Y}} \right) + \left( {r{W_i} \times {\theta _S}} \right)}}{{{\theta _Y} + {\theta _S}}} \]

where \(WAASY_i\) is the superiority index for the i-th genotype that weights between performance and stability; \(rG_i\) and \(rW_i\) are the rescaled values (0-100) for GY and WAASB, respectively; \(\theta _Y\) and \(\theta_S\) are the weights for GY and WAASB, respectively.

This index was also already computed and stored into AMMI_model>GY>model. An intuitively plot may be obtained by running

i <- plot_waasby(waas_index)
j <- plot_waasby(waas_index,
                 col.shape = c("gray20", "gray80"),
                 plot_theme = theme_metan(grid = "x"))
arrange_ggplot(i, j, labels = letters[9:10])

The values of WAASY in the plot above were computed considering equal weights for mean performance and stability. Different weights may be assigned using the wresp argument of the waas() function.

Weighting the stability and mean performance

After fitting a model with the functions waas() or waasb() it is possible to compute the superiority indexes WAASY or WAASBY in different scenarios of weights for stability and mean performance. The number of scenarios is defined by the arguments increment. By default, twenty-one different scenarios are computed. In this case, the the superiority index is computed considering the following weights: stability (waasb or waas) = 100; mean performance = 0. In other words, only stability is considered for genotype ranking. In the next iteration, the weights becomes 95/5 (since increment = 5). In the third scenario, the weights become 90/10, and so on up to these weights become 0/100. In the last iteration, the genotype ranking for WAASY or WAASBY matches perfectly with the ranks of the response variable.

WAASratio <- wsmp(waas_index)

Printing the model outputs

The genotype ranking for each scenario of WAASY/GY weight ratio is shown bellow

In addition, the genotype ranking depending on the number of multiplicative terms used to estimate the WAAS index is also computed.

Plotting the heat map graphics

The first type of heatmap shows the genotype ranking depending on the number of principal component axes used for estimating the WAASB index. An euclidean distance-based dendrogram is used for grouping the genotype ranking for both genotypes and principal component axes. The second type of heatmap shows the genotype ranking depending on the WAASB/GY ratio. The ranks obtained with a ratio of 100/0 considers exclusively the stability for genotype ranking. On the other hand, a ratio of 0/100 considers exclusively the productivity for genotype ranking.

Ranks of genotypes depending on the number of PCA used to estimate the WAAS


plot(WAASratio, type = 1)

Ranks of genotypes depending on the WAAS/GY ratio


plot(WAASratio, type = 2)

Getting model data

The function get_model_data() may be used to easily get the data from a model fitted with the function waas(), especially when more than one variables are used. Select helpers can be used in the argument resp. See the example below.

waas_index_all <-
waas(data_ge2, ENV, GEN, REP,
     resp = everything()) %>%
 get_model_data(what = "WAASB")

Other AMMI-based stability indexes

The following AMMI-based stability indexes may be computed using the function AMMI_indexes():

  • AMMI stability value, ASV, (Purchase, Hatting, and Deventer 2000).

\[ ASV = \sqrt {{{\left[ {\frac{{IPCA{1_{ss}}}}{{IPCA{2_{ss}}}} \times \left( {IPCA{1_{score}}} \right)} \right]}^2} + {{\left( {IPCA{2_{score}}} \right)}^2}} \]

  • Sums of the absolute value of the IPCA scores

\[ SIP{C_i} = \sum\nolimits_{k = 1}^P {\left| {\mathop {\lambda }\nolimits_k^{0.5} {a_{ik}}} \right|} \]

  • Averages of the squared eigenvector values

\[ E{V_i} = \sum\nolimits_{k = 1}^P {\mathop a\nolimits_{ik}^2 } /P \] described by Sneller, Kilgore-Norquest, and Dombek (1997), where P is the number of IPCA retained via F-tests;

  • absolute value of the relative contribution of IPCAs to the interaction (Zali et al. 2012).

\[ Z{a_i} = \sum\nolimits_{k = 1}^P {{\theta _k}{a_{ik}}} \]

where \({\theta _k}\) is the percentage sum of squares explained by the k-th IPCA. Simultaneous selection indexes (ssi), are computed by summation of the ranks of the ASV, SIPC, EV and Za indexes and the ranks of the mean yields (Farshadfar 2008), which results in ssiASV, ssiSIPC, ssiEV, and ssiZa, respectively.

The AMMI_index() function has two arguments. The first (x) is the model, which must be an object of the class waas or performs_ammi. The second, (order.y) is the order for ranking the response variable. By default, it is set to NULL, which means that the response variable is ordered in descending order. If x is a list with more than one variable, order.y must be a vector of the same length of x. Each element of the vector must be one of the “h” or “l”. If “h” is used, the response variable will be ordered from maximum to minimum. If “l” is used then the response variable will be ordered from minimum to maximum. We will use the previously fitted model AMMI_model to compute the AMMI-based stability indexes.

AMMI model for unbalanced data

Singular Value Decomposition requires a complete two-way table (i.e., all genotypes in all environments). Sometimes (for several reasons), a complete two-way table cannot be obtained in a multi-environment trial. metan offers an option to impute the missing cells of the two-way table using Expectation-Maximization algorithms. If an incomplete two-way table is identified in performs_ammi() a warning is issued, impute_missing_val() is called internally and the missing value(s) is(are) imputed using a low-rank Singular Value Decomposition approximation estimated by the Expectation-Maximization algorithm. The algorithm will (i) initialize all NA values to the column means; (ii) compute the first axis terms of the SVD of the completed matrix; (iii) replace the previously missing values with their approximations from the SVD; (iv) iterate steps 2 through 3 until convergence or a maximum number of iterations be achieved.

As an example we will run the AMMI model by omiting H2 from E1 in `data_ge2.

miss_val <- 
  data_ge2 %>% 
  remove_rows(4:6) %>% 
  droplevels()

mod_miss <- 
  performs_ammi(miss_val, ENV, GEN, REP, PH)
# ----------------------------------------------
# Convergence information
# ----------------------------------------------
# Number of iterations: 23
# Final RMSE: 6.007683e-11
# Number of axis: 1
# Convergence: TRUE
# ----------------------------------------------
# Warning: Data imputation used to fill the GxE matrix
# New names:
# * `` -> ...9
# variable PH 
# ---------------------------------------------------------------------------
# AMMI analysis table
# ---------------------------------------------------------------------------
#     Source  Df Sum Sq Mean Sq F value   Pr(>F) Percent Accumul
#        ENV   3  6.952  2.3173 117.463 5.92e-07       .       .
#   REP(ENV)   8  0.158  0.0197   0.863 5.51e-01       .       .
#        GEN  12  2.470  0.2058   9.000 3.03e-11       .       .
#    GEN:ENV  35  5.286  0.1510   6.603 1.39e-13       .       .
#        PC1  14  4.413  0.3152  13.780 0.00e+00    83.4    83.4
#        PC2  12  0.588  0.0490   2.140 2.12e-02    11.1    94.5
#        PC3  10  0.290  0.0290   1.270 2.59e-01     5.5     100
#  Residuals  94  2.150  0.0229      NA       NA       .       .
#      Total 152 17.016  0.1119      NA       NA    <NA>    <NA>
# ---------------------------------------------------------------------------
# 
# All variables with significant (p < 0.05) genotype-vs-environment interaction
# Done!
p1 <- plot_scores(mod_miss, type = 2, title = FALSE)

mod_comp <- 
  data_ge2 %>% 
  performs_ammi(ENV, GEN, REP, PH)
# New names:
# * `` -> ...9
# variable PH 
# ---------------------------------------------------------------------------
# AMMI analysis table
# ---------------------------------------------------------------------------
#     Source  Df Sum Sq Mean Sq F value   Pr(>F) Percent Accumul
#        ENV   3  7.719  2.5728 127.913 4.25e-07       .       .
#   REP(ENV)   8  0.161  0.0201   0.897 5.22e-01       .       .
#        GEN  12  1.865  0.1554   6.929 6.89e-09       .       .
#    GEN:ENV  36  5.397  0.1499   6.686 5.01e-14       .       .
#        PC1  14  4.466  0.3190  14.230 0.00e+00    82.8    82.8
#        PC2  12  0.653  0.0545   2.430 8.40e-03    12.1    94.9
#        PC3  10  0.277  0.0277   1.240 2.76e-01     5.1     100
#  Residuals  96  2.153  0.0224      NA       NA       .       .
#      Total 155 17.294  0.1116      NA       NA    <NA>    <NA>
# ---------------------------------------------------------------------------
# 
# All variables with significant (p < 0.05) genotype-vs-environment interaction
# Done!
p2 <- plot_scores(mod_comp, type = 2, title = FALSE)  

arrange_ggplot(p1, p2, labels = c("Missing data", "Complete data"))


Rendering engine

This vignette was built with pkgdown. All tables were produced with the package DT using the following function.

library(DT) # Used to make the tables
# Function to make HTML tables
print_table <- function(table, rownames = FALSE, ...){
datatable(table, rownames = rownames, extensions = 'Buttons',
          options = list(scrollX = TRUE, dom = '<<t>Bp>', buttons = c('copy', 'excel', 'pdf', 'print')), ...) %>%
    formatSignif(columns = c(1:ncol(table)), digits = 3)}

References

Farshadfar, E. 2008. “Incorporation of AMMI stability value and grain yield in a single non-parametric index (GSI) in bread wheat.” Pakistan Journal of Biological Sciences 11 (14): 1791–6. http://www.ncbi.nlm.nih.gov/pubmed/18817218.

Gollob, H. F. 1968. “A statistical model which combines features of factor analytic and analysis of variance techniques.” Psychometrika 33 (1): 73–115. https://doi.org/10.1007/BF02289676.

Olivoto, T., A. D. C Lúcio, J. A. G. Da silva, V. S. Marchioro, V. Q. de Souza, and E. Jost. 2019. “Mean performance and stability in multi-environment trials I: Combining features of AMMI and BLUP techniques.” Agronomy Journal 111 (6): 2949–60. https://doi.org/10.2134/agronj2019.03.0220.

Olivoto, T., A. D. C Lúcio, J. A. G. Da silva, B. G. Sari, and M. I. Diel. 2019. “Mean performance and stability in multi-environment trials II: Selection based on multiple traits.” Agronomy Journal 111 (6): 2961–9. https://doi.org/10.2134/agronj2019.03.0221.

Piepho, H. P. 1994. “Best Linear Unbiased Prediction (Blup) for Regional Yield Trials: A Comparison to Additive Main Effects and Multiplicative Interaction (Ammi) Analysis.” Theor. Appl. Genet. 89 (5): 647–54. https://doi.org/10.1007/BF00222462.

Purchase, J. L., Hesta Hatting, and C. S. van Deventer. 2000. “Genotype × environment interaction of winter wheat ( Triticum aestivum L.) in South Africa: II. Stability analysis of yield performance.” South African Journal of Plant and Soil 17 (3): 101–7. https://doi.org/10.1080/02571862.2000.10634878.

Sneller, C. H., L. Kilgore-Norquest, and D. Dombek. 1997. “Repeatability of Yield Stability Statistics in Soybean.” Crop Science 37 (2): 383–90. https://doi.org/10.2135/cropsci1997.0011183X003700020013x.

Zali, H., E. Farshadfar, S. H. Sabaghpour, and R. Karimizadeh. 2012. “Evaluation of genotype × environment interaction in chickpea using measures of stability from AMMI model.” Annals of Biological Research 3 (7): 3126–36. http://eprints.icrisat.ac.in/7173/.