`vignettes/vignettes_ammi.Rmd`

`vignettes_ammi.Rmd`

In this section, we will use the data in `data_ge`

. For more information, please, see `?data_ge`

. Other data sets can be used provided that the following columns are in the dataset: environment, genotype, block/replicate and response variable(s).

The first step is to inspect the data with the function `inspect()`

.

```
library(metan)
insp <- inspect(data_ge,
verbose = FALSE,
plot = TRUE)
```

It is suggested to check if genotype-vs-environment interaction is significant before proceeding with the AMMI analysis. A within-environment ANOVA considering a fixed-effect model is computed with the function `anova_ind()`

. For each environment the Mean Squares for block, genotypes and error are shown. Estimated F-value and the probability error are also shown for block and genotype effects. Some measures of experimental precision are calculated, namely, coefficient of variation, \(CV = (\sqrt{MS_{res}}/Mean) \times 100\); the heritability, \(h2 = (MS_{gen} - MS_{res})/MS_{gen}\), and the accuracy of selection, \(As = \sqrt{h2}\).

```
indiv <- anova_ind(data_ge, ENV, GEN, REP, GY, verbose = FALSE)
# Warning: `verbose` is deprecated. It will be defunct in the new release.
print_table(indiv$GY$individual)
```

The joint ANOVA is performed with the function `anova_joint()`

.

```
library(metan)
joint <- anova_joint(data_ge, ENV, GEN, REP, GY, verbose = FALSE)
print_table(joint$GY$anova)
```

The genotype-vs-environment interaction was highly significant. So we’ll proceed with the AMMI analysis.

The estimate of the response variable for the *i*th genotype in the *j*th environment using The Additive Main Effect and Multiplicative interaction (AMMI) model, is given as follows:

\[ {y_{ij}} = \mu + {\alpha_i} + {\tau_j} + \sum\limits_{k = 1}^p {{\lambda _k}{a_{ik}}} {t_{jk}} + {\rho _{ij}} + {\varepsilon _{ij}} \]

where \({\lambda_k}\) is the singular value for the *k*-th interaction principal component axis (IPCA); \(a_{ik}\) is the *i*-th element of the *k*-th eigenvector; \(t_{jk}\) is the *j*th element of the *k*th eigenvector. A residual \(\rho _{ij}\) remains, if not all *p* IPCA are used, where \(p \le min(g - 1; e - 1)\).

The AMMI model is fitted with the `performs_ammi()`

function. The first argument is the data, in our example `data_ge`

. The second argument (`resp`

) is the response variable to be analyzed. The function allow a single variable (in this case GY) or a vector of response variables. The arguments (`gen`

, `env`

, and `rep`

) are the name of the columns that contains the levels for genotypes, environments, and replications, respectively. The last argument (`verbose`

) control if the code will run silently.

```
AMMI_model <- performs_ammi(data_ge,
env = ENV,
gen = GEN,
rep = REP,
resp = GY,
verbose = FALSE)
# New names:
# * `` -> ...15
```

Note that using the arguments in the correct order, the model above may be fitted cleanly with:

`AMMI_model <- performs_ammi(data_ge, ENV, GEN, REP, GY, verbose = FALSE)`

The following comand creates the well-known ANOVA table for the AMMI model. Note that since

Nine interaction principal component axis (IPCA) were fitted and four were significant at 5% probability error. Based on this result, the AMMI4 model would be the best model to predict the yielding of the genotypes in the studied environments.

The response variable of a two-way table (for example, the yield of *m* genotypes in *n* environments) may be estimated using the S3 method `predict()`

applyed to an object of class `waas`

. This estimation is based on the number of multiplicative terms declared in the function. If `naxis = 1`

, the AMMI1 (with one multiplicative term) is used for estimating the response variable. If `naxis = min(g - 1; e - 1)`

, the AMMIF is fitted. A summary of all possible AMMI models is presented below.

Member of AMMI family | Espected response of the i-th genotype in the jth environment |
---|---|

AMMI0 | \(\hat{y}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{..}\) |

AMMI1 | \(\hat{y}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{..} +\lambda_1 a_{i1}t_{j1}\) |

AMMI2 | \(\hat{y}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{..} +\lambda_1 a_{i1}t_{j1}+\lambda_2 a_{i2}t_{j2}\) |

… | |

AMMIF | \(\hat{y}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{..} +\lambda_1 a_{i1}t_{j1}+\lambda_2 a_{i2}t_{j2}+...+\lambda_p a_{ip}t_{jp}\) |

Procedures based on postdictive success, such as Gollobs’s test (Gollob 1968) or predictive success, such as cross-validation (Piepho 1994) should be used to define the number of IPCA used for estimating the response variable in AMMI analysis. This package provides both. The `waas()`

function compute traditional AMMI analysis showing the number of significant axes according to Gollobs’s test. On the other hand, `cv_ammif()`

function provides cross-validation of AMMI-model family, considering a completely randomized design (CRD) or a randomized complete block design (RCBD).

```
predicted <- predict(AMMI_model, naxis = 4)
print_table(predicted$GY)
```

The following values are presented: **ENV** is the environment; **GEN** is the genotype; **Y** is the response variable; **resOLS** is the residual (\(\hat{z}_{ij}\)) estimated by the Ordinary Least Square (OLS), where \(\hat{z}_{ij} = y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{ij}\); **Ypred** is the predicted value by OLS (\(\hat{y}_{ij} = y_{ij} -\hat{z}_{ij}\)); **ResAMMI** is the residual estimated by the AMMI model (\(\hat{a}_{ij}\)) considering the number of multiplicative terms informed in the function (in this case 5), where \(\hat{a}_{ij} = \lambda_1\alpha_{i1}\tau_{j1}+...+\lambda_5\alpha_{i5}\tau_{j5}\); **YpredAMMI** is the predicted value by AMMI model \(\hat{ya}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{ij}+\hat{a}_{ij}\); and **AMMI0** is the predicted value when no multiplicative terms are used, i.e., \(\hat{y}_{ij} = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{ij}\).

The `waas()`

function computes the Weighted Average of Absolute Scores (Olivoto, Lúcio, Da silva, Marchioro, et al. 2019) considering (i) all principal component axes that were significant (\(p < 0.05\) by default); or (ii) declaring a specific number of axes to be used, according to the following equation:

\[ WAAS_i = \sum_{k = 1}^{p} |IPCA_{ik} \times EP_k|/ \sum_{k = 1}^{p}EP_k \]

where \(WAAS_i\) is the weighted average of absolute scores of the *i*th genotype; \(PCA_{ik}\) is the score of the *i*th genotype in the *k*th IPCA; and \(EP_k\) is the explained variance of the *k*th IPCA for \(k = 1,2,..,p\), considering *p* the number of significant PCAs, or a declared number of PCAs. The following functions may be used to do that.

```
waas_index <- waas(data_ge, ENV, GEN, REP, GY, verbose = FALSE)
# New names:
# * `` -> ...15
```

In this example only IPCAs with *P*-value < 0.05 will be considered in the WAAS estimation. This is the default setting and the model was already fitted and stored into `AMMI_model>GY>model`

.

The output generated by the `waas()`

function shows the following results: **type**, genotype (GEN) or environment (ENV); **Code**, the code attributed to each level of the factors; **Y**, the response variable (in this case the grain yield); **WAAS** the weighted average of the absolute scores, estimated with all PCA axes with *P*-value \(\le\) 0.05; **PctWAAS** and **PctResp** that are the percentage values for the WAAS and Y, respectively; **OrResp** and **OrWAAS** that are the ranks attributed to the genotype and environment regarding the Y or WAAS, respectively; **WAASY** is the weighted average of absolute scores and response variable. In this case, considering equal weights for PctResp and PctWAAS, the WAASY for G1 is estimated by: \(WAAS_{G1} = [(24.87\times50)+(91.83\times50)]/50+50 = 58.35\). Then the **OrWAASY** is the rank for the WAASY value. The genotype (or environment) with the largest WAASY value has the first ranked.

The second option to compute the WAAS is by manually declaring a specific number of multiplicative terms. In this case, the number of terms declared is used independently of its significance. Let us, for the moment, assume that after a cross-validation procedure the AMMI7 was the most predictively accurate AMMI model and the researcher will use this model. The additional argument `naxis`

in the function `waas`

is then used to overwrite the default chose of significant terms.

```
waas_index2 <- data_ge %>%
waas(ENV, GEN, REP, GY,
naxis = 7, # Use 7 IPCA for computing WAAS
verbose = FALSE)
```

The only difference in this output is that here we declared that seven IPCA axes should be used for computing the WAAS value. Thus, only the values of WAAS, OrWAAS, WAASY and OrWAASY may have significant changes.

Provided that an object of class `waas`

or `performs_ammi`

is available in the global environment, the graphics may be obtained using the function `plot_scores()`

. To do that, we will revisit the previusly fitted model `AMMI_model`

. Please, refer to `plot_scores()`

for more details.

```
a <- plot_scores(AMMI_model)
b <- plot_scores(AMMI_model,
col.gen = "black",
col.env = "gray70",
col.segm.env = "gray70",
axis.expand = 1.5,
plot_theme = theme_metan(grid = "both"))
arrange_ggplot(a, b, labels = letters[1:2])
```

```
c <- plot_scores(AMMI_model, type = 2)
d <- plot_scores(AMMI_model,
type = 2,
polygon = T,
col.segm.env = "transparent",
plot_theme = theme_metan_minimal())
arrange_ggplot(c, d, labels = letters[3:4])
```

The quadrants proposed by Olivoto, Lúcio, Da silva, Marchioro, et al. (2019) in the following biplot represent four classifications regarding the joint interpretation of mean performance and stability. The genotypes or environments included in quadrant I can be considered unstable genotypes or environments with high discrimination ability, and with productivity below the grand mean. In quadrant II are included unstable genotypes, although with productivity above the grand mean. The environments included in this quadrant deserve special attention since, in addition to providing high magnitudes of the response variable, they present a good discrimination ability. Genotypes within quadrant III have low productivity, but can be considered stable due to the lower values of WAASB. The lower this value, the more stable the genotype can be considered. The environments included in this quadrant can be considered as poorly productive and with low discrimination ability. The genotypes within the quadrant IV are highly productive and broadly adapted due to the high magnitude of the response variable and high stability performance (lower values of WAASB). . To obtain this biplot must use an object of class `waas`

(in our example, `waas_index`

).

```
e <- plot_scores(waas_index, type = 3)
f <- plot_scores(waas_index,
type = 3,
x.lab = "My custom x label",
size.shape = 4, # Size of the shape point
col.gen = "gray50", # Color for genotypes
size.tex.pa = 4, # Size of the text
col.alpha.env = 0, # Transparency of environment's point
x.lim = c(2.4, 3.1), # Limits of x axis
x.breaks = seq(2.4, 3.1, by = 0.1), # Markers of x axis
y.lim = c(0, 0.7))+
ggplot2::ggtitle("WAASB vs GY plot", subtitle = "Zoom in genotypes' scores")
arrange_ggplot(e, f, labels = letters[5:6])
```

```
g <- plot_scores(AMMI_model, type = 4)
h <- plot_scores(AMMI_model,
type = 4,
color = FALSE)
arrange_ggplot(g, h, labels = letters[7:8])
```

The WAASY index (Olivoto, Lúcio, Da silva, Sari, et al. 2019) is used for genotype ranking considering both the stability (WAAS) and mean performance based on the following model:

\[ WAASY{_i} = \frac{{\left( {r{G_i} \times {\theta _Y}} \right) + \left( {r{W_i} \times {\theta _S}} \right)}}{{{\theta _Y} + {\theta _S}}} \]

where \(WAASY_i\) is the superiority index for the *i*-th genotype that weights between performance and stability; \(rG_i\) and \(rW_i\) are the rescaled values (0-100) for GY and WAASB, respectively; \(\theta _Y\) and \(\theta_S\) are the weights for GY and WAASB, respectively.

This index was also already computed and stored into AMMI_model>GY>model. An intuitively plot may be obtained by running

```
i <- plot_waasby(waas_index)
j <- plot_waasby(waas_index,
col.shape = c("gray20", "gray80"),
plot_theme = theme_metan(grid = "x"))
arrange_ggplot(i, j, labels = letters[9:10])
```

The values of WAASY in the plot above were computed considering equal weights for mean performance and stability. Different weights may be assigned using the `wresp`

argument of the `waas()`

function.

After fitting a model with the functions `waas()`

or `waasb()`

it is possible to compute the superiority indexes WAASY or WAASBY in different scenarios of weights for stability and mean performance. The number of scenarios is defined by the arguments `increment`

. By default, twenty-one different scenarios are computed. In this case, the the superiority index is computed considering the following weights: stability (waasb or waas) = 100; mean performance = 0. In other words, only stability is considered for genotype ranking. In the next iteration, the weights becomes 95/5 (since increment = 5). In the third scenario, the weights become 90/10, and so on up to these weights become 0/100. In the last iteration, the genotype ranking for WAASY or WAASBY matches perfectly with the ranks of the response variable.

`WAASratio <- wsmp(waas_index)`

The genotype ranking for each scenario of WAASY/GY weight ratio is shown bellow

In addition, the genotype ranking depending on the number of multiplicative terms used to estimate the WAAS index is also computed.

The first type of heatmap shows the genotype ranking depending on the number of principal component axes used for estimating the WAASB index. An euclidean distance-based dendrogram is used for grouping the genotype ranking for both genotypes and principal component axes. The second type of heatmap shows the genotype ranking depending on the WAASB/GY ratio. The ranks obtained with a ratio of 100/0 considers exclusively the stability for genotype ranking. On the other hand, a ratio of 0/100 considers exclusively the productivity for genotype ranking.

```
plot(WAASratio, type = 1)
```

```
plot(WAASratio, type = 2)
```

The function `get_model_data()`

may be used to easily get the data from a model fitted with the function `waas()`

, especially when more than one variables are used. Select helpers can be used in the argument `resp`

. See the example below.

```
waas_index_all <-
waas(data_ge2, ENV, GEN, REP,
resp = everything()) %>%
get_model_data(what = "WAASB")
```

The following AMMI-based stability indexes may be computed using the function `AMMI_indexes()`

:

**AMMI stability value, ASV, (Purchase, Hatting, and Deventer 2000).**

\[ ASV = \sqrt {{{\left[ {\frac{{IPCA{1_{ss}}}}{{IPCA{2_{ss}}}} \times \left( {IPCA{1_{score}}} \right)} \right]}^2} + {{\left( {IPCA{2_{score}}} \right)}^2}} \]

**Sums of the absolute value of the IPCA scores**

\[ SIP{C_i} = \sum\nolimits_{k = 1}^P {\left| {\mathop {\lambda }\nolimits_k^{0.5} {a_{ik}}} \right|} \]

**Averages of the squared eigenvector values**

\[
E{V_i} = \sum\nolimits_{k = 1}^P {\mathop a\nolimits_{ik}^2 } /P
\] described by Sneller, Kilgore-Norquest, and Dombek (1997), where *P* is the number of IPCA retained via F-tests;

**absolute value of the relative contribution of IPCAs to the interaction (Zali et al. 2012).**

\[ Z{a_i} = \sum\nolimits_{k = 1}^P {{\theta _k}{a_{ik}}} \]

where \({\theta _k}\) is the percentage sum of squares explained by the *k*-th IPCA. Simultaneous selection indexes (ssi), are computed by summation of the ranks of the ASV, SIPC, EV and Za indexes and the ranks of the mean yields (Farshadfar 2008), which results in ssiASV, ssiSIPC, ssiEV, and ssiZa, respectively.

The `AMMI_index()`

function has two arguments. The first (x) is the model, which must be an object of the class `waas`

or `performs_ammi`

. The second, (order.y) is the order for ranking the response variable. By default, it is set to NULL, which means that the response variable is ordered in descending order. If `x`

is a list with more than one variable, `order.y`

must be a vector of the same length of x. Each element of the vector must be one of the “h” or “l”. If “h” is used, the response variable will be ordered from maximum to minimum. If “l” is used then the response variable will be ordered from minimum to maximum. We will use the previously fitted model `AMMI_model`

to compute the AMMI-based stability indexes.

```
stab_indexes <- AMMI_indexes(AMMI_model)
print_table(stab_indexes$GY)
```

Singular Value Decomposition requires a complete two-way table (i.e., all genotypes in all environments). Sometimes (for several reasons), a complete two-way table cannot be obtained in a multi-environment trial. `metan`

offers an option to impute the missing cells of the two-way table using Expectation-Maximization algorithms. If an incomplete two-way table is identified in `performs_ammi()`

a warning is issued, `impute_missing_val()`

is called internally and the missing value(s) is(are) imputed using a low-rank Singular Value Decomposition approximation estimated by the Expectation-Maximization algorithm. The algorithm will (i) initialize all NA values to the column means; (ii) compute the first axis terms of the SVD of the completed matrix; (iii) replace the previously missing values with their approximations from the SVD; (iv) iterate steps 2 through 3 until convergence or a maximum number of iterations be achieved.

As an example we will run the AMMI model by omiting `H2`

from `E1`

in `data_ge2.

```
miss_val <-
data_ge2 %>%
remove_rows(4:6) %>%
droplevels()
mod_miss <-
performs_ammi(miss_val, ENV, GEN, REP, PH)
# ----------------------------------------------
# Convergence information
# ----------------------------------------------
# Number of iterations: 23
# Final RMSE: 6.007683e-11
# Number of axis: 1
# Convergence: TRUE
# ----------------------------------------------
# Warning: Data imputation used to fill the GxE matrix
# New names:
# * `` -> ...9
# variable PH
# ---------------------------------------------------------------------------
# AMMI analysis table
# ---------------------------------------------------------------------------
# Source Df Sum Sq Mean Sq F value Pr(>F) Percent Accumul
# ENV 3 6.952 2.3173 117.463 5.92e-07 . .
# REP(ENV) 8 0.158 0.0197 0.863 5.51e-01 . .
# GEN 12 2.470 0.2058 9.000 3.03e-11 . .
# GEN:ENV 35 5.286 0.1510 6.603 1.39e-13 . .
# PC1 14 4.413 0.3152 13.780 0.00e+00 83.4 83.4
# PC2 12 0.588 0.0490 2.140 2.12e-02 11.1 94.5
# PC3 10 0.290 0.0290 1.270 2.59e-01 5.5 100
# Residuals 94 2.150 0.0229 NA NA . .
# Total 152 17.016 0.1119 NA NA <NA> <NA>
# ---------------------------------------------------------------------------
#
# All variables with significant (p < 0.05) genotype-vs-environment interaction
# Done!
p1 <- plot_scores(mod_miss, type = 2, title = FALSE)
mod_comp <-
data_ge2 %>%
performs_ammi(ENV, GEN, REP, PH)
# New names:
# * `` -> ...9
# variable PH
# ---------------------------------------------------------------------------
# AMMI analysis table
# ---------------------------------------------------------------------------
# Source Df Sum Sq Mean Sq F value Pr(>F) Percent Accumul
# ENV 3 7.719 2.5728 127.913 4.25e-07 . .
# REP(ENV) 8 0.161 0.0201 0.897 5.22e-01 . .
# GEN 12 1.865 0.1554 6.929 6.89e-09 . .
# GEN:ENV 36 5.397 0.1499 6.686 5.01e-14 . .
# PC1 14 4.466 0.3190 14.230 0.00e+00 82.8 82.8
# PC2 12 0.653 0.0545 2.430 8.40e-03 12.1 94.9
# PC3 10 0.277 0.0277 1.240 2.76e-01 5.1 100
# Residuals 96 2.153 0.0224 NA NA . .
# Total 155 17.294 0.1116 NA NA <NA> <NA>
# ---------------------------------------------------------------------------
#
# All variables with significant (p < 0.05) genotype-vs-environment interaction
# Done!
p2 <- plot_scores(mod_comp, type = 2, title = FALSE)
arrange_ggplot(p1, p2, labels = c("Missing data", "Complete data"))
```

**Rendering engine**

This vignette was built with pkgdown. All tables were produced with the package `DT`

using the following function.

```
library(DT) # Used to make the tables
# Function to make HTML tables
print_table <- function(table, rownames = FALSE, ...){
datatable(table, rownames = rownames, extensions = 'Buttons',
options = list(scrollX = TRUE, dom = '<<t>Bp>', buttons = c('copy', 'excel', 'pdf', 'print')), ...) %>%
formatSignif(columns = c(1:ncol(table)), digits = 3)}
```

Farshadfar, E. 2008. “Incorporation of AMMI stability value and grain yield in a single non-parametric index (GSI) in bread wheat.” *Pakistan Journal of Biological Sciences* 11 (14): 1791–6. http://www.ncbi.nlm.nih.gov/pubmed/18817218.

Gollob, H. F. 1968. “A statistical model which combines features of factor analytic and analysis of variance techniques.” *Psychometrika* 33 (1): 73–115. https://doi.org/10.1007/BF02289676.

Olivoto, T., A. D. C Lúcio, J. A. G. Da silva, V. S. Marchioro, V. Q. de Souza, and E. Jost. 2019. “Mean performance and stability in multi-environment trials I: Combining features of AMMI and BLUP techniques.” *Agronomy Journal* 111 (6): 2949–60. https://doi.org/10.2134/agronj2019.03.0220.

Olivoto, T., A. D. C Lúcio, J. A. G. Da silva, B. G. Sari, and M. I. Diel. 2019. “Mean performance and stability in multi-environment trials II: Selection based on multiple traits.” *Agronomy Journal* 111 (6): 2961–9. https://doi.org/10.2134/agronj2019.03.0221.

Piepho, H. P. 1994. “Best Linear Unbiased Prediction (Blup) for Regional Yield Trials: A Comparison to Additive Main Effects and Multiplicative Interaction (Ammi) Analysis.” *Theor. Appl. Genet.* 89 (5): 647–54. https://doi.org/10.1007/BF00222462.

Purchase, J. L., Hesta Hatting, and C. S. van Deventer. 2000. “Genotype × environment interaction of winter wheat ( Triticum aestivum L.) in South Africa: II. Stability analysis of yield performance.” *South African Journal of Plant and Soil* 17 (3): 101–7. https://doi.org/10.1080/02571862.2000.10634878.

Sneller, C. H., L. Kilgore-Norquest, and D. Dombek. 1997. “Repeatability of Yield Stability Statistics in Soybean.” *Crop Science* 37 (2): 383–90. https://doi.org/10.2135/cropsci1997.0011183X003700020013x.

Zali, H., E. Farshadfar, S. H. Sabaghpour, and R. Karimizadeh. 2012. “Evaluation of genotype × environment interaction in chickpea using measures of stability from AMMI model.” *Annals of Biological Research* 3 (7): 3126–36. http://eprints.icrisat.ac.in/7173/.