Impute the missing entries of a matrix using different algorithms. See the Details section for more information.
Usage
impute_missing_val(
  .data,
  naxis = 1,
  algorithm = "EM-SVD",
  tol = 1e-10,
  max_iter = 1000,
  simplified = FALSE,
  verbose = TRUE
)
Arguments
- .data
- A matrix whose missing entries will be imputed, frequently a two-way table of genotype means in each environment.
- naxis
- The rank of the singular value decomposition (SVD) approximation, that is, the number of principal component axes retained. Defaults to 1.
- algorithm
- The algorithm used to impute missing values. Defaults to "EM-SVD". Other possible values are "EM-AMMI" and "colmeans". See the Details section.
- tol
- The convergence tolerance for the algorithm. 
- max_iter
- The maximum number of iterations. If max_iter is reached without convergence, the algorithm stops with a warning.
- simplified
- Used only when algorithm = "EM-AMMI". If FALSE (default), the row and column effects are updated at every iteration. If TRUE, the grand mean and the row and column effects are computed in the first iteration only and reused in subsequent iterations.
- verbose
- Logical argument. If verbose = FALSE, the function runs silently.
Value
An object of class imv with the following values:
- .data The imputed matrix 
- pc_ss The sum of squares representing variation explained by the principal components 
- iter The final number of iterations. 
- Final_RMSE The maximum change in the estimated values of the missing cells in the last iteration step. 
- final_axis The final number of principal component axes. 
- convergence Logical value indicating whether the model converged. 
Details
EM-AMMI algorithm
The EM-AMMI algorithm completes a data set with missing values according to both
main and interaction effects. The algorithm works as follows (Gauch and
Zobel, 1990):
- The initial values are calculated as the grand mean plus the main effects of rows and columns. In this way, the missing cells of the matrix of observations are pre-filled. 
- The parameters of the AMMI model are estimated. 
- The adjusted means are calculated based on the AMMI model with naxis principal components.
- The missing cells are filled with the adjusted means. 
- The root mean square error of the predicted values (RMSE_p) is calculated from the last two iteration steps. If RMSE_p > tol, steps 2 through 5 are repeated; convergence is declared when RMSE_p < tol. If max_iter is reached without convergence, the algorithm stops with a warning.
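A minimal base R sketch of the EM-AMMI steps above, written only to illustrate them; it is not metan's implementation. The function name em_ammi_sketch is hypothetical. Two further assumptions: the initial pre-fill estimates the grand mean and main effects from the observed cells only, and simplified = TRUE is read as "estimate the additive part once, in the first iteration, and reuse it". sample() stands in for random_na() to mask 20 of the 200 cells (10%).

```r
# Hypothetical illustration of EM-AMMI imputation (not metan's code)
em_ammi_sketch <- function(x, naxis = 1, tol = 1e-10, max_iter = 1000,
                           simplified = FALSE) {
  miss <- is.na(x)
  if (!any(miss)) return(list(data = x, iter = 0L, rmse = 0, convergence = TRUE))
  # Step 1: pre-fill NAs with grand mean + row effect + column effect.
  # Assumption: effects are estimated from the observed cells only.
  mu <- mean(x, na.rm = TRUE)
  prefill <- outer(rowMeans(x, na.rm = TRUE) - mu,
                   colMeans(x, na.rm = TRUE) - mu, "+") + mu
  x[miss] <- prefill[miss]
  additive <- NULL
  rmse <- NA_real_
  for (iter in seq_len(max_iter)) {
    old <- x[miss]
    # Step 2a: additive part (grand mean + row and column main effects).
    # Assumption: with simplified = TRUE it is estimated only once.
    if (!simplified || is.null(additive)) {
      mu <- mean(x)
      additive <- outer(rowMeans(x) - mu, colMeans(x) - mu, "+") + mu
    }
    # Step 2b: multiplicative part, a rank-naxis SVD of the residuals
    s <- svd(x - additive, nu = naxis, nv = naxis)
    inter <- s$u %*% diag(s$d[seq_len(naxis)], naxis) %*% t(s$v)
    # Steps 3-4: fill the missing cells with the AMMI adjusted means
    x[miss] <- (additive + inter)[miss]
    # Step 5: RMSE of the change in the imputed cells between the last two steps
    rmse <- sqrt(mean((x[miss] - old)^2))
    if (rmse < tol) {
      return(list(data = x, iter = iter, rmse = rmse, convergence = TRUE))
    }
  }
  warning("max_iter reached without convergence")
  list(data = x, iter = max_iter, rmse = rmse, convergence = FALSE)
}

# Usage on the matrix from the Examples section
set.seed(1)
mat <- (1:20) %*% t(1:10)
miss_mat <- mat
miss_mat[sample(length(mat), 20)] <- NA   # 20 of 200 cells (10%) set to NA
res <- em_ammi_sketch(miss_mat, naxis = 1)
res$iter
max(abs(res$data - mat))
```

Recomputing the additive part from the completed matrix at every pass is what makes the filled values depend on both main and interaction effects.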
EM-SVD algorithm
The EM-SVD algorithm imputes the missing entries using a low-rank Singular
Value Decomposition approximation estimated by the Expectation-Maximization
algorithm. The algorithm works as follows (Troyanskaya et al., 2001):
- Initialize all NA values to the column means.
- Compute the first naxis terms of the SVD of the completed matrix.
- Replace the previously missing values with their approximations from the SVD. 
- The root mean square error of the predicted values (RMSE_p) is calculated from the last two iteration steps. If RMSE_p > tol, steps 2 and 3 are repeated; convergence is declared when RMSE_p < tol. If max_iter is reached without convergence, the algorithm stops with a warning.
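A minimal base R sketch of the EM-SVD loop described above, for illustration only; it is not metan's code. em_svd_sketch is a hypothetical name, and the SVD is taken of the completed matrix without any centering or scaling, which follows the description but is an assumption about the real implementation. sample() again stands in for random_na().

```r
# Hypothetical illustration of EM-SVD imputation (not metan's code)
em_svd_sketch <- function(x, naxis = 1, tol = 1e-10, max_iter = 1000) {
  miss <- is.na(x)
  if (!any(miss)) return(list(data = x, iter = 0L, rmse = 0, convergence = TRUE))
  # Step 1: initialize every NA with the mean of its column
  x[miss] <- colMeans(x, na.rm = TRUE)[col(x)[miss]]
  rmse <- NA_real_
  for (iter in seq_len(max_iter)) {
    old <- x[miss]
    # Step 2: rank-naxis SVD approximation of the completed matrix
    s <- svd(x, nu = naxis, nv = naxis)
    approx <- s$u %*% diag(s$d[seq_len(naxis)], naxis) %*% t(s$v)
    # Step 3: replace only the previously missing cells
    x[miss] <- approx[miss]
    # Step 4: RMSE of the change in the imputed cells between the last two steps
    rmse <- sqrt(mean((x[miss] - old)^2))
    if (rmse < tol) {
      return(list(data = x, iter = iter, rmse = rmse, convergence = TRUE))
    }
  }
  warning("max_iter reached without convergence")
  list(data = x, iter = max_iter, rmse = rmse, convergence = FALSE)
}

# Usage on the matrix from the Examples section
set.seed(1)
mat <- (1:20) %*% t(1:10)
miss_mat <- mat
miss_mat[sample(length(mat), 20)] <- NA   # 20 of 200 cells (10%) set to NA
res <- em_svd_sketch(miss_mat, naxis = 1)
res$iter
max(abs(res$data - mat))
```

Only the missing cells are overwritten in each pass; the observed cells always keep their original values.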
colmeans algorithm
The colmeans algorithm simply imputes each missing entry with the
mean of its column. Thus, there is no iterative process.
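The colmeans rule fits in a few lines of base R. This is a hypothetical sketch (impute_colmeans_sketch is not a metan function), assuming the column mean is computed from the observed cells of that column.

```r
# Hypothetical illustration of column-mean imputation (not metan's code)
impute_colmeans_sketch <- function(x) {
  miss <- is.na(x)
  # Mean of the observed values in each column
  # (a column with no observed values would give NaN)
  cm <- colMeans(x, na.rm = TRUE)
  # col(x)[miss] is the column index of every missing cell
  x[miss] <- cm[col(x)[miss]]
  x
}

m <- matrix(c(1, NA, 3,
              4, 5, NA), nrow = 3)
impute_colmeans_sketch(m)   # NA in column 1 -> 2, NA in column 2 -> 4.5
```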
References
Gauch, H. G., & Zobel, R. W. (1990). Imputing missing yield trial data. Theoretical and Applied Genetics, 79(6), 753-761. doi:10.1007/BF00224240
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., ... Altman, R. B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520-525.
Examples
# \donttest{
library(metan)
mat <- (1:20) %*% t(1:10)
mat
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]    1    2    3    4    5    6    7    8    9    10
#>  [2,]    2    4    6    8   10   12   14   16   18    20
#>  [3,]    3    6    9   12   15   18   21   24   27    30
#>  [4,]    4    8   12   16   20   24   28   32   36    40
#>  [5,]    5   10   15   20   25   30   35   40   45    50
#>  [6,]    6   12   18   24   30   36   42   48   54    60
#>  [7,]    7   14   21   28   35   42   49   56   63    70
#>  [8,]    8   16   24   32   40   48   56   64   72    80
#>  [9,]    9   18   27   36   45   54   63   72   81    90
#> [10,]   10   20   30   40   50   60   70   80   90   100
#> [11,]   11   22   33   44   55   66   77   88   99   110
#> [12,]   12   24   36   48   60   72   84   96  108   120
#> [13,]   13   26   39   52   65   78   91  104  117   130
#> [14,]   14   28   42   56   70   84   98  112  126   140
#> [15,]   15   30   45   60   75   90  105  120  135   150
#> [16,]   16   32   48   64   80   96  112  128  144   160
#> [17,]   17   34   51   68   85  102  119  136  153   170
#> [18,]   18   36   54   72   90  108  126  144  162   180
#> [19,]   19   38   57   76   95  114  133  152  171   190
#> [20,]   20   40   60   80  100  120  140  160  180   200
# 10% of missing values at random
miss_mat <- random_na(mat, prop = 10)
miss_mat
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]    1    2    3    4    5    6    7    8    9    10
#>  [2,]    2    4    6    8   10   NA   14   16   NA    20
#>  [3,]    3    6    9   NA   15   18   21   24   27    30
#>  [4,]    4    8   12   16   20   24   28   32   36    40
#>  [5,]    5   10   15   20   25   30   35   40   45    50
#>  [6,]   NA   12   18   24   30   36   42   48   54    60
#>  [7,]    7   14   21   28   35   42   49   56   63    70
#>  [8,]    8   16   24   NA   40   48   56   64   72    80
#>  [9,]    9   18   NA   NA   NA   54   63   72   81    90
#> [10,]   NA   20   30   40   50   60   NA   NA   90   100
#> [11,]   NA   22   33   44   55   66   77   88   99   110
#> [12,]   12   24   36   48   60   72   84   96  108   120
#> [13,]   13   26   39   52   65   78   91  104  117   130
#> [14,]   14   28   42   56   70   84   NA  112  126   140
#> [15,]   15   30   45   60   75   90  105   NA  135   150
#> [16,]   16   32   48   64   NA   96  112  128  144    NA
#> [17,]   17   34   51   68   85  102  119  136  153   170
#> [18,]   18   36   54   72   90  108  126  144  162   180
#> [19,]   NA   38   57   76   95  114  133  152   NA    NA
#> [20,]   20   NA   60   80  100  120  140  160  180   200
mod <- impute_missing_val(miss_mat)
#> ----------------------------------------------
#> Convergence information
#> ----------------------------------------------
#> Number of iterations: 46
#> Final RMSE: 6.827148e-11
#> Number of axis: 1
#> Convergence: TRUE
#> ----------------------------------------------
mod$.data
#>       X1 X2 X3 X4  X5  X6  X7  X8  X9 X10
#>  [1,]  1  2  3  4   5   6   7   8   9  10
#>  [2,]  2  4  6  8  10  12  14  16  18  20
#>  [3,]  3  6  9 12  15  18  21  24  27  30
#>  [4,]  4  8 12 16  20  24  28  32  36  40
#>  [5,]  5 10 15 20  25  30  35  40  45  50
#>  [6,]  6 12 18 24  30  36  42  48  54  60
#>  [7,]  7 14 21 28  35  42  49  56  63  70
#>  [8,]  8 16 24 32  40  48  56  64  72  80
#>  [9,]  9 18 27 36  45  54  63  72  81  90
#> [10,] 10 20 30 40  50  60  70  80  90 100
#> [11,] 11 22 33 44  55  66  77  88  99 110
#> [12,] 12 24 36 48  60  72  84  96 108 120
#> [13,] 13 26 39 52  65  78  91 104 117 130
#> [14,] 14 28 42 56  70  84  98 112 126 140
#> [15,] 15 30 45 60  75  90 105 120 135 150
#> [16,] 16 32 48 64  80  96 112 128 144 160
#> [17,] 17 34 51 68  85 102 119 136 153 170
#> [18,] 18 36 54 72  90 108 126 144 162 180
#> [19,] 19 38 57 76  95 114 133 152 171 190
#> [20,] 20 40 60 80 100 120 140 160 180 200
# }
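The example matrix mat is the outer product of 1:20 and 1:10, so it has rank 1. That is why the default naxis = 1 recovers the masked cells in the output above. The base R check below (independent of metan) confirms the rank and shows that a rank-1 SVD reconstructs every entry.

```r
# The example matrix is an outer product, so its rank is 1
mat <- (1:20) %*% t(1:10)
qr(mat)$rank                               # 1
s <- svd(mat)
# Only the first singular value is non-zero (up to rounding error)
s$d
# A rank-1 SVD approximation reproduces every entry up to rounding error
rank1 <- s$d[1] * s$u[, 1] %o% s$v[, 1]
max(abs(rank1 - mat))
```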
