Random Sampling — utils_samples • metan

sample_random() performs Simple Random Sampling or Stratified Random Sampling.
sample_systematic() performs systematic sampling. A regular interval of size k = floor(N/n) is computed from the population size (N) and the desired sample size (n). The starting member (r) is then chosen at random from the integers 1 to k. The second element is r + k, the third is r + 2k, and so on.
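
The interval rule for systematic sampling can be sketched in base R. This is a minimal illustration of the description, not metan's internal code; the helper name systematic_indices() is hypothetical and exists only for this example.

```r
# Hypothetical helper (not part of metan): returns the row positions
# selected by systematic sampling from a population of size N.
systematic_indices <- function(N, n, r = NULL) {
  stopifnot(n >= 1, n <= N)          # need at least one row, at most N
  k <- floor(N / n)                  # sampling interval
  if (is.null(r)) {
    r <- sample.int(k, 1)            # random start among 1, ..., k
  }
  stopifnot(r >= 1, r <= k)          # the start must lie in the first interval
  r + k * (seq_len(n) - 1)           # r, r + k, r + 2k, ...
}

systematic_indices(N = 39, n = 6, r = 2)
```

With N = 39, n = 6 and r = 2, the interval is k = floor(39/6) = 6 and the selected positions are 2, 8, 14, 20, 26 and 32. Because r is at most k, the last position r + k(n - 1) never exceeds N.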

Usage

sample_random(data, n, prop, by = NULL, weight = NULL)

sample_systematic(data, n, r = NULL, by = NULL)

Arguments

data: A data frame. If data is a grouped_df, the operation will be performed on each group (stratified).
n, prop: Provide either n, the number of rows, or prop, the proportion of rows to select. If neither is supplied, n = 1 will be used.
by: A categorical variable to compute the sample by. It is a shortcut to dplyr::group_by() for grouping the data by a single categorical variable. To use more than one grouping variable, group the data with dplyr::group_by() before passing it to the function.
weight: Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.
r: The starting element. By default, r is randomly selected from 1:k, where k = floor(N/n).
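
To illustrate what standardising the weights means, the sketch below rescales a vector of raw weights so that it sums to 1 and then uses the result as selection probabilities with base R's sample.int(). The vectors w and p are made up for the example, and this is not necessarily how sample_random() handles weights internally.

```r
# Made-up raw weights, one per row of a hypothetical 4-row data frame.
w <- c(2, 1, 1, 0)

# Standardise so the weights sum to 1.
p <- w / sum(w)
p

# Draw 2 row positions without replacement, using p as probabilities.
# A row with weight 0 can never be selected.
set.seed(1)
idx <- sample.int(length(w), size = 2, prob = p)
idx
```

Only the relative size of the weights matters: multiplying every weight by the same positive constant leaves p unchanged.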

Value

An object of the same type as data.

Examples

library(metan)
sample_random(data_ge, n = 5)
#> # A tibble: 5 × 5
#>   ENV   GEN   REP      GY    HM
#>   <fct> <fct> <fct> <dbl> <dbl>
#> 1 E14   G6    1      2.00  39  
#> 2 E6    G9    2      2.85  47.5
#> 3 E1    G10   1      2.18  47.3
#> 4 E10   G8    2      2.89  45  
#> 5 E14   G8    1      1.77  42  
sample_random(data_ge,
              n = 3,
              by = ENV)
#> # A tibble: 42 × 5
#>    ENV   GEN   REP      GY    HM
#>    <fct> <fct> <fct> <dbl> <dbl>
#>  1 E1    G10   3      2.48  48.8
#>  2 E1    G1    1      2.17  44.9
#>  3 E1    G6    3      2.19  48.4
#>  4 E10   G7    3      2.18  43  
#>  5 E10   G7    1      2.60  40  
#>  6 E10   G6    2      2.53  47  
#>  7 E11   G8    2      1.41  58  
#>  8 E11   G9    1      1.01  53.7
#>  9 E11   G10   2      1.04  51  
#> 10 E12   G4    3      1.44  53  
#> # … with 32 more rows
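
The stratified call returns 42 rows because 3 rows are drawn within each of the 14 levels of ENV. The same idea can be sketched in base R on a made-up data frame; toy and its columns are hypothetical, and this is an illustration rather than metan's implementation.

```r
set.seed(42)

# Made-up data: 3 strata (ENV) with 5 rows each.
toy <- data.frame(
  ENV = rep(c("E1", "E2", "E3"), each = 5),
  GY  = round(runif(15, min = 1, max = 3), 2)
)

n <- 3  # rows to draw per stratum

# Split the row positions by stratum, draw n positions within each,
# then recombine. Indexing with sample.int() avoids the surprise of
# sample(i, n) when a stratum holds a single row position.
rows <- unlist(lapply(
  split(seq_len(nrow(toy)), toy$ENV),
  function(i) i[sample.int(length(i), n)]
))

strat <- toy[rows, ]
table(strat$ENV)
```

Each stratum contributes exactly n rows, so the result has n times the number of strata rows (here 9).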

sample_systematic(data_g, n = 6)
#> k = 6
#> # A tibble: 6 × 18
#>     .id GEN   REP      PH    EH    EP    EL    ED    CL    CD    CW    KW    NR
#>   <dbl> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     2 H1    2      2.20 1.09  0.492  13.7  49.2  30.5  14.7  22.3  130.  16.4
#> 2     8 H11   2      2.09 1.06  0.509  12.2  46.9  26.5  14.3  13.5  114.  16.8
#> 3    14 H13   2      2.58 1.32  0.511  15.2  50.3  26.7  15.9  19.3  174.  20.4
#> 4    20 H3    2      1.96 0.926 0.473  15.5  46.2  27.4  16.2  17.8  135.  14  
#> 5    26 H5    2      2.05 0.932 0.454  15.3  49.1  29.9  16.2  26.9  144.  18  
#> 6    32 H7    2      2.14 1.05  0.489  13.8  46.2  27.8  14.3  23.0  135.  14.4
#> # … with 5 more variables: NKR <dbl>, CDED <dbl>, PERK <dbl>, TKW <dbl>,
#> #   NKE <dbl>