Utilities for data manipulation

See the section Rendering engine to know how HTML tables were generated.

Utilities for rows and columns

Add columns and rows

The functions add_cols() and add_rows() can be used to add columns and rows, respectively to a data frame.

library(metan)
add_cols(data_ge,
         ROW_ID = 1:420) %>% 
  print_table()

It is also possible to add a column based on existing data. Note that the arguments .after and .before are used to select the position of the new column(s). This is particularly useful to put variables of the same category together.

add_cols(data_ge,
         GY2 = GY^2,
         .after = GY) %>% 
  print_table()

Select or remove columns and rows

The functions select_cols() and select_rows() can be used to select columns and rows, respectively from a data frame.

select_cols(data_ge2, ENV, GEN) %>% print_table()

select_rows(data_ge2, 1:3) %>% print_table()

Numeric columns can be selected quickly by using the function select_numeric_cols(). Non-numeric columns are selected with select_non_numeric_cols()

select_numeric_cols(data_ge2) %>% print_table()

select_non_numeric_cols(data_ge2) %>% print_table()

We can select the first or last columns quickly with select_first_col() and select_last_col(), respectively.

select_first_col(data_ge2) %>% print_table()

select_last_col(data_ge2) %>% print_table()

To remove columns or rows, use remove_cols() and remove_rows().

remove_cols(data_ge2, ENV, GEN)  %>% print_table()

remove_rows(data_ge2, 1:2, 5:8) %>% print_table()

Concatenating columns

The function concatetate() can be used to concatenate multiple columns of a data frame. It return a data frame with all the original columns in .data plus the concatenated variable, after the last column (by default), or at any position when using the arguments .before or .after.

concatenate(data_ge, ENV, GEN, REP, .after = REP) %>% print_table()

To drop the existing variables and keep only the concatenated column, use the argument drop = TRUE. To use concatenate() within a given function like add_cols() use the argument pull = TRUE to pull out the results to a vector.

concatenate(data_ge, ENV, GEN, REP, drop = TRUE) %>% head()
# # A tibble: 6 × 1
#   new_var
#   <chr>  
# 1 E1_G1_1
# 2 E1_G1_2
# 3 E1_G1_3
# 4 E1_G2_1
# 5 E1_G2_2
# 6 E1_G2_3
concatenate(data_ge, ENV, GEN, REP, pull = TRUE) %>% head()
# [1] "E1_G1_1" "E1_G1_2" "E1_G1_3" "E1_G2_1" "E1_G2_2" "E1_G2_3"

To check if a column exists in a data frame, use column_exists()

column_exists(data_ge, "ENV")
# [1] TRUE

Getting levels

To get the levels and the size of the levels of a factor, the functions get_levels() and get_level_size() can be used.

get_levels(data_ge, ENV)
#  [1] "E1"  "E10" "E11" "E12" "E13" "E14" "E2"  "E3"  "E4"  "E5"  "E6"  "E7" 
# [13] "E8"  "E9"
get_level_size(data_ge, ENV)
# # A tibble: 14 × 5
#    ENV     GEN   REP    GY    HM
#    <fct> <int> <int> <int> <int>
#  1 E1       30    30    30    30
#  2 E10      30    30    30    30
#  3 E11      30    30    30    30
#  4 E12      30    30    30    30
#  5 E13      30    30    30    30
#  6 E14      30    30    30    30
#  7 E2       30    30    30    30
#  8 E3       30    30    30    30
#  9 E4       30    30    30    30
# 10 E5       30    30    30    30
# 11 E6       30    30    30    30
# 12 E7       30    30    30    30
# 13 E8       30    30    30    30
# 14 E9       30    30    30    30

Utilities for numbers and strings

Rounding whole data frames

The function round_cols()round a selected column or a whole data frame to the specified number of decimal places (default 0). If no variables are informed, then all numeric variables are rounded.

round_cols(data_ge2) %>% print_table()

Alternatively, select variables to round.

round_cols(data_ge2, PH, EP, digits = 1)  %>% print_table()

Extracting and replacing numbers

The functions extract_number(), and replace_number() can be used to extract or replace numbers. As an example, we will extract the number of each genotype in data_g.

extract_number(data_ge, GEN) %>% print_table()

To replace numbers of a given column with a specified replacement, use replace_number(). By default, numbers are replaced with ““.

replace_number(data_ge, GEN) %>% print_table()

replace_number(data_ge, REP, pattern = 1, replacement = "Rep_1") %>% print_table()

Extracting, replacing, and removing strings

The functions extract_string(), and replace_string() are used in the same context of extract_number(), and replace_number(), but for handling with strings.

extract_string(data_ge, GEN) %>% print_table()

To replace strings, we can use the function replace_strings().

replace_string(data_ge,
               GEN,
               replacement = "GENOTYPE_") %>% 
  print_table()

To remove all strings of a data frame, use remove_strings().

remove_strings(data_ge) %>% print_table()

Tidy strings

The function tidy_strings() tidy up characters strings, non-numeric columns, or any selected columns in a data frame by putting all word in upper case, replacing any space, tabulation, punctuation characters by '_', and putting '_' between lower and upper cases. Consider the following character strings: messy_env by definition should represent a unique level of the factor environment (environment 1). messy_gen shows six genotypes, and messy_int represents the interaction of such genotypes with environment 1.

messy_env <- c("ENV 1", "Env   1", "Env1", "env1", "Env.1", "Env_1")
messy_gen <- c("GEN1", "gen 2", "Gen.3", "gen-4", "Gen_5", "GEN_6")
messy_int <- c("Env1Gen1", "Env1_Gen2", "env1 gen3", "Env1 Gen4", "ENV_1GEN5", "ENV1GEN6")

These character vectors are visually messy. Let’s tidy them.

tidy_strings(messy_env)
# [1] "ENV_1" "ENV_1" "ENV_1" "ENV_1" "ENV_1" "ENV_1"
tidy_strings(messy_gen)
# [1] "GEN_1" "GEN_2" "GEN_3" "GEN_4" "GEN_5" "GEN_6"
tidy_strings(messy_int)
# [1] "ENV_1_GEN_1" "ENV_1_GEN_2" "ENV_1_GEN_3" "ENV_1_GEN_4" "ENV_1_GEN_5"
# [6] "ENV_1_GEN_6"

tidy_strings() works also to tidy a whole data frame or specific columns. Let’s create a ‘messy’ data frame in the context of plant breeding trials.

library(tibble)
# 
# Attaching package: 'tibble'
# The following objects are masked from 'package:metan':
# 
#     column_to_rownames, remove_rownames, rownames_to_column
df <- tibble(Env = messy_env,
             gen = messy_gen,
             Env_GEN = interaction(Env, gen),
             y = rnorm(6, 300, 10))
df %>% print_table()

tidy_strings(df) %>% print_table() tidy_strings(df, gen) %>% print_table()

Rendering engine

This vignette was built with pkgdown. All tables were produced with the package DT using the following function.

library(metan)
library(DT) # Used to make the tables
# Function to make HTML tables
print_table <- function(table, rownames = FALSE, digits = 3, ...){
  df <- datatable(table, rownames = rownames, extensions = 'Buttons',
                  options = list(scrollX = TRUE, 
                                 dom = '<<t>Bp>',
                                 buttons = c('copy', 'excel', 'pdf', 'print')), ...)
  num_cols <- c(as.numeric(which(sapply(table, class) == "numeric")))
  if(length(num_cols) > 0){
    formatSignif(df, columns = num_cols, digits = digits)
  } else{
    df
  }
}

Tiago Olivoto

2023-03-06