Utilities for rows and columns

Add columns and rows

The functions add_cols() and add_rows() can be used to add columns and rows, respectively to a data frame.

It is also possible to add a column based on existing data. Note that the arguments .after and .before are used to select the position of the new column(s). This is particularly useful to put variables of the same category together.

Select or remove columns and rows

The functions select_cols() and select_rows() can be used to select columns and rows, respectively from a data frame.

select_cols(data_ge2, ENV, GEN) %>% print_table()
select_rows(data_ge2, 1:3)  %>% print_table()

Numeric columns can be selected quickly by using the function select_numeric_cols(). Non-numeric columns are selected with select_non_numeric_cols()

select_numeric_cols(data_ge2) %>% print_table()
select_non_numeric_cols(data_ge2) %>% print_table()

We can select the first or last columns quickly with select_first_col() and select_last_col(), respectively.

select_first_col(data_ge2) %>% print_table()
select_last_col(data_ge2) %>% print_table()

To remove columns or rows, use remove_cols() and remove_rows().

remove_cols(data_ge2, ENV, GEN)  %>% print_table()
remove_rows(data_ge2, 1:2, 5:8) %>% print_table()

Concatenating columns

The function concatetate() can be used to concatenate multiple columns of a data frame. It return a data frame with all the original columns in .data plus the concatenated variable, after the last column (by default), or at any position when using the arguments .before or .after.

concatenate(data_ge, ENV, GEN, REP, .after = "REP") %>% print_table()

To drop the existing variables and keep only the concatenated column, use the argument drop = TRUE. To use concatenate() within a given function like add_cols() use the argument pull = TRUE to pull out the results to a vector.

To check if a column exists in a data frame, use column_exists()

column_exists(data_ge, "ENV")
# [1] TRUE

Utilities for numbers and strings

Rounding whole data frames

The function round_cols()round a selected column or a whole data frame to the specified number of decimal places (default 0). If no variables are informed, then all numeric variables are rounded.

round_cols(data_ge2) %>% print_table()

Alternatively, select variables to round.

round_cols(data_ge2, PH, EP, digits = 1)  %>% print_table()

Extracting and replacing numbers

The functions extract_number(), and replace_number() can be used to extract or replace numbers. As an example, we will extract the number of each genotype in data_g. By default, the extracted numbers are put as a new variable called new_var after the last column of the data.

extract_number(data_ge, GEN, .after = "GEN") %>% print_table()

If the argument drop is set to TRUE then, only the new variable is kept and all others are dropped.

extract_number(data_ge, GEN, drop = TRUE) %>% print_table()

To pull out the results into a vector, use the argument pull = TRUE. This is particularly useful when extract_* or replace_* are used within a function like add_cols().

To replace numbers of a given column with a specified replacement, use replace_number(). By default, numbers are replaced with "". The argument drop and pull can also be used, as shown above.

replace_number(data_ge, GEN) %>% print_table()

Extracting, replacing, and removing strings

The functions extract_string(), and replace_string() are used in the same context of extract_number(), and replace_number(), but for handling with strings.

extract_string(data_ge, GEN) %>% print_table()

To replace strings, we can use the function replace_strings().

To remove all strings of a data frame, use remove_strings().

remove_strings(data_ge) %>% print_table()

Tidy strings

The function tidy_strings() tidy up characters strings, non-numeric columns, or any selected columns in a data frame by putting all word in upper case, replacing any space, tabulation, punctuation characters by '_', and putting '_' between lower and upper cases. Consider the following character strings: messy_env by definition should represent a unique level of the factor environment (environment 1). messy_gen shows six genotypes, and messy_int represents the interaction of such genotypes with environment 1.

messy_env <- c("ENV 1", "Env   1", "Env1", "env1", "Env.1", "Env_1")
messy_gen <- c("GEN1", "gen 2", "Gen.3", "gen-4", "Gen_5", "GEN_6")
messy_int <- c("Env1Gen1", "Env1_Gen2", "env1 gen3", "Env1 Gen4", "ENV_1GEN5", "ENV1GEN6")

These character vectors are visually messy. Let’s tidy them.

tidy_strings() works also to tidy a whole data frame or specific columns. Let’s create a ‘messy’ data frame in the context of plant breeding trials.

tidy_strings(df) %>% print_table()
tidy_strings(df, gen) %>% print_table()

Rendering engine

This vignette was built with pkgdown. All tables were produced with the package DT using the following function.

library(metan)
library(DT) # Used to make the tables
# Function to make HTML tables
print_table <- function(table, rownames = FALSE, ...){
datatable(table, rownames = rownames, extensions = 'Buttons',
          options = list(scrollX = TRUE, dom = '<<t>Bp>', buttons = c('copy', 'excel', 'pdf', 'print')), ...) %>%
    formatSignif(columns = c(1:ncol(table)), digits = 3)}