Split data into training and test sets

split_by_factor splits according to given levels of a factor
split_within_factor splits according to given data point indices within the same level of a factor
split_within_factor_random selects k points from each level of a factor uniformly at random as test data
split_random splits uniformly at random
split_data splits according to given data rows

split_by_factor(data, test, var_name = "id")

split_within_factor(data, idx_test, var_name = "id")

split_within_factor_random(data, k_test = 1, var_name = "id")

split_random(data, p_test = 0.2, n_test = NULL)

split_data(data, i_test, sort_ids = TRUE)

Arguments

data: a data frame
test: the levels of the factor that will be used as test data
var_name: name of a factor in the data
idx_test: data point indices within each level of the factor
k_test: desired number of test data points per level of the factor
p_test: desired proportion of test data
n_test: desired number of test data points (if NULL, p_test is used to compute this)
i_test: test data row indices
sort_ids: should the test indices be sorted into increasing order
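
The way these arguments interact can be illustrated with a small base R sketch. It does not call the functions documented here and is not their implementation; the toy data frame is hypothetical, and rounding p_test * nrow(data) to obtain n_test is an assumption about how the proportion might be turned into a count.

```r
# Toy data: 3 levels of the factor `id`, 4 rows each (hypothetical example data)
data <- data.frame(id = factor(rep(1:3, each = 4)), y = seq_len(12))

# Assumption: n_test is derived from p_test by rounding; the actual rule may differ
p_test <- 0.25
n_test <- round(p_test * nrow(data))

# Uniform random split: draw n_test row indices without replacement
set.seed(1)
i_test_random <- sort(sample.int(nrow(data), n_test))

# Within-factor random split: draw k_test rows from each level of `id`
k_test <- 1
rows_by_level <- split(seq_len(nrow(data)), data$id)
i_test_within <- sort(unlist(lapply(rows_by_level, function(rows) {
  # index into `rows` so that a level with a single row is handled correctly
  rows[sample.int(length(rows), k_test)]
}), use.names = FALSE))
```

In this sketch i_test_random plays the role of the row indices a random split would select, while i_test_within contains exactly k_test rows from every level of id.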

Value

a named list with names train, test, i_train, and i_test
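
As a rough illustration of the shape of this return value, the following base R sketch builds a list with the same four names from a set of test row indices. The helper split_rows_sketch is hypothetical and only mirrors the documented arguments i_test and sort_ids; it should not be read as the package's own code.

```r
# Hypothetical helper mirroring the documented return value; not the package's code
split_rows_sketch <- function(data, i_test, sort_ids = TRUE) {
  # optionally sort the test indices into increasing order
  if (sort_ids) i_test <- sort(i_test)
  # every row that is not a test row becomes a training row
  i_train <- setdiff(seq_len(nrow(data)), i_test)
  list(
    train = data[i_train, , drop = FALSE],
    test = data[i_test, , drop = FALSE],
    i_train = i_train,
    i_test = i_test
  )
}

data <- data.frame(id = factor(rep(1:2, each = 3)), y = 1:6)
out <- split_rows_sketch(data, i_test = c(5, 2))
names(out)  # "train" "test" "i_train" "i_test"
```

Here out$test holds rows 2 and 5 of the toy data and out$train holds the remaining four rows, so the two data frames together cover every row exactly once.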
