  • split_by_factor splits according to the given levels of a factor

  • split_within_factor splits according to given data point indices within each level of a factor

  • split_within_factor_random selects k points from each level of a factor uniformly at random as test data

  • split_random splits uniformly at random

  • split_data splits according to the given data row indices

split_by_factor(data, test, var_name = "id")

split_within_factor(data, idx_test, var_name = "id")

split_within_factor_random(data, k_test = 1, var_name = "id")

split_random(data, p_test = 0.2, n_test = NULL)

split_data(data, i_test, sort_ids = TRUE)
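
To make the "within each level of a factor" idea concrete, here is a minimal base R sketch of selecting k_test rows at random from every level of a factor. The helper name sketch_pick_within_levels and the toy data frame are hypothetical and exist only for illustration; this is not the package's implementation, and the real split_within_factor_random may differ in its details.

```r
# Hypothetical helper (not part of the package): pick k_test row indices
# uniformly at random within each level of the factor named var_name.
sketch_pick_within_levels <- function(data, k_test = 1, var_name = "id") {
  # Group the row numbers of `data` by factor level
  rows_by_level <- split(seq_len(nrow(data)), data[[var_name]])
  picked <- lapply(rows_by_level, function(rows) {
    # Sample positions rather than values, so a level with a single row
    # is handled correctly; this errors if k_test exceeds the level size
    rows[sample.int(length(rows), size = k_test)]
  })
  # Return all selected rows as one sorted index vector
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
toy <- data.frame(
  id = factor(rep(c("a", "b", "c"), each = 4)),
  y = rnorm(12)
)
i_test <- sketch_pick_within_levels(toy, k_test = 2)
table(toy$id[i_test])
```

In this sketch every level contributes exactly k_test rows, so the test set stays balanced across levels regardless of which rows are drawn.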

Arguments

data

a data frame

test

the levels of the factor that will be used as test data

var_name

name of a factor in the data

idx_test

data point indices within each level of the factor

k_test

desired number of test data points for each level of the factor

p_test

desired proportion of test data

n_test

desired number of test data points (if NULL, p_test is used to compute this)

i_test

test data row indices

sort_ids

whether the test indices should be sorted into increasing order

Value

a named list with names train, test, i_train, and i_test
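
As an illustration of the shape of this return value, the following base R sketch builds a list with the same four names from explicit test row indices. The helper sketch_split_rows and the toy data are hypothetical, and the assumption that the training rows are exactly the rows not listed in i_test is ours; the sketch is not the package's own code.

```r
# Hypothetical helper (not part of the package): split rows by explicit
# test indices and return a list shaped like the documented value.
sketch_split_rows <- function(data, i_test, sort_ids = TRUE) {
  # Optionally put the test indices into increasing order
  if (sort_ids) i_test <- sort(i_test)
  # Assumption: training rows are all rows that are not test rows
  i_train <- setdiff(seq_len(nrow(data)), i_test)
  list(
    train = data[i_train, , drop = FALSE],
    test = data[i_test, , drop = FALSE],
    i_train = i_train,
    i_test = i_test
  )
}

toy <- data.frame(
  id = factor(rep(1:3, each = 4)),
  age = rep(1:4, times = 3)
)
out <- sketch_split_rows(toy, i_test = c(9, 2, 5))
names(out)
nrow(out$train)
nrow(out$test)
```

Returning the index vectors alongside the two data frames makes it possible to map predictions on the test rows back to their positions in the original data.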

See also

Other data frame handling functions: add_dis_age(), add_factor_crossing(), add_factor(), adjusted_c_hat(), new_x()