Herding Cats!

xgboost and many other modeling functions expect matrix input in which factor levels are one-hot encoded.

cat_spread one-hot encodes every factor or character variable in data and returns the result as a tibble. cat_gather applies the inverse operation, converting one-hot encoded columns back into factors.

cat_spread(data, ...)

cat_gather(data, factor_levels)

Arguments

data

Data with categorical variables (e.g., factors) that need to be spread or gathered.

...

Arguments passed on to mltools::one_hot

sparsifyNAs: Should NAs be converted to 0s?
naCols: Should columns be generated to indicate the presence of NAs? Only applies to factor columns with at least one NA.
dropCols: Should the resulting data.table exclude the original columns which are one-hot-encoded?
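
To make the NA options concrete, here is a rough base-R illustration of what the descriptions of sparsifyNAs and naCols suggest. It does not call mltools::one_hot, and the column names (x_a, x_b, x_NA) are assumptions chosen for the illustration, not guaranteed output names.

```r
# Illustration only: mimics the described NA options with base R.
x <- factor(c("a", NA, "b"))
lv <- levels(x)

# Plain indicators: a missing value yields NA in every indicator column.
raw <- sapply(lv, function(l) as.integer(x == l))
colnames(raw) <- paste0("x_", lv)

# "sparsifyNAs"-style behaviour: replace those NAs with 0.
sparse <- raw
sparse[is.na(sparse)] <- 0L

# "naCols"-style behaviour: an extra indicator marking missing values.
na_col <- as.integer(is.na(x))

print(cbind(sparse, x_NA = na_col))
```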

factor_levels

This parameter is only relevant for cat_gather. A named list of factor levels, with each name corresponding to the column in the data that the factor levels describe.
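
One way to obtain such a list is to record the levels before spreading. The following is a minimal base-R sketch; the helper names is_cat and get_levels are hypothetical and not part of the package, and treating the sorted unique values of a character column as its levels is an assumption.

```r
# Hypothetical helpers: collect levels for every categorical column.
df <- data.frame(
  x = factor(rep(letters[1:2], 50)),
  z = rep(c("u", "v"), 50),
  y = 1:100,
  stringsAsFactors = FALSE
)

is_cat <- function(col) is.factor(col) || is.character(col)
get_levels <- function(col) {
  if (is.factor(col)) levels(col) else sort(unique(col))
}

# Named list: one character vector of levels per categorical column.
factor_levels <- lapply(Filter(is_cat, df), get_levels)
str(factor_levels)
```

The resulting list has the shape described above, with names matching the original column names.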

Value

A tibble with categorical variables herded as you like: one-hot encoded by cat_spread, or restored to factors by cat_gather.

Examples


df <- data.frame(x = rep(letters[1:2], 50), y = 1:100)

one_hot_df <- cat_spread(df)
cat_gather(one_hot_df, factor_levels = list(x=c('a','b')))
#> # A tibble: 100 x 2
#>    x         y
#>    <fct> <int>
#>  1 a         1
#>  2 b         2
#>  3 a         3
#>  4 b         4
#>  5 a         5
#>  6 b         6
#>  7 a         7
#>  8 b         8
#>  9 a         9
#> 10 b        10
#> # ... with 90 more rows
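
For intuition, the round trip above can be sketched in base R without the package. This is an illustration of one-hot encoding and its inverse under simple assumptions (a single factor column, indicator columns named x_a and x_b); it is not the package's implementation.

```r
# Illustration only: one-hot encode a factor and recover it again.
df <- data.frame(x = factor(c("a", "b", "a")), y = 1:3)
lv <- levels(df$x)

# Spread: one 0/1 indicator column per level of x.
one_hot <- sapply(lv, function(l) as.integer(df$x == l))
colnames(one_hot) <- paste0("x_", lv)
spread <- cbind(as.data.frame(one_hot), y = df$y)

# Gather: the position of the 1 in each row identifies the level.
idx <- max.col(as.matrix(spread[paste0("x_", lv)]), ties.method = "first")
gathered <- data.frame(x = factor(lv[idx], levels = lv), y = spread$y)

print(spread)
print(gathered)
```

The gather step needs lv, the level set, which is why cat_gather asks for factor_levels.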
