xgboost operates on a data structure called xgb.DMatrix. The xgboost functions can create this structure internally when given a matrix whose columns are predictor variables and a vector holding the label. For survival analysis, the label vector combines time and status values (see sgb_label). as_sgb_data automates the creation of the label vector, and both functions return a list of components that plug directly into xgboost functions.

sgb_data(data, label)

as_sgb_data(data, status, time)

Arguments

data

the data containing predictor variables and a label column

label

a numeric vector encoding time and status values (see sgb_label). Values should be negative for censored observations and positive for non-censored observations.

status

a numeric vector indicating status at a given time. Normally, 0 indicates no event and 1 indicates an event occurred.

time

a numeric vector of follow-up time values.
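
The way status and time combine into a single label can be sketched in base R. This is only an illustration of the sign convention described above, not the package's own implementation; the helper name make_signed_label is hypothetical, and the sketch assumes status is coded 0/1 and all follow-up times are strictly positive.

```r
# Hypothetical helper (not part of the package): encodes time and status
# into one signed vector using base R only.
make_signed_label <- function(time, status) {
  stopifnot(length(time) == length(status), all(time > 0))
  # Rows with an event keep their positive time;
  # censored rows (status 0) get the negated time.
  ifelse(status == 1, time, -time)
}

df <- data.frame(time = c(1, 2, 3), status = c(0, 0, 1), x = c(2, 2, 1))
label <- make_signed_label(df$time, df$status)
print(label)
```

The first two rows are censored, so their times come back negative; the third row is an event and keeps its positive time.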

Value

an object of class sgb_data with components:

  • data: a matrix with columns representing predictor variables

  • label: a numeric vector representing time until event. A negative value indicates that no event was observed and that the observation was censored at the absolute value of the given time. A positive value gives the time of the event.
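
To read a label component back into separate time and status vectors, the sign convention can be inverted. The following is a minimal base R sketch, assuming the label follows the convention above and contains no zeros; the variable names are illustrative only.

```r
# A label vector such as the one stored in the label component.
label <- c(-1, -2, 3)

# The follow-up time is the magnitude of the label.
time <- abs(label)

# A positive label marks an event (1); a negative label marks censoring (0).
status <- as.integer(label > 0)

print(data.frame(time = time, status = status))
```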

Examples

df = data.frame(time=c(1,2,3), status = c(0,0,1), x = c(2,2,1))
as_sgb_data(df, status = status, time = time)
#> $data
#>      x
#> [1,] 2
#> [2,] 2
#> [3,] 1
#>
#> $label
#> [1] -1 -2  3
#>
#> attr(,"class")
#> [1] "sgb_data"
sgb_data(df, label = sgb_label(df$time, df$status))
#> $data
#>      time status x
#> [1,]    1      0 2
#> [2,]    2      0 2
#> [3,]    3      1 1
#>
#> $label
#> [1] -1 -2  3
#>
#> attr(,"class")
#> [1] "sgb_data"