xgboost operates on a data structure called xgb.DMatrix. The xgboost functions can create these data structures internally if they are given a matrix whose columns are predictor variables and a vector representing the label. For survival analysis, the label vector is a combination of time/status values (see sgb_label). This function automates the creation of a label vector and returns a list of components that are easily plugged into xgboost functions.
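As a minimal sketch of the matrix-plus-label input described above, the snippet below builds a predictor matrix and a signed label by hand and passes them to `xgb.DMatrix()`. It assumes the xgboost package is installed and skips the call otherwise. The values are illustrative only.

```r
# Predictor matrix: one column per predictor variable.
x <- matrix(c(2, 2, 1), ncol = 1, dimnames = list(NULL, "x"))

# Survival label: one value per row of x (negative = censored, positive = event).
y <- c(-1, -2, 3)

# Only call into xgboost when it is available (assumption: it may not be installed).
if (requireNamespace("xgboost", quietly = TRUE)) {
  dmat <- xgboost::xgb.DMatrix(data = x, label = y)
  print(dim(dmat))
}
```

The `data` and `label` components of an sgb_data object play the same roles as `x` and `y` here.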
sgb_data(data, label)
as_sgb_data(data, status, time)
data | the data containing predictor variables and a label column |
---|---|
label | a numeric vector based on |
status | a numeric vector indicating status at a given time. Normally, 0 indicates no event and 1 indicates an event occurred. |
time | a numeric vector of follow-up time values. |
An object of class sgb_data with components:
data: a matrix with columns representing predictor variables.
label: a numeric vector representing time until event. A negative value indicates that no event occurred and that the observation was censored at the absolute value of the given time. A positive value gives the time of the event.
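The sign convention for the label can be sketched in base R. This is an illustration of the encoding described above, not the package's implementation: `signed_label` is a hypothetical helper, and the sketch assumes strictly positive follow-up times and a 0/1 status.

```r
# Hypothetical helper (not part of the package): fold time and status into one signed label.
signed_label <- function(time, status) {
  # status 1 = event observed -> keep the time positive
  # status 0 = censored       -> negate the time
  ifelse(status == 1, time, -time)
}

df <- data.frame(time = c(1, 2, 3), status = c(0, 0, 1), x = c(2, 2, 1))

# Predictor matrix: every column except the time and status columns.
predictors <- as.matrix(df[, setdiff(names(df), c("time", "status")), drop = FALSE])
label <- signed_label(df$time, df$status)

# Decoding: recover follow-up time and status from the signed label.
decoded_time <- abs(label)
decoded_status <- as.numeric(label > 0)
```

Because the sign carries the status, a follow-up time of exactly zero could not be marked as censored under this encoding, which is why the sketch assumes positive times.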
df = data.frame(time=c(1,2,3), status = c(0,0,1), x = c(2,2,1))
as_sgb_data(df, status = status, time = time)
#> $data
#>      x
#> [1,] 2
#> [2,] 2
#> [3,] 1
#>
#> $label
#> [1] -1 -2  3
#>
#> attr(,"class")
#> [1] "sgb_data"

#> $data
#>      time status x
#> [1,]    1      0 2
#> [2,]    2      0 2
#> [3,]    3      1 1
#>
#> $label
#> [1] -1 -2  3
#>
#> attr(,"class")
#> [1] "sgb_data"