class: center, middle, inverse, title-slide

# Oblique random survival forests

## Why I developed them and why my collaborators use them.

### Byron C. Jaeger

### February 8, 2021

---

## Overview

- What is a random forest?
    + Decision trees
    + Ensemble learning
- Oblique random survival forest (ORSF)
    + Strengths/weaknesses
    + Benchmark
- ORSF in the wild
    + Heart failure risk prediction
    + Allograft loss risk prediction

.footnote[Slides are available online at https://bcjaeger.github.io/seminar---obliqueRSF/]

---

class: inverse, center, middle

# What is a random forest?

---

background-image: url(img/penguins.png)
background-size: 45%
background-position: 85% 72.5%

## Decision trees

- Frequently used in supervised learning.
- Partition the space of predictor variables.
- Can be used for classification, regression, and survival analysis.

.pull-left[
We'll demonstrate the mechanics of decision trees by developing a prediction rule to classify penguin<sup>1</sup> species (Chinstrap, Gentoo, or Adelie) based on bill and flipper length.
]

.footnote[
<sup>1</sup>Data were collected and made available by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and [Palmer Station](https://pal.lternet.edu/), a member of the [Long Term Ecological Research Network](https://lternet.edu/).
]

---

Dimensions for Adelie, Chinstrap and Gentoo Penguins at Palmer Station

<img src="index_files/figure-html/fig_penguins_nopart-1.png" style="display: block; margin: auto;" />

---

Partition all the penguins into flipper length < 207 or ≥ 207 mm

<img src="index_files/figure-html/fig_penguins_part1-1.png" style="display: block; margin: auto;" />

---

Partition penguins on the left side into bill length < 43 or ≥ 43 mm

<img src="index_files/figure-html/fig_penguins_part2-1.png" style="display: block; margin: auto;" />

---

The same partitions, visualized as a binary tree.
<img src="img/rpart_plot_classif.png" width="100%" style="display: block; margin: auto;" />

Node text, top to bottom: predicted class; predicted class probability; percentage of data in the node.

---

For survival trees, the Kaplan-Meier curve or cumulative hazard function is calculated in each terminal node.

<img src="img/rpart_plot_surv.png" width="100%" style="display: block; margin: auto;" />

Note: the survival outcomes are simulated.

---

With oblique splits, partitions do not need to be rectangles.

<img src="index_files/figure-html/fig_penguins_part2_oblique-1.png" style="display: block; margin: auto;" />

---

## Ensemble learning

Decision trees have been studied in thousands of peer-reviewed articles and dozens of textbooks. TL;DR: single trees are okay but not great at prediction.

- Leo Breiman introduced the idea of ensemble learning through bagging (bootstrap aggregating).<sup>1</sup>
    + Form an ensemble of 'weak learners' that are de-correlated by fitting each to a bootstrapped replicate of the data.
    + Individually, the learners give poor answers, but the collective wisdom of the ensemble is substantial.
    + e.g., myself and friends in graduate school

.footnote[<sup>1</sup>Breiman, Leo. "Bagging predictors." Machine Learning 24.2 (1996): 123-140.]

---

## Ensemble learning

Decision trees have been studied in thousands of peer-reviewed articles and dozens of textbooks. TL;DR: single trees are okay but not great at prediction.

- Leo Breiman introduced the idea of ensemble learning through bagging (bootstrap aggregating).<sup>1</sup>
- Later, Leo Breiman tweaked the idea of bagging by restricting candidate variables for splitting a node to a random subset.
    + This modification created the random forest!<sup>2</sup>

.footnote[<sup>1</sup>Breiman, Leo. "Bagging predictors." Machine Learning 24.2 (1996): 123-140.<br/><sup>2</sup>Breiman, Leo. "Random forests." Machine Learning 45.1 (2001): 5-32.]
---

## Ensemble learning

Decision trees have been studied in thousands of peer-reviewed articles and dozens of textbooks. TL;DR: single trees are okay but not great at prediction.

- Leo Breiman introduced the idea of ensemble learning through bagging (bootstrap aggregating).<sup>1</sup>
- Later, Leo Breiman tweaked the idea of bagging by restricting candidate variables for splitting a node to a random subset.<sup>2</sup>
- Even later, random survival forests were developed.<sup>3</sup>

.footnote[<sup>1</sup>Breiman, Leo. "Bagging predictors." Machine Learning 24.2 (1996): 123-140.<br/><sup>2</sup>Breiman, Leo. "Random forests." Machine Learning 45.1 (2001): 5-32.<br/><sup>3</sup>Ishwaran, Hemant, et al. "Random survival forests." Annals of Applied Statistics 2.3 (2008): 841-860.]

---

## Ensemble learning

Decision trees have been studied in thousands of peer-reviewed articles and dozens of textbooks. TL;DR: single trees are okay but not great at prediction.

- Leo Breiman introduced the idea of ensemble learning through bagging (bootstrap aggregating).<sup>1</sup>
- Later, Leo Breiman tweaked the idea of bagging by restricting candidate variables for splitting a node to a random subset.<sup>2</sup>
- Even later, random survival forests were developed.<sup>3</sup>
- Even more later, I made the oblique random survival forest.<sup>4</sup>

.footnote[<sup>1</sup>Breiman, Leo. "Bagging predictors." Machine Learning 24.2 (1996): 123-140.<br/><sup>2</sup>Breiman, Leo. "Random forests." Machine Learning 45.1 (2001): 5-32.<br/><sup>3</sup>Ishwaran, Hemant, et al. "Random survival forests." Annals of Applied Statistics 2.3 (2008): 841-860.<br/><sup>4</sup>Jaeger, Byron C., et al. "Oblique random survival forests." Annals of Applied Statistics 13.3 (2019): 1847-1883.]
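---

## Ensemble learning: bagging in miniature

The bagging recipe above can be sketched in a few lines of R. This is an illustrative toy, not code from any of the cited papers: it bags plain `rpart` classification trees on the built-in `iris` data and averages their predicted class probabilities. (A true random forest would also restrict each node's candidate predictors to a random subset.)

```r
library(rpart) # recommended package, ships with R

set.seed(1)
n_trees <- 25

# De-correlate the learners: grow each tree on a bootstrap replicate
trees <- lapply(seq_len(n_trees), function(i) {
  boot_rows <- sample(nrow(iris), replace = TRUE)
  rpart(Species ~ ., data = iris[boot_rows, ])
})

# Aggregate: average the trees' predicted class probabilities
prob_list <- lapply(trees, predict, newdata = iris, type = "prob")
prob_mean <- Reduce(`+`, prob_list) / length(trees)
```

Individually, each tree is a weak learner; the averaged probabilities in `prob_mean` are the "collective wisdom of the ensemble."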
---

class: inverse, center, middle

# Oblique random survival forest (ORSF)

---

## Oblique random survival forest (ORSF)

__Definition__: An ensemble of oblique survival decision trees grown with Leo Breiman's original protocol for random forests.

__Strengths__:

- ORSF directly models right-censored time-to-event data, which are very common in medical settings.
- Oblique splitting increases efficiency, which makes ORSF ideal for smaller cohort studies.
- Risk prediction just works (no need to estimate hazard functions).

__Weaknesses__:

- Slow (finding linear combinations takes time).
- Variable importance does not work as well for oblique splits.

---

Benchmark experiment from Jaeger et al. (ORSF vs. all y'all):

![](index_files/figure-html/fig_orsf_overall_comp-1.png)<!-- -->

---

How ORSF compares to others in terms of the Brier score.

![](index_files/figure-html/fig_orsf_overall_comp_focus_left-1.png)<!-- -->

---

How ORSF compares to others in terms of model concordance.

![](index_files/figure-html/fig_orsf_overall_comp_focus_right-1.png)<!-- -->

---

class: inverse, center, middle

# ORSF in the wild

---

External validation of 10-year risk prediction models for heart failure in the ARIC (Atherosclerosis Risk in Communities) and MESA/DHS (Multi-Ethnic Study of Atherosclerosis/Dallas Heart Study) cohorts.

.left-column[

__Source:__

_Development and Validation of Machine Learning-based Race-specific Models to Predict 10-year Risk of Heart Failure: A Multi-cohort Analysis_

Matthew W Segar et al. [Circulation](https://doi.org/10.1161/circ.142.suppl_3.196) 2020; 142:A196

]

<!-- Nambi V, Liu X, Chambless LE, et al. Troponin T and N-terminal pro-B-type natriuretic peptide: a biomarker approach to predict heart failure risk--the atherosclerosis risk in communities study. Clin Chem. 2013;59(12):1802-1810. -->

<!-- 23. Khan SS, Ning H, Shah SJ, et al. 10-Year Risk Equations for Incident Heart Failure in the General Population. J Am Coll Cardiol. 2019;73(19):2388-2397.
--> .right-column[ <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">ARIC</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">MESA/DHS</div></th> </tr> <tr> <th style="text-align:left;"> Model </th> <th style="text-align:center;"> C-statistic </th> <th style="text-align:center;"> P-value<sup>1</sup> </th> <th style="text-align:center;"> C-statistic </th> <th style="text-align:center;"> P-value<sup>1</sup> </th> </tr> </thead> <tbody> <tr grouplength="3"><td colspan="5" style="border-bottom: 1px solid;"><strong>Black adults</strong></td></tr> <tr> <td style="text-align:left;"> ORSF </td> <td style="text-align:center;"> 0.81 </td> <td style="text-align:center;"> 0.24 </td> <td style="text-align:center;"> 0.83 </td> <td style="text-align:center;"> 0.170 </td> </tr> <tr> <td style="text-align:left;"> Nambi et al </td> <td style="text-align:center;"> 0.77 </td> <td style="text-align:center;"> 0.10 </td> <td style="text-align:center;"> 0.80 </td> <td style="text-align:center;"> 0.001 </td> </tr> <tr> <td style="text-align:left;"> Khan et al </td> <td style="text-align:center;"> 0.71 </td> <td style="text-align:center;"> 0.79 </td> <td style="text-align:center;"> 0.78 </td> <td style="text-align:center;"> 0.540 </td> </tr> <tr groupLength="3"><td colspan="5" style="border-bottom: 1px solid;"><strong>White adults</strong></td></tr> <tr> <td style="text-align:left;"> ORSF </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> 0.82 </td> <td style="text-align:center;"> 0.150 </td> </tr> 
<tr> <td style="text-align:left;"> Nambi et al </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> 0.79 </td> <td style="text-align:center;"> 0.001 </td> </tr>
<tr> <td style="text-align:left;"> Khan et al </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> 0.80 </td> <td style="text-align:center;"> 0.400 </td> </tr>
</tbody>
</table>

<sup>1</sup>From the modified Nam-D'Agostino test for miscalibration

]

---

Internal validation of 1-year risk prediction models for allograft loss or mortality in the PHTS (Pediatric Heart Transplant Society) registry.

.left-column[

__Continuation of:__

_Risk Factors for One-year Mortality and Allograft Loss in Pediatric Heart Transplant Patients Using Machine Learning_

Bethany L Wisotzkey et al. [_Circulation_](https://www.ahajournals.org/doi/abs/10.1161/circ.142.suppl_3.14239). 2020; 142:A14239

]

.right-column[

![](index_files/figure-html/fig_orsf_phts_auc-1.png)<!-- -->

Data presented are median values from 500 replications of Monte Carlo cross-validation

]

---

Internal validation of 1-year risk prediction models for allograft loss or mortality in the PHTS (Pediatric Heart Transplant Society) registry.

.left-column[

__Continuation of:__

_Risk Factors for One-year Mortality and Allograft Loss in Pediatric Heart Transplant Patients Using Machine Learning_

Bethany L Wisotzkey et al. [_Circulation_](https://www.ahajournals.org/doi/abs/10.1161/circ.142.suppl_3.14239). 2020; 142:A14239

]

.right-column[

![](index_files/figure-html/fig_orsf_phts_gnd-1.png)<!-- -->

Data presented are median values from 500 replications of Monte Carlo cross-validation

]

---

## What I've heard from friends

The only reason ORSF has been used in recent projects is that others have been able to use it through the R package, `obliqueRSF`.
`obliqueRSF` does two things particularly well:

- Developing accurate risk prediction models
- Applying them to new data

.pull-left[

```r
library(obliqueRSF)

# Fit an oblique random survival forest to the pbc data,
# holding out the first 5 rows for prediction
orsf_fit <- ORSF(data = pbc[-c(1:5), ],
                 time = 'time',
                 status = 'status',
                 ntree = 5)
```

]

.pull-right[

```r
# Predicted survival probabilities at t = 500 days
predict(orsf_fit, newdata = pbc[1:5, ], times = 500)
```

```
##           [,1]
## [1,] 0.4838889
## [2,] 1.0000000
## [3,] 0.7750000
## [4,] 0.9666667
## [5,] 0.9416667
```

]

---

class: right, top
background-image: url(img/collaborators_orsf.png)
background-size: contain

# Thank you!