class: center, middle, inverse, title-slide

# Oblique random survival forests

## Why I developed them and why my collaborators use them.

### Byron C. Jaeger

### February 8, 2021

---

## Overview

- What is a random forest?
    + Decision trees
    + Ensemble learning
- Oblique random survival forest (ORSF)
    + Strengths/weaknesses
    + Benchmark
- ORSF in the wild
    + Heart failure risk prediction
    + Allograft loss risk prediction

.footnote[Slides are available online at https://bcjaeger.github.io/seminar---obliqueRSF/]

---

class: inverse, center, middle

# What is a random forest?

---

background-image: url(img/penguins.png)
background-size: 45%
background-position: 85% 72.5%

## Decision trees

- Frequently used in supervised learning.
- Partition the space of predictor variables.
- Can be used for classification, regression, and survival analysis.

.pull-left[
We'll demonstrate the mechanics of decision trees by developing a prediction rule to classify penguin<sup>1</sup> species (Chinstrap, Gentoo, or Adelie) based on bill and flipper length.
]

.footnote[
<sup>1</sup>Data were collected and made available by [Dr. Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php) and [Palmer Station](https://pal.lternet.edu/), a member of the [Long Term Ecological Research Network](https://lternet.edu/).
]

---

Dimensions for Adelie, Chinstrap and Gentoo Penguins at Palmer Station

<img src="index_files/figure-html/fig_penguins_nopart-1.png" style="display: block; margin: auto;" />

---

Partition all the penguins into flipper length < 207 or ≥ 207 mm

<img src="index_files/figure-html/fig_penguins_part1-1.png" style="display: block; margin: auto;" />

---

Partition penguins on the left side into bill length < 43 or ≥ 43 mm

<img src="index_files/figure-html/fig_penguins_part2-1.png" style="display: block; margin: auto;" />

---

The same partitions, visualized as a binary tree.
<img src="img/rpart_plot_classif.png" width="100%" style="display: block; margin: auto;" />

Node text, top to bottom: predicted class; predicted class probability; percentage of data in the node.

---

For survival trees, the Kaplan-Meier curve or cumulative hazard function is calculated in each terminal node.

<img src="img/rpart_plot_surv.png" width="100%" style="display: block; margin: auto;" />

Note: the survival outcomes are simulated.

---

With oblique splits, partitions do not need to be rectangles.

<img src="index_files/figure-html/fig_penguins_part2_oblique-1.png" style="display: block; margin: auto;" />

---

## Ensemble learning

Decision trees have been studied in thousands of peer-reviewed articles and dozens of textbooks. TL;DR: single trees are okay but not great at prediction.

- Leo Breiman introduced the idea of ensemble learning through bagging (bootstrap aggregating).<sup>1</sup>
    + Form an ensemble of 'weak learners' that are de-correlated by fitting each to a bootstrapped replicate of the data.
    + Individually, the learners give poor answers, but the collective wisdom of the ensemble is substantial.
    + e.g., myself and friends in graduate school

.footnote[<sup>1</sup>Breiman, Leo. "Bagging predictors." Machine Learning 24.2 (1996): 123-140.]

---

## Ensemble learning

Decision trees have been studied in thousands of peer-reviewed articles and dozens of textbooks. TL;DR: single trees are okay but not great at prediction.

- Leo Breiman introduced the idea of ensemble learning through bagging (bootstrap aggregating).<sup>1</sup>
- Later, Leo Breiman tweaked the idea of bagging by restricting candidate variables for splitting a node to a random subset.
    + This modification created the random forest!<sup>2</sup>

.footnote[<sup>1</sup>Breiman, Leo. "Bagging predictors." Machine Learning 24.2 (1996): 123-140.<br/><sup>2</sup>Breiman, Leo. "Random forests." Machine Learning 45.1 (2001): 5-32.]
---

## Ensemble learning

Decision trees have been studied in thousands of peer-reviewed articles and dozens of textbooks. TL;DR: single trees are okay but not great at prediction.

- Leo Breiman introduced the idea of ensemble learning through bagging (bootstrap aggregating).<sup>1</sup>
- Later, Leo Breiman tweaked the idea of bagging by restricting candidate variables for splitting a node to a random subset.<sup>2</sup>
- Even later, random survival forests were developed.<sup>3</sup>

.footnote[<sup>1</sup>Breiman, Leo. "Bagging predictors." Machine Learning 24.2 (1996): 123-140.<br/><sup>2</sup>Breiman, Leo. "Random forests." Machine Learning 45.1 (2001): 5-32.<br/><sup>3</sup>Ishwaran, Hemant, et al. "Random survival forests." Annals of Applied Statistics 2.3 (2008): 841-860.]

---

## Ensemble learning

Decision trees have been studied in thousands of peer-reviewed articles and dozens of textbooks. TL;DR: single trees are okay but not great at prediction.

- Leo Breiman introduced the idea of ensemble learning through bagging (bootstrap aggregating).<sup>1</sup>
- Later, Leo Breiman tweaked the idea of bagging by restricting candidate variables for splitting a node to a random subset.<sup>2</sup>
- Even later, random survival forests were developed.<sup>3</sup>
- Even more later, I made the oblique random survival forest.<sup>4</sup>

.footnote[<sup>1</sup>Breiman, Leo. "Bagging predictors." Machine Learning 24.2 (1996): 123-140.<br/><sup>2</sup>Breiman, Leo. "Random forests." Machine Learning 45.1 (2001): 5-32.<br/><sup>3</sup>Ishwaran, Hemant, et al. "Random survival forests." Annals of Applied Statistics 2.3 (2008): 841-860.<br/><sup>4</sup>Jaeger, Byron C., et al. "Oblique random survival forests." Annals of Applied Statistics 13.3 (2019): 1847-1883.]
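---

## Ensemble learning: bagging in miniature

The bagging recipe above can be sketched in a few lines of R. This is an illustrative toy, not code from any of the cited papers: it bags plain `rpart` classification trees on the built-in `iris` data and averages their predicted class probabilities. (A true random forest would also restrict each node's candidate predictors to a random subset.)

```r
library(rpart) # recommended package, ships with R

set.seed(1)
n_trees <- 25

# De-correlate the learners: grow each tree on a bootstrap replicate
trees <- lapply(seq_len(n_trees), function(i) {
  boot_rows <- sample(nrow(iris), replace = TRUE)
  rpart(Species ~ ., data = iris[boot_rows, ])
})

# Aggregate: average the trees' predicted class probabilities
prob_list <- lapply(trees, predict, newdata = iris, type = "prob")
prob_mean <- Reduce(`+`, prob_list) / length(trees)
```

Individually, each tree is a weak learner; the averaged probabilities in `prob_mean` are the "collective wisdom of the ensemble."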
---

class: inverse, center, middle

# Oblique random survival forest (ORSF)

---

## Oblique random survival forest (ORSF)

__Definition__: An ensemble of oblique survival decision trees grown with Leo Breiman's original protocol for random forests.

__Strengths__:

- ORSF directly models right-censored time-to-event data, which are very common in medical settings.
- Oblique splitting increases efficiency, which makes ORSF ideal for smaller cohort studies.
- Risk prediction just works (no need to estimate hazard functions).

__Weaknesses__:

- Slow (finding linear combinations takes time).
- Variable importance does not work as well for oblique splits.

---

Benchmark experiment from Jaeger et al. (ORSF vs. all y'all):

![](index_files/figure-html/fig_orsf_overall_comp-1.png)<!-- -->

---

How ORSF compares to others in terms of the Brier score.

![](index_files/figure-html/fig_orsf_overall_comp_focus_left-1.png)<!-- -->

---

How ORSF compares to others in terms of model concordance.

![](index_files/figure-html/fig_orsf_overall_comp_focus_right-1.png)<!-- -->

---

class: inverse, center, middle

# ORSF in the wild

---

External validation of 10-year risk prediction models for heart failure in the ARIC (Atherosclerosis Risk in Communities) and MESA/DHS (Multi-Ethnic Study of Atherosclerosis/Dallas Heart Study) cohorts.

.left-column[

__Source:__

_Development and Validation of Machine Learning-based Race-specific Models to Predict 10-year Risk of Heart Failure: A Multi-cohort Analysis_

Matthew W Segar et al. [Circulation](https://doi.org/10.1161/circ.142.suppl_3.196) 2020; 142:A196

]

<!-- Nambi V, Liu X, Chambless LE, et al. Troponin T and N-terminal pro-B-type natriuretic peptide: a biomarker approach to predict heart failure risk--the atherosclerosis risk in communities study. Clin Chem. 2013;59(12):1802-1810. -->

<!-- 23. Khan SS, Ning H, Shah SJ, et al. 10-Year Risk Equations for Incident Heart Failure in the General Population. J Am Coll Cardiol. 2019;73(19):2388-2397.
--> .right-column[ <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="empty-cells: hide;border-bottom:hidden;" colspan="1"></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">ARIC</div></th> <th style="border-bottom:hidden;padding-bottom:0; padding-left:3px;padding-right:3px;text-align: center; " colspan="2"><div style="border-bottom: 1px solid #ddd; padding-bottom: 5px; ">MESA/DHS</div></th> </tr> <tr> <th style="text-align:left;"> Model </th> <th style="text-align:center;"> C-statistic </th> <th style="text-align:center;"> P-value<sup>1</sup> </th> <th style="text-align:center;"> C-statistic </th> <th style="text-align:center;"> P-value<sup>1</sup> </th> </tr> </thead> <tbody> <tr grouplength="3"><td colspan="5" style="border-bottom: 1px solid;"><strong>Black adults</strong></td></tr> <tr> <td style="text-align:left;"> ORSF </td> <td style="text-align:center;"> 0.81 </td> <td style="text-align:center;"> 0.24 </td> <td style="text-align:center;"> 0.83 </td> <td style="text-align:center;"> 0.170 </td> </tr> <tr> <td style="text-align:left;"> Nambi et al </td> <td style="text-align:center;"> 0.77 </td> <td style="text-align:center;"> 0.10 </td> <td style="text-align:center;"> 0.80 </td> <td style="text-align:center;"> 0.001 </td> </tr> <tr> <td style="text-align:left;"> Khan et al </td> <td style="text-align:center;"> 0.71 </td> <td style="text-align:center;"> 0.79 </td> <td style="text-align:center;"> 0.78 </td> <td style="text-align:center;"> 0.540 </td> </tr> <tr groupLength="3"><td colspan="5" style="border-bottom: 1px solid;"><strong>White adults</strong></td></tr> <tr> <td style="text-align:left;"> ORSF </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> 0.82 </td> <td style="text-align:center;"> 0.150 </td> </tr> 
<tr> <td style="text-align:left;"> Nambi et al </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> 0.79 </td> <td style="text-align:center;"> 0.001 </td> </tr>
<tr> <td style="text-align:left;"> Khan et al </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> -- </td> <td style="text-align:center;"> 0.80 </td> <td style="text-align:center;"> 0.400 </td> </tr>
</tbody>
</table>

<sup>1</sup>From the modified Nam-D'Agostino test for miscalibration

]

---

Internal validation of 1-year risk prediction models for allograft loss or mortality in the PHTS (Pediatric Heart Transplant Society) registry.

.left-column[

__Continuation of:__

_Risk Factors for One-year Mortality and Allograft Loss in Pediatric Heart Transplant Patients Using Machine Learning_

Bethany L Wisotzkey et al. [_Circulation_](https://www.ahajournals.org/doi/abs/10.1161/circ.142.suppl_3.14239). 2020; 142:A14239

]

.right-column[

![](index_files/figure-html/fig_orsf_phts_auc-1.png)<!-- -->

Data presented are median values from 500 replications of Monte Carlo cross-validation

]

---

Internal validation of 1-year risk prediction models for allograft loss or mortality in the PHTS (Pediatric Heart Transplant Society) registry.

.left-column[

__Continuation of:__

_Risk Factors for One-year Mortality and Allograft Loss in Pediatric Heart Transplant Patients Using Machine Learning_

Bethany L Wisotzkey et al. [_Circulation_](https://www.ahajournals.org/doi/abs/10.1161/circ.142.suppl_3.14239). 2020; 142:A14239

]

.right-column[

![](index_files/figure-html/fig_orsf_phts_gnd-1.png)<!-- -->

Data presented are median values from 500 replications of Monte Carlo cross-validation

]

---

## What I've heard from friends

The only reason ORSF has been used in recent projects is that others have been able to use it through the R package, `obliqueRSF`.
`obliqueRSF` does two things particularly well:

- Developing accurate risk prediction models
- Applying them to new data

.pull-left[

```r
library(obliqueRSF)

# Fit an oblique random survival forest to the pbc data,
# holding out the first 5 rows for prediction
orsf_fit <- ORSF(data = pbc[-c(1:5), ],
                 time = 'time',
                 status = 'status',
                 ntree = 5)
```

]

.pull-right[

```r
# Predicted survival probabilities at t = 500 days
predict(orsf_fit, newdata = pbc[1:5, ], times = 500)
```

```
##           [,1]
## [1,] 0.4838889
## [2,] 1.0000000
## [3,] 0.7750000
## [4,] 0.9666667
## [5,] 0.9416667
```

]

---

class: right, top
background-image: url(img/collaborators_orsf.png)
background-size: contain

# Thank you!