Welcome

These are the materials for the MEthods in LOngitudinal DEMentia (MELODEM) workshops offered at Château de Bellinglise, France, July 2024.

The workshop focuses on the use of random forests to predict dementia risk and identify heterogeneity in the association between APOE4 and dementia risk. This website hosts the materials for the workshop and the corresponding GitHub repository (todo: add link here) includes additional materials for manuscript development.

Data set requirements

Data sets must have the following characteristics:

  • a right-censored time-to-dementia outcome.
  • data on APOE4 (carrier vs non-carrier).
  • data on age and sex.
  • You personally are able to access it from the venue in France (e.g., Wifi, hard drive)

If the dataset doesn’t meet all of these criteria, it will not be eligible for the workshop analysis.

For efficiency during the workshop, please ensure the names of the required variables are as follows:

  • Time to dementia should be named ‘time’ and should be numeric time in years from baseline assessment to the time of censoring, death, or dementia, whichever occurred first.
  • Dementia status should be named ‘status’ and should take values of 0 or 1, with 1 occurring if and only if dementia occurred before censoring and death.
  • Apoe4 should be named ‘apoe4’ and should have values of ‘carrier’ vs. ‘non_carrier’. Keep this binary for the sake of using it in causal random forests.
  • Age should be named ‘age’ and should be numeric with age in years.
  • Sex should be named ‘sex’ and should include values of ‘male’ or ‘female’. This does not need to be binary if your data have a large sample (i.e., > 250 observations with >10 dementia events) of people who do not identify in either of those two categories

Data sets should have as many predictors as possible. Predictors would include any variable that

  • May be associated with dementia risk
  • May be associated with APOE4 status
  • Is not an effect of dementia or cognitive impairment
  • May explain heterogeneity in the association of APOE4 with dementia risk.

Data sets need not have an exhaustive set of predictors, but any predictors must be measured before or at the start of follow-up for the right-censored dementia outcome. Because the analysis will be exploratory and focus on the identifying heterogeneity in the association of APOE4 with incident dementia, we encourage everyone to include as many valid predictors from their data as feasible. A few examples include (but are not limited to):

  • education,
  • race/ethnicity,
  • smoking history,
  • mid-life htn,
  • mid-life obesity,
  • htn drugs and statins,
  • SES,
  • diabetes,
  • renal disease,
  • atrial fibrillation,
  • cardiovascular disease,
  • depression,
  • head trauma,
  • insomnia,
  • orthostatic hypotension,
  • family history of cognitive impairment,
  • family history of Alzheimer’s Disease

Preparation

Please join the workshop with a computer that has the following installed (all available for free):

# A few more packages may be added - this list isn't final yet.
# Install required packages for the workshop. 
pkgs <- 
  c("tidyverse", "tidymodels", "data.table", "haven", "magrittr",
    "glue", "grf", "aorsf", "glmnet", "xgboost", "randomForestSRC",
    "party", "riskRegression", "survival", "officer", "flextable", 
    "table.glue", "gtsummary", "usethis", "cli", "ggforce",
    "rpart", "rpart.plot", "ranger", "withr", "gt", "recipes", 
    "butcher", "sandwich", "lmtest", "gbm", "officedown", "Matrix",
    "ggsurvfit", "tidycmprsk", "here", "tarchetypes", "targets",
    "palmerpenguins", "ggrepel")

install.packages('job')

job::job({install.packages(pkgs)})

If you’re a Windows user and encounter an error message during installation noting a missing Rtools installation, install Rtools using the installer linked here.

GitHub

A major aim of the workshop will be to establish a baseline of collaboration for developing and publishing a manuscript. We will use GitHub to facilitate this.

  • If you do not have a GitHub account, please create one here: https://github.com/
  • If you have a GitHub account, make sure you have a personal access token for HTTPS stored in your local Rstudio. Instructions to do this are provided here: https://happygitwithr.com/https-pat

targets

In addition to using GitHub, we will also use targets to coordinate our workflow. There will be a brief tutorial for targets during the workshop, but getting familiar with this R package before the workshop is highly encouraged. A full textbook on targets is available: https://books.ropensci.org/targets/.

Reading the following sections prior to the workshop will be very helpful:

  1. Introduction: https://books.ropensci.org/targets/#intro
  2. A walk through: https://books.ropensci.org/targets/walkthrough.html
  3. Function-based workflow: https://books.ropensci.org/targets/functions.html

Slides

These slides are designed to use with live teaching and are published for workshop participants’ convenience. They are not meant as standalone learning materials.

Day 1: Oblique and causal random forests

  1. Introduction
  2. Decision trees and random forests
  3. Oblique random forests
  4. Causal random forests

Day 2: Collaborative analysis and manuscript planning

Acknowledgments

This website, including the slides, is made with Quarto and is based on the fantastic workshops developed by the tidymodels team. Please submit an issue (todo: add link to github repo) on the GitHub repo for this workshop if you find something that could be fixed or improved.

Reuse and licensing

Unless otherwise noted (i.e. not an original creation and reused from another source), these educational materials are licensed under Creative Commons Attribution CC BY-SA 4.0.