We are going to add your data to the pipeline, carefully.
Think of a name for your data.
Example name: regards
Save a copy of your data in data/sensitive. Name the file name-raw.csv or name-raw.sas7bdat, where name is your data’s name; for example, regards-raw.csv.
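The naming rule can be written as a small helper. This is a minimal sketch. raw_data_path() is a hypothetical function that is not part of the pipeline. It only builds the expected path and does not check whether the file exists.

```r
# Hypothetical helper (not part of the pipeline): build the expected
# location of a raw data file from the data's name and file extension.
raw_data_path <- function(name, ext = c("csv", "sas7bdat")) {
  ext <- match.arg(ext)  # only the two formats allowed above; defaults to "csv"
  file.path("data", "sensitive", paste0(name, "-raw.", ext))
}

raw_data_path("regards")
#> [1] "data/sensitive/regards-raw.csv"
raw_data_path("regards", "sas7bdat")
#> [1] "data/sensitive/regards-raw.sas7bdat"
```

Any other extension (for example "xlsx") raises an error from match.arg().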
Your turn
Switch from the R console to the terminal.
Verify you have no uncommitted changes:
git status
It should report “nothing to commit, working tree clean”.
Create a new branch named after your data (here, regards) and switch to it:
git checkout -b regards
Your turn
Copy the code shown here into _targets.R, just beneath the line that starts with # real data cohorts.
Replace zzzz with the name of your data.
Save the _targets.R file
Run tar_make() in the R console.
file_zzzz_tar <- tar_target(
  file_zzzz,
  command = "data/sensitive/zzzz-raw.csv",
  format = "file"
)

data_zzzz_tar <- tar_target(
  data_zzzz,
  # refer to the file target (not the path string) so that targets
  # reruns data_zzzz whenever the raw file changes
  data_prepare(file_name = file_zzzz)
)

# don't forget to add these targets
# to the targets list at the bottom!
Your turn
Run tar_read(data_zzzz) in the R console, where zzzz is the name of your data.
Each dataset is unique, and some may require customized preparation:
Different elements need to be cleaned.
Different variables need to be derived.
Different variables may be selected.
Different exclusions may be applied.
data_load gives its output a custom class based on the name of the dataset. This puts you, the owner of the data, in control of the preparation steps that are specific to your data.
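Here is a minimal sketch of how a loader could attach a dataset-specific class. The function data_load_sketch() is hypothetical. The melodem_<name> naming scheme is an assumption inferred from the class names melodem_sim and melodem_data used in this section. The pipeline's actual data_load may work differently.

```r
# Hypothetical stand-in for data_load(): wrap the raw values in a list and
# give it two classes, a dataset-specific one first and a shared one second.
data_load_sketch <- function(name, values) {
  structure(
    list(values = values),
    class = c(paste0("melodem_", name), "melodem_data")
  )
}

obj <- data_load_sketch("regards", data.frame(age = c(70, 82)))
class(obj)  # c("melodem_regards", "melodem_data")
```

The dataset-specific class comes first. A generic therefore looks for a melodem_regards method before it falls back to a melodem_data method.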
How?
Through R’s generic function system. A generic function (e.g., plot()) dispatches to a different method depending on the class of its input object.
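The following is a minimal, self-contained illustration of S3 dispatch with UseMethod() and NextMethod(). The generic describe() and its methods are hypothetical and exist only for this example.

```r
# A generic function: UseMethod() picks a method based on class(x).
describe <- function(x) UseMethod("describe")

# Method for the shared parent class.
describe.melodem_data <- function(x) "generic cleaning for any melodem_data"

# Method for one specific dataset; NextMethod() also runs the parent method.
describe.melodem_regards <- function(x) {
  c("regards-specific cleaning", NextMethod())
}

regards <- structure(list(), class = c("melodem_regards", "melodem_data"))
other   <- structure(list(), class = "melodem_data")

describe(regards)  # both strings, the regards-specific one first
describe(other)    # only the parent-class string
```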
Here’s a look at the data_clean method for objects of class melodem_sim:
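data_clean.melodem_sim <- function(data){
  # assumes data.table is attached (for := and fifelse())
  # and that data_clean_minimal() returns a data.table
  dt <- data_clean_minimal(data$values)
  # rescale simulated age: multiply by 5, then add 65
  dt[, age := age * 5 + 65]
  # dichotomize sex: positive values become 1, all others 0
  dt[, sex := fifelse(sex > 0, 1, 0)]
  # label the 0/1 codes as a factor
  dt[, sex := factor(sex, levels = c(0, 1), labels = c("male", "female"))]
  # store the cleaned values and return the whole object
  data$values <- dt
  data
}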
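A method for your own data can follow the same pattern. Name it data_clean.<your class>, transform data$values, and return the whole object. The sketch below is written in base R so that it runs without data.table. It defines its own stand-in generic. The class melodem_regards, the raw coding of sex as "M"/"F", and the age exclusion are all hypothetical.

```r
# Stand-in generic so the sketch runs on its own; the pipeline defines its own.
data_clean <- function(data) UseMethod("data_clean")

# Hypothetical method for a dataset named "regards".
data_clean.melodem_regards <- function(data) {
  dt <- data$values
  # hypothetical coding: sex stored as "M"/"F" in the raw file
  dt$sex <- factor(ifelse(dt$sex == "F", 1, 0),
                   levels = c(0, 1), labels = c("male", "female"))
  # hypothetical exclusion: keep participants aged 65 or older
  dt <- dt[dt$age >= 65, , drop = FALSE]
  data$values <- dt
  data  # return the whole object, not just the values
}

raw <- structure(
  list(values = data.frame(age = c(60, 70, 81), sex = c("F", "M", "F"))),
  class = c("melodem_regards", "melodem_data")
)
clean <- data_clean(raw)
clean$values  # two rows remain: age 70 (male) and age 81 (female)
```

The melodem_sim method above starts by calling data_clean_minimal(). Whether your own method should do the same depends on your data.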
Here’s the generic function for cleaning an object of class melodem_data: