Day 2: SDTM Domain Structure & Tidyverse Mastery
Understanding SDTM Domain Classes Through Code
1 Learning Objectives
By the end of Day 2, you will be able to:
- Understand the four SDTM domain classes and their purposes
- Recognize and interpret SDTM variable naming conventions
- Apply advanced tidyverse operations to clinical data
- Create meaningful summaries of clinical trial data
- Work fluently with dplyr and tidyr on pharmaverse datasets
2 Introduction
Day 2 builds on the foundations from Day 1. Now that you can load and explore SDTM datasets, we'll dive deeper into:
- SDTM Domain Classes: Understanding why domains are organized the way they are
- Variable Naming Conventions: Decoding patterns like --TESTCD, --ORRES, and --DY
- Tidyverse Mastery: By our estimate, roughly half of what you'll do daily as a clinical programmer
The pharmaverse packages are built on tidyverse principles, so mastering dplyr and tidyr is essential for working with admiral, sdtm.oak, and other clinical programming tools.
3 Package Installation & Loading
3.1 Required Packages
| Package | Purpose |
|---|---|
| dplyr | Data manipulation (filter, select, mutate, etc.) |
| tidyr | Data reshaping (pivot_wider, pivot_longer, etc.) |
| pharmaversesdtm | Example SDTM datasets for learning |
3.2 Load Packages
# Load required packages
library(dplyr)
library(tidyr)
library(pharmaversesdtm)
3.3 Load SDTM Datasets
We’ll work with multiple domains today to understand their structures and relationships.
# Load all domains we'll use today
data("dm", package = "pharmaversesdtm")
data("ae", package = "pharmaversesdtm")
data("ex", package = "pharmaversesdtm")
data("lb", package = "pharmaversesdtm")
data("vs", package = "pharmaversesdtm")
data("cm", package = "pharmaversesdtm")
data("ds", package = "pharmaversesdtm")
4 SDTM Domain Classes
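SDTM organizes clinical data into four domain classes, each designed for a specific type of information. Strictly speaking, the SDTM model defines three general observation classes, namely Interventions, Events, and Findings, alongside special-purpose domains such as DM; it also has trial design and relationship datasets, which are outside today's scope. Knowing a domain's class tells you what structure to expect from it. Here's a simple breakdown: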
4.1 The Four Domain Classes
| Class | Purpose | Example Domains | Key Characteristics |
|---|---|---|---|
| Special Purpose | Subject-level information | DM, CO, SE, SV | DM has exactly one row per subject; CO, SE, and SV can have several |
| Events | Discrete occurrences | AE, DS, MH, DV | Multiple rows per subject, start/end |
| Interventions | Treatments administered | EX, CM, EC, SU | Dosing info, frequencies, routes |
| Findings | Collected measurements/tests | LB, VS, EG, PE | Test codes, results, units, standards |
4.1.1 What do these mean?
- Special Purpose: These domains (like DM for Demographics) are the backbone of your study. They contain key information about each subject, such as their ID, age, sex, and treatment group. DM has exactly one row per subject; other special-purpose domains, such as SV (Subject Visits), can have several rows per subject.
- Events: These domains (like AE for Adverse Events) record things that happen to a subject, such as getting sick or dropping out of the study. There can be many events per subject, so you’ll see multiple rows for the same person.
- Interventions: These domains (like EX for Exposure) track what is done to a subject, such as giving them a drug or a procedure. Each row is a record of a treatment or intervention.
- Findings: These domains (like LB for Laboratory) store results of tests or measurements taken from the subject, such as blood pressure or lab results. Each row is a single measurement.
4.2 Special Purpose Class: DM (Demographics)
The Demographics domain is the anchor of all SDTM data. It contains one row per subject and provides the reference information that other domains link to. For example, if you want to know a subject’s age or which treatment group they are in, you look in DM.
# DM structure - note the one row per subject
dm %>%
dplyr::select(STUDYID, USUBJID, SUBJID, RFSTDTC, RFENDTC, ARM, ACTARM, AGE, SEX, RACE) %>%
utils::head(5)
# A tibble: 5 × 10
STUDYID USUBJID SUBJID RFSTDTC RFENDTC ARM ACTARM AGE SEX RACE
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
1 CDISCPILOT01 01-701-1015 1015 2014-0… 2014-0… Plac… Place… 63 F WHITE
2 CDISCPILOT01 01-701-1023 1023 2012-0… 2012-0… Plac… Place… 64 M WHITE
3 CDISCPILOT01 01-701-1028 1028 2013-0… 2014-0… Xano… Xanom… 71 M WHITE
4 CDISCPILOT01 01-701-1033 1033 2014-0… 2014-0… Xano… Xanom… 74 M WHITE
5 CDISCPILOT01 01-701-1034 1034 2014-0… 2014-1… Xano… Xanom… 77 F WHITE
- STUDYID: Study Identifier (the name/code of the study)
- USUBJID: Unique Subject Identifier (a unique code for each subject, used to link all domains)
- RFSTDTC: Reference Start Date/Time (usually the date of first dose)
- RFENDTC: Reference End Date/Time (when the subject finished or left the study)
- ARM/ACTARM: Planned/Actual Treatment Arm (which group the subject was assigned to)
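Because DM should hold exactly one row per subject, a quick uniqueness check is a sensible first step before joining it to other domains. The sketch below uses a small hypothetical `dm_toy` tibble (made-up subject IDs) so the result is easy to verify; swapping in the real `dm` should behave the same way.

```r
library(dplyr)

# Hypothetical toy DM data: one row per subject, as SDTM expects
dm_toy <- dplyr::tibble(
  USUBJID = c("S01", "S02", "S03"),
  ARM     = c("Placebo", "Placebo", "Xanomeline High Dose"),
  AGE     = c(63, 64, 71)
)

# Count rows per subject; any n > 1 would signal a structural problem in DM
dm_dups <- dm_toy %>%
  dplyr::count(USUBJID) %>%
  dplyr::filter(n > 1)

# TRUE when every USUBJID appears exactly once
dm_is_one_row_per_subject <- nrow(dm_dups) == 0
dm_is_one_row_per_subject
```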
4.3 Events Class: AE (Adverse Events)
Events domains capture discrete occurrences that happen to subjects. Each event has a start and potentially an end. For example, if a subject gets a headache during the study, it is recorded as an adverse event (AE).
# AE structure - multiple rows per subject
ae %>%
dplyr::select(USUBJID, AETERM, AEDECOD, AEBODSYS, AESEV, AESER, AESTDTC, AEENDTC) %>%
utils::head(10)
# A tibble: 10 × 8
USUBJID AETERM AEDECOD AEBODSYS AESEV AESER AESTDTC AEENDTC
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 01-701-1015 APPLICATION SITE ER… APPLIC… GENERAL… MILD N 2014-0… <NA>
2 01-701-1015 APPLICATION SITE PR… APPLIC… GENERAL… MILD N 2014-0… <NA>
3 01-701-1015 DIARRHOEA DIARRH… GASTROI… MILD N 2014-0… 2014-0…
4 01-701-1023 ATRIOVENTRICULAR BL… ATRIOV… CARDIAC… MILD N 2012-0… <NA>
5 01-701-1023 ERYTHEMA ERYTHE… SKIN AN… MILD N 2012-0… 2012-0…
6 01-701-1023 ERYTHEMA ERYTHE… SKIN AN… MODE… N 2012-0… <NA>
7 01-701-1023 ERYTHEMA ERYTHE… SKIN AN… MILD N 2012-0… 2012-0…
8 01-701-1028 APPLICATION SITE ER… APPLIC… GENERAL… MILD N 2013-0… <NA>
9 01-701-1028 APPLICATION SITE PR… APPLIC… GENERAL… MILD N 2013-0… <NA>
10 01-701-1034 APPLICATION SITE PR… APPLIC… GENERAL… MILD N 2014-0… <NA>
- AETERM: The verbatim term reported by the investigator (e.g., “Headache”)
- AEDECOD: The standardized (coded) term, typically the MedDRA Preferred Term (e.g., "HEADACHE")
- AEBODSYS: The body system or organ class, typically the MedDRA System Organ Class (e.g., "NERVOUS SYSTEM DISORDERS")
- AESEV: Severity (MILD, MODERATE, or SEVERE)
- AESER: Seriousness (Y or N)
- AESTDTC/AEENDTC: Start and end dates of the event
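The "multiple rows per subject" property of Events domains is easy to see by counting records per subject. This is a minimal sketch on a hypothetical `ae_toy` tibble; the same `count()` call works on the real `ae` data.

```r
library(dplyr)

# Hypothetical toy AE data: subject S01 has two events, S02 has one
ae_toy <- dplyr::tibble(
  USUBJID = c("S01", "S01", "S02"),
  AEDECOD = c("HEADACHE", "NAUSEA", "HEADACHE"),
  AESEV   = c("MILD", "MODERATE", "MILD"),
  AESER   = c("N", "N", "N")
)

# Events domains allow many rows per subject: count events per USUBJID
ae_per_subject <- ae_toy %>%
  dplyr::count(USUBJID, name = "n_events")

ae_per_subject
```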
4.4 Interventions Class: EX (Exposure)
Interventions domains track treatments given to subjects. They include dosing information, routes, and frequencies. For example, every time a subject receives a dose of study drug, it is recorded in EX.
# EX structure - dosing records
ex %>%
dplyr::select(USUBJID, EXTRT, EXDOSE, EXDOSU, EXROUTE, EXSTDTC, EXENDTC) %>%
utils::head(10)
# A tibble: 10 × 7
USUBJID EXTRT EXDOSE EXDOSU EXROUTE EXSTDTC EXENDTC
<chr> <chr> <dbl> <chr> <chr> <chr> <chr>
1 01-701-1015 PLACEBO 0 mg TRANSDERMAL 2014-01-02 2014-01-16
2 01-701-1015 PLACEBO 0 mg TRANSDERMAL 2014-01-17 2014-06-18
3 01-701-1015 PLACEBO 0 mg TRANSDERMAL 2014-06-19 2014-07-02
4 01-701-1023 PLACEBO 0 mg TRANSDERMAL 2012-08-05 2012-08-27
5 01-701-1023 PLACEBO 0 mg TRANSDERMAL 2012-08-28 2012-09-01
6 01-701-1028 XANOMELINE 54 mg TRANSDERMAL 2013-07-19 2013-08-01
7 01-701-1028 XANOMELINE 81 mg TRANSDERMAL 2013-08-02 2014-01-06
8 01-701-1028 XANOMELINE 54 mg TRANSDERMAL 2014-01-07 2014-01-14
9 01-701-1033 XANOMELINE 54 mg TRANSDERMAL 2014-03-18 2014-03-31
10 01-701-1034 XANOMELINE 54 mg TRANSDERMAL 2014-07-01 2014-07-15
- EXTRT: Name of the treatment (e.g., “Drug A”)
- EXDOSE: Dose amount
- EXDOSU: Dose units (e.g., mg)
- EXROUTE: Route of administration (e.g., ORAL, TRANSDERMAL)
- EXSTDTC/EXENDTC: Start and end dates of dosing
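Because each EX record has a start and end date, a common next step is computing how long each dosing period lasted. The sketch below reuses the three EX records for subject 01-701-1015 from the output above and assumes complete ISO 8601 dates (YYYY-MM-DD); partial dates would need imputation, which it does not attempt. The column names `dur_days` and `total_days` are hypothetical.

```r
library(dplyr)

# EX records for one subject, copied from the output above
ex_toy <- dplyr::tibble(
  USUBJID = "01-701-1015",
  EXSTDTC = c("2014-01-02", "2014-01-17", "2014-06-19"),
  EXENDTC = c("2014-01-16", "2014-06-18", "2014-07-02")
)

ex_duration <- ex_toy %>%
  dplyr::mutate(
    # Inclusive day count: a dose that starts and ends on the same day counts as 1 day
    dur_days = as.integer(
      as.Date(EXENDTC, format = "%Y-%m-%d") - as.Date(EXSTDTC, format = "%Y-%m-%d")
    ) + 1L
  )

# Total days on treatment per subject (these records happen to be contiguous)
ex_total <- ex_duration %>%
  dplyr::group_by(USUBJID) %>%
  dplyr::summarise(total_days = sum(dur_days), .groups = "drop")

ex_total
```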
4.5 Findings Class: LB (Laboratory)
Findings domains contain collected measurements. They have a standardized structure with test codes, results, units, and reference ranges. For example, each blood test result is a row in LB.
# LB structure - lab results with standardization
lb %>%
dplyr::select(USUBJID, LBTESTCD, LBTEST, LBORRES, LBORRESU, LBSTRESN, LBSTRESU) %>%
utils::head(10)
# A tibble: 10 × 7
USUBJID LBTESTCD LBTEST LBORRES LBORRESU LBSTRESN LBSTRESU
<chr> <chr> <chr> <chr> <chr> <dbl> <chr>
1 01-701-1015 ALB Albumin 3.8 g/dL 38 g/L
2 01-701-1015 ALB Albumin 3.9 g/dL 39 g/L
3 01-701-1015 ALB Albumin 3.8 g/dL 38 g/L
4 01-701-1015 ALB Albumin 3.7 g/dL 37 g/L
5 01-701-1015 ALB Albumin 3.8 g/dL 38 g/L
6 01-701-1015 ALB Albumin 3.8 g/dL 38 g/L
7 01-701-1015 ALB Albumin 3.7 g/dL 37 g/L
8 01-701-1015 ALB Albumin 3.7 g/dL 37 g/L
9 01-701-1015 ALB Albumin 3.8 g/dL 38 g/L
10 01-701-1015 ALB Albumin 3.8 g/dL 38 g/L
- LBTESTCD: Short test code (e.g., “GLUC”)
- LBTEST: Full test name (e.g., “Glucose”)
- LBORRES: Original result (as collected)
- LBORRESU: Original result units (e.g., mg/dL)
- LBSTRESN: Standardized numeric result
- LBSTRESU: Standardized units (e.g., mmol/L)
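The albumin rows above show the ORRES-to-STRESN relationship directly: 3.8 g/dL becomes 38 g/L because 1 dL is 0.1 L. Here is a minimal sketch of that single conversion on a hypothetical `lb_toy` tibble. It only covers this one unit pair; real standardization usually relies on a sponsor-maintained conversion table per test and unit.

```r
library(dplyr)

# Albumin results in g/dL, matching the first rows of the LB output above
lb_toy <- dplyr::tibble(
  LBTESTCD = "ALB",
  LBORRES  = c("3.8", "3.9", "3.7"),
  LBORRESU = "g/dL"
)

lb_std <- lb_toy %>%
  dplyr::mutate(
    # 1 g/dL = 10 g/L, so multiply the numeric original result by 10
    LBSTRESN = as.numeric(LBORRES) * 10,
    LBSTRESU = "g/L"
  )

lb_std
```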
5 SDTM Variable Naming Conventions
SDTM uses consistent naming patterns across all domains. Once you know them, you can usually infer a variable's purpose from its name alone and only need the documentation for the details.
5.1 The Two-Character Domain Prefix
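Most domain-specific variables start with a two-character prefix that identifies the domain. Identifiers shared across domains (STUDYID, USUBJID, DOMAIN) and many DM variables (AGE, SEX, RACE, ARM) carry no prefix.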
| Prefix | Domain | Example Variables |
|---|---|---|
| DM | Demographics | DMDTC, DMDY |
| AE | Adverse Events | AESTDTC, AEENDTC, AESEV |
| EX | Exposure | EXSTDTC, EXDOSE, EXROUTE |
| LB | Laboratory | LBTESTCD, LBORRES, LBSTRESN |
| VS | Vital Signs | VSTESTCD, VSORRES, VSSTRESN |
| CM | Concomitant Meds | CMTRT, CMDOSE, CMSTDTC |
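The prefix convention makes domain variables easy to select programmatically and to map back to their generic "--" names. A minimal sketch on a hypothetical AE-like tibble:

```r
library(dplyr)

# Hypothetical AE-like columns: shared identifiers plus AE-prefixed variables
ae_toy <- dplyr::tibble(
  STUDYID = "STUDY01",
  USUBJID = "S01",
  AETERM  = "Headache",
  AESEV   = "MILD",
  AESTDTC = "2014-01-05"
)

# Keep only the domain-prefixed variables (the identifiers are dropped)
ae_prefixed <- ae_toy %>% dplyr::select(dplyr::starts_with("AE"))
names(ae_prefixed)

# Replace the two-character prefix with "--" to get the generic SDTM name
generic_names <- sub("^AE", "--", names(ae_prefixed))
generic_names
```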
5.2 Common Variable Suffixes
The suffix tells you what type of information the variable contains:
| Suffix | Meaning | Examples |
|---|---|---|
| --TERM | Reported term (verbatim) | AETERM, MHTERM |
| --TRT | Reported name of treatment | CMTRT, EXTRT |
| --DECOD | Decoded/standardized term | AEDECOD, CMDECOD |
| --TESTCD | Test code (short) | LBTESTCD, VSTESTCD |
| --TEST | Test name (full) | LBTEST, VSTEST |
| --ORRES | Original result (as collected) | LBORRES, VSORRES |
| --ORRESU | Original result units | LBORRESU, VSORRESU |
| --STRESC | Standardized result (character) | LBSTRESC, VSSTRESC |
| --STRESN | Standardized result (numeric) | LBSTRESN, VSSTRESN |
| --STRESU | Standardized result units | LBSTRESU, VSSTRESU |
| --STDTC | Start date/time | AESTDTC, EXSTDTC, CMSTDTC |
| --ENDTC | End date/time | AEENDTC, EXENDTC, CMENDTC |
| --DY | Study day (start/end variants are --STDY and --ENDY) | LBDY, VSDY, AESTDY, AEENDY, EXSTDY |
| --SEV | Severity | AESEV |
| --SER | Serious (Y/N) | AESER |
| --REL | Relationship/Causality | AEREL |
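Because the suffix carries the meaning, a small lookup table can describe variables across domains. The sketch below is a hypothetical helper (`describe_var()` and `suffix_meaning` are illustrative names, not part of any pharmaverse package) covering a few suffixes from the table; it naively strips the first two characters, so unprefixed variables such as USUBJID return NA.

```r
# Hypothetical lookup covering a few suffixes from the table above
suffix_meaning <- c(
  TESTCD = "Test code (short)",
  TEST   = "Test name (full)",
  ORRES  = "Original result (as collected)",
  ORRESU = "Original result units",
  STRESN = "Standardized result (numeric)",
  STRESU = "Standardized result units",
  STDTC  = "Start date/time",
  ENDTC  = "End date/time"
)

# Drop the two-character domain prefix, then look the suffix up by name.
# Suffixes not in the lookup (e.g. from USUBJID) return NA.
describe_var <- function(var) {
  suffix <- substring(var, 3)
  unname(suffix_meaning[suffix])
}

var_descriptions <- describe_var(c("LBTESTCD", "VSORRESU", "AESTDTC", "USUBJID"))
var_descriptions
```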
5.3 Understanding the Pattern: Findings Class
Let’s examine how the naming convention works in the Findings class (LB, VS):
# The Findings class pattern in LB
lb %>%
dplyr::select(
USUBJID,
LBTESTCD, # Test code (short identifier)
LBTEST, # Test name (full description)
LBORRES, # Original result (as collected)
LBORRESU, # Original units
LBSTRESC, # Standardized result (character)
LBSTRESN, # Standardized result (numeric)
LBSTRESU # Standardized units
) %>%
dplyr::filter(LBTESTCD == "ALT") %>%
utils::head(5)
# A tibble: 5 × 8
USUBJID LBTESTCD LBTEST LBORRES LBORRESU LBSTRESC LBSTRESN LBSTRESU
<chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
1 01-701-1015 ALT Alanine Amin… 27 U/L 27 27 U/L
2 01-701-1015 ALT Alanine Amin… 41 U/L 41 41 U/L
3 01-701-1015 ALT Alanine Amin… 18 U/L 18 18 U/L
4 01-701-1015 ALT Alanine Amin… 26 U/L 26 26 U/L
5 01-701-1015 ALT Alanine Amin… 22 U/L 22 22 U/L
- ORRES/ORRESU: Original result as collected (might be “< 5”, “>100”, or character values)
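- STRESN/STRESU: Standardized numeric result and its units, after any unit conversion, for analysis (STRESN is missing when the standardized result is not numeric; the character version is kept in STRESC)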
This separation preserves the original data while providing analysis-ready values.
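To see why analysis code should not simply coerce ORRES, try converting a few collected values to numbers. This minimal sketch uses hypothetical ORRES values: anything that is not a plain number becomes NA, which is one reason analysis code should use STRESN, where the sponsor's standardization rules have already been applied.

```r
# Hypothetical original results as they might be collected
orres <- c("27", "< 5", ">100", "NEGATIVE")

# Naive conversion: anything that is not a plain number becomes NA
# (as.numeric() warns about this; the warning is suppressed here)
naive_num <- suppressWarnings(as.numeric(orres))
naive_num
```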
6 Tidyverse Mastery: 10 Essential Transformations
The deliverable for Day 2 is to demonstrate 10 tidyverse transformations. Let’s work through them systematically.
6.1 Transformation 1: filter() with Multiple Conditions
Filter AEs that are both serious AND severe.
# Complex filtering: Serious AND Severe AEs
serious_severe_ae <- ae %>%
dplyr::filter(AESER == "Y" & AESEV == "SEVERE")
cat("Total AEs:", nrow(ae), "\n")
Total AEs: 1191
cat("Serious + Severe AEs:", nrow(serious_severe_ae), "\n")
Serious + Severe AEs: 2
# Preview
serious_severe_ae %>%
dplyr::select(USUBJID, AEDECOD, AESEV, AESER) %>%
utils::head()
# A tibble: 2 × 4
USUBJID AEDECOD AESEV AESER
<chr> <chr> <chr> <chr>
1 01-718-1170 SYNCOPE SEVERE Y
2 01-718-1371 PARTIAL SEIZURES WITH SECONDARY GENERALISATION SEVERE Y
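Two details of `filter()` matter for clinical data: comma-separated conditions are combined with AND, and rows where a condition evaluates to NA are dropped rather than kept. A minimal sketch on a hypothetical `ae_toy` tibble with one missing AESER value:

```r
library(dplyr)

# Hypothetical AEs; the last record has a missing seriousness flag
ae_toy <- dplyr::tibble(
  USUBJID = c("S01", "S01", "S02", "S03"),
  AESER   = c("Y", "N", "Y", NA),
  AESEV   = c("SEVERE", "SEVERE", "MILD", "SEVERE")
)

# Comma-separated conditions in filter() are combined with AND
with_and   <- ae_toy %>% dplyr::filter(AESER == "Y" & AESEV == "SEVERE")
with_comma <- ae_toy %>% dplyr::filter(AESER == "Y", AESEV == "SEVERE")

# The S03 row evaluates to NA and is dropped, so only S01 remains
nrow(with_and)
```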
6.2 Transformation 2: filter() with %in% for Multiple Values
Filter specific test codes from the lab data.
# Filter for liver function tests
liver_tests <- c("ALT", "AST", "BILI", "ALP")
lft_data <- lb %>%
dplyr::filter(LBTESTCD %in% liver_tests)
cat("Total LB records:", nrow(lb), "\n")
Total LB records: 59580
cat("Liver function tests:", nrow(lft_data), "\n")
Liver function tests: 7266
# Count by test
lft_data %>%
dplyr::count(LBTESTCD, LBTEST)
# A tibble: 4 × 3
LBTESTCD LBTEST n
<chr> <chr> <int>
1 ALP Alkaline Phosphatase 1824
2 ALT Alanine Aminotransferase 1814
3 AST Aspartate Aminotransferase 1814
4 BILI Bilirubin 1814
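The difference between `%in%` and `==` shows up when the test code can be missing, especially when you negate the condition to exclude tests. A minimal sketch with a hypothetical vector of test codes:

```r
liver_tests <- c("ALT", "AST", "BILI", "ALP")
testcd <- c("ALT", NA, "GLUC")

# == propagates NA
eq_alt <- testcd == "ALT"

# %in% never returns NA: a missing code is simply FALSE
in_lft <- testcd %in% liver_tests

# Negating == keeps the NA, so filter() would drop that row...
not_eq_alt <- !(testcd == "ALT")

# ...while negating %in% gives TRUE, so filter() would keep it
not_in_lft <- !testcd %in% liver_tests
```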
6.3 Transformation 3: select() with Helper Functions
Use select helpers to choose columns by pattern.
# Select all date columns from AE
ae_dates <- ae %>%
dplyr::select(USUBJID, dplyr::ends_with("DTC"))
names(ae_dates)
[1] "USUBJID" "AEDTC" "AESTDTC" "AEENDTC"
# Select all result columns from LB
lb_results <- lb %>%
dplyr::select(USUBJID, LBTESTCD, dplyr::contains("RES"))
names(lb_results)
[1] "USUBJID" "LBTESTCD" "LBORRES" "LBORRESU" "LBSTRESC" "LBSTRESN" "LBSTRESU"
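When a plain substring is too broad, `matches()` accepts a regular expression. Note that tidyselect's pattern helpers ignore case by default. A minimal sketch on a hypothetical empty tibble where only the column names matter:

```r
library(dplyr)

# Hypothetical LB columns (zero rows: only the column names matter here)
lb_cols <- dplyr::tibble(
  USUBJID = character(), LBTESTCD = character(),
  LBORRES = character(), LBORRESU = character(),
  LBSTRESC = character(), LBSTRESN = numeric(), LBSTRESU = character(),
  LBNRIND = character()
)

# matches() takes a regular expression: standardized result values only, no units
std_values <- names(dplyr::select(lb_cols, dplyr::matches("STRES[CN]$")))
std_values

# ends_with() picks out the unit columns
unit_cols <- names(dplyr::select(lb_cols, dplyr::ends_with("RESU")))
unit_cols
```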
6.4 Transformation 4: mutate() with case_when()
Create a severity score based on AE severity.
# Create numeric severity score
ae_scored <- ae %>%
dplyr::mutate(
AESEV_N = dplyr::case_when(
AESEV == "MILD" ~ 1L,
AESEV == "MODERATE" ~ 2L,
AESEV == "SEVERE" ~ 3L,
TRUE ~ NA_integer_
)
)
# Check the mapping
ae_scored %>%
dplyr::distinct(AESEV, AESEV_N) %>%
dplyr::arrange(AESEV_N)
# A tibble: 3 × 2
AESEV AESEV_N
<chr> <int>
1 MILD 1
2 MODERATE 2
3 SEVERE 3
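Since dplyr 1.1.0, `case_when()` also accepts a `.default` argument as a clearer alternative to the `TRUE ~ NA_integer_` catch-all. For an ordered scale like severity, base R's `match()` gives the same integer codes. A minimal sketch on a hypothetical vector (assumes dplyr 1.1.0 or later):

```r
library(dplyr)

sev <- c("MILD", "SEVERE", "MODERATE", "UNKNOWN")

# dplyr >= 1.1.0: .default replaces the TRUE ~ NA_integer_ catch-all
sev_case <- dplyr::case_when(
  sev == "MILD"     ~ 1L,
  sev == "MODERATE" ~ 2L,
  sev == "SEVERE"   ~ 3L,
  .default = NA_integer_
)

# Base R alternative: position in an ordered vector of levels (NA if absent)
sev_match <- match(sev, c("MILD", "MODERATE", "SEVERE"))

identical(sev_case, sev_match)
```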
6.5 Transformation 5: mutate() with across()
Apply transformations to multiple columns at once.
# Convert all character columns to uppercase (demonstration)
ae_upper <- ae %>%
dplyr::mutate(
dplyr::across(
dplyr::where(is.character),
~ toupper(.x)
)
)
# This is useful for standardizing text fields
ae_upper %>%
dplyr::select(USUBJID, AETERM, AEDECOD) %>%
utils::head(3)
# A tibble: 3 × 3
USUBJID AETERM AEDECOD
<chr> <chr> <chr>
1 01-701-1015 APPLICATION SITE ERYTHEMA APPLICATION SITE ERYTHEMA
2 01-701-1015 APPLICATION SITE PRURITUS APPLICATION SITE PRURITUS
3 01-701-1015 DIARRHOEA DIARRHOEA
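`across()` is also handy for the date columns themselves. This minimal sketch converts every `--DTC` column to an R `Date`, writing the results to new columns via `.names`. The `_DT` suffix is a hypothetical naming choice, and partial dates are simply set to NA here; proper partial-date imputation (as in admiral) is out of scope.

```r
library(dplyr)

# Hypothetical AE dates; the second start date is partial (year-month only)
ae_toy <- dplyr::tibble(
  USUBJID = c("S01", "S02"),
  AESTDTC = c("2014-01-03T10:30", "2014-02"),
  AEENDTC = c(NA, "2014-03-10")
)

ae_dates <- ae_toy %>%
  dplyr::mutate(
    dplyr::across(
      dplyr::ends_with("DTC"),
      # Keep the date part; incomplete dates become NA instead of raising an error
      ~ as.Date(substr(.x, 1, 10), format = "%Y-%m-%d"),
      .names = "{.col}_DT"
    )
  )

ae_dates
```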
6.6 Transformation 6: group_by() + summarise() - AE Summary by SOC
This is the key deliverable: Create an AE summary by System Organ Class.
# AE Summary by System Organ Class (SOC)
ae_soc_summary <- ae %>%
dplyr::group_by(AEBODSYS) %>%
dplyr::summarise(
n_events = dplyr::n(),
n_subjects = dplyr::n_distinct(USUBJID),
n_serious = sum(AESER == "Y", na.rm = TRUE),
n_severe = sum(AESEV == "SEVERE", na.rm = TRUE),
.groups = "drop"
) %>%
dplyr::arrange(dplyr::desc(n_events))
# Display the summary
ae_soc_summary
# A tibble: 23 × 5
AEBODSYS n_events n_subjects n_serious n_severe
<chr> <int> <int> <int> <int>
1 GENERAL DISORDERS AND ADMINISTRATION … 292 108 0 10
2 SKIN AND SUBCUTANEOUS TISSUE DISORDERS 276 105 0 6
3 NERVOUS SYSTEM DISORDERS 101 59 3 10
4 CARDIAC DISORDERS 91 44 0 4
5 GASTROINTESTINAL DISORDERS 87 53 0 2
6 INFECTIONS AND INFESTATIONS 73 39 0 1
7 RESPIRATORY, THORACIC AND MEDIASTINAL… 53 30 0 0
8 INVESTIGATIONS 40 23 0 0
9 PSYCHIATRIC DISORDERS 40 29 0 2
10 INJURY, POISONING AND PROCEDURAL COMP… 29 14 0 3
# ℹ 13 more rows
6.7 Transformation 7: group_by() + summarise() - By Treatment Arm
Summarize by treatment arm using joined data.
# First, join AE with DM to get treatment arm
ae_with_arm <- ae %>%
dplyr::inner_join(
dm %>% dplyr::select(USUBJID, ARM),
by = "USUBJID"
)
# Now summarize by treatment arm
ae_arm_summary <- ae_with_arm %>%
dplyr::group_by(ARM) %>%
dplyr::summarise(
n_subjects_with_ae = dplyr::n_distinct(USUBJID),
total_aes = dplyr::n(),
mean_aes_per_subject = round(total_aes / n_subjects_with_ae, 1), # among subjects with at least one AE
.groups = "drop"
)
ae_arm_summary
# A tibble: 3 × 4
ARM n_subjects_with_ae total_aes mean_aes_per_subject
<chr> <int> <int> <dbl>
1 Placebo 69 301 4.4
2 Xanomeline High Dose 79 455 5.8
3 Xanomeline Low Dose 77 435 5.6
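The summary above only sees subjects who had at least one AE, because the inner join starts from AE. Incidence percentages need a denominator of all subjects per arm. A minimal sketch on hypothetical toy data, where arm C has a subject but no AEs; with real data, and especially if your DM includes screen failures, you would usually restrict the denominator to an analysis population such as the safety population.

```r
library(dplyr)

# Hypothetical DM: arm C has a subject but no AEs
dm_toy <- dplyr::tibble(
  USUBJID = c("S01", "S02", "S03", "S04", "S05"),
  ARM     = c("A", "A", "B", "B", "C")
)
ae_toy <- dplyr::tibble(USUBJID = c("S01", "S01", "S02", "S03"))

# Denominator: all subjects per arm from DM
arm_n <- dm_toy %>% dplyr::count(ARM, name = "N")

ae_incidence <- ae_toy %>%
  dplyr::inner_join(dm_toy, by = "USUBJID") %>%
  dplyr::group_by(ARM) %>%
  dplyr::summarise(n_subjects_with_ae = dplyr::n_distinct(USUBJID), .groups = "drop") %>%
  # right_join keeps arms with no AEs; their count is filled with 0
  dplyr::right_join(arm_n, by = "ARM") %>%
  dplyr::mutate(
    n_subjects_with_ae = dplyr::coalesce(n_subjects_with_ae, 0L),
    pct = round(100 * n_subjects_with_ae / N, 1)
  ) %>%
  dplyr::arrange(ARM)

ae_incidence
```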
6.8 Transformation 8: pivot_wider() - Reshape Lab Data
Convert long lab data to wide format (one row per subject/visit with columns for each test).
# Pivot LB data to wide format
lb_wide <- lb %>%
dplyr::filter(LBTESTCD %in% c("ALT", "AST", "BILI")) %>%
dplyr::select(USUBJID, VISIT, LBTESTCD, LBSTRESN) %>%
tidyr::pivot_wider(
names_from = LBTESTCD,
values_from = LBSTRESN,
values_fn = mean # Average repeat results for the same subject/visit/test
)
utils::head(lb_wide, 10)
# A tibble: 10 × 5
USUBJID VISIT ALT AST BILI
<chr> <chr> <dbl> <dbl> <dbl>
1 01-701-1015 SCREENING 1 27 40 10.3
2 01-701-1015 WEEK 2 41 33 8.55
3 01-701-1015 WEEK 4 18 21 6.84
4 01-701-1015 WEEK 6 26 26 5.13
5 01-701-1015 WEEK 8 22 21 8.55
6 01-701-1015 WEEK 12 27 22 6.84
7 01-701-1015 WEEK 16 17 23 6.84
8 01-701-1015 WEEK 20 21 19 6.84
9 01-701-1015 WEEK 24 23 23 6.84
10 01-701-1015 WEEK 26 23 19 5.13
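`values_fn = mean` silently averages duplicates, so it is worth checking which cells actually have more than one value before relying on it. A minimal sketch on a hypothetical `lb_toy` tibble where ALT was measured twice at WEEK 2:

```r
library(dplyr)
library(tidyr)

# Hypothetical lab records: ALT was measured twice at WEEK 2
lb_toy <- dplyr::tibble(
  USUBJID  = "S01",
  VISIT    = c("WEEK 2", "WEEK 2", "WEEK 2", "WEEK 4"),
  LBTESTCD = c("ALT", "ALT", "AST", "ALT"),
  LBSTRESN = c(27, 29, 33, 18)
)

# Which subject/visit/test cells hold more than one value?
dups <- lb_toy %>%
  dplyr::count(USUBJID, VISIT, LBTESTCD) %>%
  dplyr::filter(n > 1)

# With values_fn = mean the duplicates are averaged (27 and 29 become 28)
lb_wide_toy <- lb_toy %>%
  tidyr::pivot_wider(
    names_from = LBTESTCD,
    values_from = LBSTRESN,
    values_fn = mean
  )

dups
lb_wide_toy
```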
6.9 Transformation 9: pivot_longer() - Reshape Wide to Long
Convert wide data back to long format (useful for plotting).
# Pivot back to long format
lb_long <- lb_wide %>%
tidyr::pivot_longer(
cols = c(ALT, AST, BILI),
names_to = "LBTESTCD",
values_to = "LBSTRESN"
) %>%
dplyr::filter(!is.na(LBSTRESN))
utils::head(lb_long, 10)
# A tibble: 10 × 4
USUBJID VISIT LBTESTCD LBSTRESN
<chr> <chr> <chr> <dbl>
1 01-701-1015 SCREENING 1 ALT 27
2 01-701-1015 SCREENING 1 AST 40
3 01-701-1015 SCREENING 1 BILI 10.3
4 01-701-1015 WEEK 2 ALT 41
5 01-701-1015 WEEK 2 AST 33
6 01-701-1015 WEEK 2 BILI 8.55
7 01-701-1015 WEEK 4 ALT 18
8 01-701-1015 WEEK 4 AST 21
9 01-701-1015 WEEK 4 BILI 6.84
10 01-701-1015 WEEK 6 ALT 26
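`pivot_longer()` can also select the value columns by exclusion and drop missing cells in the same call through `values_drop_na`, which replaces the separate `filter(!is.na(...))` step. A minimal sketch on a hypothetical wide tibble:

```r
library(dplyr)
library(tidyr)

# Hypothetical wide lab data with one missing value
lb_wide_toy <- dplyr::tibble(
  USUBJID = c("S01", "S01"),
  VISIT   = c("WEEK 2", "WEEK 4"),
  ALT     = c(28, 18),
  AST     = c(33, NA)
)

# Every column except the identifiers is a test; missing cells are dropped
lb_long_toy <- lb_wide_toy %>%
  tidyr::pivot_longer(
    cols = -c(USUBJID, VISIT),
    names_to = "LBTESTCD",
    values_to = "LBSTRESN",
    values_drop_na = TRUE
  )

lb_long_toy
```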
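6.10 Transformation 10: anti_join() - Data Quality Check
Find subjects with exposure but no adverse events. This is a reconciliation check rather than proof of an error: a dosed subject may genuinely have had no AEs, but the list is worth reviewing.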
# Subjects who were dosed (in EX)
subjects_dosed <- ex %>%
dplyr::distinct(USUBJID)
# Subjects with AEs
subjects_with_ae <- ae %>%
dplyr::distinct(USUBJID)
# Find dosed subjects with NO AEs
dosed_no_ae <- subjects_dosed %>%
dplyr::anti_join(subjects_with_ae, by = "USUBJID")
cat("Subjects dosed:", nrow(subjects_dosed), "\n")
Subjects dosed: 254
cat("Subjects with AEs:", nrow(subjects_with_ae), "\n")
Subjects with AEs: 225
cat("Dosed subjects with NO AEs:", nrow(dosed_no_ae), "\n")
Dosed subjects with NO AEs: 29
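A `left_join()` gives a complementary view: instead of returning only the unmatched subjects, it keeps every dosed subject and adds a flag. Running `anti_join()` in the other direction finds AEs reported for subjects with no dosing record, another reconciliation item worth reviewing. A minimal sketch on hypothetical subject lists; `HAS_AE` is an illustrative column name.

```r
library(dplyr)

# Hypothetical subject lists
ex_toy <- dplyr::tibble(USUBJID = c("S01", "S01", "S02", "S03"))
ae_toy <- dplyr::tibble(USUBJID = c("S01", "S04"))

subjects_dosed   <- dplyr::distinct(ex_toy, USUBJID)
subjects_with_ae <- dplyr::distinct(ae_toy, USUBJID)

# left_join keeps every dosed subject and flags whether any AE exists
dosed_flagged <- subjects_dosed %>%
  dplyr::left_join(
    dplyr::mutate(subjects_with_ae, HAS_AE = "Y"),
    by = "USUBJID"
  ) %>%
  dplyr::mutate(HAS_AE = dplyr::coalesce(HAS_AE, "N"))

# The reverse check: AE subjects with no dosing record
ae_not_dosed <- subjects_with_ae %>%
  dplyr::anti_join(subjects_dosed, by = "USUBJID")

dosed_flagged
ae_not_dosed
```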
7 Advanced: Combining Multiple Operations
Real clinical programming often chains many operations together. Here’s a complete example:
# Complete analysis: AE summary by SOC and Treatment Arm
complete_ae_summary <- ae %>%
# Join with demographics
dplyr::inner_join(
dm %>% dplyr::select(USUBJID, ARM, AGE, SEX),
by = "USUBJID"
) %>%
# Create severity score
dplyr::mutate(
AESEV_N = dplyr::case_when(
AESEV == "MILD" ~ 1L,
AESEV == "MODERATE" ~ 2L,
AESEV == "SEVERE" ~ 3L,
TRUE ~ NA_integer_
)
) %>%
# Group and summarize
dplyr::group_by(ARM, AEBODSYS) %>%
dplyr::summarise(
n_events = dplyr::n(),
n_subjects = dplyr::n_distinct(USUBJID),
n_serious = sum(AESER == "Y", na.rm = TRUE),
mean_severity = round(mean(AESEV_N, na.rm = TRUE), 2),
.groups = "drop"
) %>%
# Keep arm/SOC combinations with at least 5 events
dplyr::filter(n_events >= 5) %>%
# Sort
dplyr::arrange(ARM, dplyr::desc(n_events))
# Display top results
utils::head(complete_ae_summary, 15)
# A tibble: 15 × 6
ARM AEBODSYS n_events n_subjects n_serious mean_severity
<chr> <chr> <int> <int> <int> <dbl>
1 Placebo GENERAL DISORDERS AND AD… 48 21 0 1.25
2 Placebo SKIN AND SUBCUTANEOUS TI… 47 21 0 1.4
3 Placebo INFECTIONS AND INFESTATI… 35 16 0 1.2
4 Placebo CARDIAC DISORDERS 27 13 0 1.41
5 Placebo GASTROINTESTINAL DISORDE… 26 17 0 1.15
6 Placebo INVESTIGATIONS 19 10 0 1.16
7 Placebo NERVOUS SYSTEM DISORDERS 16 12 0 1.38
8 Placebo RESPIRATORY, THORACIC AN… 15 10 0 1.07
9 Placebo PSYCHIATRIC DISORDERS 14 10 0 1.43
10 Placebo INJURY, POISONING AND PR… 9 4 0 1.44
11 Placebo EYE DISORDERS 8 4 0 1.38
12 Placebo METABOLISM AND NUTRITION… 8 6 0 1.25
13 Placebo MUSCULOSKELETAL AND CONN… 8 5 0 1.5
14 Placebo VASCULAR DISORDERS 7 3 0 1.43
15 Placebo RENAL AND URINARY DISORD… 5 4 0 1.2
8 Day 2 Deliverable Summary
Today you completed the following:
| Task | Status |
|---|---|
| Understood SDTM domain classes | Done |
| Learned variable naming conventions | Done |
| Transformation 1: filter() with multiple conditions | Done |
| Transformation 2: filter() with %in% | Done |
| Transformation 3: select() with helper functions | Done |
| Transformation 4: mutate() with case_when() | Done |
| Transformation 5: mutate() with across() | Done |
| Transformation 6: group_by + summarise (AE by SOC) | Done |
| Transformation 7: Summarize by treatment arm | Done |
| Transformation 8: pivot_wider() | Done |
| Transformation 9: pivot_longer() | Done |
| Transformation 10: anti_join() for data QC | Done |
9 Key Takeaways
- SDTM has four domain classes: Special Purpose, Events, Interventions, Findings
- Variable naming is systematic: Prefix (domain) + suffix (content type)
- --TESTCD vs --TEST: Short code vs full name
- --ORRES vs --STRESN: Original vs standardized results
- Tidyverse is foundational: by our estimate, roughly half of day-to-day clinical programming is dplyr/tidyr work
10 Resources
- CDISC SDTM Implementation Guide v3.4 - Official SDTM documentation
- tidyverse.org - Complete tidyverse documentation
- pharmaverse blog: ‘The pharmaverse (hi)story’ - Background on pharmaverse development
- dplyr cheatsheet - Quick reference for dplyr
11 What’s Next?
In Day 3, we’ll explore:
- Controlled Terminology and why it matters for submissions
- MedDRA coding and creating lookup tables
- CDISC CT concepts: AESEV, AEREL, AESER values
- How metacore + metatools automate CT validation