30 Days of Pharmaverse

Day 10: AE Domain Mastery & SAE Logic

Deep Dive into Severity, Causality, and Outcomes

1 Learning Objectives

By the end of Day 10, you will be able to:

  1. Understand the AE domain at a production level - beyond the basics covered in Day 7
  2. Distinguish between severity grading (MILD/MODERATE/SEVERE) and toxicity grading (CTCAE Grade 1-5)
  3. Work with all SAE-related variables: AESER, AESDTH, AESLIFE, AESHOSP, AESDISAB, AESCONG, AESMIE
  4. Derive treatment-emergent adverse events (TEAEs) - one of the most critical derivations in clinical programming
  5. Calculate AE duration and handle ongoing AEs correctly
  6. Understand how AE data maps to the ADaM ADAE dataset

2 Why This Day Matters

On Day 7 we built a basic AE domain as part of the capstone. Today we go much deeper - because in production clinical programming, AE data is arguably the most scrutinized dataset by regulatory agencies. Getting AE logic wrong can delay a submission or trigger FDA queries.

Important: The Stakes Are High

Adverse event data directly impacts:

  • Patient safety decisions - Should the trial continue?
  • Drug labeling - What warnings go on the label?
  • Regulatory approval - Is the risk-benefit profile acceptable?
  • Post-marketing surveillance - What to watch for after approval

Every variable, every flag, every derivation matters.


3 Package Installation & Loading

3.1 Required Packages

Package         | Purpose
dplyr           | Data manipulation (filter, mutate, joins)
lubridate       | Date/time arithmetic for AE durations
pharmaversesdtm | Example SDTM datasets including AE and DM

3.2 Install & Load

if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("lubridate", quietly = TRUE)) install.packages("lubridate")
if (!requireNamespace("pharmaversesdtm", quietly = TRUE)) install.packages("pharmaversesdtm")

library(dplyr)
library(lubridate)
library(pharmaversesdtm)

4 Exploring the AE Domain from pharmaversesdtm

4.1 Load and Inspect

# Load AE and DM domains
data("ae", package = "pharmaversesdtm")
data("dm", package = "pharmaversesdtm")

cat("AE domain dimensions:", nrow(ae), "rows x", ncol(ae), "columns\n")
AE domain dimensions: 1191 rows x 35 columns
cat("Number of unique subjects:", n_distinct(ae$USUBJID), "\n")
Number of unique subjects: 225 
cat("Variables available:\n")
Variables available:
cat(paste(names(ae), collapse = ", "), "\n")
STUDYID, DOMAIN, USUBJID, AESEQ, AESPID, AETERM, AELLT, AELLTCD, AEDECOD, AEPTCD, AEHLT, AEHLTCD, AEHLGT, AEHLGTCD, AEBODSYS, AEBDSYCD, AESOC, AESOCCD, AESEV, AESER, AEACN, AEREL, AEOUT, AESCAN, AESCONG, AESDISAB, AESDTH, AESHOSP, AESLIFE, AESOD, AEDTC, AESTDTC, AEENDTC, AESTDY, AEENDY 

4.2 Understanding What Each AE Variable Means

Let’s look at the actual data - every column tells a story:

dplyr::glimpse(ae)
Rows: 1,191
Columns: 35
$ STUDYID  <chr> "CDISCPILOT01", "CDISCPILOT01", "CDISCPILOT01", "CDISCPILOT01…
$ DOMAIN   <chr> "AE", "AE", "AE", "AE", "AE", "AE", "AE", "AE", "AE", "AE", "…
$ USUBJID  <chr> "01-701-1015", "01-701-1015", "01-701-1015", "01-701-1023", "…
$ AESEQ    <dbl> 1, 2, 3, 3, 1, 2, 4, 1, 2, 1, 2, 4, 1, 2, 3, 4, 10, 3, 1, 9, …
$ AESPID   <chr> "E07", "E08", "E06", "E10", "E08", "E09", "E08", "E04", "E05"…
$ AETERM   <chr> "APPLICATION SITE ERYTHEMA", "APPLICATION SITE PRURITUS", "DI…
$ AELLT    <chr> "APPLICATION SITE REDNESS", "APPLICATION SITE ITCHING", "DIAR…
$ AELLTCD  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ AEDECOD  <chr> "APPLICATION SITE ERYTHEMA", "APPLICATION SITE PRURITUS", "DI…
$ AEPTCD   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ AEHLT    <chr> "HLT_0617", "HLT_0317", "HLT_0148", "HLT_0415", "HLT_0284", "…
$ AEHLTCD  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ AEHLGT   <chr> "HLGT_0152", "HLGT_0338", "HLGT_0588", "HLGT_0086", "HLGT_019…
$ AEHLGTCD <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ AEBODSYS <chr> "GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS", "GENE…
$ AEBDSYCD <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ AESOC    <chr> "GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS", "GENE…
$ AESOCCD  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ AESEV    <chr> "MILD", "MILD", "MILD", "MILD", "MILD", "MODERATE", "MILD", "…
$ AESER    <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "…
$ AEACN    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ AEREL    <chr> "PROBABLE", "PROBABLE", "REMOTE", "POSSIBLE", "POSSIBLE", "PR…
$ AEOUT    <chr> "NOT RECOVERED/NOT RESOLVED", "NOT RECOVERED/NOT RESOLVED", "…
$ AESCAN   <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "…
$ AESCONG  <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "…
$ AESDISAB <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "…
$ AESDTH   <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "…
$ AESHOSP  <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "…
$ AESLIFE  <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "…
$ AESOD    <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "…
$ AEDTC    <chr> "2014-01-16", "2014-01-16", "2014-01-16", "2012-08-27", "2012…
$ AESTDTC  <chr> "2014-01-03", "2014-01-03", "2014-01-09", "2012-08-26", "2012…
$ AEENDTC  <chr> NA, NA, "2014-01-11", NA, "2012-08-30", NA, "2012-08-30", NA,…
$ AESTDY   <dbl> 2, 2, 8, 22, 3, 3, 3, 3, 21, 58, 125, 27, 1, 1, 23, 52, 52, 5…
$ AEENDY   <dbl> NA, NA, 10, NA, 26, NA, 26, NA, NA, NA, NA, NA, 1, 1, NA, NA,…
Note: AE Variables Quick Reference

Here’s what the key variables represent in plain English:

Variable | What it means                        | Example
AETERM   | AE as reported by investigator       | “Headache”
AEDECOD  | Standardized preferred term (MedDRA) | “HEADACHE”
AEBODSYS | Body system (MedDRA SOC)             | “NERVOUS SYSTEM DISORDERS”
AESEV    | Severity: MILD, MODERATE, SEVERE     | “MODERATE”
AESER    | Is it serious? Y/N                   | “N”
AEREL    | Related to study drug?               | “POSSIBLE”
AEACN    | Action taken with study drug         | “DOSE NOT CHANGED”
AEOUT    | Outcome                              | “RECOVERED/RESOLVED”
AESTDTC  | Start date (ISO 8601)                | “2014-01-03”
AEENDTC  | End date (ISO 8601)                  | “2014-01-12”

5 Severity vs. Toxicity Grading

This is a concept that confuses many programmers. Let’s clarify it with code.

5.1 Severity Grading (AESEV)

Severity describes the intensity of the adverse event. It’s a clinical judgment:

  • MILD: Awareness of event but easily tolerated
  • MODERATE: Discomfort causing interference with usual activity
  • SEVERE: Incapacitating; unable to do usual activities
# What severity values exist in our data?
ae %>%
  count(AESEV, name = "Count") %>%
  mutate(Percent = round(100 * Count / sum(Count), 1)) %>%
  arrange(match(AESEV, c("MILD", "MODERATE", "SEVERE")))
# A tibble: 3 × 3
  AESEV    Count Percent
  <chr>    <int>   <dbl>
1 MILD       770    64.7
2 MODERATE   378    31.7
3 SEVERE      43     3.6

5.2 Toxicity Grading (AETOXGR - CTCAE Scale)

Toxicity grading uses the CTCAE (Common Terminology Criteria for Adverse Events) scale and is more granular:

Grade | Description
1     | Mild; asymptomatic or mild symptoms
2     | Moderate; minimal, local, or non-invasive intervention indicated
3     | Severe or medically significant; hospitalization indicated
4     | Life-threatening; urgent intervention indicated
5     | Death related to AE
# Check if AETOXGR exists in our data
if ("AETOXGR" %in% names(ae)) {
  ae %>%
    count(AETOXGR, AESEV) %>%
    arrange(AETOXGR)
} else {
  cat("AETOXGR is not present in the pharmaversesdtm AE dataset.\n")
  cat("This is common - not all studies use CTCAE grading.\n\n")
  
  cat("When AETOXGR IS available, it goes in the AE domain as:\n")
  cat("  AETOXGR = Standard toxicity grade (e.g., CTCAE grade 1-5)\n")
}
AETOXGR is not present in the pharmaversesdtm AE dataset.
This is common - not all studies use CTCAE grading.

When AETOXGR IS available, it goes in the AE domain as:
  AETOXGR = Standard toxicity grade (e.g., CTCAE grade 1-5)
Important: Severity ≠ Seriousness

This is the #1 most common confusion in clinical programming:

  • AESEV (Severity) = How intense is the event? (MILD/MODERATE/SEVERE)
  • AESER (Seriousness) = Does it meet SAE criteria? (Y/N)

A MILD rash could be an SAE if it requires hospitalization. A SEVERE headache may NOT be an SAE if it resolves quickly with OTC medication.

Severity describes intensity. Seriousness describes regulatory significance.
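
One way to see the distinction is to cross-tabulate the two variables. The sketch below uses a small made-up data frame (`toy_ae` is hypothetical, not taken from pharmaversesdtm) and base R only, so it runs without extra packages:

```r
# Hypothetical records, invented purely for illustration
toy_ae <- data.frame(
  AEDECOD = c("RASH", "HEADACHE", "SYNCOPE", "NAUSEA"),
  AESEV   = c("MILD", "SEVERE", "MODERATE", "MILD"),
  AESER   = c("Y", "N", "Y", "N"),
  stringsAsFactors = FALSE
)

# Order severity levels so the table reads MILD -> SEVERE
toy_ae$AESEV <- factor(toy_ae$AESEV, levels = c("MILD", "MODERATE", "SEVERE"))

# Rows = severity (intensity), columns = seriousness (SAE criteria met)
sev_by_ser <- table(Severity = toy_ae$AESEV, Serious = toy_ae$AESER)
print(sev_by_ser)

# Serious events that are not SEVERE, and SEVERE events that are not serious
serious_not_severe <- toy_ae[toy_ae$AESER == "Y" & toy_ae$AESEV != "SEVERE", ]
severe_not_serious <- toy_ae[toy_ae$AESER == "N" & toy_ae$AESEV == "SEVERE", ]
```

Any non-empty cell off the "SEVERE and serious" corner of the table is a reminder that the two variables answer different questions.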


6 Serious Adverse Events (SAEs)

6.1 What Makes an AE “Serious”?

An adverse event is classified as serious (AESER = “Y”) if it meets any of the following criteria:

┌─────────────────────────────────────────────────────────────────────┐
│                   SAE CRITERIA VARIABLES                            │
├─────────────────────────────────────────────────────────────────────┤
│  AESER    = "Y" if ANY of the following are "Y":                    │
│                                                                     │
│  AESDTH   = Results in Death                                        │
│  AESLIFE  = Is Life-Threatening                                     │
│  AESHOSP  = Requires or Prolongs Hospitalization                    │
│  AESDISAB = Results in Persistent/Significant Disability            │
│  AESCONG  = Congenital Anomaly/Birth Defect                         │
│  AESMIE   = Other Medically Important Event                         │
└─────────────────────────────────────────────────────────────────────┘

6.2 Exploring SAE Data

# How many SAEs in our data?
ae %>%
  count(AESER, name = "Count") %>%
  mutate(Percent = round(100 * Count / sum(Count), 1))
# A tibble: 2 × 3
  AESER Count Percent
  <chr> <int>   <dbl>
1 N      1188    99.7
2 Y         3     0.3

6.3 SAE Criteria Breakdown

# Check which SAE criteria variables are available
sae_vars <- c("AESDTH", "AESLIFE", "AESHOSP", "AESDISAB", "AESCONG", "AESMIE")
available_sae_vars <- sae_vars[sae_vars %in% names(ae)]

cat("SAE criteria variables available:", paste(available_sae_vars, collapse = ", "), "\n\n")
SAE criteria variables available: AESDTH, AESLIFE, AESHOSP, AESDISAB, AESCONG 
if (length(available_sae_vars) > 0) {
  # Show SAE details
  ae %>%
    filter(AESER == "Y") %>%
    select(USUBJID, AEDECOD, AESEV, AESER, any_of(sae_vars)) %>%
    head(10)
} else {
  cat("No SAE criteria sub-variables found in this dataset.\n")
  cat("In production, you would create them from raw CRF data.\n")
}
# A tibble: 3 × 9
  USUBJID     AEDECOD        AESEV AESER AESDTH AESLIFE AESHOSP AESDISAB AESCONG
  <chr>       <chr>          <chr> <chr> <chr>  <chr>   <chr>   <chr>    <chr>  
1 01-709-1424 SYNCOPE        MODE… Y     N      Y       N       N        N      
2 01-718-1170 SYNCOPE        SEVE… Y     N      N       Y       N        N      
3 01-718-1371 PARTIAL SEIZU… SEVE… Y     N      N       Y       N        N      
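
A common follow-up is a consistency check between AESER and the criteria flags. The following is a minimal base R sketch on invented records (`toy_sae` and the `ISSUE` column are hypothetical names). It assumes only three criteria are collected; on a real study the list would include every criterion on the CRF, and a record can be legitimately serious through a criterion that is not in the list being checked.

```r
# Hypothetical records with SAE criteria; invented for illustration
toy_sae <- data.frame(
  USUBJID = c("S-001", "S-002", "S-003", "S-004"),
  AESER   = c("Y", "N", "Y", "N"),
  AESDTH  = c("N", "N", "N", "N"),
  AESLIFE = c("Y", "N", "N", "N"),
  AESHOSP = c("N", "N", "N", "Y"),
  stringsAsFactors = FALSE
)

crit_vars <- c("AESDTH", "AESLIFE", "AESHOSP")

# TRUE when at least one criterion on the row is "Y" (NA counts as not "Y")
any_crit <- rowSums(toy_sae[crit_vars] == "Y", na.rm = TRUE) > 0

# Two kinds of discrepancy worth querying:
#  - flagged serious, but no criterion ticked
#  - a criterion ticked, but not flagged serious
toy_sae$ISSUE <- ifelse(
  toy_sae$AESER == "Y" & !any_crit, "SERIOUS WITHOUT CRITERION",
  ifelse(toy_sae$AESER != "Y" & any_crit, "CRITERION WITHOUT AESER", NA_character_)
)

discrepancies <- toy_sae[!is.na(toy_sae$ISSUE), c("USUBJID", "AESER", "ISSUE")]
print(discrepancies)
```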

6.4 Simulating SAE Logic from Scratch

Since the practice dataset may not have all SAE sub-variables, let’s build the logic ourselves. This is exactly what you’d do on a real study:

# Create simulated AE data with SAE criteria
set.seed(42)

ae_sample <- tibble(
  USUBJID  = rep(paste0("CDISC01-001-00", 1:5), each = 4),
  AESEQ    = rep(1:4, 5),
  AEDECOD  = sample(c("HEADACHE", "NAUSEA", "RASH", "FALL", "PNEUMONIA",
                       "MYOCARDIAL INFARCTION", "SEIZURE", "ANEMIA"), 20, replace = TRUE),
  AESEV    = sample(c("MILD", "MODERATE", "SEVERE"), 20, replace = TRUE, 
                    prob = c(0.5, 0.35, 0.15)),
  # SAE criteria - simulate realistic probabilities
  AESDTH   = sample(c("Y", "N"), 20, replace = TRUE, prob = c(0.02, 0.98)),
  AESLIFE  = sample(c("Y", "N"), 20, replace = TRUE, prob = c(0.05, 0.95)),
  AESHOSP  = sample(c("Y", "N"), 20, replace = TRUE, prob = c(0.10, 0.90)),
  AESDISAB = sample(c("Y", "N"), 20, replace = TRUE, prob = c(0.03, 0.97)),
  AESCONG  = "N",  # Very rare in adult trials
  AESMIE   = sample(c("Y", "N"), 20, replace = TRUE, prob = c(0.05, 0.95))
)

# Derive AESER: "Y" if ANY criterion is "Y"
ae_with_ser <- ae_sample %>%
  mutate(
    AESER = case_when(
      AESDTH   == "Y" ~ "Y",
      AESLIFE  == "Y" ~ "Y",
      AESHOSP  == "Y" ~ "Y",
      AESDISAB == "Y" ~ "Y",
      AESCONG  == "Y" ~ "Y",
      AESMIE   == "Y" ~ "Y",
      TRUE ~ "N"
    )
  )

cat("SAE derivation results:\n")
SAE derivation results:
ae_with_ser %>%
  count(AESER, name = "Count") %>%
  mutate(Percent = round(100 * Count / sum(Count), 1))
# A tibble: 2 × 3
  AESER Count Percent
  <chr> <int>   <dbl>
1 N        16      80
2 Y         4      20
# View SAE records with their criteria
cat("\nSAE records detail:\n")

SAE records detail:
ae_with_ser %>%
  filter(AESER == "Y") %>%
  select(USUBJID, AEDECOD, AESEV, AESER, AESDTH, AESLIFE, AESHOSP, AESDISAB, AESMIE)
# A tibble: 4 × 9
  USUBJID         AEDECOD   AESEV   AESER AESDTH AESLIFE AESHOSP AESDISAB AESMIE
  <chr>           <chr>     <chr>   <chr> <chr>  <chr>   <chr>   <chr>    <chr> 
1 CDISC01-001-001 PNEUMONIA MILD    Y     N      Y       N       N        N     
2 CDISC01-001-002 NAUSEA    SEVERE  Y     N      N       N       N        Y     
3 CDISC01-001-004 FALL      MODERA… Y     N      N       Y       N        N     
4 CDISC01-001-004 HEADACHE  MILD    Y     N      N       Y       N        N     
Tip: Read the Logic Carefully

The case_when() above uses a waterfall approach - it checks each criterion in sequence and returns "Y" at the first match. This works well for deriving AESER, but in production you’d typically use a more explicit approach:

AESER = if_else(
  AESDTH == "Y" | AESLIFE == "Y" | AESHOSP == "Y" | 
  AESDISAB == "Y" | AESCONG == "Y" | AESMIE == "Y",
  "Y", "N"
)

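Both approaches give the same result when every criterion is populated, and the if_else() version makes the OR-logic more explicit. They differ on missing values: case_when() falls through to "N", whereas if_else() returns NA when a criterion is missing and no other criterion is "Y".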
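
How missing criteria should be treated is worth deciding explicitly rather than leaving it to the behavior of a particular function. The sketch below is one possible three-state rule in base R, not a standard: "Y" if any criterion is "Y", "N" only if all criteria are present and none is "Y", and NA otherwise. The helper name `derive_aeser` and the toy data are hypothetical.

```r
# Three-state AESER derivation: "Y", "N", or NA when it cannot be determined
derive_aeser <- function(df, crit_vars) {
  is_y   <- df[crit_vars] == "Y"              # logical matrix, NA where missing
  any_y  <- rowSums(is_y, na.rm = TRUE) > 0   # at least one criterion is "Y"
  any_na <- rowSums(is.na(is_y)) > 0          # at least one criterion is missing
  ifelse(any_y, "Y", ifelse(any_na, NA_character_, "N"))
}

# Hypothetical input with a missing criterion on rows 2 and 3
toy_crit <- data.frame(
  AESDTH  = c("N", "Y", "N"),
  AESHOSP = c("N", NA,  NA),
  stringsAsFactors = FALSE
)

aeser <- derive_aeser(toy_crit, c("AESDTH", "AESHOSP"))
print(aeser)
```

Row 2 is still "Y" because one confirmed criterion is enough; row 3 stays NA so it can be queried instead of silently becoming "N".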


7 Treatment-Emergent Adverse Events (TEAEs)

7.1 What is a TEAE?

A treatment-emergent adverse event is an AE that:

  1. Started on or after the first dose of study drug, OR
  2. Was present before treatment but worsened after the first dose

This is one of the most important derivations in clinical programming because most safety analyses focus exclusively on TEAEs.

7.2 The TEAE Decision Tree

┌─────────────────────────────────────────────────────────────────────┐
│                      TEAE DECISION LOGIC                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Is AE Start Date (AESTDTC) >= First Dose Date (RFSTDTC)?          │
│    YES → TEAE = "Y"                                                │
│    NO  → Was severity worse than pre-treatment?                     │
│           YES → TEAE = "Y"                                         │
│           NO  → TEAE = "N" (Pre-treatment AE, not worsened)        │
│                                                                     │
│  Is AE Start Date missing?                                          │
│    → Compare AE End Date with First Dose Date                       │
│    → If AEENDTC >= RFSTDTC, treat as potentially TEAE              │
│    → Flag for medical review                                        │
└─────────────────────────────────────────────────────────────────────┘
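
The worsening branch of the tree needs some record of how severity changed, which the pharmaversesdtm AE data does not carry. The sketch below is a base R illustration under stated assumptions: `init_sev` (severity at onset) and `max_sev` (worst severity recorded for the event) are hypothetical inputs, a pre-treatment event only counts as worsened if it was still present at first dose, and records with a missing start date are left as NA for review rather than resolved from the end date.

```r
# Severity ranks used to compare onset severity with worst severity
sev_rank <- c(MILD = 1, MODERATE = 2, SEVERE = 3)

# init_sev and max_sev are hypothetical inputs; they are not variables in
# pharmaversesdtm::ae
flag_teae <- function(ae_start, ae_end, first_dose, init_sev, max_sev) {
  on_or_after   <- ae_start >= first_dose        # NA when either date is missing
  worsened      <- unname(sev_rank[max_sev] > sev_rank[init_sev])
  still_present <- is.na(ae_end) | ae_end >= first_dose

  out <- rep(NA_character_, length(ae_start))    # default: undetermined
  out[!is.na(on_or_after) & on_or_after] <- "Y"  # started on/after first dose

  # Pre-treatment events: TEAE only if worsened and still present at first dose
  pre <- !is.na(on_or_after) & !on_or_after
  out[pre] <- ifelse(
    !is.na(worsened[pre]) & worsened[pre] & still_present[pre], "Y", "N"
  )
  out
}

teae <- flag_teae(
  ae_start   = as.Date(c("2014-01-03", "2013-12-20", "2013-12-20", NA)),
  ae_end     = as.Date(c(NA, "2014-01-10", "2013-12-25", "2014-01-05")),
  first_dose = as.Date(rep("2014-01-02", 4)),
  init_sev   = c("MILD", "MILD", "MILD", "MILD"),
  max_sev    = c("MILD", "SEVERE", "SEVERE", "MILD")
)
print(teae)
```

The third record worsened but ended before the first dose, so it stays "N"; the fourth has no start date and is left undetermined.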

7.3 Deriving TEAEs in Code

# Load DM for reference dates
data("dm", package = "pharmaversesdtm")
data("ae", package = "pharmaversesdtm")

# Get reference start date per subject
ref_dates <- dm %>%
  select(USUBJID, RFSTDTC) %>%
  filter(!is.na(RFSTDTC))

cat("Reference dates (first dose) for sample subjects:\n")
Reference dates (first dose) for sample subjects:
head(ref_dates)
# A tibble: 6 × 2
  USUBJID     RFSTDTC   
  <chr>       <chr>     
1 01-701-1015 2014-01-02
2 01-701-1023 2012-08-05
3 01-701-1028 2013-07-19
4 01-701-1033 2014-03-18
5 01-701-1034 2014-07-01
6 01-701-1047 2013-02-12
# Join AE with reference dates and derive TEAE flag
ae_with_teae <- ae %>%
  left_join(ref_dates, by = "USUBJID") %>%
  mutate(
    # Parse dates - ymd() returns NA (with a warning) for partial dates
    ae_start = ymd(AESTDTC),
    ref_start = ymd(RFSTDTC),
    
    # Core TEAE derivation
    TRTEMFL = case_when(
      # Case 1: AE starts on or after first dose
      !is.na(ae_start) & !is.na(ref_start) & ae_start >= ref_start ~ "Y",
      
      # Case 2: AE start date is missing - conservative approach
      is.na(ae_start) & !is.na(AEENDTC) & !is.na(ref_start) &
        ymd(AEENDTC) >= ref_start ~ "Y",
      
      # Case 3: Both dates available, AE started before treatment
      !is.na(ae_start) & !is.na(ref_start) & ae_start < ref_start ~ "N",
      
      # Case 4: Cannot determine
      TRUE ~ NA_character_
    )
  )

# Summary
cat("TEAE derivation results:\n")
TEAE derivation results:
ae_with_teae %>%
  count(TRTEMFL, name = "Count") %>%
  mutate(Percent = round(100 * Count / sum(Count), 1))
# A tibble: 3 × 3
  TRTEMFL Count Percent
  <chr>   <int>   <dbl>
1 N          45     3.8
2 Y        1131    95  
3 <NA>       15     1.3
# Show examples of TEAE vs non-TEAE
cat("\nSample of TEAE records (started on/after first dose):\n")

Sample of TEAE records (started on/after first dose):
ae_with_teae %>%
  filter(TRTEMFL == "Y") %>%
  select(USUBJID, AEDECOD, AESTDTC, RFSTDTC, TRTEMFL) %>%
  head(5)
# A tibble: 5 × 5
  USUBJID     AEDECOD                              AESTDTC    RFSTDTC    TRTEMFL
  <chr>       <chr>                                <chr>      <chr>      <chr>  
1 01-701-1015 APPLICATION SITE ERYTHEMA            2014-01-03 2014-01-02 Y      
2 01-701-1015 APPLICATION SITE PRURITUS            2014-01-03 2014-01-02 Y      
3 01-701-1015 DIARRHOEA                            2014-01-09 2014-01-02 Y      
4 01-701-1023 ATRIOVENTRICULAR BLOCK SECOND DEGREE 2012-08-26 2012-08-05 Y      
5 01-701-1023 ERYTHEMA                             2012-08-07 2012-08-05 Y      
cat("\nSample of non-TEAE records (started before first dose):\n")

Sample of non-TEAE records (started before first dose):
ae_with_teae %>%
  filter(TRTEMFL == "N") %>%
  select(USUBJID, AEDECOD, AESTDTC, RFSTDTC, TRTEMFL) %>%
  head(5)
# A tibble: 5 × 5
  USUBJID     AEDECOD             AESTDTC    RFSTDTC    TRTEMFL
  <chr>       <chr>               <chr>      <chr>      <chr>  
1 01-701-1111 ERYTHEMA            2012-09-02 2012-09-07 N      
2 01-701-1111 ERYTHEMA            2012-09-02 2012-09-07 N      
3 01-701-1111 LOCALISED INFECTION 2012-07-08 2012-09-07 N      
4 01-701-1111 PRURITUS            2012-09-02 2012-09-07 N      
5 01-701-1111 PRURITUS            2012-09-02 2012-09-07 N      
Warning: Partial Dates Are Common

In real clinical data, you will frequently encounter partial dates like:

  • "2014-01" (month known, day unknown)
  • "2014" (only year known)
  • "" or NA (completely missing)

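The lubridate::ymd() function returns NA (with a parsing warning) for partial dates. In production, you’d implement the date imputation rules specified in the Statistical Analysis Plan (SAP). Common approaches:

  • Earliest possible date: Impute a missing day to the first of the month (and a missing month to January)
  • Latest possible date: Impute a missing day to the last day of the month (and a missing month to December)
  • Rule-based: Use other available information (such as the first dose date) to make the best guess

Which choice counts as conservative depends on the variable: for an AE start date, it is typically the rule that keeps a borderline event treatment-emergent.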
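
As an illustration of what such a rule can look like in code, here is a minimal base R sketch. The function name `impute_partial_date` and its `rule` argument are hypothetical, it only understands the ISO 8601 shapes YYYY, YYYY-MM, and YYYY-MM-DD, and the actual rule for a study must come from the SAP.

```r
# Impute partial ISO 8601 dates to the first or last possible day
impute_partial_date <- function(dtc, rule = c("first", "last")) {
  rule <- match.arg(rule)
  out <- rep(as.Date(NA), length(dtc))

  for (i in seq_along(dtc)) {
    x <- dtc[i]
    if (is.na(x) || x == "") next                      # completely missing: stay NA

    if (grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) {    # complete date
      out[i] <- as.Date(x, format = "%Y-%m-%d")
    } else if (grepl("^[0-9]{4}-[0-9]{2}$", x)) {      # day unknown
      first <- as.Date(paste0(x, "-01"), format = "%Y-%m-%d")
      out[i] <- if (rule == "first") {
        first
      } else {
        seq(first, by = "month", length.out = 2)[2] - 1  # last day of that month
      }
    } else if (grepl("^[0-9]{4}$", x)) {               # only year known
      suffix <- if (rule == "first") "-01-01" else "-12-31"
      out[i] <- as.Date(paste0(x, suffix), format = "%Y-%m-%d")
    }
  }
  out
}

dtc <- c("2014-01-15", "2014-02", "2014", NA, "")
imputed_first <- impute_partial_date(dtc, "first")
imputed_last  <- impute_partial_date(dtc, "last")
print(data.frame(dtc, imputed_first, imputed_last))
```

A production version would usually also keep an imputation flag alongside the imputed date so that reviewers can tell which values were not collected.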

8 AE Duration Calculations

8.1 Computing Duration in Days

# Calculate AE duration
ae_duration <- ae %>%
  mutate(
    ae_start = ymd(AESTDTC),
    ae_end   = ymd(AEENDTC),
    
    # Duration = end - start + 1 (inclusive of start and end day)
    AEDUR = as.numeric(ae_end - ae_start) + 1,
    
    # Flag ongoing AEs (no end date)
    AEONGO = if_else(is.na(ae_end), "Y", "N")
  )

# Summary of durations
cat("AE Duration Summary (days):\n")
AE Duration Summary (days):
ae_duration %>%
  filter(!is.na(AEDUR)) %>%
  summarise(
    N = n(),
    Mean = round(mean(AEDUR), 1),
    Median = median(AEDUR),
    Min = min(AEDUR),
    Max = max(AEDUR),
    SD = round(sd(AEDUR), 1)
  )
# A tibble: 1 × 6
      N  Mean Median   Min   Max    SD
  <int> <dbl>  <dbl> <dbl> <dbl> <dbl>
1   714  23.8     11     1   444  40.2
# Ongoing AEs
cat("\nOngoing AEs (no end date):\n")

Ongoing AEs (no end date):
ae_duration %>%
  count(AEONGO, name = "Count") %>%
  mutate(Percent = round(100 * Count / sum(Count), 1))
# A tibble: 2 × 3
  AEONGO Count Percent
  <chr>  <int>   <dbl>
1 N        718    60.3
2 Y        473    39.7
Note: The “+1” Rule

Notice we calculate duration as (end - start) + 1. This is the standard clinical convention:

  • An AE that starts on Day 5 and ends on Day 5 lasted 1 day (not 0)
  • An AE that starts on Day 5 and ends on Day 7 lasted 3 days (not 2)

This is the same “inclusive” counting rule used for study day calculations.
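
The two conventions can be written as small helpers and checked against the examples above. This is a base R sketch; the function names `ae_duration_days` and `study_day` are hypothetical.

```r
# Inclusive duration: start and end day both count
ae_duration_days <- function(start, end) {
  as.numeric(end - start) + 1
}

# Study day: the reference date is Day 1, the day before it is Day -1,
# and there is no Day 0
study_day <- function(date, ref) {
  diff <- as.numeric(date - ref)
  ifelse(diff >= 0, diff + 1, diff)
}

ref <- as.Date("2014-01-02")

same_day  <- ae_duration_days(as.Date("2014-01-05"), as.Date("2014-01-05"))
three_day <- ae_duration_days(as.Date("2014-01-05"), as.Date("2014-01-07"))
ongoing   <- ae_duration_days(as.Date("2014-01-05"), as.Date(NA))

day_after  <- study_day(as.Date("2014-01-03"), ref)
day_ref    <- study_day(ref, ref)
day_before <- study_day(ref - 1, ref)
```

An ongoing AE has no end date, so its duration stays NA rather than being counted up to some arbitrary date.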


9 Causality Assessment

9.1 How Causality Is Determined

Causality (AEREL) indicates whether the AE is related to the study drug. Common categories:

AEREL Value | Description
NOT RELATED | No reasonable possibility of relationship
UNLIKELY    | Doubtful relationship
POSSIBLE    | Cannot rule out relationship
PROBABLE    | Likely related
DEFINITE    | Clearly related
# Causality distribution
cat("Causality (AEREL) distribution:\n")
Causality (AEREL) distribution:
ae %>%
  count(AEREL, name = "Count") %>%
  mutate(Percent = round(100 * Count / sum(Count), 1)) %>%
  arrange(desc(Count))
# A tibble: 5 × 3
  AEREL    Count Percent
  <chr>    <int>   <dbl>
1 PROBABLE   361    30.3
2 POSSIBLE   343    28.8
3 NONE       322    27  
4 REMOTE     161    13.5
5 <NA>         4     0.3

9.2 Deriving a Binary Relatedness Flag

For many analyses, we simplify causality into a binary flag:

# In ADaM, we often create a binary related flag
ae_related <- ae %>%
  mutate(
    # RELFL = "Y" if possibly, probably, or definitely related
    RELFL = case_when(
      AEREL %in% c("POSSIBLE", "PROBABLE", "DEFINITE") ~ "Y",
      AEREL %in% c("NOT RELATED", "UNLIKELY") ~ "N",
      # Handle variations in terminology
      grepl("RELAT", AEREL, ignore.case = TRUE) ~ "Y",
      # Missing or unrecognized AEREL values fall through to "N" here
      TRUE ~ "N"
    )
  )

cat("Binary relatedness flag:\n")
Binary relatedness flag:
ae_related %>%
  count(AEREL, RELFL) %>%
  arrange(RELFL, desc(n))
# A tibble: 5 × 3
  AEREL    RELFL     n
  <chr>    <chr> <int>
1 NONE     N       322
2 REMOTE   N       161
3 <NA>     N         4
4 PROBABLE Y       361
5 POSSIBLE Y       343
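
An alternative to pattern matching is an explicit lookup table, which makes every mapping visible and surfaces unexpected values instead of silently defaulting them. The base R sketch below is illustrative only: `rel_map` and `derive_relfl` are hypothetical names, and treating a missing AEREL as related (`missing_as = "Y"`) is an assumption that differs from the code above; the SAP decides which default applies.

```r
# Explicit mapping from collected causality to a binary flag
rel_map <- c(
  "NONE" = "N", "REMOTE" = "N", "NOT RELATED" = "N", "UNLIKELY" = "N",
  "POSSIBLE" = "Y", "PROBABLE" = "Y", "DEFINITE" = "Y"
)

derive_relfl <- function(aerel, missing_as = "Y") {
  key <- toupper(trimws(aerel))
  out <- unname(rel_map[key])

  # Values that are present but not in the lookup are left NA and reported
  unmapped <- !is.na(aerel) & is.na(out)
  if (any(unmapped)) {
    warning("Unmapped AEREL values: ",
            paste(unique(aerel[unmapped]), collapse = ", "))
  }

  # Missing causality gets the chosen default (an assumption, see lead-in)
  out[is.na(aerel)] <- missing_as
  out
}

relfl <- derive_relfl(c("PROBABLE", "NONE", NA, "remote"))
print(relfl)
```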

10 Action Taken and Outcome

10.1 AEACN - Action Taken with Study Drug

The action taken in response to the AE is captured in AEACN:

ae %>%
  count(AEACN, name = "Count") %>%
  mutate(Percent = round(100 * Count / sum(Count), 1)) %>%
  arrange(desc(Count))
# A tibble: 1 × 3
  AEACN Count Percent
  <chr> <int>   <dbl>
1 <NA>   1191     100

10.2 AEOUT - Outcome of the AE

ae %>%
  count(AEOUT, name = "Count") %>%
  mutate(Percent = round(100 * Count / sum(Count), 1)) %>%
  arrange(desc(Count))
# A tibble: 3 × 3
  AEOUT                      Count Percent
  <chr>                      <int>   <dbl>
1 NOT RECOVERED/NOT RESOLVED   723    60.7
2 RECOVERED/RESOLVED           465    39  
3 FATAL                          3     0.3
Tip: Controlled Terminology Matters

All of these variables - AEACN, AEOUT, AESEV, AESER - use CDISC Controlled Terminology. You can’t invent your own values. The allowed values are specified in the CDISC CT Package.

For example, valid AEOUT values are:

  • RECOVERED/RESOLVED
  • RECOVERING/RESOLVING
  • NOT RECOVERED/NOT RESOLVED
  • RECOVERED/RESOLVED WITH SEQUELAE
  • FATAL
  • UNKNOWN
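
A simple programmatic check against the allowed list catches free-text or mis-cased values early. This is a minimal base R sketch: the helper `check_ct` is a hypothetical name, and the allowed values are hard-coded from the list above, whereas a real study would read them from the CT package version named in its specifications.

```r
# Allowed AEOUT values, copied from the list above
aeout_ct <- c(
  "RECOVERED/RESOLVED", "RECOVERING/RESOLVING", "NOT RECOVERED/NOT RESOLVED",
  "RECOVERED/RESOLVED WITH SEQUELAE", "FATAL", "UNKNOWN"
)

# Return the distinct non-missing values that are not in the allowed list
check_ct <- function(values, allowed) {
  setdiff(unique(values[!is.na(values)]), allowed)
}

# Hypothetical collected values, two of which are not valid terms
collected <- c("FATAL", "Recovered", NA, "RESOLVED", "UNKNOWN", "FATAL")
invalid_aeout <- check_ct(collected, aeout_ct)
print(invalid_aeout)
```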


11 Complete Example: Production AE Processing

Let’s put everything together into a production-quality AE processing pipeline:

# ---- Production AE Processing Pipeline ----

# Step 1: Start with the SDTM AE and DM data
data("ae", package = "pharmaversesdtm")
data("dm", package = "pharmaversesdtm")

# Step 2: Get reference dates
ref <- dm %>%
  select(STUDYID, USUBJID, RFSTDTC, RFENDTC) %>%
  mutate(
    ref_start = ymd(RFSTDTC),
    ref_end   = ymd(RFENDTC)
  )

# Step 3: Enhance AE data with all derivations
ae_production <- ae %>%
  # Join reference dates
  left_join(ref %>% select(USUBJID, ref_start, ref_end, RFSTDTC), by = "USUBJID") %>%
  mutate(
    # Parse dates
    ae_start = ymd(AESTDTC),
    ae_end   = ymd(AEENDTC),
    
    # ---- TEAE Flag ----
    TRTEMFL = case_when(
      !is.na(ae_start) & !is.na(ref_start) & ae_start >= ref_start ~ "Y",
      is.na(ae_start) & !is.na(ae_end) & !is.na(ref_start) & ae_end >= ref_start ~ "Y",
      !is.na(ae_start) & !is.na(ref_start) & ae_start < ref_start ~ "N",
      TRUE ~ NA_character_
    ),
    
    # ---- Study Days ----
    AESTDY = case_when(
      !is.na(ae_start) & !is.na(ref_start) & ae_start >= ref_start ~ 
        as.numeric(ae_start - ref_start) + 1,
      !is.na(ae_start) & !is.na(ref_start) & ae_start < ref_start ~ 
        as.numeric(ae_start - ref_start),
      TRUE ~ NA_real_
    ),
    AEENDY = case_when(
      !is.na(ae_end) & !is.na(ref_start) & ae_end >= ref_start ~ 
        as.numeric(ae_end - ref_start) + 1,
      !is.na(ae_end) & !is.na(ref_start) & ae_end < ref_start ~ 
        as.numeric(ae_end - ref_start),
      TRUE ~ NA_real_
    ),
    
    # ---- Duration ----
    AEDUR = if_else(!is.na(ae_start) & !is.na(ae_end),
                    as.numeric(ae_end - ae_start) + 1,
                    NA_real_),
    
    # ---- Ongoing Flag ----
    AEONGO = if_else(is.na(ae_end), "Y", "N"),
    
    # ---- Binary Relatedness ----
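    RELFL = case_when(
      AEREL %in% c("POSSIBLE", "PROBABLE", "DEFINITE") ~ "Y",
      # Rule out explicit negatives first: "NOT RELATED" also contains "RELAT"
      AEREL %in% c("NOT RELATED", "UNLIKELY") ~ "N",
      grepl("RELAT", AEREL, ignore.case = TRUE) ~ "Y",
      TRUE ~ "N"
    )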
  ) %>%
  # Clean up helper columns
  select(-ae_start, -ae_end, -ref_start, -ref_end)

cat("Production AE dataset:\n")
Production AE dataset:
cat("Total AE records:", nrow(ae_production), "\n")
Total AE records: 1191 
cat("TEAEs:", sum(ae_production$TRTEMFL == "Y", na.rm = TRUE), "\n")
TEAEs: 1131 
cat("SAEs:", sum(ae_production$AESER == "Y", na.rm = TRUE), "\n")
SAEs: 3 
cat("Drug-related:", sum(ae_production$RELFL == "Y", na.rm = TRUE), "\n\n")
Drug-related: 704 
# Preview the enhanced dataset
ae_production %>%
  select(USUBJID, AEDECOD, AESEV, AESER, TRTEMFL, AESTDY, AEDUR, AEONGO, RELFL) %>%
  head(15)
# A tibble: 15 × 9
   USUBJID     AEDECOD             AESEV AESER TRTEMFL AESTDY AEDUR AEONGO RELFL
   <chr>       <chr>               <chr> <chr> <chr>    <dbl> <dbl> <chr>  <chr>
 1 01-701-1015 APPLICATION SITE E… MILD  N     Y            2    NA Y      Y    
 2 01-701-1015 APPLICATION SITE P… MILD  N     Y            2    NA Y      Y    
 3 01-701-1015 DIARRHOEA           MILD  N     Y            8     3 N      N    
 4 01-701-1023 ATRIOVENTRICULAR B… MILD  N     Y           22    NA Y      Y    
 5 01-701-1023 ERYTHEMA            MILD  N     Y            3    24 N      Y    
 6 01-701-1023 ERYTHEMA            MODE… N     Y            3    NA Y      Y    
 7 01-701-1023 ERYTHEMA            MILD  N     Y            3    24 N      Y    
 8 01-701-1028 APPLICATION SITE E… MILD  N     Y            3    NA Y      Y    
 9 01-701-1028 APPLICATION SITE P… MILD  N     Y           21    NA Y      Y    
10 01-701-1034 APPLICATION SITE P… MILD  N     Y           58    NA Y      Y    
11 01-701-1034 FATIGUE             MILD  N     Y          125    NA Y      Y    
12 01-701-1047 BUNDLE BRANCH BLOC… MILD  N     Y           27    NA Y      N    
13 01-701-1047 HIATUS HERNIA       MODE… N     Y            1     1 N      N    
14 01-701-1047 HIATUS HERNIA       MODE… N     Y            1     1 N      N    
15 01-701-1047 UPPER RESPIRATORY … MILD  N     Y           23    NA Y      N    

12 Key AE Counts for Safety Reporting

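In clinical study reports, AE tables almost always include these counts. The percentages below use every subject in DM as the denominator; a study report would normally use the safety population instead: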

# ---- Summary Table: AE Incidence ----
cat("=== AE Incidence Summary ===\n\n")
=== AE Incidence Summary ===
# Total subjects
n_total <- n_distinct(dm$USUBJID)
n_ae <- n_distinct(ae_production$USUBJID)

cat("Total subjects in DM:", n_total, "\n")
Total subjects in DM: 306 
cat("Subjects with any AE:", n_ae, 
    sprintf("(%.1f%%)", 100 * n_ae / n_total), "\n\n")
Subjects with any AE: 225 (73.5%) 
# TEAE summary
teae_data <- ae_production %>% filter(TRTEMFL == "Y")
n_teae <- n_distinct(teae_data$USUBJID)
cat("Subjects with any TEAE:", n_teae,
    sprintf("(%.1f%%)", 100 * n_teae / n_total), "\n")
Subjects with any TEAE: 218 (71.2%) 
# SAE summary
sae_data <- ae_production %>% filter(AESER == "Y")
n_sae <- n_distinct(sae_data$USUBJID)
cat("Subjects with any SAE:", n_sae,
    sprintf("(%.1f%%)", 100 * n_sae / n_total), "\n")
Subjects with any SAE: 3 (1.0%) 
# Drug-related TEAE
rel_teae <- ae_production %>% filter(TRTEMFL == "Y", RELFL == "Y")
n_rel <- n_distinct(rel_teae$USUBJID)
cat("Subjects with drug-related TEAE:", n_rel,
    sprintf("(%.1f%%)", 100 * n_rel / n_total), "\n")
Subjects with drug-related TEAE: 185 (60.5%) 
# TEAE by maximum severity
cat("\nTEAE by Maximum Severity per Subject:\n")

TEAE by Maximum Severity per Subject:
teae_data %>%
  mutate(SEV_NUM = case_when(
    AESEV == "MILD" ~ 1,
    AESEV == "MODERATE" ~ 2,
    AESEV == "SEVERE" ~ 3
  )) %>%
  group_by(USUBJID) %>%
  summarise(MAX_SEV = max(SEV_NUM, na.rm = TRUE), .groups = "drop") %>%
  mutate(MAX_AESEV = case_when(
    MAX_SEV == 1 ~ "MILD",
    MAX_SEV == 2 ~ "MODERATE",
    MAX_SEV == 3 ~ "SEVERE"
  )) %>%
  count(MAX_AESEV, name = "N_Subjects") %>%
  mutate(Percent = round(100 * N_Subjects / n_total, 1)) %>%
  arrange(match(MAX_AESEV, c("MILD", "MODERATE", "SEVERE")))
# A tibble: 3 × 3
  MAX_AESEV N_Subjects Percent
  <chr>          <int>   <dbl>
1 MILD              77    25.2
2 MODERATE         112    36.6
3 SEVERE            29     9.5
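
The repeated count-and-percent pattern above can be wrapped in a small helper. The base R sketch below uses hypothetical names (`fmt_n_pct`, `count_subjects`) and made-up records; it counts distinct subjects, not events, and treats an NA condition as not met.

```r
# Format "n (xx.x%)" the way incidence tables usually show it
fmt_n_pct <- function(n, denom) {
  sprintf("%d (%.1f%%)", as.integer(n), 100 * n / denom)
}

# Count distinct subjects with at least one record where `keep` is TRUE
count_subjects <- function(df, keep, denom) {
  keep <- keep & !is.na(keep)              # NA conditions count as not met
  fmt_n_pct(length(unique(df$USUBJID[keep])), denom)
}

# Hypothetical AE records for three subjects out of four in the population
toy <- data.frame(
  USUBJID = c("A", "A", "B", "C"),
  TRTEMFL = c("Y", "Y", NA, "N"),
  AESER   = c("N", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

any_teae <- count_subjects(toy, toy$TRTEMFL == "Y", denom = 4)
any_sae  <- count_subjects(toy, toy$AESER == "Y", denom = 4)
teae_sae <- count_subjects(toy, toy$TRTEMFL == "Y" & toy$AESER == "Y", denom = 4)
```

Subject A has two treatment-emergent records but is counted once, which is the behavior incidence tables need.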

13 Preview: From AE to ADAE

The SDTM AE domain feeds into the ADaM ADAE dataset. Here’s a preview of the key mappings:

SDTM AE          | ADaM ADAE | Description
AEDECOD          | AEDECOD   | Preferred term (carried forward)
AEBODSYS         | AEBODSYS  | Body system (carried forward)
AESEV            | AESEV     | Severity (carried forward)
AESTDTC → parsed | ASTDT     | Analysis start date (numeric)
AEENDTC → parsed | AENDT     | Analysis end date (numeric)
derived          | TRTEMFL   | Treatment-emergent flag
derived          | AOCCPIFL  | First occurrence of maximum severity within a preferred term (worst-event selection)
from ADSL        | TRT01A    | Actual treatment for period 01
Note: Looking Ahead

In Week 3, when we build ADaM datasets with admiral, the derive_var_trtemfl() function will handle TEAE derivation automatically - but understanding the logic behind it (as we’ve done today) is essential for debugging and validation.


14 Deliverable Summary

Today you completed the following:

Task                                                       | Status
Understood severity vs. toxicity grading                   | ✓ Done
Explored all SAE criteria variables (AESER, AESDTH, etc.)  | ✓ Done
Derived SAE flag from sub-criteria using OR-logic          | ✓ Done
Derived treatment-emergent AE (TEAE) flag                  | ✓ Done
Calculated AE duration and identified ongoing AEs          | ✓ Done
Analyzed causality, action taken, and outcome              | ✓ Done
Built a production AE processing pipeline                  | ✓ Done
Generated safety summary counts                            | ✓ Done

15 Key Takeaways

  1. Severity ≠ Seriousness - A mild AE can be serious; a severe AE may not be serious
  2. SAE criteria combine with OR-logic - AESER = “Y” if ANY sub-criterion is “Y”
  3. TEAEs are the focus - Most safety analyses exclude pre-treatment AEs
  4. Duration uses the +1 rule - Inclusive of both start and end day
  5. Causality is simplified - Binary related/not-related flags are common in ADaM
  6. Controlled Terminology is mandatory - Use only CDISC-approved values

16 Resources

  • CDISC SDTM Implementation Guide - AE Domain - Official AE specification
  • MedDRA Terminology - Medical Dictionary for Regulatory Activities
  • CTCAE v5.0 - Common Terminology Criteria for Adverse Events
  • ICH E2A Guidelines - Clinical Safety Data Management
  • Admiral ADAE Vignette - Building ADAE with admiral

17 What’s Next?

In Day 11, we will focus on Disposition (DS) & Trial Design Domains:

  • Understanding the DS domain for screen failures, completers, early terminators
  • Trial Design domains: TA, TE, TV, TI, TS
  • Working with EPOCH and milestone variables
  • Subject flow and disposition summaries

 

30 Days of Pharmaverse  ·  Disclaimer  ·  Indraneel Chakraborty  ·  © 2026