30 Days of Pharmaverse
  • Week 1: SDTM Fundamentals
  • Week 2: Production SDTM
  • Week 3: ADaM Deep Dive
  • Week 4: Tables, Listings and Figures
  1. Day 4: Clinical Date Derivations with lubridate
  • Day 1: Environment Setup & First SDTM Code
  • Day 2: SDTM Domain Structure & Tidyverse Mastery
  • Day 3: Controlled Terminology & MedDRA Coding
  • Day 4: Clinical Date Derivations with lubridate
  • Day 5: Advanced Tidyverse: Pivoting & Joining
  • Day 6: Introduction to sdtm.oak
  • Day 7: Week 1 Capstone: End-to-End SDTM Script

On this page

  • 1 Learning Objectives
  • 2 Introduction
    • 2.1 Dates in CDISC (ISO 8601)
    • 2.2 The “No Day 0” Rule
    • 2.3 The Role of lubridate
  • 3 Package Installation & Loading
    • 3.1 Required Packages
    • 3.2 Load Packages
  • 4 Part 1: Basic Date Parsing
    • 4.1 Parsing Complete Dates
    • 4.2 Parsing Dates with Times
    • 4.3 Extracting Date Components
  • 5 Part 2: Calculating Study Day (--DY)
    • 5.1 The Formula
    • 5.2 Creating a Reusable Function
    • 5.3 Applying to Real Data: Derive AESTDY
  • 6 Part 3: Calculating Duration (--DUR)
    • 6.1 Duration in Days
    • 6.2 Duration Statistics
  • 7 Part 4: Handling Partial Dates
    • 7.1 Common Imputation Strategies
    • 7.2 Reusable Imputation Functions
    • 7.3 Testing the Imputation Functions
  • 8 Part 5: Complete Exercise - Derive AESTDY and AEENDY
    • 8.1 Summary Statistics
  • 9 🎯 Practice Exercise
    • 9.1 Your Turn: Apply to Exposure (EX) Domain
  • 10 Deliverable Summary
  • 11 Key Takeaways
  • 12 Resources
  • 13 What’s Next?

Day 4: Clinical Date Derivations with lubridate

Study Day, Duration, and Partial Date Imputation

← Back to Roadmap

1 Learning Objectives

By the end of Day 4, you will be able to:

  1. Parse ISO 8601 clinical dates using lubridate
  2. Calculate Study Days (--DY variables) adhering to CDISC rules (no Day 0)
  3. Calculate Durations (--DUR) between start and end dates
  4. Handle Partial Dates (e.g., “2023-05”) using imputation strategies
  5. Create reusable functions for date derivations

2 Introduction

2.1 Dates in CDISC (ISO 8601)

In clinical trials, dates and times are very important. CDISC requires that all dates and times follow the ISO 8601 standard format. This makes it easy for computers and people to read and compare dates. The format looks like this: YYYY-MM-DDThh:mm:ss.

Format Example Description
Complete date 2024-01-15 Full date
Date + time 2024-01-15T09:30 Date with time
Year-Month only 2024-01 Partial (missing day)
Year only 2024 Partial (month + day)

2.1.1 Why does this matter?

If everyone uses the same date format, it’s much easier to merge, analyze, and review data. It also prevents mistakes that can happen if dates are written in different ways (like 01/02/2024- is that January 2 or February 1?).

2.2 The “No Day 0” Rule

When calculating study days (like how many days since a subject started treatment), CDISC has a special rule: There is NO Day 0. The reference date (usually the first dose) is always Day 1. Days before the reference date are negative numbers (Day -1, Day -2, etc.).

  • Day 1 = Reference date (usually first dose)
  • Day -1 = The day before reference date
  • There is NO Day 0

2.2.1 Why is this important?

This rule keeps things clear and avoids confusion. If you see Day 1, you know it’s the first day of treatment. There’s never a Day 0 to worry about.

2.3 The Role of lubridate

Working with dates in R can be tricky, but the lubridate package (part of the tidyverse) makes it much easier. It helps you:

  • Parse (read) dates from character strings
  • Do date math (like adding days)
  • Extract parts of a date (year, month, day)

This is very helpful for clinical data, where you often need to calculate things like study day, duration, or handle partial dates.


3 Package Installation & Loading

3.1 Required Packages

Package Purpose
dplyr Data manipulation
lubridate Date/time manipulation
pharmaversesdtm Source data

3.2 Load Packages

library(dplyr)
library(lubridate)
library(pharmaversesdtm)

# Load the data we'll use
data("ae", package = "pharmaversesdtm")
data("dm", package = "pharmaversesdtm")

4 Part 1: Basic Date Parsing

First, let’s learn how to convert character strings (which SDTM dates always are) into actual Date objects in R.

4.1 Parsing Complete Dates

# Use ymd() for Year-Month-Day strings
date_str <- "2024-01-15"
date_obj <- ymd(date_str)

cat("Original string:", date_str, "\n")
Original string: 2024-01-15 
cat("Parsed date:", as.character(date_obj), "\n")
Parsed date: 2024-01-15 
cat("Class:", class(date_obj), "\n")
Class: Date 

4.2 Parsing Dates with Times

# ymd_hms() for full datetime
datetime_full <- "2024-01-15T14:30:00"
ymd_hms(datetime_full)
[1] "2024-01-15 14:30:00 UTC"
# ymd_hm() when seconds are missing
datetime_short <- "2024-01-15T14:30"
ymd_hm(datetime_short)
[1] "2024-01-15 14:30:00 UTC"

4.3 Extracting Date Components

sample_date <- ymd("2024-03-15")

cat("Year:", year(sample_date), "\n")
Year: 2024 
cat("Month:", month(sample_date), "\n")
Month: 3 
cat("Day:", day(sample_date), "\n")
Day: 15 
cat("Weekday:", wday(sample_date, label = TRUE), "\n")
Weekday: 6 

5 Part 2: Calculating Study Day (--DY)

The Study Day (--DY) indicates the number of days from the Reference Start Date (RFSTDTC).

5.1 The Formula

If event_date >= reference_date:
    Study Day = (event_date - reference_date) + 1

If event_date < reference_date:
    Study Day = (event_date - reference_date)

5.2 Creating a Reusable Function

This function can be reused across all your SDTM and ADaM programs.

#' Derive Study Day following CDISC "No Day 0" convention
#' 
#' @param event_date Date of the event
#' @param reference_date Reference start date (usually RFSTDTC)
#' @return Integer study day (positive = on/after ref, negative = before ref)
derive_study_day <- function(event_date, reference_date) {
  # Calculate raw difference in days
  diff_days <- as.numeric(event_date - reference_date)
  
  # Apply "No Day 0" rule
  study_day <- ifelse(diff_days >= 0, diff_days + 1, diff_days)
  
  return(as.integer(study_day))
}

# Test the function
test_ref <- ymd("2024-01-01")
test_dates <- c("2023-12-30", "2023-12-31", "2024-01-01", "2024-01-02", "2024-01-05")

data.frame(
  Event_Date = test_dates,
  Reference = "2024-01-01",
  Study_Day = derive_study_day(ymd(test_dates), test_ref)
)
  Event_Date  Reference Study_Day
1 2023-12-30 2024-01-01        -2
2 2023-12-31 2024-01-01        -1
3 2024-01-01 2024-01-01         1
4 2024-01-02 2024-01-01         2
5 2024-01-05 2024-01-01         5
NoteNotice the “No Day 0”

Look at the output above: Day -2, Day -1, Day 1, Day 2, Day 5. There is no Day 0!

5.3 Applying to Real Data: Derive AESTDY

# Merge RFSTDTC from DM into AE
ae_with_ref <- ae %>%
  select(USUBJID, AESEQ, AESTDTC, AEENDTC) %>%
  inner_join(
    dm %>% select(USUBJID, RFSTDTC),
    by = "USUBJID"
  ) %>%
  # Filter to complete dates only (length >= 10)
  filter(nchar(AESTDTC) >= 10 & nchar(RFSTDTC) >= 10)

# Apply our function
ae_study_days <- ae_with_ref %>%
  mutate(
    AE_DT = ymd(substring(AESTDTC, 1, 10)),
    REF_DT = ymd(substring(RFSTDTC, 1, 10)),
    AESTDY = derive_study_day(AE_DT, REF_DT)
  )

# Preview
ae_study_days %>%
  select(USUBJID, AESTDTC, RFSTDTC, AESTDY) %>%
  head(10)
# A tibble: 10 × 4
   USUBJID     AESTDTC    RFSTDTC    AESTDY
   <chr>       <chr>      <chr>       <int>
 1 01-701-1015 2014-01-03 2014-01-02      2
 2 01-701-1015 2014-01-03 2014-01-02      2
 3 01-701-1015 2014-01-09 2014-01-02      8
 4 01-701-1023 2012-08-26 2012-08-05     22
 5 01-701-1023 2012-08-07 2012-08-05      3
 6 01-701-1023 2012-08-07 2012-08-05      3
 7 01-701-1023 2012-08-07 2012-08-05      3
 8 01-701-1028 2013-07-21 2013-07-19      3
 9 01-701-1028 2013-08-08 2013-07-19     21
10 01-701-1034 2014-08-27 2014-07-01     58

6 Part 3: Calculating Duration (--DUR)

Duration is typically: \(EndDate - StartDate + 1\) (inclusive of both days).

6.1 Duration in Days

# Filter for records with both start and end dates
ae_dur <- ae %>%
  select(USUBJID, AESEQ, AESTDTC, AEENDTC) %>%
  filter(nchar(AESTDTC) >= 10 & nchar(AEENDTC) >= 10) %>%
  mutate(
    START_DT = ymd(substring(AESTDTC, 1, 10)),
    END_DT = ymd(substring(AEENDTC, 1, 10)),
    
    # Duration in days (inclusive)
    AEDUR = as.integer(END_DT - START_DT) + 1L
  )

ae_dur %>%
  select(USUBJID, AESTDTC, AEENDTC, AEDUR) %>%
  head(10)
# A tibble: 10 × 4
   USUBJID     AESTDTC    AEENDTC    AEDUR
   <chr>       <chr>      <chr>      <int>
 1 01-701-1015 2014-01-09 2014-01-11     3
 2 01-701-1023 2012-08-07 2012-08-30    24
 3 01-701-1023 2012-08-07 2012-08-30    24
 4 01-701-1047 2013-02-12 2013-02-12     1
 5 01-701-1047 2013-02-12 2013-02-12     1
 6 01-701-1097 2014-02-20 2014-02-22     3
 7 01-701-1097 2014-04-19 2014-04-22     4
 8 01-701-1097 2014-04-19 2014-04-22     4
 9 01-701-1097 2014-02-20 2014-02-20     1
10 01-701-1097 2014-03-21 2014-03-21     1

6.2 Duration Statistics

ae_dur %>%
  summarise(
    N = n(),
    Mean_Duration = round(mean(AEDUR, na.rm = TRUE), 1),
    Median_Duration = median(AEDUR, na.rm = TRUE),
    Min = min(AEDUR, na.rm = TRUE),
    Max = max(AEDUR, na.rm = TRUE)
  )
# A tibble: 1 × 5
      N Mean_Duration Median_Duration   Min   Max
  <int>         <dbl>           <dbl> <int> <int>
1   714          23.8              11     1   444

7 Part 4: Handling Partial Dates

In clinical data, dates are often partial (e.g., “2023-06” or just “2023”). We cannot calculate durations or study days without imputing these to full dates.

7.1 Common Imputation Strategies

Date Type Missing Parts Start Date Imputation End Date Imputation
Year-Month Day 01 (first) Last day of month
Year only Month + Day 01-01 12-31

7.2 Reusable Imputation Functions

#' Impute partial START dates (conservative: use earliest possible date)
#' 
#' @param date_str Character date string in ISO 8601 format
#' @return Character string with complete date
impute_start_date <- function(date_str) {
  dplyr::case_when(
    is.na(date_str) | date_str == "" ~ NA_character_,
    # Only Year (YYYY) -> YYYY-01-01
    nchar(date_str) == 4 ~ paste0(date_str, "-01-01"),
    # Year-Month (YYYY-MM) -> YYYY-MM-01
    nchar(date_str) == 7 ~ paste0(date_str, "-01"),
    # Full date -> Keep first 10 characters
    TRUE ~ substring(date_str, 1, 10)
  )
}

#' Impute partial END dates (conservative: use latest possible date)
#' 
#' @param date_str Character date string in ISO 8601 format
#' @return Character string with complete date
impute_end_date <- function(date_str) {
  dplyr::case_when(
    is.na(date_str) | date_str == "" ~ NA_character_,
    # Only Year (YYYY) -> YYYY-12-31
    nchar(date_str) == 4 ~ paste0(date_str, "-12-31"),
    # Year-Month (YYYY-MM) -> Last day of that month
    nchar(date_str) == 7 ~ {
      # Parse as first of month, then get last day
      first_of_month <- ymd(paste0(date_str, "-01"))
      last_of_month <- ceiling_date(first_of_month, "month") - days(1)
      as.character(last_of_month)
    },
    # Full date -> Keep first 10 characters
    TRUE ~ substring(date_str, 1, 10)
  )
}

7.3 Testing the Imputation Functions

# Test cases with various partial dates
test_dates <- c("2023-05-12", "2023-05", "2023", "2023-02", NA)

data.frame(
  Original = test_dates,
  Imputed_Start = impute_start_date(test_dates),
  Imputed_End = impute_end_date(test_dates)
)
    Original Imputed_Start Imputed_End
1 2023-05-12    2023-05-12  2023-05-12
2    2023-05    2023-05-01  2023-05-31
3       2023    2023-01-01  2023-12-31
4    2023-02    2023-02-01  2023-02-28
5       <NA>          <NA>        <NA>
TipWhy Different Imputation Rules?
  • For start dates, we impute to the earliest possible date to be conservative about when something started.
  • For end dates, we impute to the latest possible date to be conservative about when something ended.
  • This approach ensures we don’t underestimate durations or miss events.

8 Part 5: Complete Exercise - Derive AESTDY and AEENDY

Let’s bring everything together with a complete derivation.

# Full AE date derivation workflow
ae_dates_complete <- ae %>%
  select(USUBJID, AESEQ, AETERM, AESTDTC, AEENDTC) %>%
  # Join reference date
  inner_join(
    dm %>% select(USUBJID, RFSTDTC),
    by = "USUBJID"
  ) %>%
  mutate(
    # Impute partial dates
    AESTDTC_IMP = impute_start_date(AESTDTC),
    AEENDTC_IMP = impute_end_date(AEENDTC),
    RFSTDTC_IMP = impute_start_date(RFSTDTC),
    
    # Create imputation flags (for traceability)
    AESTDTF = ifelse(nchar(AESTDTC) < 10, "D", NA_character_),
    AEENDTF = ifelse(nchar(AEENDTC) < 10, "D", NA_character_),
    
    # Parse dates
    ASTDT = ymd(AESTDTC_IMP),
    AENDT = ymd(AEENDTC_IMP),
    RFSTDT = ymd(RFSTDTC_IMP),
    
    # Derive study days
    AESTDY = derive_study_day(ASTDT, RFSTDT),
    AEENDY = derive_study_day(AENDT, RFSTDT),
    
    # Derive duration
    AEDUR = ifelse(!is.na(ASTDT) & !is.na(AENDT),
                   as.integer(AENDT - ASTDT) + 1L,
                   NA_integer_)
  )

# Preview key results
ae_dates_complete %>%
  select(USUBJID, AETERM, AESTDY, AEENDY, AEDUR, AESTDTF) %>%
  head(15)
# A tibble: 15 × 6
   USUBJID     AETERM                               AESTDY AEENDY AEDUR AESTDTF
   <chr>       <chr>                                 <int>  <int> <int> <chr>  
 1 01-701-1015 APPLICATION SITE ERYTHEMA                 2     NA    NA <NA>   
 2 01-701-1015 APPLICATION SITE PRURITUS                 2     NA    NA <NA>   
 3 01-701-1015 DIARRHOEA                                 8     10     3 <NA>   
 4 01-701-1023 ATRIOVENTRICULAR BLOCK SECOND DEGREE     22     NA    NA <NA>   
 5 01-701-1023 ERYTHEMA                                  3     26    24 <NA>   
 6 01-701-1023 ERYTHEMA                                  3     NA    NA <NA>   
 7 01-701-1023 ERYTHEMA                                  3     26    24 <NA>   
 8 01-701-1028 APPLICATION SITE ERYTHEMA                 3     NA    NA <NA>   
 9 01-701-1028 APPLICATION SITE PRURITUS                21     NA    NA <NA>   
10 01-701-1034 APPLICATION SITE PRURITUS                58     NA    NA <NA>   
11 01-701-1034 FATIGUE                                 125     NA    NA <NA>   
12 01-701-1047 BUNDLE BRANCH BLOCK LEFT                 27     NA    NA <NA>   
13 01-701-1047 HIATUS HERNIA                             1      1     1 <NA>   
14 01-701-1047 HIATUS HERNIA                             1      1     1 <NA>   
15 01-701-1047 UPPER RESPIRATORY TRACT INFECTION        23     NA    NA <NA>   

8.1 Summary Statistics

ae_dates_complete %>%
  summarise(
    Total_Records = n(),
    Records_with_AESTDY = sum(!is.na(AESTDY)),
    Records_with_AEENDY = sum(!is.na(AEENDY)),
    Imputed_Start = sum(!is.na(AESTDTF)),
    Mean_Duration = round(mean(AEDUR, na.rm = TRUE), 1)
  )
# A tibble: 1 × 5
  Total_Records Records_with_AESTDY Records_with_AEENDY Imputed_Start
          <int>               <int>               <int>         <int>
1          1191                1191                 718            26
# ℹ 1 more variable: Mean_Duration <dbl>

9 🎯 Practice Exercise

9.1 Your Turn: Apply to Exposure (EX) Domain

Using the functions we created, derive study days for the Exposure domain.

data("ex", package = "pharmaversesdtm")

# TODO: Complete the following
ex_dates <- ex %>%
  select(USUBJID, EXSEQ, EXSTDTC, EXENDTC) %>%
  inner_join(
    dm %>% select(USUBJID, RFSTDTC),
    by = "USUBJID"
  ) %>%
  mutate(
    # Step 1: Impute dates
    # EXSTDTC_IMP = ???
    # EXENDTC_IMP = ???
    
    # Step 2: Parse dates
    # EXSTDT = ???
    # EXENDT = ???
    
    # Step 3: Derive study days
    # EXSTDY = ???
    # EXENDY = ???
  )

# Check your results
head(ex_dates)

10 Deliverable Summary

Today you completed the following:

Task Status
Parsed ISO 8601 dates with lubridate ✓ Done
Created derive_study_day() function ✓ Done
Created impute_start_date() function ✓ Done
Created impute_end_date() function ✓ Done
Derived AESTDY, AEENDY, AEDUR from AE ✓ Done

11 Key Takeaways

  1. ISO 8601 (YYYY-MM-DD) is the standard format.
  2. No Day 0: Day -1 is immediately followed by Day 1.
  3. Reusable Functions: Create functions for derivations you’ll use repeatedly.
  4. Imputation Flags: Use --DTF variables to indicate imputed dates.
  5. Documentation: Comment your logic for traceability.

12 Resources

  • lubridate Cheatsheet
  • CDISC SDTM Date Conventions
  • admiral derive_vars_dt() Vignette

13 What’s Next?

In Day 5, we will move to Advanced Tidyverse operations:

  • Reshaping data with pivot_wider() and pivot_longer()
  • Creating analysis-ready datasets
  • Performing complex joins for QC

 

30 Days of Pharmaverse  ·  Disclaimer  ·  Indraneel Chakraborty  ·  © 2026