Day 12: Data Cuts with datacutr

Applying Clinical Cutoff Dates for Interim & Final Analyses

1 Learning Objectives

By the end of Day 12, you will be able to:

Explain what a clinical data cut is and why it’s essential for submissions
Understand the difference between patient-level and record-level data cuts
Use the datacutr package to apply cutoff dates across multiple SDTM domains
Handle edge cases: ongoing AEs, partial dates, and records at the cutoff boundary
Build a reproducible data cut pipeline that works across DM, AE, LB, and VS domains

2 What is a Data Cut?

2.1 The Problem

Clinical trials collect data continuously over months or years. But at some point, you need to freeze the data for analysis - this is the data cutoff.

Timeline of a clinical trial:
────────────────────────────────────────────────────────────────────────
  First Subject In          Interim Analysis         Final Analysis
       │                         │                        │
       ▼                         ▼                        ▼
  ────────────────────────────────────────────────────────────────────
  ██████████████████████████│ DATA CUT │████████████████│ DATA CUT │
                             (Cutoff 1)                  (Cutoff 2)
                                 │                           │
                                 ▼                           ▼
                        Only data ON OR BEFORE     Only data ON OR BEFORE
                        this date included         this date included

Why Data Cuts Matter

Regulatory submissions require data frozen at a specific cutoff date
Interim analyses (e.g., for a Data Safety Monitoring Board) need a clean snapshot
Data integrity - analyses must be reproducible based on the same data
Blinding - in ongoing studies, only data up to the cutoff should be analyzed

2.2 Types of Data Cuts

Cut Type	What It Means	Example
Patient-level cut	Exclude entire subjects enrolled after cutoff	Subject consented on 2024-07-01, cutoff is 2024-06-15 → exclude entire subject
Record-level cut	Keep the subject but remove records after cutoff	Subject enrolled 2024-01-01, AE started 2024-07-01, cutoff 2024-06-15 → keep subject, remove this AE
Hybrid	Patient-level for some domains, record-level for others	Common in practice

3 Package Installation & Loading

if (!requireNamespace("dplyr", quietly = TRUE)) suppressMessages(install.packages("dplyr"))
if (!requireNamespace("lubridate", quietly = TRUE)) suppressMessages(install.packages("lubridate"))
if (!requireNamespace("tidyr", quietly = TRUE)) suppressMessages(install.packages("tidyr"))
if (!requireNamespace("stringr", quietly = TRUE)) suppressMessages(install.packages("stringr"))

# datacutr is the Pharmaverse package for data cuts
if (!requireNamespace("datacutr", quietly = TRUE)) suppressMessages(install.packages("datacutr"))

library(dplyr)
library(lubridate)
library(tidyr)
library(stringr)
library(datacutr)

4 Understanding datacutr

4.1 What is datacutr?

datacutr is a Pharmaverse package designed to apply clinical data cutoff dates to SDTM datasets in a standardized, reproducible way. It handles the complex logic of:

Identifying which date variable to use for each domain
Applying patient-level vs. record-level cuts
Handling special cases (partial or missing dates, deaths recorded after the cutoff)

4.2 How datacutr Works

The package uses a datacut metadata approach:

You define a patient-level cutoff date for each subject
You specify which date variable to cut on for each domain
datacutr applies the cut rules and returns clean datasets
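
The three steps above can be sketched in base R on toy data. This only illustrates the flag-then-remove idea and is not datacutr's implementation: the helpers `flag_after_cut()` and `drop_flagged()`, the column `DCUTDT`, and the data are hypothetical, while `DCUT_TEMP_REMOVE` mirrors the flag name that datacutr uses.

```r
# Step 1: one cut date per subject (toy data)
dcut_toy <- data.frame(
  USUBJID = c("S1", "S2"),
  DCUTDT  = as.Date(c("2024-06-15", "2024-06-15")),
  stringsAsFactors = FALSE
)
ae_toy <- data.frame(
  USUBJID = c("S1", "S1", "S2"),
  AESTDTC = c("2024-05-01", "2024-07-01", "2024-06-15"),
  stringsAsFactors = FALSE
)

# Step 2: flag records whose date variable falls after the subject's cut date
flag_after_cut <- function(sdtm, date_var, dcut) {
  out   <- merge(sdtm, dcut, by = "USUBJID", all.x = TRUE, sort = FALSE)
  after <- as.Date(out[[date_var]]) > out$DCUTDT
  out$DCUT_TEMP_REMOVE <- ifelse(!is.na(after) & after, "Y", NA_character_)
  out
}

# Step 3: drop flagged records, then drop the temporary columns
drop_flagged <- function(flagged) {
  keep <- is.na(flagged$DCUT_TEMP_REMOVE)
  cols <- setdiff(names(flagged), c("DCUTDT", "DCUT_TEMP_REMOVE"))
  flagged[keep, cols, drop = FALSE]
}

ae_flagged <- flag_after_cut(ae_toy, "AESTDTC", dcut_toy)
ae_cut     <- drop_flagged(ae_flagged)
print(ae_cut)
```

Keeping the flag and the removal as separate steps means the flagged dataset can be reviewed (or logged) before anything is deleted.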

# What functions does datacutr provide?
cat("Key datacutr functions:\n\n")

Key datacutr functions:

cat("1. create_dcut()        - Create the patient-level datacut dataset\n")

1. create_dcut()        - Create the patient-level datacut dataset

cat("2. date_cut()           - Flag records for removal (adds DCUT_TEMP_REMOVE)\n")

2. date_cut()           - Flag records for removal (adds DCUT_TEMP_REMOVE)

cat("3. apply_cut()          - Remove flagged records\n")

3. apply_cut()          - Remove flagged records

cat("4. special_dm_cut()     - Flag DM records; flag deaths after the cutoff\n")

4. special_dm_cut()     - Flag DM records; flag deaths after the cutoff

cat("5. impute_sdtm()        - Impute partial dates before cutting\n")

5. impute_sdtm()        - Impute partial dates before cutting

cat("6. impute_dcutdtc()     - Impute partial datacut dates\n")

6. impute_dcutdtc()     - Impute partial datacut dates

cat("7. pt_cut()             - Patient-level cut\n")

7. pt_cut()             - Patient-level cut

cat("8. process_cut()        - Wrapper function for entire workflow\n")

8. process_cut()        - Wrapper function for entire workflow

5 Setting Up Sample Data for Data Cuts

Let’s create a realistic multi-domain SDTM dataset to practice data cuts on:

set.seed(42)

# ---- Study Parameters ----
STUDYID <- "CDISC01"
cutoff_date <- as.Date("2024-06-15")

n_subjects <- 10
subjects <- paste0("CDISC01-001-", sprintf("%03d", 1:n_subjects))

# ---- DM Domain ----
dm_sample <- tibble(
  STUDYID  = STUDYID,
  DOMAIN   = "DM",
  USUBJID  = subjects,
  RFSTDTC  = as.character(as.Date("2024-01-01") + sample(0:120, n_subjects, replace = TRUE)),
  RFENDTC  = NA_character_,
  DTHDTC   = NA_character_,  # Required by special_dm_cut()
  ARM      = sample(c("Placebo", "Active 10mg", "Active 20mg"), n_subjects, replace = TRUE),
  SEX      = sample(c("M", "F"), n_subjects, replace = TRUE),
  AGE      = sample(30:70, n_subjects, replace = TRUE),
  AGEU     = "YEARS"
) %>%
  mutate(
    # Some subjects started AFTER the cutoff (to test patient-level cuts)
    RFSTDTC = case_when(
      row_number() == 9 ~ as.character(cutoff_date + 5),
      row_number() == 10 ~ as.character(cutoff_date + 15),
      TRUE ~ RFSTDTC
    )
  )

cat("DM domain:", nrow(dm_sample), "subjects\n")

DM domain: 10 subjects

cat("Cutoff date:", as.character(cutoff_date), "\n\n")

Cutoff date: 2024-06-15

# Which subjects started after cutoff?
dm_sample %>%
  mutate(AFTER_CUTOFF = ymd(RFSTDTC) > cutoff_date) %>%
  select(USUBJID, RFSTDTC, ARM, AFTER_CUTOFF)

# A tibble: 10 × 4
   USUBJID         RFSTDTC    ARM         AFTER_CUTOFF
   <chr>           <chr>      <chr>       <lgl>       
 1 CDISC01-001-001 2024-02-18 Active 20mg FALSE       
 2 CDISC01-001-002 2024-04-10 Placebo     FALSE       
 3 CDISC01-001-003 2024-03-05 Placebo     FALSE       
 4 CDISC01-001-004 2024-01-25 Active 10mg FALSE       
 5 CDISC01-001-005 2024-03-14 Active 10mg FALSE       
 6 CDISC01-001-006 2024-04-09 Active 10mg FALSE       
 7 CDISC01-001-007 2024-01-18 Active 20mg FALSE       
 8 CDISC01-001-008 2024-02-18 Active 20mg FALSE       
 9 CDISC01-001-009 2024-06-20 Placebo     TRUE        
10 CDISC01-001-010 2024-06-30 Placebo     TRUE

# ---- AE Domain ----
ae_sample <- tibble(
  STUDYID = STUDYID,
  DOMAIN  = "AE",
  USUBJID = sample(subjects[1:8], 20, replace = TRUE),
  AETERM  = sample(c("Headache", "Nausea", "Fatigue", "Dizziness", "Rash"), 20, replace = TRUE),
  AESEV   = sample(c("MILD", "MODERATE", "SEVERE"), 20, replace = TRUE, prob = c(0.6, 0.3, 0.1)),
  AESER   = sample(c("Y", "N"), 20, replace = TRUE, prob = c(0.1, 0.9))
) %>%
  mutate(AEDECOD = AETERM)

# Add dates - some AEs span across or start after cutoff
ae_sample <- ae_sample %>%
  left_join(dm_sample %>% select(USUBJID, RFSTDTC), by = "USUBJID") %>%
  rowwise() %>%
  mutate(
    start_offset = sample(1:180, 1),
    ae_start = ymd(RFSTDTC) + start_offset,
    AESTDTC = as.character(ae_start),
    # Some AEs are ongoing (no end date), some end after cutoff
    ae_end_offset = sample(c(NA, 3, 7, 14, 30, 60, 90), 1),
    AEENDTC = if_else(!is.na(ae_end_offset),
                      as.character(ae_start + ae_end_offset),
                      NA_character_)
  ) %>%
  ungroup() %>%
  group_by(USUBJID) %>%
  mutate(AESEQ = row_number()) %>%
  ungroup() %>%
  select(STUDYID, DOMAIN, USUBJID, AESEQ, AETERM, AEDECOD, AESEV, AESER,
         AESTDTC, AEENDTC)

cat("AE domain:", nrow(ae_sample), "records\n\n")

AE domain: 20 records

# Show which AEs cross the cutoff boundary
ae_sample %>%
  mutate(
    START_VS_CUT = case_when(
      ymd(AESTDTC) > cutoff_date ~ "AFTER cutoff",
      ymd(AESTDTC) <= cutoff_date ~ "BEFORE cutoff",
      TRUE ~ "UNKNOWN"
    ),
    END_VS_CUT = case_when(
      is.na(AEENDTC) ~ "ONGOING",
      ymd(AEENDTC) > cutoff_date ~ "AFTER cutoff",
      ymd(AEENDTC) <= cutoff_date ~ "BEFORE cutoff",
      TRUE ~ "UNKNOWN"
    )
  ) %>%
  count(START_VS_CUT, END_VS_CUT, name = "Count")

# A tibble: 5 × 3
  START_VS_CUT  END_VS_CUT    Count
  <chr>         <chr>         <int>
1 AFTER cutoff  AFTER cutoff      9
2 AFTER cutoff  ONGOING           1
3 BEFORE cutoff AFTER cutoff      4
4 BEFORE cutoff BEFORE cutoff     5
5 BEFORE cutoff ONGOING           1

# ---- LB Domain (simplified) ----
visits <- tibble(
  VISITNUM = c(1, 2, 3, 4, 5),
  VISIT    = c("BASELINE", "WEEK 4", "WEEK 8", "WEEK 12", "WEEK 16")
)

lb_sample <- expand_grid(
  USUBJID = subjects[1:8],
  tibble(
    LBTESTCD = c("ALT", "CREAT", "GLUC"),
    LBTEST   = c("Alanine Aminotransferase", "Creatinine", "Glucose")
  ),
  visits
) %>%
  left_join(dm_sample %>% select(USUBJID, RFSTDTC), by = "USUBJID") %>%
  mutate(
    STUDYID  = STUDYID,
    DOMAIN   = "LB",
    visit_offset = case_when(
      VISITNUM == 1 ~ 0,
      VISITNUM == 2 ~ 28,
      VISITNUM == 3 ~ 56,
      VISITNUM == 4 ~ 84,
      VISITNUM == 5 ~ 112
    ),
    LBDTC = as.character(ymd(RFSTDTC) + visit_offset),
    LBSTRESN = round(rnorm(n(), 50, 15), 1),
    LBSTRESU = "U/L"
  ) %>%
  group_by(USUBJID) %>%
  mutate(LBSEQ = row_number()) %>%
  ungroup() %>%
  select(STUDYID, DOMAIN, USUBJID, LBSEQ, LBTESTCD, LBTEST,
         LBSTRESN, LBSTRESU, VISITNUM, VISIT, LBDTC)

cat("LB domain:", nrow(lb_sample), "records\n")

LB domain: 120 records

# How many LB records are after cutoff?
cat("LB records after cutoff:", 
    sum(ymd(lb_sample$LBDTC) > cutoff_date, na.rm = TRUE), "\n")

LB records after cutoff: 18

6 Applying Data Cuts Manually (Understanding the Logic)

Before using datacutr, let’s understand the logic by implementing cuts manually:

6.1 Patient-Level Cut

# ---- Patient-Level Cut ----
# Remove subjects whose first dose date is AFTER the cutoff

# Step 1: Identify subjects to keep
subjects_to_keep <- dm_sample %>%
  filter(ymd(RFSTDTC) <= cutoff_date) %>%
  pull(USUBJID)

cat("Patient-level cut:\n")

Patient-level cut:

cat("  Subjects before cut:", n_distinct(dm_sample$USUBJID), "\n")

  Subjects before cut: 10

cat("  Subjects after cut:", length(subjects_to_keep), "\n")

  Subjects after cut: 8

cat("  Subjects removed:", n_distinct(dm_sample$USUBJID) - length(subjects_to_keep), "\n\n")

  Subjects removed: 2

# Step 2: Apply to all domains
dm_cut <- dm_sample %>% filter(USUBJID %in% subjects_to_keep)
ae_patient_cut <- ae_sample %>% filter(USUBJID %in% subjects_to_keep)
lb_patient_cut <- lb_sample %>% filter(USUBJID %in% subjects_to_keep)

cat("Records after patient-level cut:\n")

Records after patient-level cut:

cat("  DM:", nrow(dm_cut), "\n")

  DM: 8

cat("  AE:", nrow(ae_patient_cut), "(from", nrow(ae_sample), ")\n")

  AE: 20 (from 20 )

cat("  LB:", nrow(lb_patient_cut), "(from", nrow(lb_sample), ")\n")

  LB: 120 (from 120 )

6.2 Record-Level Cut

# ---- Record-Level Cut ----
# Keep subjects, but remove individual records after cutoff

# AE: Remove AEs that STARTED after cutoff
ae_record_cut <- ae_patient_cut %>%
  filter(ymd(AESTDTC) <= cutoff_date)

cat("AE Record-level cut:\n")

AE Record-level cut:

cat("  Records before:", nrow(ae_patient_cut), "\n")

  Records before: 20

cat("  Records after:", nrow(ae_record_cut), "\n")

  Records after: 10

cat("  Records removed:", nrow(ae_patient_cut) - nrow(ae_record_cut), "\n\n")

  Records removed: 10

# LB: Remove lab results collected after cutoff
lb_record_cut <- lb_patient_cut %>%
  filter(ymd(LBDTC) <= cutoff_date)

cat("LB Record-level cut:\n")

LB Record-level cut:

cat("  Records before:", nrow(lb_patient_cut), "\n")

  Records before: 120

cat("  Records after:", nrow(lb_record_cut), "\n")

  Records after: 102

cat("  Records removed:", nrow(lb_patient_cut) - nrow(lb_record_cut), "\n")

  Records removed: 18

What About Ongoing AEs at Cutoff?

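An AE that started on or before the cutoff but was still ongoing at the cutoff (no end date, or an end date after the cutoff) needs special handling:

Keep the AE record (it started on or before the cutoff)
Blank out AEENDTC or truncate it to the cutoff date when it falls after the cutoff, depending on your convention
Flag it as “ongoing at data cutoff”

datacutr standardizes the flag-and-remove step across domains. Check its documentation for what it does and does not adjust on retained records (for example, end dates after the cutoff), and document whichever convention you apply.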

6.3 Handling Ongoing AEs at Cutoff

# Handle AEs that span the cutoff boundary
ae_final_cut <- ae_patient_cut %>%
  mutate(
    ae_start = ymd(AESTDTC),
    ae_end   = ymd(AEENDTC)
  ) %>%
  # Keep AEs that started on or before cutoff
  filter(ae_start <= cutoff_date) %>%
  mutate(
    # If AE ends AFTER cutoff, truncate the end date to cutoff
    AEENDTC_ORIG = AEENDTC,
    AEENDTC = case_when(
      # AE ends after cutoff → truncate
      !is.na(ae_end) & ae_end > cutoff_date ~ as.character(cutoff_date),
      # AE is ongoing → leave as is (still ongoing at cutoff)
      is.na(ae_end) ~ NA_character_,
      # AE ended before cutoff → keep original
      TRUE ~ AEENDTC
    ),
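    # Flags that mimic datacutr's temporary variable names. Note: in datacutr itself,
    # DCUT_TEMP_DTHCHANGE marks deaths after the cutoff in DM; here it is reused
    # only to mark AE records whose end date was truncated by the cut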
    DCUT_TEMP_REMOVE = "N",
    DCUT_TEMP_DTHCHANGE = if_else(
      !is.na(AEENDTC_ORIG) & AEENDTC != AEENDTC_ORIG,
      "Y", "N"
    )
  ) %>%
  select(-ae_start, -ae_end, -AEENDTC_ORIG)

cat("AE handling at cutoff boundary:\n")

AE handling at cutoff boundary:

cat("  Records kept:", nrow(ae_final_cut), "\n")

  Records kept: 10

cat("  Records with modified end dates:", 
    sum(ae_final_cut$DCUT_TEMP_DTHCHANGE == "Y", na.rm = TRUE), "\n")

  Records with modified end dates: 4
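
The code above truncates end dates but does not create the "ongoing at data cutoff" flag mentioned in the callout. One possible way to add it is sketched below in base R on toy data. The flag name `ONGO_AT_CUT` is hypothetical (it is not a CDISC or datacutr variable), and blanking the end date instead of truncating it is just one convention; follow your SAP.

```r
cutoff <- as.Date("2024-06-15")

ae_toy <- data.frame(
  AESEQ   = 1:4,
  AESTDTC = c("2024-05-01", "2024-05-10", "2024-06-01", "2024-06-20"),
  AEENDTC = c("2024-05-05", NA, "2024-07-01", "2024-06-25"),
  stringsAsFactors = FALSE
)

start <- as.Date(ae_toy$AESTDTC)
end   <- as.Date(ae_toy$AEENDTC)

# Keep AEs that started on or before the cutoff
keep   <- !is.na(start) & start <= cutoff
ae_cut <- ae_toy[keep, ]
end    <- end[keep]

# Ongoing at cutoff: no end date, or an end date that lies after the cutoff
ae_cut$ONGO_AT_CUT <- ifelse(is.na(end) | end > cutoff, "Y", "N")

# Convention assumed here: blank out end dates that were not yet known at the cutoff
ae_cut$AEENDTC[!is.na(end) & end > cutoff] <- NA

print(ae_cut)
```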

7 Using datacutr

Now let’s use datacutr to do this properly:

7.1 Step 1: Create the Datacut Dataset

# The datacut dataset defines per-patient cutoff dates
# In simple cases, all subjects have the same cutoff date
# In complex cases (e.g., rolling enrollment), dates may differ

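# date_cut() and special_dm_cut() need USUBJID and the cut variable (DCUTDTM) in dcut.
# DTHDTC is required in DM itself; RFSTDTC and DTHDTC are carried here for reference only
dcut <- dm_sample %>%
  select(USUBJID, RFSTDTC, DTHDTC) %>%
  mutate(
    DCUTDTC = as.character(cutoff_date),  # Same cutoff for all
    # End of the cutoff day, so records dated on the cutoff day are kept;
    # an explicit UTC time zone avoids shifts caused by the local time zone
    DCUTDTM = as.POSIXct(paste0(cutoff_date, " 23:59:59"), tz = "UTC")
  )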

cat("Datacut metadata:\n")

Datacut metadata:

print(dcut)

# A tibble: 10 × 5
   USUBJID         RFSTDTC    DTHDTC DCUTDTC    DCUTDTM            
   <chr>           <chr>      <chr>  <chr>      <dttm>             
 1 CDISC01-001-001 2024-02-18 <NA>   2024-06-15 2024-06-15 23:59:59
 2 CDISC01-001-002 2024-04-10 <NA>   2024-06-15 2024-06-15 23:59:59
 3 CDISC01-001-003 2024-03-05 <NA>   2024-06-15 2024-06-15 23:59:59
 4 CDISC01-001-004 2024-01-25 <NA>   2024-06-15 2024-06-15 23:59:59
 5 CDISC01-001-005 2024-03-14 <NA>   2024-06-15 2024-06-15 23:59:59
 6 CDISC01-001-006 2024-04-09 <NA>   2024-06-15 2024-06-15 23:59:59
 7 CDISC01-001-007 2024-01-18 <NA>   2024-06-15 2024-06-15 23:59:59
 8 CDISC01-001-008 2024-02-18 <NA>   2024-06-15 2024-06-15 23:59:59
 9 CDISC01-001-009 2024-06-20 <NA>   2024-06-15 2024-06-15 23:59:59
10 CDISC01-001-010 2024-06-30 <NA>   2024-06-15 2024-06-15 23:59:59
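
Why is DCUTDTM set to 23:59:59 rather than midnight? A record time-stamped during the cutoff day should normally be kept. The base R sketch below compares the two choices on hypothetical timestamps; UTC is assumed throughout so that the comparison does not depend on the local time zone.

```r
# Hypothetical assessment timestamps around the cutoff day (ISO 8601, as in --DTC)
dtc <- c("2024-06-14T09:00:00", "2024-06-15T00:00:00",
         "2024-06-15T14:30:00", "2024-06-16T00:00:00")
stamp <- as.POSIXct(dtc, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

cut_midnight   <- as.POSIXct("2024-06-15 00:00:00", tz = "UTC")
cut_end_of_day <- as.POSIXct("2024-06-15 23:59:59", tz = "UTC")

boundary <- data.frame(
  DTC               = dtc,
  kept_midnight_cut = stamp <= cut_midnight,
  kept_end_of_day   = stamp <= cut_end_of_day,
  stringsAsFactors  = FALSE
)
print(boundary)
```

With a midnight cut, the 14:30 record on the cutoff day is dropped; with the end-of-day cut it is kept, which matches the "on or before the cutoff date" rule used in the manual cuts.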

7.2 Step 2: Define the Cut Strategy

# datacutr uses a two-step process:
# 1. date_cut() flags records (adds DCUT_TEMP_REMOVE)
# 2. apply_cut() removes flagged records

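# For DM: special_dm_cut() flags subjects who are absent from the dcut dataset
# and flags deaths after the cutoff; apply_cut() then acts on those flags
cat("Cut strategy:\n")

Cut strategy:

cat("  DM: Patient-level cut via special_dm_cut() (subjects not in dcut are flagged)\n")

  DM: Patient-level cut via special_dm_cut() (subjects not in dcut are flagged)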

cat("  AE: Record-level cut on AESTDTC (AE start date)\n")

  AE: Record-level cut on AESTDTC (AE start date)

cat("  LB: Record-level cut on LBDTC (lab collection date)\n\n")

  LB: Record-level cut on LBDTC (lab collection date)
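
A cut strategy like the one printed above can also be stored as data rather than as comments, so that one loop handles every domain. The base R sketch below uses hypothetical names (`cut_plan`, `apply_plan()`) and toy data. It illustrates the idea only and is not how datacutr stores its configuration.

```r
cutoff <- as.Date("2024-06-15")

# Toy domains
domains <- list(
  DM = data.frame(USUBJID = c("S1", "S2"),
                  RFSTDTC = c("2024-02-01", "2024-06-20"),
                  stringsAsFactors = FALSE),
  AE = data.frame(USUBJID = c("S1", "S1", "S2"),
                  AESTDTC = c("2024-03-01", "2024-07-01", "2024-06-25"),
                  stringsAsFactors = FALSE),
  LB = data.frame(USUBJID = c("S1", "S1"),
                  LBDTC   = c("2024-02-01", "2024-06-16"),
                  stringsAsFactors = FALSE)
)

# One row per domain: which kind of cut, and on which date variable
cut_plan <- data.frame(
  domain   = c("DM", "AE", "LB"),
  cut_type = c("patient", "record", "record"),
  date_var = c("RFSTDTC", "AESTDTC", "LBDTC"),
  stringsAsFactors = FALSE
)

# Subjects who pass the patient-level cut (reference start on or before cutoff)
eligible <- with(domains$DM, USUBJID[as.Date(RFSTDTC) <= cutoff])

apply_plan <- function(domains, plan, eligible, cutoff) {
  out <- lapply(seq_len(nrow(plan)), function(i) {
    d <- domains[[plan$domain[i]]]
    d <- d[d$USUBJID %in% eligible, , drop = FALSE]  # patient-level step, every domain
    if (plan$cut_type[i] == "record") {
      dates <- as.Date(d[[plan$date_var[i]]])
      d <- d[is.na(dates) | dates <= cutoff, , drop = FALSE]  # undated records are kept
    }
    d
  })
  names(out) <- plan$domain
  out
}

cut_domains <- apply_plan(domains, cut_plan, eligible, cutoff)
print(sapply(cut_domains, nrow))
```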

7.3 Step 3: Apply the Cut Using datacutr Functions

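# ---- Step 3a: DM Domain ----
# special_dm_cut() flags DM records rather than deleting them: subjects absent
# from dcut get DCUT_TEMP_REMOVE = "Y", and deaths after the cutoff get
# DCUT_TEMP_DTHCHANGE = "Y". It does not compare RFSTDTC with the cutoff.
# Because dcut above lists all 10 subjects, nobody is flagged here, so DM keeps
# subjects 009 and 010 (unlike the manual cut in 6.1).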
dm_after_cut <- special_dm_cut(
  dataset_dm = dm_sample,
  dataset_cut = dcut
)

[1] "At least 1 patient with missing datacut date, all records will be kept."

cat("DM after cut:", nrow(dm_after_cut), "subjects (from", nrow(dm_sample), ")\n")

DM after cut: 10 subjects (from 10 )
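
As far as the datacutr documentation describes it, the DM step flags a subject for removal when the subject is absent from the datacut dataset, and flags a death date after the cutoff so that it can be cleared. The base R sketch below imitates that logic on toy data to show why restricting the datacut dataset to eligible subjects matters. The helper `flag_dm()` and the column `DCUTDT` are hypothetical, and this is not datacutr's code.

```r
cutoff <- as.Date("2024-06-15")

dm_toy <- data.frame(
  USUBJID = c("S1", "S2", "S3"),
  RFSTDTC = c("2024-02-01", "2024-03-01", "2024-06-20"),
  DTHDTC  = c(NA, "2024-07-02", NA),
  stringsAsFactors = FALSE
)

# The datacut dataset holds only subjects who qualify for this cut
# (assumption for this sketch: reference start on or before the cutoff)
dcut_toy <- data.frame(
  USUBJID = dm_toy$USUBJID[as.Date(dm_toy$RFSTDTC) <= cutoff],
  DCUTDT  = cutoff,
  stringsAsFactors = FALSE
)

flag_dm <- function(dm, dcut) {
  cut_dt <- dcut$DCUTDT[match(dm$USUBJID, dcut$USUBJID)]  # NA when subject not in dcut
  death  <- as.Date(dm$DTHDTC)
  dm$DCUT_TEMP_REMOVE    <- ifelse(is.na(cut_dt), "Y", NA_character_)
  dm$DCUT_TEMP_DTHCHANGE <- ifelse(!is.na(death) & !is.na(cut_dt) & death > cut_dt,
                                   "Y", NA_character_)
  dm
}

dm_flagged <- flag_dm(dm_toy, dcut_toy)

# Apply: drop flagged subjects, clear deaths that happened after the cutoff
dm_cut <- dm_flagged[is.na(dm_flagged$DCUT_TEMP_REMOVE), ]
dm_cut$DTHDTC[dm_cut$DCUT_TEMP_DTHCHANGE %in% "Y"] <- NA
dm_cut <- dm_cut[, c("USUBJID", "RFSTDTC", "DTHDTC")]
print(dm_cut)
```

Subject S3 is removed only because it is missing from `dcut_toy`. If the datacut dataset listed every subject, no one would be flagged.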

# ---- Step 3b: AE Domain ----
# date_cut handles record-level cuts and adds DCUT_TEMP_REMOVE flag
# First, we need to know which subjects passed the DM cut
subjects_after_dm_cut <- dm_after_cut$USUBJID

ae_for_cut <- ae_sample %>%
  filter(USUBJID %in% subjects_after_dm_cut)

ae_after_cut_temp <- date_cut(
  dataset_sdtm = ae_for_cut,
  sdtm_date_var = AESTDTC,
  dataset_cut = dcut,
  cut_var = DCUTDTM
)

[1] "At least 1 patient with missing datacut date, all records will be kept."

# Apply the cut using apply_cut which removes flagged records
ae_after_cut <- apply_cut(
  dsin = ae_after_cut_temp,
  dcutvar = DCUT_TEMP_REMOVE,
  dthchangevar = DCUT_TEMP_DTHCHANGE
)

cat("AE after cut:", nrow(ae_after_cut), "records (from", nrow(ae_sample), ")\n")

AE after cut: 10 records (from 20 )

# ---- Step 3c: LB Domain ----
lb_for_cut <- lb_sample %>%
  filter(USUBJID %in% subjects_after_dm_cut)

lb_after_cut_temp <- date_cut(
  dataset_sdtm = lb_for_cut,
  sdtm_date_var = LBDTC,
  dataset_cut = dcut,
  cut_var = DCUTDTM
)

[1] "At least 1 patient with missing datacut date, all records will be kept."

# Apply the cut using apply_cut which removes flagged records
lb_after_cut <- apply_cut(
  dsin = lb_after_cut_temp,
  dcutvar = DCUT_TEMP_REMOVE,
  dthchangevar = DCUT_TEMP_DTHCHANGE
)

cat("LB after cut:", nrow(lb_after_cut), "records (from", nrow(lb_sample), ")\n")

LB after cut: 102 records (from 120 )

7.4 Step 4: Summary of Data Cut

cat("=== DATA CUT SUMMARY ===\n")

=== DATA CUT SUMMARY ===

cat("Cutoff date:", as.character(cutoff_date), "\n\n")

Cutoff date: 2024-06-15

summary_table <- tibble::tribble(
  ~Domain, ~Before_Cut, ~After_Cut, ~Removed,
  "DM", nrow(dm_sample), nrow(dm_after_cut), nrow(dm_sample) - nrow(dm_after_cut),
  "AE", nrow(ae_sample), nrow(ae_after_cut), nrow(ae_sample) - nrow(ae_after_cut),
  "LB", nrow(lb_sample), nrow(lb_after_cut), nrow(lb_sample) - nrow(lb_after_cut)
)

print(summary_table)

# A tibble: 3 × 4
  Domain Before_Cut After_Cut Removed
  <chr>       <int>     <int>   <int>
1 DM             10        10       0
2 AE             20        10      10
3 LB            120       102      18

8 Real-World Data Cut Considerations

8.1 Partial Dates

# In real data, you'll encounter partial dates
cat("Common partial date scenarios:\n\n")

Common partial date scenarios:

partial_examples <- tibble::tribble(
  ~AESTDTC,       ~Problem,                          ~Solution,
  "2024-06",       "Day missing",                    "Impute to 1st or last of month",
  "2024",          "Month and day missing",          "Impute based on SAP rules",
  "",              "Completely missing",             "Flag for medical review",
  "2024-06-15",    "Complete date",                  "No imputation needed",
  "2024-06-UN",    "Day unknown (UN notation)",      "Impute per convention"
)

print(partial_examples)

# A tibble: 5 × 3
  AESTDTC      Problem                   Solution                      
  <chr>        <chr>                     <chr>                         
1 "2024-06"    Day missing               Impute to 1st or last of month
2 "2024"       Month and day missing     Impute based on SAP rules     
3 ""           Completely missing        Flag for medical review       
4 "2024-06-15" Complete date             No imputation needed          
5 "2024-06-UN" Day unknown (UN notation) Impute per convention

datacutr and Partial Dates

The datacutr::impute_sdtm() function imputes partial dates before the cut is applied, completing missing components with the earliest possible value (for example, a missing day becomes the 1st). Common imputation strategies:

Earliest possible date (conservative for safety): Impute missing day to the 1st of the month - this maximizes the chance of including the record
Latest possible date: Impute missing day to the last of the month - this makes exclusion more likely, and whether that is conservative depends on the analysis
SAP-driven: Always follow the Statistical Analysis Plan’s imputation rules
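
To make the two imputation directions concrete, here is a base R sketch that completes partial ISO 8601 dates to the earliest or latest possible day and then checks them against the cutoff. The helpers `impute_partial()` and `next_month_start()` are hypothetical and handle only `YYYY`, `YYYY-MM`, and `YYYY-MM-DD`. This is not a reimplementation of impute_sdtm(); values it cannot parse come back as NA for manual review.

```r
# First day of the month after d (vectorised over a Date vector)
next_month_start <- function(d) {
  y <- as.integer(format(d, "%Y"))
  m <- as.integer(format(d, "%m"))
  as.Date(sprintf("%04d-%02d-01",
                  ifelse(m == 12L, y + 1L, y),
                  ifelse(m == 12L, 1L, m + 1L)),
          format = "%Y-%m-%d")
}

impute_partial <- function(dtc, direction = c("earliest", "latest")) {
  direction <- match.arg(direction)
  out <- rep(as.Date(NA), length(dtc))

  is_full  <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", dtc)
  is_month <- grepl("^[0-9]{4}-[0-9]{2}$", dtc)
  is_year  <- grepl("^[0-9]{4}$", dtc)

  out[is_full] <- as.Date(dtc[is_full], format = "%Y-%m-%d")
  month_start  <- as.Date(paste0(dtc[is_month], "-01"), format = "%Y-%m-%d")

  if (direction == "earliest") {
    out[is_month] <- month_start
    out[is_year]  <- as.Date(paste0(dtc[is_year], "-01-01"), format = "%Y-%m-%d")
  } else {
    # Last day of a month = first day of the following month minus one day
    out[is_month] <- next_month_start(month_start) - 1
    out[is_year]  <- as.Date(paste0(dtc[is_year], "-12-31"), format = "%Y-%m-%d")
  }
  out
}

cutoff  <- as.Date("2024-06-15")
aestdtc <- c("2024-06", "2024", "", "2024-06-15", "2024-06-UN")

imputed <- data.frame(
  AESTDTC  = aestdtc,
  earliest = impute_partial(aestdtc, "earliest"),
  latest   = impute_partial(aestdtc, "latest"),
  stringsAsFactors = FALSE
)
# Would the record survive the cut under each rule? (NA = needs review)
imputed$kept_earliest <- imputed$earliest <= cutoff
imputed$kept_latest   <- imputed$latest <= cutoff
print(imputed)
```

The partial date "2024-06" is kept under earliest-date imputation and removed under latest-date imputation, which is why the direction has to be fixed in the SAP before the cut is run.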

8.2 Multiple Cutoff Dates

# In some studies, cutoff dates differ by subject (rolling enrollment)
# or by analysis (interim vs final)

cat("Example: Different cutoffs per analysis:\n\n")

Example: Different cutoffs per analysis:

analysis_cuts <- tibble::tribble(
  ~Analysis,          ~Cutoff_Date,     ~Purpose,
  "DSMB Review 1",    "2024-03-31",     "Safety review at 50% enrollment",
  "Interim Analysis",  "2024-06-15",     "Pre-specified interim for efficacy",
  "Final Analysis",    "2024-12-31",     "Primary analysis for submission"
)

print(analysis_cuts)

# A tibble: 3 × 3
  Analysis         Cutoff_Date Purpose                           
  <chr>            <chr>       <chr>                             
1 DSMB Review 1    2024-03-31  Safety review at 50% enrollment   
2 Interim Analysis 2024-06-15  Pre-specified interim for efficacy
3 Final Analysis   2024-12-31  Primary analysis for submission
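
When several analyses share one study, each cut should be derived from the same uncut source rather than from a previous cut. The base R sketch below applies the three cutoff dates from the table to a toy LB dataset; the data and the column name `Records_Kept` are made up for illustration.

```r
analysis_cuts <- data.frame(
  Analysis    = c("DSMB Review 1", "Interim Analysis", "Final Analysis"),
  Cutoff_Date = as.Date(c("2024-03-31", "2024-06-15", "2024-12-31")),
  stringsAsFactors = FALSE
)

# Toy lab collection dates (uncut source)
lb_toy <- data.frame(
  USUBJID = c("S1", "S1", "S1", "S2", "S2"),
  LBDTC   = c("2024-02-01", "2024-05-01", "2024-09-01", "2024-03-31", "2024-07-15"),
  stringsAsFactors = FALSE
)

# One cut dataset per analysis, each derived from the same uncut source
lb_by_analysis <- lapply(seq_len(nrow(analysis_cuts)), function(i) {
  lb_toy[as.Date(lb_toy$LBDTC) <= analysis_cuts$Cutoff_Date[i], , drop = FALSE]
})
names(lb_by_analysis) <- analysis_cuts$Analysis

analysis_cuts$Records_Kept <- unname(vapply(lb_by_analysis, nrow, integer(1)))
print(analysis_cuts)
```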

9 Best Practices for Data Cuts

9.1 DO

cat("=== DATA CUT BEST PRACTICES ===\n\n")

=== DATA CUT BEST PRACTICES ===

best_practices <- tibble::tribble(
  ~Rule,                                              ~Why,
  "Document the cutoff date in TS domain",            "Traceability for regulators",
  "Apply cuts BEFORE creating ADaM datasets",         "ADaM should be based on cut data",
  "Keep both pre-cut and post-cut datasets",          "Audit trail and reproducibility",
  "Handle ongoing events explicitly",                 "AEs without end dates need special logic",
  "Use datacutr for consistency",                     "Avoids manual errors across domains",
  "Log all records removed by the cut",               "Quality control and transparency",
  "Validate record counts before and after",          "Ensure no data is accidentally lost"
)

print(best_practices)

# A tibble: 7 × 2
  Rule                                     Why                                  
  <chr>                                    <chr>                                
1 Document the cutoff date in TS domain    Traceability for regulators          
2 Apply cuts BEFORE creating ADaM datasets ADaM should be based on cut data     
3 Keep both pre-cut and post-cut datasets  Audit trail and reproducibility      
4 Handle ongoing events explicitly         AEs without end dates need special l…
5 Use datacutr for consistency             Avoids manual errors across domains  
6 Log all records removed by the cut       Quality control and transparency     
7 Validate record counts before and after  Ensure no data is accidentally lost
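
Two of the rules above (logging removed records and validating counts) are easy to automate. The base R sketch below shows one possible shape for a removal log plus a reconciliation check. The structure of `removal_log` and the `REASON` text are assumptions for illustration, not a datacutr output format.

```r
cutoff <- as.Date("2024-06-15")

ae_toy <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S3"),
  AESEQ   = c(1, 2, 1, 1),
  AESTDTC = c("2024-05-01", "2024-07-01", "2024-06-15", "2024-06-16"),
  stringsAsFactors = FALSE
)

start     <- as.Date(ae_toy$AESTDTC)
after_cut <- !is.na(start) & start > cutoff

ae_kept <- ae_toy[!after_cut, ]

# One log row per removed record, with enough keys to trace it back
removal_log <- data.frame(
  DOMAIN = "AE",
  ae_toy[after_cut, c("USUBJID", "AESEQ", "AESTDTC")],
  REASON = paste("AESTDTC after cutoff", format(cutoff)),
  stringsAsFactors = FALSE
)

# Reconcile: every input record must be either kept or logged
stopifnot(nrow(ae_kept) + nrow(removal_log) == nrow(ae_toy))
print(removal_log)
```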

9.2 The Data Cut in the Submission Package

cat("Where the data cut fits in the submission process:\n\n")

Where the data cut fits in the submission process:

cat("1. Raw data collected continuously\n")

1. Raw data collected continuously

cat("2. DATABASE LOCK (data cleaning complete)\n")

2. DATABASE LOCK (data cleaning complete)

cat("3. SDTM datasets created (uncut)\n")

3. SDTM datasets created (uncut)

cat("4. DATA CUT applied to the SDTM datasets (datacutr)\n")

4. DATA CUT applied to the SDTM datasets (datacutr)

cat("5. ADaM datasets derived from SDTM\n")

5. ADaM datasets derived from SDTM

cat("6. TFLs (Tables, Figures, Listings) generated from ADaM\n")

6. TFLs (Tables, Figures, Listings) generated from ADaM

cat("7. Clinical Study Report written\n")

7. Clinical Study Report written

cat("8. Submission package assembled (eCTD)\n\n")

8. Submission package assembled (eCTD)

cat("The cutoff date is recorded in:\n")

The cutoff date is recorded in:

cat("  - TS domain (TSPARMCD = 'DCUTDTC')\n")

  - TS domain (TSPARMCD = 'DCUTDTC')

cat("  - Study report\n")

  - Study report

cat("  - define.xml\n")

  - define.xml
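
As a rough illustration of the TS entry, the base R sketch below builds the trial summary rows for the cutoff. `DCUTDTC` comes from the text above; the companion `DCUTDESC` row, the `TSPARM` labels, and the description value are assumptions that should be checked against the current CDISC controlled terminology before use.

```r
cutoff_date <- as.Date("2024-06-15")

ts_dcut <- data.frame(
  STUDYID  = "CDISC01",
  DOMAIN   = "TS",
  TSSEQ    = 1L,
  TSPARMCD = c("DCUTDTC", "DCUTDESC"),
  TSPARM   = c("Data Cutoff Date", "Data Cutoff Description"),
  TSVAL    = c(format(cutoff_date, "%Y-%m-%d"), "INTERIM ANALYSIS"),  # description is made up
  stringsAsFactors = FALSE
)

# Basic check: the cutoff value is a complete ISO 8601 date
dcut_val <- ts_dcut$TSVAL[ts_dcut$TSPARMCD == "DCUTDTC"]
stopifnot(grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", dcut_val))

print(ts_dcut)
```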

10 Deliverable Summary

Today you completed the following:

Task	Status
Understood what data cuts are and why they matter	✓ Done
Implemented patient-level and record-level cuts manually	✓ Done
Handled ongoing AEs at the cutoff boundary	✓ Done
Used datacutr functions: special_dm_cut(), date_cut(), apply_cut()	✓ Done
Learned about partial date imputation	✓ Done
Reviewed best practices for data cuts	✓ Done

11 Key Takeaways

Data cuts freeze the data - Only data on or before the cutoff is included in the analysis
Patient-level cuts exclude entire subjects; record-level cuts exclude individual records
Ongoing events need special handling - AEs that span the cutoff require careful logic
datacutr standardizes the process - Avoids manual errors and ensures reproducibility
Cuts happen BEFORE ADaM - datacutr cuts the SDTM datasets, and the cut SDTM data is the foundation for all downstream datasets
Document everything - The cutoff date must be recorded in TS and the study report

12 Resources

datacutr Documentation - Official datacutr package documentation
datacutr GitHub - Source code and examples
CDISC Implementation Guide - Data Cuts - SDTM guidance
Pharmaverse.org - R packages for clinical data
ICH E9(R1) - Estimands and data handling

13 What’s Next?

In Day 13, we will focus on SDTM Validation with sdtmchecks:

Why validation is essential before creating ADaM datasets
Running FDA business rules against your SDTM domains
Understanding validation reports and resolving findings
Common SDTM issues and how to fix them