30 Days of Pharmaverse
  • Week 1: SDTM Fundamentals
  • Week 2: Production SDTM
  • Week 3: ADaM Deep Dive
  • Week 4: Tables, Listings and Figures
  1. Day 28: Tplyr - Declarative Clinical Table Programming
  • Day 22: Demography Table with gtsummary + gt
  • Day 23: ADCM and ADRS - Concomitant Meds and Oncology Response
  • Day 24: ARD-First Reporting with cards and cardx
  • Day 25: gtsummary and tfrmt - ARD-Backed Production Tables
  • Day 26: flextable and officer - Word and RTF Clinical Tables
  • Day 27: rtables, tern, and r2rtf - Structured Clinical Tables
  • Day 28: Tplyr - Declarative Clinical Table Programming
  • Day 29: ggsurvfit + gtsummary - Survival Plots and Clinical Figures
  • Day 30: Capstone - Full Clinical Reporting Workflow

On this page

  • 1 Overview
  • 2 Setup
  • 3 Part 1: Core Tplyr Concepts
    • 3.1 Step 1: Single-layer count table
    • 3.2 Step 2: Descriptive statistics layer
  • 4 Part 2: Demographics Table
    • 4.1 Step 3: Multi-layer demographics table
  • 5 Part 3: AE Frequency Table
    • 5.1 Step 4: AE table with subject-level denominators
    • 5.2 Step 5: Apply a frequency threshold
  • 6 Part 4: Shift Table
    • 6.1 Step 6: Lab shift table with group_shift()
  • 7 Part 5: Extracting Numeric Data
    • 7.1 Step 7: get_numeric_data() for traceability
  • 8 Part 6: Column Nesting
    • 8.1 Step 8: cols argument for nested column splits
  • 9 Validation Checks
  • 10 Key Takeaways
  • 11 Resources

Day 28: Tplyr - Declarative Clinical Table Programming

Grammar of clinical data summaries

Back to Roadmap

1 Overview

Tplyr provides a grammar-of-tables approach to clinical data summaries. A table is built by stacking independent layers. When build() is called, all layers execute and assemble into a single data frame ready for downstream display or RTF export.

Concept Tplyr rtables / tern
Entry point tplyr_table() basic_table()
Analysis unit layer (add_layer()) analyze() / split_rows_by()
Execution build() returns a data frame build_table() returns an rtable
Sorting dplyr post-build() pre-ordered factors
Output data frame -> any renderer rtable -> r2rtf

2 Setup

library(Tplyr)
library(pharmaverseadam)
library(dplyr)
library(tidyr)
library(knitr)

adsl <- pharmaverseadam::adsl
adae  <- pharmaverseadam::adae
adlb  <- pharmaverseadam::adlb

adsl_safe <- adsl |>
  dplyr::filter(SAFFL == "Y") |>
  dplyr::mutate(
    TRT01A = factor(TRT01A),
    SEX    = factor(SEX),
    RACE   = factor(RACE)
  )

adae_safe <- adae |>
  dplyr::filter(SAFFL == "Y", TRTEMFL == "Y")

adlb_alt <- adlb |>
  dplyr::filter(SAFFL == "Y", PARAMCD == "ALT",
                ANL01FL == "Y", !is.na(BNRIND), !is.na(ANRIND)) |>
  dplyr::mutate(
    BNRIND = factor(BNRIND, levels = c('LOW', 'NORMAL', 'HIGH')),
    ANRIND = factor(ANRIND, levels = c('LOW', 'NORMAL', 'HIGH'))
  )

cat("ADSL safety N:", nrow(adsl_safe), "\n")
ADSL safety N: 254 
cat("Treatment arms:", levels(adsl_safe$TRT01A), "\n")
Treatment arms: Placebo Xanomeline High Dose Xanomeline Low Dose 
cat("ADAE TEAE records:", nrow(adae_safe), "\n")
ADAE TEAE records: 1122 
cat("ADLB ALT records:", nrow(adlb_alt), "\n")
ADLB ALT records: 2094 

3 Part 1: Core Tplyr Concepts

3.1 Step 1: Single-layer count table

t_sex <- tplyr_table(adsl_safe, TRT01A) |>
  add_total_group() |>
  add_layer(
    group_count(SEX, by = 'Sex') |>
      set_format_strings(f_str('xx (xx.x%)', n, pct))
  )
build(t_sex) |>
  dplyr::select(-dplyr::starts_with('ord')) |>
  knitr::kable()
row_label1 row_label2 var1_Placebo var1_Total var1_Xanomeline High Dose var1_Xanomeline Low Dose
Sex F 53 (61.6%) 143 (56.3%) 35 (48.6%) 55 (57.3%)
Sex M 33 (38.4%) 111 (43.7%) 37 (51.4%) 41 (42.7%)

3.2 Step 2: Descriptive statistics layer

# Named entries in set_format_strings() become row labels.
# f_str() maps positional args to statistic keywords: mean, sd, median, q1, q3, min, max
t_age <- tplyr_table(adsl_safe, TRT01A) |>
  add_total_group() |>
  add_layer(
    group_desc(AGE, by = 'Age (years)') |>
      set_format_strings(
        'n'           = f_str('xx', n),
        'Mean (SD)'   = f_str('xx.x (xx.xx)', mean, sd),
        'Median'      = f_str('xx.x', median),
        'Q1, Q3'      = f_str('xx.x, xx.x', q1, q3),
        'Min, Max'    = f_str('xx.x, xx.x', min, max)
      )
  )
build(t_age) |>
  dplyr::select(-dplyr::starts_with('ord')) |>
  knitr::kable()
row_label1 row_label2 var1_Placebo var1_Xanomeline High Dose var1_Xanomeline Low Dose var1_Total
Age (years) n 86 72 96 254
Age (years) Mean (SD) 75.2 ( 8.59) 73.8 ( 7.94) 76.0 ( 8.11) 75.1 ( 8.25)
Age (years) Median 76.0 75.5 78.0 77.0
Age (years) Q1, Q3 69.2, 81.8 70.5, 79.0 71.0, 82.0 70.0, 81.0
Age (years) Min, Max 52.0, 89.0 56.0, 88.0 51.0, 88.0 51.0, 89.0

4 Part 2: Demographics Table

4.1 Step 3: Multi-layer demographics table

t_demo <- tplyr_table(adsl_safe, TRT01A) |>
  add_total_group() |>
  add_layer(
    group_desc(AGE, by = 'Age (years)') |>
      set_format_strings(
        'n'         = f_str('xx', n),
        'Mean (SD)' = f_str('xx.x (xx.xx)', mean, sd),
        'Median'    = f_str('xx.x', median),
        'Min, Max'  = f_str('xx.x, xx.x', min, max)
      )
  ) |>
  add_layer(
    group_count(SEX, by = 'Sex') |>
      set_format_strings(f_str('xx (xx.x%)', n, pct))
  ) |>
  add_layer(
    group_count(RACE, by = 'Race') |>
      set_format_strings(f_str('xx (xx.x%)', n, pct))
  ) |>
  add_layer(
    group_count(AGEGR1, by = 'Age Group') |>
      set_format_strings(f_str('xx (xx.x%)', n, pct))
  )
demo_result <- build(t_demo) |>
  dplyr::select(-dplyr::starts_with('ord'))
knitr::kable(demo_result)
row_label1 row_label2 var1_Placebo var1_Xanomeline High Dose var1_Xanomeline Low Dose var1_Total
Age (years) n 86 72 96 254
Age (years) Mean (SD) 75.2 ( 8.59) 73.8 ( 7.94) 76.0 ( 8.11) 75.1 ( 8.25)
Age (years) Median 76.0 75.5 78.0 77.0
Age (years) Min, Max 52.0, 89.0 56.0, 88.0 51.0, 88.0 51.0, 89.0
Sex F 53 (61.6%) 35 (48.6%) 55 (57.3%) 143 (56.3%)
Sex M 33 (38.4%) 37 (51.4%) 41 (42.7%) 111 (43.7%)
Race AMERICAN INDIAN OR ALASKA NATIVE 0 ( 0.0%) 1 ( 1.4%) 0 ( 0.0%) 1 ( 0.4%)
Race BLACK OR AFRICAN AMERICAN 8 ( 9.3%) 9 (12.5%) 6 ( 6.2%) 23 ( 9.1%)
Race WHITE 78 (90.7%) 62 (86.1%) 90 (93.8%) 230 (90.6%)
Age Group >64 72 (83.7%) 61 (84.7%) 88 (91.7%) 221 (87.0%)
Age Group 18-64 14 (16.3%) 11 (15.3%) 8 ( 8.3%) 33 (13.0%)

5 Part 3: AE Frequency Table

5.1 Step 4: AE table with subject-level denominators

# Required trio for subject-level AE denominators:
#   set_pop_data() / set_pop_treat_var() / set_distinct_by(USUBJID)
# Sort by frequency AFTER build() using dplyr on the Total column.
t_ae <- tplyr_table(adae_safe, TRT01A) |>
  set_pop_data(adsl_safe) |>
  set_pop_treat_var(TRT01A) |>
  add_total_group() |>
  add_layer(
    group_count(AEDECOD, by = vars(AEBODSYS)) |>
      set_distinct_by(USUBJID) |>
      set_format_strings(f_str('xx (xx.x%)', distinct_n, distinct_pct))
  )
ae_result_raw <- build(t_ae) |>
  dplyr::select(-dplyr::starts_with('ord'))
total_col <- grep('Total', names(ae_result_raw), value = TRUE)[1]
ae_result <- ae_result_raw |>
  dplyr::mutate(
    .sort_n = suppressWarnings(
      as.integer(trimws(sub('\\s*\\(.*', '', .data[[total_col]])))
    )
  ) |>
  dplyr::arrange(row_label1, dplyr::desc(.sort_n)) |>
  dplyr::select(-.sort_n)
cat("AE table rows:", nrow(ae_result), "\n")
AE table rows: 5290 
knitr::kable(head(ae_result, 20))
row_label1 row_label2 var1_Placebo var1_Total var1_Xanomeline High Dose var1_Xanomeline Low Dose
CARDIAC DISORDERS SINUS BRADYCARDIA 2 ( 2.3%) 17 ( 6.7%) 8 (11.1%) 7 ( 7.3%)
CARDIAC DISORDERS MYOCARDIAL INFARCTION 4 ( 4.7%) 10 ( 3.9%) 4 ( 5.6%) 2 ( 2.1%)
CARDIAC DISORDERS ATRIAL FIBRILLATION 1 ( 1.2%) 5 ( 2.0%) 2 ( 2.8%) 2 ( 2.1%)
CARDIAC DISORDERS SUPRAVENTRICULAR EXTRASYSTOLES 1 ( 1.2%) 3 ( 1.2%) 1 ( 1.4%) 1 ( 1.0%)
CARDIAC DISORDERS VENTRICULAR EXTRASYSTOLES 0 ( 0.0%) 3 ( 1.2%) 1 ( 1.4%) 2 ( 2.1%)
CARDIAC DISORDERS ATRIAL FLUTTER 0 ( 0.0%) 2 ( 0.8%) 1 ( 1.4%) 1 ( 1.0%)
CARDIAC DISORDERS ATRIOVENTRICULAR BLOCK FIRST DEGREE 1 ( 1.2%) 2 ( 0.8%) 0 ( 0.0%) 1 ( 1.0%)
CARDIAC DISORDERS BUNDLE BRANCH BLOCK RIGHT 1 ( 1.2%) 2 ( 0.8%) 0 ( 0.0%) 1 ( 1.0%)
CARDIAC DISORDERS PALPITATIONS 0 ( 0.0%) 2 ( 0.8%) 0 ( 0.0%) 2 ( 2.1%)
CARDIAC DISORDERS ATRIAL HYPERTROPHY 1 ( 1.2%) 1 ( 0.4%) 0 ( 0.0%) 0 ( 0.0%)
CARDIAC DISORDERS ATRIOVENTRICULAR BLOCK SECOND DEGREE 1 ( 1.2%) 1 ( 0.4%) 0 ( 0.0%) 0 ( 0.0%)
CARDIAC DISORDERS BRADYCARDIA 1 ( 1.2%) 1 ( 0.4%) 0 ( 0.0%) 0 ( 0.0%)
CARDIAC DISORDERS BUNDLE BRANCH BLOCK LEFT 1 ( 1.2%) 1 ( 0.4%) 0 ( 0.0%) 0 ( 0.0%)
CARDIAC DISORDERS CARDIAC DISORDER 0 ( 0.0%) 1 ( 0.4%) 1 ( 1.4%) 0 ( 0.0%)
CARDIAC DISORDERS CARDIAC FAILURE CONGESTIVE 1 ( 1.2%) 1 ( 0.4%) 0 ( 0.0%) 0 ( 0.0%)
CARDIAC DISORDERS SINUS ARRHYTHMIA 1 ( 1.2%) 1 ( 0.4%) 0 ( 0.0%) 0 ( 0.0%)
CARDIAC DISORDERS SUPRAVENTRICULAR TACHYCARDIA 0 ( 0.0%) 1 ( 0.4%) 0 ( 0.0%) 1 ( 1.0%)
CARDIAC DISORDERS TACHYCARDIA 1 ( 1.2%) 1 ( 0.4%) 0 ( 0.0%) 0 ( 0.0%)
CARDIAC DISORDERS VENTRICULAR HYPERTROPHY 1 ( 1.2%) 1 ( 0.4%) 0 ( 0.0%) 0 ( 0.0%)
CARDIAC DISORDERS WOLFF-PARKINSON-WHITE SYNDROME 0 ( 0.0%) 1 ( 0.4%) 0 ( 0.0%) 1 ( 1.0%)

5.2 Step 5: Apply a frequency threshold

arm_cols <- grep('^var1_', names(ae_result), value = TRUE)
arm_cols <- arm_cols[!grepl('Total', arm_cols, ignore.case = TRUE)]
arm1_col <- arm_cols[1]
cat("First arm column:", arm1_col, "\n")
First arm column: var1_Placebo 
ae_filtered <- ae_result |>
  dplyr::filter(
    !is.na(row_label2),
    suppressWarnings(
      as.integer(trimws(sub('\\s*\\(.*', '', .data[[arm1_col]]))) >= 5
    )
  )
cat("PT rows with n >= 5:", nrow(ae_filtered), "\n")
PT rows with n >= 5: 7 
knitr::kable(ae_filtered)
row_label1 row_label2 var1_Placebo var1_Total var1_Xanomeline High Dose var1_Xanomeline Low Dose
GASTROINTESTINAL DISORDERS DIARRHOEA 9 (10.5%) 17 ( 6.7%) 3 ( 4.2%) 5 ( 5.2%)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS APPLICATION SITE PRURITUS 6 ( 7.0%) 50 (19.7%) 21 (29.2%) 23 (24.0%)
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS APPLICATION SITE DERMATITIS 5 ( 5.8%) 21 ( 8.3%) 7 ( 9.7%) 9 ( 9.4%)
INFECTIONS AND INFESTATIONS UPPER RESPIRATORY TRACT INFECTION 6 ( 7.0%) 10 ( 3.9%) 3 ( 4.2%) 1 ( 1.0%)
SKIN AND SUBCUTANEOUS TISSUE DISORDERS PRURITUS 8 ( 9.3%) 54 (21.3%) 25 (34.7%) 21 (21.9%)
SKIN AND SUBCUTANEOUS TISSUE DISORDERS ERYTHEMA 8 ( 9.3%) 36 (14.2%) 14 (19.4%) 14 (14.6%)
SKIN AND SUBCUTANEOUS TISSUE DISORDERS RASH 5 ( 5.8%) 26 (10.2%) 8 (11.1%) 13 (13.5%)

6 Part 4: Shift Table

6.1 Step 6: Lab shift table with group_shift()

# group_shift() uses vars(row = <baseline>, column = <post-baseline>).
# The keyword is 'column', not 'col'.
t_shift <- tplyr_table(adlb_alt, TRT01A) |>
  set_pop_data(adsl_safe) |>
  set_pop_treat_var(TRT01A) |>
  add_layer(
    group_shift(vars(row = BNRIND, column = ANRIND), by = 'ALT') |>
      set_format_strings(f_str('xx (xx.x%)', n, pct))
  )
shift_result <- build(t_shift) |>
  dplyr::select(-dplyr::starts_with('ord'))
knitr::kable(shift_result)
row_label1 row_label2 var1_Placebo_LOW var1_Placebo_NORMAL var1_Placebo_HIGH var1_Xanomeline High Dose_LOW var1_Xanomeline High Dose_NORMAL var1_Xanomeline High Dose_HIGH var1_Xanomeline Low Dose_LOW var1_Xanomeline Low Dose_NORMAL var1_Xanomeline Low Dose_HIGH
ALT LOW 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%) 0 ( 0.0%)
ALT NORMAL 6 ( 0.7%) 786 (92.3%) 20 ( 2.3%) 0 ( 0.0%) 563 (91.1%) 18 ( 2.9%) 10 ( 1.6%) 577 (92.5%) 23 ( 3.7%)
ALT HIGH 0 ( 0.0%) 24 ( 2.8%) 16 ( 1.9%) 0 ( 0.0%) 22 ( 3.6%) 15 ( 2.4%) 0 ( 0.0%) 5 ( 0.8%) 9 ( 1.4%)

7 Part 5: Extracting Numeric Data

7.1 Step 7: get_numeric_data() for traceability

# get_numeric_data() returns a list of data frames with unformatted statistics.
# Column names depend on which statistics were requested.
# Always inspect with names() first -- never hard-code statistic column names.
t_age2 <- tplyr_table(adsl_safe, TRT01A) |>
  add_layer(
    group_desc(AGE) |>
      set_format_strings(
        'Mean (SD)' = f_str('xx.x (xx.xx)', mean, sd),
        'Median'    = f_str('xx.x', median)
      )
  )
built_age <- build(t_age2)
num_data  <- get_numeric_data(t_age2)
cat("Layers returned:", length(num_data), "\n")
Layers returned: 1 
cat("Layer 1 columns:", paste(names(num_data[[1]]), collapse = ", "), "\n")
Layer 1 columns: summary_var, TRT01A, stat, value 
# Display all columns; select() avoided because column names vary by statistics
knitr::kable(num_data[[1]], digits = 2)
summary_var TRT01A stat value
AGE Placebo mean 75.21
AGE Placebo sd 8.59
AGE Placebo median 76.00
AGE Xanomeline High Dose mean 73.78
AGE Xanomeline High Dose sd 7.94
AGE Xanomeline High Dose median 75.50
AGE Xanomeline Low Dose mean 75.96
AGE Xanomeline Low Dose sd 8.11
AGE Xanomeline Low Dose median 78.00

8 Part 6: Column Nesting

8.1 Step 8: cols argument for nested column splits

# cols adds a second level of column splitting (here: SEX within TRT01A)
t_nested <- tplyr_table(adsl_safe, TRT01A, cols = SEX) |>
  add_layer(
    group_count(AGEGR1, by = 'Age Group') |>
      set_format_strings(f_str('xx (xx.x%)', n, pct))
  )
build(t_nested) |>
  dplyr::select(-dplyr::starts_with('ord')) |>
  knitr::kable()
row_label1 row_label2 var1_Placebo_F var1_Placebo_M var1_Xanomeline High Dose_F var1_Xanomeline High Dose_M var1_Xanomeline Low Dose_F var1_Xanomeline Low Dose_M
Age Group >64 44 (83.0%) 28 (84.8%) 30 (85.7%) 31 (83.8%) 50 (90.9%) 38 (92.7%)
Age Group 18-64 9 (17.0%) 5 (15.2%) 5 (14.3%) 6 (16.2%) 5 ( 9.1%) 3 ( 7.3%)

9 Validation Checks

cat("\n=== Day 28 Validation ===\n\n")

=== Day 28 Validation ===
cat("Check 1 - Demographics rows:", nrow(demo_result), "\n")
Check 1 - Demographics rows: 11 
cat("Check 2 - AE table rows:", nrow(ae_result), "\n")
Check 2 - AE table rows: 5290 
cat("Check 3 - AE rows after n>=5 filter:", nrow(ae_filtered), "\n")
Check 3 - AE rows after n>=5 filter: 7 
cat("Check 4 - Shift table rows:", nrow(shift_result), "\n")
Check 4 - Shift table rows: 3 
cat("Check 5 - Numeric data layers:", length(num_data), "\n")
Check 5 - Numeric data layers: 1 
cat("\nValidation complete\n")

Validation complete

10 Key Takeaways

  1. tplyr_table(data, treat_var) – unquoted NSE; add_total_group() adds a Total column
  2. Three layer types: group_desc() continuous, group_count() categorical, group_shift() shift
  3. f_str('xx.x (xx.xx)', mean, sd) – positional vars map to statistics; named entries become row labels
  4. group_shift() uses column= (not col=) in vars(row = ..., column = ...)
  5. set_pop_data() + set_pop_treat_var() + set_distinct_by(USUBJID) – required for AE subject-level denominators
  6. Never combine set_order_count_method() + set_ordering_cols() – sort with dplyr::arrange() after build() instead
  7. get_numeric_data() column names vary – always inspect with names() first; never hard-code statistic column names
  8. build() returns a plain data frame – filter, sort, join like any tibble; drop sort columns with select(-starts_with('ord'))

11 Resources

  • Tplyr documentation: https://atorus-research.github.io/Tplyr/
  • Tplyr cheatsheet: https://atorus-research.github.io/Tplyr_cheatsheet.pdf
  • r4csr book: https://r4csr.org

End of Day 28


 

30 Days of Pharmaverse  ·  Disclaimer  ·  Indraneel Chakraborty  ·  © 2026