30 Days of Pharmaverse
Your Friendly Guide to Clinical Data Science with R in Open-Source
Exploration Roadmap
Welcome to 30 Days of Pharmaverse - a free, open-source, hands-on exploration of Clinical Data Science with R and the Pharmaverse ecosystem. Whether you’re a clinical programmer, biostatistician, data scientist, or a curious learner, this project walks you through real-world CDISC workflows one day at a time.
Each day tackles a specific concept - from setting up a working environment to building complex ADaM datasets and producing Tables, Listings, and Figures - using community-driven, open-source R packages like admiral, xportr, metacore, rtables, tern, and many more.
Who Is This For?
- Clinical programmers transitioning from SAS to R, looking for a structured, code-first guide to the Pharmaverse
- Biostatisticians & data scientists exploring how SDTM, ADaM, and TLF production work end-to-end in R
- Students & academics learning clinical trial data standards and reproducible research workflows
- Open-source contributors interested in the Pharmaverse - a growing family of community-curated R packages for clinical reporting
No SAS license or prior CDISC experience required. If you know base R / tidyverse and are curious about clinical data pipelines, start with Day 1.
Week 1: SDTM Fundamentals & Core R Skills
| Day | Topic | Deep Dive | Key Packages |
|---|---|---|---|
| Day 1 | Environment Setup & First SDTM Code | SDTM Programming Walkthrough | {admiral}, {pharmaversesdtm}, {dplyr}, {haven}, {xportr} |
| Day 2 | SDTM Domain Structure & Tidyverse Mastery | Understanding SDTM Domain Classes Through Code | {dplyr}, {tidyr}, {pharmaversesdtm} |
| Day 3 | Controlled Terminology & MedDRA Coding | Building AE Codelists in R | {dplyr}, {pharmaversesdtm} |
| Day 4 | Clinical Date Derivations with lubridate | Study Day, Duration, and Partial Date Imputation | {lubridate}, {dplyr}, {pharmaversesdtm} |
| Day 5 | Advanced Tidyverse: Pivoting & Joining | Reshaping SDTM Data for Analysis | {dplyr}, {tidyr}, {pharmaversesdtm} |
| Day 6 | Introduction to sdtm.oak | EDC-to-SDTM Transformation Patterns | {sdtm.oak}, {dplyr}, {pharmaversesdtm} |
| Day 7 | Week 1 Capstone: End-to-End SDTM Script | Build DM, AE, EX from Scratch with xportr | {dplyr}, {lubridate}, {sdtm.oak}, {xportr}, {haven} |
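
The study day derivation listed for Day 4 can be sketched in a few lines. This is a minimal base R illustration, not code from the course: `derive_study_day` is a hypothetical helper, the records are made up, and complete ISO 8601 dates are assumed (partial dates would need imputation first). It follows the SDTM convention that there is no study day 0.

```r
# Minimal sketch: SDTM-style study day (--DY) in base R.
# `derive_study_day` is a hypothetical helper, not a Pharmaverse function.
derive_study_day <- function(event_date, reference_date) {
  event_date <- as.Date(event_date)
  reference_date <- as.Date(reference_date)
  diff_days <- as.integer(event_date - reference_date)
  # No day 0: dates on or after the reference date start at 1,
  # dates before the reference date are negative.
  ifelse(diff_days >= 0L, diff_days + 1L, diff_days)
}

# Illustrative (made-up) adverse event records
ae <- data.frame(
  USUBJID = c("01-001", "01-001", "01-002"),
  AESTDTC = c("2024-01-10", "2024-01-08", "2024-02-01"),
  RFSTDTC = c("2024-01-10", "2024-01-10", "2024-01-25"),
  stringsAsFactors = FALSE
)

ae$AESTDY <- derive_study_day(ae$AESTDTC, ae$RFSTDTC)
print(ae)
```

An event on the reference date gets study day 1, and an event two days earlier gets -2. The same rule can be written with `dplyr::mutate()` and `lubridate` once those packages are loaded.
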
Week 2: Production SDTM & Validation
| Day | Topic | Deep Dive | Key Packages |
|---|---|---|---|
| Day 8 | Complex SDTM Domains - LB (Lab Results) | Findings Class with Unit Standardization | {dplyr}, {tidyr}, {sdtm.oak}, {pharmaversesdtm} |
| Day 9 | VS (Vital Signs) & Repeated Measures | Visit-Level Data and Positional Readings | {dplyr}, {tidyr}, {pharmaversesdtm} |
| Day 10 | AE Domain Mastery & SAE Logic | Deep Dive into Severity, Causality, and Outcomes | {dplyr}, {lubridate}, {sdtm.oak} |
| Day 11 | Disposition (DS) & Trial Design Domains | Screen Failures, Completers, and Study Structure | {dplyr}, {lubridate}, {sdtm.oak} |
| Day 12 | Data Cuts with datacutr | Applying Clinical Cutoff Dates for Interim & Final Analyses | {dplyr}, {lubridate}, {datacutr} |
| Day 13 | SDTM Validation with sdtmchecks | Running FDA Business Rules Against Your Domains | {sdtmchecks}, {dplyr}, {pharmaversesdtm} |
| Day 14 | Week 2 Capstone - Metadata-Driven SDTM with metacore & xportr | Specification-Driven Workflows for Submission-Ready Data | {metacore}, {metatools}, {xportr}, {dplyr} |
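
The idea behind the Day 12 data cut can be sketched as a simple date filter. This is a base R illustration under stated assumptions, not the `datacutr` API: `apply_data_cut` is a hypothetical helper, the records are made up, dates are assumed to be complete ISO 8601 strings, and keeping records with a missing date is one possible choice that a real study would settle in its data cut plan.

```r
# Minimal sketch of a clinical cutoff filter in base R.
# `apply_data_cut` is a hypothetical helper, not part of datacutr.
apply_data_cut <- function(data, date_var, cutoff_date) {
  cutoff_date <- as.Date(cutoff_date)
  record_date <- as.Date(data[[date_var]])
  # Keep records dated on or before the cutoff. Records with a missing
  # date are kept here (an assumption made for this sketch).
  keep <- is.na(record_date) | record_date <= cutoff_date
  data[keep, , drop = FALSE]
}

# Illustrative (made-up) adverse event records
ae <- data.frame(
  USUBJID = c("01-001", "01-001", "01-002", "01-003"),
  AETERM  = c("HEADACHE", "NAUSEA", "RASH", "FATIGUE"),
  AESTDTC = c("2024-01-10", "2024-03-15", "2024-02-01", NA),
  stringsAsFactors = FALSE
)

ae_cut <- apply_data_cut(ae, "AESTDTC", "2024-02-29")
print(ae_cut)
```

The NAUSEA record starts after the cutoff and is dropped, while the record with no start date is retained for review.
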
Week 3: ADaM Deep Dive & Admiral Mastery
| Day | Topic | Deep Dive | Key Packages |
|---|---|---|---|
| Day 15 | ADaM Architecture & Admiral Core Engine | Understanding ADaM Structures, Admiral Philosophy, and Core Derivation Patterns | {admiral}, {pharmaversesdtm}, {pharmaverseadam}, {dplyr} |
| Day 16 | ADSL Part 1 - Treatment Variables & Dates | First dose dates, treatment assignment, and study timeline from EX and DS | {admiral}, {dplyr}, {lubridate}, {pharmaversesdtm} |
| Day 17 | ADSL Part 2 - Population Flags & Demographics | SAFFL, ITTFL, demographic groupings, and baseline measurements | {admiral}, {dplyr}, {pharmaversesdtm} |
| Day 18 | ADAE - Adverse Events Analysis Dataset | OCCDS structure with treatment emergent flags | {admiral}, {dplyr}, {lubridate} |
| Day 19 | ADLB - Lab Analysis Dataset (BDS) | BDS structure with baseline, change, and toxicity grading | {admiral}, {dplyr}, {lubridate}, {pharmaversesdtm} |
| Day 20 | ADVS - Vitals Analysis Dataset (BDS) | BDS structure with visit windows and multiple readings | {admiral}, {dplyr}, {lubridate}, {pharmaversesdtm} |
| Day 21 | ADTTE - Time-to-Event Analysis Dataset | Survival analysis structure with events and censoring | {admiral}, {dplyr}, {lubridate}, {pharmaversesdtm} |
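
The baseline and change-from-baseline step named for Day 19 can be sketched as follows. This is a base R illustration with made-up ALT values, not the `admiral` implementation. It assumes the baseline flag `ABLFL` is already derived, and it populates `CHG` for post-baseline records only, which is a convention chosen for this sketch.

```r
# Minimal sketch: BASE and CHG for a BDS-style lab dataset in base R.
# Values are illustrative; ABLFL is assumed to be derived already.
adlb <- data.frame(
  USUBJID = c("01-001", "01-001", "01-001", "01-002", "01-002"),
  PARAMCD = "ALT",
  AVISITN = c(0, 1, 2, 0, 1),
  AVAL    = c(30, 36, 28, 45, 40),
  ABLFL   = c("Y", "", "", "Y", ""),
  stringsAsFactors = FALSE
)

# One baseline row per subject and parameter, renamed to BASE
baseline <- adlb[adlb$ABLFL == "Y", c("USUBJID", "PARAMCD", "AVAL")]
names(baseline)[names(baseline) == "AVAL"] <- "BASE"

# Attach BASE to every record of the same subject and parameter
adlb <- merge(adlb, baseline, by = c("USUBJID", "PARAMCD"), all.x = TRUE)
adlb <- adlb[order(adlb$USUBJID, adlb$PARAMCD, adlb$AVISITN), ]

# CHG = AVAL - BASE, populated for post-baseline records only (assumption)
adlb$CHG <- ifelse(adlb$AVISITN > 0, adlb$AVAL - adlb$BASE, NA_real_)
print(adlb)
```

The merge mirrors what a grouped join does in `dplyr`: the baseline value is copied onto each visit so that the subtraction happens row by row.
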
Week 4: ADaM Advanced & TLF Production
| Day | Topic | Deep Dive | Key Packages |
|---|---|---|---|
| Day 22 | Demography Table with gtsummary + gt | First TLF from ADSL | {admiral}, {dplyr}, {lubridate} |
| Day 23 | ADCM and ADRS - Concomitant Meds and Oncology Response | OCCDS period flags and RECIST 1.1 with admiralonco | {admiral}, {admiralonco}, {dplyr}, {lubridate}, {pharmaversesdtm} |
| Day 24 | ARD-First Reporting with cards and cardx | CDISC Analysis Results Data - computation decoupled from presentation | {cards}, {cardx}, {pharmaverseadam}, {dplyr}, {ggsurvfit} |
| Day 25 | gtsummary and tfrmt - ARD-Backed Production Tables | Flexible TLGs from ARD and format-spec driven TLF libraries | {gtsummary}, {tfrmt}, {pharmaverseadam}, {dplyr}, {ggsurvfit} |
| Day 26 | flextable and officer - Word and RTF Clinical Tables | Formatted TLF output to Word and RTF | {flextable}, {officer}, {pharmaverseadam}, {dplyr} |
| Day 27 | rtables, tern, and r2rtf - Structured Clinical Tables | Declarative table layout and RTF output | {rtables}, {tern}, {r2rtf}, {pharmaverseadam}, {dplyr} |
| Day 28 | Tplyr - Declarative Clinical Table Programming | Grammar of clinical data summaries | {Tplyr}, {testthat}, {xportr}, {logrx}, {metacore}, {pharmaverseadam} |
| Day 29 | ggsurvfit + gtsummary - Survival Plots and Clinical Figures | Publication-ready TTE figures in pharmaverse | {rlistings}, {ggsurvfit}, {ggplot2}, {dplyr}, {purrr} |
| Day 30 | Capstone - Full Clinical Reporting Workflow | Survival · Safety · Lab · Subgroups across the pharmaverse | {rtables}, {tern}, {r2rtf}, {flextable}, {officer}, {gtsummary}, {rlistings}, {ggsurvfit}, {ggplot2}, {patchwork} |
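
The "computation decoupled from presentation" idea behind Day 24 can be sketched without any packages. This is a base R illustration, not the `cards` or `cardx` API: `summarise_ard` and `format_mean_sd` are hypothetical helpers, the column names are chosen for this example, and the subject data are made up.

```r
# Minimal sketch of an ARD-first workflow in base R.
# `summarise_ard` and `format_mean_sd` are hypothetical helpers.
adsl <- data.frame(
  USUBJID = sprintf("01-%03d", 1:6),
  TRT01A  = c("Placebo", "Placebo", "Placebo", "Drug X", "Drug X", "Drug X"),
  AGE     = c(60, 65, 70, 50, 55, 63),
  stringsAsFactors = FALSE
)

# Step 1 (computation): one row per group and statistic, kept numeric
summarise_ard <- function(data, by, variable) {
  groups <- split(data[[variable]], data[[by]])
  rows <- lapply(names(groups), function(g) {
    x <- groups[[g]]
    data.frame(
      group     = g,
      variable  = variable,
      stat_name = c("n", "mean", "sd"),
      stat      = c(length(x), mean(x), sd(x)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

ard <- summarise_ard(adsl, by = "TRT01A", variable = "AGE")
print(ard)

# Step 2 (presentation): build display strings from the ARD only
format_mean_sd <- function(ard, group) {
  s <- ard[ard$group == group, ]
  stats <- setNames(s$stat, s$stat_name)
  sprintf("%.1f (%.2f)", stats[["mean"]], stats[["sd"]])
}

format_mean_sd(ard, "Placebo")
```

Because the long-format results keep full numeric precision, the same `ard` object can feed a table, a listing, or a QC comparison, and rounding rules can change without recomputing anything.
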
What You’ll Explore
This journey covers the essential toolkit for modern clinical programming in R:
- SDTM & ADaM Standards: Understanding the structure and intent of CDISC-standardised data.
- Pharmaverse Tools: Hands-on work with `admiral`, `pharmaversesdtm`, `pharmaverseadam`, `xportr`, `sdtm.oak`, `sdtmchecks`, `metacore`, and more.
- ADaM Construction: Building ADSL, ADAE, ADLB, ADVS, ADTTE, ADCM, ADEX, and ADRS from SDTM using `admiral` and `admiralonco`.
- TLF Production: Tables, listings, and figures using `rtables`/`tern`, `r2rtf`, `flextable` + `officer`, `gtsummary`, `tfrmt`, `rlistings`, and `ggsurvfit`.
- ARD-First Workflows: Using `cards` and `cardx` to produce tidy Analysis Results Datasets that separate computation from formatting.
- QC & Traceability: Unit testing with `testthat`, cell-level traceability with `Tplyr`, audit logging with `logrx`, and metadata-driven export with `xportr`.
The Toolkit
Prerequisites:
- Comfort with R and the tidyverse
- A curiosity about clinical trials and data standards
Core Packages:
```r
# SDTM & ADaM layer
install.packages(c(
  "admiral",          # ADaM derivations
  "admiralonco",      # Oncology extensions
  "pharmaversesdtm",  # Example SDTM datasets
  "pharmaverseadam",  # Example ADaM datasets
  "sdtm.oak",         # SDTM utilities
  "sdtmchecks",       # SDTM validation
  "datacutr",         # Data cut utilities
  "metacore",         # Metadata management
  "metatools",        # Metadata tools
  "xportr",           # Regulatory export (.xpt)
  "haven",            # SAS interoperability
  "dplyr",            # Data wrangling
  "lubridate"         # Date handling
))

# TLF layer
install.packages(c(
  "rtables", "tern", "r2rtf", "flextable", "officer",
  "gtsummary", "tfrmt", "Tplyr", "rlistings",
  "ggplot2", "ggsurvfit", "patchwork"
))

# ARD & QC
install.packages(c("cards", "cardx", "testthat", "logrx"))
```
⚠️ Important Disclaimer
This project is a personal initiative created for learning and exploratory purposes only. It is in no way affiliated with, endorsed by, sponsored by, funded by, or assisted by any organisation or company in any capacity.
- This project draws heavily from open-source projects, Pharmaverse examples, public repositories, and official CDISC documentation.
- Significant portions of the code and content have been created with assistance from Large Language Models (LLMs) and refined through human review and intervention.
- Examples and patterns are adapted from established best practices in the R and clinical programming communities.
Learners are strongly encouraged to:
- Think critically - Don’t just copy-paste code. Understand why each step exists.
- Verify independently - Check code against official Pharmaverse documentation and CDISC standards.
- Test thoroughly - Validate any code in your own environment before use in any context.
- Consult experts - When implementing for real clinical trials, seek guidance from qualified statisticians and data managers.
The views, opinions, code, and materials shared in this project are solely for exploratory and learning purposes and do not guarantee accuracy, correctness, or compliance with any standard. They do not represent the positions of any external organisation and should not be treated as guidance for any formal or regulated use without independent verification.
Community & Resources
- Pharmaverse.org - The central hub for open-source clinical R packages
- CDISC.org - The official standards body for clinical data interchange
- Admiral Documentation - The flagship ADaM derivation package
- Pharmaverse TLG Catalogue - Reference TLF outputs from the pharmaverse stack
- r4csr.org - Open-access book: R for Clinical Study Reports and Submission (Zhang et al.)
- R Consortium R Submissions WG - Pilot projects for R-based regulatory submissions
- R for Data Science - The foundation for learning the tidyverse
- GitHub Repository - Source code, issues, and contributions welcome
Contributing
This is a living document built with Quarto. Suggestions, corrections, and improvements are always welcome. Feel free to open an issue or submit a pull request on GitHub!
If you find the project useful, please consider starring the repository and sharing it with your network. The more eyes and minds we have, the better this resource can become for everyone exploring clinical data science with R.
Curated by: Indraneel Chakraborty