30 Days of Pharmaverse

Day 14: Week 2 Capstone - Metadata-Driven SDTM with metacore & xportr

Specification-Driven Workflows for Submission-Ready Data

1 Learning Objectives

By the end of Day 14 (Week 2 Capstone), you will be able to:

  1. Load specification objects using metacore - the Pharmaverse metadata standard
  2. Use metatools to apply metadata checks and select columns from specs
  3. Use xportr to apply labels, types, formats, lengths, and variable ordering from specs
  4. Build an end-to-end pipeline: raw data → SDTM → validate → export .xpt
  5. Appreciate why metadata-driven workflows are the future of clinical programming
  6. Understand everything needed before starting ADaM datasets in Week 3

2 Why Metadata-Driven Workflows?

2.1 The Problem with Manual Programming

In traditional clinical programming, programmers manually:

  • Assign variable labels (label(dm$USUBJID) <- "Unique Subject Identifier")
  • Set variable types (character vs numeric)
  • Order variables in the correct sequence
  • Set variable lengths for transport files

This is error-prone, tedious, and hard to maintain.
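
To make the pain point concrete, here is a minimal base R sketch of the manual approach. It sets labels with attr() rather than the label() replacement function shown in the bullet above (that one comes from an add-on package such as Hmisc), and the two-row dm data frame is made up for illustration.

```r
# Hypothetical two-subject DM extract, for illustration only
dm <- data.frame(
  SEX     = c("F", "M"),
  USUBJID = c("CDISC01-101-001", "CDISC01-101-002"),
  AGE     = c("71", "27"),   # arrives as character by mistake
  stringsAsFactors = FALSE
)

# Manual metadata: one hand-written statement per variable
attr(dm$USUBJID, "label") <- "Unique Subject Identifier"
attr(dm$AGE, "label")     <- "Age"
attr(dm$SEX, "label")     <- "Sex"

# Manual type fix: as.numeric() drops attributes, so the label is lost
dm$AGE <- as.numeric(dm$AGE)
attr(dm$AGE, "label") <- "Age"   # ...and has to be re-applied by hand

# Manual ordering: the column list is typed out again
dm <- dm[, c("USUBJID", "AGE", "SEX")]
```

Every new variable means another line in each of these blocks, and nothing ties the statements back to the specification.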

Important: The Metadata-Driven Solution

Instead of hardcoding metadata in your programs, you:

  1. Define metadata once in a specification (an Excel file, or a metacore object)
  2. Load metadata into R using metacore
  3. Apply metadata to your datasets using xportr/metatools
  4. Validate that your datasets match the specification

This ensures:

  • Consistency - All datasets follow the same rules
  • Reproducibility - Metadata changes automatically propagate
  • Compliance - Variable labels, types, and lengths match define.xml

3 Package Installation & Loading

3.1 Required Packages

Package          Purpose
metacore         Load and manage dataset specifications
metatools        Apply metadata-based transformations and checks
xportr           Apply labels, types, formats, lengths; export .xpt
dplyr            Data manipulation
lubridate        Date and date-time handling
pharmaversesdtm  Example SDTM datasets

if (!requireNamespace("dplyr", quietly = TRUE)) suppressMessages(install.packages("dplyr"))
if (!requireNamespace("lubridate", quietly = TRUE)) suppressMessages(install.packages("lubridate"))
if (!requireNamespace("pharmaversesdtm", quietly = TRUE)) suppressMessages(install.packages("pharmaversesdtm"))
if (!requireNamespace("metacore", quietly = TRUE)) suppressMessages(install.packages("metacore"))
if (!requireNamespace("metatools", quietly = TRUE)) suppressMessages(install.packages("metatools"))
if (!requireNamespace("xportr", quietly = TRUE)) suppressMessages(install.packages("xportr"))

library(dplyr)
library(lubridate)
library(pharmaversesdtm)
library(metacore)
library(metatools)
library(xportr)

4 Understanding metacore

4.1 What is metacore?

metacore is a Pharmaverse package that provides a standardized way to represent dataset specifications in R. Think of it as the bridge between your Excel specification file and your R programs.

4.2 The metacore Object Structure

A metacore object contains multiple related tables:

METACORE OBJECT

  ds_spec      = Dataset-level metadata
                 (domain name, label, structure)

  ds_vars      = Variable-level metadata
                 (which variables belong to which dataset)

  var_spec     = Variable specifications
                 (variable name, label, type, length, format)

  value_spec   = Value-level metadata
                 (type, origin, and links to codelists and derivations)

  derivations  = Derivation metadata
                 (how variables are derived)

  codelist     = Code list definitions
                 (controlled terminology: codes and decodes)

  supp         = Supplemental qualifiers
                 (SUPP-- domain information)

4.3 Creating a metacore Object

In production, you’d load this from an Excel specification. For learning, let’s build one:

Metadata components created:
  Dataset specs: 3 datasets
  Variable assignments: 28 variable-dataset pairs
  Variable specs: 25 unique variables

# A metacore object can be assembled directly from the component tables:
mc <- metacore::metacore(
  ds_spec  = ds_spec,
  ds_vars  = ds_vars,
  var_spec = var_spec
)

# Or more commonly, load from a specification file:
mc <- metacore::spec_to_metacore("path/to/specs.xlsx")

Note: spec_to_metacore()

The most common way to create a metacore object in production is using spec_to_metacore(), which reads from a formatted Excel specification file. This spec file is typically created by the study statistician or data standards team and contains all the metadata for every dataset and variable.
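
For orientation, here is a minimal sketch of what the three component tables passed to metacore() might look like, written as plain data frames. The rows are invented, and only a subset of columns is shown: the metacore() constructor validates its inputs and may expect further columns (for example key_seq, keep, and core in ds_vars), so treat the column set as an assumption and check the package documentation before using it as is.

```r
# Dataset-level metadata: one row per dataset
ds_spec <- data.frame(
  dataset   = c("DM", "AE"),
  structure = c("One record per subject",
                "One record per adverse event per subject"),
  label     = c("Demographics", "Adverse Events"),
  stringsAsFactors = FALSE
)

# Dataset-variable pairs: a shared variable appears once per dataset
ds_vars <- data.frame(
  dataset  = c(rep("DM", 4), rep("AE", 4)),
  variable = c("STUDYID", "DOMAIN", "USUBJID", "AGE",
               "STUDYID", "DOMAIN", "USUBJID", "AETERM"),
  order    = c(1:4, 1:4),
  stringsAsFactors = FALSE
)

# Variable specifications: one row per unique variable
var_spec <- data.frame(
  variable = c("STUDYID", "DOMAIN", "USUBJID", "AGE", "AETERM"),
  type     = c("character", "character", "character", "numeric", "character"),
  length   = c(12L, 2L, 40L, 8L, 200L),
  label    = c("Study Identifier", "Domain Abbreviation",
               "Unique Subject Identifier", "Age",
               "Reported Term for the Adverse Event"),
  stringsAsFactors = FALSE
)

cat("Dataset specs:", nrow(ds_spec), "datasets\n")
cat("Variable assignments:", nrow(ds_vars), "variable-dataset pairs\n")
cat("Variable specs:", length(unique(var_spec$variable)), "unique variables\n")
```

Because identifiers such as STUDYID and USUBJID are shared across domains, the number of variable-dataset pairs is larger than the number of unique variables.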


5 Using xportr: The Metadata Application Engine

5.1 What xportr Does

xportr is the workhorse for making your datasets submission-ready. It applies metadata from your specification to your actual data:

xportr: Key Functions
# A tibble: 6 × 2
  Function        What_It_Does                                            
  <chr>           <chr>                                                   
1 xportr_type()   Coerce variables to the correct type (character/numeric)
2 xportr_length() Set variable lengths for SAS transport                  
3 xportr_label()  Apply variable labels from specification                
4 xportr_order()  Reorder variables to match specification                
5 xportr_format() Apply SAS display formats                               
6 xportr_write()  Export the dataset as a .xpt (SAS transport) file       
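
Much of what "submission-ready" means here comes from the limits of the SAS transport (version 5) format: variable names of at most 8 characters, labels of at most 40 characters, and character variables of at most 200 bytes. The sketch below checks a specification against those limits in base R. check_xpt_limits() and the spec rows are hypothetical; this is not an xportr function.

```r
# Illustrative helper (not part of xportr): flag spec rows that would break
# SAS transport v5 limits before any file is written
check_xpt_limits <- function(spec) {
  list(
    long_names  = spec$variable[nchar(spec$variable) > 8],
    bad_names   = spec$variable[!grepl("^[A-Za-z_][A-Za-z0-9_]*$", spec$variable)],
    long_labels = spec$variable[nchar(spec$label) > 40],
    long_char   = spec$variable[spec$type == "character" & spec$length > 200]
  )
}

# Hypothetical spec with two deliberate problems
spec <- data.frame(
  variable = c("USUBJID", "LONGVARNAME", "RFSTDTC"),
  type     = c("character", "character", "character"),
  label    = c("Unique Subject Identifier",
               "Some Variable",
               "Subject Reference Start Date/Time (Derived From Exposure)"),
  length   = c(40, 20, 20),
  stringsAsFactors = FALSE
)

res <- check_xpt_limits(spec)
res$long_names    # variable names longer than 8 characters
res$long_labels   # variables whose label exceeds 40 characters
```

Running a check like this on the specification itself catches problems once, before they surface separately in every dataset.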

5.2 Applying xportr Step by Step

Let’s work through the full pipeline using the DM domain:

DM dataset before xportr:
Rows: 15 
Cols: 16 
Rows: 15
Columns: 16
$ STUDYID  <chr> "CDISC01", "CDISC01", "CDISC01", "CDISC01", "CDISC01", "CDISC…
$ DOMAIN   <chr> "DM", "DM", "DM", "DM", "DM", "DM", "DM", "DM", "DM", "DM", "…
$ SUBJID   <chr> "001", "002", "003", "004", "005", "006", "007", "008", "009"…
$ SITEID   <chr> "101", "101", "101", "101", "102", "102", "102", "101", "103"…
$ AGE      <int> 71, 27, 65, 49, 51, 60, 61, 55, 69, 29, 44, 58, 52, 64, 27
$ AGEU     <chr> "YEARS", "YEARS", "YEARS", "YEARS", "YEARS", "YEARS", "YEARS"…
$ SEX      <chr> "F", "M", "F", "F", "F", "M", "M", "F", "F", "F", "F", "F", "…
$ RACE     <chr> "WHITE", "BLACK OR AFRICAN AMERICAN", "BLACK OR AFRICAN AMERI…
$ ETHNIC   <chr> "NOT HISPANIC OR LATINO", "NOT HISPANIC OR LATINO", "NOT HISP…
$ ARM      <chr> "Placebo", "Active 20mg", "Active 20mg", "Placebo", "Placebo"…
$ ARMCD    <chr> "PBO", "ACT20", "ACT20", "PBO", "PBO", "ACT10", "PBO", "ACT10…
$ ACTARM   <chr> "Placebo", "Active 20mg", "Active 20mg", "Placebo", "Placebo"…
$ ACTARMCD <chr> "PBO", "ACT20", "ACT20", "PBO", "PBO", "ACT10", "PBO", "ACT10…
$ RFSTDTC  <chr> "2024-02-09", "2024-01-21", "2024-02-05", "2024-02-26", "2024…
$ RFENDTC  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ USUBJID  <chr> "CDISC01-101-001", "CDISC01-101-002", "CDISC01-101-003", "CDI…

5.3 Step 2: Create a Specification for xportr

DM specification:
# A tibble: 16 × 5
   variable type      label                             length order
   <chr>    <chr>     <chr>                              <int> <int>
 1 STUDYID  character Study Identifier                      12     1
 2 DOMAIN   character Domain Abbreviation                    2     2
 3 USUBJID  character Unique Subject Identifier             40     3
 4 SUBJID   character Subject Identifier for the Study       8     4
 5 SITEID   character Study Site Identifier                  8     5
 6 AGE      numeric   Age                                    8     6
 7 AGEU     character Age Units                              6     7
 8 SEX      character Sex                                    2     8
 9 RACE     character Race                                  40     9
10 ETHNIC   character Ethnicity                             40    10
11 ARM      character Planned Arm                           40    11
12 ARMCD    character Planned Arm Code                      20    12
13 ACTARM   character Actual Arm                            40    13
14 ACTARMCD character Actual Arm Code                       20    14
15 RFSTDTC  character Subject Reference Start Date/Time     20    15
16 RFENDTC  character Subject Reference End Date/Time       20    16
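
A specification like this is just a data frame, so it can be written by hand while learning. The sketch below recreates the first six rows of the table and adds a dataset column, which is an assumption on my part: xportr filters on a column with that name by default when you pass domain = "DM". It then compares the specified lengths with the widths actually observed in two example DM records.

```r
# First six rows of the DM specification, as a plain data frame.
# The `dataset` column is an addition for domain filtering (assumption, see lead-in).
dm_spec <- data.frame(
  dataset  = "DM",
  variable = c("STUDYID", "DOMAIN", "USUBJID", "SUBJID", "SITEID", "AGE"),
  type     = c("character", "character", "character", "character", "character", "numeric"),
  label    = c("Study Identifier", "Domain Abbreviation", "Unique Subject Identifier",
               "Subject Identifier for the Study", "Study Site Identifier", "Age"),
  length   = c(12L, 2L, 40L, 8L, 8L, 8L),
  order    = 1:6,
  stringsAsFactors = FALSE
)

# Two example records using values from the DM listing
dm <- data.frame(
  STUDYID = "CDISC01",
  DOMAIN  = "DM",
  SUBJID  = c("001", "002"),
  SITEID  = "101",
  AGE     = c(71L, 27L),
  USUBJID = c("CDISC01-101-001", "CDISC01-101-002"),
  stringsAsFactors = FALSE
)

# Widest observed value of each character variable vs. the specified length
char_vars <- dm_spec$variable[dm_spec$type == "character"]
observed  <- vapply(char_vars, function(v) max(nchar(dm[[v]])), integer(1))
spec_len  <- dm_spec$length[match(char_vars, dm_spec$variable)]
too_wide  <- char_vars[observed > spec_len]
```

An empty too_wide means no value would be truncated when the specified lengths are applied.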

5.4 Step 3: Apply xportr Functions

After xportr_type():
  AGE type: integer 
  STUDYID type: character 
After xportr_label():
  Variable                            Label
1  STUDYID                 Study Identifier
2   DOMAIN              Domain Abbreviation
3   SUBJID Subject Identifier for the Study
4   SITEID            Study Site Identifier
5      AGE                              Age
6     AGEU                        Age Units
7      SEX                              Sex
8     RACE                             Race
After xportr_order():
Variable order: STUDYID, DOMAIN, USUBJID, SUBJID, SITEID, AGE, AGEU, SEX, RACE, ETHNIC, ARM, ARMCD, ACTARM, ACTARMCD, RFSTDTC, RFENDTC 
After xportr_length():
All metadata applied ✓
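
The calls behind output like this can be sketched as follows. This is a minimal example under stated assumptions: xportr is installed, the specification is a data frame that uses xportr's default column names (dataset, variable, type, label, length, order), and the three-column dm and spec are made up for illustration.

```r
library(xportr)

# Tiny DM data set with columns deliberately out of order
dm <- data.frame(
  AGE     = c(71L, 27L),
  USUBJID = c("CDISC01-101-001", "CDISC01-101-002"),
  STUDYID = "CDISC01",
  stringsAsFactors = FALSE
)

# Matching specification (hypothetical subset of the DM spec)
spec <- data.frame(
  dataset  = "DM",
  variable = c("STUDYID", "USUBJID", "AGE"),
  type     = c("character", "character", "numeric"),
  label    = c("Study Identifier", "Unique Subject Identifier", "Age"),
  length   = c(12, 40, 8),
  order    = 1:3,
  stringsAsFactors = FALSE
)

# One step at a time, so each result can be inspected
dm <- xportr_type(dm, spec, domain = "DM")    # coerce to the specified types
dm <- xportr_length(dm, spec, domain = "DM")  # record transport lengths
dm <- xportr_label(dm, spec, domain = "DM")   # attach variable labels
dm <- xportr_order(dm, spec, domain = "DM")   # reorder columns per the spec

names(dm)               # column order after xportr_order()
attr(dm$AGE, "label")   # label attached by xportr_label()
```

Running the steps separately like this is mainly useful while learning or debugging.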

5.5 Step 4: Export as .xpt

Exported: output/dm.xpt 
File size: 7920 bytes

Tip: The xportr Pipeline

In practice, you’d chain all xportr functions together:

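# Illustrative: `raw_dm` is the DM data frame and `spec` the variable-level specification
dir.create("output", showWarnings = FALSE)  # make sure the target folder exists
dm_final <- raw_dm %>%
  xportr_type(spec, domain = "DM") %>%    # coerce column types
  xportr_length(spec, domain = "DM") %>%  # set transport lengths
  xportr_label(spec, domain = "DM") %>%   # attach variable labels
  xportr_order(spec, domain = "DM") %>%   # reorder columns
  xportr_format(spec, domain = "DM") %>%  # attach SAS formats (spec needs a format column)
  xportr_write("output/dm.xpt")           # write the transport file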

This single pipeline takes your SDTM dataset, applies the types, lengths, labels, ordering, and formats from the specification, and writes the result as a .xpt file.


6 Using metatools for Metadata-Based Checks

6.1 What metatools Provides

metatools helps you work with metadata - selecting variables, checking CT compliance, and building datasets from specs:

metatools: Key Functions
# A tibble: 5 × 2
  Function             Purpose                                            
  <chr>                <chr>                                              
1 build_from_derived() Pull predecessor variables through from source data
2 check_ct_col()       Check if column values match controlled terminology
3 check_variables()    Verify dataset variables match specification       
4 combine_supp()       Combine SUPP-- with parent domain                  
5 drop_unspec_vars()   Remove variables not in the specification          
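
To see what checks of this kind amount to, here is a base R illustration of three of them: comparing variables against the spec, checking values against a codelist, and dropping unspecified variables. This is not how metatools implements them, and it does not use a metacore object; the spec_vars vector, the sex_ct codelist, and the dm data are hypothetical.

```r
# Hypothetical inputs
spec_vars <- c("STUDYID", "USUBJID", "SEX")
sex_ct    <- c("F", "M", "U")   # illustrative codelist; take real values from CDISC CT

dm <- data.frame(
  STUDYID = "CDISC01",
  USUBJID = c("CDISC01-101-001", "CDISC01-101-002", "CDISC01-101-003"),
  SEX     = c("F", "M", "X"),   # "X" is deliberately invalid
  TEMPVAR = 1:3,                # working variable that is not in the spec
  stringsAsFactors = FALSE
)

# Variable check: what is missing from the data, what is extra
missing_vars <- setdiff(spec_vars, names(dm))
extra_vars   <- setdiff(names(dm), spec_vars)

# Controlled terminology check: non-missing values outside the codelist
bad_sex <- unique(dm$SEX[!is.na(dm$SEX) & !dm$SEX %in% sex_ct])

# Drop variables that the spec does not list, keeping spec order
dm_clean <- dm[, intersect(spec_vars, names(dm)), drop = FALSE]
```

metatools runs comparable checks against the variables and codelists stored in the metacore object, so the rules come from the specification and not from hand-written vectors.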

7 End-to-End Capstone Pipeline

Let’s build a complete pipeline that takes raw data through to validated, exported SDTM:

=== WEEK 2 CAPSTONE: END-TO-END SDTM PIPELINE ===
Step 1: Generate raw clinical data
  Demographics: 20 subjects
  Adverse Events: 30 records
Step 2: Transform to SDTM format
  SDTM DM: 20 rows x 16 cols
  SDTM AE: 30 rows x 12 cols
Step 3: Validate SDTM domains
  Orphan AE records: 16 ✗ 
  DM required variables: All present ✓ 
  AE required variables: All present ✓ 
  SEX controlled terminology: Valid ✓ 
  AESEV controlled terminology: Valid ✓ 
  AE date logic (start <= end): Valid ✓ 
Step 4: Apply metadata and export
  DM metadata applied:
    Variables: 16 
    Order: STUDYID → DOMAIN → USUBJID → SUBJID → SITEID → AGE → AGEU → SEX → RACE → ETHNIC → ARM → ARMCD → ACTARM → ACTARMCD → RFSTDTC → RFENDTC 
    Labels applied: 16 of 16 
Step 5: Export as .xpt files
  ✓ Exported: output/dm.xpt - 9520 bytes
  ✓ Exported: output/ae.xpt - 25120 bytes
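
Two of the Step 3 checks are easy to sketch in base R. The example assumes that an "orphan" AE record means an AE row whose USUBJID does not appear in DM, and that dates are complete ISO 8601 dates (partial dates would need extra handling). The dm and ae data are made up.

```r
# Hypothetical subject and adverse event data
dm <- data.frame(USUBJID = c("S-001", "S-002", "S-003"), stringsAsFactors = FALSE)
ae <- data.frame(
  USUBJID = c("S-001", "S-002", "S-004", "S-004"),
  AESTDTC = c("2024-02-10", "2024-03-01", "2024-03-05", "2024-03-09"),
  AEENDTC = c("2024-02-12", "2024-02-27", NA, "2024-03-10"),
  stringsAsFactors = FALSE
)

# Orphan check: AE rows whose subject is not in DM
orphans   <- ae[!ae$USUBJID %in% dm$USUBJID, ]
n_orphans <- nrow(orphans)

# Date logic: start must not be after end (rows with a missing date are skipped)
start     <- as.Date(ae$AESTDTC)
end       <- as.Date(ae$AEENDTC)
bad_dates <- ae[!is.na(start) & !is.na(end) & start > end, ]

# Report findings so they can be resolved before export
if (n_orphans > 0 || nrow(bad_dates) > 0) {
  message("Findings to resolve before export: ",
          n_orphans, " orphan AE record(s), ",
          nrow(bad_dates), " record(s) with start after end")
}
```

In a real study a non-zero orphan count is worth investigating before the .xpt files are written, since it points to AE records for subjects that DM does not know about.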

8 Week 2 Review: Everything You’ve Learned

=== WEEK 2 COMPLETE REVIEW ===
# A tibble: 7 × 3
  Day   Topic                                Key_Skill                          
  <chr> <chr>                                <chr>                              
1 8     LB Domain & Unit Standardization     Unit conversion, reference ranges,…
2 9     VS & Repeated Measures               Multiple readings, VSPOS, VSTPT, w…
3 10    AE Domain Mastery & SAE Logic        Severity vs seriousness, TEAE, SAE…
4 11    Disposition & Trial Design           DS domain, EPOCH, TA/TV/TS, ADSL p…
5 12    Data Cuts with datacutr              Patient-level & record-level cuts,…
6 13    SDTM Validation with sdtmchecks      FDA business rules, cross-domain c…
7 14    Metadata-Driven SDTM (this capstone) metacore, metatools, xportr pipeli…

8.1 What You’re Now Ready For

Important: Preparation Complete for Week 3: ADaM

You now have a solid foundation in SDTM.

In Week 3, we will use admiral to build ADaM datasets (ADSL, ADAE, ADVS, ADLB) from the SDTM data you’ve mastered.


9 Deliverable Summary

Today you completed the following:

Task                                                            Status
Understood metacore specification objects                       ✓ Done
Created variable specifications for DM and AE                   ✓ Done
Applied xportr_type, xportr_label, xportr_order, xportr_length  ✓ Done
Exported submission-ready .xpt files                            ✓ Done
Built an end-to-end pipeline: raw → SDTM → validate → export    ✓ Done
Reviewed all Week 2 topics                                      ✓ Done

10 Key Takeaways

  1. Metadata-driven is the future - Define once, apply everywhere
  2. metacore standardizes specs - One R object for all dataset/variable metadata
  3. xportr applies metadata - Types, labels, lengths, ordering, and export
  4. metatools enables checks - Verify your data matches the specification
  5. The pipeline is reproducible - Same spec + same code = same output every time
  6. You’re ready for ADaM - All SDTM fundamentals are in place

11 Resources

  • metacore Documentation - Official metacore package
  • metatools Documentation - Metadata utility functions
  • xportr Documentation - SAS transport export
  • admiral Documentation - ADaM derivation package (Week 3!)
  • Pharmaverse.org - R packages for clinical data
  • FDA Data Standards Resources - FDA guidance

12 🎉 Congratulations! Week 2 Complete!

You’ve now mastered:

  • Complex SDTM domains (LB, VS, AE, DS)
  • Production workflows (data cuts, validation, metadata)
  • Pharmaverse tools (datacutr, sdtmchecks, metacore, metatools, xportr)
  • Regulatory requirements (SAE logic, TEAE derivation, controlled terminology)

13 What’s Next?

Week 3: ADaM Datasets with Admiral

  • Using admiral to derive ADaM datasets from SDTM
  • Building ADSL (Subject-Level Analysis Dataset)
  • Creating BDS datasets: ADVS, ADLB
  • Deriving baseline, change from baseline, shift tables
  • ADAE creation with treatment-emergent logic

 

30 Days of Pharmaverse · Indraneel Chakraborty · © 2026