if (!requireNamespace("dplyr", quietly = TRUE)) suppressMessages(install.packages("dplyr"))
if (!requireNamespace("lubridate", quietly = TRUE)) suppressMessages(install.packages("lubridate"))
if (!requireNamespace("pharmaversesdtm", quietly = TRUE)) suppressMessages(install.packages("pharmaversesdtm"))
if (!requireNamespace("metacore", quietly = TRUE)) suppressMessages(install.packages("metacore"))
if (!requireNamespace("metatools", quietly = TRUE)) suppressMessages(install.packages("metatools"))
if (!requireNamespace("xportr", quietly = TRUE)) suppressMessages(install.packages("xportr"))
library(dplyr)
library(lubridate)
library(pharmaversesdtm)
library(metacore)
library(metatools)
library(xportr)

Day 14: Week 2 Capstone - Metadata-Driven SDTM with metacore & xportr
Specification-Driven Workflows for Submission-Ready Data
1 Learning Objectives
By the end of Day 14 (Week 2 Capstone), you will be able to:
- Load specification objects using metacore - the Pharmaverse metadata standard
- Use metatools to apply metadata checks and select columns from specs
- Use xportr to apply labels, types, formats, lengths, and variable ordering from specs
- Build an end-to-end pipeline: raw data → SDTM → validate → export .xpt
- Appreciate why metadata-driven workflows are the future of clinical programming
- Understand everything needed before starting ADaM datasets in Week 3
2 Why Metadata-Driven Workflows?
2.1 The Problem with Manual Programming
In traditional clinical programming, programmers manually:
- Assign variable labels (label(dm$USUBJID) <- "Unique Subject Identifier")
- Set variable types (character vs numeric)
- Order variables in the correct sequence
- Set variable lengths for transport files
This is error-prone, tedious, and hard to maintain.
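For a concrete sense of the manual approach, here is a minimal base-R sketch (using attr() rather than the Hmisc-style label() call shown above; either way the metadata is hardcoded in the program):

```r
# A hand-built demographics fragment -- purely illustrative data
dm <- data.frame(
  USUBJID = c("CDISC01-101-001", "CDISC01-101-002"),
  AGE     = c(71L, 27L)
)

# Manually assign variable labels, one attr() call per variable
attr(dm$USUBJID, "label") <- "Unique Subject Identifier"
attr(dm$AGE, "label")     <- "Age"

# Every program that touches dm must repeat these assignments --
# exactly the duplication that metadata-driven workflows remove
attr(dm$USUBJID, "label")
# "Unique Subject Identifier"
```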
Instead of hardcoding metadata in your programs, you:
- Define metadata once in a specification file (Excel, or a metacore object)
- Load metadata into R using metacore
- Apply metadata to your datasets using xportr/metatools
- Validate that your datasets match the specification
This ensures:
- Consistency - All datasets follow the same rules
- Reproducibility - Metadata changes automatically propagate
- Compliance - Variable labels, types, and lengths match define.xml
3 Package Installation & Loading
3.1 Required Packages
| Package | Purpose |
|---|---|
| metacore | Load and manage dataset specifications |
| metatools | Apply metadata-based transformations and checks |
| xportr | Apply labels, types, formats, lengths; export .xpt |
| dplyr | Data manipulation |
| pharmaversesdtm | Example SDTM datasets |
4 Understanding metacore
4.1 What is metacore?
metacore is a Pharmaverse package that provides a standardized way to represent dataset specifications in R. Think of it as the bridge between your Excel specification file and your R programs.
4.2 The metacore Object Structure
A metacore object contains multiple related tables:
┌────────────────────────────────────────────────────────────┐
│                      METACORE OBJECT                       │
├────────────────────────────────────────────────────────────┤
│ ds_spec     = Dataset-level metadata                       │
│               (domain name, label, structure)              │
│ ds_vars     = Variable-level metadata                      │
│               (which variables belong to which dataset)    │
│ var_spec    = Variable specifications                      │
│               (variable name, label, type, length, format) │
│ value_spec  = Value-level metadata                         │
│               (codelist values, decode values)             │
│ derivations = Derivation metadata                          │
│               (how variables are derived)                  │
│ codelist    = Code list definitions                        │
│               (controlled terminology)                     │
│ supp        = Supplemental qualifiers                      │
│               (SUPP-- domain information)                  │
└────────────────────────────────────────────────────────────┘
4.3 Creating a metacore Object
In production, you'd load this from an Excel specification. For learning, let's build one:
Metadata components created:
Dataset specs: 3 datasets
Variable assignments: 28 variable-dataset pairs
Variable specs: 25 unique variables
# In production, you would create the metacore object like this:
mc <- metacore::metacore(
  ds_spec = ds_spec,
  ds_vars = ds_vars,
  var_spec = var_spec
)

# Or more commonly, load from a specification file:
mc <- metacore::spec_to_metacore("path/to/specs.xlsx")

The most common way to create a metacore object in production is using spec_to_metacore(), which reads from a formatted Excel specification file. This spec file is typically created by the study statistician or data standards team and contains all the metadata for every dataset and variable.
5 Using xportr: The Metadata Application Engine
5.1 What xportr Does
xportr is the workhorse for making your datasets submission-ready. It applies metadata from your specification to your actual data:
xportr: Key Functions
# A tibble: 6 × 2
Function What_It_Does
<chr> <chr>
1 xportr_type() Coerce variables to the correct type (character/numeric)
2 xportr_length() Set variable lengths for SAS transport
3 xportr_label() Apply variable labels from specification
4 xportr_order() Reorder variables to match specification
5 xportr_format() Apply SAS display formats
6 xportr_write() Export the dataset as a .xpt (SAS transport) file
5.2 Applying xportr Step by Step
Let's work through the full pipeline using the DM domain:
DM dataset before xportr:
Rows: 15
Cols: 16
Rows: 15
Columns: 16
$ STUDYID <chr> "CDISC01", "CDISC01", "CDISC01", "CDISC01", "CDISC01", "CDISC…
$ DOMAIN <chr> "DM", "DM", "DM", "DM", "DM", "DM", "DM", "DM", "DM", "DM", "…
$ SUBJID <chr> "001", "002", "003", "004", "005", "006", "007", "008", "009"…
$ SITEID <chr> "101", "101", "101", "101", "102", "102", "102", "101", "103"…
$ AGE <int> 71, 27, 65, 49, 51, 60, 61, 55, 69, 29, 44, 58, 52, 64, 27
$ AGEU <chr> "YEARS", "YEARS", "YEARS", "YEARS", "YEARS", "YEARS", "YEARS"…
$ SEX <chr> "F", "M", "F", "F", "F", "M", "M", "F", "F", "F", "F", "F", "…
$ RACE <chr> "WHITE", "BLACK OR AFRICAN AMERICAN", "BLACK OR AFRICAN AMERI…
$ ETHNIC <chr> "NOT HISPANIC OR LATINO", "NOT HISPANIC OR LATINO", "NOT HISP…
$ ARM <chr> "Placebo", "Active 20mg", "Active 20mg", "Placebo", "Placebo"…
$ ARMCD <chr> "PBO", "ACT20", "ACT20", "PBO", "PBO", "ACT10", "PBO", "ACT10…
$ ACTARM <chr> "Placebo", "Active 20mg", "Active 20mg", "Placebo", "Placebo"…
$ ACTARMCD <chr> "PBO", "ACT20", "ACT20", "PBO", "PBO", "ACT10", "PBO", "ACT10…
$ RFSTDTC <chr> "2024-02-09", "2024-01-21", "2024-02-05", "2024-02-26", "2024…
$ RFENDTC <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ USUBJID <chr> "CDISC01-101-001", "CDISC01-101-002", "CDISC01-101-003", "CDI…
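The USUBJID values in the glimpse above follow the common STUDYID-SITEID-SUBJID convention. A minimal dplyr sketch of that derivation (illustrative; not necessarily the code that built this dataset):

```r
library(dplyr)

# Two illustrative raw demographics records
raw <- data.frame(
  STUDYID = c("CDISC01", "CDISC01"),
  SITEID  = c("101", "101"),
  SUBJID  = c("001", "002")
)

# Derive the unique subject identifier by concatenating the three IDs
dm <- raw %>%
  mutate(USUBJID = paste(STUDYID, SITEID, SUBJID, sep = "-"))

dm$USUBJID
# "CDISC01-101-001" "CDISC01-101-002"
```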
5.3 Step 2: Create a Specification for xportr
DM specification:
# A tibble: 16 × 5
variable type label length order
<chr> <chr> <chr> <int> <int>
1 STUDYID character Study Identifier 12 1
2 DOMAIN character Domain Abbreviation 2 2
3 USUBJID character Unique Subject Identifier 40 3
4 SUBJID character Subject Identifier for the Study 8 4
5 SITEID character Study Site Identifier 8 5
6 AGE numeric Age 8 6
7 AGEU character Age Units 6 7
8 SEX character Sex 2 8
9 RACE character Race 40 9
10 ETHNIC character Ethnicity 40 10
11 ARM character Planned Arm 40 11
12 ARMCD character Planned Arm Code 20 12
13 ACTARM character Actual Arm 40 13
14 ACTARMCD character Actual Arm Code 20 14
15 RFSTDTC character Subject Reference Start Date/Time 20 15
16 RFENDTC character Subject Reference End Date/Time 20 16
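A specification table like the one printed above can be written down directly with dplyr::tribble(). A sketch covering the first four variables (the full 16-row spec follows the same pattern; the column names match xportr's defaults):

```r
library(dplyr)

# Hand-written variable-level spec in the column layout xportr expects:
# variable, type, label, length, order
dm_spec <- tribble(
  ~variable, ~type,       ~label,                             ~length, ~order,
  "STUDYID", "character", "Study Identifier",                 12L,     1L,
  "DOMAIN",  "character", "Domain Abbreviation",              2L,      2L,
  "USUBJID", "character", "Unique Subject Identifier",        40L,     3L,
  "SUBJID",  "character", "Subject Identifier for the Study", 8L,      4L
)
```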
5.4 Step 3: Apply xportr Functions
After xportr_type():
AGE type: integer
STUDYID type: character
After xportr_label():
Variable Label
1 STUDYID Study Identifier
2 DOMAIN Domain Abbreviation
3 SUBJID Subject Identifier for the Study
4 SITEID Study Site Identifier
5 AGE Age
6 AGEU Age Units
7 SEX Sex
8 RACE Race
After xportr_order():
Variable order: STUDYID, DOMAIN, USUBJID, SUBJID, SITEID, AGE, AGEU, SEX, RACE, ETHNIC, ARM, ARMCD, ACTARM, ACTARMCD, RFSTDTC, RFENDTC
After xportr_length():
All metadata applied ✓
5.5 Step 4: Export as .xpt
Exported: output/dm.xpt
File size: 7920 bytes
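A quick sanity check after export is to read the transport file back. The haven package provides read_xpt() for this (an assumption here is that haven is installed; it is a common dependency in pharmaverse pipelines):

```r
# Round-trip check: read the exported transport file back into R
dm_check <- haven::read_xpt("output/dm.xpt")

# Labels applied by xportr_label() should survive in the .xpt file
attr(dm_check$USUBJID, "label")
```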
In practice, you'd chain all xportr functions together:
dm_final <- raw_dm %>%
  xportr_type(spec, domain = "DM") %>%
  xportr_length(spec, domain = "DM") %>%
  xportr_label(spec, domain = "DM") %>%
  xportr_order(spec, domain = "DM") %>%
  xportr_format(spec, domain = "DM") %>%
  xportr_write("output/dm.xpt")

This single pipeline takes your raw dataset and makes it submission-ready!
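Recent xportr releases also provide xportr_metadata(), which attaches the spec and domain once so they need not be repeated in every call. A sketch under that assumption, reusing the raw_dm and spec objects from the pipeline above:

```r
dm_final <- raw_dm %>%
  xportr_metadata(spec, domain = "DM") %>%  # attach spec + domain once
  xportr_type() %>%      # subsequent calls pick up the attached metadata
  xportr_length() %>%
  xportr_label() %>%
  xportr_order() %>%
  xportr_format() %>%
  xportr_write("output/dm.xpt")
```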
6 Using metatools for Metadata-Based Checks
6.1 What metatools Provides
metatools helps you work with metadata - selecting variables, checking CT compliance, and building datasets from specs:
metatools: Key Functions
# A tibble: 5 × 2
Function Purpose
<chr> <chr>
1 build_from_derived() Create a dataset shell from specification
2 check_ct_col() Check if column values match controlled terminology
3 check_variables() Verify dataset variables match specification
4 combine_supp() Combine SUPP-- with parent domain
5 drop_unspec_vars() Remove variables not in the specification
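Putting a few of these together: a sketch of a typical check sequence, assuming dm is an SDTM dataset and dm_spec is a metacore object already subset to the DM domain (e.g. via metacore::select_dataset(mc, "DM")):

```r
library(metatools)

# Remove any variables that are not in the specification
dm_clean <- drop_unspec_vars(dm, dm_spec)

# Verify the remaining variables match the spec exactly
check_variables(dm_clean, dm_spec)

# Confirm SEX values are within the controlled terminology codelist
dm_clean <- check_ct_col(dm_clean, dm_spec, SEX)
```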
7 End-to-End Capstone Pipeline
Let's build a complete pipeline that takes raw data through to validated, exported SDTM:
=== WEEK 2 CAPSTONE: END-TO-END SDTM PIPELINE ===
Step 1: Generate raw clinical data
Demographics: 20 subjects
Adverse Events: 30 records
Step 2: Transform to SDTM format
SDTM DM: 20 rows x 16 cols
SDTM AE: 30 rows x 12 cols
Step 3: Validate SDTM domains
Orphan AE records: 16
DM required variables: All present ✓
AE required variables: All present ✓
SEX controlled terminology: Valid ✓
AESEV controlled terminology: Valid ✓
AE date logic (start <= end): Valid ✓
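The orphan-record check in Step 3 can be done with dplyr::anti_join(). A minimal sketch with toy data (illustrative IDs, not the capstone dataset):

```r
library(dplyr)

dm <- data.frame(USUBJID = c("S-001", "S-002"))
ae <- data.frame(USUBJID = c("S-001", "S-003"),
                 AETERM  = c("HEADACHE", "NAUSEA"))

# AE records whose subject does not appear in DM ("orphans")
orphans <- anti_join(ae, dm, by = "USUBJID")

nrow(orphans)
# 1  (the S-003 record has no matching DM subject)
```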
Step 4: Apply metadata and export
DM metadata applied:
Variables: 16
Order: STUDYID → DOMAIN → USUBJID → SUBJID → SITEID → AGE → AGEU → SEX → RACE → ETHNIC → ARM → ARMCD → ACTARM → ACTARMCD → RFSTDTC → RFENDTC
Labels applied: 16 of 16
Step 5: Export as .xpt files
✓ Exported: output/dm.xpt - 9520 bytes
✓ Exported: output/ae.xpt - 25120 bytes
8 Week 2 Review: Everything You've Learned
=== WEEK 2 COMPLETE REVIEW ===
# A tibble: 7 × 3
Day Topic Key_Skill
<chr> <chr> <chr>
1 8 LB Domain & Unit Standardization Unit conversion, reference ranges,…
2 9 VS & Repeated Measures Multiple readings, VSPOS, VSTPT, w…
3 10 AE Domain Mastery & SAE Logic Severity vs seriousness, TEAE, SAE…
4 11 Disposition & Trial Design DS domain, EPOCH, TA/TV/TS, ADSL p…
5 12 Data Cuts with datacutr Patient-level & record-level cuts,…
6 13 SDTM Validation with sdtmchecks FDA business rules, cross-domain c…
7 14 Metadata-Driven SDTM (this capstone) metacore, metatools, xportr pipeli…
8.1 What You're Now Ready For
You now have a solid foundation in SDTM:
In Week 3, we will use admiral to build ADaM datasets (ADSL, ADAE, ADVS, ADLB) from the SDTM data you've mastered.
9 Deliverable Summary
Today you completed the following:
| Task | Status |
|---|---|
| Understood metacore specification objects | ✓ Done |
| Created variable specifications for DM and AE | ✓ Done |
| Applied xportr_type, xportr_label, xportr_order, xportr_length | ✓ Done |
| Exported submission-ready .xpt files | ✓ Done |
| Built an end-to-end pipeline: raw → SDTM → validate → export | ✓ Done |
| Reviewed all Week 2 topics | ✓ Done |
10 Key Takeaways
- Metadata-driven is the future - Define once, apply everywhere
- metacore standardizes specs - One R object for all dataset/variable metadata
- xportr applies metadata - Types, labels, lengths, ordering, and export
- metatools enables checks - Verify your data matches the specification
- The pipeline is reproducible - Same spec + same code = same output every time
- You're ready for ADaM - All SDTM fundamentals are in place
11 Resources
- metacore Documentation - Official metacore package
- metatools Documentation - Metadata utility functions
- xportr Documentation - SAS transport export
- Admiral Documentation - ADaM derivation package (Week 3!)
- Pharmaverse.org - R packages for clinical data
- FDA Data Standards Resources - FDA guidance
12 🎉 Congratulations! Week 2 Complete!
You've now mastered:
- Complex SDTM domains (LB, VS, AE, DS)
- Production workflows (data cuts, validation, metadata)
- Pharmaverse tools (datacutr, sdtmchecks, metacore, xportr)
- Regulatory requirements (SAE logic, TEAE derivation, controlled terminology)
13 Whatβs Next?
Week 3: ADaM Datasets with Admiral
- Using admiral to derive ADaM datasets from SDTM
- Building ADSL (Subject-Level Analysis Dataset)
- Creating BDS datasets: ADVS, ADLB
- Deriving baseline, change from baseline, shift tables
- ADAE creation with treatment-emergent logic