Day 6: Introduction to sdtm.oak

EDC-to-SDTM Transformation Patterns

1 Learning Objectives

By the end of Day 6, you will be able to:

Understand the philosophy of sdtm.oak for SDTM generation
Simulate “Raw” EDC datasets for both VS and LB
Apply algorithm-based transformations (mapping, pivoting, hardcoding)
Perform unit standardization (e.g., mg/dL to mmol/L)
Create a complete SDTM LB domain with sequence numbers

2 Introduction

2.1 What is `sdtm.oak`?

sdtm.oak is an R package from the pharmaverse that helps you create SDTM datasets in a standardized, repeatable way. Instead of writing custom code for every study, you apply a set of reusable algorithms and rules, which makes your code easier to maintain, test, and share.

2.1.1 Key concepts:

Algorithm-based: You use small, well-defined steps (algorithms) to transform your data. For example, you might have an algorithm to assign subject IDs, another to map test codes, and another to standardize units.
Metadata-driven: The rules for how to transform the data are defined in a specification (metadata), not hard-coded in your script. This means you can update the rules without rewriting your code.
Modular: Each transformation is a small, testable function. You can chain them together to build complex workflows.
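
To make the three concepts concrete, here is a minimal sketch in base R. The `hardcode()` helper and the `spec` table are hypothetical names invented for this illustration; they are not sdtm.oak functions. The point is only that a small function (modular) applies one kind of step (algorithm-based), and that a specification table decides what gets applied (metadata-driven).

```r
# Hypothetical helper: one small, testable transformation step.
# This is NOT an sdtm.oak function; it only mimics the idea.
hardcode <- function(data, target, value) {
  data[[target]] <- rep(value, nrow(data))
  data
}

# Hypothetical specification (metadata): which target gets which constant.
spec <- data.frame(
  target = c("STUDYID", "DOMAIN"),
  value  = c("DEMO-001", "VS"),
  stringsAsFactors = FALSE
)

raw <- data.frame(SubjectID = c("001", "002"), stringsAsFactors = FALSE)

# Apply every rule in the spec. Changing the spec changes the output
# without touching this loop.
out <- raw
for (i in seq_len(nrow(spec))) {
  out <- hardcode(out, spec$target[i], spec$value[i])
}

print(out)
```

Adding a row to `spec` adds a variable to the output, which is the sense in which rules live in metadata rather than in the script.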

2.1.2 Why is this important?

This approach saves time, reduces errors, and makes it easier to follow CDISC standards. It also helps new programmers understand what each step is doing, because the code is organized and well-documented.

2.2 The sdtm.oak Philosophy

Here are some common algorithms used in SDTM transformations:

Algorithm	Description	Example Use
Assign	Copy source to target	Subject ID → USUBJID
Hardcode	Set a constant value	DOMAIN = “LB”
Condition	Apply logic based on condition	If severity >= 3 then “Y”
Assign CT	Map to Controlled Terminology	“Male” → “M”

Each algorithm does one thing, and you can combine them to build your SDTM domains step by step.
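
As a rough illustration, the four algorithms in the table can be imitated with plain dplyr. The input columns (`Sex`, `Severity`), the flag name `SEVFL`, and the `sex_ct` lookup are assumptions made up for this sketch; real studies take these from the mapping specification and the controlled terminology files.

```r
library(dplyr)

# Hypothetical raw data, invented for this sketch
raw <- tibble(
  SubjectID = c("001", "002", "003"),
  Sex       = c("Male", "Female", "Male"),
  Severity  = c(1, 3, 4)
)

# Hypothetical controlled-terminology lookup: collected value -> submission value
sex_ct <- c(Male = "M", Female = "F")

mapped <- raw %>%
  mutate(
    USUBJID = SubjectID,                                   # Assign: copy source to target
    DOMAIN  = "LB",                                        # Hardcode: constant value
    SEVFL   = if_else(Severity >= 3, "Y", NA_character_),  # Condition: only when rule holds
    SEX     = unname(sex_ct[Sex])                          # Assign CT: map via lookup
  )

print(mapped)
```

Values missing from the lookup come back as `NA` in `SEX`, which makes unmapped terms easy to spot.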

Note: This lesson does not call sdtm.oak functions directly. We simulate its patterns with dplyr and tidyr so the underlying concepts are easy to see.

3 Package Loading

library(dplyr)
library(tidyr)
library(tibble)
library(lubridate)

4 Part 1: Simulating Raw EDC Data

4.1 Raw Vital Signs Data

First, let’s create data that looks like it came from an Electronic Data Capture (EDC) system like Rave or Veeva.

# Simulated Raw Vital Signs (EDC export format)
raw_vs <- tribble(
  ~SubjectID,  ~Site, ~Visit,       ~Date,        ~SysBP, ~DiaBP, ~Pulse, ~Temp_C,
  "001",       "101", "Screening",  "2024-01-01", 120,    80,     72,     36.5,
  "001",       "101", "Baseline",   "2024-01-15", 118,    78,     70,     36.8,
  "001",       "101", "Week 4",     "2024-02-12", 115,    76,     68,     36.6,
  "002",       "101", "Screening",  "2024-01-02", 130,    85,     88,     37.0,
  "002",       "101", "Baseline",   "2024-01-16", 128,    82,     85,     36.7,
  "003",       "102", "Screening",  "2024-01-03", 145,    92,     95,     36.9,
  "003",       "102", "Baseline",   "2024-01-17", 140,    88,     90,     36.5
)

print(raw_vs)

# A tibble: 7 × 8
  SubjectID Site  Visit     Date       SysBP DiaBP Pulse Temp_C
  <chr>     <chr> <chr>     <chr>      <dbl> <dbl> <dbl>  <dbl>
1 001       101   Screening 2024-01-01   120    80    72   36.5
2 001       101   Baseline  2024-01-15   118    78    70   36.8
3 001       101   Week 4    2024-02-12   115    76    68   36.6
4 002       101   Screening 2024-01-02   130    85    88   37  
5 002       101   Baseline  2024-01-16   128    82    85   36.7
6 003       102   Screening 2024-01-03   145    92    95   36.9
7 003       102   Baseline  2024-01-17   140    88    90   36.5

4.2 Raw Laboratory Data

Now let’s create lab data with values in different units that need standardization.

# Simulated Raw Lab Data (with different units per site)
raw_lb <- tribble(
  ~SubjectID,  ~Site, ~Visit,      ~Date,        ~Test,        ~Result, ~Unit,
  # Site 101: Uses US units
  "001",       "101", "Screening", "2024-01-01", "Glucose",    95,      "mg/dL",
  "001",       "101", "Screening", "2024-01-01", "Cholesterol", 180,    "mg/dL",
  "001",       "101", "Screening", "2024-01-01", "ALT",        25,      "U/L",
  "001",       "101", "Baseline",  "2024-01-15", "Glucose",    92,      "mg/dL",
  "001",       "101", "Baseline",  "2024-01-15", "Cholesterol", 175,    "mg/dL",
  "001",       "101", "Baseline",  "2024-01-15", "ALT",        28,      "U/L",
  "002",       "101", "Screening", "2024-01-02", "Glucose",    110,     "mg/dL",
  "002",       "101", "Screening", "2024-01-02", "Cholesterol", 220,    "mg/dL",
  "002",       "101", "Screening", "2024-01-02", "ALT",        45,      "U/L",
  # Site 102: Uses SI units
  "003",       "102", "Screening", "2024-01-03", "Glucose",    5.5,     "mmol/L",
  "003",       "102", "Screening", "2024-01-03", "Cholesterol", 4.8,    "mmol/L",
  "003",       "102", "Screening", "2024-01-03", "ALT",        30,      "U/L",
  "003",       "102", "Baseline",  "2024-01-17", "Glucose",    5.2,     "mmol/L",
  "003",       "102", "Baseline",  "2024-01-17", "Cholesterol", 4.5,    "mmol/L",
  "003",       "102", "Baseline",  "2024-01-17", "ALT",        32,      "U/L"
)

print(raw_lb)

# A tibble: 15 × 7
   SubjectID Site  Visit     Date       Test        Result Unit  
   <chr>     <chr> <chr>     <chr>      <chr>        <dbl> <chr> 
 1 001       101   Screening 2024-01-01 Glucose       95   mg/dL 
 2 001       101   Screening 2024-01-01 Cholesterol  180   mg/dL 
 3 001       101   Screening 2024-01-01 ALT           25   U/L   
 4 001       101   Baseline  2024-01-15 Glucose       92   mg/dL 
 5 001       101   Baseline  2024-01-15 Cholesterol  175   mg/dL 
 6 001       101   Baseline  2024-01-15 ALT           28   U/L   
 7 002       101   Screening 2024-01-02 Glucose      110   mg/dL 
 8 002       101   Screening 2024-01-02 Cholesterol  220   mg/dL 
 9 002       101   Screening 2024-01-02 ALT           45   U/L   
10 003       102   Screening 2024-01-03 Glucose        5.5 mmol/L
11 003       102   Screening 2024-01-03 Cholesterol    4.8 mmol/L
12 003       102   Screening 2024-01-03 ALT           30   U/L   
13 003       102   Baseline  2024-01-17 Glucose        5.2 mmol/L
14 003       102   Baseline  2024-01-17 Cholesterol    4.5 mmol/L
15 003       102   Baseline  2024-01-17 ALT           32   U/L

5 Part 2: Creating SDTM VS Domain

5.1 Step 1: Hardcode Standard Variables

# Algorithm: Hardcode
vs_step1 <- raw_vs %>%
  mutate(
    STUDYID = "DEMO-001",
    DOMAIN = "VS",
    # Algorithm: Assign (concatenate)
    USUBJID = paste(STUDYID, Site, SubjectID, sep = "-")
  )

head(vs_step1)

# A tibble: 6 × 11
  SubjectID Site  Visit    Date  SysBP DiaBP Pulse Temp_C STUDYID DOMAIN USUBJID
  <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl>  <dbl> <chr>   <chr>  <chr>  
1 001       101   Screeni… 2024…   120    80    72   36.5 DEMO-0… VS     DEMO-0…
2 001       101   Baseline 2024…   118    78    70   36.8 DEMO-0… VS     DEMO-0…
3 001       101   Week 4   2024…   115    76    68   36.6 DEMO-0… VS     DEMO-0…
4 002       101   Screeni… 2024…   130    85    88   37   DEMO-0… VS     DEMO-0…
5 002       101   Baseline 2024…   128    82    85   36.7 DEMO-0… VS     DEMO-0…
6 003       102   Screeni… 2024…   145    92    95   36.9 DEMO-0… VS     DEMO-0…

5.2 Step 2: Pivot to Long Format (Findings Algorithm)

SDTM Findings domains use a long structure: one row per subject per test per visit, rather than one column per test.

# Algorithm: Transpose/Pivot
vs_step2 <- vs_step1 %>%
  pivot_longer(
    cols = c(SysBP, DiaBP, Pulse, Temp_C),
    names_to = "RAW_TEST",
    values_to = "VSORRES_NUM"
  )

head(vs_step2, 10)

# A tibble: 10 × 9
   SubjectID Site  Visit     Date    STUDYID DOMAIN USUBJID RAW_TEST VSORRES_NUM
   <chr>     <chr> <chr>     <chr>   <chr>   <chr>  <chr>   <chr>          <dbl>
 1 001       101   Screening 2024-0… DEMO-0… VS     DEMO-0… SysBP          120  
 2 001       101   Screening 2024-0… DEMO-0… VS     DEMO-0… DiaBP           80  
 3 001       101   Screening 2024-0… DEMO-0… VS     DEMO-0… Pulse           72  
 4 001       101   Screening 2024-0… DEMO-0… VS     DEMO-0… Temp_C          36.5
 5 001       101   Baseline  2024-0… DEMO-0… VS     DEMO-0… SysBP          118  
 6 001       101   Baseline  2024-0… DEMO-0… VS     DEMO-0… DiaBP           78  
 7 001       101   Baseline  2024-0… DEMO-0… VS     DEMO-0… Pulse           70  
 8 001       101   Baseline  2024-0… DEMO-0… VS     DEMO-0… Temp_C          36.8
 9 001       101   Week 4    2024-0… DEMO-0… VS     DEMO-0… SysBP          115  
10 001       101   Week 4    2024-0… DEMO-0… VS     DEMO-0… DiaBP           76
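
A quick way to sanity-check a pivot is to compare row counts: each wide row should yield one long row per pivoted column. The sketch below uses a tiny stand-in data frame (not the lesson's `vs_step1`) to show the check; in the lesson's data the same arithmetic gives 7 rows × 4 tests = 28 records.

```r
library(tidyr)

# Tiny stand-in for a wide EDC export (values invented for the sketch)
wide <- data.frame(
  SubjectID = c("001", "002"),
  SysBP     = c(120, 130),
  DiaBP     = c(80, 85),
  stringsAsFactors = FALSE
)

long <- pivot_longer(
  wide,
  cols      = c(SysBP, DiaBP),
  names_to  = "RAW_TEST",
  values_to = "VSORRES_NUM"
)

# Each input row yields one output row per pivoted column
n_tests <- 2
stopifnot(nrow(long) == nrow(wide) * n_tests)

print(long)
```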

5.3 Step 3: Map to Controlled Terminology

# Algorithm: Assign CT (Map test codes)
vs_step3 <- vs_step2 %>%
  mutate(
    # Controlled Terminology mapping
    VSTESTCD = case_when(
      RAW_TEST == "SysBP"  ~ "SYSBP",
      RAW_TEST == "DiaBP"  ~ "DIABP",
      RAW_TEST == "Pulse"  ~ "PULSE",
      RAW_TEST == "Temp_C" ~ "TEMP"
    ),
    VSTEST = case_when(
      VSTESTCD == "SYSBP" ~ "Systolic Blood Pressure",
      VSTESTCD == "DIABP" ~ "Diastolic Blood Pressure",
      VSTESTCD == "PULSE" ~ "Pulse Rate",
      VSTESTCD == "TEMP"  ~ "Temperature"
    ),
    VSORRESU = case_when(
      VSTESTCD %in% c("SYSBP", "DIABP") ~ "mmHg",
      VSTESTCD == "PULSE" ~ "BEATS/MIN",
      VSTESTCD == "TEMP"  ~ "C"
    )
  )

head(vs_step3)

# A tibble: 6 × 12
  SubjectID Site  Visit     Date     STUDYID DOMAIN USUBJID RAW_TEST VSORRES_NUM
  <chr>     <chr> <chr>     <chr>    <chr>   <chr>  <chr>   <chr>          <dbl>
1 001       101   Screening 2024-01… DEMO-0… VS     DEMO-0… SysBP          120  
2 001       101   Screening 2024-01… DEMO-0… VS     DEMO-0… DiaBP           80  
3 001       101   Screening 2024-01… DEMO-0… VS     DEMO-0… Pulse           72  
4 001       101   Screening 2024-01… DEMO-0… VS     DEMO-0… Temp_C          36.5
5 001       101   Baseline  2024-01… DEMO-0… VS     DEMO-0… SysBP          118  
6 001       101   Baseline  2024-01… DEMO-0… VS     DEMO-0… DiaBP           78  
# ℹ 3 more variables: VSTESTCD <chr>, VSTEST <chr>, VSORRESU <chr>
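
The same mapping can also be kept in a lookup table and joined, which is closer to the metadata-driven idea from the introduction. This is a sketch under assumptions: `vs_ct` is a hand-written table using the values from this lesson, and `long` is a small stand-in for `vs_step2` that includes one deliberately unmapped test.

```r
library(dplyr)

# Lookup table holding the same mappings as the case_when() calls
vs_ct <- tibble(
  RAW_TEST = c("SysBP", "DiaBP", "Pulse", "Temp_C"),
  VSTESTCD = c("SYSBP", "DIABP", "PULSE", "TEMP"),
  VSTEST   = c("Systolic Blood Pressure", "Diastolic Blood Pressure",
               "Pulse Rate", "Temperature"),
  VSORRESU = c("mmHg", "mmHg", "BEATS/MIN", "C")
)

# Small stand-in for vs_step2, with one unmapped test ("Weight") on purpose
long <- tibble(
  RAW_TEST    = c("SysBP", "Temp_C", "Weight"),
  VSORRES_NUM = c(120, 36.5, 70)
)

mapped <- left_join(long, vs_ct, by = "RAW_TEST")

# Tests missing from the lookup surface as NA and can be listed for review
unmapped <- filter(mapped, is.na(VSTESTCD))
print(unmapped$RAW_TEST)
```

With a lookup table, adding a new test means adding a row of data instead of editing three `case_when()` calls.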

5.4 Step 4: Add Sequence Number

Every SDTM record needs a sequence number (--SEQ) that is unique within each subject in the domain.

# Algorithm: Derive sequence
vs_step4 <- vs_step3 %>%
  arrange(USUBJID, Date, VSTESTCD) %>%
  group_by(USUBJID) %>%
  mutate(VSSEQ = row_number()) %>%
  ungroup()

vs_step4 %>%
  select(USUBJID, VSSEQ, VSTESTCD, Visit) %>%
  head(10)

# A tibble: 10 × 4
   USUBJID          VSSEQ VSTESTCD Visit    
   <chr>            <int> <chr>    <chr>    
 1 DEMO-001-101-001     1 DIABP    Screening
 2 DEMO-001-101-001     2 PULSE    Screening
 3 DEMO-001-101-001     3 SYSBP    Screening
 4 DEMO-001-101-001     4 TEMP     Screening
 5 DEMO-001-101-001     5 DIABP    Baseline 
 6 DEMO-001-101-001     6 PULSE    Baseline 
 7 DEMO-001-101-001     7 SYSBP    Baseline 
 8 DEMO-001-101-001     8 TEMP     Baseline 
 9 DEMO-001-101-001     9 DIABP    Week 4   
10 DEMO-001-101-001    10 PULSE    Week 4
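
Because the sequence number must be unique within each subject, it is worth checking that no USUBJID/VSSEQ pair appears twice. The sketch below repeats the derivation on a small invented dataset and then runs that check; the subject IDs and dates are placeholders.

```r
library(dplyr)

# Small invented dataset, deliberately out of order
vs <- tibble(
  USUBJID  = c("S1", "S1", "S2", "S1", "S2"),
  Date     = c("2024-01-15", "2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"),
  VSTESTCD = c("SYSBP", "SYSBP", "DIABP", "DIABP", "SYSBP")
)

# Same pattern as the lesson: sort, then number rows within subject
vs_seq <- vs %>%
  arrange(USUBJID, Date, VSTESTCD) %>%
  group_by(USUBJID) %>%
  mutate(VSSEQ = row_number()) %>%
  ungroup()

# Check: no (USUBJID, VSSEQ) pair occurs more than once
dup_keys <- vs_seq %>%
  count(USUBJID, VSSEQ) %>%
  filter(n > 1)
stopifnot(nrow(dup_keys) == 0)

print(vs_seq)
```

Sorting before numbering matters: it makes the sequence reproducible regardless of the order in which raw records arrive.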

5.5 Step 5: Final SDTM VS Domain

sdtm_vs <- vs_step4 %>%
  mutate(
    VSORRES = as.character(VSORRES_NUM),
    VSSTRESN = VSORRES_NUM,
    VSSTRESU = VSORRESU,
    VSDTC = Date,
    VISIT = Visit
  ) %>%
  select(
    STUDYID, DOMAIN, USUBJID, VSSEQ, VSTESTCD, VSTEST,
    VSORRES, VSORRESU, VSSTRESN, VSSTRESU, VSDTC, VISIT
  )

cat("SDTM VS Domain:\n")

SDTM VS Domain:

cat("Records:", nrow(sdtm_vs), "\n\n")

Records: 28

head(sdtm_vs, 10)

# A tibble: 10 × 12
   STUDYID  DOMAIN USUBJID       VSSEQ VSTESTCD VSTEST VSORRES VSORRESU VSSTRESN
   <chr>    <chr>  <chr>         <int> <chr>    <chr>  <chr>   <chr>       <dbl>
 1 DEMO-001 VS     DEMO-001-101…     1 DIABP    Diast… 80      mmHg         80  
 2 DEMO-001 VS     DEMO-001-101…     2 PULSE    Pulse… 72      BEATS/M…     72  
 3 DEMO-001 VS     DEMO-001-101…     3 SYSBP    Systo… 120     mmHg        120  
 4 DEMO-001 VS     DEMO-001-101…     4 TEMP     Tempe… 36.5    C            36.5
 5 DEMO-001 VS     DEMO-001-101…     5 DIABP    Diast… 78      mmHg         78  
 6 DEMO-001 VS     DEMO-001-101…     6 PULSE    Pulse… 70      BEATS/M…     70  
 7 DEMO-001 VS     DEMO-001-101…     7 SYSBP    Systo… 118     mmHg        118  
 8 DEMO-001 VS     DEMO-001-101…     8 TEMP     Tempe… 36.8    C            36.8
 9 DEMO-001 VS     DEMO-001-101…     9 DIABP    Diast… 76      mmHg         76  
10 DEMO-001 VS     DEMO-001-101…    10 PULSE    Pulse… 68      BEATS/M…     68  
# ℹ 3 more variables: VSSTRESU <chr>, VSDTC <chr>, VISIT <chr>
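
The final step copies Date into VSDTC unchanged. That works here because the simulated dates are already in ISO 8601 form (YYYY-MM-DD), which is the format SDTM uses for --DTC variables. Real EDC exports are often less tidy, so a check like the following base R sketch can help before the copy. The example values are invented.

```r
# Invented examples: two valid ISO dates, one US-style date, one impossible date
dates <- c("2024-01-01", "2024-01-15", "01/02/2024", "2024-02-30")

# A value passes only if it has the YYYY-MM-DD shape AND parses to a real date
has_iso_shape <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", dates)
parses_ok     <- !is.na(as.Date(dates, format = "%Y-%m-%d"))
is_iso_date   <- has_iso_shape & parses_ok

print(data.frame(dates, is_iso_date))
```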

6 Part 3: Creating SDTM LB Domain with Unit Standardization

The LB domain is more complex because we need to standardize units across sites.

6.1 Unit Conversion Reference

Test	Original Unit	Standard Unit	Conversion Factor
Glucose	mg/dL	mmol/L	÷ 18.0182
Cholesterol	mg/dL	mmol/L	÷ 38.67
ALT	U/L	U/L	None (already in the standard unit)
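
As a worked example of the table, the conversion is a single division followed by rounding. The helper name `to_mmol()` and the `divisors` vector are hypothetical, introduced only for this sketch; the divisors themselves are the ones listed in the table.

```r
# Divisors from the conversion table (mg/dL -> mmol/L)
divisors <- c(GLUC = 18.0182, CHOL = 38.67)

# Hypothetical helper: convert mg/dL values to mmol/L, rounded to 2 decimals
to_mmol <- function(value, testcd) {
  round(value / unname(divisors[testcd]), 2)
}

converted <- to_mmol(c(95, 180, 110), c("GLUC", "CHOL", "GLUC"))
print(converted)
# 95 / 18.0182 = 5.27, 180 / 38.67 = 4.65, 110 / 18.0182 = 6.1
```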

6.2 Step 1: Initial Mapping

lb_step1 <- raw_lb %>%
  mutate(
    STUDYID = "DEMO-001",
    DOMAIN = "LB",
    USUBJID = paste(STUDYID, Site, SubjectID, sep = "-"),
    
    # Map test codes
    LBTESTCD = case_when(
      Test == "Glucose"     ~ "GLUC",
      Test == "Cholesterol" ~ "CHOL",
      Test == "ALT"         ~ "ALT"
    ),
    LBTEST = case_when(
      LBTESTCD == "GLUC" ~ "Glucose",
      LBTESTCD == "CHOL" ~ "Cholesterol",
      LBTESTCD == "ALT"  ~ "Alanine Aminotransferase"
    ),
    
    # Original results
    LBORRES = as.character(Result),
    LBORRESU = Unit
  )

head(lb_step1)

# A tibble: 6 × 14
  SubjectID Site  Visit Date  Test  Result Unit  STUDYID DOMAIN USUBJID LBTESTCD
  <chr>     <chr> <chr> <chr> <chr>  <dbl> <chr> <chr>   <chr>  <chr>   <chr>   
1 001       101   Scre… 2024… Gluc…     95 mg/dL DEMO-0… LB     DEMO-0… GLUC    
2 001       101   Scre… 2024… Chol…    180 mg/dL DEMO-0… LB     DEMO-0… CHOL    
3 001       101   Scre… 2024… ALT       25 U/L   DEMO-0… LB     DEMO-0… ALT     
4 001       101   Base… 2024… Gluc…     92 mg/dL DEMO-0… LB     DEMO-0… GLUC    
5 001       101   Base… 2024… Chol…    175 mg/dL DEMO-0… LB     DEMO-0… CHOL    
6 001       101   Base… 2024… ALT       28 U/L   DEMO-0… LB     DEMO-0… ALT     
# ℹ 3 more variables: LBTEST <chr>, LBORRES <chr>, LBORRESU <chr>

6.3 Step 2: Unit Standardization

This is where the real work happens: converting all values to a single standard unit per test.

lb_step2 <- lb_step1 %>%
  mutate(
    # Standard units: mmol/L for glucose/cholesterol, U/L for ALT
    LBSTRESU = case_when(
      LBTESTCD %in% c("GLUC", "CHOL") ~ "mmol/L",
      LBTESTCD == "ALT" ~ "U/L"
    ),
    
    # Convert to standard units
    LBSTRESN = case_when(
      # Glucose: mg/dL to mmol/L
      LBTESTCD == "GLUC" & LBORRESU == "mg/dL" ~ round(Result / 18.0182, 2),
      LBTESTCD == "GLUC" & LBORRESU == "mmol/L" ~ Result,
      
      # Cholesterol: mg/dL to mmol/L
      LBTESTCD == "CHOL" & LBORRESU == "mg/dL" ~ round(Result / 38.67, 2),
      LBTESTCD == "CHOL" & LBORRESU == "mmol/L" ~ Result,
      
      # ALT: Already in U/L
      LBTESTCD == "ALT" ~ Result
    ),
    
    # Character version of standardized result
    LBSTRESC = as.character(LBSTRESN)
  )

# Show the conversion
lb_step2 %>%
  select(USUBJID, LBTESTCD, LBORRES, LBORRESU, LBSTRESN, LBSTRESU) %>%
  head(10)

# A tibble: 10 × 6
   USUBJID          LBTESTCD LBORRES LBORRESU LBSTRESN LBSTRESU
   <chr>            <chr>    <chr>   <chr>       <dbl> <chr>   
 1 DEMO-001-101-001 GLUC     95      mg/dL        5.27 mmol/L  
 2 DEMO-001-101-001 CHOL     180     mg/dL        4.65 mmol/L  
 3 DEMO-001-101-001 ALT      25      U/L         25    U/L     
 4 DEMO-001-101-001 GLUC     92      mg/dL        5.11 mmol/L  
 5 DEMO-001-101-001 CHOL     175     mg/dL        4.53 mmol/L  
 6 DEMO-001-101-001 ALT      28      U/L         28    U/L     
 7 DEMO-001-101-002 GLUC     110     mg/dL        6.1  mmol/L  
 8 DEMO-001-101-002 CHOL     220     mg/dL        5.69 mmol/L  
 9 DEMO-001-101-002 ALT      45      U/L         45    U/L     
10 DEMO-001-102-003 GLUC     5.5     mmol/L       5.5  mmol/L

Unit Standardization in Practice

Notice how subjects from Site 101 (US units) and Site 102 (SI units) now have comparable values in LBSTRESN. This is essential for cross-site analysis!
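
One limitation of a `case_when()` without a default branch is that an unexpected test/unit combination silently becomes `NA`. A possible alternative, sketched below, keeps the conversions in a table and joins on test code and original unit, so unhandled combinations can be caught explicitly. The `conv` table and the small `lb` data are assumptions written for this illustration.

```r
library(dplyr)

# Small stand-in for lb_step1
lb <- tibble(
  LBTESTCD = c("GLUC", "GLUC", "CHOL", "ALT"),
  Result   = c(95, 5.5, 180, 25),
  LBORRESU = c("mg/dL", "mmol/L", "mg/dL", "U/L")
)

# Hypothetical conversion table: one row per (test, original unit) pair
conv <- tibble(
  LBTESTCD = c("GLUC", "GLUC", "CHOL", "CHOL", "ALT"),
  LBORRESU = c("mg/dL", "mmol/L", "mg/dL", "mmol/L", "U/L"),
  LBSTRESU = c("mmol/L", "mmol/L", "mmol/L", "mmol/L", "U/L"),
  divisor  = c(18.0182, 1, 38.67, 1, 1)
)

lb_std <- lb %>%
  left_join(conv, by = c("LBTESTCD", "LBORRESU")) %>%
  mutate(
    # Values already in the standard unit pass through unrounded
    LBSTRESN = if_else(divisor == 1, Result, round(Result / divisor, 2))
  )

# Fail loudly if any test/unit pair has no conversion rule
unmapped <- filter(lb_std, is.na(divisor))
stopifnot(nrow(unmapped) == 0)

print(lb_std)
```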

6.4 Step 3: Add Sequence and Reference Ranges

lb_step3 <- lb_step2 %>%
  arrange(USUBJID, Date, LBTESTCD) %>%
  group_by(USUBJID) %>%
  mutate(LBSEQ = row_number()) %>%
  ungroup() %>%
  # Add reference ranges (in standard units)
  mutate(
    LBSTNRLO = case_when(
      LBTESTCD == "GLUC" ~ 3.9,
      LBTESTCD == "CHOL" ~ 0.0,
      LBTESTCD == "ALT"  ~ 7.0
    ),
    LBSTNRHI = case_when(
      LBTESTCD == "GLUC" ~ 5.6,
      LBTESTCD == "CHOL" ~ 5.2,
      LBTESTCD == "ALT"  ~ 56.0
    ),
    # Normal range indicator
    LBNRIND = case_when(
      LBSTRESN < LBSTNRLO ~ "LOW",
      LBSTRESN > LBSTNRHI ~ "HIGH",
      TRUE ~ "NORMAL"
    )
  )

lb_step3 %>%
  select(USUBJID, LBTESTCD, LBSTRESN, LBSTNRLO, LBSTNRHI, LBNRIND) %>%
  head(10)

# A tibble: 10 × 6
   USUBJID          LBTESTCD LBSTRESN LBSTNRLO LBSTNRHI LBNRIND
   <chr>            <chr>       <dbl>    <dbl>    <dbl> <chr>  
 1 DEMO-001-101-001 ALT         25         7       56   NORMAL 
 2 DEMO-001-101-001 CHOL         4.65      0        5.2 NORMAL 
 3 DEMO-001-101-001 GLUC         5.27      3.9      5.6 NORMAL 
 4 DEMO-001-101-001 ALT         28         7       56   NORMAL 
 5 DEMO-001-101-001 CHOL         4.53      0        5.2 NORMAL 
 6 DEMO-001-101-001 GLUC         5.11      3.9      5.6 NORMAL 
 7 DEMO-001-101-002 ALT         45         7       56   NORMAL 
 8 DEMO-001-101-002 CHOL         5.69      0        5.2 HIGH   
 9 DEMO-001-101-002 GLUC         6.1       3.9      5.6 HIGH   
10 DEMO-001-102-003 ALT         30         7       56   NORMAL
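
One caveat about the catch-all `TRUE ~ "NORMAL"` branch: a missing LBSTRESN fails both comparisons and is therefore labelled NORMAL. There are no missing results in this lesson's data, so the output above is unaffected. The base R sketch below shows one way to keep missing results missing; `flag_range()` is a hypothetical helper, and the limits are the glucose range used in this lesson.

```r
# Hypothetical helper: flag a result against its reference range.
# Values equal to a limit count as NORMAL; missing results stay missing.
flag_range <- function(value, lo, hi) {
  ifelse(
    is.na(value),
    NA_character_,
    ifelse(value < lo, "LOW", ifelse(value > hi, "HIGH", "NORMAL"))
  )
}

# Glucose range from the lesson: 3.9 to 5.6 mmol/L
flags <- flag_range(c(3.5, 3.9, 5.6, 6.1, NA), lo = 3.9, hi = 5.6)
print(flags)
```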

6.5 Step 4: Final SDTM LB Domain

sdtm_lb <- lb_step3 %>%
  mutate(
    LBDTC = Date,
    VISIT = Visit
  ) %>%
  select(
    STUDYID, DOMAIN, USUBJID, LBSEQ, LBTESTCD, LBTEST,
    LBORRES, LBORRESU, LBSTRESC, LBSTRESN, LBSTRESU,
    LBSTNRLO, LBSTNRHI, LBNRIND, LBDTC, VISIT
  )

cat("SDTM LB Domain:\n")

SDTM LB Domain:

cat("Records:", nrow(sdtm_lb), "\n")

Records: 15

cat("Variables:", ncol(sdtm_lb), "\n\n")

Variables: 16

head(sdtm_lb, 10)

# A tibble: 10 × 16
   STUDYID  DOMAIN USUBJID       LBSEQ LBTESTCD LBTEST LBORRES LBORRESU LBSTRESC
   <chr>    <chr>  <chr>         <int> <chr>    <chr>  <chr>   <chr>    <chr>   
 1 DEMO-001 LB     DEMO-001-101…     1 ALT      Alani… 25      U/L      25      
 2 DEMO-001 LB     DEMO-001-101…     2 CHOL     Chole… 180     mg/dL    4.65    
 3 DEMO-001 LB     DEMO-001-101…     3 GLUC     Gluco… 95      mg/dL    5.27    
 4 DEMO-001 LB     DEMO-001-101…     4 ALT      Alani… 28      U/L      28      
 5 DEMO-001 LB     DEMO-001-101…     5 CHOL     Chole… 175     mg/dL    4.53    
 6 DEMO-001 LB     DEMO-001-101…     6 GLUC     Gluco… 92      mg/dL    5.11    
 7 DEMO-001 LB     DEMO-001-101…     1 ALT      Alani… 45      U/L      45      
 8 DEMO-001 LB     DEMO-001-101…     2 CHOL     Chole… 220     mg/dL    5.69    
 9 DEMO-001 LB     DEMO-001-101…     3 GLUC     Gluco… 110     mg/dL    6.1     
10 DEMO-001 LB     DEMO-001-102…     1 ALT      Alani… 30      U/L      30      
# ℹ 7 more variables: LBSTRESN <dbl>, LBSTRESU <chr>, LBSTNRLO <dbl>,
#   LBSTNRHI <dbl>, LBNRIND <chr>, LBDTC <chr>, VISIT <chr>

6.6 Summary Statistics

sdtm_lb %>%
  group_by(LBTESTCD, LBTEST) %>%
  summarise(
    N = n(),
    Mean = round(mean(LBSTRESN, na.rm = TRUE), 2),
    SD = round(sd(LBSTRESN, na.rm = TRUE), 2),
    Low = sum(LBNRIND == "LOW"),
    Normal = sum(LBNRIND == "NORMAL"),
    High = sum(LBNRIND == "HIGH"),
    .groups = "drop"
  )

# A tibble: 3 × 8
  LBTESTCD LBTEST                       N  Mean    SD   Low Normal  High
  <chr>    <chr>                    <int> <dbl> <dbl> <int>  <int> <int>
1 ALT      Alanine Aminotransferase     5 32     7.71     0      5     0
2 CHOL     Cholesterol                  5  4.83  0.49     0      4     1
3 GLUC     Glucose                      5  5.44  0.4      0      4     1

7 🎯 Practice Exercise

7.1 Your Turn: Add LOINC Codes

LOINC codes are standard lab test identifiers. Add LBLOINC to the LB domain.

# LOINC Reference
loinc_lookup <- tribble(
  ~LBTESTCD, ~LBLOINC,
  "GLUC",    "2345-7",
  "CHOL",    "2093-3",
  "ALT",     "1742-6"
)

# TODO: Join the LOINC codes to sdtm_lb
sdtm_lb_loinc <- sdtm_lb %>%
  # Your code here...

head(sdtm_lb_loinc)
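
One possible solution, if you want to check your work, is a `left_join()` on LBTESTCD. The sketch below uses a three-row stand-in for `sdtm_lb` so that it runs on its own; with the real `sdtm_lb` from this lesson only the join step is needed.

```r
library(dplyr)

# Minimal stand-in for sdtm_lb so the sketch is self-contained
sdtm_lb <- tibble(
  USUBJID  = c("DEMO-001-101-001", "DEMO-001-101-001", "DEMO-001-102-003"),
  LBTESTCD = c("ALT", "GLUC", "CHOL"),
  LBSTRESN = c(25, 5.27, 4.8)
)

# LOINC reference, same values as in the exercise
loinc_lookup <- tibble(
  LBTESTCD = c("GLUC", "CHOL", "ALT"),
  LBLOINC  = c("2345-7", "2093-3", "1742-6")
)

# A left join keeps every lab record and adds the matching LOINC code
sdtm_lb_loinc <- sdtm_lb %>%
  left_join(loinc_lookup, by = "LBTESTCD")

# The join should neither add nor drop records, and every test should match
stopifnot(nrow(sdtm_lb_loinc) == nrow(sdtm_lb))
stopifnot(!anyNA(sdtm_lb_loinc$LBLOINC))

head(sdtm_lb_loinc)
```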

8 Deliverable Summary

Today you completed the following:

Task	Status
Created simulated Raw VS and LB data	✓ Done
Built SDTM VS with sequence numbers	✓ Done
Performed unit standardization (mg/dL → mmol/L)	✓ Done
Added reference ranges and normal indicators	✓ Done
Created complete SDTM LB domain	✓ Done

9 Key Takeaways

sdtm.oak Philosophy: Algorithm-based, modular, traceable.
Pivoting: Raw data is often wide (as in our VS export); SDTM Findings are long.
Unit Standardization: Critical for multi-site studies.
Sequence Numbers: Every record needs a --SEQ value that is unique within its subject.
Reference Ranges: LBSTNRLO, LBSTNRHI, LBNRIND are key for flagging abnormals.

10 Resources

11 What’s Next?

In Day 7, we will complete the Week 1 Capstone:

Build DM, AE, EX domains from scratch (20+ subjects)
Apply all concepts learned this week
Export to submission-ready .xpt files
