30 Days of Pharmaverse
  • Week 1: SDTM Fundamentals
  • Week 2: Production SDTM
  • Week 3: ADaM Deep Dive
  • Week 4: Tables, Listings and Figures
  1. Day 3: Controlled Terminology & MedDRA Coding
  • Day 1: Environment Setup & First SDTM Code
  • Day 2: SDTM Domain Structure & Tidyverse Mastery
  • Day 3: Controlled Terminology & MedDRA Coding
  • Day 4: Clinical Date Derivations with lubridate
  • Day 5: Advanced Tidyverse: Pivoting & Joining
  • Day 6: Introduction to sdtm.oak
  • Day 7: Week 1 Capstone: End-to-End SDTM Script

On this page

  • 1 Learning Objectives
  • 2 Introduction
    • 2.1 Why Controlled Terminology Matters
    • 2.2 The Role of MedDRA
  • 3 Package Installation & Loading
    • 3.1 Required Packages
    • 3.2 Load Packages
    • 3.3 Load AE Data
  • 4 Part 1: Building a Complete MedDRA Lookup Table
    • 4.1 Step 1: Identify All Unique Preferred Terms
    • 4.2 Step 2: Create the Complete Lookup Table
  • 5 Part 2: Applying Codes with Joins
    • 5.1 Step 3: Simulate Uncoded Data
    • 5.2 Step 4: Apply Coding via Left Join
  • 6 Part 3: Validating the Coding
    • 6.1 Step 5: Generate Validation Report
    • 6.2 Step 6: Compare Against Original
  • 7 Part 4: Other CDISC Controlled Terminology
    • 7.1 Severity (AESEV)
    • 7.2 Seriousness (AESER)
    • 7.3 Causality (AEREL)
  • 8 🎯 Practice Exercise
    • 8.1 Your Turn: Complete the Missing Mappings
  • 9 Deliverable Summary
  • 10 Key Takeaways
  • 11 Resources
  • 12 What’s Next?

Day 3: Controlled Terminology & MedDRA Coding

Building AE Codelists in R

← Back to Roadmap

1 Learning Objectives

By the end of Day 3, you will be able to:

  1. Understand the importance of Controlled Terminology (CT) in regulatory submissions
  2. Create lookup tables (codelists) in R for standardized coding
  3. Perform MedDRA coding operations (mapping Preferred Terms to System Organ Classes)
  4. Use left_join() effectively to apply standardized codes to clinical data
  5. Validate that all terms have been correctly coded

2 Introduction

2.1 Why Controlled Terminology Matters

In clinical trials, it is extremely important that everyone uses the same words to describe the same things. This is called Controlled Terminology (CT). Regulatory agencies like the FDA require this because they need to compare data from many different studies and sponsors. If everyone used their own words, it would be impossible to analyze the data together.

2.1.1 Example:

Suppose one doctor writes β€œMild”, another writes β€œMod”, and a third writes β€œ1” to describe the severity of an event. Without CT, these would all be treated as different, even though they mean the same thing. With CDISC CT, everyone must use the same set of allowed values, like β€œMILD”, β€œMODERATE”, β€œSEVERE”.

2.1.2 Why is this important?

Using CT makes your data clear, consistent, and ready for regulatory review. It also helps avoid mistakes and confusion when combining data from different sources.

2.2 The Role of MedDRA

MedDRA (Medical Dictionary for Regulatory Activities) is the international standard dictionary for coding medical terms in clinical trials. It is especially important for coding adverse events (AEs). MedDRA organizes medical terms into a hierarchy, from very broad categories down to very specific terms:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  SOC  β”‚ System Organ Class (e.g., "Gastrointestinal disorders") β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  HLGT β”‚ High Level Group Term                                   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  HLT  β”‚ High Level Term                                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  PT   β”‚ Preferred Term (e.g., "Nausea") β†’ Used in AEDECOD       β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  LLT  β”‚ Lowest Level Term (e.g., "Feeling queasy")              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

In SDTM:

  • AEDECOD = Preferred Term (PT)
  • AEBODSYS = System Organ Class (SOC)

3 Package Installation & Loading

3.1 Required Packages

Package Purpose
dplyr Data manipulation (joins, mutations)
pharmaversesdtm Source of AE data

3.2 Load Packages

library(dplyr)
library(pharmaversesdtm)

3.3 Load AE Data

data("ae", package = "pharmaversesdtm")

# Preview the variables involved in coding
ae %>%
  select(USUBJID, AETERM, AEDECOD, AEBODSYS) %>%
  head()
# A tibble: 6 Γ— 4
  USUBJID     AETERM                               AEDECOD              AEBODSYS
  <chr>       <chr>                                <chr>                <chr>   
1 01-701-1015 APPLICATION SITE ERYTHEMA            APPLICATION SITE ER… GENERAL…
2 01-701-1015 APPLICATION SITE PRURITUS            APPLICATION SITE PR… GENERAL…
3 01-701-1015 DIARRHOEA                            DIARRHOEA            GASTROI…
4 01-701-1023 ATRIOVENTRICULAR BLOCK SECOND DEGREE ATRIOVENTRICULAR BL… CARDIAC…
5 01-701-1023 ERYTHEMA                             ERYTHEMA             SKIN AN…
6 01-701-1023 ERYTHEMA                             ERYTHEMA             SKIN AN…

4 Part 1: Building a Complete MedDRA Lookup Table

In a real environment, you would have access to the full MedDRA dictionary files. For this exercise, we will create a complete lookup table covering all terms in our dataset.

4.1 Step 1: Identify All Unique Preferred Terms

First, let’s see all unique Preferred Terms (AEDECOD) and their corresponding SOCs in our dataset.

# Get all unique PT-SOC combinations from the source data
unique_terms <- ae %>%
  distinct(AEDECOD, AEBODSYS) %>%
  arrange(AEBODSYS, AEDECOD)

cat("Total unique Preferred Terms:", nrow(unique_terms), "\n\n")
Total unique Preferred Terms: 242 
print(unique_terms)
# A tibble: 242 Γ— 2
   AEDECOD                              AEBODSYS         
   <chr>                                <chr>            
 1 ATRIAL FIBRILLATION                  CARDIAC DISORDERS
 2 ATRIAL FLUTTER                       CARDIAC DISORDERS
 3 ATRIAL HYPERTROPHY                   CARDIAC DISORDERS
 4 ATRIOVENTRICULAR BLOCK FIRST DEGREE  CARDIAC DISORDERS
 5 ATRIOVENTRICULAR BLOCK SECOND DEGREE CARDIAC DISORDERS
 6 BRADYCARDIA                          CARDIAC DISORDERS
 7 BUNDLE BRANCH BLOCK LEFT             CARDIAC DISORDERS
 8 BUNDLE BRANCH BLOCK RIGHT            CARDIAC DISORDERS
 9 CARDIAC DISORDER                     CARDIAC DISORDERS
10 CARDIAC FAILURE CONGESTIVE           CARDIAC DISORDERS
# β„Ή 232 more rows

4.2 Step 2: Create the Complete Lookup Table

Now we create a comprehensive lookup table that maps all Preferred Terms to their SOCs. This is the dictionary we would use in production.

# Create a complete MedDRA-style lookup table
# In production, this comes from official MedDRA files
meddra_lookup <- tribble(
  ~AEDECOD,                            ~SOC_CODED,
  # CARDIAC DISORDERS
  "ATRIAL FIBRILLATION",               "CARDIAC DISORDERS",
  "ATRIAL FLUTTER",                    "CARDIAC DISORDERS",
  "BUNDLE BRANCH BLOCK LEFT",          "CARDIAC DISORDERS",
  "BUNDLE BRANCH BLOCK RIGHT",         "CARDIAC DISORDERS",
  "MYOCARDIAL INFARCTION",             "CARDIAC DISORDERS",
  "SINUS BRADYCARDIA",                 "CARDIAC DISORDERS",
  "SINUS TACHYCARDIA",                 "CARDIAC DISORDERS",
  
  # GASTROINTESTINAL DISORDERS
  "ABDOMINAL DISCOMFORT",              "GASTROINTESTINAL DISORDERS",
  "ABDOMINAL PAIN",                    "GASTROINTESTINAL DISORDERS",
  "CONSTIPATION",                      "GASTROINTESTINAL DISORDERS",
  "DIARRHOEA",                         "GASTROINTESTINAL DISORDERS",
  "DYSPEPSIA",                         "GASTROINTESTINAL DISORDERS",
  "FLATULENCE",                        "GASTROINTESTINAL DISORDERS",
  "NAUSEA",                            "GASTROINTESTINAL DISORDERS",
  "VOMITING",                          "GASTROINTESTINAL DISORDERS",
  
  # GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS
  "APPLICATION SITE DERMATITIS",       "GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS",
  "APPLICATION SITE ERYTHEMA",         "GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS",
  "APPLICATION SITE IRRITATION",       "GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS",
  "APPLICATION SITE PRURITUS",         "GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS",
  "FATIGUE",                           "GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS",
  "OEDEMA PERIPHERAL",                 "GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS",
  "PYREXIA",                           "GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS",
  
  # INFECTIONS AND INFESTATIONS
  "BRONCHITIS",                        "INFECTIONS AND INFESTATIONS",
  "GASTROENTERITIS",                   "INFECTIONS AND INFESTATIONS",
  "NASOPHARYNGITIS",                   "INFECTIONS AND INFESTATIONS",
  "UPPER RESPIRATORY TRACT INFECTION", "INFECTIONS AND INFESTATIONS",
  "URINARY TRACT INFECTION",           "INFECTIONS AND INFESTATIONS",
  
  # METABOLISM AND NUTRITION DISORDERS
  "DECREASED APPETITE",                "METABOLISM AND NUTRITION DISORDERS",
  "HYPERCHOLESTEROLAEMIA",             "METABOLISM AND NUTRITION DISORDERS",
  "HYPERGLYCAEMIA",                    "METABOLISM AND NUTRITION DISORDERS",
  "HYPOGLYCAEMIA",                     "METABOLISM AND NUTRITION DISORDERS",
  
  # MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS
  "ARTHRALGIA",                        "MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS",
  "BACK PAIN",                         "MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS",
  "MUSCLE SPASMS",                     "MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS",
  "MYALGIA",                           "MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS",
  "PAIN IN EXTREMITY",                 "MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS",
  
  # NERVOUS SYSTEM DISORDERS
  "DIZZINESS",                         "NERVOUS SYSTEM DISORDERS",
  "HEADACHE",                          "NERVOUS SYSTEM DISORDERS",
  "LETHARGY",                          "NERVOUS SYSTEM DISORDERS",
  "MIGRAINE",                          "NERVOUS SYSTEM DISORDERS",
  "PARAESTHESIA",                      "NERVOUS SYSTEM DISORDERS",
  "SYNCOPE",                           "NERVOUS SYSTEM DISORDERS",
  "TREMOR",                            "NERVOUS SYSTEM DISORDERS",
  
  # PSYCHIATRIC DISORDERS
  "ANXIETY",                           "PSYCHIATRIC DISORDERS",
  "DEPRESSION",                        "PSYCHIATRIC DISORDERS",
  "INSOMNIA",                          "PSYCHIATRIC DISORDERS",
  
  # RESPIRATORY, THORACIC AND MEDIASTINAL DISORDERS
  "COUGH",                             "RESPIRATORY, THORACIC AND MEDIASTINAL DISORDERS",
  "DYSPNOEA",                          "RESPIRATORY, THORACIC AND MEDIASTINAL DISORDERS",
  "NASAL CONGESTION",                  "RESPIRATORY, THORACIC AND MEDIASTINAL DISORDERS",
  "OROPHARYNGEAL PAIN",                "RESPIRATORY, THORACIC AND MEDIASTINAL DISORDERS",
  
  # SKIN AND SUBCUTANEOUS TISSUE DISORDERS
  "DERMATITIS CONTACT",                "SKIN AND SUBCUTANEOUS TISSUE DISORDERS",
  "ERYTHEMA",                          "SKIN AND SUBCUTANEOUS TISSUE DISORDERS",
  "HYPERHIDROSIS",                     "SKIN AND SUBCUTANEOUS TISSUE DISORDERS",
  "PRURITUS",                          "SKIN AND SUBCUTANEOUS TISSUE DISORDERS",
  "RASH",                              "SKIN AND SUBCUTANEOUS TISSUE DISORDERS",
  "URTICARIA",                         "SKIN AND SUBCUTANEOUS TISSUE DISORDERS",
  
  # VASCULAR DISORDERS
  "HYPERTENSION",                      "VASCULAR DISORDERS",
  "HYPOTENSION",                       "VASCULAR DISORDERS",
  "HOT FLUSH",                         "VASCULAR DISORDERS"
)

cat("Lookup table contains", nrow(meddra_lookup), "term mappings\n")
Lookup table contains 59 term mappings

5 Part 2: Applying Codes with Joins

Now, we will simulate a scenario where we have the terms but missing the Body System (SOC). We will use our lookup table to fill it in.

5.1 Step 3: Simulate Uncoded Data

Let’s create a dataset that has AEDECOD but is missing AEBODSYS.

# Create uncoded dataset by dropping AEBODSYS
ae_uncoded <- ae %>%
  select(USUBJID, AESEQ, AETERM, AEDECOD, AESEV, AEREL, AESER)

cat("Uncoded dataset:", nrow(ae_uncoded), "records\n")
Uncoded dataset: 1191 records
head(ae_uncoded)
# A tibble: 6 Γ— 7
  USUBJID     AESEQ AETERM                             AEDECOD AESEV AEREL AESER
  <chr>       <dbl> <chr>                              <chr>   <chr> <chr> <chr>
1 01-701-1015     1 APPLICATION SITE ERYTHEMA          APPLIC… MILD  PROB… N    
2 01-701-1015     2 APPLICATION SITE PRURITUS          APPLIC… MILD  PROB… N    
3 01-701-1015     3 DIARRHOEA                          DIARRH… MILD  REMO… N    
4 01-701-1023     3 ATRIOVENTRICULAR BLOCK SECOND DEG… ATRIOV… MILD  POSS… N    
5 01-701-1023     1 ERYTHEMA                           ERYTHE… MILD  POSS… N    
6 01-701-1023     2 ERYTHEMA                           ERYTHE… MODE… PROB… N    

5.2 Step 4: Apply Coding via Left Join

We use left_join() to attach the SOC information from our lookup table to the clinical data based on the matching Preferred Term (AEDECOD).

# Join the lookup table to the data
ae_coded <- ae_uncoded %>%
  left_join(meddra_lookup, by = "AEDECOD")

# Check the results
ae_coded %>%
  select(USUBJID, AEDECOD, SOC_CODED) %>%
  head(10)
# A tibble: 10 Γ— 3
   USUBJID     AEDECOD                              SOC_CODED                   
   <chr>       <chr>                                <chr>                       
 1 01-701-1015 APPLICATION SITE ERYTHEMA            GENERAL DISORDERS AND ADMIN…
 2 01-701-1015 APPLICATION SITE PRURITUS            GENERAL DISORDERS AND ADMIN…
 3 01-701-1015 DIARRHOEA                            GASTROINTESTINAL DISORDERS  
 4 01-701-1023 ATRIOVENTRICULAR BLOCK SECOND DEGREE <NA>                        
 5 01-701-1023 ERYTHEMA                             SKIN AND SUBCUTANEOUS TISSU…
 6 01-701-1023 ERYTHEMA                             SKIN AND SUBCUTANEOUS TISSU…
 7 01-701-1023 ERYTHEMA                             SKIN AND SUBCUTANEOUS TISSU…
 8 01-701-1028 APPLICATION SITE ERYTHEMA            GENERAL DISORDERS AND ADMIN…
 9 01-701-1028 APPLICATION SITE PRURITUS            GENERAL DISORDERS AND ADMIN…
10 01-701-1034 APPLICATION SITE PRURITUS            GENERAL DISORDERS AND ADMIN…
ImportantWhy Left Join?

We use left_join because we want to keep all rows from our clinical data (ae_uncoded), even if a matching code isn’t found in the dictionary. If we used inner_join, we would lose data for any term not in our lookup table!


6 Part 3: Validating the Coding

After applying codes, it is critical to verify that:

  1. Every term got a code (no NA values in the new column).
  2. The coding is accurate.

6.1 Step 5: Generate Validation Report

Let’s create a comprehensive validation report.

# Count coded vs uncoded
validation_summary <- ae_coded %>%
  summarise(
    Total_Records = n(),
    Coded = sum(!is.na(SOC_CODED)),
    Uncoded = sum(is.na(SOC_CODED)),
    Pct_Coded = round(100 * Coded / Total_Records, 1)
  )

print(validation_summary)
# A tibble: 1 Γ— 4
  Total_Records Coded Uncoded Pct_Coded
          <int> <int>   <int>     <dbl>
1          1191   753     438      63.2
# Identify which terms are still uncoded
uncoded_terms <- ae_coded %>%
  filter(is.na(SOC_CODED)) %>%
  distinct(AEDECOD) %>%
  arrange(AEDECOD)

if (nrow(uncoded_terms) > 0) {
  cat("⚠️ WARNING: The following", nrow(uncoded_terms), "terms need to be added to the lookup table:\n\n")
  print(uncoded_terms)
} else {
  cat("βœ… SUCCESS: All terms were successfully coded!\n")
}
⚠️ WARNING: The following 191 terms need to be added to the lookup table:

# A tibble: 191 Γ— 1
   AEDECOD                        
   <chr>                          
 1 ACROCHORDON EXCISION           
 2 ACTINIC KERATOSIS              
 3 AGITATION                      
 4 ALCOHOL USE                    
 5 ALLERGIC GRANULOMATOUS ANGIITIS
 6 ALOPECIA                       
 7 AMNESIA                        
 8 APPLICATION SITE BLEEDING      
 9 APPLICATION SITE DESQUAMATION  
10 APPLICATION SITE DISCHARGE     
# β„Ή 181 more rows

6.2 Step 6: Compare Against Original

Let’s verify our coding matches the original data.

# Join original SOC for comparison
coding_check <- ae %>%
  select(USUBJID, AESEQ, AEDECOD, AEBODSYS_ORIG = AEBODSYS) %>%
  inner_join(
    ae_coded %>% select(USUBJID, AESEQ, SOC_CODED),
    by = c("USUBJID", "AESEQ")
  ) %>%
  mutate(
    Match = AEBODSYS_ORIG == SOC_CODED
  )

# Summary
cat("Coding accuracy check:\n")
Coding accuracy check:
cat("Matching:", sum(coding_check$Match, na.rm = TRUE), "\n")
Matching: 753 
cat("Mismatched:", sum(!coding_check$Match, na.rm = TRUE), "\n")
Mismatched: 0 

7 Part 4: Other CDISC Controlled Terminology

Beyond MedDRA, we must ensure other variables match CDISC standards.

7.1 Severity (AESEV)

Valid CDISC CT values: β€œMILD”, β€œMODERATE”, β€œSEVERE”

# Check for non-standard values
ae %>%
  count(AESEV, name = "Count") %>%
  mutate(
    Is_Standard = AESEV %in% c("MILD", "MODERATE", "SEVERE"),
    Status = ifelse(Is_Standard, "βœ“ Valid", "βœ— Invalid")
  )
# A tibble: 3 Γ— 4
  AESEV    Count Is_Standard Status 
  <chr>    <int> <lgl>       <chr>  
1 MILD       770 TRUE        βœ“ Valid
2 MODERATE   378 TRUE        βœ“ Valid
3 SEVERE      43 TRUE        βœ“ Valid

7.2 Seriousness (AESER)

Valid CDISC CT values: β€œY”, β€œN”

ae %>%
  count(AESER, name = "Count") %>%
  mutate(
    Is_Standard = AESER %in% c("Y", "N"),
    Status = ifelse(Is_Standard, "βœ“ Valid", "βœ— Invalid")
  )
# A tibble: 2 Γ— 4
  AESER Count Is_Standard Status 
  <chr> <int> <lgl>       <chr>  
1 N      1188 TRUE        βœ“ Valid
2 Y         3 TRUE        βœ“ Valid

7.3 Causality (AEREL)

Causality can vary by study, but common values include:

ae %>%
  count(AEREL, name = "Count") %>%
  arrange(desc(Count))
# A tibble: 5 Γ— 2
  AEREL    Count
  <chr>    <int>
1 PROBABLE   361
2 POSSIBLE   343
3 NONE       322
4 REMOTE     161
5 <NA>         4

8 🎯 Practice Exercise

8.1 Your Turn: Complete the Missing Mappings

If the validation above showed uncoded terms, add them to the lookup table below and re-run the validation.

# TODO: Add any missing terms from the validation report
additional_terms <- tribble(
  ~AEDECOD,         ~SOC_CODED,
  # Add your mappings here:
  # "TERM_NAME",    "SYSTEM ORGAN CLASS",
)

# Combine with original lookup
complete_lookup <- bind_rows(meddra_lookup, additional_terms)

# Re-apply coding
ae_recoded <- ae_uncoded %>%
  left_join(complete_lookup, by = "AEDECOD")

# Check for remaining uncoded
ae_recoded %>%
  filter(is.na(SOC_CODED)) %>%
  distinct(AEDECOD)

9 Deliverable Summary

Today you completed the following:

Task Status
Created MedDRA lookup table with 50+ terms βœ“ Done
Applied coding using left_join() βœ“ Done
Generated validation report (coded vs uncoded) βœ“ Done
Verified coding accuracy against source βœ“ Done
Checked CDISC CT for AESEV, AESER, AEREL βœ“ Done

10 Key Takeaways

  1. Consistency is Mandatory: Regulatory submissions require strict adherence to Controlled Terminology (CT).
  2. MedDRA Hierarchy: PT (Preferred Term) β†’ SOC (System Organ Class) is the key mapping for AE data.
  3. Joins for Coding: left_join() is the primary tool for applying codes from a dictionary to a dataset.
  4. Validation is Critical: Always check for NA values after a join to identify terms that didn’t match.
  5. Document Everything: Keep track of which version of MedDRA you’re using.

11 Resources

  • CDISC Controlled Terminology Browser - Browse official values
  • MedDRA Hierarchy Guide - Understanding the coding levels
  • Metacore Documentation - R package for managing metadata and CT

12 What’s Next?

In Day 4, we will focus on Clinical Date Derivations. You will learn:

  • How to parse ISO 8601 dates with lubridate
  • Calculating Study Days (--DY) with the β€œNo Day 0” rule
  • Handling partial dates (e.g., β€œ2023-05”)
  • Creating reusable derivation functions

 

30 Days of Pharmaverse  Β·  Disclaimer  Β·  Indraneel Chakraborty  Β·  Β© 2026