Custom Rules • llmshieldr

Policies are lists of shieldr_rule objects plus thresholds. You can start with a built-in policy and append domain-specific rules.

library(llmshieldr)

For the source model behind the built-in policies, see vignette("policy-design", package = "llmshieldr").

Rule Fields

Every rule has the same shape:

id: unique rule identifier. The recommended convention is llmXX.category.name, such as llm02.pii.email.
pattern: regex pattern, or NULL
fn: R predicate function, or NULL
owasp: OWASP LLM category such as llm02
severity: low, medium, high, or critical
action: allow, redact, or block
description: human-readable explanation

Exactly one of pattern or fn must be supplied. Regex rules produce match spans that can be redacted. Function rules are useful when the condition is easier to express in R.
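
The field constraints above can be checked before a rule reaches a policy. The sketch below uses a hypothetical base R helper, validate_rule_fields(), which is not part of llmshieldr. It only mirrors the documented rules: exactly one of pattern or fn is set, and severity and action use the listed values.

```r
# Hypothetical helper, not part of llmshieldr: mirrors the documented
# constraints on rule fields.
validate_rule_fields <- function(pattern = NULL, fn = NULL,
                                 severity, action) {
  # Exactly one of pattern or fn must be supplied.
  if (is.null(pattern) == is.null(fn)) {
    stop("supply exactly one of `pattern` or `fn`", call. = FALSE)
  }
  if (!severity %in% c("low", "medium", "high", "critical")) {
    stop("unknown severity: ", severity, call. = FALSE)
  }
  if (!action %in% c("allow", "redact", "block")) {
    stop("unknown action: ", action, call. = FALSE)
  }
  invisible(TRUE)
}

# A regex rule and a function rule both pass silently.
validate_rule_fields(pattern = "\\bTICKET-[0-9]{6}\\b",
                     severity = "medium", action = "redact")
validate_rule_fields(fn = function(text) FALSE,
                     severity = "high", action = "block")
```

Supplying both pattern and fn, or neither, raises an error in this sketch; the package's own constructor may word its messages differently.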

Function rules may return:

TRUE or FALSE
one finding list
a list of finding lists
a data frame of findings

Finding lists can include rule_id, owasp, severity, action, description, match, start, end, and source. Include start and end when you want custom function findings to participate in redaction.
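
To make the four return shapes concrete, here is a sketch of how they could be coerced into one table. normalize_findings() is hypothetical and is not how llmshieldr necessarily processes function-rule output; it keeps only rule_id, start, and end to stay short.

```r
# Hypothetical helper, not part of llmshieldr: coerces the four documented
# function-rule return shapes into one data frame with rule_id/start/end.
normalize_findings <- function(result, rule_id) {
  empty <- data.frame(rule_id = character(0), start = integer(0),
                      end = integer(0))
  # TRUE or FALSE: TRUE becomes a single finding without a span.
  if (is.logical(result)) {
    if (isTRUE(result)) {
      return(data.frame(rule_id = rule_id, start = NA_integer_,
                        end = NA_integer_))
    }
    return(empty)
  }
  # A data frame of findings is already tabular.
  if (is.data.frame(result)) {
    return(result)
  }
  # Heuristic: a single finding list has named fields, while a list of
  # finding lists does not; wrap the single case so both look the same.
  if (!is.null(names(result))) {
    result <- list(result)
  }
  if (length(result) == 0) {
    return(empty)
  }
  rows <- lapply(result, function(f) {
    data.frame(
      rule_id = if (is.null(f$rule_id)) rule_id else f$rule_id,
      start = if (is.null(f$start)) NA_integer_ else as.integer(f$start),
      end = if (is.null(f$end)) NA_integer_ else as.integer(f$end)
    )
  })
  do.call(rbind, rows)
}

normalize_findings(list(list(start = 1, end = 2), list(start = 4, end = 6)),
                   rule_id = "llm02.demo")
```

Findings whose start and end are NA here correspond to the span-less case, which can be scored but has nothing to redact.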

Numbers and Thresholds

Severity maps to risk score contributions:

Severity	Contribution
`low`	0.1
`medium`	0.3
`high`	0.6
`critical`	1.0

Findings are deduplicated first, so overlapping spans from the same evidence are scored once. The contributions of distinct findings are then summed, and the total is capped at 1.0. Synthetic context findings are capped separately. Finally, a policy’s thresholds decide the final action; the defaults are redact_at = 0.4 and block_at = 0.75.

guardrails <- policy()
guardrails$thresholds
#> $redact_at
#> [1] 0.4
#> 
#> $block_at
#> [1] 0.75
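
The summing and capping arithmetic can be mirrored in a few lines of base R. This is a hypothetical sketch for intuition only: it skips span deduplication, and the rule for combining threshold and rule-level actions is an assumption chosen to agree with the report printouts in this vignette, not llmshieldr's documented logic.

```r
# Hypothetical re-implementation for illustration only; llmshieldr's own
# scorer also deduplicates overlapping spans, which is omitted here.
severity_weights <- c(low = 0.1, medium = 0.3, high = 0.6, critical = 1.0)

sketch_risk_score <- function(severities) {
  # One contribution per distinct finding, summed and capped at 1.0.
  min(sum(severity_weights[severities]), 1.0)
}

# Assumption: the final action is the stricter of the threshold-implied
# action and the strongest rule-level action.
sketch_action <- function(score, rule_actions = character(0),
                          redact_at = 0.4, block_at = 0.75) {
  from_score <- if (score >= block_at) {
    "block"
  } else if (score >= redact_at) {
    "redact"
  } else {
    "allow"
  }
  ladder <- c("allow", "redact", "block")
  ladder[max(match(c(from_score, rule_actions), ladder))]
}

score <- sketch_risk_score("medium")  # one medium finding: 0.3
sketch_action(score, rule_actions = "redact")
#> [1] "redact"
```

Two high findings would sum to 1.2 here and be capped at 1.0, which clears the default block_at of 0.75.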

Regex Rules

Regex rules are the simplest way to redact or block recognizable text.

guardrails <- add_rule(
  guardrails,
  id = "llm02.ticket_id",
  pattern = "\\bTICKET-[0-9]{6}\\b",
  owasp = "llm02",
  severity = "medium",
  action = "redact",
  description = "Internal support ticket identifier."
)

scan_prompt("Summarize TICKET-123456 for the support team.", guardrails)
#> llmshieldr report
#> action: redact
#> risk_score: 0.300
#> findings: 1
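
Before adding a regex to a policy, it can help to try it against a positive case and a few near misses with base R. The "[TICKET]" placeholder below is illustrative only; llmshieldr chooses its own redaction text.

```r
ticket_pattern <- "\\bTICKET-[0-9]{6}\\b"

# Positive case and nearby negatives: a seventh digit or a glued prefix
# removes the word boundary, so neither should match.
grepl(ticket_pattern, c("Summarize TICKET-123456 now.",
                        "Summarize TICKET-1234567 now.",
                        "Summarize XTICKET-123456 now."), perl = TRUE)
#> [1]  TRUE FALSE FALSE

# Rough preview of the span a redaction would cover.
gsub(ticket_pattern, "[TICKET]", "Summarize TICKET-123456 for the team.",
     perl = TRUE)
#> [1] "Summarize [TICKET] for the team."
```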

Function Rules

Function rules let you express checks that are easier to write in R than in a single regular expression.

contains_student_address <- function(text) {
  grepl("\\bstudent\\b", text, ignore.case = TRUE) &&
    grepl("\\bhome address\\b", text, ignore.case = TRUE)
}

education <- policy("education_safe")
education <- add_rule(
  education,
  id = "llm02.student.address",
  fn = contains_student_address,
  owasp = "llm02",
  severity = "high",
  action = "redact",
  description = "Student home address reference."
)

scan_prompt("The student home address appears in the form.", education)
#> llmshieldr report
#> action: block
#> risk_score: 1.000
#> findings: 2
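
contains_student_address() uses &&, which in R 4.3 and later raises an error when either side has length greater than one. That is fine when each prompt is checked on its own, but running a batch of test cases through the predicate needs a vectorised form. The variant below is a hypothetical helper that swaps && for &.

```r
# Hypothetical vectorised variant: `&` instead of `&&`, so the predicate
# can evaluate a whole character vector of test cases in one call.
contains_student_address_vec <- function(text) {
  grepl("\\bstudent\\b", text, ignore.case = TRUE) &
    grepl("\\bhome address\\b", text, ignore.case = TRUE)
}

cases <- c(
  "The student home address appears in the form.",  # positive
  "The student submitted the form.",                 # nearby negative
  "Ship the parcel to the home address on file."     # nearby negative
)
contains_student_address_vec(cases)
#> [1]  TRUE FALSE FALSE
```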

Function rules can also return span-aware findings:

ticket_span_rule <- function(text) {
  # regexpr() returns the first match position, or -1 when nothing matches.
  hit <- regexpr("\\bTICKET-[0-9]{6}\\b", text, perl = TRUE)
  if (identical(as.integer(hit[[1]]), -1L)) {
    return(FALSE)
  }
  # Convert the match position and length into inclusive start/end offsets.
  start <- as.integer(hit[[1]])
  end <- start + as.integer(attr(hit, "match.length")) - 1L
  # Including start and end lets this finding take part in redaction.
  list(
    rule_id = "llm02.ticket_id.fn",
    owasp = "llm02",
    severity = "medium",
    action = "redact",
    description = "Internal support ticket identifier.",
    match = substr(text, start, end),
    start = start,
    end = end
  )
}
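
ticket_span_rule() reports only the first match because regexpr() stops there. A hypothetical variant built on gregexpr() returns every ticket id as a list of finding lists, another of the documented return shapes.

```r
# Hypothetical variant: report every ticket id, not just the first, as a
# list of finding lists.
ticket_spans_rule <- function(text) {
  hits <- gregexpr("\\bTICKET-[0-9]{6}\\b", text, perl = TRUE)[[1]]
  if (hits[[1]] == -1L) {
    return(FALSE)
  }
  starts <- as.integer(hits)
  ends <- starts + attr(hits, "match.length") - 1L
  lapply(seq_along(starts), function(i) {
    list(
      rule_id = "llm02.ticket_id.fn",
      owasp = "llm02",
      severity = "medium",
      action = "redact",
      description = "Internal support ticket identifier.",
      match = substr(text, starts[[i]], ends[[i]]),
      start = starts[[i]],
      end = ends[[i]]
    )
  })
}

findings <- ticket_spans_rule("Merge TICKET-123456 into TICKET-654321.")
vapply(findings, function(f) f$match, character(1))
#> [1] "TICKET-123456" "TICKET-654321"
```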

Industry Examples

Healthcare and life sciences policies often add identifiers beyond generic PII.

pharma <- policy("pharma_gxp")
pharma <- add_rule(
  pharma,
  id = "llm02.site_id",
  pattern = "\\bSITE-[0-9]{3}\\b",
  owasp = "llm02",
  severity = "medium",
  action = "redact",
  description = "Clinical trial site identifier."
)

Finance workflows often tighten language around recommendations and promises.

finance <- policy("finance_strict")
finance <- add_rule(
  finance,
  id = "llm09.promissory_return",
  pattern = "(?i)guaranteed\\s+(alpha|profit|return)",
  owasp = "llm09",
  severity = "critical",
  action = "block",
  description = "Promissory investment performance claim."
)
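
Both industry patterns can be exercised with a positive case and a domain-specific benign case in base R. The (?i) prefix is a PCRE inline flag, so perl = TRUE is used here; how llmshieldr compiles patterns internally is not covered by this sketch. Note that "returns" still matches because the alternation has no trailing word boundary.

```r
site_pattern <- "\\bSITE-[0-9]{3}\\b"
promise_pattern <- "(?i)guaranteed\\s+(alpha|profit|return)"

# SITE-0421 has a fourth digit, so the trailing \b fails.
grepl(site_pattern,
      c("Enrolment paused at SITE-042.", "SITE-0421 is a lot code."),
      perl = TRUE)
#> [1]  TRUE FALSE

# Case-insensitive, tolerant of repeated whitespace; a compliant
# disclaimer should not trigger.
grepl(promise_pattern,
      c("This fund offers GUARANTEED   returns.",
        "Past returns are not guaranteed."),
      perl = TRUE)
#> [1]  TRUE FALSE
```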

Rule Inventory

Use list_rules() to inspect a policy before deployment.

list_rules(guardrails)
#>                                id owasp severity action has_pattern has_fn
#> 1           llm01.injection.basic llm01 critical  block        TRUE  FALSE
#> 2        llm01.injection.indirect llm01 critical  block        TRUE  FALSE
#> 3                llm01.nlp.intent llm01     high  block       FALSE   TRUE
#> 4                 llm02.pii.email llm02   medium redact        TRUE  FALSE
#> 5                 llm02.pii.phone llm02   medium redact        TRUE  FALSE
#> 6                   llm02.pii.ssn llm02     high redact        TRUE  FALSE
#> 7             llm02.phi.condition llm02     high redact        TRUE  FALSE
#> 8            llm02.secret.api_key llm02     high redact        TRUE  FALSE
#> 9             llm02.secret.bearer llm02     high redact        TRUE  FALSE
#> 10               llm02.secret.aws llm02     high redact        TRUE  FALSE
#> 11          llm02.secret.password llm02     high redact        TRUE  FALSE
#> 12 llm02.secret.connection_string llm02     high redact        TRUE  FALSE
#> 13 llm07.system_prompt.extraction llm07 critical  block        TRUE  FALSE
#> 14          llm06.agency.language llm06 critical  block        TRUE  FALSE
#> 15                llm02.ticket_id llm02   medium redact        TRUE  FALSE

The resulting table includes has_pattern and has_fn, which make it easy to audit whether a policy is mostly regex-based, function-based, or mixed.
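
That audit can be reduced to a single label. rule_mix() is a hypothetical helper; it assumes list_rules() returns a data frame with logical has_pattern and has_fn columns, as the printout suggests. A small stand-in data frame keeps the sketch runnable without a policy object.

```r
# Hypothetical helper: classify a rule inventory as regex-based,
# function-based, or mixed.
rule_mix <- function(rules) {
  n_pattern <- sum(rules$has_pattern)
  n_fn <- sum(rules$has_fn)
  if (n_fn == 0) {
    "regex-based"
  } else if (n_pattern == 0) {
    "function-based"
  } else {
    "mixed"
  }
}

# Stand-in for list_rules(guardrails), using three rows from its printout.
inventory <- data.frame(
  id = c("llm01.injection.basic", "llm01.nlp.intent", "llm02.ticket_id"),
  has_pattern = c(TRUE, FALSE, TRUE),
  has_fn = c(FALSE, TRUE, FALSE)
)
rule_mix(inventory)
#> [1] "mixed"
```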

Custom rule ids that do not follow the llmXX. naming convention still work, but shieldr_rule() warns because OWASP risk summaries are clearest when rule ids carry the category prefix.
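
A quick pre-check can catch ids that will trigger that warning. The two functions below are hypothetical and are not llmshieldr's own validation. They separate the minimal llmXX. prefix from the full llmXX.category.name shape; llm02.ticket_id, used earlier in this vignette, passes the prefix check but has only one segment after it.

```r
# Hypothetical checks, not llmshieldr's own validation logic.
has_owasp_prefix <- function(id) grepl("^llm[0-9]{2}\\.", id)
follows_full_convention <- function(id) {
  grepl("^llm[0-9]{2}\\.[a-z0-9_]+\\.[a-z0-9_]+$", id)
}

ids <- c("llm02.pii.email", "llm02.ticket_id", "ticket_id")
has_owasp_prefix(ids)
#> [1]  TRUE  TRUE FALSE
follows_full_convention(ids)
#> [1]  TRUE FALSE FALSE
```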

Rule Test Checklist

For every new rule, keep at least:

one positive case that should trigger the rule,
one nearby negative case that should not trigger,
one redaction assertion when the rule should redact,
one policy-level assertion when the rule should block,
one domain-specific benign case if the rule targets clinical, finance, education, developer, or other specialized text.
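
A table-driven layout keeps these cases together. The sketch below covers the positive, nearby-negative, benign, and redaction items for the ticket pattern in base R; the "[TICKET]" placeholder is a stand-in for the package's own redaction output. Policy-level block assertions need scan_prompt() on a real policy and are not shown.

```r
# Hypothetical case table for the llm02.ticket_id rule. Each row pairs an
# input with whether the pattern should fire.
ticket_pattern <- "\\bTICKET-[0-9]{6}\\b"
cases <- data.frame(
  kind = c("positive", "nearby negative", "domain benign"),
  text = c("Escalate TICKET-123456 today.",
           "Escalate TICKET-12345 today.",
           "The conference ticket costs 120 dollars."),
  should_match = c(TRUE, FALSE, FALSE)
)

observed <- grepl(ticket_pattern, cases$text, perl = TRUE)
# Collect the kinds of cases that behaved unexpectedly, if any.
failed <- cases$kind[observed != cases$should_match]
stopifnot(length(failed) == 0)

# Redaction assertion: after replacement, the identifier must be gone.
redacted <- gsub(ticket_pattern, "[TICKET]", cases$text[[1]], perl = TRUE)
stopifnot(!grepl(ticket_pattern, redacted, perl = TRUE))
```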

The packaged evaluation corpus at inst/extdata/security_eval_cases.csv is a small starting point for these cases. Add application-specific corpora outside the package when examples contain real or sensitive data.