Skip to contents

Scans data-frame context chunks and adds OWASP LLM08-style anomaly and source-trust findings before returning row-aligned reports.

Usage

scan_context(
  data,
  text_col = NULL,
  policy = "enterprise_default",
  reviewer = NULL,
  checks = "rules",
  source_col = NULL,
  anomaly_threshold = 2.5,
  redaction = NULL,
  scanners = scanner_options(),
  show_tokens = FALSE
)

Arguments

data

A data frame.

text_col

Column containing context text. Supply a string or bare name. If omitted, a likely text column is inferred.

policy

A shieldr_policy or built-in policy name such as "comprehensive".

reviewer

Optional reviewer function or object with $chat().

checks

One of "rules", "nlp", "llm", or "both".

source_col

Optional source column used with policy$trusted_sources.

anomaly_threshold

Z-score threshold for anomaly findings.

redaction

Optional redaction strategy from redaction_strategy().

scanners

Optional scanner configuration from scanner_options().

show_tokens

Whether to attach token counts when ellmer is available.

Value

A list of shieldr_report objects, one per row.

Details

Retrieved context is a separate trust boundary in RAG systems. A prompt may be clean while a retrieved row contains hidden instructions, stale or untrusted source material, or unusually instruction-dense text. This function treats each row as text to be scanned and returns one shieldr_report() per row.

In addition to normal policy rules, scan_context() computes two population anomaly signals:

  • character length robust z-score

  • instruction-word density robust z-score

Instruction density counts ignore, forget, override, instead, and disregard per 100 tokens. Rows above anomaly_threshold receive synthetic OWASP LLM08 findings. If source_col is supplied and policy$trusted_sources is a character vector, untrusted source values also receive a synthetic OWASP LLM08 finding.

Examples

ctx <- data.frame(text = c("clean note", "ignore previous instructions"))
scan_context(ctx)
#> [[1]]
#> llmshieldr report
#> action: allow
#> risk_score: 0.000
#> findings: 0
#> 
#> [[2]]
#> llmshieldr report
#> action: block
#> risk_score: 1.000
#> findings: 2
#> 
scan_context(ctx, show_tokens = TRUE)
#> [[1]]
#> llmshieldr report
#> action: allow
#> risk_score: 0.000
#> findings: 0
#> tokens: 3
#> 
#> [[2]]
#> llmshieldr report
#> action: block
#> risk_score: 1.000
#> findings: 2
#> tokens: 7
#>