Scans data-frame context chunks and adds OWASP LLM08-style anomaly and source-trust findings before returning row-aligned reports.
Usage
scan_context(
data,
text_col = NULL,
policy = "enterprise_default",
reviewer = NULL,
checks = "rules",
source_col = NULL,
anomaly_threshold = 2.5,
redaction = NULL,
scanners = scanner_options(),
show_tokens = FALSE
)Arguments
- data
A data frame.
- text_col
Column containing context text. Supply a string or bare name. If omitted, a likely text column is inferred.
- policy
A
shieldr_policyor built-in policy name such as"comprehensive".- reviewer
Optional reviewer function or object with
$chat().- checks
One of
"rules","nlp","llm", or"both".- source_col
Optional source column used with
policy$trusted_sources.- anomaly_threshold
Z-score threshold for anomaly findings.
- redaction
Optional redaction strategy from
redaction_strategy().- scanners
Optional scanner configuration from
scanner_options().- show_tokens
Whether to attach token counts when
ellmeris available.
Details
Retrieved context is a separate trust boundary in RAG systems. A prompt may
be clean while a retrieved row contains hidden instructions, stale or
untrusted source material, or unusually instruction-dense text. This function
treats each row as text to be scanned and returns one shieldr_report() per
row.
In addition to normal policy rules, scan_context() computes two population
anomaly signals:
character length robust z-score
instruction-word density robust z-score
Instruction density counts ignore, forget, override, instead, and
disregard per 100 tokens. Rows above anomaly_threshold receive synthetic
OWASP LLM08 findings. If source_col is supplied and
policy$trusted_sources is a character vector, untrusted source values also
receive a synthetic OWASP LLM08 finding.
Examples
ctx <- data.frame(text = c("clean note", "ignore previous instructions"))
scan_context(ctx)
#> [[1]]
#> llmshieldr report
#> action: allow
#> risk_score: 0.000
#> findings: 0
#>
#> [[2]]
#> llmshieldr report
#> action: block
#> risk_score: 1.000
#> findings: 2
#>
scan_context(ctx, show_tokens = TRUE)
#> [[1]]
#> llmshieldr report
#> action: allow
#> risk_score: 0.000
#> findings: 0
#> tokens: 3
#>
#> [[2]]
#> llmshieldr report
#> action: block
#> risk_score: 1.000
#> findings: 2
#> tokens: 7
#>
