scanner_options() enables optional checks that sit beside deterministic
policy rules. These scanners are intentionally lightweight and local. They
catch common wrappers around risky text, such as invisible Unicode format
characters and encoded payloads, and they enforce simple constraints: URL
host blocklists and allowlists, token budgets, language allowlists, and
topic bans.
Usage
scanner_options(
invisible_text = TRUE,
encoded_payloads = TRUE,
urls = FALSE,
malicious_urls = TRUE,
max_tokens = NULL,
allowed_languages = NULL,
language_fn = NULL,
blocked_topics = NULL,
blocked_url_hosts = NULL,
allowed_url_hosts = NULL
)
Arguments
- invisible_text
Whether to flag Unicode format characters such as zero-width spaces. Normalization removes these characters before rule matching, but a finding records that evasive formatting was present.
- encoded_payloads
Whether to inspect URL-encoded and base64-like payloads by decoding candidates and scanning the decoded text.
- urls
Whether to create low-severity inventory findings for URLs.
- malicious_urls
Whether to flag URLs whose hosts are explicitly blocked or fall outside allowed_url_hosts.
- max_tokens
Optional maximum estimated tokens for a single scanned text. Exceeding the limit creates an OWASP LLM10 block finding.
- allowed_languages
Optional language allowlist. Uses language_fn when supplied, otherwise a minimal ASCII/non-Latin heuristic.
- language_fn
Optional function that receives text and returns a single language label.
- blocked_topics
Optional character vector of regular expressions, or a named character vector. Matches create topic-ban findings.
- blocked_url_hosts
Optional character vector of blocked URL hosts.
- allowed_url_hosts
Optional character vector of allowed URL hosts. When supplied, URL hosts outside the allowlist are flagged.
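To make the interaction between blocked_url_hosts and allowed_url_hosts concrete, the following is a minimal base R sketch of one way such a host check could work. The helpers extract_url_hosts() and flag_url_hosts() are hypothetical names used for illustration only, and exact, case-insensitive host matching is an assumption; the package may match hosts differently (for example, by treating subdomains specially).

```r
# Hypothetical helper (not part of the package): pull lower-cased hosts
# out of http(s) URLs found in a single string.
extract_url_hosts <- function(text) {
  urls <- regmatches(
    text,
    gregexpr("https?://[^\\s/?#]+", text, perl = TRUE, ignore.case = TRUE)
  )[[1]]
  hosts <- sub("^https?://", "", urls, ignore.case = TRUE)
  hosts <- sub("^[^@]*@", "", hosts)      # drop any userinfo ("user@")
  hosts <- sub("[.,;:)]+$", "", hosts)    # drop trailing sentence punctuation
  hosts <- sub(":[0-9]+$", "", hosts)     # drop an explicit port
  unique(tolower(hosts))
}

# Hypothetical helper: return hosts that are explicitly blocked or, when an
# allowlist is supplied, fall outside it. Exact matching is an assumption.
flag_url_hosts <- function(text, blocked = NULL, allowed = NULL) {
  hosts <- extract_url_hosts(text)
  is_blocked <- hosts %in% tolower(blocked)
  outside_allowlist <- if (is.null(allowed)) {
    rep(FALSE, length(hosts))
  } else {
    !(hosts %in% tolower(allowed))
  }
  hosts[is_blocked | outside_allowlist]
}

flag_url_hosts(
  "Visit https://evil.example.com/login and http://docs.example.org.",
  allowed = "docs.example.org"
)
```

With no blocklist and no allowlist, this sketch flags nothing, which mirrors the documented behaviour that hosts outside the allowlist are flagged only when allowed_url_hosts is supplied.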
Details
Scanner findings use the same finding schema as rule findings and therefore
contribute to risk_score, action, audit logs, and explanations.
The encoded-payload scanner tries URL decoding and base64 decoding on
candidate substrings, then runs the active policy rules over decoded text.
It does not execute decoded content. The language scanner is deliberately
basic unless language_fn is supplied; a custom function should accept a
single string and return a language label such as "en", "es", or
"non_latin".
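The decode-then-scan idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: decode_base64() and scan_decoded() are hypothetical helpers, the candidate pattern (runs of at least 16 base64 characters) is an arbitrary choice, and a single regular expression stands in for the active policy rules. Decoded text is only pattern-matched, never evaluated.

```r
# Hypothetical helper: decode a standard-alphabet base64 string in base R.
# Returns NA when the input is not plausible base64 or not valid text.
decode_base64 <- function(x) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  chars <- strsplit(gsub("=+$", "", x), "")[[1]]
  vals <- match(chars, alphabet) - 1L
  if (anyNA(vals)) return(NA_character_)
  # Expand each 6-bit value to bits, most significant bit first.
  bits <- as.integer(vapply(
    vals,
    function(v) rev(as.integer(intToBits(v))[1:6]),
    integer(6)
  ))
  n_bytes <- length(bits) %/% 8L
  if (n_bytes == 0L) return(NA_character_)
  bytes <- vapply(seq_len(n_bytes), function(i) {
    sum(bits[((i - 1L) * 8L + 1L):(i * 8L)] * 2^(7:0))
  }, numeric(1))
  if (any(bytes == 0)) return(NA_character_)  # rawToChar() rejects nul bytes
  out <- rawToChar(as.raw(bytes))
  if (validUTF8(out)) out else NA_character_
}

# Hypothetical helper: decode candidates, then match a rule pattern against
# the decoded text. Returns the decoded strings that matched.
scan_decoded <- function(text, rule_pattern) {
  candidates <- character(0)
  if (grepl("%[0-9A-Fa-f]{2}", text)) {
    candidates <- c(candidates, utils::URLdecode(text))
  }
  b64 <- regmatches(text, gregexpr("[A-Za-z0-9+/]{16,}={0,2}", text))[[1]]
  decoded <- vapply(b64, decode_base64, character(1), USE.NAMES = FALSE)
  candidates <- c(candidates, decoded[!is.na(decoded)])
  candidates[grepl(rule_pattern, candidates, ignore.case = TRUE)]
}

scan_decoded(
  "Please run aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw== now",
  rule_pattern = "ignore previous instructions"
)
```

Candidates that fail to decode to valid text are dropped rather than reported, so ordinary long identifiers that happen to look like base64 do not produce matches on their own.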
Examples
scanners <- scanner_options(
max_tokens = 500,
blocked_topics = c("internal layoffs", "unreleased earnings")
)
scan_prompt("Summarize this public note.", scanners = scanners)
#> llmshieldr report
#> action: allow
#> risk_score: 0.000
#> findings: 0
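A custom language_fn, as described under Details, only needs to accept a single string and return one label. The sketch below is a deliberately crude, hypothetical example in base R: the name simple_language_fn, the code-point thresholds, the Spanish marker characters, and the "unknown" label are all assumptions made for illustration, and a real deployment would more likely wrap a dedicated language-detection library.

```r
# Hypothetical language_fn sketch; the heuristics are illustrative
# assumptions and not part of the package.
simple_language_fn <- function(text) {
  stopifnot(is.character(text), length(text) == 1L, !is.na(text))
  code_points <- utf8ToInt(enc2utf8(text))
  # utf8ToInt() yields NA for input that is not valid UTF-8.
  if (anyNA(code_points)) return("unknown")
  # Anything beyond Latin Extended-B (U+024F) is treated as non-Latin.
  # Note that typographic punctuation such as curly quotes also trips this.
  if (any(code_points > 0x024F)) return("non_latin")
  # U+00F1, U+00D1, U+00BF, U+00A1: n-tilde (both cases) and inverted marks.
  spanish_markers <- c(0x00F1, 0x00D1, 0x00BF, 0x00A1)
  if (any(code_points %in% spanish_markers)) return("es")
  "en"
}

simple_language_fn("Summarize this public note.")
```

Such a function could then be supplied as scanner_options(allowed_languages = c("en", "es"), language_fn = simple_language_fn). The labels it returns should use the same vocabulary as allowed_languages.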
