scanner_options() enables optional scanners that run alongside the deterministic policy rules. The scanners are intentionally lightweight and local. They catch common wrappers around risky text, such as invisible Unicode format characters and encoded payloads, and they enforce simple constraints: URL host block and allow lists, a token budget, a language allowlist, and topic bans.

Usage

scanner_options(
  invisible_text = TRUE,
  encoded_payloads = TRUE,
  urls = FALSE,
  malicious_urls = TRUE,
  max_tokens = NULL,
  allowed_languages = NULL,
  language_fn = NULL,
  blocked_topics = NULL,
  blocked_url_hosts = NULL,
  allowed_url_hosts = NULL
)

Arguments

invisible_text

Whether to flag Unicode format characters such as zero-width spaces. Normalization removes these characters before rule matching, but a finding records that evasive formatting was present.

encoded_payloads

Whether to inspect URL-encoded and base64-like payloads by decoding candidates and scanning the decoded text.

urls

Whether to create low-severity inventory findings for URLs.

malicious_urls

Whether to flag URLs whose hosts are explicitly blocked or fall outside allowed_url_hosts.

max_tokens

Optional maximum estimated tokens for a single scanned text. Exceeding the limit creates an OWASP LLM10 block finding.

allowed_languages

Optional language allowlist. Uses language_fn when supplied, otherwise a minimal ASCII/non-Latin heuristic.

language_fn

Optional function that receives text and returns a single language label.

blocked_topics

Optional character vector of regular expressions, or a named character vector. Matches create topic-ban findings.

blocked_url_hosts

Optional character vector of blocked URL hosts.

allowed_url_hosts

Optional character vector of allowed URL hosts. When supplied, URL hosts outside the allowlist are flagged.
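
The interaction between blocked_url_hosts and allowed_url_hosts described above can be illustrated with a minimal base R sketch. This is not the package's implementation: extract_hosts() and flag_hosts() are hypothetical helpers, and the URL pattern is a deliberately simple assumption that ignores userinfo and IPv6 literals.

```r
# Hypothetical helper (not part of the package): pull the host out of each
# http(s) URL found in a single string.
extract_hosts <- function(text) {
  urls <- regmatches(
    text,
    gregexpr("https?://[^[:space:]]+", text, ignore.case = TRUE)
  )[[1]]
  # Drop the scheme, then everything from the first "/", ":", "?" or "#".
  hosts <- sub("^https?://", "", urls, ignore.case = TRUE)
  hosts <- sub("[/:?#].*$", "", hosts)
  tolower(hosts)
}

# Hypothetical helper: a host is flagged when it is explicitly blocked, or
# when an allowlist is supplied and the host is not on it.
flag_hosts <- function(text, blocked = NULL, allowed = NULL) {
  hosts <- extract_hosts(text)
  is_blocked <- hosts %in% tolower(blocked)
  outside_allowlist <- if (is.null(allowed)) {
    rep(FALSE, length(hosts))
  } else {
    !(hosts %in% tolower(allowed))
  }
  unique(hosts[is_blocked | outside_allowlist])
}

text <- "See https://docs.example.org/a and http://evil.example/x"
flag_hosts(text, blocked = "evil.example")
flag_hosts(text, allowed = "docs.example.org")
```

Both calls flag only evil.example: the first because the host is on the blocklist, the second because it falls outside the allowlist.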

Value

A shieldr_scanner_options object.

Details

Scanner findings use the same finding schema as rule findings and therefore contribute to risk_score, action, audit logs, and explanations.

The encoded-payload scanner tries URL decoding and base64 decoding on candidate substrings, then runs the active policy rules over decoded text. It does not execute decoded content.

The language scanner is deliberately basic unless language_fn is supplied; a custom function should accept a single string and return a language label such as "en", "es", or "non_latin".
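
The decode-then-scan flow of the encoded-payload scanner can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: b64_decode(), decode_candidates(), and scan_decoded() are hypothetical names, the 16-character threshold for base64-like runs is an arbitrary choice, and a single regular expression stands in for the active policy rules. Decoded text is only pattern-matched, never evaluated.

```r
# Hypothetical base64 decoder in base R. Returns NA when the input is not
# base64 or does not decode to valid UTF-8 text.
b64_decode <- function(x) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  chars <- strsplit(sub("=+$", "", x), "")[[1]]
  vals <- match(chars, alphabet) - 1L
  if (length(vals) < 2L || anyNA(vals)) return(NA_character_)
  # Expand each 6-bit value into bits, most significant bit first.
  bits <- as.vector(vapply(
    vals,
    function(v) rev(as.integer(intToBits(v))[1:6]),
    integer(6)
  ))
  n_bytes <- length(bits) %/% 8L
  bytes <- vapply(seq_len(n_bytes), function(i) {
    sum(bits[((i - 1L) * 8L + 1L):(i * 8L)] * 2^(7:0))
  }, numeric(1))
  if (any(bytes == 0)) return(NA_character_)  # rawToChar() rejects NUL bytes
  out <- rawToChar(as.raw(bytes))
  if (validUTF8(out)) out else NA_character_
}

# Collect decoded variants of a single string.
decode_candidates <- function(text) {
  decoded <- character(0)
  # URL decoding is only attempted when a percent-escape is present.
  if (grepl("%[0-9A-Fa-f]{2}", text)) {
    decoded <- c(decoded, utils::URLdecode(text))
  }
  # Base64-like runs: 16 or more alphabet characters plus optional padding.
  runs <- regmatches(text, gregexpr("[A-Za-z0-9+/]{16,}={0,2}", text))[[1]]
  b64 <- vapply(runs, b64_decode, character(1), USE.NAMES = FALSE)
  c(decoded, b64[!is.na(b64)])
}

# Return the decoded candidates that match a stand-in rule pattern.
scan_decoded <- function(text, rule_pattern) {
  candidates <- decode_candidates(text)
  candidates[grepl(rule_pattern, candidates, ignore.case = TRUE)]
}

rule <- "ignore previous instructions"
scan_decoded("q=ignore%20previous%20instructions", rule)
scan_decoded("payload: aWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==", rule)
```

The first call matches through URL decoding and the second through base64 decoding, while plain text with no encoded candidates yields an empty result.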

Examples

scanners <- scanner_options(
  max_tokens = 500,
  blocked_topics = c("internal layoffs", "unreleased earnings")
)

scan_prompt("Summarize this public note.", scanners = scanners)
#> llmshieldr report
#> action: allow
#> risk_score: 0.000
#> findings: 0
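
A further example, offered as a sketch: a custom language_fn that accepts a single string and returns one label, as the Details section requires. detect_language() is a hypothetical function with a crude code-point heuristic, not a real language detector. The guarded call assumes the package is installed under the name shown in the report header (llmshieldr); no output is shown because it depends on the installed version.

```r
# Hypothetical language_fn: one string in, one language label out.
detect_language <- function(text) {
  stopifnot(is.character(text), length(text) == 1L, !is.na(text))
  code_points <- utf8ToInt(enc2utf8(text))
  if (anyNA(code_points)) return("unknown")  # input was not valid UTF-8
  # Code points above U+024F lie beyond the Latin Extended-B block. The
  # General Punctuation block (U+2000 to U+206F) is skipped so that curly
  # quotes and dashes do not count as non-Latin script.
  beyond_latin <- code_points > 0x024F &
    !(code_points >= 0x2000 & code_points <= 0x206F)
  if (any(beyond_latin)) return("non_latin")
  # U+00F1 (n with tilde), U+00BF and U+00A1 (inverted marks) hint at Spanish.
  if (any(code_points %in% c(0x00F1, 0x00BF, 0x00A1))) return("es")
  "en"
}

detect_language("Summarize this public note.")
detect_language("\u00bfD\u00f3nde est\u00e1?")
detect_language("\u041f\u0440\u0438\u0432\u0435\u0442")

# Assumption: the package name is "llmshieldr". The block is skipped when the
# package is not installed.
if (requireNamespace("llmshieldr", quietly = TRUE)) {
  scanners <- llmshieldr::scanner_options(
    allowed_languages = c("en", "es"),
    language_fn = detect_language
  )
  llmshieldr::scan_prompt("Summarize this public note.", scanners = scanners)
}
```

The three direct calls return "en", "es", and "non_latin" respectively. A production language_fn would more likely wrap a dedicated language-identification library.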