
Rate guards are explicit stateful environments used to cap token and request budgets for LLM workflows. Resource exhaustion is covered by OWASP LLM10 (Unbounded Consumption); see https://genai.owasp.org/llm-top-10/.

Usage

rate_guard(
  max_tokens = NULL,
  max_requests = NULL,
  window_seconds = 3600L,
  strict = FALSE,
  concurrent = FALSE
)

Arguments

max_tokens

Maximum tokens per window, or NULL to disable the token limit. When checking a guard with rate_guard(guard), pass an existing shieldr_rate_guard here instead.

max_requests

Maximum requests per window, or NULL to disable the request limit.

window_seconds

Window length in seconds.

strict

Whether secure_chat() should reserve estimated prompt tokens before calling the model.

concurrent

Whether to protect $usage(), $reserve(), $update(), and $rollback() with a file-based lock from the suggested filelock package.

Value

When creating a guard, a shieldr_rate_guard environment. When checking a guard, TRUE if usage is within limits.
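The dual role of rate_guard() can be pictured as dispatch on the class of the first argument. The following is a minimal sketch, not the llmshieldr source: rate_guard_sketch() and the fields tokens, requests, max_tokens, and max_requests are hypothetical names chosen for illustration.

```r
# Hypothetical sketch: one function that either builds a guard or checks one,
# depending on whether its first argument is already a guard environment.
rate_guard_sketch <- function(max_tokens = NULL, max_requests = NULL,
                              window_seconds = 3600L) {
  if (inherits(max_tokens, "shieldr_rate_guard")) {
    guard <- max_tokens
    # Checking mode: a NULL limit is disabled, otherwise usage must not exceed it.
    under_limit <- function(used, limit) is.null(limit) || used <= limit
    return(under_limit(guard$tokens, guard$max_tokens) &&
             under_limit(guard$requests, guard$max_requests))
  }
  # Creation mode: a fresh environment carrying limits and zeroed counters.
  guard <- new.env()
  guard$max_tokens <- max_tokens
  guard$max_requests <- max_requests
  guard$window_seconds <- window_seconds
  guard$tokens <- 0
  guard$requests <- 0
  class(guard) <- "shieldr_rate_guard"
  guard
}
```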

Details

Calling rate_guard() with limits creates a new shieldr_rate_guard environment. The environment stores counters for the current window and exposes four methods:

  • $usage(): returns current counters and configured limits.

  • $reserve(tokens, requests): atomically checks projected usage and then increments counters when the reservation stays within limits.

  • $update(tokens, requests): backward-compatible alias for $reserve().

  • $rollback(tokens, requests): subtracts a previous reservation after a guarded operation fails before completion.

Calling rate_guard(guard) checks an existing environment and returns TRUE if all counters are within limits. A reservation fails, leaving the counters unchanged, if the projected usage would exceed the configured token or request limit. Limits set to NULL are disabled for that dimension.
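A minimal sketch of the check-then-commit rule behind $reserve() and $rollback(), assuming a closure-based guard rather than the package's real environment layout. new_counter_guard() is hypothetical, and both the default of one request per reservation and the use of an error to signal a rejected reservation are assumptions.

```r
# Hypothetical sketch of reserve/rollback semantics, not the llmshieldr source.
new_counter_guard <- function(max_tokens = NULL, max_requests = NULL) {
  state <- new.env()
  state$tokens <- 0
  state$requests <- 0

  # A NULL limit never blocks; otherwise block when the projection goes over.
  exceeds <- function(projected, limit) !is.null(limit) && projected > limit

  list(
    usage = function() {
      list(tokens = state$tokens, requests = state$requests,
           max_tokens = max_tokens, max_requests = max_requests)
    },
    reserve = function(tokens = 0, requests = 1) {
      # Check projected totals first; counters change only if both pass.
      if (exceeds(state$tokens + tokens, max_tokens) ||
          exceeds(state$requests + requests, max_requests)) {
        stop("rate limit would be exceeded", call. = FALSE)
      }
      state$tokens <- state$tokens + tokens
      state$requests <- state$requests + requests
      invisible(TRUE)
    },
    rollback = function(tokens = 0, requests = 1) {
      # Undo an earlier reservation, never dropping below zero.
      state$tokens <- max(0, state$tokens - tokens)
      state$requests <- max(0, state$requests - requests)
      invisible(TRUE)
    }
  )
}
```

Because the check happens before any counter moves, a rejected reservation needs no cleanup; only reservations that succeeded are ever rolled back.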

Windows reset automatically when window_seconds has elapsed. This object is intentionally stateful; it is the one place where llmshieldr expects mutable state, because rate limiting is inherently session-based.
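An automatic reset of this kind can be sketched as a fixed window that compares elapsed time against window_seconds on every access. new_window() and its injectable now argument are hypothetical; the injectable clock exists only so the reset can be demonstrated without waiting for a real window to pass.

```r
# Hypothetical sketch of a fixed window that resets once window_seconds elapse.
# `now` is injectable so the reset can be exercised without sleeping.
new_window <- function(window_seconds = 3600, now = function() Sys.time()) {
  state <- new.env()
  state$start <- now()
  state$tokens <- 0

  # Start a new window (and zero the counter) once the old one has expired.
  maybe_reset <- function() {
    elapsed <- as.numeric(difftime(now(), state$start, units = "secs"))
    if (elapsed >= window_seconds) {
      state$start <- now()
      state$tokens <- 0
    }
  }

  list(
    add = function(tokens) {
      maybe_reset()
      state$tokens <- state$tokens + tokens
      state$tokens
    },
    tokens = function() {
      maybe_reset()
      state$tokens
    }
  )
}
```

For example, with a fake clock:

```r
clock <- 0
fake_now <- function() as.POSIXct(clock, origin = "1970-01-01", tz = "UTC")
w <- new_window(window_seconds = 10, now = fake_now)
w$add(5)
clock <- 10
w$tokens()  # window expired, counter reset
```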

Concurrency

The rate guard is not safe for concurrent use by default. Parallel or async R code (future, parallel, callr) that shares a single guard environment will produce inaccurate counts. Use concurrent = TRUE and install the filelock package to make each $usage(), $reserve(), $update(), and $rollback() call acquire a file-based lock within a single machine. Cross-machine coordination is not supported.
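Worker processes started by parallel, callr, or future receive copies of an environment rather than a shared reference, so cross-process counting needs state that lives outside any one R session. This sketch assumes the counter is kept in an RDS file and serializes each read-modify-write with filelock::lock() and filelock::unlock(). with_file_lock() and add_tokens_shared() are hypothetical helpers, not llmshieldr functions, and the sketch falls back to running unlocked when filelock is not installed.

```r
# Hypothetical sketch: run `code` while holding an exclusive file lock.
# Uses the filelock package when installed; otherwise runs without locking.
with_file_lock <- function(lock_path, code) {
  if (requireNamespace("filelock", quietly = TRUE)) {
    lck <- filelock::lock(lock_path, exclusive = TRUE)
    on.exit(filelock::unlock(lck), add = TRUE)
  }
  force(code)  # `code` is a promise, so it is evaluated only after locking
}

# Hypothetical shared counter stored on disk, updated under the lock.
add_tokens_shared <- function(state_path, tokens) {
  with_file_lock(paste0(state_path, ".lock"), {
    used <- if (file.exists(state_path)) readRDS(state_path) else 0
    used <- used + tokens
    saveRDS(used, state_path)
    used
  })
}
```

A lock file only coordinates processes that see the same file system, which is why this approach stops at a single machine.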

Pre-call Reservation

With strict = TRUE, secure_chat() reserves an estimated prompt token cost and one request before the model call. After the call, it records only the positive difference between the post-call token estimate and the reserved amount. If the chat call or output scan fails, the pre-call reservation is rolled back. Reserving before the call makes shared guards more reliable under bursty load, because concurrent callers cannot all pass the limit check before any of them has recorded usage; the trade-off is that estimated tokens may differ from actual usage. Strict mode is recommended when multiple callers share one guard.
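The strict-mode sequence (reserve, call, roll back on failure, then top up) can be sketched with tryCatch(). This is an illustration under stated assumptions, not secure_chat() itself: strict_call() and call_model are hypothetical stand-ins, estimate_tokens() uses a rough four-characters-per-token heuristic, the post-call estimate is assumed to cover prompt plus reply, and limit checks are left out to keep the focus on the accounting.

```r
# Hypothetical sketch of strict-mode accounting around a model call.
# Rough heuristic (assumption): about four characters per token.
estimate_tokens <- function(text) ceiling(nchar(text) / 4)

strict_call <- function(counters, prompt, call_model) {
  reserved <- estimate_tokens(prompt)
  # 1. Reserve the estimated prompt cost and one request before the call.
  counters$tokens <- counters$tokens + reserved
  counters$requests <- counters$requests + 1
  reply <- tryCatch(
    call_model(prompt),
    error = function(e) {
      # 2. On failure, undo the pre-call reservation, then re-raise the error.
      counters$tokens <- counters$tokens - reserved
      counters$requests <- counters$requests - 1
      stop(e)
    }
  )
  # 3. On success, record only the positive difference from the reservation.
  actual <- estimate_tokens(paste0(prompt, reply))
  counters$tokens <- counters$tokens + max(0, actual - reserved)
  reply
}
```

For a 40-character prompt and a 40-character reply, the sketch reserves 10 tokens up front and adds 10 more afterwards, for 20 in total; if the model call errors, the counters return to where they were.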

Examples

guard <- rate_guard(max_tokens = 100)
guard$reserve(tokens = 10)
rate_guard(guard)
#> [1] TRUE