Rate guards are explicit stateful environments used to cap token and request budgets for LLM workflows. Resource exhaustion is covered by OWASP LLM10; see https://genai.owasp.org/llm-top-10/.
Usage
rate_guard(
max_tokens = NULL,
max_requests = NULL,
window_seconds = 3600L,
strict = FALSE,
concurrent = FALSE
)
Arguments
- max_tokens
Maximum tokens per window, NULL, or an existing shieldr_rate_guard when checking a guard with rate_guard(guard).
- max_requests
Maximum requests per window, or NULL.
- window_seconds
Window length in seconds.
- strict
Whether secure_chat() should reserve estimated prompt tokens before calling the model.
- concurrent
Whether to protect $usage(), $reserve(), $update(), and $rollback() with a file-based lock from the suggested filelock package.
Value
When creating a guard, a shieldr_rate_guard environment. When
checking a guard, TRUE if usage is within limits.
Details
Calling rate_guard() with limits creates a new shieldr_rate_guard
environment. The environment stores counters for the current window and
exposes four methods:
- $usage(): returns current counters and configured limits.
- $reserve(tokens, requests): atomically checks projected usage and then increments counters when the reservation stays within limits.
- $update(tokens, requests): backward-compatible alias for $reserve().
- $rollback(tokens, requests): subtracts a previous reservation after a guarded operation fails before completion.
Calling rate_guard(guard) checks an existing environment and returns
TRUE if all counters are within limits. Reservation methods fail when
projected usage would exceed the configured token or request limit. A limit
set to NULL disables that dimension.
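The calls described above can be combined into a short usage sketch. It uses only the functions and methods named on this page, and it assumes the package is installed under the name llmshieldr and that $reserve() accepts its arguments by the names tokens and requests. The shape of the value returned by $usage() is not specified here, so the sketch only prints its structure.

```r
# Runs only when the package is available; otherwise the block is skipped.
if (requireNamespace("llmshieldr", quietly = TRUE)) {
  # Create a guard: 10,000 tokens and 100 requests per one-hour window.
  guard <- llmshieldr::rate_guard(
    max_tokens = 10000,
    max_requests = 100,
    window_seconds = 3600L
  )

  # Reserve budget for one call (argument names assumed from the method list).
  guard$reserve(tokens = 250, requests = 1)

  # Inspect counters and limits; the exact structure is not documented here.
  str(guard$usage())

  # Check the existing guard; documented to return TRUE when within limits.
  within_limits <- llmshieldr::rate_guard(guard)
}
```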
Windows reset automatically when window_seconds has elapsed. This object
is intentionally stateful; it is the one place where llmshieldr expects
mutable state, because rate limiting is inherently session-based.
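To make the stateful design concrete, here is a minimal base R sketch of a fixed-window guard built on an environment. It is not llmshieldr's implementation: new_sketch_guard(), the clock argument, and the error message are all hypothetical, and signalling an error on a rejected reservation is an assumption about what "fail" means.

```r
# Hypothetical stand-in for illustration; not the package's own code.
new_sketch_guard <- function(max_tokens = NULL, max_requests = NULL,
                             window_seconds = 3600, clock = Sys.time) {
  guard <- new.env(parent = emptyenv())
  guard$tokens <- 0
  guard$requests <- 0
  guard$window_start <- clock()

  # Reset the counters once window_seconds has elapsed.
  roll_window <- function() {
    elapsed <- as.numeric(difftime(clock(), guard$window_start, units = "secs"))
    if (elapsed >= window_seconds) {
      guard$tokens <- 0
      guard$requests <- 0
      guard$window_start <- clock()
    }
  }

  # A NULL limit disables that dimension.
  within_limit <- function(value, limit) is.null(limit) || value <= limit

  guard$usage <- function() {
    roll_window()
    list(tokens = guard$tokens, requests = guard$requests,
         max_tokens = max_tokens, max_requests = max_requests)
  }

  # Check projected usage first, then increment only if it stays within limits.
  guard$reserve <- function(tokens = 0, requests = 0) {
    roll_window()
    new_tokens <- guard$tokens + tokens
    new_requests <- guard$requests + requests
    if (!within_limit(new_tokens, max_tokens) ||
        !within_limit(new_requests, max_requests)) {
      stop("rate limit would be exceeded", call. = FALSE)
    }
    guard$tokens <- new_tokens
    guard$requests <- new_requests
    invisible(TRUE)
  }

  # Subtract a previous reservation, never dropping below zero.
  guard$rollback <- function(tokens = 0, requests = 0) {
    guard$tokens <- max(0, guard$tokens - tokens)
    guard$requests <- max(0, guard$requests - requests)
    invisible(TRUE)
  }

  guard
}

g <- new_sketch_guard(max_tokens = 100, max_requests = 2)
g$reserve(tokens = 60, requests = 1)
# A second reservation of 50 tokens would project to 110, so it is rejected.
over <- tryCatch(g$reserve(tokens = 50, requests = 1),
                 error = function(e) conditionMessage(e))
```

Because the counters live in an environment, every caller holding the same guard sees the same state, which is what makes the object useful as a shared budget and also what makes it unsafe to share across processes without a lock.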
Concurrency
The rate guard is not safe for concurrent use by default. Parallel or async R
code (future, parallel, callr) that shares a single guard environment
will produce inaccurate counts. Use concurrent = TRUE and install the
filelock package to make each $usage(), $reserve(), $update(), and
$rollback() call acquire a file-based lock within a single machine.
Cross-machine coordination is not supported.
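The file-based locking pattern can be sketched as follows. The helpers with_guard_lock() and add_tokens() are hypothetical, and storing the counter in an RDS file is an assumption made only so that the example has state that separate processes could share; the page does not describe how llmshieldr stores or locks its counters internally. The sketch uses filelock::lock() and filelock::unlock() from the filelock package.

```r
# Run `action` while holding an exclusive file lock (hypothetical helper).
with_guard_lock <- function(lock_path, action, timeout_ms = 5000) {
  if (!requireNamespace("filelock", quietly = TRUE)) {
    stop("the 'filelock' package is required for this sketch", call. = FALSE)
  }
  # filelock::lock() returns NULL if the lock is not acquired before the timeout.
  lck <- filelock::lock(lock_path, exclusive = TRUE, timeout = timeout_ms)
  if (is.null(lck)) {
    stop("timed out waiting for the guard lock", call. = FALSE)
  }
  # Release the lock even if `action` throws.
  on.exit(filelock::unlock(lck), add = TRUE)
  action()
}

# Read-modify-write of a file-backed token counter under the lock.
# File-backed state is an assumption of this sketch, not a documented detail.
add_tokens <- function(state_path, tokens) {
  with_guard_lock(paste0(state_path, ".lock"), function() {
    current <- if (file.exists(state_path)) readRDS(state_path) else 0
    updated <- current + tokens
    saveRDS(updated, state_path)
    updated
  })
}
```

Holding the lock across the whole read-modify-write step is the point of the pattern: two processes that increment at the same moment are serialized instead of overwriting each other's update. A file lock only coordinates processes that see the same file system, which matches the single-machine limit stated above.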
Pre-call Reservation
With strict = TRUE, secure_chat() reserves an estimated prompt token cost
and one request before the model call, then records only the positive
difference between the actual token estimate and the reserved amount after
the call. If the chat call or output scan fails, the pre-call reservation is
rolled back. This makes shared guards more useful under bursty load, but
estimated tokens may differ from actual usage. Strict mode is recommended
when multiple callers share one guard.
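The reserve, call, then reconcile flow described above can be sketched in base R. This is an illustration, not the body of secure_chat(): guarded_call(), the state environment, and the character-count token estimator are hypothetical, limit checks are omitted for brevity, and treating the "actual token estimate" as prompt plus response tokens is an assumption.

```r
# Hypothetical sketch of strict-mode accounting around a model call.
guarded_call <- function(state, estimate_tokens, call_model, prompt) {
  # Reserve the estimated prompt cost and one request before the call.
  reserved <- estimate_tokens(prompt)
  state$tokens <- state$tokens + reserved
  state$requests <- state$requests + 1

  result <- tryCatch(
    call_model(prompt),
    error = function(e) {
      # Roll the reservation back if the call fails, then re-signal the error.
      state$tokens <- state$tokens - reserved
      state$requests <- state$requests - 1
      stop(e)
    }
  )

  # Record only the positive difference between actual and reserved tokens.
  actual <- estimate_tokens(prompt) + estimate_tokens(result)
  state$tokens <- state$tokens + max(0, actual - reserved)
  result
}

state <- new.env()
state$tokens <- 0
state$requests <- 0

# Rough stand-in estimator: about four characters per token (an assumption).
estimate <- function(text) ceiling(nchar(text) / 4)

reply <- guarded_call(state, estimate,
                      function(p) "a short reply",
                      "Summarise this text")
```

Reserving before the call means concurrent callers see the pending cost immediately, so a burst cannot overshoot the budget by the full size of every in-flight request. The trade-off is that the reservation is only an estimate and is corrected upward, never downward, after the call.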
