Dynamic Safeguards: Context-aware responses when safeguards trigger

Safeguards are the line between an AI agent you can trust in front of customers and one you can't. They catch harmful content, prompt injection attempts, banned terms, and false context before any of it shapes the rest of the conversation. But once a safeguard fires, what users see has historically been the same fixed message. A confused shopper, a jailbreak attempt, and a competitor name on the banned list all get the same line.

That works as a safety net but not as a brand experience. A user whose innocent question happened to brush a banned word reads the same canned apology as someone trying to attack the agent. The detection is doing the right thing. The response is doing the bare minimum.

Dynamic Safeguards replaces the fixed fallback with a safeguard actionbook that generates the response in real time, branching by trigger type and using the full conversation context.

How it works

  • Actionbook mode for safeguard responses: Connect any actionbook to your safeguard follow-up action. When a safeguard triggers, the actionbook runs instead of returning the default message.
  • Per-trigger branching: The actionbook receives the flagged type (harmful_content, adversarial_attack, context_injection, or banned_words) and the flagged reason as input. Write distinct instructions for each type, so an adversarial probe and an accidental brush with a banned word get different responses.
  • Full conversation and user context: The actionbook has access to the entire conversation history and any user context objects you've configured, including membership tier, region, and language. VIPs can get a more empathetic response. Users in regulated regions can see the right disclaimer.
  • Backup message as safety net: A backup safeguard message is required even in Actionbook mode. If the actionbook fails or times out, the backup takes over, so users always get a response.
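
The control flow described above can be sketched in a few lines of Python. This is only an illustration of the branching and fallback logic, not the product's API: the names SafeguardEvent, run_actionbook, respond, BACKUP_MESSAGE, the membership_tier key, the reply texts, and the two-second timeout are all assumptions made for the example. Only the four flagged type values come from the list above.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Hypothetical backup text; in the product this is the required backup safeguard message.
BACKUP_MESSAGE = "Sorry, I can't help with that request."


@dataclass
class SafeguardEvent:
    """Hypothetical container for what the actionbook receives as input."""
    flagged_type: str            # harmful_content, adversarial_attack, context_injection, banned_words
    flagged_reason: str
    history: list = field(default_factory=list)        # conversation so far
    user_context: dict = field(default_factory=dict)   # e.g. membership tier, region, language


def run_actionbook(event):
    """Stand-in for an actionbook: pick a response by trigger type and user context."""
    tier = event.user_context.get("membership_tier")  # assumed key name
    if event.flagged_type == "banned_words":
        reply = "I can't discuss that topic, but I'm happy to help with something else."
        if tier == "vip":
            reply = "Thanks for your patience. " + reply
        return reply
    if event.flagged_type == "adversarial_attack":
        return "I can't help with that request."
    if event.flagged_type in ("harmful_content", "context_injection"):
        return "I can't continue with that, but I can help with questions about your account."
    raise ValueError(f"unknown flagged type: {event.flagged_type}")


def respond(event, actionbook=run_actionbook, timeout_s=2.0):
    """Run the actionbook; fall back to the backup message if it fails or times out."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(actionbook, event)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers both a timeout and any error raised inside the actionbook.
        return BACKUP_MESSAGE
    finally:
        pool.shutdown(wait=False)
```

In this sketch, calling respond(SafeguardEvent("banned_words", "competitor name")) returns the softer redirect, while an unrecognized type or a slow actionbook returns BACKUP_MESSAGE, so the user always gets a reply.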

Dynamic Safeguards is part of Trust OS, turning the moment a safeguard fires from a dead end into a response that still sounds like your brand.