Product release
Flagged message monitoring

Jun 30, 2025

Your AI is out there handling real conversations, but not everyone plays nice.

You’ve set the guardrails: no harmful content, blocked phrases, protections against prompt injection. But when users test those boundaries, do you know what’s actually happening? Are your safeguards working as intended?

With Flagged Message Monitoring, you get a dedicated dashboard that tracks every time a safeguard is triggered. Whether it’s a banned keyword, a hostile prompt, or an attempted exploit, you’ll see exactly what the user tried, which safeguard flagged it, and how your AI responded.

What this means for your team

Close the loop between building and monitoring. The rules you set during setup, from banned words to safety policies, are now fully traceable in real conversations.
Understand every violation. Each flag includes a transparent explanation of what triggered it, so reviewers can respond quickly and accurately.
Spot new threats and strengthen safeguards. Identify patterns in how users try to circumvent your guardrails, and reinforce your defenses accordingly.
Refine your policies. Use flagged conversations as direct feedback to adjust thresholds, add new terms, or update detection logic.
Document compliance. Generate clean audit trails to show how your AI handles violations and upholds safety standards.

What’s included

Comprehensive tracking monitors violations across all your safeguard categories—guardrails, adversarial attacks, banned phrases, and harmful content detection.
Clear explanations show exactly what triggered each flag, so your team understands the violation without guesswork.
Trend analysis reveals patterns in violation attempts across daily, weekly, and monthly views, helping you spot emerging attack vectors.
Filterable conversation list lets you drill down by violation type, user ID, or time period to investigate specific incidents.
Direct conversation links provide full context around each flagged interaction for thorough review.

This feature is part of Sendbird’s broader commitment to building accountable, responsible AI agents for customer service. From what you define to what your AI says — we’re closing the loop with visibility, control, and safeguards that scale.