# Guardrails

> Filter user input and agent output to block policy-violating content in Agent Studio.

<Callout icon="flask-conical" color="#14b8a6">
  This is a **beta feature** according to [Algolia's Terms of Service ("Beta Services")](https://www.algolia.com/policies/terms/).
</Callout>

Guardrails classify user messages and agent responses against categories you define,
blocking content that violates your policies.
When content is blocked, a fallback response is returned instead.

## How guardrails work

When a user sends a message to an agent with guardrails enabled:

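1. **Input check**: the user's message is classified against your input-scoped categories.
   If it matches a category, the agent returns the fallback response without calling the LLM.
   For streaming responses, this check runs concurrently with the LLM stream instead (see [Streaming behavior](#streaming-behavior)).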
2. **Agent response**: if the input passes, the agent generates a response normally.
3. **Output check**: the agent's response is classified against your output-scoped categories.
   If it matches, the response is replaced with the fallback.

Classification uses a separate LLM call with its own provider and model settings, which can match or differ from the agent's.
This keeps guardrail logic independent of the agent's main model.

<Note>
  Guardrails use a **fail-open** design.
  If the classification LLM is unavailable (timeout, API error, rate limit),
  content is allowed through rather than blocked.
  This prevents guardrail outages from disrupting your agent.
</Note>
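
The non-streaming flow above, including the fail-open behavior, can be sketched as follows.
This is an illustration only, not Agent Studio's implementation: `Classifier`, `classifyFailOpen`, `runWithGuardrails`, `generate`, and `fallbackFor` are hypothetical names, and the classifier is assumed to resolve to the matched category name or `null`.

```typescript
// Illustrative sketch only: every name here is hypothetical, not an Agent Studio API.
type Direction = "input" | "output";

// Assumed to resolve to the matched category name, or null when nothing matches.
type Classifier = (text: string, direction: Direction) => Promise<string | null>;

interface GuardedResult {
  text: string;
  blockedBy: string | null; // category that triggered, if any
}

// Fail-open wrapper: a classifier failure (timeout, API error, rate limit)
// is treated as "no violation", so the content is allowed through.
async function classifyFailOpen(
  classify: Classifier,
  text: string,
  direction: Direction,
): Promise<string | null> {
  try {
    return await classify(text, direction);
  } catch {
    return null;
  }
}

async function runWithGuardrails(
  userMessage: string,
  classify: Classifier,
  generate: (message: string) => Promise<string>,
  fallbackFor: (category: string, direction: Direction) => string,
): Promise<GuardedResult> {
  // 1. Input check: a match returns the fallback without calling the agent's LLM.
  const inputHit = await classifyFailOpen(classify, userMessage, "input");
  if (inputHit !== null) {
    return { text: fallbackFor(inputHit, "input"), blockedBy: inputHit };
  }

  // 2. Agent response: generated normally when the input passes.
  const response = await generate(userMessage);

  // 3. Output check: a match replaces the response with the fallback.
  const outputHit = await classifyFailOpen(classify, response, "output");
  if (outputHit !== null) {
    return { text: fallbackFor(outputHit, "output"), blockedBy: outputHit };
  }
  return { text: response, blockedBy: null };
}
```

Wrapping the classifier call rather than the whole pipeline keeps the fail-open rule in one place: an error from the agent's own `generate` call still surfaces normally.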

## Set up guardrails

<Steps>
  <Step title="Configure a provider">
    Guardrails need an LLM provider for classification.
    You can use the same provider as your agent or a different one.

    1. Go to your agent in the [Agent Studio dashboard](https://dashboard.algolia.com/generativeAi/agent-studio).
    2. Open the **Safety controls** tab.
    3. Enable **Guardrails**.
    4. Select a provider and model.

    <Tip>
      Use a fast, low-latency model for classification.
      Larger models may improve accuracy but add latency to every request.
    </Tip>
  </Step>

  <Step title="Define your agent's scope">
    Describe your agent's domain to give the classifier context.
    For example: "Customer support agent for an electronics store."

    This helps the classifier distinguish legitimate queries from off-topic content.
  </Step>

  <Step title="Add violation categories">
    Each category defines a type of content to block:

    * **Name**: identifier for the category (for example, `competitor_mentions`)
    * **Scope**: `input` (user messages), `output` (agent responses), or `both`
    * **Description**: what content this category catches
    * **Fallback response**: message returned when this category triggers

    Add categories that match your use case.
    For example, an e-commerce agent might block competitor mentions, off-topic questions, and inappropriate content.

    <Warning>
      Consider consolidating categories if you have more than eight.
      Too many categories can reduce classification accuracy.
    </Warning>
  </Step>

  <Step title="Test in the playground">
    Send messages that should be blocked and messages that should pass through.
    The playground shows guardrail violations and provider errors so you can verify your configuration before deploying.
  </Step>
</Steps>

## Category scope

Each category is scoped to control when it's checked:

| Scope    | Checks user input | Checks agent output |
| -------- | ----------------- | ------------------- |
| `input`  | Yes               | No                  |
| `output` | No                | Yes                 |
| `both`   | Yes               | Yes                 |

Use `input` scope for categories that filter what users can ask (off-topic questions, prompt injection attempts).
Use `output` scope for categories that filter what the agent can say (confidential information, competitor mentions).
Use `both` when the same policy applies to both directions.
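
The scope table reduces to a one-line rule, sketched here in TypeScript.
`categoryApplies` and `categoriesFor` are hypothetical helper names used for illustration; the `both` default for an omitted `scope` comes from the category reference.

```typescript
type CategoryScope = "input" | "output" | "both";
type Direction = "input" | "output";

// A category is checked when its scope is "both" or equals the direction being checked.
function categoryApplies(scope: CategoryScope, direction: Direction): boolean {
  return scope === "both" || scope === direction;
}

// Select the categories to check for one direction.
// An omitted scope is treated as "both", matching the documented default.
function categoriesFor<T extends { scope?: CategoryScope }>(
  categories: T[],
  direction: Direction,
): T[] {
  return categories.filter((c) => categoryApplies(c.scope ?? "both", direction));
}
```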

## Fallback responses

When content is blocked, the user sees the category's fallback response.
If no fallback is configured, a default message is used:

* Input violations: "I cannot process this request."
* Output violations: "I cannot provide this response."

Configure specific fallback responses to guide users toward acceptable queries.
For example: "I can only help with questions about our products and services."
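
If you render fallbacks yourself, the selection rule above might look like the following sketch.
The default strings are the ones quoted above; `resolveFallback` is a hypothetical helper, and it assumes that only a missing `fallbackResponse` counts as "not configured".

```typescript
type Direction = "input" | "output";

// Default messages as documented above.
const DEFAULT_FALLBACKS: Record<Direction, string> = {
  input: "I cannot process this request.",
  output: "I cannot provide this response.",
};

// Prefer the category's own fallback; otherwise use the default for the direction.
// Assumption: an absent fallbackResponse is the only "not configured" case.
function resolveFallback(
  category: { fallbackResponse?: string },
  direction: Direction,
): string {
  return category.fallbackResponse ?? DEFAULT_FALLBACKS[direction];
}
```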

## Streaming behavior

For streaming responses, guardrails work differently for input and output:

**Input guardrails** run concurrently with the LLM stream.
If a violation is detected mid-stream,
a violation event is emitted and the client discards any already-streamed content.

**Output guardrails** classify the full response after streaming completes.
If a violation is detected, a violation event is emitted as the final chunk.
The client replaces the streamed content with the fallback response.

In both cases, the Vercel AI SDK handles the replacement automatically.
If you're building a custom integration,
handle the guardrail violation event in your streaming parser:
`data-guardrail-violation` in AI SDK v5, or the `guardrailViolation` data chunk in AI SDK v4.

## API configuration

Configure guardrails in the agent's `config.guardrail` object:

```json JSON icon=braces theme={"system"}
{
  "config": {
    "guardrail": {
      "enabled": true,
      "providerId": "PROVIDER_UUID",
      "model": "gpt-4.1-mini",
      "scope": "Customer support agent for an electronics store.",
      "categories": [
        {
          "name": "off_topic",
          "scope": "input",
          "description": "Questions unrelated to electronics or the store.",
          "fallbackResponse": "I can only help with electronics questions."
        },
        {
          "name": "inappropriate",
          "scope": "both",
          "description": "Offensive, hateful, or sexually explicit content."
        }
      ]
    }
  }
}
```

### Configuration reference

| Field        | Type    | Default | Description                                               |
| ------------ | ------- | ------- | --------------------------------------------------------- |
| `enabled`    | boolean | `false` | Turn guardrails on or off                                 |
| `required`   | boolean | `false` | Return 503 if the guardrail provider can't be initialized |
| `providerId` | string  | —       | UUID of the provider authentication for classification    |
| `model`      | string  | —       | Model name for the classification LLM                     |
| `scope`      | string  | —       | Description of the agent's domain (max 1,024 characters)  |
| `categories` | array   | `[]`    | List of violation categories                              |

### Category reference

| Field              | Type   | Default  | Description                                               |
| ------------------ | ------ | -------- | --------------------------------------------------------- |
| `name`             | string | required | Category identifier (1-64 characters)                     |
| `scope`            | string | `both`   | `input`, `output`, or `both`                              |
| `description`      | string | —        | What content this category catches (max 1,024 characters) |
| `fallbackResponse` | string | —        | Message returned when this category triggers              |
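
For client code that builds this object, the two reference tables can be mirrored as TypeScript types with a local pre-check of the documented length limits.
This is a sketch, not an official SDK type: `GuardrailConfig`, `GuardrailCategory`, and `checkGuardrailConfig` are hypothetical names, and the API may enforce rules beyond the ones listed in the tables.

```typescript
// Illustrative types mirroring the reference tables; not an official SDK type.
type CategoryScope = "input" | "output" | "both";

interface GuardrailCategory {
  name: string; // required, 1-64 characters
  scope?: CategoryScope; // defaults to "both"
  description?: string; // max 1,024 characters
  fallbackResponse?: string;
}

interface GuardrailConfig {
  enabled?: boolean; // defaults to false
  required?: boolean; // defaults to false
  providerId?: string;
  model?: string;
  scope?: string; // max 1,024 characters
  categories?: GuardrailCategory[]; // defaults to []
}

// Hypothetical local pre-check. Lengths use String.length (UTF-16 code units),
// which is an assumption about how the documented character limits are counted.
function checkGuardrailConfig(config: GuardrailConfig): string[] {
  const problems: string[] = [];
  if ((config.scope ?? "").length > 1024) {
    problems.push("scope exceeds 1,024 characters");
  }
  for (const category of config.categories ?? []) {
    if (category.name.length < 1 || category.name.length > 64) {
      problems.push(`category name "${category.name}" must be 1-64 characters`);
    }
    if ((category.description ?? "").length > 1024) {
      problems.push(`description of "${category.name}" exceeds 1,024 characters`);
    }
  }
  return problems;
}
```

An empty array means the object satisfies the limits listed in the tables; each string describes one field to fix before sending the configuration.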

## Handling guardrail events

If you're using the Vercel AI SDK,
guardrail violations are handled automatically.
The SDK replaces blocked content with the fallback response.

For custom integrations, handle these streaming events:

| Event (AI SDK v5)          | Event (AI SDK v4)               | Description                                                                        |
| -------------------------- | ------------------------------- | ---------------------------------------------------------------------------------- |
| `data-guardrail-violation` | `guardrailViolation` data chunk | Content was blocked. Contains `category`, `guardrailType`, and `fallbackResponse`. |

When you receive a violation event:

1. Discard any content already streamed for this message.
2. Display the `fallbackResponse` from the event data.
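
For a custom integration, the two steps above might be implemented as a small reducer over parsed stream events.
This is a sketch under assumptions: the event name and the `category`, `guardrailType`, and `fallbackResponse` fields come from the table above (AI SDK v5 naming), while the `text-delta` event shape, the `data` wrapper, and the names `StreamEvent`, `MessageState`, and `applyStreamEvent` are hypothetical simplifications of whatever your parser produces.
With AI SDK v4, you would read the same fields from the `guardrailViolation` data chunk instead.

```typescript
interface GuardrailViolation {
  category: string;
  guardrailType: string;
  fallbackResponse: string;
}

// Assumed, simplified shapes for already-parsed stream events.
type StreamEvent =
  | { type: "text-delta"; delta: string }
  | { type: "data-guardrail-violation"; data: GuardrailViolation };

interface MessageState {
  text: string; // what the UI should display for this message
  violation: GuardrailViolation | null;
}

const emptyMessage: MessageState = { text: "", violation: null };

function applyStreamEvent(state: MessageState, event: StreamEvent): MessageState {
  // Assumption: once a violation is seen, later text for this message is ignored.
  if (state.violation !== null) return state;

  if (event.type === "data-guardrail-violation") {
    // Step 1: discard streamed content. Step 2: display the fallback response.
    return { text: event.data.fallbackResponse, violation: event.data };
  }

  // Normal case: append the streamed text.
  return { text: state.text + event.delta, violation: null };
}
```

Folding a message's events with `events.reduce(applyStreamEvent, emptyMessage)` yields the text to render, whether the violation arrives mid-stream or as the final chunk.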

## Troubleshooting

### Guardrail not blocking content

* Verify the category's `scope` matches the direction you're testing (input vs output).
* Check that the category's `description` is specific enough for the classifier to detect violations.
* Test with clear violations first, then refine descriptions for edge cases.

### "Review your guardrail model configuration" error

This error appears in the playground when the guardrail provider can't classify content.
Common causes:

* Invalid or expired API key on the provider
* Wrong model name
* Provider rate limit exceeded

Check your provider settings and try again.
This error doesn't appear in production, where guardrails fail open silently.

### Slow response times

Guardrail classification adds latency to each request.
To minimize this:

* Use a fast, low-latency model for classification
* Keep the number of categories at eight or fewer
* Keep category descriptions concise

## See also

* [Agent configuration](/doc/guides/algolia-ai/agent-studio/how-to/agent-configuration)
* [LLM providers](/doc/guides/algolia-ai/agent-studio/how-to/llm-providers)
* [Caching](/doc/guides/algolia-ai/agent-studio/how-to/caching)
