Policies for AIDR let you remotely configure your AIDR instance. You can set which detectors are enabled, choose whether they block or only alert, and manage more nuanced configuration settings.
You can also create custom rulesets to support unique use cases, which you define as Projects. For example, if a use case processes PII for benign purposes, you can create a ruleset that turns off the PII detector for that project while leaving the detector on by default. Use Projects to define your use cases and assign them custom rulesets.
If you have AIDR currently deployed, your configuration is set by environment variables and can be overridden by request headers. Establishing a ruleset in Rules lets you manage this configuration remotely, through the Console.
To implement Rules and Projects, you will need to update environment variables, update the API key for your AIDR implementation, and update the header fields you send to AIDR.
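The precedence described above (environment variables set the default, request headers override it) can be sketched as follows. The variable name `AIDR_PII_ACTION` and header name `X-AIDR-PII-Action` are hypothetical placeholders for illustration only; consult your AIDR deployment documentation for the actual names.

```python
import os

# Hypothetical names, for illustration only -- substitute the environment
# variable and header names documented for your AIDR deployment.
DEFAULT_PII_ACTION = os.environ.get("AIDR_PII_ACTION", "block")

def effective_action(request_headers: dict) -> str:
    """A request header, when present, overrides the environment-variable default."""
    return request_headers.get("X-AIDR-PII-Action", DEFAULT_PII_ACTION)

# No header: falls back to the environment-variable default.
effective_action({})
# Header present: the header value wins for this request.
effective_action({"X-AIDR-PII-Action": "allow"})
```

Moving this configuration into a Ruleset replaces such per-deployment plumbing with a single remotely managed policy.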
The following are required to use a Ruleset.
- AIDR deployed in Hybrid mode. See AIDR Deployments.
- API key and secret. For the API key, the Ruleset permission must be set to Read. See API Permission Related to Products.
1. In the Console, go to Admin > Rules.
2. Click Edit to edit the Default Ruleset.
3. Enable or disable the detection categories for the ruleset. Most enabled detection categories let you select an action:
   - Allow - Allows the input or output and sends an alert on the detection to the Console.
   - Block - Blocks the input or output and sends an alert on the detection to the Console.
   - Redact - Redacts the input or output and sends an alert on the detection to the Console. The redacted text is replaced with [REDACTED] or the information type (example: [PHONE NUMBER]).
4. Click Save.
Prompt Injection - Detects an attempt to manipulate an LLM into executing instructions against its intended purpose.
Input
- Allow - Allows the input and sends an alert to the Console.
- Block - Blocks the input and sends an alert to the Console.
Detect by Performing
- Quick Scan - Scans the full input. Doesn’t do a second scan with non-alphanumeric characters stripped.
- Full Scan - Scans the full input and performs a second scan with non-alphanumeric characters stripped.
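The difference between the two scan modes can be sketched as below. Here `detect()` is a trivial stand-in for AIDR's actual prompt-injection classifier, and the stripping rule is an assumption for illustration: the second pass removes non-alphanumeric characters so that punctuation-obfuscated payloads are caught.

```python
import re

def detect(text: str) -> bool:
    # Stand-in for the real classifier; matches one known injection phrase.
    return "ignore previous instructions" in text.lower()

def scan(text: str, mode: str = "quick") -> bool:
    if detect(text):            # first pass: the full input as-is
        return True
    if mode == "full":          # second pass: non-alphanumeric characters stripped
        stripped = re.sub(r"[^a-zA-Z0-9\s]", "", text)
        return detect(stripped)
    return False

# Punctuation-obfuscated payload: Quick Scan misses it, Full Scan catches it.
evasive = "i.g.n.o.r.e p.r.e.v.i.o.u.s i.n.s.t.r.u.c.t.i.o.n.s"
```

Full Scan trades some latency for resilience against this kind of character-level obfuscation.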
Denial of Service - Detects an attempt to overload model traffic to degrade performance or consume tokens.
Input
- Allow - Allows the input and sends an alert to the Console.
- Block - Blocks the input and sends an alert to the Console.
Set Threshold
- DoS Threshold - Determines the number of tokens a requestor may pass before they are considered adversarial. The default is 4096 tokens.
Personally Identifiable Information - Detects sensitive information that may identify a person, such as a name, phone number, or credit card number.
Input
- Allow - Allows the input and sends an alert to the Console.
- Block - Blocks the input and sends an alert to the Console.
- Redact - Redacts the input and sends an alert to the Console.
Replace redacted text with - Determines whether redacted text is replaced with the term [REDACTED] or a label of the information type.
- Example [INFO TYPE] - Call me [PHONE NUMBER]
- Example [REDACTED] - Call me [REDACTED]
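The two replacement modes above can be sketched with a single illustrative entity type. The phone-number pattern here is an assumption for the sketch, not AIDR's actual detector, which covers many more formats and entity types.

```python
import re

# Illustrative US-style phone-number pattern; AIDR's built-in entity
# detection is far more comprehensive than this single regex.
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def redact(text: str, use_info_type: bool = True) -> str:
    """Replace each match with its information-type label or a generic [REDACTED]."""
    label = "[PHONE NUMBER]" if use_info_type else "[REDACTED]"
    return PHONE_RE.sub(label, text)
```

The information-type labels preserve more context for downstream consumers, while the generic [REDACTED] reveals nothing about what kind of data was removed.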
Output
- Allow - Allows the output and sends an alert to the Console.
- Block - Blocks the output and sends an alert to the Console.
- Redact - Redacts the output and sends an alert to the Console.
Replace redacted text with - Determines whether redacted text is replaced with the term [REDACTED] or a label of the information type.
- Example [INFO TYPE] - Call me [PHONE NUMBER]
- Example [REDACTED] - Call me [REDACTED]
Overrides (click Show Overrides to expand the section)
- Entity Types to Detect - Select which built-in PII entity types to detect. A green checkbox means the type is selected and will be detected. An empty checkbox means the type is unselected and will be ignored.
- Trusted Strings - Define specific values that should bypass detection, even if they match a selected entity type (above).
- Custom Entity Types - Define new entity types using regular expressions (regex).
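The interaction between custom entity types and trusted strings can be sketched as follows. The `EMP-` employee-ID pattern and the trusted placeholder value are hypothetical examples, not AIDR defaults.

```python
import re

# Hypothetical custom entity type defined by regex (e.g. an internal ID format).
CUSTOM_ENTITIES = {"EMPLOYEE ID": re.compile(r"\bEMP-\d{6}\b")}

# Trusted strings bypass detection even when they match an entity pattern,
# e.g. a placeholder value used in public documentation.
TRUSTED_STRINGS = {"EMP-000000"}

def find_entities(text: str) -> list[tuple[str, str]]:
    """Return (label, value) pairs for every non-trusted custom-entity match."""
    hits = []
    for label, pattern in CUSTOM_ENTITIES.items():
        for match in pattern.finditer(text):
            if match.group(0) not in TRUSTED_STRINGS:
                hits.append((label, match.group(0)))
    return hits
```

Trusted strings are useful for known-benign values (sample data, sentinel IDs) that would otherwise trigger repeated false positives.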
Code Detection - Detects code, which in a prompt might reflect malicious injection, and in a response might indicate unauthorized AI behavior.
Input
- Allow - Allows the input and sends an alert to the Console.
- Block - Blocks the input and sends an alert to the Console.
Output
- Allow - Allows the output and sends an alert to the Console.
- Block - Blocks the output and sends an alert to the Console.
Guardrail Detection - Detects a violation of pre-defined safety parameters, triggering a refusal by the AI.
Output
- Allow - Allows the output and sends an alert to the Console.
- Block - Blocks the output and sends an alert to the Console.