
AI Runtime Security Policies

Policies for AI Runtime Security let you remotely configure your instance of Runtime Security. You can set which detectors are enabled, whether they block or only alert, and manage more nuanced configuration settings.

You can also create custom policies to support unique use cases, which you define as Projects. For example, if some of your use cases process PII for benign purposes, you can create a policy that turns off the PII detector for those specific projects while leaving it on as the default. Use Projects to define your use cases and assign them custom policies.

If you currently have Runtime Security deployed, your configuration is set by environment variables and can be overridden by request headers. Establishing a policy in Rules lets you manage this configuration remotely, through the HiddenLayer Console.

To implement Rules and Projects, you need to take a few technical steps: update your environment variables, update the API key for your Runtime Security deployment, and update the header fields you send to Runtime Security.
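
For orientation, the sketch below shows the general shape of a client request that carries an API key and a per-request header override. Every URL, header name, and field in it is a hypothetical placeholder rather than a documented value; consult your deployment's API reference for the real ones.

```python
# A minimal sketch of calling a Runtime Security endpoint with a per-request
# override header. All names below are hypothetical placeholders, not
# documented values.
import requests

API_URL = "https://runtime-security.example.com/scan"  # hypothetical endpoint
API_KEY = "your-api-key"                               # issued via the Console

response = requests.post(
    API_URL,
    headers={
        "Authorization": f"Bearer {API_KEY}",  # hypothetical auth scheme
        "X-Project-Id": "pii-benign-project",  # hypothetical project header
    },
    json={"input": "user prompt to scan"},     # hypothetical payload shape
)
print(response.json())
```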

Prerequisites

To use a policy, you need the technical updates described above: environment variables, the API key for your Runtime Security deployment, and the header fields sent to Runtime Security.

After creating a policy, you can create a Project and add the policy to the project.

Create a Policy

  1. In the Console, go to Runtime Security > Policy.

  2. Click + Add Policy. This is in the upper-right, next to the search field.

  3. Add a policy name and an optional description, then click Next.

  4. Enable or disable the detection categories for the policy.

  5. Most enabled detection categories allow you to select an action (the sketch after these steps illustrates what each action does).

    • Allow - Allows the input or output and sends an alert on the detection to the Console.
    • Block - Blocks the input or output and sends an alert on the detection to the Console.
    • Redact - Redacts the input or output and sends an alert on the detection to the Console. This replaces the redacted text with [REDACTED] or the information type (example: [PHONE NUMBER]).
  6. Click Publish.
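
Conceptually, every action sends an alert to the Console; the actions differ only in what is passed through. The sketch below illustrates these semantics in plain Python. It is not HiddenLayer's implementation, and all names in it are ours.

```python
# Illustrative-only semantics of the three policy actions. Every action
# sends a detection alert to the Console; they differ in what passes through.
def apply_action(action: str, text: str, redacted_text: str):
    alert = "detection alert sent to Console"  # raised for every action
    if action == "Allow":
        return alert, text                     # original content passes through
    if action == "Block":
        return alert, None                     # nothing passes through
    if action == "Redact":
        return alert, redacted_text            # redacted copy passes through
    raise ValueError(f"unknown action: {action!r}")

print(apply_action("Redact", "Call me 555-0100", "Call me [PHONE NUMBER]"))
```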

Edit a Policy

  1. In the Console, go to Runtime Security > Policy.
  2. Select the policy you want to edit from the list on the right-hand side.
  3. Click Edit, then select Edit.
  4. Make any changes to the policy.
  5. Click Save.

Delete a Policy

  1. In the Console, go to Runtime Security > Policy.
  2. Select the policy you want to delete from the list on the right-hand side.
  3. Click Edit, then select Delete. A message displays, asking you to confirm deleting the policy.
  4. Click Delete.

Detection Category Descriptions

Prompt Injection - Detects an attempt to manipulate an LLM into executing instructions against its intended purpose.

  • Input

    • Allow - Allows the input and sends an alert to the Console.
    • Block - Blocks the input and sends an alert to the Console.
  • Detect by Performing

    • Quick Scan - Scans the full input once, without a second pass.
    • Full Scan - Scans the full input, then performs a second scan with non-alphanumeric characters stripped (see the sketch after this list).
  • Overrides (click Show Overrides to expand the section)

    • Trusted Strings - Strings the Prompt Injection detector ignores, even if they match injection signatures. Entries are case-insensitive.
    • Forbidden Strings - Strings that are always blocked, even if the scanner doesn't flag them. Entries are case-insensitive.
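
The difference between the two scan modes can be pictured as an optional second pass over a normalized copy of the input. The sketch below illustrates that idea with a stub detector of our own; it is not the actual scanner.

```python
import re

def scan_text(text: str) -> bool:
    """Stub standing in for the real prompt-injection detector."""
    return "ignore previous instructions" in text.lower()

def full_scan(text: str) -> bool:
    # First pass: scan the input as-is (a Quick Scan stops here).
    if scan_text(text):
        return True
    # Second pass: strip non-alphanumeric characters so obfuscation such as
    # "i.g.n.o.r.e" cannot hide a match, then scan again.
    stripped = re.sub(r"[^a-zA-Z0-9\s]", "", text)
    return scan_text(stripped)

print(full_scan("i.g.n.o.r.e previous instructions"))  # True only with Full Scan
```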

Denial of Service - Detects an attempt to overload the model with traffic to degrade performance or consume tokens.

  • Input

    • Allow - Allows the input and sends an alert to the Console.
    • Block - Blocks the input and sends an alert to the Console.
  • Set Threshold

    • DoS Threshold - The number of tokens a requestor may send before being considered adversarial. The default is 4096 tokens.
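
As a rough picture of what the threshold does, the sketch below counts tokens with a naive whitespace split and flags a request that exceeds the 4096-token default. The detector's actual tokenization is not specified here, so the counting is illustrative only.

```python
DOS_THRESHOLD = 4096  # policy default

def exceeds_dos_threshold(prompt: str, threshold: int = DOS_THRESHOLD) -> bool:
    # Naive whitespace tokenization, purely for illustration; the detector's
    # real tokenizer may count differently.
    return len(prompt.split()) > threshold

print(exceeds_dos_threshold("a short, harmless prompt"))  # False
print(exceeds_dos_threshold("word " * 5000))              # True: 5000 > 4096
```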

Personally Identifiable Information - Detects sensitive information that may identify a person, such as a name, phone number, or credit card number.

  • Input

    • Allow - Allows the input and sends an alert to the Console.

    • Block - Blocks the input and sends an alert to the Console.

    • Redact - Redacts the input and sends an alert to the Console.

      • Replace redacted text with - Determines if redacted text is replaced with the term [REDACTED] or a label of the information type.

        • Example [INFO TYPE] - Call me [PHONE NUMBER]
        • Example [REDACTED] - Call me [REDACTED]
  • Output

    • Allow - Allows the output and sends an alert to the Console.

    • Block - Blocks the output and sends an alert to the Console.

    • Redact - Redacts the output and sends an alert to the Console.

      • Replace redacted text with - Determines if redacted text is replaced with the term [REDACTED] or a label of the information type.

        • Example [INFO TYPE] - Call me [PHONE NUMBER]
        • Example [REDACTED] - Call me [REDACTED]
  • Overrides (click Show Overrides to expand the section)

    • Entity Types to Detect - Select which built-in PII entity types to detect. A green checkbox means the type is selected and will be detected. An empty checkbox means the type is unselected and will be ignored.
    • Trusted Strings - Define specific values that should bypass detection, even if they match a selected entity type (above).
    • Custom Entity Types - Define new entity types using regular expressions (regex), as in the sketch below.
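
To make the redaction modes, trusted strings, and regex-based custom entities concrete, here is a small sketch. The phone-number pattern, entity label, and trusted value are illustrative choices of ours, not shipped defaults.

```python
import re

# Hypothetical custom entity: a simple US-style phone number pattern.
CUSTOM_ENTITIES = {"PHONE NUMBER": re.compile(r"\b\d{3}-\d{4}\b")}
TRUSTED = {"555-0100"}  # trusted strings bypass detection even when matched

def redact(text: str, use_info_type: bool) -> str:
    for label, pattern in CUSTOM_ENTITIES.items():
        replacement = f"[{label}]" if use_info_type else "[REDACTED]"
        text = pattern.sub(
            lambda m: m.group(0) if m.group(0) in TRUSTED else replacement,
            text,
        )
    return text

print(redact("Call me 867-5309", use_info_type=True))   # Call me [PHONE NUMBER]
print(redact("Call me 867-5309", use_info_type=False))  # Call me [REDACTED]
print(redact("Call me 555-0100", use_info_type=True))   # trusted: left as-is
```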

Code Detection - Detects code, which in a prompt might reflect malicious injection and in a response might indicate unauthorized AI behavior (a naive illustration follows this list).

  • Input

    • Allow - Allows the input and sends an alert to the Console.
    • Block - Blocks the input and sends an alert to the Console.
  • Output

    • Allow - Allows the output and sends an alert to the Console.
    • Block - Blocks the output and sends an alert to the Console.
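
As a deliberately naive illustration of what "code in a prompt or response" can look like, the sketch below flags a few code-like patterns. It says nothing about how the actual detector works.

```python
import re

# A few code-like patterns, purely for illustration; the real detector's
# method is not documented here.
CODE_HINTS = [
    r"\bdef \w+\(",   # Python function definition
    r"\bimport \w+",  # import statement
    r";\s*$",         # trailing semicolon
    r"</?\w+>",       # HTML/XML tag
]

def looks_like_code(text: str) -> bool:
    return any(re.search(p, text, re.MULTILINE) for p in CODE_HINTS)

print(looks_like_code("import os; os.system('ls')"))  # True
print(looks_like_code("What's the weather today?"))   # False
```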

Guardrail Detection - Detects a violation of pre-defined safety parameters, triggering a refusal by the AI.

  • Output

    • Allow - Allows the output and sends an alert to the Console.
    • Block - Blocks the output and sends an alert to the Console.