LLM Sandbox

The LLM Sandbox is a demo environment for testing HiddenLayer's AI Runtime Security. It is available in the HiddenLayer Console. You can enable policy settings, send prompts, and see the results. Preconfigured test examples help you get started. The LLM Sandbox can also generate a policy configuration from the settings you have enabled, which you can copy into your container settings or application code.

For more information about how Runtime Security functions, see the Runtime Security Overview.

Early Access

The LLM Sandbox is in Early Access. Ask your HiddenLayer representative for more information.

Prompt Injection

The following steps use the prompt injection example to explain the LLM Sandbox features. Try the other examples to become familiar with the policy configuration settings.

  1. In the Console, go to Runtime Security > LLM Sandbox.

  2. Click OWASP Scenarios. A list of available OWASP examples displays.

    Select Example Scenario
  3. Select LLM01: Prompt Injection. A pre-defined prompt is entered.

  4. Click the analyze button (paper airplane icon).

    Analyze Prompt Injection
  5. The prompt is sent to Runtime Security to be analyzed.

    • The policy settings for Prompt and Output are set to Alert Only. Therefore:

      • Runtime Security will send the prompt to the ML model
      • The ML model will process the prompt and return the output to Runtime Security
      • Runtime Security will send the output to the AI Security Platform
    • Latency data is provided. The Runtime Security Detection Latency is the amount of time for Runtime Security to process the request. The Upstream LLM Latency is the amount of time for the model to generate the response.

      • If Runtime Security blocks the input, the Upstream LLM Latency could be zero, because the prompt is not sent to the ML model.
    Example Prompts

    For demonstration purposes, the example prompts are unsafe and will trigger a detection.

  6. Click Detections to display more details about the detection.

    View Detections
  7. Click View Model, then click the Runtime Security tab to display a list of incidents for the Sandbox.

    AIDR Tab
  8. Click the green arrow to display inference information about the incident.

    • The Incident Details contains overview information and short descriptions of the incident.
    • The Inferences tab lists all of the events and their related detection category. Clicking the green arrow for an event displays an Interaction Details window.
    • The MITRE ATLAS tab shows the tactics and techniques related to the incident.
    View Inference Information
  9. Go back to the LLM Sandbox, then click Advanced Policy Settings. The policy settings expand to display all of the policy settings.

    Advanced Policy Settings
  10. For the LLM Denial of Service policy setting, select Block Denial of Service.

    Block Denial of Service
  11. Select OWASP Scenarios, select LLM10: Unbounded Consumption, then click the analyze button (paper airplane icon). With the LLM Denial of Service policy set to Block Denial of Service, the unsafe prompt is not sent to the ML model.

    OWASP LLM10 Results
  12. Click View Model to see the LLM Sandbox model artifacts.

    View Model

Custom Entity

Using a Custom Entity, you can add data strings, like words or numbers, to prevent data leakage of personally identifiable information (PII). After you create a custom entity, Runtime Security can alert on this data, or alert on it and block it.

The LLM Sandbox allows you to create one custom entity. With your own Runtime Security instance, you can add multiple custom entities to your policy configuration.

A custom entity pattern is a regular expression (regex). For more information about regular expressions, see the Python regular expression documentation. To test a regex, try regex101.
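Before pasting a pattern into the LLM Sandbox, you may want to confirm locally that it compiles. The following is a minimal sketch using Python's standard `re` module; the helper name `is_valid_pattern` is hypothetical and not part of any HiddenLayer tooling.

```python
import re


def is_valid_pattern(pattern: str) -> bool:
    """Return True if `pattern` compiles as a Python regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(is_valid_pattern("(damn|hell|crap)"))  # True
print(is_valid_pattern("(damn|hell|crap"))   # False: unbalanced parenthesis
```

A pattern that compiles is only syntactically valid. Whether it matches the strings you intend is a separate check.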

Initial Prompt Analysis

The first time you submit a prompt that includes a custom entity pattern, the initial analysis may take longer than normal. Subsequent submissions that include the same custom entity pattern should see the expected response times.

Custom Entity

Individual Data Strings

You can add individual data strings, like words or numbers, to the custom entity pattern and the LLM Sandbox will alert or block prompts with any of these data strings.

Use the following example to try out the Custom Entity feature in the LLM Sandbox.

  1. In the LLM Sandbox, click Advanced Policy Settings.

  2. Enter the example Custom Entity Name.

  3. Enter the Custom Entity Pattern.

    Valid Python Regular Expression

    The Custom Entity Pattern must be a valid Python regular expression.

  4. Type a prompt that includes one of the strings from the Custom Entity Pattern.

  5. Click Analyze to run the prompt.

  6. Click View Detections to see what Runtime Security detected. In the example image, the Custom Entity Name appears under Details, which tells you what triggered the detection. The example image also shows a Prompt Injection detection, because the prompt included a word that triggered Prompt Injection.

  7. By default, Redact PII is set to Don’t Redact. If you want to see the LLM Sandbox redact the prompt, set Redact PII to Redact. Redact PII is under Advanced Policy Settings.

  8. If you want to block any prompts that include the custom entity pattern, set Data Leakage to Alert and Block. Data Leakage is under Basic Policy Settings.

Example Custom Entity Name

swear_words

Example Custom Entity Pattern

(damn|hell|crap)

Example Prompt Submission

What is crap?
Custom Entity Pattern

The custom entity pattern must be precise. Adding spaces, quotation marks, or other characters can cause unexpected results.
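To see why precision matters, the example pattern can be tried locally with Python's `re` module. This is a sketch of standard regex behavior only. It assumes the pattern is searched for anywhere in the prompt, which may differ from how Runtime Security applies it.

```python
import re

prompt = "What is crap?"

exact = re.compile(r"(damn|hell|crap)")
# Stray trailing spaces become part of each alternative.
trailing_spaces = re.compile(r"(damn |hell |crap )")
# Quotation marks are matched literally.
quoted = re.compile(r'("damn"|"hell"|"crap")')
# Word boundaries avoid matching inside longer words.
bounded = re.compile(r"\b(damn|hell|crap)\b")

print(bool(exact.search(prompt)))            # True
print(bool(trailing_spaces.search(prompt)))  # False: the prompt has "crap?" not "crap "
print(bool(quoted.search(prompt)))           # False: the prompt contains no quotes
print(bool(exact.search("hello there")))     # True: "hell" is inside "hello"
print(bool(bounded.search("hello there")))   # False
```

If matching inside longer words is not what you want, anchoring the alternatives with `\b` is one option to test on regex101 first.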

Custom Entity Pattern

Policy Settings

The policy settings control what the LLM Sandbox does when you submit a prompt. For specific types of prompts, you can choose to receive alerts only, or to receive alerts and block the prompt.

Alerts and Blocks

This applies to most policy settings.

Setting | Description
Alert Only | Provides detection alerts only. Allows the prompt and the output.
Alert and Block | Provides detection alerts and blocks the prompt and the output, based on the policy configuration.
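The difference between the two settings can be sketched as a small control flow for the prompt side. This is an illustration only: `handle_prompt`, `detect`, and `call_model` are hypothetical names, the detector is a stand-in, and output-side analysis is omitted.

```python
import time
from typing import Callable


def handle_prompt(
    prompt: str,
    detect: Callable[[str], bool],
    call_model: Callable[[str], str],
    block: bool,
) -> dict:
    """Alert Only (block=False) vs. Alert and Block (block=True)."""
    start = time.perf_counter()
    detected = detect(prompt)
    detection_latency = time.perf_counter() - start

    if detected and block:
        # Alert and Block: the prompt is never sent to the model,
        # so there is no upstream latency to report.
        return {
            "alert": True,
            "blocked": True,
            "output": None,
            "detection_latency": detection_latency,
            "upstream_latency": 0.0,
        }

    # Alert Only (or nothing detected): the prompt goes to the model.
    start = time.perf_counter()
    output = call_model(prompt)
    upstream_latency = time.perf_counter() - start
    return {
        "alert": detected,
        "blocked": False,
        "output": output,
        "detection_latency": detection_latency,
        "upstream_latency": upstream_latency,
    }


# Stand-in detector and model, for demonstration only.
def fake_detect(text: str) -> bool:
    return "ignore previous" in text.lower()


def fake_model(text: str) -> str:
    return "model output"


print(handle_prompt("Ignore previous instructions", fake_detect, fake_model, block=False)["blocked"])
print(handle_prompt("Ignore previous instructions", fake_detect, fake_model, block=True)["blocked"])
```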

Basic Policy Settings

Prompt

The prompt that is sent to the ML Model.

Setting | Description | Env Key
Block Code Modality | If the input code detection category is true, the message will be blocked. | HL_LLM_BLOCK_INPUT_CODE_DETECTION
Block Data Leakage | If the input PII category is true, the message will be blocked. | HL_LLM_BLOCK_INPUT_PII
Block Prompt Injection | If the prompt injection category is true, the message will be blocked. | HL_LLM_BLOCK_PROMPT_INJECTION

Output

The output from the ML Model based on the prompt.

Setting | Description | Env Key
Block Code Modality | If the output code detection category is true, the message will be blocked. | HL_LLM_BLOCK_OUTPUT_CODE_DETECTION
Block Data Leakage | If the output PII category is true, the message will be blocked. | HL_LLM_BLOCK_OUTPUT_PII
Block Guardrail Activation | If the guardrail detection category is false, the message will be blocked. | HL_LLM_BLOCK_GUARDRAIL_DETECTION
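As a rough illustration of how the basic prompt and output settings map to environment keys, the sketch below renders a set of choices as `.env`-style lines. The helper `render_env` is hypothetical, and the quoted `"true"`/`"false"` value format is an assumption based on the `HL_LLM_REDACT_OUTPUT_PII` examples on this page. The file generated by the LLM Sandbox remains the authoritative format.

```python
def render_env(settings: dict[str, bool]) -> str:
    """Render boolean settings as KEY="true"/"false" lines (assumed format)."""
    return "\n".join(
        f'{key}="{str(value).lower()}"' for key, value in settings.items()
    )


basic_settings = {
    # Prompt
    "HL_LLM_BLOCK_INPUT_CODE_DETECTION": False,
    "HL_LLM_BLOCK_INPUT_PII": True,
    "HL_LLM_BLOCK_PROMPT_INJECTION": True,
    # Output
    "HL_LLM_BLOCK_OUTPUT_CODE_DETECTION": False,
    "HL_LLM_BLOCK_OUTPUT_PII": True,
    "HL_LLM_BLOCK_GUARDRAIL_DETECTION": False,
}

print(render_env(basic_settings))
```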

Advanced Policy Settings

Prompt Injection

Setting | Description | Env Key
Full Scan | Type of prompt injection scan to perform, Full or Quick. Deselecting the Full Scan checkbox sets the prompt injection scan to Quick. Quick: Scans the entirety of the user prompt. Full: Scans the entirety of the user prompt and scans additional variations of the user prompt. | HL_LLM_PROMPT_INJECTION_SCAN_TYPE

LLM Denial of Service

Setting | Description | Env Key
Block Denial of Service | If the LLM denial of service category is true, the message will be blocked. | HL_LLM_BLOCK_INPUT_DOS_DETECTION
LLM Denial of Service Alert Threshold | The threshold for input denial of service detection. | HL_LLM_INPUT_DOS_DETECTION_THRESHOLD

Data Leakage

Redact output before sending to the caller.

Setting | Environment variables
All Entities | HL_LLM_REDACT_OUTPUT_PII="false", HL_LLM_REDACT_TYPE="entity", HL_LLM_ENTITY_TYPE="all"
Redact Output PII | HL_LLM_REDACT_OUTPUT_PII="true", HL_LLM_REDACT_TYPE="entity", HL_LLM_ENTITY_TYPE="all"
Strict Redaction | HL_LLM_REDACT_OUTPUT_PII="false", HL_LLM_REDACT_TYPE="strict", HL_LLM_ENTITY_TYPE="all"
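The three Data Leakage settings above each correspond to a fixed group of environment variables. A minimal sketch of that lookup follows; the names `REDACTION_PRESETS` and `redaction_env` are hypothetical, and the values are copied from the settings listed above.

```python
REDACTION_PRESETS = {
    "All Entities": {
        "HL_LLM_REDACT_OUTPUT_PII": "false",
        "HL_LLM_REDACT_TYPE": "entity",
        "HL_LLM_ENTITY_TYPE": "all",
    },
    "Redact Output PII": {
        "HL_LLM_REDACT_OUTPUT_PII": "true",
        "HL_LLM_REDACT_TYPE": "entity",
        "HL_LLM_ENTITY_TYPE": "all",
    },
    "Strict Redaction": {
        "HL_LLM_REDACT_OUTPUT_PII": "false",
        "HL_LLM_REDACT_TYPE": "strict",
        "HL_LLM_ENTITY_TYPE": "all",
    },
}


def redaction_env(setting: str) -> str:
    """Return the .env-style lines for one Data Leakage setting."""
    if setting not in REDACTION_PRESETS:
        raise ValueError(f"Unknown Data Leakage setting: {setting!r}")
    preset = REDACTION_PRESETS[setting]
    return "\n".join(f'{key}="{value}"' for key, value in preset.items())


print(redaction_env("Redact Output PII"))
```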

Custom Entities

For examples, see Custom Entity.

Setting | Description | Env Key
Custom Entity Name | The name of the custom PII recognizer. If a name is supplied, an expression must also be provided under the same name. | There are two environment keys for this entity. HL_LLM_PROXY_PII_CUSTOM_{{name}}: The name of the custom entity. HL_LLM_PROXY_PII_CUSTOM_{{name}}_ENTITY: The entity to replace custom PII with, if found.
Custom Entity Pattern | The regular expression used to find custom PII. | HL_LLM_PROXY_PII_CUSTOM_{{name}}_EXPRESSION
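The three keys for one custom entity can be assembled as shown in the sketch below. It assumes the entity name is substituted into the key exactly as written (the expected casing is not documented here), and the replacement value `SWEAR_WORD` is a made-up example. The helper `custom_entity_env` is hypothetical.

```python
import re


def custom_entity_env(name: str, pattern: str, replacement: str) -> dict[str, str]:
    """Build the environment keys for one custom entity (assumed naming)."""
    re.compile(pattern)  # Raises re.error if the pattern is not valid.
    prefix = f"HL_LLM_PROXY_PII_CUSTOM_{name}"
    return {
        prefix: name,                      # The name of the custom entity.
        f"{prefix}_ENTITY": replacement,   # What found PII is replaced with.
        f"{prefix}_EXPRESSION": pattern,   # The regex used to find custom PII.
    }


env = custom_entity_env("swear_words", "(damn|hell|crap)", "SWEAR_WORD")
for key, value in env.items():
    print(f'{key}="{value}"')
```

Validating the pattern up front reflects the requirement that a name and its expression are supplied together.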

Detection Category Severity

Setting | Description | Env Key
Prompt Injection Severity | Sets the severity for the Prompt Injection conviction category. Accepted values: low, medium, high. | HL_LLM_PROXY_CONVICTION_SEVERITY_PROMPT_INJECTION
Data Leakage Severity | Sets the severity for the Data Leakage conviction category. Accepted values: low, medium, high. | HL_LLM_PROXY_CONVICTION_SEVERITY_DATA_LEAKAGE
LLM Denial of Service Severity | Sets the severity for the Denial-of-Service conviction category. Accepted values: low, medium, high. | HL_LLM_PROXY_CONVICTION_SEVERITY_DENIAL_OF_SERVICE
Modality Restriction Severity | Sets the severity for the Modality Restriction conviction category. Accepted values: low, medium, high. | HL_LLM_PROXY_CONVICTION_SEVERITY_MODALITY_RESTRICTION
Guardrail Activation Severity | Sets the severity for the Guardrail conviction category. Accepted values: low, medium, high. | HL_LLM_PROXY_CONVICTION_SEVERITY_GUARDRAIL
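Because each severity key accepts only `low`, `medium`, or `high`, a small check before writing the values into a configuration can catch typos. The sketch below uses hypothetical helper names (`SEVERITY_KEYS`, `severity_env`); the keys and accepted values are taken from the settings above.

```python
SEVERITY_KEYS = {
    "Prompt Injection": "HL_LLM_PROXY_CONVICTION_SEVERITY_PROMPT_INJECTION",
    "Data Leakage": "HL_LLM_PROXY_CONVICTION_SEVERITY_DATA_LEAKAGE",
    "LLM Denial of Service": "HL_LLM_PROXY_CONVICTION_SEVERITY_DENIAL_OF_SERVICE",
    "Modality Restriction": "HL_LLM_PROXY_CONVICTION_SEVERITY_MODALITY_RESTRICTION",
    "Guardrail Activation": "HL_LLM_PROXY_CONVICTION_SEVERITY_GUARDRAIL",
}
ACCEPTED_SEVERITIES = ("low", "medium", "high")


def severity_env(severities: dict[str, str]) -> dict[str, str]:
    """Map category names to their env keys, rejecting invalid input."""
    env = {}
    for category, level in severities.items():
        if category not in SEVERITY_KEYS:
            raise ValueError(f"Unknown detection category: {category!r}")
        if level not in ACCEPTED_SEVERITIES:
            raise ValueError(f"Severity must be one of {ACCEPTED_SEVERITIES}, got {level!r}")
        env[SEVERITY_KEYS[category]] = level
    return env


print(severity_env({"Prompt Injection": "high", "Data Leakage": "medium"}))
```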

Language Detection

Setting | Description | Env Key
Block Unsupported Languages | This is a Client policy configuration. This requires two headers. Must set block to true. Must include allowed languages. Languages not included will be blocked. | Required headers. "X-LLM-Block-Input-Language-Detection": "true" and "X-LLM-Input-Allowed-Languages": "AR,BN,ZH,EN,FR,DE,HI,ID,IT,JA,KO,MR,PT,PA,RU,ES,TA,TE,TR,UR,VI"

Supported Languages

Type | Supported Languages
High coverage in prompt injection model | English, French, German, Italian, Japanese, Korean, Spanish
Low coverage in prompt injection model | Arabic, Bengali, Chinese, Hindi, Indonesian, Marathi, Punjabi, Portuguese, Russian, Tamil, Telugu, Turkish, Urdu, Vietnamese
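The two language detection headers can be built from a list of allowed language codes, as in the sketch below. The header names and the code list come from the Required headers shown above; the helper `language_headers` is hypothetical, and how these headers are attached to a request depends on your client code.

```python
SUPPORTED_CODES = (
    "AR", "BN", "ZH", "EN", "FR", "DE", "HI", "ID", "IT", "JA", "KO",
    "MR", "PT", "PA", "RU", "ES", "TA", "TE", "TR", "UR", "VI",
)


def language_headers(allowed: list[str]) -> dict[str, str]:
    """Build the two headers needed to block languages not in `allowed`."""
    codes = [code.upper() for code in allowed]
    if not codes:
        raise ValueError("At least one allowed language is required")
    unknown = [code for code in codes if code not in SUPPORTED_CODES]
    if unknown:
        raise ValueError(f"Unsupported language codes: {unknown}")
    return {
        "X-LLM-Block-Input-Language-Detection": "true",
        "X-LLM-Input-Allowed-Languages": ",".join(codes),
    }


print(language_headers(["en", "fr", "de"]))
```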

Policy Configuration

Generates an example policy configuration that you can use: a .env file for Server, or client code for Client.

Policy Configuration Type | Description
Server | The environment file that sets the default policy configuration. Runtime Security uses it when no client-side request headers override the default policy. After you download the file and save it on your Runtime Security pods or containers, restart them to apply the configuration.
Client | Client-side HTTP header values that the AI application front end can pass to Runtime Security with its requests (which also contain the input prompt) to set policy options per application, or even per request. These headers can override the default policies set by the server-side .env file.
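The relationship between the two configuration types can be pictured as a simple precedence rule: server defaults apply unless a request supplies an override. The sketch below illustrates only that rule. The function `effective_policy` and the option names are hypothetical, and the real mapping between header names and environment keys comes from the generated configuration.

```python
def effective_policy(
    server_defaults: dict[str, str],
    client_overrides: dict[str, str],
) -> dict[str, str]:
    """Client-supplied options take precedence over server defaults."""
    return {**server_defaults, **client_overrides}


# Hypothetical option names, for illustration only.
server_defaults = {"block_prompt_injection": "false", "block_input_pii": "true"}
per_request = {"block_prompt_injection": "true"}

print(effective_policy(server_defaults, per_request))
# {'block_prompt_injection': 'true', 'block_input_pii': 'true'}
```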

Generate Policy Configuration

  1. On the LLM Sandbox page, make any changes you want to the Advanced Policy Settings for Prompt Policies and Output Policies.

  2. Under Policy Configuration, make sure Server is selected.

    Select Server for Policy Configuration
  3. Click Generate Policy Configuration. You can download or copy the server code. The downloaded file has a .env file extension.

    Generate Policy Configuration for Server
  4. Close the Policy Configuration Server Code window.

  5. Under Policy Configuration, select Client.

  6. Click Generate Policy Configuration. The client code is available in Python and JavaScript. You can download or copy the client code.

    Generate Policy Configuration for Client
  7. Close the Policy Configuration Client Code window.
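After downloading the server configuration, you may want to review its contents before saving it to your containers. The sketch below parses `KEY="value"` lines into a dictionary. It assumes the generated file uses the same format as the environment variable examples on this page, the sample text is made up for the demonstration, and `parse_env` is a hypothetical helper rather than a HiddenLayer utility.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    env = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=".
        key, separator, value = line.partition("=")
        if not separator:
            continue
        env[key.strip()] = value.strip().strip('"')
    return env


sample = '''
# Example server policy configuration (illustrative values)
HL_LLM_BLOCK_PROMPT_INJECTION="true"
HL_LLM_REDACT_OUTPUT_PII="false"
HL_LLM_REDACT_TYPE="entity"
'''

for key, value in parse_env(sample).items():
    print(key, "=", value)
```

This parser is deliberately minimal. A value that itself begins or ends with a quotation mark would lose that character, so a dedicated .env library may be a better fit for anything beyond a quick review.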