LLM Sandbox

The LLM Sandbox is a demo environment to test the functionality of HiddenLayer's AI Runtime Security. This is available in the HiddenLayer Console. You can enable policy settings, then send prompts and see the results. There are some preconfigured test examples to help you get started. The LLM Sandbox can generate a policy configuration based on the settings you have enabled. With this generated policy configuration, you can copy it into your container settings or application code.

For more information about how Runtime Security functions, see the Runtime Security Overview.

Early Access

The LLM Sandbox is Early Access. Ask your HiddenLayer representative for more information.

Prompt Injection

The following steps will use the prompt injection example to help explain the LLM Sandbox features. You can try the other examples to become familiar with the policy configuration settings.

In the Console, go to Runtime Security > LLM Sandbox.
Click OWASP Scenarios. A list of available OWASP examples displays.
Select LLM01: Prompt Injection. A pre-defined prompt is entered.
Click the analyze button (paper airplane icon).
The prompt is sent to Runtime Security to be analyzed.
- The policy settings for Prompt and Output are set to Alert Only. Therefore:
  - Runtime Security will send the prompt to the ML model
  - The ML model will process the prompt and return the output to Runtime Security
  - Runtime Security will send the output to the AI Security Platform
- Latency data is provided. The Runtime Security Detection Latency is the amount of time for Runtime Security to process the request. The Upstream LLM Latency is the amount of time for the model to generate the response.
  - If Runtime Security blocks the input, the latency could be zero.
Example Prompts
For demonstration purposes, the example prompts are unsafe and will trigger a detection.
Click Detections to display more details about the detection.
Click View Model, then click the Runtime Security tab to display a list of incidents for the Sandbox.
Click the green arrow to display inference information about the incident.
- The Incident Details contains overview information and short descriptions of the incident.
- The Inferences tab lists all of the events and their related detection category. Clicking the green arrow for an event displays an Interaction Details window.
- The MITRE ATLAS tab shows the tactics and techniques related to the incident.
Go back to the LLM Sandbox, then click Advanced Policy Settings. The policy settings expand to display all of the policy settings.
For the LLM Denial of Service policy setting, select Block Denial of Service.
Select OWASP Scenarios, select LLM10: Unbounded Consumption, then click the analyze button (paper airplane icon). With the policy set to Alert and Block for prompt injection, the unsafe prompt is not sent to the ML model.
Click View Model to see the LLM Sandbox model artifacts.

Custom Entity

Using a Custom Entity, you can add data strings, like words or numbers, to prevent data leakage of personally identifiable information (PII). After creating a custom entity, Runtime Security can alert or alert and block this data.

The LLM Sandbox allows you to create one custom entity. With your own Runtime Security instance, you can add multiple custom entities to your policy configuration.

A custom entity pattern uses regular expressions (regex patterns). For more information about regular expressions, see this Python document. To test regex, check out regex101.

Initial Prompt Analysis

The first time a prompt is submitted that includes a custom entity pattern, the initial analysis may take longer than normal. Subsequent submissions including the same custom entity pattern will see expected response times.

Individual Data Strings

You can add individual data strings, like words or numbers, to the custom entity pattern and the LLM Sandbox will alert or block prompts with any of these data strings.

Use the following example to try out the Custom Entity feature in the LLM Sandbox.

In the LLM Sandbox, click Advanced Policy Settings.
Enter the example Custom Entity Name.
Enter the Custom Entity Pattern.
Valid Python Regular Expression
The Custom Entity Pattern needs to be a valid Python Regular Expression.
Type a prompt that includes one of the strings from the Custom Entity Pattern.
Click Analyze to run the prompt.
Click View Detections to see what Runtime Security detected. In the image below, you will see the Custom Entity Name under Details. This lets you know what triggered the detection. Also in the example image below, there is a Prompt Injection. This is because the prompt used included a word that triggered Prompt Injection.
By default, Redact PII is set to Don’t Redact. If you want to see the LLM Sandbox redact the prompt, set Redact PII to Redact. Redact PII is under Advanced Policy Settings.
If you want to block any prompts that include the custom entity pattern, set Data Leakage to Alert and Block. Data Leakage is under Basic Policy Settings.

Example Custom Entity Name

swear_words

Example Custom Entity Pattern

(damn|hell|crap)

Example Prompt Submission

What is crap?

Custom Entity Pattern

The custom entity pattern must be precise. Adding spaces, quotation marks, or other characters can cause unexpected results.

Policy Settings

The policy settings control what the LLM Sandbox does when you submit a prompt. This allows you to only get Alerts or to Alert and Block specific types of prompts.

Alerts and Blocks

This applies to most policy settings.

Setting	Description
Alert Only	Provides detection alerts only. Allows the prompt and the output.
Alert and Block	Provides detection alerts and blocks the prompt and the output, based on the policy configuration.

Basic Policy Settings

Prompt

The prompt that is sent to the ML Model.

Setting	Description	Env Key
Block Code Modality	If the input code detection category is true, the message will be blocked.	`HL_LLM_BLOCK_INPUT_CODE_DETECTION`
Block Data Leakage	If input PII category is true, message will be blocked.	`HL_LLM_BLOCK_INPUT_PII`
Block Prompt Injection	If prompt injection category is true, the message will be blocked.	`HL_LLM_BLOCK_PROMPT_INJECTION`

Output

The output from the ML Model based on the prompt.

Setting	Description	Env Key
Block Code Modality	If the output code detection category is true, the message will be blocked.	`HL_LLM_BLOCK_OUTPUT_CODE_DETECTION`
Block Data Leakage	If output PII category is true, message will be blocked.	`HL_LLM_BLOCK_OUTPUT_PII`
Block Guardrail Activation	If the guardrail detection category is false, the message will be blocked.	`HL_LLM_BLOCK_GUARDRAIL_DETECTION`

Advanced Policy Settings

Prompt Injection

Setting Description Env Key

Setting	Description	Env Key
Full Scan	Type of prompt injection scan to perform, `Full` or `Quick`. Deselecting the `Full Scan` checkbox sets the prompt injection scan to `Quick`. Quick: Scans the entirety of the user prompt. Full: Scans the entirety of the user prompt and scans additional variations of the user prompt.	`HL_LLM_PROMPT_INJECTION_SCAN_TYPE`

Full Scan

Type of prompt injection scan to perform, Full or Quick.

Deselecting the Full Scan checkbox sets the prompt injection scan to Quick.
Quick: Scans the entirety of the user prompt.
Full: Scans the entirety of the user prompt and scans additional variations of the user prompt.

HL_LLM_PROMPT_INJECTION_SCAN_TYPE

LLM Denial of Service

Setting	Description	Env Key
Block Denial of Service	If the LLM denial of service category is true, the message will be blocked.	`HL_LLM_BLOCK_INPUT_DOS_DETECTION`
LLM Denial of Service Alert Threshold	The threshold for input denial of service detection.	`HL_LLM_INPUT_DOS_DETECTION_THRESHOLD`

Data Leakage

Redact output before sending to the caller.

Setting Description Env Key

Setting	Description	Env Key
All Entities		Environment variables `HL_LLM_REDACT_OUTPUT_PII="false"` `HL_LLM_REDACT_TYPE="entity"` `HL_LLM_ENTITY_TYPE="all"`
Redact Output PII		Environment variables `HL_LLM_REDACT_OUTPUT_PII="true"` `HL_LLM_REDACT_TYPE="entity"` `HL_LLM_ENTITY_TYPE="all"`
Strict Redaction		Environment variables `HL_LLM_REDACT_OUTPUT_PII="false"` `HL_LLM_REDACT_TYPE="strict"` `HL_LLM_ENTITY_TYPE="all"`

All Entities

Environment variables

HL_LLM_REDACT_OUTPUT_PII="false"
HL_LLM_REDACT_TYPE="entity"
HL_LLM_ENTITY_TYPE="all"

Redact Output PII

Environment variables

HL_LLM_REDACT_OUTPUT_PII="true"
HL_LLM_REDACT_TYPE="entity"
HL_LLM_ENTITY_TYPE="all"

Strict Redaction

Environment variables

HL_LLM_REDACT_OUTPUT_PII="false"
HL_LLM_REDACT_TYPE="strict"
HL_LLM_ENTITY_TYPE="all"

Custom Entities

For examples, see Custom Entity.

Setting Description Env Key

Setting	Description	Env Key
Custom Entity Name	The name of custom PII recognizer. If Name is supplied, expression must also be provided under same name.	There are two environment keys for this entity. `HL_LLM_PROXY_PII_CUSTOM_{{name}}`: The name of the custom entity. `HL_LLM_PROXY_PII_CUSTOM_{{name}}_ENTITY`: The entity to replace custom PII with, if found.
Custom Entity Pattern	The regex expression used to find custom PII.	`HL_LLM_PROXY_PII_CUSTOM_((name))_EXPRESSION`

Custom Entity Name

The name of custom PII recognizer.

If Name is supplied, expression must also be provided under same name.

There are two environment keys for this entity.

HL_LLM_PROXY_PII_CUSTOM_{{name}}: The name of the custom entity.
HL_LLM_PROXY_PII_CUSTOM_{{name}}_ENTITY: The entity to replace custom PII with, if found.

Custom Entity Pattern The regex expression used to find custom PII. HL_LLM_PROXY_PII_CUSTOM_((name))_EXPRESSION

Detection Category Severity

Setting	Description	Env Key
Prompt Injection Severity	Sets severity for Prompt Injection conviction category. Accepted values: `low`, `medium`, `high`.	`HL_LLM_PROXY_CONVICTION_SEVERITY_PROMPT_INJECTION`
Data Leakage Severity	Sets severity for Data Leakage conviction category. Accepted values: `low`, `medium`, `high`.	`HL_LLM_PROXY_CONVICTION_SEVERITY_DATA_LEAKAGE`
LLM Denial of Service Severity	Sets severity for Denial-of-Service conviction category. Accepted values: `low`, `medium`, `high`.	`HL_LLM_PROXY_CONVICTION_SEVERITY_DENIAL_OF_SERVICE`
Modality Restriction Severity	Sets severity for Modality Restriction conviction category. Accepted values: `low`, `medium`, `high`.	`HL_LLM_PROXY_CONVICTION_SEVERITY_MODALITY_RESTRICTION`
Guardrail Activation Severity	Sets severity for Guardrail conviction category. Accepted values: `low`, `medium`, `high`.	`HL_LLM_PROXY_CONVICTION_SEVERITY_GUARDRAIL`

Language Detection

Setting Description Env Key

Setting	Description	Env Key
Block Unsupported Languages	This is a Client policy configuration. This requires two headers. Must set block to `true`. Must include allowed languages. Languages not included will be blocked.	Required headers. `"X-LLM-Block-Input-Language-Detection": "true"` `"X-LLM-Input-Allowed-Languages": "AR,BN,ZH,EN,FR,DE,HI,ID,IT,JA,KO,MR,PT,PA,RU,ES,TA,TE,TR,UR,VI"`

Block Unsupported Languages

This is a Client policy configuration. This requires two headers.

Must set block to true.
Must include allowed languages. Languages not included will be blocked.

Required headers.

"X-LLM-Block-Input-Language-Detection": "true"
"X-LLM-Input-Allowed-Languages": "AR,BN,ZH,EN,FR,DE,HI,ID,IT,JA,KO,MR,PT,PA,RU,ES,TA,TE,TR,UR,VI"

Supported Languages

Type	Supported Languages
High coverage in prompt injection model	English, French, German, Italian, Japanese, Korean, Spanish
Low coverage in prompt injection model	Arabic, Bengali, Chinese, Hindi, Indonesian, Marathi, Punjabi, Portuguese, Russian, Tamil, Telugu, Turkish, Urdu, Vietnamese

Policy Configuration

Generates a .env file example that you can use.

Policy Configuration Type	Description
Server	This is the environment file used to set the default policy configuration that will be used by the Runtime Security in the absence of any Client Side request headers to override the default policy. Once the file is downloaded and saved on your Runtime Security Pods / Containers, they will need to be restarted to apply the configuration.
Client	These are Client side HTTP Header values that can be passed to the Runtime Security with the requests from the AI Application front end (which also contains the input prompt) to set policy options per application (or even per request). This includes the ability to override the default policies set by the Server side .env file.

Generate Policy Configuration

On the LLM Sandbox page, make any changes you want to the Advanced Policy Settings for Prompt Policies and Output Policies.
Under Policy Configuration, make sure Server is selected.
Click Generate Policy Configuration. You can download or copy the server code. The downloaded file has a .env file extension.
Close the Policy Configuration Server Code window.
Under Policy Configuration, select Client.
Click Generate Policy Configuration. The client code is available in Python and JavaScript. You can download or copy the client code.
Close the Policy Configuration Client Code window.

LLM Sandbox

Prompt Injection

Custom Entity

Individual Data Strings

Policy Settings

Alerts and Blocks

Basic Policy Settings

Prompt

Output

Advanced Policy Settings

Prompt Injection

LLM Denial of Service

Data Leakage

Custom Entities

Detection Category Severity

Language Detection

Supported Languages

Policy Configuration

Generate Policy Configuration

Was this helpful?