# LLM Sandbox

The LLM Sandbox is a demo environment for testing the functionality of the AI Detection & Response Proxy (AIDR Proxy). It is available in the Console. You can enable policy settings, then send prompts and see the results. There are some preconfigured test examples to help you get started.

The LLM Sandbox can generate a policy configuration based on the settings you have enabled. You can copy the generated policy configuration into your container settings or application code.

For more information about how AIDR functions, see the [AIDR Overview](/docs/products/aidr-g/overview).

**Early Access:** The LLM Sandbox is Early Access. Ask your HiddenLayer representative for more information.

## Prompt Injection

The following steps use the prompt injection example to explain the LLM Sandbox features. You can try the other examples to become familiar with the policy configuration settings.

1. In the HiddenLayer AISec Platform, click the LLM Sandbox icon. The LLM Sandbox page displays.
2. Click **OWASP Scenarios**. A list of available OWASP examples displays.
3. Select **LLM01: Prompt Injection**. A pre-defined prompt is entered.
4. Click the analyze button (paper airplane icon).
5. The prompt is sent to the AIDR Proxy to be analyzed.
   - The policy settings for Prompt and Output are set to Alert Only. Therefore:
     - AIDR will send the prompt to the ML model.
     - The ML model will process the prompt and return the output to AIDR.
     - AIDR will send the output to the AISec Platform.
   - Latency data is provided. The AIDR Detection Latency is the amount of time AIDR takes to process the request. The Upstream LLM Latency is the amount of time the model takes to generate the response.
   - If AIDR blocks the input, the latency could be zero. For demonstration purposes, the example prompts are unsafe and will trigger a detection.
6. Click **Detections** to display more details about the detection.
7. Click **View Model**, then click the AIDR tab to display a list of incidents for the Sandbox.
8. Click the green arrow to display inference information about the incident.
   - The Incident Details contains overview information and short descriptions of the incident.
   - The Inferences tab lists all of the events and their related detection category. Clicking the green arrow for an event displays an Interaction Details window.
   - The MITRE ATLAS tab shows the tactics and techniques related to the incident.
9. Go back to the LLM Sandbox, then click **Advanced Policy Settings**. The policy settings expand to display all of the policy settings.
10. For the LLM Denial of Service policy setting, select **Block Denial of Service**.
11. Select **OWASP Scenarios**, select **LLM10: Unbounded Consumption**, then click the analyze button (paper airplane icon). With the policy set to Alert and Block for LLM Denial of Service, the unsafe prompt is not sent to the ML model.
12. Click **View Model** to see the LLM Sandbox model artifacts.

## Custom Entity

Using a Custom Entity, you can add data strings, like words or numbers, to prevent data leakage of personally identifiable information (PII). After creating a custom entity, AIDR can alert on, or alert on and block, this data.

The LLM Sandbox allows you to create one custom entity. With your own AIDR instance, you can add multiple custom entities to your policy configuration.

A custom entity pattern uses regular expressions (regex patterns). For more information about regular expressions, see the Python regular expression documentation. To test regex patterns, check out regex101.
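Because the Sandbox expects a valid Python regular expression, it can help to sanity-check a pattern locally before entering it as a Custom Entity Pattern. The sketch below uses only Python's standard `re` module; the employee-ID pattern and the sample prompts are hypothetical placeholders, not values taken from the product.

```python
import re

# Hypothetical custom entity pattern for employee IDs such as "EMP-12345".
PATTERN = r"EMP-\d{5}"

try:
    # re.compile raises re.error if the pattern is not a valid Python regex.
    entity = re.compile(PATTERN)
except re.error as exc:
    raise SystemExit(f"Invalid custom entity pattern: {exc}")

# Quick sanity check against sample prompts before pasting the pattern into the Sandbox.
for prompt in ["Summarize the record for EMP-40213.", "What is our refund policy?"]:
    print(prompt, "->", "match" if entity.search(prompt) else "no match")
```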
**Initial Prompt Analysis:** The first time a prompt that includes a custom entity pattern is submitted, the initial analysis may take longer than normal. Subsequent submissions that include the same custom entity pattern will see expected response times.

### Individual Data Strings

You can add individual data strings, like words or numbers, to the custom entity pattern, and the LLM Sandbox will alert on or block prompts containing any of these data strings. Use the following example to try out the Custom Entity feature in the LLM Sandbox.

1. In the LLM Sandbox, click **Advanced Policy Settings**.
2. Enter the example Custom Entity Name.
3. Enter the Custom Entity Pattern. Note: the Custom Entity Pattern must be a valid Python regular expression.
4. Type a prompt that includes one of the strings from the Custom Entity Pattern.
5. Click **Analyze** to run the prompt.
6. Click **View Detections** to see what AIDR detected. The Custom Entity Name displays under Details, which tells you what triggered the detection. In this example, a Prompt Injection detection also displays because the prompt included a word that triggered Prompt Injection.
7. By default, Redact PII is set to Don't Redact. If you want the LLM Sandbox to redact the prompt, set Redact PII to Redact. Redact PII is under Advanced Policy Settings.
8. If you want to block any prompts that include the custom entity pattern, set Data Leakage to Alert and Block. Data Leakage is under Basic Policy Settings.

**Example Custom Entity Name**

```
swear_words
```

**Example Custom Entity Pattern**

```
(damn|hell|crap)
```

**Example Prompt Submission**

```
What is crap?
```

**Custom Entity Pattern:** The custom entity pattern must be precise. Adding spaces, quotation marks, or other characters can cause unexpected results.
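To see why precision matters, the example pattern and prompt above can be reproduced locally with Python's `re` module. The sketch below is illustrative only; it shows the example pattern matching the example prompt, and how a single stray space inside the pattern silently loses the detection.

```python
import re

prompt = "What is crap?"

# The example custom entity pattern shown above.
precise = re.compile(r"(damn|hell|crap)")
print(bool(precise.search(prompt)))   # True: "crap" is detected

# The same pattern with an accidental trailing space inside the group.
# It now only matches "crap " followed by a space, so "crap?" is missed.
sloppy = re.compile(r"(damn|hell|crap )")
print(bool(sloppy.search(prompt)))    # False: the detection is silently lost
```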
## Policy Settings

The policy settings control what the LLM Sandbox does when you submit a prompt. They allow you to only receive alerts, or to alert on and block specific types of prompts.

**Alert and Block** (applies to most policy settings)

- Alert Only - Provides detection alerts only. Allows the prompt and the output.
- Alert and Block - Provides detection alerts and blocks the prompt and the output, based on the policy configuration.

**Basic Policy Settings**

- Prompt - The prompt that is sent to the ML Model.
  - Block Code Modality
  - Block Data Leakage
  - Block Prompt Injection
- Output - The output from the ML Model based on the prompt.
  - Block Code Modality
  - Block Data Leakage
  - Block Guardrail Activation

**Advanced Policy Settings**

- Prompt Injection
  - Full Scan
- LLM Denial of Service
  - Block Denial of Service
  - LLM Denial of Service Alert Threshold
- Data Leakage
  - All Entities
  - Redact Output PII
  - Strict Redaction
  - Custom Entities - See [Custom Entity](#custom-entity)
- Detection Category Severity
  - Prompt Injection Severity
  - Data Leakage Severity
  - LLM Denial of Service Severity
  - Modality Restriction Severity
  - Guardrail Activation Severity
- Language Detection
  - Block Unsupported Languages
  - Supported Languages
    - High coverage in prompt injection model: German, English, Spanish, French, Italian, Japanese, Korean
    - Low coverage in prompt injection model: Arabic, Bengali, Hindi, Indonesian, Marathi, Punjabi, Portuguese, Russian, Tamil, Telugu, Turkish, Urdu, Vietnamese, Chinese

**Policy Configuration**

- Server - An environment file that sets the default policy configuration used by AIDR-G when no client-side request headers override the default policy. After the file is downloaded and saved on your AIDR-G pods or containers, they must be restarted to apply the configuration.
- Client - Client-side HTTP header values that can be passed to AIDR-G with requests from the AI application front end (which also contains the input prompt) to set policy options per application, or even per request. This includes the ability to override the default policies set by the server-side `.env` file.

### Generate Policy Configuration

1. On the LLM Sandbox page, make any changes you want to the Advanced Policy Settings for Prompt Policies and Output Policies.
2. Under Policy Configuration, make sure **Server** is selected.
3. Click **Generate Policy Configuration**. You can download or copy the server code. The downloaded file has a `.env` file extension.
4. Close the Policy Configuration Server Code window.
5. Under Policy Configuration, select **Client**.
6. Click **Generate Policy Configuration**. The client code is available in Python and JavaScript. You can download or copy the client code.
7. Close the Policy Configuration Client Code window.
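As a rough illustration of the Client approach described above, the sketch below sends a prompt through an AIDR-G-style proxy with policy options passed as HTTP request headers. Every name in it is an assumption: the endpoint URL, the header names, and their values are placeholders, and the real header names and accepted values come from the Client code that the Sandbox generates.

```python
import requests

# Placeholder endpoint for an AIDR-G deployment; substitute your own proxy URL.
AIDR_G_URL = "https://aidr-g.example.internal/v1/chat/completions"

# Hypothetical policy-override headers. The actual header names and values are
# provided by the Client code generated by the LLM Sandbox; these are illustrative only.
policy_headers = {
    "X-Example-Prompt-Injection": "block",
    "X-Example-Data-Leakage": "alert",
}

response = requests.post(
    AIDR_G_URL,
    headers=policy_headers,
    json={"messages": [{"role": "user", "content": "What is crap?"}]},
    timeout=30,
)
print(response.status_code)
```

In practice, treat the generated Python or JavaScript Client code from step 6 as the source of truth for header names and values; the snippet above only shows the shape of a per-request policy override.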