
# Policy Configuration

There are different policy groups, each with specific environment variables and header keys, which can be tailored to the requirements of your organization. You can also set conviction severity levels to determine the threat level required to trigger a conviction or a block. By default, the policy is to alert only for all detections (all blocks are set to False).

In most (though not all) cases, policy configurations can also be sent at runtime via request headers. Note that headers override deployment-level policy settings, enabling unique policies for different use cases within a single LLM proxy deployment.
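
For example, when header-based policy is enabled (see HL_LLM_PROXY_ENABLE_HEADER_POLICY in the Global table), a single request can turn on blocking for one detection category by including the corresponding header key:

```
X-LLM-Block-Prompt-Injection="true"
```

Requests without this header continue to use the deployment-level policy.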

## Global

By default, the policy will be alert only for all detections.

Most configuration settings are true or false, with false being the default setting. For configurations with different settings, the settings are identified in the Description.

| Environment Key | Header Key | Description | Default | Required |
| --- | --- | --- | --- | --- |
| HL_LLM_BLOCK_MESSAGE | n/a | The message that displays when a message is blocked. | Message was blocked. | False |
| HL_LLM_BLOCK_UNSAFE | X-LLM-Block-Unsafe | If the overall verdict is true, the message will be blocked. | False | False |
| HL_LLM_BLOCK_UNSAFE_INPUT | X-LLM-Block-Unsafe-Input | If the unsafe input verdict is true, the message will be blocked. | False | False |
| HL_LLM_BLOCK_UNSAFE_OUTPUT | X-LLM-Block-Unsafe-Output | If the unsafe output verdict is true, the message will be blocked. | False | False |
| HL_LLM_CHAT_COMPLETION_CONTEXT_WINDOW | X-LLM-Chat-Completion-Context-Window | Size of the chat completion window to analyze: LAST analyzes only the last message in a chat completion request; FULL analyzes all messages. | LAST | False |
| HL_LLM_INCLUDE_BLOCK_MESSAGE_REASONS | X-LLM-Include-Block-Message-Reasons | When enabled, the block message reasons will be included in the response. | True | False |
| HL_LLM_PROXY_ENABLE_PASSTHROUGH_STREAMING | X-LLM-Proxy-Enable-Passthrough-Streaming | When enabled, the proxy will immediately start streaming the response back to the requester. Currently available for OpenAI. | False | False |
| HL_LLM_PROXY_ENABLE_HEADER_POLICY | n/a | Enables security rules to be set per request via HTTP headers. Note: setting this to False is recommended for production environments. | True | False |
| HL_LLM_PROXY_ENABLE_UNSECURED_ROUTE_PASSTHROUGH | X-LLM-Proxy-Enable-Unsecured-Route-Passthrough | When using AIDR as a reverse proxy, all transparent upstream requests pass through for unsecured routes. | True | False |
| HL_LLM_PROXY_MAX_REQUEST_SIZE_BYTES | X-LLM-Proxy-Max-Request-Size-Bytes | The maximum size for a request or a response, in bytes. | 1000000 | False |
| n/a | x-requester-id | The ID for the requester. Takes precedence over hl-user-id, the requesting IP address (the IP address communicating directly with the AIDR endpoint), and HL_LLM_PROXY_MLDR_DEFAULT_REQUESTER. | Requesting IP | False |
| n/a | hl-user-id | The ID for the HiddenLayer user. Takes precedence over the IP address and HL_LLM_PROXY_MLDR_DEFAULT_REQUESTER. | None | False |
| HL_LLM_PROXY_MLDR_DEFAULT_REQUESTER | n/a | The ID used if no other identification for the requester is found. | Unknown | False |
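
For example, a deployment-level policy that blocks on any overall unsafe verdict and returns a custom block message could set the following (the message text is illustrative):

```
HL_LLM_BLOCK_UNSAFE="true"
HL_LLM_BLOCK_MESSAGE="This request was blocked by security policy."
```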

## Prompt Injection

By default, the policy will be alert only for all detections.

Most configuration settings are true or false, with false being the default setting. For configurations with different settings, the settings are identified in the description.

| Environment Key | Header Key | Description | Default | Required |
| --- | --- | --- | --- | --- |
| HL_LLM_SKIP_PROMPT_INJECTION_DETECTION | X-LLM-Skip-Prompt-Injection-Detection | Flag to skip prompt injection detection. | False | False |
| HL_LLM_BLOCK_PROMPT_INJECTION | X-LLM-Block-Prompt-Injection | If the prompt injection category is true, the message will be blocked. | False | False |
| HL_LLM_PROMPT_INJECTION_SCAN_TYPE | X-LLM-Prompt-Injection-Scan-Type | Type of prompt injection scan to perform: FULL or QUICK. | FULL | False |
| HL_LLM_PROXY_PROMPT_INJECTION_ALLOW_{{id}} | X-LLM-Prompt-Injection-Allow-{{id}} | Optional. Identifier for a custom prompt injection allow expression. Note: {{id}} must contain only alphanumeric characters without spaces. Note: the allow list takes priority over the block list. | n/a | False |
| HL_LLM_PROXY_PROMPT_INJECTION_ALLOW_{{id}}_SUBSTRING | X-LLM-Prompt-Injection-Allow-{{id}}-Substring | The substring that indicates a prompt is benign if detected within a prompt flagged as a prompt injection, mapped to its identifier. This is a string match. Note: {{id}} must contain only alphanumeric characters without spaces. Caution: take care when creating the substring to allow; a substring that is a commonly used word or phrase could allow more than expected. | n/a | False |
| HL_LLM_PROXY_PROMPT_INJECTION_BLOCK_{{id}} | X-LLM-Prompt-Injection-Block-{{id}} | Optional. Identifier for a custom prompt injection blocklist expression. Note: {{id}} must contain only alphanumeric characters without spaces. Note: the allow list takes priority over the block list. | n/a | False |
| HL_LLM_PROXY_PROMPT_INJECTION_BLOCK_{{id}}_SUBSTRING | X-LLM-Prompt-Injection-Block-{{id}}-Substring | The substring that indicates a prompt is malicious if detected as an input, mapped to its identifier. This is a string match. Note: {{id}} must contain only alphanumeric characters without spaces. Caution: take care when creating the substring to block; a substring that is a commonly used word or phrase could block more than expected. | n/a | False |

### Prompt Injection Scan Types

  • QUICK - Runs the classifier in a single pass over 512 tokens.
  • FULL - Runs the classifier in multiple passes: certain characters are stripped and the classifier runs on each line. A FULL scan adds latency, which increases with the size of the input.
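
For example, to trade detection depth for lower latency, the scan type can be set to QUICK with either the environment key or the header key:

```
HL_LLM_PROMPT_INJECTION_SCAN_TYPE="QUICK"
X-LLM-Prompt-Injection-Scan-Type="QUICK"
```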

## Examples

The following examples show how to use keys that include variables.

### Prompt Injection Allow

The following is an example `config/values.yaml` where the prompts "digital key" and "digitaler Schlüssel" are allowed.

```yaml
namespace:
  name: aidr-genai

image:
  repository: quay.io/hiddenlayer/distro-enterprise-aidr-genai

resources:
  requests:
    cpu: 8

replicas:
  min: 1
  max: 1

config:
  HL_LICENSE: <license>

  OMP_NUM_THREADS: 8

  HL_LLM_PROXY_CLIENT_ID: <client_id>
  HL_LLM_PROXY_CLIENT_SECRET: <client_secret>
  HL_LLM_PROXY_PROMPT_INJECTION_ALLOW_DigitalKey: "digital key"
  HL_LLM_PROXY_PROMPT_INJECTION_ALLOW_DigitalKey_SUBSTRING: "digital key"
  HL_LLM_PROXY_PROMPT_INJECTION_ALLOW_DigitalKeyGerman: "digitaler Schlüssel"
  HL_LLM_PROXY_PROMPT_INJECTION_ALLOW_DigitalKeyGerman_SUBSTRING: "digitaler Schlüssel"
```
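
A custom blocklist follows the same pattern with the BLOCK keys; the identifier and substring below are illustrative:

```yaml
config:
  HL_LLM_PROXY_PROMPT_INJECTION_BLOCK_IgnoreRules: "ignore all previous instructions"
  HL_LLM_PROXY_PROMPT_INJECTION_BLOCK_IgnoreRules_SUBSTRING: "ignore all previous instructions"
```

If a prompt matches entries on both lists, the allow list takes priority over the block list.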

## Denial of Service

By default, the policy will be alert only for all detections.

Most configuration settings are true or false, with false being the default setting. For configurations with different settings, the settings are identified in the Description.

| Environment Key | Header Key | Description | Default | Required |
| --- | --- | --- | --- | --- |
| HL_LLM_SKIP_INPUT_DOS_DETECTION | X-LLM-Skip-Input-DOS-Detection | Flag to skip the LLM denial of service detection. | False | False |
| HL_LLM_BLOCK_INPUT_DOS_DETECTION | X-LLM-Block-Input-DOS-Detection | If the LLM denial of service category is true, the message will be blocked. | False | False |
| HL_LLM_INPUT_DOS_DETECTION_THRESHOLD | X-LLM-Input-DOS-Detection-Threshold | Threshold for input denial of service detection. | 4096 | False |
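
For example, to block inputs that exceed a stricter threshold than the default of 4096 (the threshold value below is illustrative):

```
HL_LLM_BLOCK_INPUT_DOS_DETECTION="true"
HL_LLM_INPUT_DOS_DETECTION_THRESHOLD="2048"
```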

## Personally Identifiable Information (PII)

By default, the policy will be alert only for all detections.

Most configuration settings are true or false, with false being the default setting. For configurations with different settings, the settings are identified in the Description.

| Environment Key | Header Key | Description | Default | Required |
| --- | --- | --- | --- | --- |
| HL_LLM_REDACT_INPUT_PII | X-LLM-Redact-Input-PII | Flag to redact input before sending to the LLM. | False | False |
| HL_LLM_SKIP_INPUT_PII_DETECTION | X-LLM-Skip-Input-PII-Detection | Flag to skip input PII detection. | False | False |
| HL_LLM_BLOCK_INPUT_PII | X-LLM-Block-Input-PII | If the input PII category is true, the message will be blocked. | False | False |
| HL_LLM_SKIP_OUTPUT_PII_DETECTION | X-LLM-Skip-Output-PII-Detection | Flag to skip output PII detection. | False | False |
| HL_LLM_BLOCK_OUTPUT_PII | X-LLM-Block-Output-PII | If the output PII category is true, the message will be blocked. | False | False |
| HL_LLM_REDACT_OUTPUT_PII | X-LLM-Redact-Output-PII | Flag to redact output before sending to the caller. | False | False |
| HL_LLM_REDACT_TYPE | X-LLM-Redact-Type | Type of redaction to perform: ENTITY or STRICT. ENTITY redacts with the entity type identified ("The company phone number is [PHONE_NUMBER]."); STRICT redacts with the word REDACTED ("The company phone number is [REDACTED]."). | ENTITY | False |
| HL_LLM_ENTITY_TYPE | X-LLM-Entity-Type | Entity group: ALL or STRICT. See LLM Entity Types for the list of available types. | STRICT | False |
| HL_LLM_PROXY_PII_ALLOW_{{id}} | X-LLM-PII-Allow-{{id}} | Optional. Identifier for a custom PII allowlist expression. Note: {{id}} must contain only alphanumeric characters without spaces. Note: the allow list takes priority over the block list. | n/a | False |
| HL_LLM_PROXY_PII_ALLOW_{{id}}_EXPRESSION | X-LLM-Proxy-PII-Allow-{{id}}-Expression | The expression to allow if detected as PII, mapped to its identifier. This is a string match. Note: {{id}} must contain only alphanumeric characters without spaces. Note: the allow list takes priority over the block list. | n/a | False |
| HL_LLM_PROXY_PII_CUSTOM_{{name}} | X-LLM-PII-Custom-{{name}} | Name of a custom PII recognizer. If a name is supplied, an expression must also be provided under the same name. | None | False |
| HL_LLM_PROXY_PII_CUSTOM_{{name}}_ENTITY | X-LLM-PII-Custom-{{name}}-Entity | The entity to replace custom PII with, if found. | {{REDACTED}} | False |
| HL_LLM_PROXY_PII_CUSTOM_{{name}}_EXPRESSION | X-LLM-PII-Custom-{{name}}-Expression | The regex expression used to find custom PII. | None | False |
| HL_LLM_OVERRIDE_INPUT_PII_ENTITIES | X-LLM-Override-Input-PII-Entities | Overrides the list of input PII entities to look for in the text. | None | False |
| HL_LLM_OVERRIDE_OUTPUT_PII_ENTITIES | X-LLM-Override-Output-PII-Entities | Overrides the list of output PII entities to look for in the text. | None | False |
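
As an illustration, a custom recognizer for an internal employee ID format might be configured as follows; the name, entity label, and regex are hypothetical:

```yaml
config:
  HL_LLM_PROXY_PII_CUSTOM_EmployeeId: "EmployeeId"
  HL_LLM_PROXY_PII_CUSTOM_EmployeeId_ENTITY: "[EMPLOYEE_ID]"
  HL_LLM_PROXY_PII_CUSTOM_EmployeeId_EXPRESSION: "EMP-[0-9]{6}"
```

With redaction enabled, a matching value such as EMP-123456 would then be replaced with [EMPLOYEE_ID].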

## LLM Entity Types

### ALL

PERSON
LOCATION
ORGANIZATION
EMAIL_ADDRESS
CREDIT_CARD
PHONE_NUMBER
IP_ADDRESS
NATIONAL_ID
IBAN_CODE
US_BANK_NUMBER
UK_NATIONAL_INSURANCE_NUMBER
UK_PASSPORT
DOMAIN_NAME
URL
US_DRIVER_LICENSE
US_PASSPORT
US_SSN
US_ITIN
US_ABA_ROUTING_TRANSIT_NUMBER
US_HEALTHCARE_NPI

### STRICT

PHONE_NUMBER
ORGANIZATION
CREDIT_CARD
IP_ADDRESS
NATIONAL_ID
IBAN_CODE
US_BANK_NUMBER
UK_NATIONAL_INSURANCE_NUMBER
UK_PASSPORT
DOMAIN_NAME
US_DRIVER_LICENSE
US_PASSPORT
US_SSN
US_ITIN
US_ABA_ROUTING_TRANSIT_NUMBER
US_HEALTHCARE_NPI

## Code Detection

By default, the policy will be alert only for all detections.

Most configuration settings are true or false, with false being the default setting. For configurations with different settings, the settings are identified in the Description.

| Environment Key | Header Key | Description | Default | Required |
| --- | --- | --- | --- | --- |
| HL_LLM_SKIP_INPUT_CODE_DETECTION | X-LLM-Skip-Input-Code-Detection | Flag to skip input code detection. | False | False |
| HL_LLM_BLOCK_INPUT_CODE_DETECTION | X-LLM-Block-Input-Code-Detection | If the input code detection category is true, the message will be blocked. | False | False |
| HL_LLM_SKIP_OUTPUT_CODE_DETECTION | X-LLM-Skip-Output-Code-Detection | Flag to skip output code detection. | False | False |
| HL_LLM_BLOCK_OUTPUT_CODE_DETECTION | X-LLM-Block-Output-Code-Detection | If the output code detection category is true, the message will be blocked. | False | False |
| HL_LLM_TIMEOUT_INPUT_CODE_SECONDS | X-LLM-Timeout-Input-Code-Seconds | The number of seconds the code detector may run on an individual request before timing out. Accepted value: an integer (for example, 10). | False | False |
| HL_LLM_TIMEOUT_INPUT_CODE_IS_DETECTION | X-LLM-Timeout-Input-Code-Is-Detection | Whether a code detector timeout is considered a positive detection for code. Accepted values: true or false. | False | False |
| HL_LLM_TIMEOUT_OUTPUT_CODE_SECONDS | X-LLM-Timeout-Output-Code-Seconds | The number of seconds the code detector may run on an individual request before timing out. Accepted value: an integer (for example, 10). | False | False |
| HL_LLM_TIMEOUT_OUTPUT_CODE_IS_DETECTION | X-LLM-Timeout-Output-Code-Is-Detection | Whether a code detector timeout is considered a positive detection for code. Accepted values: true or false. | False | False |
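
For example, to cap the input code detector at 10 seconds per request and treat a timeout as a positive code detection:

```
HL_LLM_TIMEOUT_INPUT_CODE_SECONDS="10"
HL_LLM_TIMEOUT_INPUT_CODE_IS_DETECTION="true"
```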

## Guardrail

By default, the policy will be alert only for all detections.

Most configuration settings are true or false, with false being the default setting. For configurations with different settings, the settings are identified in the Description.

| Environment Key | Header Key | Description | Default | Required |
| --- | --- | --- | --- | --- |
| HL_LLM_SKIP_GUARDRAIL_DETECTION | X-LLM-Skip-Guardrail-Detection | Flag to skip guardrail detection. | False | False |
| HL_LLM_SKIP_GUARDRAIL_CLASSIFICATION_DETECTION | | Flag to skip guardrail classification. | False | False |
| HL_LLM_BLOCK_GUARDRAIL_DETECTION | X-LLM-Block-Guardrail-Detection | If the guardrail detection category is true, the message will be blocked. | False | False |

## Language Detection

Attackers attempting prompt injection may use multiple languages. HiddenLayer's language detector provides more visibility into your AI usage and helps control potentially malicious behavior. It runs on input prompts only.

The language detector has two components:

  • When enabled, it predicts whether a prompt is in one of the top 20 most spoken languages, or returns unknown.
  • It lets you select a set of allowed languages, so that only those languages are let through.

### Supported languages

HiddenLayer's prompt injection model is trained and evaluated for seven languages: English, French, German, Italian, Japanese, Korean, and Spanish.

Using language detection can block all unsupported languages, providing an extra layer of security against prompt injection.

| Environment Key | Header Key | Description | Default | Required |
| --- | --- | --- | --- | --- |
| HL_LLM_SKIP_INPUT_LANGUAGE_DETECTION | X-LLM-Skip-Input-Language-Detection | Skips input language detection and allows the prompt to be analyzed. | True | False |
| HL_LLM_BLOCK_INPUT_LANGUAGE_DETECTION | X-LLM-Block-Input-Language-Detection | Blocks inputs that are not on the allowed language list. See HL_LLM_INPUT_ALLOWED_LANGUAGES for allowed languages. | False | False |
| HL_LLM_INPUT_ALLOWED_LANGUAGES | X-LLM-Input-Allowed-Languages | Allows the listed languages to be analyzed by the prompt injection model. When multiple languages are included in the input, AIDR chooses a language it identifies as principal and classifies the input as such. The default values are en, es, fr, it, de, ja, ko (English, Spanish, French, Italian, German, Japanese, and Korean), so not setting this variable means the default languages are used. See the language codes below. | n/a | False |

Available language codes:

  • AR - Arabic
  • BN - Bengali
  • DE - German
  • EN - English
  • ES - Spanish
  • FR - French
  • HI - Hindi
  • ID - Indonesian
  • IT - Italian
  • JA - Japanese
  • KO - Korean
  • MR - Marathi
  • PA - Punjabi
  • PT - Portuguese
  • RU - Russian
  • TA - Tamil
  • TE - Telugu
  • TR - Turkish
  • UR - Urdu
  • VI - Vietnamese
  • ZH - Chinese

## Examples

### Allowed languages

Example environment keys:

```
HL_LLM_SKIP_INPUT_LANGUAGE_DETECTION="false"
HL_LLM_BLOCK_INPUT_LANGUAGE_DETECTION="false"
HL_LLM_INPUT_ALLOWED_LANGUAGES="en, es, fr, it, de, ja, ko, ar, bn, hi, id, mr, pa, pt, ru, ta, te, tr, ur, vi, zh"
```

Example header keys:

```
X-LLM-Skip-Input-Language-Detection="false"
X-LLM-Block-Input-Language-Detection="false"
X-LLM-Input-Allowed-Languages="en, es, fr, it, de, ja, ko, ar, bn, hi, id, mr, pa, pt, ru, ta, te, tr, ur, vi, zh"
```

### Block input language detection

Example environment key:

```
HL_LLM_BLOCK_INPUT_LANGUAGE_DETECTION="true"
```

Example header key:

```
X-LLM-Block-Input-Language-Detection="true"
```

### Skip input language detection

Example environment key:

```
HL_LLM_SKIP_INPUT_LANGUAGE_DETECTION="true"
```

Example header key:

```
X-LLM-Skip-Input-Language-Detection="true"
```

## URL Detection

By default, the policy will be alert only for all detections.

Most configuration settings are true or false, with false being the default setting. For configurations with different settings, the settings are identified in the Description.

| Environment Key | Header Key | Description | Default | Required |
| --- | --- | --- | --- | --- |
| HL_LLM_SKIP_INPUT_URL_DETECTION | X-LLM-Skip-Input-URL-Detection | Flag to skip input URL detection. | False | False |
| HL_LLM_SKIP_OUTPUT_URL_DETECTION | X-LLM-Skip-Output-URL-Detection | Flag to skip output URL detection. | False | False |

## Conviction Severity Level

With these variables, you can set the threat level required for the model to convict.

| Environment Key | Header Key | Description | Default | Required |
| --- | --- | --- | --- | --- |
| HL_LLM_PROXY_CONVICTION_SEVERITY_GUARDRAIL | X-LLM-Conviction-Severity-Guardrail | Sets the severity for the Guardrail conviction category. Accepted values: "Low", "Medium", "High". | "Low" | False |
| HL_LLM_PROXY_CONVICTION_SEVERITY_DATA_LEAKAGE | X-LLM-Conviction-Severity-Data-Leakage | Sets the severity for the Data Leakage conviction category. Accepted values: "Low", "Medium", "High". | "Medium" | False |
| HL_LLM_PROXY_CONVICTION_SEVERITY_PROMPT_INJECTION | X-LLM-Conviction-Severity-Prompt-Injection | Sets the severity for the Prompt Injection conviction category. Accepted values: "Low", "Medium", "High". | "High" | False |
| HL_LLM_PROXY_CONVICTION_SEVERITY_DENIAL_OF_SERVICE | X-LLM-Conviction-Severity-Denial-Of-Service | Sets the severity for the Denial-of-Service conviction category. Accepted values: "Low", "Medium", "High". | "High" | False |
| HL_LLM_PROXY_CONVICTION_SEVERITY_MODALITY_RESTRICTION | X-LLM-Conviction-Severity-Modality-Restriction | Sets the severity for the Modality Restriction conviction category. Accepted values: "Low", "Medium", "High". | "Medium" | False |
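
For example, to require a High threat level before the Guardrail category convicts, while leaving the other categories at their defaults:

```
HL_LLM_PROXY_CONVICTION_SEVERITY_GUARDRAIL="High"
```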