# Resource Requirements

## License Keys

### Hybrid Mode

The following licenses and keys are required for deploying AI Runtime Security in Hybrid mode. If your organization doesn't have a license, contact HiddenLayer for more information. See [Hybrid and Disconnected Modes](/docs/products/runtime/hybrid_disconnected) for information about Runtime Security Hybrid mode.

- **Runtime Security License Key**: HiddenLayer Support will provide you with a license key. The LLM proxy container will not start without a valid key. Set the license as an environment variable; the installer will not run unless it is set.
- **Credentials to download Runtime Security container**: Credentials for the HiddenLayer container repository are required to download the appropriate images. These can also be obtained from HiddenLayer Support or from your HiddenLayer technical contact.
- **API Client ID and Client Secret**: A HiddenLayer API Client ID and Client Secret are required to interact with the AISec Platform Console. Get these from the Console or your Console Admin.
  - **API Permissions**
    - Inferences: write
    - Model Inventory: write
  - **Links to Console**
    - Link to the US Console
    - Link to the EU Console


### Disconnected Mode

The following licenses and keys are required for deploying Runtime Security in Disconnected mode. If your organization doesn't have a license, contact HiddenLayer for more information. See [Hybrid and Disconnected Modes](/docs/products/runtime/hybrid_disconnected) for information about Runtime Security Disconnected mode.

- **Runtime Security License Key**: HiddenLayer Support will provide you with a license key. The LLM proxy container will not start without a valid key. Set the license as an environment variable; the installer will not run unless it is set.
- **Credentials to download Runtime Security container**: Credentials for the HiddenLayer container repository are required to download the appropriate images. These can also be obtained from HiddenLayer Support or from your HiddenLayer technical contact.


**API keys not required**: Disconnected mode does not require an API Client ID and Client Secret.

## Tools

The following tools are required for deploying Runtime Security for GenAI.

- [**Docker Desktop**](https://docs.docker.com/engine/install/): Docker Desktop is used to deploy the container to your Kubernetes cluster.
- [**kubectl**](https://kubernetes.io/docs/tasks/tools/): The official Kubernetes CLI tool, used to issue commands to your Kubernetes cluster.
- **Kubernetes cluster**
  - For cloud deployments, you need `kubectl` access to the cluster.
  - For local deployments, use Minikube or a similar tool that runs on your local system.


## Resource Guidelines

The following are resource recommendations for the virtual machine running the LLM Proxy in a production environment. Azure AKS is used as an example, but the recommendations apply to any virtual environment. For information about an Azure AKS Dv3 virtual machine, see the Azure documentation.

| Resource | Resource Type | Count |
|  --- | --- | --- |
| Azure AKS | Standard_D32_v3 | 2 |


### CPU and Memory

| Resource Type | CPU (vCPU) | Memory (GB) | GPU |
|  --- | --- | --- | --- |
| Standard_D32_v3 | 32 | 128 | 0 |


**Testing**: For testing Runtime Security for GenAI, the image can run with 8 CPU cores and 16 GB of memory, which most modern laptops provide. This configuration is not recommended for production environments because of its higher latency.

### Scaling Recommendations

Runtime Security is horizontally scalable. The latency and throughput of each instance depend on many factors in the deployed environment, including underlying hardware, network conditions, and resource contention.

#### CPU-Only Deployments

A single Runtime Security instance sees diminishing performance returns beyond 10-12 CPUs. Allocating 8 CPUs per instance provides the best balance of performance and resource efficiency.

- **CPU**: 8 vCPUs per instance
- **Memory**: 6 GB per instance
- Set `OMP_NUM_THREADS` to match the CPU allocation (8 in this case)
- Run as many instances as your available hardware can fit at these allocations


#### GPU Deployments

When a GPU is available, Runtime Security offloads its primary inference workloads (prompt injection detection and refusal detection) to the GPU, which significantly reduces CPU requirements compared to CPU-only deployments. CPU still matters, because some processing continues to run there. PII detection with context enabled is the most notable example and can become a bottleneck on large payloads if the instance is CPU-starved.

- **GPU**: 1 per instance
- **CPU**: 4-6 vCPUs per instance (4 minimum; 6 if PII detection with context is enabled and you expect large payloads)
- **Memory**: 6 GB per instance
- Set `OMP_NUM_THREADS` to match the CPU allocation


### Health Checks

Runtime Security exposes a `/readiness` endpoint on port `8000` that returns a `200` status code only after the service has fully started and is ready to process requests. This endpoint can be used to integrate with the health check mechanism of your container runtime or orchestration platform.
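
Outside Kubernetes, the same endpoint can back a container-level health check. The following is a minimal Docker Compose sketch, not an official configuration: the service name and image reference are placeholders, and it assumes `curl` is available inside the image, which this document does not confirm.

```yaml
services:
  runtime-security:                     # hypothetical service name
    image: "<runtime-security-image>"   # placeholder; use the image provided by HiddenLayer
    ports:
      - "8000:8000"
    healthcheck:
      # Assumes curl exists in the image; -f makes curl fail on non-2xx responses
      test: ["CMD", "curl", "-f", "http://localhost:8000/readiness"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 30s                 # allow time for the service to initialize
```

If the image does not include `curl`, substitute whatever HTTP client it does ship, or run the check from outside the container.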

### Kubernetes

#### OMP_NUM_THREADS

In Kubernetes, a Runtime Security instance may detect all CPUs on the underlying node rather than just the CPUs allocated to its pod. This can cause the service to spawn more threads than it has resources for, leading to contention and degraded performance. Always set the `OMP_NUM_THREADS` environment variable to match the number of CPUs allocated to each pod.
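
One way to keep the two values in sync is to derive `OMP_NUM_THREADS` from the container's own CPU request using the Kubernetes Downward API, rather than hardcoding it. The following is a sketch under that assumption; the container name is hypothetical, and the resource values simply mirror the CPU-only recommendation.

```yaml
containers:
  - name: runtime-security          # hypothetical container name
    resources:
      requests:
        cpu: 8
        memory: 6144Mi
      limits:
        memory: 6144Mi
    env:
      - name: OMP_NUM_THREADS
        valueFrom:
          resourceFieldRef:
            resource: requests.cpu  # resolves to this container's CPU request
            divisor: "1"            # report the value in whole cores
```

With a whole-number CPU request this yields the same value as setting it by hand. Fractional requests are rounded up to the next whole core, so prefer whole-number CPU requests when using this approach.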

#### Resource Configuration

The following snippets show per-container resource configuration for a single replica.

**CPU-Only**

```yaml
resources:
  requests:
    cpu: 8
    memory: 6144Mi
  limits:
    memory: 6144Mi
env:
  - name: OMP_NUM_THREADS
    value: "8"
```

**GPU**

```yaml
resources:
  requests:
    cpu: 4
    memory: 6144Mi
  limits:
    memory: 6144Mi
    nvidia.com/gpu: 1
env:
  - name: OMP_NUM_THREADS
    value: "4"
```
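
If PII detection with context is enabled and you expect large payloads, the GPU guidance above suggests 6 vCPUs per instance. A variant of the GPU snippet for that case might look like the following sketch; only the CPU request and thread count differ.

```yaml
resources:
  requests:
    cpu: 6                # upper end of the 4-6 vCPU range for GPU deployments
    memory: 6144Mi
  limits:
    memory: 6144Mi
    nvidia.com/gpu: 1
env:
  - name: OMP_NUM_THREADS
    value: "6"            # matches the CPU request
```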

#### Replica Count

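To determine how many replicas fit on a node, divide the node's available CPU by the per-replica CPU allocation and round down. For example, a Standard_D32_v3 node (32 vCPUs) can fit 4 replicas at 8 CPUs each for a CPU-only deployment, provided all 32 vCPUs are available to pods. Kubernetes typically reserves some CPU for system components, so check the node's allocatable CPU before sizing.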
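
Putting the replica count together with the CPU-only resource settings, a Deployment for a single 32-vCPU node could be sketched as follows. The names, labels, and image reference are placeholders rather than values from HiddenLayer, and the replica count assumes the full 32 vCPUs are schedulable; run `kubectl describe node` to see the node's actual allocatable CPU.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: runtime-security              # hypothetical name
spec:
  replicas: 4                         # floor(32 vCPUs / 8 vCPUs per replica)
  selector:
    matchLabels:
      app: runtime-security           # hypothetical label
  template:
    metadata:
      labels:
        app: runtime-security
    spec:
      containers:
        - name: runtime-security
          image: "<runtime-security-image>"   # placeholder; use the image provided by HiddenLayer
          resources:
            requests:
              cpu: 8
              memory: 6144Mi
            limits:
              memory: 6144Mi
          env:
            - name: OMP_NUM_THREADS
              value: "8"
```

If the scheduler leaves a replica in the `Pending` state, the node's allocatable CPU is likely below the total requested; reduce the replica count or add a node.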

#### Health Probes

We recommend configuring a **readiness probe** for Runtime Security deployments. The readiness probe ensures that pods are only added to the load balancer once they are fully initialized and ready to serve requests.

```yaml
readinessProbe:
  httpGet:
    path: /readiness
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
```

Because the `/readiness` endpoint only reports healthy once the service is fully initialized, it also covers the startup case. A separate startup probe is not required.

**Liveness probes**: We do not recommend configuring a liveness probe for Runtime Security. A liveness probe failure restarts the pod, which could unnecessarily terminate a pod that is simply processing a long-running request. The readiness probe is sufficient to remove unresponsive pods from the load balancer without disrupting in-flight requests.