
Red Team Evaluation

Red Team Evaluation simulates real-world adversarial attacks against your AI system by generating prompts based on the APE objectives and techniques.

Prerequisites

Before starting a Red Team Evaluation, you should have:

  • A system prompt to test
  • The model(s) that power the application

Best Practices

  • Compare Versions: Run evaluations on both original and enhanced prompts to measure improvement.
  • Review Failed Attacks: Understanding why attacks failed is as important as knowing which succeeded.
  • Use Appropriate Models: Match the target model to what you're actually using in production.
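
As a rough illustration of the Compare Versions practice, the sketch below compares the attack success rate of two evaluation runs. The `EvaluationOutcome` class, its field names, and the numbers are hypothetical and exist only for this example; they are not a Console export format or a HiddenLayer API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationOutcome:
    """Hypothetical tally of one evaluation run (not a Console export format)."""

    label: str
    attacks_run: int
    attacks_succeeded: int

    @property
    def attack_success_rate(self) -> float:
        # Fraction of simulated attacks that achieved their objective.
        if self.attacks_run == 0:
            return 0.0
        return self.attacks_succeeded / self.attacks_run


def improvement(original: EvaluationOutcome, enhanced: EvaluationOutcome) -> float:
    """Drop in attack success rate; positive means the enhanced prompt did better."""
    return original.attack_success_rate - enhanced.attack_success_rate


# Illustrative numbers only; replace them with tallies from your own evaluations.
original = EvaluationOutcome("original prompt", attacks_run=40, attacks_succeeded=10)
enhanced = EvaluationOutcome("enhanced prompt", attacks_run=40, attacks_succeeded=4)
print(f"Attack success rate dropped by {improvement(original, enhanced):.0%}")
```

A comparison like this is most meaningful when both runs use the same target model, objectives, and execution settings, so that the system prompt is the only variable.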
Red Team Table

Run New Evaluation

  1. In the HiddenLayer Console, select Attack Simulation > Red Teaming.

  2. Click Run New Evaluation. The Create Red Team Evaluation slide-out displays.

  3. Enter a name for the evaluation.

  4. Enter the target system prompt.

  5. Select a target model.

    • Select a model that is similar to what you have in your environment to simulate the attacks against your system prompt.

      Beta Models

      Models marked with a beta designation may be subject to lower usage quotas, limited availability, or ongoing development changes. As a result, these models may exhibit unexpected results, reduced performance, or intermittent failures during testing. Users should account for these limitations when selecting beta models.

    Red Team Create Evaluation
  6. Optionally, click Objectives to change the severity level for a default objective or to add a custom objective.

    • To change the severity for a default objective, click the severity dropdown menu, then select the desired severity.

      • When a default objective severity is changed, the word Override appears next to the severity level.
    • To add a custom objective, click Add objective, then select the custom objective.

      • Use the search field to find a custom objective by its full or partial name.
    Red Team Create Evaluation - Add Objectives
  7. Optionally, click Advanced Options to expand the section.

    • Select a project to apply runtime policies to interaction tagging.

      • If no project is selected, the default project is used.
    • Select an execution strategy.

      • Single: Runs each technique once per objective.

      • Random: Runs all techniques plus N additional random techniques.

        • Select the number of additional random techniques.
        Red Team Evaluation Random
      • Static prompt set: Uses a predefined set of static prompts for evaluation.

        • Select the prompt set from the drop-down list.
        Red Team Evaluation Static Prompt Set
    • Set the maximum number of conversation turns allowed per technique when attempting to achieve an objective. The minimum is one and the maximum is five.

      • The attack simulator runs a multi-turn conversation, attacking the target for up to N turns before moving on to the next session.
      • Note: If you selected the Static prompt set strategy (static_prompt_set), Attacker Max Turns to Complete Objective is not available.
    • Set the number of independent sessions to run for each technique. The minimum is one and the maximum is five.

      • This is the number of times you want to run the same technique or static prompt.
  8. Click Start Evaluation.

  9. When the evaluation completes, click the green arrow to view the results. See Red Team Evaluation Summary for more information.
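
The limits described under Advanced Options can be summarized in a small planning sketch. This is not a HiddenLayer API: the `AdvancedOptions` class and its field names are hypothetical, and the strategy identifiers `single` and `random` are assumed (only `static_prompt_set` appears in the Console text). It checks the documented ranges (one to five turns, one to five sessions) and estimates the largest number of attacker turns spent on one technique for one objective.

```python
from dataclasses import dataclass
from typing import Optional

# "static_prompt_set" appears in the Console text; "single" and "random"
# are assumed identifiers for the other two strategies.
STRATEGIES = ("single", "random", "static_prompt_set")


@dataclass
class AdvancedOptions:
    """Hypothetical stand-in for the Advanced Options form (not a HiddenLayer API)."""

    strategy: str = "single"
    max_turns: Optional[int] = 1  # None when the strategy is static_prompt_set
    sessions: int = 1
    extra_random_techniques: int = 0  # only meaningful for "random"

    def validate(self) -> None:
        if self.strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {self.strategy!r}")
        if not 1 <= self.sessions <= 5:
            raise ValueError("sessions must be between 1 and 5")
        if self.strategy == "static_prompt_set":
            if self.max_turns is not None:
                raise ValueError("max_turns is not available for static_prompt_set")
        elif self.max_turns is None or not 1 <= self.max_turns <= 5:
            raise ValueError("max_turns must be between 1 and 5")
        if self.extra_random_techniques < 0:
            raise ValueError("extra_random_techniques cannot be negative")

    def max_turns_per_technique(self) -> int:
        """Upper bound on attacker turns for one technique and one objective."""
        self.validate()
        if self.max_turns is None:
            raise ValueError("turn limits do not apply to static_prompt_set")
        # Each independent session may use up to max_turns conversation turns.
        return self.sessions * self.max_turns


options = AdvancedOptions(strategy="random", max_turns=3, sessions=2, extra_random_techniques=4)
print(options.max_turns_per_technique())  # prints 6
```

The estimate is an upper bound: a session may end before the turn limit if the objective is achieved earlier.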

Terminate Running Evaluation

You can terminate a running evaluation from the Active Evaluations tab.

  1. In the Console, go to Attack Simulation > Red Teaming.

  2. Select the Active Evaluations tab.

  3. For the running evaluation that you want to terminate, click the x in the Status column. A confirmation message displays.

  4. Click Terminate. If the evaluation was successfully terminated, Terminated displays for the status.


Active Evaluations

The Active Evaluations tab shows running jobs and recently completed evaluations (last 90 days). This provides an operational view of what is running and what just finished. For a historical view that includes results older than 90 days, see Evaluation Results.

Red Team Table

Filter Results

  1. Click Filter. The Filters slide-out displays.
  2. Select the statuses you want to view.
  3. Click Show Results.
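
The effect of the status filter can be pictured with a short sketch. It assumes evaluations are available as simple records with `name` and `status` keys, which is a hypothetical shape chosen for illustration; the status values are the ones listed for the Status column on this page.

```python
from enum import Enum


class Status(Enum):
    TERMINATED = "Terminated"
    COMPLETED = "Completed"
    RUNNING = "Running"
    FAILED = "Failed"
    CANCELED = "Canceled"
    CONTINUED_AS_NEW = "Continued As New"
    TIMED_OUT = "Timed Out"


def filter_by_status(evaluations, selected):
    """Return only the evaluations whose status is one of the selected statuses."""
    wanted = {status.value for status in selected}
    return [evaluation for evaluation in evaluations if evaluation["status"] in wanted]


# Hypothetical records; the names and shape are for illustration only.
evaluations = [
    {"name": "baseline-prompt", "status": "Completed"},
    {"name": "enhanced-prompt", "status": "Running"},
    {"name": "nightly-check", "status": "Timed Out"},
]

for evaluation in filter_by_status(evaluations, [Status.RUNNING, Status.TIMED_OUT]):
    print(evaluation["name"], evaluation["status"])
```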
Red Team Table

Active Evaluation Table Descriptions

Column | Description
Name | The name of the evaluation.
Start Time | The date and time the evaluation started.
Elapsed | The time it took for the evaluation to end. The time is in hhmmss.
Status | The status of the evaluation. Statuses: Terminated, Completed, Running, Failed, Canceled, Continued As New, Timed Out.
Status (green arrow) | Click to go to the Red Team Evaluation page. This page contains Metrics, Interactions, and Config data. See Red Team Evaluation Summary for more information.

Evaluation Results

Evaluation Results shows all completed evaluations, with no date constraint. This provides a historical view of completed evaluations. To see evaluations that are currently running, see Active Evaluations.
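
The difference between the two views can be sketched as a pair of predicates. This is an interpretation under stated assumptions, not the Console's implementation: it assumes the 90-day window is measured from the time an evaluation finished, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ACTIVE_WINDOW = timedelta(days=90)


def in_active_view(status: str, finished_at: Optional[datetime], now: datetime) -> bool:
    """Running jobs, plus evaluations that finished within the last 90 days."""
    if status == "Running":
        return True
    # Assumption: the window is measured from the finish time.
    return finished_at is not None and now - finished_at <= ACTIVE_WINDOW


def in_results_view(status: str) -> bool:
    """Completed evaluations, regardless of age."""
    return status == "Completed"


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
old_finish = now - timedelta(days=120)
print(in_active_view("Completed", old_finish, now))  # prints False
print(in_results_view("Completed"))  # prints True
```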

Red Team Table

Evaluation Results Table Descriptions

Column | Description
Name | The name of the evaluation.
Target Model | The model targeted for the evaluation. Example: openai/gpt-5.1.
Start Time | The date and time the evaluation started.
Completed | The date and time the evaluation completed.
Actions | The actions available for the evaluation. View Summary: View the Evaluation Summary slide-out. Download CSV: Download the evaluation summary as a CSV file.
View (green arrow) | Click to go to the Red Team Evaluation page. This page contains Metrics, Interactions, and Config data. See Red Team Evaluation Summary for more information.
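
A file produced by Download CSV can be inspected with Python's standard `csv` module. The column layout of the export is not described on this page, so the sketch below makes no assumption about it and only reports whatever header names and row count it finds. The sample data and its column names (`column_a`, `column_b`) are placeholders, not the real export format.

```python
import csv
import io
from typing import TextIO


def load_summary_rows(source: TextIO) -> list[dict[str, str]]:
    """Read every row of a CSV file into a list of dictionaries keyed by header."""
    return list(csv.DictReader(source))


def column_names(rows: list[dict[str, str]]) -> list[str]:
    """Return the header names found in the file, or an empty list if it has no rows."""
    return list(rows[0].keys()) if rows else []


# Placeholder data standing in for a downloaded file; the columns are hypothetical.
sample = io.StringIO("column_a,column_b\nfirst,1\nsecond,2\n")
rows = load_summary_rows(sample)
print(column_names(rows), len(rows))
```

For a real download, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object to `load_summary_rows`.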