Red Team Evaluation
Copy for LLM
Copy page as Markdown for LLMs
View as Markdown
Open this page as Markdown
Open in ChatGPT
Get insights from ChatGPT
Open in Claude
Get insights from Claude
Connect to Cursor
Install MCP server on Cursor
Connect to VS Code
Install MCP server on VS Code

Red Team Evaluation simulates real-world adversarial attacks against your AI system by generating prompts based on the APE objectives and techniques.

Prerequisites

Before starting a Red Team evaluation, you should have:

A system prompt to test
The model(s) that power the application

Best Practices

Compare Versions: Run evaluations on both original and enhanced prompts to measure improvement.
Review Failed Attacks: Understanding why attacks failed is as important as knowing which succeeded.
Use Appropriate Models: Match the target model to what you're actually using in production.

Run New Evaluation

In the Console, select Security Testing > Red Teaming.
Click Run New Evaluation. The Create Red Team Evaluation slideout displays.
Enter a name for the evaluation.
Enter the target system prompt.
Select a target model.
- Select a model that is similar to what you have in your environment to simulate the attacks against your system prompt.
- Disclaimer: Models marked with a beta designation may be subject to lower usage quotas, limited availability, or ongoing development changes. As a result, these models may exhibit unexpected results, reduced performance, or intermittent failures during testing. Users should account for these limitations when selecting beta models.
Optionally, click Advanced Options to expand the section.
- Select a project to apply runtime rulesets to interaction tagging.
  - If no project is selected, the default project is used.
- Select an execution strategy.
  - Single: Runs each technique once per objective.
  - Random: Runs all techniques plus N additional random techniques.
    - Select the number of additional random techniques.
  - Static prompt set: Uses a predefined set of static prompts for evaluation.
    - Select the prompt set from the drop-down list.
- Set the maximum number of conversation turns allowed per technique when attempting to achieve an objective. The minimum is one and the maximum is five.
  - The attack simulator will do multi-turn, trying to attack the target for N turns, and then go to the next session.
  - Note: If you selected static_prompt_set, then Attacker Max Turns to Complete Objective is not available.
- Set the number of independent sessions to run for each technique. The minimum is one and the maximum is five.
  - This is the number of times you want to run the same technique or static prompt.
Click Start Evaluation.
When the evaluation completes, click the green arrow to view the results. See Red Team Evaluation Summary for more information.

Red Team Table Descriptions

Column	Description
Name	The name of the evaluation.
Start Time	The date and time the evaluation started.
End Time	The date and time the evaluation ended.
Elapsed	The time it took for the evaluation to end. The time is in hhmmss.
Actions	The actions available for the evaluation. View Summary: View the Evaluation Summary slide-out. Download CSV: Download the evaluation summary as a CSV file.
Status	The status of the evaluation. Statuses: Completed, Failed.
View Status (green arrow)	Click to go to the Evaluation Summary page. This page contains Metrics, Interactions, and Config data. See Evaluation Summary for more information.

Browse Prompt Sets

Clicking Browse Prompt Sets takes you to the Red Team Prompts Sets tab. See Red Team Prompt Sets for more information.

Filter Results

Click Filter. The Filters slide-out displays.
Select the statuses you want to view.
Click Show Results.

Red Team EvaluationCopyCopy for LLMCopy page as Markdown for LLMsView as MarkdownOpen this page as MarkdownOpen in ChatGPTGet insights from ChatGPTOpen in ClaudeGet insights from ClaudeConnect to CursorInstall MCP server on CursorConnect to VS CodeInstall MCP server on VS Code