Prompt Analyzer is a flexible API that allows users to ensure that AI inputs and outputs are safe. The API takes a prompt input and / or model output, and returns a “true” / “false” verdict on whether or not the content is malicious.
A user making a post request to the API with a prompt will receive an overall verdict, as well as details about which categories have been triggered by the prompt. For example, it will return True / False for detecting prompt injection, personally identifiable information (PII), and code.
The verdict is based on Prompt Analyzer’s policies and the detected categories. For example, if the policies were configured to allow the model to return code, but not PII, the API would return “verdict”: “false” if the only detection category that was “true” was code.