Imminent Harm Classification
Demo
Enter a conversation between a user and an AI assistant.
Benign chat
Gradual disclosure
Dark roleplay
Threat to others
Self-harm disclosure
Difficult but safe