---
language:
- en
pipeline_tag: text-classification
license: mit
metrics:
- accuracy
- f1
---

This model is part of the research presented in "Mitigating Toxicity in Dialogue Agents through Adversarial Reinforcement Learning," a conference paper that addresses toxicity in dialog agents by mitigating it at three levels: explicit, implicit, and contextual.

The model predicts whether a response is toxic given the conversation history that precedes it. It is designed for dialog agents.

To use it correctly, format the input as in the following example:

[HST]Hi, how are you?[END]I am doing fine[ANS]I hope you die.

The token [HST] marks the start of the conversation history, and [END] separates consecutive turns within it. The token [ANS] marks the start of the response to the last utterance.

I will update this card later. Right now I am developing a bigger project with these models, so I have not had the time to report all the results.

The model was trained on the Dialogue Safety dataset and the Bot Adversarial dataset.
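The input format described above can be assembled with a small helper. This is a minimal sketch rather than code from the paper: the function name `build_input` and the constant names are hypothetical, and the card does not say how an empty history should be handled, so the sketch does not treat that case specially.

```python
from typing import Sequence

# Special tokens as described in this card.
HST = "[HST]"  # start of the conversation history
END = "[END]"  # separator between consecutive turns in the history
ANS = "[ANS]"  # start of the response to be scored


def build_input(history: Sequence[str], response: str) -> str:
    """Join the history turns and the response into one input string.

    `build_input` is a hypothetical helper name, not part of the model's
    released code.
    """
    return f"{HST}{END.join(history)}{ANS}{response}"


if __name__ == "__main__":
    text = build_input(
        ["Hi, how are you?", "I am doing fine"],
        "I hope you die.",
    )
    print(text)
```

The resulting string is what would be passed to a text-classification model loaded from this repository; the repository id is not stated in this card, so loading is left out of the sketch.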