XLM-RoBERTa (base) fine-tuned on HC3 for ChatGPT text detection

XLM-RoBERTa (base) fine-tuned on the Hello-SimpleAI HC3 corpus for ChatGPT text detection.

All credit to Hello-SimpleAI for their huge work!

F1 score on test dataset: 0.9736

The model

XLM-RoBERTa is a multilingual model pre-trained on 2.5TB of filtered CommonCrawl data covering 100 languages. It was introduced in the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al. and first released in this repository.

The dataset

Human ChatGPT Comparison Corpus (HC3)

HC3 is the first human-ChatGPT comparison corpus, released by Hello-SimpleAI.

This dataset is introduced in the paper:

Metrics

Metric  Value
F1      0.9736
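
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below shows the arithmetic for the binary case. The function name `f1_score_from_counts` and the counts are hypothetical and are not taken from this model's evaluation. The card does not say whether the reported F1 is binary, macro, or weighted.

```python
def f1_score_from_counts(tp: int, fp: int, fn: int) -> float:
    """Binary F1 from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
print(round(f1_score_from_counts(tp=90, fp=10, fn=10), 4))
```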

Usage

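from transformers import pipeline

# Model checkpoint on the Hugging Face Hub
ckpt = "mrm8488/xlm-roberta-base-finetuned-HC3-mix"

# Build a text-classification pipeline around the fine-tuned detector
detector = pipeline("text-classification", model=ckpt)

text = "Your text here..."

# Returns a list of dicts, each with a predicted "label" and its "score"
result = detector(text)
print(result)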
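
A text-classification pipeline returns a list of dicts with `label` and `score` keys, so a small post-processing step can turn the raw output into a yes/no decision. The following is a minimal sketch, not part of the original card. The helper `flag_machine_text`, the label name `"ChatGPT"`, and the 0.9 threshold are all assumptions; check `detector.model.config.id2label` for the labels this checkpoint actually uses. The `sample` list is a hand-written stand-in for pipeline output, not a real prediction.

```python
from typing import Dict, List


def flag_machine_text(
    results: List[Dict],
    machine_label: str = "ChatGPT",  # assumed label name, verify via id2label
    threshold: float = 0.9,          # assumed cut-off, tune on your own data
) -> List[bool]:
    """Return True for each result predicted as machine-written with enough confidence."""
    return [
        r["label"] == machine_label and r["score"] >= threshold
        for r in results
    ]


# Hand-written stand-in with the same shape as pipeline output (not real predictions).
sample = [
    {"label": "ChatGPT", "score": 0.98},
    {"label": "Human", "score": 0.95},
    {"label": "ChatGPT", "score": 0.60},
]
print(flag_machine_text(sample))
```

With a loaded pipeline, the same helper could be called as `flag_machine_text(detector(list_of_texts))`. A higher threshold trades missed detections for fewer false accusations, which is usually the safer default for this kind of classifier.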

Citation

@misc {manuel_romero_2023,
    author       = { {Manuel Romero} },
    title        = { xlm-roberta-base-finetuned-HC3-mix (Revision b18de48) },
    year         = 2023,
    url          = { https://huggingface.co/mrm8488/xlm-roberta-base-finetuned-HC3-mix },
    doi          = { 10.57967/hf/0306 },
    publisher    = { Hugging Face }
}