---
license: other
license_name: agplv3
license_link: https://www.gnu.org/licenses/agpl-3.0.en.html
---

We have trained DistilBERT on this dataset: [https://huggingface.co/datasets/nothingiisreal/Human_Stories]

It's kinda okay for sampling, but it needs improvement and exposure to more synthetic data and to the kinds of mistakes LLMs make.

Overall I'm extremely impressed with how well this 68-million-parameter model works, and extremely disappointed that every single AI gets picked up even though BERT was trained only on the GPT-3.5 rows of the data.

Class label 0 means human, 1 means AI.
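As a minimal sketch of using those class labels, here is how the model's predictions could be mapped to "human"/"AI", assuming it is loaded through the `transformers` text-classification pipeline and exposes the default `LABEL_0`/`LABEL_1` names. The repo id is a placeholder, not this model's actual id.

```python
# Loading the model is sketched in the comment below; substitute the real
# repo id for the placeholder (assumption: a standard transformers pipeline):
#
#   from transformers import pipeline
#   clf = pipeline("text-classification", model="<this-repo-id>")

# Map the pipeline's default label names to the classes described above.
ID2LABEL = {"LABEL_0": "human", "LABEL_1": "AI"}

def detect(text, clf):
    """Return 'human' or 'AI' for a passage, given a loaded pipeline."""
    pred = clf(text)[0]  # e.g. {"label": "LABEL_1", "score": 0.97}
    return ID2LABEL[pred["label"]]
```

Called with a loaded pipeline, `detect("some passage", clf)` returns `"human"` for class 0 and `"AI"` for class 1.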

We tested the following models, all of which were detected:

- GPT-3.5, GPT-4, GPT-4o
- Claude Sonnet, Claude Opus
- WizardLM 2
- Gemini 1.5 Pro

It's really blatant that every single AI company is using the same watermark, whether knowingly or unknowingly (through LLM "incest").