---
license: apache-2.0
datasets:
- G11/climate_adaptation_abstracts
- pierre-pessarossi/wikipedia-climate-data
- rlacombe/ClimateX
language:
- en
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
---

## Social Media Style Classifier for Climate Change Text


This model is `bert-base-uncased` fine-tuned on a binary classification task: determining whether an English text about climate change is written in a social media style.

Social media texts were gathered from [ClimaConvo](https://github.com/shucoll/ClimaConvo) and [DEBAGREEMENT](https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/hash/6f3ef77ac0e3619e98159e9b6febf557-Abstract-round2.html).

Non-social media texts were gathered from diverse sources including article abstracts (G11/climate_adaptation_abstracts), Wikipedia articles (pierre-pessarossi/wikipedia-climate-data), and IPCC reports (rlacombe/ClimateX).

The dataset contained about 60K instances, divided 50/50 between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets.
Training ran for three epochs on a V100-16GB GPU with a batch size of 8. All other hyperparameters were left at the Hugging Face Trainer defaults.
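
The shuffle-and-split step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the original preprocessing code: the helper name `shuffle_and_split` and the toy corpus are assumptions, and the tooling actually used to build the dataset is not stated here.

```python
import random


def shuffle_and_split(examples, train_fraction=0.8, seed=42):
    """Shuffle a copy of `examples` deterministically, then split it into train/test."""
    shuffled = list(examples)
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy stand-in for the balanced corpus: (text, label) pairs, label 1 = social media style.
toy_corpus = [(f"text {i}", i % 2) for i in range(10)]

train_set, test_set = shuffle_and_split(toy_corpus)
print(len(train_set), len(test_set))
```

Fixing the seed means the same examples land in the test set on every run, which keeps evaluation scores comparable across training runs.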

The model was trained to evaluate a text style transfer task: converting formal-language texts into tweets.

### How to use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/cc-tweets-classifier"

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

# Truncate inputs that exceed BERT's 512-token limit.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Yesterday was a great day!"

# The pipeline returns one dict per input, each with a "label" and a "score".
result = classifier(text)
print(result)
```
Label 1 indicates that the text is predicted to be written in a social media (tweet) style; label 0 indicates that it is not.
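
When the classifier is used to score style transfer output, one simple summary is the share of generated texts predicted as tweets. The sketch below assumes the pipeline reports the positive class under the default name `LABEL_1`; check the `id2label` mapping in the model config, since the actual label string may differ. The helper names and the example predictions are hypothetical and are not real model output.

```python
def is_tweet(prediction, tweet_label="LABEL_1"):
    """Return True if a single pipeline prediction carries the tweet label."""
    return prediction["label"] == tweet_label


def tweet_share(predictions, tweet_label="LABEL_1"):
    """Fraction of predictions classified as tweets (0.0 for an empty list)."""
    if not predictions:
        return 0.0
    hits = sum(is_tweet(p, tweet_label) for p in predictions)
    return hits / len(predictions)


# Illustrative predictions in the pipeline's output shape; scores are made up.
example_predictions = [
    {"label": "LABEL_1", "score": 0.98},
    {"label": "LABEL_0", "score": 0.91},
    {"label": "LABEL_1", "score": 0.75},
    {"label": "LABEL_1", "score": 0.60},
]

share = tweet_share(example_predictions)
print(share)
```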

### Evaluation 

Evaluation results on the test set: 

| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.99747 |
| Precision | 1.0     |
| Recall    | 0.99493 |
| F1        | 0.99746 |
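
For reference, the four metrics relate to confusion-matrix counts as sketched below. This is a generic illustration, not the evaluation script used for this model: the function names and the toy counts are assumptions. The last lines recompute F1 from the precision and recall reported in the table as a consistency check.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy counts (hypothetical): no false positives, one missed tweet.
accuracy, precision, recall, f1 = binary_metrics(tp=3, fp=0, fn=1, tn=4)
print(accuracy, precision, recall, f1)

# Consistency check against the precision and recall reported in the table.
reported_f1 = f1_from(1.0, 0.99493)
print(round(reported_f1, 5))
```

A precision of 1.0 means no non-social-media text in the test set was labeled as a tweet, so every error is a tweet that the model missed.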