---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: distilbert-base-uncased-finetuned-greenpatent
  results: []
widget:
- text: A method for recycling waste
- text: A method of reducing pollution
- text: An apparatus to improve environmental aspects
- text: A method to improve waste management
- text: A device to use renewable energy sources
datasets:
- cwinkler/green_patents
language:
- en
pipeline_tag: text-classification
---

# Classification of patent titles - "green" or "not green"

This model classifies patents as "green" or "not green" based on their titles.
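
For illustration, here is a minimal inference sketch using the `transformers` pipeline. The repository id is assumed from the model name and the dataset owner, and the raw label names (`LABEL_0`/`LABEL_1`) as well as their mapping to "green"/"not green" are assumptions that should be checked against the model's config:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint (repository id assumed from the model name in this card)
classifier = pipeline(
    "text-classification",
    model="cwinkler/distilbert-base-uncased-finetuned-greenpatent",
)

# Classify a patent title; the pipeline returns the predicted label and its score
print(classifier("A method for recycling waste"))
# e.g. [{'label': 'LABEL_1', 'score': 0.714}] - the label-to-class mapping depends on the model config
```
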

### Examples of "green patents" titles:

- "A method for recycling waste" - score: 0.714 
- "A method of reducing pollution" - score: 0.786
- "An apparatus to improve environmental aspects" - score: 0.570
- "A method to improve waste management" - score: 0.813
- "A device to use renewable energy sources" - score: 0.98
- "A technology for efficient electrical power generation"- score: 0.975
- "A method for the production of fuel of non-fossil origin" - score: 0.975
- "Biofuels from waste" - score: 0.88
- "A combustion technology with mitigation potential" - score: 0.947
- "A device to capture greenhouse gases" - score: 0.871
- "A method to reduce the greenhouse effect" - score: 0.887
- "A device to improve the climate" - score: 0.650
- "A device to stop climate change" - score: 0.55


### Examples of "no green patents" titles:

- "A device to destroy the nature" - score: 0.19
- "A method to produce smoke" - score: 0.386

### Examples of the model's limitations

- "A method to avoid trash" - score: 0.165
- "A method to reduce trash" - score: 0.333
- "A method to burn the Amazonas" - score: 0.501
- "A method to burn wood" - score: 0.408
- "Green plastics" - score: 0.126
- "Greta Thunberg" - score: 0.313 (How dare you, model?); BUT: "A method of using Greta Thunberg to stop climate change" - score: 0.715

The examples above were inspired by https://www.epo.org/news-events/in-focus/classification/classification.html

# distilbert-base-uncased-finetuned-greenpatent

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [green patent dataset](https://huggingface.co/datasets/cwinkler/green_patents). The dataset was split into 70% training data and 30% test data (using `.train_test_split(test_size=0.3)`).
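
As a hedged sketch of that preprocessing step (it assumes the dataset ships a single `train` split; whether a fixed seed was used for the split is not stated in this card):

```python
from datasets import load_dataset

# Load the green patent dataset from the Hub
ds = load_dataset("cwinkler/green_patents", split="train")

# 70/30 train/test split, as described above
splits = ds.train_test_split(test_size=0.3)
train_ds, test_ds = splits["train"], splits["test"]
print(train_ds.num_rows, test_ds.num_rows)
```
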
The model achieves the following results on the evaluation set:
- Loss: 0.3148
- Accuracy: 0.8776
- F1: 0.8770

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
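
As a hedged sketch, the hyperparameters above roughly map onto `TrainingArguments`/`Trainer` as follows (Transformers 4.25.1 API); the column name `text`, the evaluation strategy, and the F1 averaging are assumptions, not the exact training script:

```python
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Data: 70/30 split of the green patent dataset (see above)
splits = load_dataset("cwinkler/green_patents", split="train").train_test_split(test_size=0.3)

checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Tokenize the patent titles (the column name "text" is an assumption)
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

tokenized = splits.map(tokenize, batched=True)

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy.compute(predictions=preds, references=labels)["accuracy"],
        # binary F1 is assumed here; the card does not state the averaging mode
        "f1": f1.compute(predictions=preds, references=labels)["f1"],
    }

args = TrainingArguments(
    output_dir="distilbert-base-uncased-finetuned-greenpatent",
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=2,
    seed=42,
    lr_scheduler_type="linear",   # default; the Adam betas/epsilon listed above are also the defaults
    evaluation_strategy="epoch",  # assumption, consistent with the per-epoch results below
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,          # enables dynamic padding via the default data collator
    compute_metrics=compute_metrics,
)
trainer.train()
```
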

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
| 0.4342        | 1.0   | 101  | 0.3256          | 0.8721   | 0.8712 |
| 0.3229        | 2.0   | 202  | 0.3148          | 0.8776   | 0.8770 |


### Framework versions

- Transformers 4.25.1
- Pytorch 1.13.1+cpu
- Datasets 2.8.0
- Tokenizers 0.13.2