---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: distilbert-base-uncased-finetuned-greenpatent
results: []
widget:
- text: A method for recycling waste
- text: A method of reducing pollution
- text: An apparatus to improve environmental aspects
- text: A method to improve waste management
- text: A device to use renewable energy sources
datasets:
- cwinkler/green_patents
language:
- en
pipeline_tag: text-classification
---
# Classification of patent titles - "green" or "no green"
This model classifies patents as "green patents" or "no green patents" based on their titles.
### Examples of "green patents" titles:
- "A method for recycling waste" - score: 0.714
- "A method of reducing pollution" - score: 0.786
- "An apparatus to improve environmental aspects" - score: 0.570
- "A method to improve waste management" - score: 0.813
- "A device to use renewable energy sources" - score: 0.98
- "A technology for efficient electrical power generation"- score: 0.975
- "A method for the production of fuel of non-fossil origin" - score: 0.975
- "Biofuels from waste" - score: 0.88
- "A combustion technology with mitigation potential" - score: 0.947
- "A device to capture greenhouse gases" - score: 0.871
- "A method to reduce the greenhouse effect" - score: 0.887
- "A device to improve the climate" - score: 0.650
- "A device to stop climate change" - score: 0.55
### Examples of "no green patents" titles:
- "A device to destroy the nature" - score: 0.19
- "A method to produce smoke" - score: 0.386
### Examples of the model's limitations
- "A method to avoid trash" - score: 0.165
- "A method to reduce trash" - score: 0.333
- "A method to burn the Amazonas" - score: 0.501
- "A method to burn wood" - score: 0.408
- "Green plastics" - score: 0.126
- "Greta Thunberg" - score: 0.313 (How dare you, model?); BUT: "A method of using Greta Thunberg to stop climate change" - score: 0.715
Examples were inspired by https://www.epo.org/news-events/in-focus/classification/classification.html
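The scores above can be reproduced with a standard `transformers` text-classification pipeline. A minimal sketch (the repository id `cwinkler/distilbert-base-uncased-finetuned-greenpatent` and the default `LABEL_0`/`LABEL_1` label names are assumptions):

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hub (repo id assumed)
classifier = pipeline(
    "text-classification",
    model="cwinkler/distilbert-base-uncased-finetuned-greenpatent",
)

# Score a patent title; the pipeline returns the top label with its probability
print(classifier("A device to use renewable energy sources"))
# e.g. [{'label': 'LABEL_1', 'score': 0.98}]
```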
# distilbert-base-uncased-finetuned-greenpatent
This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [green patent dataset](https://huggingface.co/datasets/cwinkler/green_patents). The green patent dataset was split into 70% training data and 30% test data using `train_test_split(test_size=0.3)`.
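A minimal sketch of that split with the `datasets` library (it assumes the dataset exposes a single `train` split; the seed is illustrative):

```python
from datasets import load_dataset

# Load the green patents dataset from the Hub
dataset = load_dataset("cwinkler/green_patents")

# 70% training / 30% test split, as described above (seed is illustrative)
splits = dataset["train"].train_test_split(test_size=0.3, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
```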
The model achieves the following results on the evaluation set:
- Loss: 0.3148
- Accuracy: 0.8776
- F1: 0.8770
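Accuracy and F1 of this kind are typically produced by a `compute_metrics` function passed to the Trainer. A sketch using the `evaluate` library (the weighted F1 average is an assumption):

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    # The Trainer passes a (logits, labels) tuple for the evaluation set
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy.compute(predictions=preds, references=labels)["accuracy"],
        "f1": f1.compute(predictions=preds, references=labels, average="weighted")["f1"],
    }
```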
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
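The hyperparameters above map onto `TrainingArguments` roughly as follows. This is a sketch, not the exact training script; the output directory, the per-epoch evaluation strategy, and the `tokenized_train`/`tokenized_test` variables (the tokenized 70/30 splits from above) are assumptions:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

args = TrainingArguments(
    output_dir="distilbert-base-uncased-finetuned-greenpatent",  # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=2,
    seed=42,
    lr_scheduler_type="linear",
    evaluation_strategy="epoch",  # assumption: evaluate once per epoch
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_train,   # assumed: tokenized 70% split
    eval_dataset=tokenized_test,     # assumed: tokenized 30% split
    tokenizer=tokenizer,             # enables dynamic padding during collation
    compute_metrics=compute_metrics, # see the sketch above
)
# trainer.train()
```

The Adam settings listed above (betas=(0.9, 0.999), epsilon=1e-08) and the linear schedule are the Trainer defaults, so they do not need to be set explicitly.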
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
| 0.4342 | 1.0 | 101 | 0.3256 | 0.8721 | 0.8712 |
| 0.3229 | 2.0 | 202 | 0.3148 | 0.8776 | 0.8770 |
### Framework versions
- Transformers 4.25.1
- Pytorch 1.13.1+cpu
- Datasets 2.8.0
- Tokenizers 0.13.2