---
datasets:
- e9t/nsmc
language:
- ko
metrics:
- accuracy
pipeline_tag: text-classification
---
## Model Description

- **Finetuned from:** [klue/bert-base](https://huggingface.co/klue/bert-base)
- **Test accuracy:** 0.9041 on the NSMC test set

## Uses

- Sentiment analysis of Korean movie reviews (binary positive/negative classification)

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("seongyeon1/klue-base-finetuned-nsmc")
model = AutoModelForSequenceClassification.from_pretrained("seongyeon1/klue-base-finetuned-nsmc")
```

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="seongyeon1/klue-base-finetuned-nsmc")
pipe("진짜 별로더라") # [{'label': 'LABEL_0', 'score': 0.999700665473938}]
pipe("굿굿")        # [{'label': 'LABEL_1', 'score': 0.9875587224960327}]

```
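
From the examples above, `LABEL_0` corresponds to negative and `LABEL_1` to positive reviews. Below is a minimal sketch of calling the model directly instead of through the pipeline, reusing the `tokenizer` and `model` loaded earlier:

```python
import torch

# Tokenize a single review and run the classifier
inputs = tokenizer("진짜 별로더라", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print(pred)  # 0 -> negative, 1 -> positive
```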

## Training Details

### Training Data

- [NSMC (Naver Sentiment Movie Corpus)](https://huggingface.co/datasets/e9t/nsmc)
```python
from datasets import load_dataset

dataset = load_dataset('nsmc')
```
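
Each example carries the review text in a `document` field and a binary `label`. A quick way to inspect the splits (a sketch; the row counts in the comments are the standard NSMC train/test sizes):

```python
print(dataset)
# DatasetDict with 'train' (150,000 rows) and 'test' (50,000 rows) splits
print(dataset['train'][0])
# {'id': ..., 'document': <review text>, 'label': 0 or 1}
```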

#### Preprocessing

- BERT's default maximum sequence length is 512 tokens, but training at that length is slow.
  - Based on the token-length distribution of the reviews (figure below), `maxlen = 55` was used; a sketch of how to inspect the lengths follows the figure.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/634330a304d4ff28aeb8de56/t7axSlo4JI4bPLynUB3OP.png)
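
Something like the following (an illustrative sketch, not necessarily the original analysis code) can be used to check that almost all reviews fit within 55 tokens:

```python
import numpy as np

# Token count per review, without truncation or padding
lengths = [len(tokenizer.encode(doc)) for doc in dataset['train']['document']]
print(np.mean(lengths), np.percentile(lengths, 99))
```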

```python
maxlen = 55  # chosen from the length distribution above

def tokenize_function_with_max(examples, maxlen=maxlen):
    # Truncate/pad every review to maxlen tokens
    encodings = tokenizer(examples['document'], max_length=maxlen, truncation=True, padding='max_length')
    return encodings
```
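
Applied to the full dataset with `Dataset.map` (a sketch; the split and column names follow the loading code above):

```python
tokenized_datasets = dataset.map(tokenize_function_with_max, batched=True)
tokenized_train = tokenized_datasets['train']
tokenized_test = tokenized_datasets['test']
```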

#### Training Hyperparameters

- learning rate=2e-5, weight decay=0.01, batch size=32, epochs=2
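
A sketch of how these settings map onto `TrainingArguments` and `Trainer` (the output directory and evaluation strategy here are illustrative assumptions, not confirmed values; `compute_metrics` is sketched in the Metrics section below):

```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="klue-base-finetuned-nsmc",  # illustrative
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=2,
    evaluation_strategy="epoch",            # illustrative
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    compute_metrics=compute_metrics,
)
trainer.train()
```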

#### Metrics

- **Accuracy**
- The label distribution is roughly balanced (figure below), so plain accuracy is a reasonable metric; a sketch of the `compute_metrics` function follows the figure.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/634330a304d4ff28aeb8de56/_S5TTyec8I25Kx-yaqeJo.png)
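
A typical accuracy-only `compute_metrics` for the `Trainer` (a sketch using the `evaluate` library; not necessarily the exact code used here):

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```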

#### Result

- eval_loss: 0.2575262784957886
- eval_accuracy: 0.9041
- eval_runtime: 163.2129 s
- eval_samples_per_second: 306.348
- eval_steps_per_second: 9.576
- epoch: 2.0