seongyeon1 committed on
Commit
8e2429b
1 Parent(s): df14000

Update README.md

Files changed (1)
  1. README.md +92 -1
README.md CHANGED
@@ -6,4 +6,95 @@ language:
  metrics:
  - accuracy
  pipeline_tag: text-classification
- ---
+ ---
+ # Model Card for seongyeon1/klue-base-finetuned-nsmc
+
+
+ ## Model Details
+
+ Results on the NSMC test split after 2 epochs (the output of `Trainer.evaluate()`):
+
+ ```python
+ {'eval_loss': 0.2575262784957886,
+  'eval_accuracy': 0.9041,
+  'eval_runtime': 163.2129,
+  'eval_samples_per_second': 306.348,
+  'eval_steps_per_second': 9.576,
+  'epoch': 2.0}
+ ```
+
+
+ ### Model Description
+
+ - **Finetuned from model:** [klue/bert-base](https://huggingface.co/klue/bert-base)
+
+ ## Uses
+
+ - Sentiment analysis of Korean movie reviews (binary negative/positive classification)
+
+ ## How to Get Started with the Model
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ tokenizer = AutoTokenizer.from_pretrained("seongyeon1/klue-base-finetuned-nsmc")
+ model = AutoModelForSequenceClassification.from_pretrained("seongyeon1/klue-base-finetuned-nsmc")
+ ```
+
+ Or via a `pipeline`; the examples below show that `LABEL_0` is negative and `LABEL_1` is positive:
+
+ ```python
+ from transformers import pipeline
+
+ pipe = pipeline("text-classification", model="seongyeon1/klue-base-finetuned-nsmc")
+ pipe("진짜 별로더라")  # "really bad" -> [{'label': 'LABEL_0', 'score': 0.999700665473938}]
+ pipe("굿굿")  # "good good" -> [{'label': 'LABEL_1', 'score': 0.9875587224960327}]
+ ```
+
+ ## Training Details
+
+ ### Training Data
+
+ - [NSMC](https://huggingface.co/datasets/nsmc) (Naver Sentiment Movie Corpus), Korean movie reviews labeled negative (0) or positive (1); a loading sketch follows
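+
+ A minimal loading sketch (assumes the Hub `nsmc` dataset, whose text field is `document` and label field is `label`):
+
+ ```python
+ from datasets import load_dataset
+
+ # NSMC ships with a 150k-review train split and a 50k-review test split
+ dataset = load_dataset("nsmc")
+ print(dataset["train"][0])  # {'id': ..., 'document': ..., 'label': 0 or 1}
+ ```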
+
+ #### Preprocessing
+
+ - BERT's default maximum sequence length is 512, but tokenizing to the full length makes training slow.
+ - `maxlen = 55` is used instead, based on the review-length distribution below.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/634330a304d4ff28aeb8de56/t7axSlo4JI4bPLynUB3OP.png)
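+
+ One way to inspect this distribution (a sketch, assuming `tokenizer` and `dataset` from above):
+
+ ```python
+ # Tokenized length of each training review, used to pick the 55-token cutoff
+ lengths = [len(tokenizer.encode(doc)) for doc in dataset["train"]["document"]]
+ ```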
+
+ The tokenization function used in preprocessing:
+
+ ```python
+ def tokenize_function_with_max(examples, maxlen=55):
+     # Truncate or pad each review to a fixed length of `maxlen` tokens
+     encodings = tokenizer(examples['document'], max_length=maxlen, truncation=True, padding='max_length')
+     return encodings
+ ```
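+
+ Applying it over the dataset with `datasets.map` (a sketch; `tokenized_datasets` is a name introduced here, not from the original):
+
+ ```python
+ # Batched map tokenizes the whole dataset at once
+ tokenized_datasets = dataset.map(tokenize_function_with_max, batched=True)
+ ```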
+ #### Training Hyperparameters
+
+ - learning rate = 2e-5, weight decay = 0.01, batch size = 32, epochs = 2 (see the `Trainer` sketch below)
+
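+ A minimal `Trainer` sketch with these settings (the `output_dir`, evaluation schedule, and eval batch size are illustrative assumptions, not taken from the original run):
+
+ ```python
+ from transformers import Trainer, TrainingArguments
+
+ args = TrainingArguments(
+     output_dir="klue-base-finetuned-nsmc",  # hypothetical path
+     learning_rate=2e-5,
+     weight_decay=0.01,
+     per_device_train_batch_size=32,
+     per_device_eval_batch_size=32,
+     num_train_epochs=2,
+     evaluation_strategy="epoch",
+ )
+
+ trainer = Trainer(
+     model=model,
+     args=args,
+     train_dataset=tokenized_datasets["train"],
+     eval_dataset=tokenized_datasets["test"],
+     compute_metrics=compute_metrics,  # defined under "Metrics" below
+ )
+ trainer.train()
+ ```
+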
+ #### Speeds, Sizes, Times
+
+ - Fine-tuning took about 40 minutes
+
+
+ ## Evaluation
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ - NSMC test split (50,000 reviews)
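+
+ The numbers under Model Details can be reproduced with the hypothetical `trainer` from the sketch above:
+
+ ```python
+ metrics = trainer.evaluate()  # runs on eval_dataset, the tokenized NSMC test split
+ print(metrics)  # {'eval_loss': 0.2575..., 'eval_accuracy': 0.9041, ...}
+ ```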
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ - Accuracy; the NSMC labels are roughly balanced, so accuracy is a reasonable headline metric (a `compute_metrics` sketch follows)
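+
+ A minimal `compute_metrics` sketch using the `evaluate` library (an assumption; the original run may have computed accuracy differently):
+
+ ```python
+ import numpy as np
+ import evaluate
+
+ accuracy = evaluate.load("accuracy")
+
+ def compute_metrics(eval_pred):
+     # Trainer passes (logits, labels); predict the higher-scoring class
+     logits, labels = eval_pred
+     predictions = np.argmax(logits, axis=-1)
+     return accuracy.compute(predictions=predictions, references=labels)
+ ```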