AlexWortega committed on
Commit 32f110d · 1 Parent(s): 82443e6

Create README.md

  1. README.md +81 -0
README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ datasets:
+ - IlyaGusev/habr
+ - Den4ikAI/russian_instructions
+ - wiki_qa
+ inference:
+   parameters:
+     max_new_tokens: 32
+     temperature: 1
+     top_k: 50
+     top_p: 0.7
+     do_sample: true
+ license: apache-2.0
+ language:
+ - ru
+ pipeline_tag: text-generation
+ widget:
+ - text: Чем отличается лось от ежа?  # "How does a moose differ from a hedgehog?"
+   example_title: Question Answering
+ - text: Как выпросить повышение?  # "How do I ask for a raise?"
+   example_title: Logical reasoning
+ - text: Какая температура закипания азота?  # "What is the boiling point of nitrogen?"
+   example_title: Scientific knowledge
+ library_name: transformers
+ tags:
+ - finance
+ - code
+ ---
+
+ <h1 style="font-size: 42px">Instructions ruGPT Small v0.1a</h1>
+
+
+
+ # Model Summary
+
+ > I fine-tuned the small ruGPT model on instruction, Habr, QA, and code datasets.
+
+
+ # Quick Start
+
+ ```python
+ from transformers import pipeline
+ pipe = pipeline(model='AlexWortega/instruct_rugptSmall')
+ pipe('''Как собрать питон код?''')  # prompt: "How do I build Python code?"
+ ```
+ or
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ tokenizer = AutoTokenizer.from_pretrained("AlexWortega/instruct_rugptSmall")
+ model = AutoModelForCausalLM.from_pretrained("AlexWortega/instruct_rugptSmall")
+ ```
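
The second snippet above loads the tokenizer and model but stops before generating text. A minimal sketch of the remaining step follows. It assumes you want the same sampling settings that the front matter lists under `inference.parameters`, and that PyTorch is installed. The names `GENERATION_KWARGS` and `generate_reply` are illustrative helpers, not part of the README.

```python
# Sampling settings copied from the model card's `inference.parameters` block.
# Reusing them for local generation is an assumption, not something the README states.
GENERATION_KWARGS = {
    "max_new_tokens": 32,
    "temperature": 1.0,
    "top_k": 50,
    "top_p": 0.7,
    "do_sample": True,
}


def generate_reply(prompt: str, model_id: str = "AlexWortega/instruct_rugptSmall") -> str:
    """Load the model, sample a continuation for `prompt`, and return the decoded text."""
    # Imported here so the settings above can be inspected without loading transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize to PyTorch tensors (input_ids and attention_mask).
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)

    # The returned sequence contains the prompt followed by the sampled tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_reply("Как собрать питон код?")` downloads the weights on first use. Because `do_sample` is enabled, the output differs from run to run.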
+
+ # License
+
+ The weights of Instructions ruGPT Small v0.1a are licensed under version 2.0 of the Apache License.
+
+
+
+ ## Hyperparameters
+
+ I used the NovoGrad optimizer with a learning rate of 2e-5 and a global batch size of 6 (3 per data-parallel worker).
+ Training used both data parallelism and pipeline parallelism.
+ During training, I truncated each input sequence to 1024 tokens; sequences shorter than 1024 tokens were concatenated into one longer sequence to improve data efficiency.
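
The truncate-and-concatenate step described above can be sketched as follows. This is one plausible reading, since the README does not include the actual preprocessing code. The function name `pack_sequences` and the greedy packing order are assumptions. The sketch also inserts no separator token between packed sequences, because the source does not say whether one was used.

```python
from typing import Iterable, List


def pack_sequences(tokenized: Iterable[List[int]], block_size: int = 1024) -> List[List[int]]:
    """Truncate each token sequence to `block_size`, then greedily concatenate
    consecutive sequences so that no packed block exceeds `block_size` tokens."""
    blocks: List[List[int]] = []
    current: List[int] = []
    for ids in tokenized:
        ids = ids[:block_size]  # truncate over-long inputs
        # Start a new block when the next sequence would overflow the current one.
        if current and len(current) + len(ids) > block_size:
            blocks.append(current)
            current = []
        current = current + ids
    if current:
        blocks.append(current)
    return blocks


# Toy example with a block size of 8 instead of 1024.
example = pack_sequences([[1, 2, 3], [4, 5, 6, 7], [8, 9], [10] * 10], block_size=8)
```

In the toy example, the first two sequences share one block, the third starts a new block, and the ten-token sequence is truncated to eight tokens and placed in a block of its own.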
+
+
+
+ # References
+
+ # Metrics
+
+ Coming soon.
+
+ ## BibTeX entry and citation info
+
+ ```bibtex
+ @article{
+   title={GPT2xl is underrated task solver},
+   author={Nickolich Aleksandr and Karina Romanova and Arseniy Shahmatov and Maksim Gersimenko},
+   year={2023}
+ }
+ ```