TristanBehrens committed
Commit 7c44227
1 Parent(s): 10f1279

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +66 -0
README.md ADDED
@@ -0,0 +1,66 @@
---
language:
- en
tags:
- NLP
license: mit
datasets:
- TristanBehrens/bach_garland_2024-100K
base_model: None
---

# bach_garland_phariaplus - A Pharia Model

![Trained with Helibrunna](banner.jpg)

Trained with [Helibrunna](https://github.com/AI-Guru/helibrunna) by [Dr. Tristan Behrens](https://de.linkedin.com/in/dr-tristan-behrens-734967a2).
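
The training corpus is the dataset named in the metadata above. A minimal sketch of pulling it from the Hub, assuming the Hugging Face `datasets` library; the `"train"` split name is an assumption, so check the dataset card:

```python
# Minimal sketch: fetch the training corpus from the Hugging Face Hub.
# Assumes `pip install datasets`; the "train" split name is an assumption.
from datasets import load_dataset

dataset = load_dataset("TristanBehrens/bach_garland_2024-100K", split="train")
print(dataset)      # features and row count
print(dataset[0])   # first example (expected: a whitespace-separated token sequence)
```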

## Configuration

```yaml
training:
  model_name: bach_garland_phariaplus
  batch_size: 22
  lr: 0.001
  lr_warmup_steps: 1818
  lr_decay_until_steps: 18181
  lr_decay_factor: 0.001
  weight_decay: 0.1
  amp_precision: bfloat16
  weight_precision: float32
  enable_mixed_precision: true
  num_epochs: 8
  output_dir: output/bach_garland_phariaplus
  save_every_step: 500
  log_every_step: 10
  wandb_project: bach_garland
  torch_compile: false
model:
  type: pharia
  attention_bias: true
  attention_dropout: 0.0
  eos_token_id: 0
  bos_token_id: 127179
  pad_token_id: 1
  hidden_act: gelu
  hidden_size: 132
  initializer_range: 0.02
  intermediate_size: 264
  max_position_embeddings: 2048
  mlp_bias: true
  num_attention_heads: 6
  num_hidden_layers: 6
  num_key_value_heads: 6
  rope_scaling: null
  rope_theta: 1000000
  tie_word_embeddings: false
  use_cache: true
  context_length: 2048
  vocab_size: 178
dataset:
  hugging_face_id: TristanBehrens/bach_garland_2024-100K
tokenizer:
  type: whitespace
  fill_token: '[EOS]'
```
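
The tokenizer settings above are simple enough to illustrate. Below is a minimal sketch of the whitespace scheme: split text on whitespace, map each token through a vocabulary, and pad to the context length with the `[EOS]` fill token. The vocabulary construction, helper names, and example tokens are assumptions for illustration, not the exact Helibrunna implementation.

```python
# Illustrative sketch of the whitespace tokenizer described in the
# configuration: split on whitespace, look up ids, pad with '[EOS]'.
# Helper names and example tokens are assumptions, not Helibrunna's API.

FILL_TOKEN = "[EOS]"
CONTEXT_LENGTH = 2048  # model.context_length above


def build_vocabulary(corpus: list[str]) -> dict[str, int]:
    """Assign ids in order of first appearance; the fill token gets id 0,
    consistent with eos_token_id: 0 in the model section."""
    vocabulary = {FILL_TOKEN: 0}
    for text in corpus:
        for token in text.split():
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def encode(text: str, vocabulary: dict[str, int], length: int = CONTEXT_LENGTH) -> list[int]:
    """Split on whitespace, map to ids, then truncate or pad to `length`."""
    ids = [vocabulary[token] for token in text.split()][:length]
    return ids + [vocabulary[FILL_TOKEN]] * (length - len(ids))


# Tiny usage example with made-up tokens.
vocab = build_vocabulary(["PIECE_START TIME_SIGNATURE=4_4 NOTE_ON=60"])
print(encode("PIECE_START NOTE_ON=60", vocab, length=8))  # -> [1, 3, 0, 0, 0, 0, 0, 0]
```

In the trained model the vocabulary holds `vocab_size: 178` such tokens, so a vocabulary built this way from the full dataset should end up with 178 entries.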