bedio committed · verified
Commit 55a0899 · 1 Parent(s): 24855a7

Update README.md

Files changed (1):
  1. README.md +80 -136

README.md CHANGED
@@ -1,6 +1,5 @@
  ---
  library_name: transformers
- tags: []
  model-index:
  - name: Explore_Llama-3.2-1B-Inst_v1
    results:
@@ -17,7 +16,8 @@ model-index:
        value: 49.99
        name: strict accuracy
      source:
-       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
    - task:
        type: text-generation
@@ -32,7 +32,8 @@ model-index:
        value: 4.26
        name: normalized accuracy
      source:
-       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
    - task:
        type: text-generation
@@ -47,7 +48,8 @@ model-index:
        value: 1.28
        name: exact match
      source:
-       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
    - task:
        type: text-generation
@@ -59,10 +61,11 @@ model-index:
        num_few_shot: 0
      metrics:
      - type: acc_norm
-       value: 0.0
        name: acc_norm
      source:
-       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
    - task:
        type: text-generation
@@ -77,7 +80,8 @@ model-index:
        value: 5.2
        name: acc_norm
      source:
-       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
    - task:
        type: text-generation
@@ -94,8 +98,16 @@ model-index:
        value: 2.99
        name: accuracy
      source:
-       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
  ---

  # Model Card for Model ID
@@ -104,204 +116,136 @@ model-index:

  ## Model Details

- ### Model Description

  <!-- Provide a longer summary of what this model is. -->

- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

  ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

  ## Uses

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

- ### Direct Use

  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- [More Information Needed]

- ### Downstream Use [optional]

  <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]

  ### Out-of-Scope Use

  <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

- [More Information Needed]

  ## Bias, Risks, and Limitations

  <!-- This section is meant to convey both technical and sociotechnical limitations. -->

- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

  ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]

  ## Training Details

  ### Training Data

  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

  ### Training Procedure

  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]
-
- [More Information Needed]

- #### Training Hyperparameters
-
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-
- #### Speeds, Sizes, Times [optional]
-
  <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]
-
- ## Evaluation
-
- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]

  ## Environmental Impact

  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]

  ## Technical Specifications [optional]

  ### Model Architecture and Objective

- [More Information Needed]

  ### Compute Infrastructure

- [More Information Needed]

  #### Hardware

- [More Information Needed]

  #### Software

- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]

  ## Model Card Contact
-
- [More Information Needed]
- # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
- Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_DeepAutoAI__Explore_Llama-3.2-1B-Inst_v1)
-
- | Metric             |Value|
- |--------------------|----:|
- | Avg.               |10.62|
- | IFEval (0-Shot)    |49.99|
- | BBH (3-Shot)       | 4.26|
- | MATH Lvl 5 (4-Shot)| 1.28|
- | GPQA (0-shot)      | 0.00|
- | MuSR (0-shot)      | 5.20|
- | MMLU-PRO (5-shot)  | 2.99|
-
 
  ---
  library_name: transformers

  model-index:
  - name: Explore_Llama-3.2-1B-Inst_v1
    results:

        value: 49.99
        name: strict accuracy
      source:
+       url: >-
+         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
    - task:
        type: text-generation

        value: 4.26
        name: normalized accuracy
      source:
+       url: >-
+         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
    - task:
        type: text-generation

        value: 1.28
        name: exact match
      source:
+       url: >-
+         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
    - task:
        type: text-generation

        num_few_shot: 0
      metrics:
      - type: acc_norm
+       value: 0
        name: acc_norm
      source:
+       url: >-
+         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
    - task:
        type: text-generation

        value: 5.2
        name: acc_norm
      source:
+       url: >-
+         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
    - task:
        type: text-generation

        value: 2.99
        name: accuracy
      source:
+       url: >-
+         https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1
        name: Open LLM Leaderboard
+ license: apache-2.0
+ language:
+ - en
+ metrics:
+ - accuracy
+ base_model:
+ - meta-llama/Llama-3.2-1B
  ---

  # Model Card for Model ID

+ ## Overview
+
+ **DeepAutoAI/Explore_Llama-3.2-1B-Inst** is developed by **deepAuto.ai** by learning the weight distribution of Llama-3.2-1B-Instruct.
+ Our approach leverages the base model's pretrained weights and optimizes them for the **Winogrande** and **ARC-Challenge** datasets by
+ training a latent diffusion model on the pretrained weights. Specifically, this model is based on learning the distribution of transformer layers 16 through 31.
+
+ Through this process, we learn the distribution of the base model's weight space, enabling us to explore optimal configurations.
+ We then sample multiple sets of weights and use the **model-soup averaging technique** to identify the best-performing weights for both datasets.
+ These weights are merged using linear interpolation to create the final model weights for **DeepAutoAI/Explore_Llama-3.2-1B-Inst**.
+
+ This approach has led to improved performance on previously unseen leaderboard tasks, all without any additional task-specific training.
+
+ This work is currently in progress.
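
As a rough illustration of the merging step described above, the sketch below uniformly averages several sampled weight sets (a model soup) and then linearly interpolates the result with the base weights. The sampling helper and the coefficient `alpha` are hypothetical placeholders, not the released pipeline:

```python
# Illustrative sketch only: uniform "model soup" over sampled weight sets,
# followed by linear interpolation with the base weights.
import torch

def soup(state_dicts):
    """Uniformly average a list of state dicts (model-soup style)."""
    keys = state_dicts[0].keys()
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in keys}

def interpolate(base_sd, new_sd, alpha=0.5):
    """Linearly interpolate between the base and the merged weights."""
    return {k: (1 - alpha) * base_sd[k].float() + alpha * new_sd[k].float()
            for k in base_sd}

# candidates = [sample_layer_weights() for _ in range(8)]  # hypothetical diffusion sampler
# merged = interpolate(base_model.state_dict(), soup(candidates), alpha=0.5)
```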

  ## Model Details

  <!-- Provide a longer summary of what this model is. -->

+ We trained a diffusion model to learn the distribution of a subset of Llama layers, enabling the generation of weights that improve performance.
+ We generate task-specific weights for winogrande and arc_challenge, then transfer the best model for leaderboard benchmarking.

+ - **Developed by:** DeepAuto.ai
+ - **Funded by [optional]:** DeepAuto.ai
+ - **Shared by [optional]:** DeepAuto.ai
+ - **Model type:** llama-3.2-1B
+ - **Language(s) (NLP):** English
+ - **License:** Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
+ - **Finetuned from model [optional]:** No fine-tuning

  ### Model Sources [optional]

  <!-- Provide the basic links for the model. -->

+ - **Repository:** Under construction
+ - **Paper [optional]:** To be announced

  ## Uses

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ The direct use case of our work is to improve existing model performance, as well as to generate task-specific weights with no training.

  <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+ Performance improvement of existing large models with limited compute.
 
  ### Out-of-Scope Use

  <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

+ No fine-tuning or architecture generalization.

  ## Bias, Risks, and Limitations

  <!-- This section is meant to convey both technical and sociotechnical limitations. -->

+ Using a generative model to produce weights can potentially lead to unintended or undesirable outputs. However, the generated content
+ will still fall within the range of what the base model is inherently capable of producing.

  ## How to Get Started with the Model
+ This work is in progress; in the meantime, a loading sketch is shown below.
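
Until official instructions are published, here is a minimal sketch along standard 🤗 Transformers lines, assuming the checkpoint loads as an ordinary causal LM from this repository:

```python
# Minimal loading sketch (assumes the checkpoint is a standard causal LM;
# device_map="auto" additionally requires the `accelerate` package).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```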

  ## Training Details
+ We employed a latent diffusion process on pretrained model weights, unlocking the ability to generate diverse, previously unseen neural networks.
+ Remarkably, even within the constraints of one-shot learning, our approach consistently produces a wide range of weight variations, each offering
+ distinct performance characteristics. These generated weights not only open opportunities for weight averaging and model merging but also have the
+ potential to significantly enhance model performance. Moreover, they enable the creation of task-specific weights, tailored to optimize performance
+ for specialized applications.

  ### Training Data
+ The training data used to produce the current model is the base model's pretrained weights.

  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

  ### Training Procedure

  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

+ - We selected a set of layers and combined their pretrained weights, then trained a Variational Autoencoder (VAE) to encode these weights into the layer dimension.
+ - We conditionally trained a diffusion model on this set of weights, allowing individual sampling of layer-specific weights.
+ - All selected layers were encoded into a 1024-dimensional space; this model exclusively contains the sampled weights for layer normalization (see the sketch after this list).
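
To make the procedure concrete, here is an illustrative sketch of the VAE stage, not the released training code; the layer sizes and the stand-in weight batch are assumptions. A conditional diffusion model would then be trained in the resulting 1024-dimensional latent space, conditioned on the layer index, to sample layer-specific weights:

```python
# Illustrative sketch of the VAE over per-layer weight vectors (assumed sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 1024  # per the card: layers are encoded into a 1024-d space

class WeightVAE(nn.Module):
    def __init__(self, weight_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(weight_dim, 2048), nn.SiLU(),
                                     nn.Linear(2048, 2 * LATENT_DIM))
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 2048), nn.SiLU(),
                                     nn.Linear(2048, weight_dim))

    def forward(self, w):
        mu, logvar = self.encoder(w).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        recon = self.decoder(z)
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum(-1).mean()
        return recon, kl

# Stand-in batch: one flattened normalization-weight vector per selected layer.
weights = torch.randn(16, 4096)

vae = WeightVAE(weights.shape[1])
recon, kl = vae(weights)
loss = F.mse_loss(recon, weights) + 1e-4 * kl  # reconstruction + KL objective
loss.backward()
```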

  <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

  ## Environmental Impact

  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+ - **Hardware Type:** Nvidia A100 40GB
+ - **Hours used:** 8 (the VAE was trained for 4 hours and the diffusion process for 4 hours)
+ - **Compute Region:** South Korea
+ - **Carbon Emitted:** 0.96 kg CO2eq

  ## Technical Specifications [optional]

  ### Model Architecture and Objective

+ We used latent diffusion for weight generation, with Llama-3.2-1B as the target architecture.
+
+ The primary objective of this weight generation process was to demonstrate that by learning only the distribution
+ of a few layers' weights (normalization layers in this case) in a 1-billion-parameter model, it is possible to significantly enhance the
+ model's capabilities. Notably, this is achieved using a fraction of the computational resources and without the
+ need for fine-tuning, showcasing the efficiency and potential of this approach.

  ### Compute Infrastructure

+ Nvidia A100 cluster

  #### Hardware

+ A single Nvidia A100

  #### Software

+ The model was evaluated with the lm-evaluation-harness (lm-harness), version 0.4.3; an example invocation is sketched below.
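
For reference, a minimal evaluation sketch via the harness's Python API (v0.4.x); the task list and batch size here are illustrative assumptions, not the exact settings of the reported runs:

```python
# Illustrative evaluation sketch using the lm-evaluation-harness Python API.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=DeepAutoAI/Explore_Llama-3.2-1B-Inst_v1",
    tasks=["winogrande", "arc_challenge"],
    batch_size=8,
)
print(results["results"])
```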

  ## Model Card Contact
+ deepauto.ai