bedio committed on
Commit 9752180 · verified · 1 Parent(s): e362d00

Update README.md

Files changed (1)
  1. README.md +17 -17
README.md CHANGED
@@ -5,18 +5,27 @@ tags: []
 
  # Model Card for Model ID
 
- <!-- Provide a quick summary of what the model is/does. -->
- DeepAutoAI/Explore_Llama-3.1-8B-Inst is a customized variant of Llama-2.1-8B-Instruct. This customization is achieved by learning
- the distribution of all normalization layer weights followed by the distribution of the last transformer block, 30, and 24th FFN layers of
- the original Llama model.
- A layer-conditional diffusion based weights generation model that enables sampling for performance enhancement by leveraging
- the learned distributions to optimize the merging process is used to generate newly diverse weights
 
 
+ ## Overview
+
+
+ **DeepAutoAI/Explore_Llama-3.1-8B-Inst**, developed by **deepAuto.ai**, is derived by learning the weight distribution of Llama-3.1-8B-Instruct.
+ Our approach leverages the base model’s pretrained weights and optimizes them for the **Winogrande** and **ARC-Challenge** datasets by
+ training a latent diffusion model on the pretrained weights. Specifically, this model is based on learning the distribution of
+ the last transformer block and the 30th and 24th FFN layers of the original Llama model.
+
+ Through this process, we learn the distribution of the base model's weight space, enabling us to explore optimal configurations.
+ We then sample multiple sets of weights and use the **model-soup averaging technique** to identify the best-performing weights for both datasets.
+ These weights are merged using linear interpolation to create the final model weights for **DeepAutoAI/Explore_Llama-3.1-8B-Inst**.
+
+ This approach has led to improved performance on previously unseen leaderboard tasks, all without any additional task-specific training.
+
+ This work is currently in progress.
+
 
  ## Model Details
 
- ### Model Description
 
  <!-- Provide a longer summary of what this model is. -->
 
@@ -65,17 +74,8 @@ No fine-tuning or architecture generalization
  Using a generative model to produce weights can potentially lead to unintended or undesirable outputs. However, the generated content
  will still fall within the range of what the base model is inherently capable of producing.
 
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
  ## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]
+ This work is in progress.
 
  ## Training Details
  We employed a latent diffusion process on pretrained model weights, unlocking the ability to generate diverse, previously unseen neural networks.
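The updated Overview describes sampling several weight sets, averaging them model-soup style, and merging the per-dataset results by linear interpolation. The sketch below illustrates only that averaging-and-merging arithmetic under stated assumptions; it is not deepAuto.ai's code. The plain-list `StateDict` type, the helper names `uniform_soup` and `lerp`, the toy values, and the interpolation coefficient `alpha = 0.5` are all hypothetical, and "soup" here means a plain uniform (element-wise mean) soup.

```python
from typing import Dict, List

# Hypothetical, simplified stand-in for a model state dict:
# parameter name -> flat list of float weights.
StateDict = Dict[str, List[float]]


def uniform_soup(candidates: List[StateDict]) -> StateDict:
    """Element-wise mean of several weight sets (a uniform model soup)."""
    if not candidates:
        raise ValueError("need at least one candidate weight set")
    n = len(candidates)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in candidates))]
        for name in candidates[0]
    }


def lerp(a: StateDict, b: StateDict, alpha: float) -> StateDict:
    """Linear interpolation (1 - alpha) * a + alpha * b, parameter by parameter."""
    return {
        name: [(1.0 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])]
        for name in a
    }


if __name__ == "__main__":
    # Toy sampled weight sets for one hypothetical FFN parameter.
    winogrande_samples = [{"ffn": [1.0, 2.0]}, {"ffn": [3.0, 4.0]}]
    arc_samples = [{"ffn": [0.0, 0.0]}, {"ffn": [2.0, 2.0]}]

    wino_soup = uniform_soup(winogrande_samples)  # {"ffn": [2.0, 3.0]}
    arc_soup = uniform_soup(arc_samples)          # {"ffn": [1.0, 1.0]}

    # alpha = 0.5 is an arbitrary illustrative choice, not a documented value.
    merged = lerp(wino_soup, arc_soup, alpha=0.5)
    print(merged)  # {'ffn': [1.5, 2.0]}
```

Averaging and interpolation only make sense when every weight set shares the same parameter names and shapes, which holds when all samples target the same layers of the same base model.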
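The Training Details line mentions a latent diffusion process over pretrained weights. For orientation only, the sketch below shows the standard DDPM forward (noising) step applied to a latent vector `z0`, which some encoder would produce from flattened layer weights. The linear beta schedule and its endpoints, the helper names `linear_alpha_bars` and `q_sample`, and the toy latent are assumptions; the model card does not specify the schedule, the encoder, or the denoiser.

```python
import math
import random
from typing import List


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an (assumed) linear beta schedule."""
    alpha_bars: List[float] = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(z0: List[float], t: int, alpha_bars: List[float],
             rng: random.Random) -> List[float]:
    """Forward noising: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in z0]


if __name__ == "__main__":
    # z0 is a hypothetical latent code for some flattened layer weights.
    z0 = [0.12, -0.05, 0.30, 0.01]
    alpha_bars = linear_alpha_bars(num_steps=1000)
    rng = random.Random(0)
    print(q_sample(z0, t=10, alpha_bars=alpha_bars, rng=rng))   # mostly signal
    print(q_sample(z0, t=999, alpha_bars=alpha_bars, rng=rng))  # almost pure noise
```

In standard DDPM training, a denoiser learns to predict `eps` from `z_t` and `t`; running that learned reverse process from noise is what allows new weight latents to be sampled.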