Update README.md
README.md CHANGED
@@ -5,18 +5,27 @@ tags: []
# Model Card for Model ID

- <!-- Provide a quick summary of what the model is/does. -->
- DeepAutoAI/Explore_Llama-3.1-8B-Inst is a customized variant of Llama-3.1-8B-Instruct. This customization is achieved by learning
- the distribution of all normalization-layer weights, followed by the distribution of the last transformer block and the 30th and 24th FFN layers of
- the original Llama model.
- A layer-conditional, diffusion-based weight-generation model, which enables sampling for performance enhancement by leveraging
- the learned distributions to optimize the merging process, is used to generate new, diverse weights.
## Model Details

- ### Model Description

<!-- Provide a longer summary of what this model is. -->
@@ -65,17 +74,8 @@ No fine-tuning or architecture generalization
Using a generative model to produce weights can potentially lead to unintended or undesirable outputs. However, the generated content
will still fall within the range of what the base model is inherently capable of producing.
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information needed for further recommendations.
-

## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]

## Training Details
We employed a latent diffusion process on pretrained model weights, unlocking the ability to generate diverse, previously unseen neural networks.
# Model Card for Model ID

+ ## Overview
+
+ **DeepAutoAI/Explore_Llama-3.1-8B-Inst** was developed by **deepAuto.ai** by learning the weight distribution of Llama-3.1-8B-Instruct.
+ Our approach leverages the base model's pretrained weights and optimizes them for the **Winogrande** and **ARC-Challenge** datasets by
+ training a latent diffusion model on those weights. Specifically, this model is based on learning the distribution of
+ the last transformer block and the 30th and 24th FFN layers of the original Llama model.
+
+ Through this process, we learn the distribution of the base model's weight space, enabling us to explore optimal configurations.
+ We then sample multiple sets of weights and use the **model-soup averaging technique** to identify the best-performing weights for both datasets.
+ These weights are merged using linear interpolation to create the final model weights for **DeepAutoAI/Explore_Llama-3.1-8B-Inst**, as sketched below.
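As a rough illustration of this sampling-and-merging step, here is a minimal sketch in PyTorch. It assumes the sampled weight sets are available as ordinary state dicts; the `soup` and `lerp` helpers and the random toy tensors are illustrative stand-ins, not the exact pipeline used for this model.

```python
import torch

def soup(state_dicts, coeffs=None):
    """Uniform (or weighted) model-soup average of several state dicts."""
    coeffs = coeffs or [1.0 / len(state_dicts)] * len(state_dicts)
    return {k: sum(c * sd[k].float() for c, sd in zip(coeffs, state_dicts))
            for k in state_dicts[0]}

def lerp(sd_a, sd_b, alpha=0.5):
    """Linear interpolation of two state dicts: (1 - alpha) * A + alpha * B."""
    return {k: torch.lerp(sd_a[k].float(), sd_b[k].float(), alpha) for k in sd_a}

# Toy stand-ins for weight sets sampled from the learned diffusion model.
torch.manual_seed(0)
samples = [{"ffn.weight": torch.randn(4, 4)} for _ in range(4)]

winogrande_best = soup(samples[:2])   # hypothetical best candidates for Winogrande
arc_best = soup(samples[2:])          # hypothetical best candidates for ARC-Challenge
merged = lerp(winogrande_best, arc_best, alpha=0.5)
print(merged["ffn.weight"].shape)     # torch.Size([4, 4])
```

In practice, the choice of candidate sets and the interpolation coefficient would be driven by validation scores on each benchmark.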
+
+ This approach has led to improved performance on previously unseen leaderboard tasks, all without any additional task-specific training.
+
+ The work is currently in progress.
+
## Model Details
<!-- Provide a longer summary of what this model is. -->
Using a generative model to produce weights can potentially lead to unintended or undesirable outputs. However, the generated content
will still fall within the range of what the base model is inherently capable of producing.
## How to Get Started with the Model
+ The work is in progress.
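While the card is being completed, the snippet below is a minimal, illustrative sketch of how such a checkpoint is typically loaded with the `transformers` library; the repository id is taken from the model name above, and the prompt and generation settings are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DeepAutoAI/Explore_Llama-3.1-8B-Inst"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # adjust to your hardware
    device_map="auto",            # requires `accelerate`
)

messages = [{"role": "user", "content": "Explain what a model soup is in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```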
## Training Details
We employed a latent diffusion process on pretrained model weights, unlocking the ability to generate diverse, previously unseen neural networks.
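For readers unfamiliar with weight-space diffusion, the sketch below shows, in heavily simplified form, what a DDPM-style diffusion over flattened layer weights can look like: a small MLP is trained to predict the noise added to weight vectors. The shapes, the denoiser architecture, and the random toy data are assumptions made for illustration only, not the training code used for this model.

```python
# Simplified stand-in for diffusion over pretrained weights: learn to denoise
# flattened weight vectors with an MLP noise predictor (DDPM-style objective).
import torch
import torch.nn as nn

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class Denoiser(nn.Module):
    def __init__(self, dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.SiLU(),
                                 nn.Linear(hidden, hidden), nn.SiLU(),
                                 nn.Linear(hidden, dim))
    def forward(self, x_t, t):
        # condition on the normalized timestep by concatenation
        return self.net(torch.cat([x_t, t.float().unsqueeze(-1) / T], dim=-1))

# Toy "dataset": flattened weight vectors of one target layer (random here;
# in practice these would come from the pretrained model's layers).
weights = torch.randn(256, 128)
model = Denoiser(dim=weights.shape[1])
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(100):
    x0 = weights[torch.randint(0, weights.shape[0], (32,))]
    t = torch.randint(0, T, (32,))
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward (noising) process
    loss = nn.functional.mse_loss(model(x_t, t), noise)    # predict the added noise
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The card above describes a layer-conditional, latent-space version of this idea; the raw-weight formulation here is only meant to convey the core training objective.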