bedio committed on
Commit 7de2179 · verified · 1 Parent(s): 45bb133

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -120,11 +120,11 @@ base_model:
 
  **DeepAutoAI/Explore_Llama-3.2-1B-Inst** is developed by **deepAuto.ai** by learning the distribution of llama-3.2-1B-instruct.
  Our approach leverages the base model’s pretrained weights and optimizes them for the **Winogrande** and **ARC-Challenge** datasets by
- training a latent diffusion model on the pretrained weights. specifically , this model is based on learning the distrinution of transformer layers from 16 to 31.
+ training a latent diffusion model on the pretrained weights. specifically , this model is based on learning the distrinution of the top 2 layer of layer in feed forward
+ or attention layers based on spectrum based optimum layer selection.
 
- Through this process, we learn the distribution of the base model's weight space, enabling us to explore optimal configurations.
- We then sample multiple sets of weights, using the **model-soup averaging technique** to identify the best-performing weights for both datasets.
- These weights are merged using linear interpolation to create the final model weights for **DeepAutoAI/Explore_Llama-3.1-1B-Inst**.
+
+ We directly transfer the weights of the best model on both winogrande and arc-challenge for **DeepAutoAI/Explore_Llama-3.1-1B-Inst**.
 
  This approach has led to improved performance on previously unseen leaderboard tasks, all without any additional task-specific training.
 
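
The lines removed by this commit mention model-soup averaging of sampled weights followed by a merge by linear interpolation. The README gives no implementation details, so the following is only a minimal sketch of those two operations under stated assumptions: the `Weights` type, the function names `average_soup` and `lerp_weights`, the `alpha` value, and the toy numbers are all hypothetical, and plain Python lists stand in for real model tensors.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat values.
Weights = Dict[str, List[float]]


def average_soup(candidates: List[Weights]) -> Weights:
    """Uniform model soup: element-wise mean of several weight sets."""
    if not candidates:
        raise ValueError("need at least one candidate weight set")
    n = len(candidates)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in candidates))]
        for name in candidates[0]
    }


def lerp_weights(a: Weights, b: Weights, alpha: float) -> Weights:
    """Linear interpolation per parameter: (1 - alpha) * a + alpha * b."""
    return {
        name: [(1.0 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])]
        for name in a
    }


# Toy numbers only (not real model weights): two sampled sets per task.
winogrande_soup = average_soup([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}])
arc_soup = average_soup([{"w": [0.0, 0.0]}, {"w": [2.0, 2.0]}])

# Assumed equal weighting of the two task-specific soups.
merged = lerp_weights(winogrande_soup, arc_soup, alpha=0.5)
print(merged)
```

With real checkpoints the same arithmetic would be applied tensor by tensor. How candidates were selected and which interpolation coefficient was used are not stated in the README.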