DeepAuto-AI/Explore_Llama-3.2-1B-Inst_v1.1

bedio committed on Oct 17, 2024

Commit 7de2179 · verified · 1 Parent(s): 45bb133

Update README.md

Files changed (1)

README.md +4 -4

@@ -120,11 +120,11 @@ base_model:
 **DeepAutoAI/Explore_Llama-3.2-1B-Inst** is developed by **deepAuto.ai** by learning the distribution of llama-3.2-1B-instruct.
 Our approach leverages the base model’s pretrained weights and optimizes them for the **Winogrande** and **ARC-Challenge** datasets by
-training a latent diffusion model on the pretrained weights. Specifically, this model is based on learning the distribution of transformer layers 16 to 31.
-Through this process, we learn the distribution of the base model's weight space, enabling us to explore optimal configurations.
-We then sample multiple sets of weights, using the **model-soup averaging technique** to identify the best-performing weights for both datasets.
-These weights are merged using linear interpolation to create the final model weights for **DeepAutoAI/Explore_Llama-3.1-1B-Inst**.
 This approach has led to improved performance on previously unseen leaderboard tasks, all without any additional task-specific training.

 **DeepAutoAI/Explore_Llama-3.2-1B-Inst** is developed by **deepAuto.ai** by learning the distribution of llama-3.2-1B-instruct.
 Our approach leverages the base model’s pretrained weights and optimizes them for the **Winogrande** and **ARC-Challenge** datasets by
+training a latent diffusion model on the pretrained weights. Specifically, this model is based on learning the distribution of the top two feed-forward
+or attention layers, chosen by spectrum-based optimal layer selection.
+We directly transfer the weights of the model that performs best on both Winogrande and ARC-Challenge to **DeepAutoAI/Explore_Llama-3.1-1B-Inst**.
 This approach has led to improved performance on previously unseen leaderboard tasks, all without any additional task-specific training.
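
The two README revisions describe different ways of turning sampled weight sets into final weights: the earlier text averages candidates (model soup) and merges them by linear interpolation, while the later text transfers the single best candidate directly. The following is only a minimal sketch of those three operations, not the authors' implementation. It assumes each weight set can be represented as a mapping from parameter name to a flat list of floats; `Weights`, `uniform_soup`, `lerp`, `pick_best`, and the scoring callback are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a model state dict: parameter name -> flat values.
Weights = Dict[str, List[float]]


def uniform_soup(candidates: Sequence[Weights]) -> Weights:
    """Average every parameter element-wise across all candidate weight sets."""
    if not candidates:
        raise ValueError("need at least one candidate")
    n = len(candidates)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in candidates))]
        for name in candidates[0]
    }


def lerp(a: Weights, b: Weights, alpha: float) -> Weights:
    """Linear interpolation per parameter: (1 - alpha) * a + alpha * b."""
    return {
        name: [(1.0 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])]
        for name in a
    }


def pick_best(candidates: Sequence[Weights],
              score: Callable[[Weights], float]) -> Weights:
    """Return the candidate with the highest score (direct transfer)."""
    return max(candidates, key=score)


if __name__ == "__main__":
    # Toy weight sets; a real score would be a benchmark evaluation.
    w1 = {"layer.weight": [1.0, 3.0]}
    w2 = {"layer.weight": [3.0, 5.0]}
    print(uniform_soup([w1, w2]))
    print(lerp(w1, w2, 0.25))
    print(pick_best([w1, w2], score=lambda w: sum(w["layer.weight"])))
```

In practice the scoring callback would evaluate each candidate on a held-out benchmark split, and the mappings would hold tensors rather than Python lists; both details are assumptions here, since the README does not specify them.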