Update README.md
README.md CHANGED
@@ -5,18 +5,27 @@ tags: []
# Model Card for Model ID

- <!-- Provide a quick summary of what the model is/does. -->
- DeepAutoAI/Explore_Llama-3.1-8B-Inst is a customized variant of Llama-3.1-8B-Instruct. This customization is achieved by learning
- the distribution of all normalization-layer weights, followed by the distribution of the last transformer block and the 30th and 24th FFN layers of
- the original Llama model.
- A layer-conditional, diffusion-based weight-generation model, which enables sampling for performance enhancement by leveraging
- the learned distributions to optimize the merging process, is used to generate new, diverse weights.
## Model Details

- ### Model Description

<!-- Provide a longer summary of what this model is. -->
@@ -65,17 +74,8 @@ No fine-tuning or architecture generalization
Using a generative model to produce weights can potentially lead to unintended or undesirable outputs. However, the generated content
will still fall within the range of what the base model is inherently capable of producing.
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-
- Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information needed for further recommendations.
-

## How to Get Started with the Model
-
- Use the code below to get started with the model.
-
- [More Information Needed]

## Training Details
We employed a latent diffusion process on pretrained model weights, unlocking the ability to generate diverse, previously unseen neural networks.
# Model Card for Model ID

+ ## Overview
+
+ **DeepAutoAI/Explore_Llama-3.1-8B-Inst** was developed by **deepAuto.ai** by learning the weight distribution of Llama-3.1-8B-Instruct.
+ Our approach leverages the base model's pretrained weights and optimizes them for the **Winogrande** and **ARC-Challenge** datasets by
+ training a latent diffusion model on those weights. Specifically, this model is based on learning the distribution of
+ the last transformer block and the 30th and 24th FFN layers of the original Llama model.
+
+ Through this process, we learn the distribution of the base model's weight space, enabling us to explore optimal configurations.
+ We then sample multiple sets of weights and use the **model-soup averaging technique** to identify the best-performing weights for both datasets.
+ These weights are merged using linear interpolation to create the final model weights for **DeepAutoAI/Explore_Llama-3.1-8B-Inst**, as sketched below.
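As a rough illustration of this sampling-and-merging step, here is a minimal sketch in PyTorch. It assumes the sampled weight sets are available as ordinary state dicts; the `soup` and `lerp` helpers and the random toy tensors are illustrative stand-ins, not the exact pipeline used for this model.

```python
import torch

def soup(state_dicts, coeffs=None):
    """Uniform (or weighted) model-soup average of several state dicts."""
    coeffs = coeffs or [1.0 / len(state_dicts)] * len(state_dicts)
    return {k: sum(c * sd[k].float() for c, sd in zip(coeffs, state_dicts))
            for k in state_dicts[0]}

def lerp(sd_a, sd_b, alpha=0.5):
    """Linear interpolation of two state dicts: (1 - alpha) * A + alpha * B."""
    return {k: torch.lerp(sd_a[k].float(), sd_b[k].float(), alpha) for k in sd_a}

# Toy stand-ins for weight sets sampled from the learned diffusion model.
torch.manual_seed(0)
samples = [{"ffn.weight": torch.randn(4, 4)} for _ in range(4)]

winogrande_best = soup(samples[:2])   # hypothetical best candidates for Winogrande
arc_best = soup(samples[2:])          # hypothetical best candidates for ARC-Challenge
merged = lerp(winogrande_best, arc_best, alpha=0.5)
print(merged["ffn.weight"].shape)     # torch.Size([4, 4])
```

In practice, the choice of candidate sets and the interpolation coefficient would be driven by validation scores on each benchmark.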
+
+ This approach has led to improved performance on previously unseen leaderboard tasks, all without any additional task-specific training.
+
+ The work is currently in progress.
+
## Model Details
<!-- Provide a longer summary of what this model is. -->
Using a generative model to produce weights can potentially lead to unintended or undesirable outputs. However, the generated content
will still fall within the range of what the base model is inherently capable of producing.
## How to Get Started with the Model
+ The work is in progress.
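While the card is being completed, the snippet below is a minimal, illustrative sketch of how such a checkpoint is typically loaded with the `transformers` library; the repository id is taken from the model name above, and the prompt and generation settings are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DeepAutoAI/Explore_Llama-3.1-8B-Inst"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # adjust to your hardware
    device_map="auto",            # requires `accelerate`
)

messages = [{"role": "user", "content": "Explain what a model soup is in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```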
## Training Details
We employed a latent diffusion process on pretrained model weights, unlocking the ability to generate diverse, previously unseen neural networks.
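For readers unfamiliar with weight-space diffusion, the sketch below shows, in heavily simplified form, what a DDPM-style diffusion over flattened layer weights can look like: a small MLP is trained to predict the noise added to weight vectors. The shapes, the denoiser architecture, and the random toy data are assumptions made for illustration only, not the training code used for this model.

```python
# Simplified stand-in for diffusion over pretrained weights: learn to denoise
# flattened weight vectors with an MLP noise predictor (DDPM-style objective).
import torch
import torch.nn as nn

T = 1000                                   # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class Denoiser(nn.Module):
    def __init__(self, dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.SiLU(),
                                 nn.Linear(hidden, hidden), nn.SiLU(),
                                 nn.Linear(hidden, dim))
    def forward(self, x_t, t):
        # condition on the normalized timestep by concatenation
        return self.net(torch.cat([x_t, t.float().unsqueeze(-1) / T], dim=-1))

# Toy "dataset": flattened weight vectors of one target layer (random here;
# in practice these would come from the pretrained model's layers).
weights = torch.randn(256, 128)
model = Denoiser(dim=weights.shape[1])
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

for step in range(100):
    x0 = weights[torch.randint(0, weights.shape[0], (32,))]
    t = torch.randint(0, T, (32,))
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise   # forward (noising) process
    loss = nn.functional.mse_loss(model(x_t, t), noise)    # predict the added noise
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The card above describes a layer-conditional, latent-space version of this idea; the raw-weight formulation here is only meant to convey the core training objective.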