---
language:
- en
license: mit
library_name: transformers
tags:
- mergekit
- merge
- unsloth
base_model:
- LeroyDyer/Mixtral_AI_CyberBrain_2.0
- ezelikman/quietstar-8-ahead
---

Hopefully this merge took correctly! It is meant to enable thoughts to be displayed. It is obviously untrained and will still need fine-tuning, and it has not yet been correctly coded for true management via transformers pretrained args.

I will try to add the other architecture, leaving it available to perhaps load with a different remote auto mapping. I will leave both auto mappings here and test both models to see which configuration loads correctly for training, and then which loads correctly for usage, as this has also been a minor issue. The internal heads have default settings; with the remote code installed they should be configurable.

# merge

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Merge Details

### Merge Method

This model was merged using the SLERP merge method.
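For readers unfamiliar with SLERP (spherical linear interpolation), here is a minimal pure-Python sketch of the idea on plain lists of floats. It is only an illustration, not mergekit's implementation: the function name `slerp`, the `eps` threshold, and the fallback to ordinary linear interpolation for near-parallel vectors are assumptions made for this example. As far as I understand the mergekit convention, `t = 0` returns the base model's tensor and `t = 1` returns the other model's.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two equal-length vectors.

    t = 0 returns v0 and t = 1 returns v1 (hypothetical helper, for illustration).
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))

    def lerp():
        # Plain linear interpolation, used when the angle is not well defined.
        return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]

    if norm0 < eps or norm1 < eps:
        return lerp()

    # Cosine of the angle between the two vectors, clamped for acos safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)

    if abs(sin_theta) < eps:
        # Vectors are (anti)parallel: fall back to linear interpolation.
        return lerp()

    w0 = math.sin((1.0 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]
```

For two orthogonal unit vectors, `slerp(0.5, [1.0, 0.0], [0.0, 1.0])` lands on the unit circle halfway between them, whereas plain averaging would shrink the norm.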
### Models Merged

The following models were included in the merge:
* [LeroyDyer/Mixtral_AI_CyberBrain_2.0](https://huggingface.co/LeroyDyer/Mixtral_AI_CyberBrain_2.0)
* [ezelikman/quietstar-8-ahead](https://huggingface.co/ezelikman/quietstar-8-ahead)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
slices:
  - sources:
      - model: LeroyDyer/Mixtral_AI_CyberBrain_2.0
        layer_range: [0, 32]
      - model: ezelikman/quietstar-8-ahead
        layer_range: [0, 32]
# or, the equivalent models: syntax:
# models:
#   - model: mistralai/Mistral-7B-Instruct-v0.2
# LARGER MODEL MUST BE BASE or
# BASE MODEL MUST BE THE TOKENIZER YOU WISH TO ADOPT
# so for models with customized processes they must be the base model
# If the base model has remote code then this must be collected and added
# to the repo afterwards, and the config file adjusted to allow for automapping to your new repo
#   - model: yanismiraoui/Yarn-Mistral-7b-128k-sharded
merge_method: slerp
base_model: ezelikman/quietstar-8-ahead
parameters:
  t:
    - filter: self_attn
      value: [0.3, 0.6, 0.3786, 0.6, 0.6]
    - filter: mlp
      value: [0.7, 0.4, 0.6, 0.4, 0.7]
    - value: 0.5 # fallback for rest of tensors
dtype: float16
```
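The `t` lists in the configuration are shorter than the 32 layers they apply to. My understanding is that mergekit treats such a list as a gradient and interpolates it linearly across layer depth; the sketch below approximates that behaviour under that assumption. The helper `gradient_value` and the constants are hypothetical names introduced for this example and are not part of mergekit's API.

```python
def gradient_value(values, layer_idx, num_layers):
    """Linearly interpolate a short list of anchor values across layer depth.

    Hypothetical helper: approximates how a list-valued `t` might map to layers.
    """
    if num_layers <= 1 or len(values) == 1:
        return float(values[0])

    # Relative depth of this layer in [0, 1].
    pos = layer_idx / (num_layers - 1)

    # Position along the anchor list, then blend the two nearest anchors.
    scaled = pos * (len(values) - 1)
    i0 = min(int(scaled), len(values) - 1)
    i1 = min(i0 + 1, len(values) - 1)
    frac = scaled - i0
    return (1.0 - frac) * values[i0] + frac * values[i1]


# Anchor values copied from the configuration above.
SELF_ATTN_T = [0.3, 0.6, 0.3786, 0.6, 0.6]
MLP_T = [0.7, 0.4, 0.6, 0.4, 0.7]
NUM_LAYERS = 32  # from layer_range: [0, 32]

# Per-layer interpolation factors under the assumed gradient rule.
self_attn_schedule = [gradient_value(SELF_ATTN_T, i, NUM_LAYERS) for i in range(NUM_LAYERS)]
mlp_schedule = [gradient_value(MLP_T, i, NUM_LAYERS) for i in range(NUM_LAYERS)]
```

Under this reading, the first and last layers use the first and last anchor values exactly, and tensors matching neither filter use the constant fallback of 0.5.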