
Quantization made by Richard Erkhov.

Github | Discord | Request more models

Qwen2-1.5Moe - bnb 8bits

Original model description:

```yaml
base_model:
- Qwen/Qwen2-1.5B
- Replete-AI/Replete-Coder-Qwen2-1.5b
license: apache-2.0
tags:
- moe
- frankenmoe
- merge
- mergekit
- lazymergekit
- Qwen/Qwen2-1.5B
- Replete-AI/Replete-Coder-Qwen2-1.5b
```

QwenMoEAriel

QwenMoEAriel is a Mixture of Experts (MoE) made with the following models using LazyMergekit:

  • Qwen/Qwen2-1.5B
  • Replete-AI/Replete-Coder-Qwen2-1.5b

🧩 Configuration

```yaml
base_model: Qwen/Qwen2-1.5B
architecture: qwen
experts:
  - source_model: Qwen/Qwen2-1.5B
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
      - "I want"
  - source_model: Replete-AI/Replete-Coder-Qwen2-1.5b
    positive_prompts:
      - "code"
      - "python"
      - "javascript"
      - "programming"
      - "algorithm"
shared_experts:
  - source_model: Qwen/Qwen2-1.5B
    positive_prompts: # required by Qwen MoE for "hidden" gate mode, otherwise not allowed
      - "chat"
    # optional, but recommended:
    residual_scale: 0.1 # downweight the shared expert's output to prevent overcooking the model
```
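To turn this configuration into an actual checkpoint, mergekit's MoE entry point is pointed at the YAML file. A minimal notebook-style sketch, assuming the config above is saved as config.yaml and that mergekit is installed from PyPI; the output path ./Qwen2-1.5Moe is illustrative, not taken from the original card:

```python
# Install mergekit, then run the MoE merge described by config.yaml.
# The output directory name is an assumption.
!pip install -qU mergekit
!mergekit-moe config.yaml ./Qwen2-1.5Moe
```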

💻 Usage

```python
!pip install -qU transformers bitsandbytes accelerate einops

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Run on the GPU if one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

model = AutoModelForCausalLM.from_pretrained(
    "femiari/Qwen2-1.5Moe",
    torch_dtype=torch.float16,
    ignore_mismatched_sizes=True,
).to(device)
tokenizer = AutoTokenizer.from_pretrained("femiari/Qwen2-1.5Moe")

# Format the conversation with the tokenizer's chat template.
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512,
)
# Drop the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
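Since this repository hosts the bitsandbytes 8-bit quantization, the checkpoint can also be loaded directly in 8-bit rather than float16. A minimal sketch using the standard transformers BitsAndBytesConfig API; applying it to this repo is an assumption, not from the original card, and it requires a CUDA GPU:

```python
# Load the same repo in 8-bit via bitsandbytes (requires a CUDA GPU).
# BitsAndBytesConfig(load_in_8bit=True) is standard transformers API;
# using it with this repo id is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_8bit = AutoModelForCausalLM.from_pretrained(
    "femiari/Qwen2-1.5Moe",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # let accelerate place the quantized weights
    ignore_mismatched_sizes=True,
)
tokenizer = AutoTokenizer.from_pretrained("femiari/Qwen2-1.5Moe")
```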
Model size: 3.86B params (safetensors; tensor types F32, FP16, I8).