
Model Card for iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat

This is an adapter trained on the UltraChat 200k dataset for the TinyLlama-1.1B Intermediate Step 1431k 3T model, with the base model quantized to 4-bit precision.

adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

Model Details

The base model was quantized with bitsandbytes using the following configuration (note that BitsAndBytesConfig is imported from transformers, not from bitsandbytes):

from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # Use 4-bit precision model loading
    bnb_4bit_quant_type="nf4",          # Quantization type
    bnb_4bit_compute_dtype="float16",   # Compute data type
    bnb_4bit_use_double_quant=True      # Apply nested quantization
)
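For context, adapters like this are typically trained on top of the 4-bit base model roughly as follows. This is a minimal sketch; the LoRA hyperparameters below are illustrative assumptions, not the values actually used for this adapter:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load the base model in 4-bit and prepare it for adapter training
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# Illustrative LoRA settings (assumed, not the actual training configuration)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, lora_config)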

Model Description

These adapters were trained on the UltraChat 200k dataset for the TinyLlama-1.1B Intermediate Step 1431k 3T model, with the base model loaded in 4-bit precision using the BitsAndBytes configuration above.

How to use

Method 1: Direct loading via AutoPeftModel

from peft import AutoPeftModelForCausalLM
from transformers import pipeline, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'

# Loads the base model and attaches the adapter in a single call
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_name,
    device_map="auto"
)
# Merge the adapter weights into the base model and drop the PEFT wrapper
model = model.merge_and_unload()

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
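To reuse the merged model later without PEFT, it can be saved locally; the output directory name below is arbitrary:

# Persist the merged weights and tokenizer to disk
model.save_pretrained("tinyllama-ultrachat-merged")
tokenizer.save_pretrained("tinyllama-ultrachat-merged")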

Method 2: Direct loading via AutoModelForCausalLM

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# transformers detects the adapter config and loads the base model plus adapter automatically (requires peft to be installed)
model = AutoModelForCausalLM.from_pretrained(adapter_name, device_map="auto")

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
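Generation parameters can be passed directly in the pipeline call; the values below are illustrative:

# Sampling settings are forwarded to model.generate()
print(pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)[0]["generated_text"])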

Method 3: Using PeftModel with the quantized base model


import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,  # Use 4-bit precision model loading
    bnb_4bit_quant_type="nf4",  # Quantization type
    bnb_4bit_compute_dtype="float16",  # Compute dtype
    bnb_4bit_use_double_quant=True,  # Apply nested quantization
)

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
adapter_name = 'iqbalamo93/TinyLlama-1.1B-intermediate-1431k-3T-adapters-ultrachat'
# Load the 4-bit quantized base model, then attach the adapter on top
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_name)

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompt = """<|user|>
Tell me something about Large Language Models.</s>
<|assistant|>
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,                   # input_ids and attention_mask
        max_new_tokens=256,         # Length of the generated continuation (illustrative value)
        do_sample=True,             # Required for temperature/top_p/top_k to take effect
        temperature=0.7,            # Controls randomness: lower = more deterministic
        top_p=0.9,                  # Nucleus sampling
        top_k=50,                   # Top-K sampling
        num_return_sequences=1,
    )
for i, output in enumerate(outputs):
    generated_text = tokenizer.decode(output, skip_special_tokens=True)
    print(f"--- Generated Sequence {i + 1} ---")
    print(generated_text)
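Instead of writing the prompt template by hand, the prompt can also be built with the tokenizer's chat template (a sketch, assuming the TinyLlama-1.1B-Chat-v1.0 tokenizer, which ships with a chat template matching the format above):

messages = [
    {"role": "user", "content": "Tell me something about Large Language Models."}
]
# Renders the conversation into the model's prompt format and appends the assistant turn marker
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)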