MobiLlama-0.5B-Chat

We present MobiLlama-0.5B-Chat, an instruction-following model fine-tuned from MBZUAI/MobiLlama-05B.

Model Summary

"Bigger the better" has been the predominant trend in recent Large Language Model (LLM) development. However, LLMs are not well suited to scenarios that require on-device processing, energy efficiency, a low memory footprint, and fast responses. These requirements are crucial for privacy, security, and sustainable deployment. This paper explores the ‘less is more’ paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices. Our primary contribution is an accurate and fully transparent open-source 0.5 billion (0.5B) parameter SLM, named MobiLlama, which caters to the specific needs of resource-constrained computing with an emphasis on enhanced performance with reduced resource demands. MobiLlama is an SLM design that starts from a larger model and applies a careful parameter sharing scheme to reduce both the pre-training and the deployment cost. Our work strives not only to bridge the gap in open-source SLMs but also to ensure full transparency: the complete training data pipeline, training code, model weights, and over 300 checkpoints, along with evaluation code, are available on our GitHub.

Arxiv Paper Link

Model Description

Model type: Small Language Model (SLM) built using the architecture design of LLaMA-7B
Language(s) (NLP): English
License: Apache 2.0
Resources for more information:

Loading MobiLlama-0.5B-Chat

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MBZUAI/MobiLlama-05B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("MBZUAI/MobiLlama-05B-Chat", trust_remote_code=True)
model.to('cuda')

# Template adapted from FastChat; it includes one worked example turn before the user's prompt.
template = (
    "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\n"
    "### Human: Got any creative ideas for a 10 year old’s birthday?\n"
    "### Assistant: Of course! Here are some creative ideas for a 10-year-old's birthday party:\n"
    "1. Treasure Hunt: Organize a treasure hunt in your backyard or nearby park. Create clues and riddles for the kids to solve, leading them to hidden treasures and surprises.\n"
    "2. Science Party: Plan a science-themed party where kids can engage in fun and interactive experiments. You can set up different stations with activities like making slime, erupting volcanoes, or creating simple chemical reactions.\n"
    "3. Outdoor Movie Night: Set up a backyard movie night with a projector and a large screen or white sheet. Create a cozy seating area with blankets and pillows, and serve popcorn and snacks while the kids enjoy a favorite movie under the stars.\n"
    "4. DIY Crafts Party: Arrange a craft party where kids can unleash their creativity. Provide a variety of craft supplies like beads, paints, and fabrics, and let them create their own unique masterpieces to take home as party favors.\n"
    "5. Sports Olympics: Host a mini Olympics event with various sports and games. Set up different stations for activities like sack races, relay races, basketball shooting, and obstacle courses. Give out medals or certificates to the participants.\n"
    "6. Cooking Party: Have a cooking-themed party where the kids can prepare their own mini pizzas, cupcakes, or cookies. Provide toppings, frosting, and decorating supplies, and let them get hands-on in the kitchen.\n"
    "7. Superhero Training Camp: Create a superhero-themed party where the kids can engage in fun training activities. Set up an obstacle course, have them design their own superhero capes or masks, and organize superhero-themed games and challenges.\n"
    "8. Outdoor Adventure: Plan an outdoor adventure party at a local park or nature reserve. Arrange activities like hiking, nature scavenger hunts, or a picnic with games. Encourage exploration and appreciation for the outdoors.\n"
    "Remember to tailor the activities to the birthday child's interests and preferences. Have a great celebration!\n"
    "### Human: {prompt}\n"
    "### Assistant:"
)

prompt = "Generate a C code snippet that implements a function to calculate the Fibonacci sequence using recursion."

input_str = template.format(prompt=prompt)
input_ids = tokenizer(input_str, return_tensors="pt").to('cuda').input_ids
outputs = model.generate(input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens: skip the prompt and drop the final token (normally EOS).
print(tokenizer.batch_decode(outputs[:, input_ids.shape[1]:-1])[0].strip())

Alternatively, you may use FastChat:

python3 -m fastchat.serve.cli --model-path MBZUAI/MobiLlama-05B-Chat
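
For multi-turn use, the "### Human:" / "### Assistant:" layout of the template above can also be assembled programmatically. The following is a minimal sketch and not part of the MobiLlama release: build_prompt, SYSTEM, and the history argument are hypothetical names, and the sketch leaves out the worked birthday example that the original template includes.

```python
# Hypothetical helper (not shipped with MobiLlama): builds a prompt in the
# "### Human: / ### Assistant:" layout used by the template above.
SYSTEM = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
)


def build_prompt(prompt, history=(), system=SYSTEM):
    """Return a prompt string that ends with an open assistant turn.

    history is a sequence of (human_message, assistant_reply) pairs.
    """
    parts = [system]
    for human, assistant in history:
        parts.append(f"### Human: {human}")
        parts.append(f"### Assistant: {assistant}")
    parts.append(f"### Human: {prompt}")
    parts.append("### Assistant:")
    return "\n".join(parts)


example = build_prompt(
    "Write a haiku about small language models.",
    history=[("Hi!", "Hello! How can I help you today?")],
)
print(example)
```

The resulting string can be tokenized and passed to model.generate in the same way as input_str in the loading example.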

MobiLlama-0.5B-Chat Finetuning Details

DataMix

Subset	Number of rows	License
WizardLM/WizardLM_evol_instruct_V2_196k	143k
icybee/share_gpt_90k_v1	90k	cc0-1.0
Total	233k

Hyperparameters

Hyperparameter	Value
Total Parameters	0.52B
Hidden Size	2048
Intermediate Size (MLPs)	5632
Number of Attention Heads	32
Number of Hidden Layers	22
RMSNorm ɛ	1e-5
Max Seq Length	2048
Vocab Size	32000
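
To see how parameter sharing could bring these dimensions down to roughly half a billion parameters, the sketch below does a back-of-the-envelope count. It rests on assumptions that this card does not confirm: bias-free linear layers, full multi-head attention, a gated three-matrix MLP, untied input and output embeddings, and, for the shared case, a single MLP reused by every layer. Under those assumptions the shared variant comes out at roughly 0.53B, near but not identical to the reported 0.52B.

```python
# Back-of-the-envelope parameter count for a LLaMA-style decoder.
# Assumptions (not confirmed by this model card): bias-free linear layers,
# full multi-head attention, a gated three-matrix MLP, untied input/output
# embeddings, and, in the "shared" case, one MLP reused by every layer.
HIDDEN = 2048
INTERMEDIATE = 5632
LAYERS = 22
VOCAB = 32000


def count_params(share_mlp: bool) -> int:
    attention = 4 * HIDDEN * HIDDEN      # q, k, v, and output projections
    mlp = 3 * HIDDEN * INTERMEDIATE      # gate, up, and down projections
    norms = 2 * HIDDEN                   # two RMSNorm weight vectors per layer
    embeddings = 2 * VOCAB * HIDDEN      # token embedding plus output head
    final_norm = HIDDEN
    mlp_copies = 1 if share_mlp else LAYERS
    return LAYERS * (attention + norms) + mlp_copies * mlp + embeddings + final_norm


for label, shared in (("per-layer MLP", False), ("shared MLP", True)):
    print(f"{label}: {count_params(shared) / 1e9:.2f}B parameters")
```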

Training Hyperparameter	Value
learning_rate	2e-5
num_train_epochs	3
per_device_train_batch_size	2
gradient_accumulation_steps	16
warmup_ratio	0.04
model_max_length	2048
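
The training settings above imply an effective batch size and an approximate step count. The sketch below works through that arithmetic. The number of devices is not stated in this card, so num_devices is a hypothetical input, and the calculation assumes that each of the roughly 233k DataMix rows is one training sequence.

```python
import math

# Values from the training hyperparameter table.
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 16
EPOCHS = 3
WARMUP_RATIO = 0.04

# Approximate row count from the DataMix table (assumed: one row = one sequence).
NUM_EXAMPLES = 233_000


def schedule(num_devices: int) -> dict:
    """Estimate optimizer steps; num_devices is a hypothetical input."""
    global_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices
    steps_per_epoch = math.ceil(NUM_EXAMPLES / global_batch)
    total_steps = steps_per_epoch * EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return {
        "global_batch": global_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


print(schedule(num_devices=1))
```

Each device contributes 2 × 16 = 32 sequences per optimizer step, so the global batch scales linearly with the device count actually used.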

Evaluation

Evaluation Benchmark	MobiLlama-05B-Chat	MobiLlama-1.2B-Chat
HellaSwag	0.5042	0.6244
MMLU	0.2677	0.2635
Arc Challenge	0.2935	0.3558
TruthfulQA	0.3997	0.3848
CrowsPairs	0.5694	0.679
PIQA	0.7078	0.7557
Race	0.3320	0.3598
SIQA	0.4165	0.4396
Winogrande	0.5659	0.5966
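
As a reading aid for the table above, the following sketch prints the per-benchmark difference between the two models and lists the benchmarks where the 0.5B model's score is numerically higher. The scores are copied from the table. The helper name is hypothetical, and a higher number is not assumed to mean "better" on every benchmark.

```python
# Scores copied from the evaluation table: (0.5B chat model, 1.2B chat model).
scores = {
    "HellaSwag": (0.5042, 0.6244),
    "MMLU": (0.2677, 0.2635),
    "Arc Challenge": (0.2935, 0.3558),
    "TruthfulQA": (0.3997, 0.3848),
    "CrowsPairs": (0.5694, 0.679),
    "PIQA": (0.7078, 0.7557),
    "Race": (0.3320, 0.3598),
    "SIQA": (0.4165, 0.4396),
    "Winogrande": (0.5659, 0.5966),
}


def higher_for_small(table):
    """Return benchmark names where the first (0.5B) score exceeds the second."""
    return [name for name, (small, large) in table.items() if small > large]


for name, (small, large) in scores.items():
    print(f"{name:<14} {small:.4f} {large:.4f} {small - large:+.4f}")

print("0.5B scores higher on:", higher_for_small(scores))
```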

Citation

BibTeX:

@misc{thawakar2024mobillama,
      title={MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT}, 
      author={Omkar Thawakar and Ashmal Vayani and Salman Khan and Hisham Cholakkal and Rao Muhammad Anwer and Michael Felsberg and Timothy Baldwin and Eric P. Xing and Fahad Shahbaz Khan},
      year={2024},
      eprint={2402.16840},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
