---
license: cc-by-nc-4.0
library_name: transformers
model-index:
  - name: MobileLLM-125M-HF
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 21.07
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vonjack/MobileLLM-125M-HF
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 3.15
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vonjack/MobileLLM-125M-HF
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 0.3
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vonjack/MobileLLM-125M-HF
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 1.34
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vonjack/MobileLLM-125M-HF
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 5.11
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vonjack/MobileLLM-125M-HF
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 1.82
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vonjack/MobileLLM-125M-HF
          name: Open LLM Leaderboard
---

## Model Details

MobileLLM was introduced in the paper ["MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases"](https://arxiv.org/abs/2402.14905), published at ICML 2024.

Model Developer: Meta

Model Architecture: MobileLLM is an auto-regressive language model with an optimized transformer architecture, engineered for on-device applications with constrained resources. MobileLLM integrates several key techniques: (1) the SwiGLU activation function, (2) a deep-and-thin architecture, (3) embedding sharing, and (4) grouped-query attention. MobileLLM-125M/350M attains a 2.7%/4.3% accuracy boost over the preceding 125M/350M state-of-the-art models on zero-shot commonsense reasoning tasks. In the updated version, we further demonstrate that this design philosophy scales effectively to larger models, with state-of-the-art results for MobileLLM-600M/1B/1.5B.

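As an illustration of technique (1), here is a minimal NumPy sketch of a SwiGLU feed-forward block. The token dimension of 576 comes from the table below; the FFN hidden size of 1536 is an assumption, not a figure stated in this card:

```python
import numpy as np

def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down-project the gated product silu(x @ Wg) * (x @ Wu)."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
dim, hidden = 576, 1536  # 576 from the table below; 1536 is an assumed FFN size
x = rng.standard_normal((4, dim))
out = swiglu_ffn(
    x,
    rng.standard_normal((dim, hidden)),
    rng.standard_normal((dim, hidden)),
    rng.standard_normal((hidden, dim)),
)
print(out.shape)  # (4, 576)
```

Compared with a plain two-matrix FFN, SwiGLU uses three projections (gate, up, down), which is why the FFN hidden size can stay smaller for the same quality.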

| Model | # Layers | # Attention Heads | # KV Heads | Token Dimension | Params |
| --- | --- | --- | --- | --- | --- |
| MobileLLM-125M | 30 | 9 | 3 | 576 | 124.6M |
| MobileLLM-350M | 32 | 15 | 5 | 960 | 345.3M |
| MobileLLM-600M | 40 | 18 | 6 | 1152 | 603.1M |
| MobileLLM-1B | 54 | 20 | 5 | 1280 | 1.01B |
| MobileLLM-1.5B | 54 | 25 | 5 | 1600 | 1.51B |

| Model | Training Data | Input modalities | Output modalities | Context Length | GQA | Shared Embeddings | Token count |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MobileLLM-125M | Publicly available online data | Text | Text | 2k | Yes | Yes | 1T tokens |
| MobileLLM-350M | Publicly available online data | Text | Text | 2k | Yes | Yes | 1T tokens |
| MobileLLM-600M | Publicly available online data | Text | Text | 2k | Yes | Yes | 1T tokens |
| MobileLLM-1B | Publicly available online data | Text | Text | 2k | Yes | Yes | 1T tokens |
| MobileLLM-1.5B | Publicly available online data | Text | Text | 2k | Yes | Yes | 1T tokens |
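As a sanity check, the 124.6M figure for MobileLLM-125M can be roughly reproduced from the table above. This is a sketch: the 32k vocabulary and the 1536 FFN hidden size are assumptions not stated in this card, and small terms (norms, biases) are ignored.

```python
# Rough parameter count for MobileLLM-125M from the table's dimensions.
dim, n_layers = 576, 30
n_heads, n_kv_heads = 9, 3
head_dim = dim // n_heads  # 64
vocab = 32_000             # assumed vocabulary size
ffn_dim = 1536             # assumed SwiGLU hidden size

attn = dim * n_heads * head_dim           # Q projection
attn += 2 * dim * n_kv_heads * head_dim   # K and V (grouped-query attention: fewer KV heads)
attn += n_heads * head_dim * dim          # output projection
ffn = 3 * dim * ffn_dim                   # gate, up, down projections (SwiGLU)

# Embedding sharing: the input embedding and output head reuse one matrix,
# so the vocab * dim term is counted only once.
total = n_layers * (attn + ffn) + vocab * dim
print(f"{total / 1e6:.1f}M")  # 124.6M
```

The grouped-query attention and shared-embedding terms are exactly where the savings relative to a standard 125M-class transformer come from.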

## How to use

We provide two ways to run the model:

- HuggingFace
- MobileLLM codebase

### HuggingFace

To load the pretrained model for further finetuning or evaluation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/MobileLLM-125M", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("facebook/MobileLLM-125M", trust_remote_code=True)
```

Note that the default tokenizer does not contain special tokens. For example, you can add them with:

```python
tokenizer.add_special_tokens(
    {
        "eos_token": "</s>",
        "bos_token": "<s>",
        "unk_token": "<unk>",
    }
)
```
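Once the tokenizer is configured, generation follows the standard `transformers` pattern. The helper below is a hypothetical sketch, not code from this repo; it assumes the checkpoint works with the generic `generate` API and will be downloaded on first use:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate(prompt: str, max_new_tokens: int = 20) -> str:
    # Downloads the ~125M-parameter checkpoint on first use.
    tokenizer = AutoTokenizer.from_pretrained("facebook/MobileLLM-125M", use_fast=False)
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/MobileLLM-125M", trust_remote_code=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    # Greedy decoding; this is a base model, not instruction-tuned,
    # so expect plain text continuation rather than answers to questions.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Usage (downloads weights): generate("The capital of France is")
```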

### MobileLLM codebase

We provide the pretraining code at https://github.com/facebookresearch/MobileLLM:

```bash
git clone https://github.com/facebookresearch/MobileLLM
cd MobileLLM
pip install -r requirement.txt

# pre-process the data and specify the data path in pretrain.sh,
# then run pretraining
bash pretrain.sh
```

We also provide an evaluation script for calculating the perplexity (ppl) on the wikitext-2 test split:

```bash
bash eval.sh
```

You can find more details in the GitHub repo.
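For reference, perplexity is the exponential of the mean per-token negative log-likelihood. A minimal standalone helper (not code from the repo) illustrating the metric that `eval.sh` reports:

```python
import math

def perplexity(token_nlls, n_tokens):
    """Perplexity = exp(total negative log-likelihood / number of tokens)."""
    return math.exp(sum(token_nlls) / n_tokens)

# Example: a model that assigns probability 1/2 to every token
# (NLL of ln 2 per token) has a perplexity of exactly 2.
ppl = perplexity([math.log(2)] * 10, 10)
print(ppl)  # 2.0 (up to floating-point rounding)
```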

## Training cost

It takes the following number of days to train MobileLLM on 1T tokens using 32 NVIDIA A100 80GB GPUs.

| 125M | 350M | 600M | 1B | 1.5B |
| --- | --- | --- | --- | --- |
| ~3 days | ~6 days | ~8 days | ~12 days | ~18 days |
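For context, the figures above imply roughly the following per-GPU training throughput. This is back-of-envelope arithmetic from 1T tokens and 32 GPUs; actual throughput is not stated in this card:

```python
# Implied per-GPU throughput: tokens / (days * seconds_per_day * gpus).
tokens, gpus = 1e12, 32
for name, days in [("125M", 3), ("350M", 6), ("600M", 8), ("1B", 12), ("1.5B", 18)]:
    per_gpu = tokens / (days * 86_400 * gpus)
    print(f"MobileLLM-{name}: ~{per_gpu / 1e3:.0f}k tokens/s/GPU")
```

For the 125M model this works out to roughly 120k tokens per second per GPU, which is plausible for a sub-billion-parameter model on A100-class hardware.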

## Evaluation

We evaluate the pretrained MobileLLM models on zero-shot common sense reasoning tasks.

### MobileLLM-125M

| Model | boolq | piqa | siqa | hellaswag | winogrande | arc_easy | arc_challenge | obqa | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OPT-125M | 41.3 | 25.2 | 57.5 | 62.0 | 41.9 | 31.1 | 31.2 | 50.8 | 42.6 |
| GPT-neo-125M | 40.7 | 24.8 | 61.3 | 62.5 | 41.9 | 29.7 | 31.6 | 50.7 | 42.9 |
| Pythia-160M | 40.0 | 25.3 | 59.5 | 62.0 | 41.5 | 29.9 | 31.2 | 50.9 | 42.5 |
| MobileLLM-125M | 43.9 | 27.1 | 60.2 | 65.3 | 42.4 | 38.9 | 39.5 | 53.1 | 46.3 |
| MobileLLM-LS-125M | 45.8 | 28.7 | 60.4 | 65.7 | 42.9 | 39.5 | 41.1 | 52.1 | 47.0 |

### MobileLLM-350M

| Model | boolq | piqa | siqa | hellaswag | winogrande | arc_easy | arc_challenge | obqa | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OPT-350M | 41.9 | 25.7 | 54.0 | 64.8 | 42.6 | 36.2 | 33.3 | 52.4 | 43.9 |
| Pythia-410M | 47.1 | 30.3 | 55.3 | 67.2 | 43.1 | 40.1 | 36.2 | 53.4 | 46.6 |
| MobileLLM-350M | 53.8 | 33.5 | 62.4 | 68.6 | 44.7 | 49.6 | 40.0 | 57.6 | 51.3 |
| MobileLLM-LS-350M | 54.4 | 32.5 | 62.8 | 69.8 | 44.1 | 50.6 | 45.8 | 57.2 | 52.1 |

### MobileLLM-600M

| Model | boolq | piqa | siqa | hellaswag | winogrande | arc_easy | arc_challenge | obqa | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen1.5-500M | 54.7 | 32.1 | 46.9 | 68.9 | 46.0 | 48.8 | 37.7 | 55.0 | 48.8 |
| BLOOM-560M | 43.7 | 27.5 | 53.7 | 65.1 | 42.5 | 36.5 | 32.6 | 52.2 | 44.2 |
| MobiLlama-800M | 52.0 | 31.7 | 54.6 | 73.0 | 43.3 | 52.3 | 42.5 | 56.3 | 50.7 |
| MobileLLM-600M | 58.1 | 35.8 | 61.0 | 72.3 | 44.9 | 55.9 | 47.9 | 58.6 | 54.3 |

### MobileLLM-1B

| Model | boolq | piqa | siqa | hellaswag | winogrande | arc_easy | arc_challenge | obqa | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Pythia-1B | 49.9 | 30.4 | 58.7 | 69.2 | 43.3 | 47.4 | 38.6 | 52.2 | 48.7 |
| MobiLlama-1B | 59.7 | 38.4 | 59.2 | 74.5 | 44.9 | 62.0 | 43.7 | 59.0 | 55.2 |
| Falcon-1B | 59.5 | 38.4 | 63.9 | 74.6 | 44.6 | 62.9 | 45.6 | 60.9 | 56.3 |
| BLOOM-1.1B | 47.6 | 27.3 | 58.6 | 67.0 | 42.4 | 42.2 | 36.6 | 53.8 | 46.9 |
| TinyLlama-1.1B | 59.2 | 37.1 | 58.1 | 72.9 | 43.9 | 59.1 | 44.7 | 58.8 | 54.2 |
| MobileLLM-1B | 63.0 | 39.0 | 66.7 | 74.4 | 45.0 | 61.4 | 46.8 | 62.3 | 57.3 |

### MobileLLM-1.5B

| Model | boolq | piqa | siqa | hellaswag | winogrande | arc_easy | arc_challenge | obqa | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-neo-1.3B | 51.3 | 33.0 | 61.8 | 70.9 | 43.7 | 48.6 | 41.2 | 54.5 | 50.6 |
| OPT-1.3B | 54.4 | 31.7 | 58.4 | 71.5 | 44.7 | 53.7 | 44.6 | 59.1 | 52.3 |
| BLOOM-1.7B | 50.9 | 31.2 | 61.7 | 70.0 | 43.2 | 47.2 | 36.2 | 56.1 | 49.6 |
| Qwen1.5-1.8B | 61.1 | 36.5 | 68.3 | 74.1 | 47.2 | 60.4 | 42.9 | 61.2 | 56.5 |
| GPT-neo-2.7B | 55.8 | 34.3 | 62.4 | 72.9 | 43.6 | 55.6 | 40.0 | 57.9 | 52.8 |
| OPT-2.7B | 56.6 | 34.6 | 61.8 | 74.5 | 45.6 | 60.2 | 48.2 | 59.6 | 55.1 |
| Pythia-2.8B | 59.4 | 38.9 | 66.1 | 73.8 | 44.5 | 59.6 | 45.0 | 59.4 | 55.8 |
| BLOOM-3B | 55.1 | 33.6 | 62.1 | 70.5 | 43.2 | 53.9 | 41.6 | 58.2 | 52.3 |
| MobileLLM-1.5B | 67.5 | 40.9 | 65.7 | 74.8 | 46.4 | 64.5 | 50.5 | 64.7 | 59.4 |

## Citation

If you find our code useful for your research, please consider citing:

```bibtex
@article{liu2024mobilellm,
    title={MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases},
    author={Liu, Zechun and Zhao, Changsheng and Iandola, Forrest and Lai, Chen and Tian, Yuandong and Fedorov, Igor and Xiong, Yunyang and Chang, Ernie and Shi, Yangyang and Krishnamoorthi, Raghuraman and others},
    journal={arXiv preprint arXiv:2402.14905},
    year={2024}
}
```

## License

MobileLLM is currently licensed under CC-BY-NC 4.0.

## Open LLM Leaderboard Evaluation Results

Detailed results can be found on the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vonjack/MobileLLM-125M-HF).

| Metric | Value |
| --- | --- |
| Avg. | 5.46 |
| IFEval (0-Shot) | 21.07 |
| BBH (3-Shot) | 3.15 |
| MATH Lvl 5 (4-Shot) | 0.30 |
| GPQA (0-shot) | 1.34 |
| MuSR (0-shot) | 5.11 |
| MMLU-PRO (5-shot) | 1.82 |