TinyLlama-1.1B: My personal test update

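| Tasks         | Version | Filter | n-shot | Metric   | Value  | Stderr   |
|---------------|---------|--------|--------|----------|--------|----------|
| arc_challenge | Yaml    | none   | 0      | acc      | 0.2619 | ± 0.0128 |
|               |         | none   | 0      | acc_norm | 0.2892 | ± 0.0133 |
| arc_easy      | Yaml    | none   | 0      | acc      | 0.4777 | ± 0.0102 |
|               |         | none   | 0      | acc_norm | 0.4461 | ± 0.0102 |
| boolq         | Yaml    | none   | 0      | acc      | 0.6297 | ± 0.0084 |
| hellaswag     | Yaml    | none   | 0      | acc      | 0.3934 | ± 0.0049 |
|               |         | none   | 0      | acc_norm | 0.4930 | ± 0.0050 |
| openbookqa    | Yaml    | none   | 0      | acc      | 0.2120 | ± 0.0183 |
|               |         | none   | 0      | acc_norm | 0.3260 | ± 0.0210 |
| piqa          | Yaml    | none   | 0      | acc      | 0.6915 | ± 0.0108 |
|               |         | none   | 0      | acc_norm | 0.6877 | ± 0.0108 |
| winogrande    | Yaml    | none   | 0      | acc      | 0.5714 | ± 0.0139 |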
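To read the Stderr column, here is a minimal sketch. It assumes the harness reports the standard error of the mean over per-example 0/1 scores using the sample variance, which reduces to sqrt(acc * (1 - acc) / (n - 1)). The split sizes below are assumptions, not stated in this card: 1172 for the ARC-Challenge test set, 3270 for BoolQ validation, and 1267 for WinoGrande validation. Under these assumptions the sketch reproduces the reported Stderr values and gives approximate 95% intervals.

```python
import math


def binary_stderr(acc: float, n: int) -> float:
    """Standard error of the mean of n per-example 0/1 scores whose mean is acc.

    Uses the sample variance (n - 1 in the denominator); for 0/1 data this
    simplifies to sqrt(acc * (1 - acc) / (n - 1)).
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def ci95(acc: float, n: int) -> tuple:
    """Approximate 95% normal-approximation interval around acc."""
    half_width = 1.96 * binary_stderr(acc, n)
    return acc - half_width, acc + half_width


# ASSUMED evaluation-split sizes (not stated in this card).
ASSUMED_SPLIT_SIZES = {
    "arc_challenge": 1172,  # ARC-Challenge test
    "boolq": 3270,          # BoolQ validation
    "winogrande": 1267,     # WinoGrande validation
}

# acc values copied from the table above.
reported_acc = {"arc_challenge": 0.2619, "boolq": 0.6297, "winogrande": 0.5714}

for task, acc in reported_acc.items():
    n = ASSUMED_SPLIT_SIZES[task]
    lo, hi = ci95(acc, n)
    print(f"{task}: stderr={binary_stderr(acc, n):.4f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Several of these intervals overlap the random-guess level for their task, so small differences between runs should be read with the Stderr column in mind.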

LLaMA-Factory evaluation (5-shot MMLU and CMMLU)

!CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
    --model_name_or_path Deathsquad10/TinyLlama-Remix \
    --template vanilla \
    --task mmlu \
    --split test \
    --lang en \
    --n_shot 5 \
    --use_unsloth \
    --batch_size 1

       Average: 26.29
       STEM: 27.10
       Social Sciences: 25.48
       Humanities: 25.62
       Other: 27.26
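The MMLU "Average" (26.29) is not the plain mean of the four category scores. It appears to be pooled over all questions, so larger categories weigh more. The sketch below checks this. The per-category question counts are assumptions, not stated in this card, and total 14,042 test questions. With these counts, the pooled average reproduces 26.29, while the unweighted mean is about 26.37.

```python
# Per-category MMLU scores reported above (percent).
category_scores = {"STEM": 27.10, "Social Sciences": 25.48, "Humanities": 25.62, "Other": 27.26}

# ASSUMED question counts per category in the MMLU test split (not stated in this card).
ASSUMED_QUESTION_COUNTS = {"STEM": 3018, "Social Sciences": 3077, "Humanities": 4705, "Other": 3242}


def pooled_average(scores: dict, counts: dict) -> float:
    """Average weighted by the number of questions in each category."""
    total = sum(counts.values())
    return sum(scores[c] * counts[c] for c in scores) / total


def unweighted_average(scores: dict) -> float:
    """Plain mean of the category scores, ignoring category sizes."""
    return sum(scores.values()) / len(scores)


print(f"pooled:     {pooled_average(category_scores, ASSUMED_QUESTION_COUNTS):.2f}")
print(f"unweighted: {unweighted_average(category_scores):.3f}")
```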

!CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
    --model_name_or_path Deathsquad10/TinyLlama-Remix \
    --template vanilla \
    --task cmmlu \
    --split test \
    --lang en \
    --n_shot 5 \
    --use_unsloth \
    --batch_size 2

      Average: 24.98
      STEM: 25.52
      Social Sciences: 24.70
      Humanities: 24.59
      Other: 25.19
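To script both runs from Python instead of notebook `!` cells, the sketch below rebuilds the same argument lists. `build_eval_cmd` and `run_eval` are hypothetical helpers. The sketch assumes a LLaMA-Factory checkout that still provides `src/evaluate.py` and accepts the flags used above. By default it only prints the commands. Call `run_eval` from the repository root to actually launch an evaluation.

```python
import os
import shlex
import subprocess


def build_eval_cmd(task: str, batch_size: int,
                   model: str = "Deathsquad10/TinyLlama-Remix") -> list:
    """Hypothetical helper: argv for src/evaluate.py with the flags used above."""
    return [
        "python", "src/evaluate.py",
        "--model_name_or_path", model,
        "--template", "vanilla",
        "--task", task,
        "--split", "test",
        "--lang", "en",
        "--n_shot", "5",
        "--use_unsloth",
        "--batch_size", str(batch_size),
    ]


def run_eval(task: str, batch_size: int) -> None:
    """Hypothetical helper: run one evaluation on GPU 0, raising on failure."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}
    subprocess.run(build_eval_cmd(task, batch_size), env=env, check=True)


if __name__ == "__main__":
    # Same task / batch-size pairs as the two commands above.
    for task, batch_size in [("mmlu", 1), ("cmmlu", 2)]:
        print(shlex.join(build_eval_cmd(task, batch_size)))
```

Passing the argument list directly to `subprocess.run` avoids shell quoting issues. `shlex.join` is only used to print a copy-pasteable version.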

https://github.com/jzhang38/TinyLlama

The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With proper optimization, this can be achieved within a span of "just" 90 days using 16 A100-40G GPUs 🚀🚀. Training started on 2023-09-01.
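The 90-day target implies a concrete throughput requirement. This back-of-the-envelope sketch divides 3 trillion tokens by 90 days of wall-clock time on 16 GPUs. It assumes every GPU trains continuously for the whole period, with no downtime or restarts.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_tokens_per_gpu_second(total_tokens: float, days: float, num_gpus: int) -> float:
    """Sustained per-GPU throughput needed to process total_tokens in `days`."""
    return total_tokens / (days * SECONDS_PER_DAY * num_gpus)


rate = required_tokens_per_gpu_second(3e12, 90, 16)
print(f"{rate:,.0f} tokens/s per GPU")  # about 24,113 tokens/s per GPU
```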

We adopted exactly the same architecture and tokenizer as Llama 2, so TinyLlama can be used as a drop-in replacement in many open-source projects built on Llama. In addition, with only 1.1B parameters, TinyLlama is compact enough for applications with restricted compute and memory budgets.
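The sketch below makes the "1.1B parameters" figure concrete by counting the weights of a Llama-style decoder. The configuration values are assumptions taken from the published TinyLlama `config.json`; check that file before relying on them. They are: `hidden_size=2048`, `intermediate_size=5632`, 22 layers, 32 attention heads, 4 key/value heads (grouped-query attention), `vocab_size=32000`, and untied input/output embeddings. The BF16 figure covers weights only. It excludes activations and the KV cache.

```python
def llama_param_count(vocab: int = 32000, hidden: int = 2048, intermediate: int = 5632,
                      layers: int = 22, heads: int = 32, kv_heads: int = 4,
                      tied_embeddings: bool = False) -> int:
    """Count weights in a Llama-style decoder (no biases), under ASSUMED config values."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim                       # grouped-query attention shrinks K/V projections
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim   # q_proj + o_proj, k_proj + v_proj
    mlp = 3 * hidden * intermediate                    # gate_proj, up_proj, down_proj (SwiGLU)
    norms = 2 * hidden                                 # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = vocab * hidden                             # token embedding table
    lm_head = 0 if tied_embeddings else vocab * hidden # separate output projection when untied
    final_norm = hidden                                # RMSNorm before the output projection
    return layers * per_layer + embed + lm_head + final_norm


n_params = llama_param_count()
bf16_gb = n_params * 2 / 1e9  # 2 bytes per parameter in BF16
print(f"{n_params:,} parameters, ~{bf16_gb:.1f} GB of BF16 weights")
```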

This Model

This is the chat model fine-tuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T, following Hugging Face's Zephyr training recipe. The model was "initially fine-tuned on a variant of the UltraChat dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT. We then further aligned the model with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset, which contains 64k prompts and model completions that are ranked by GPT-4."
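The DPO step above optimizes a simple pairwise objective. For each prompt x it takes a preferred completion y_w and a rejected completion y_l. The loss is -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))]), where pi is the policy being trained and pi_ref is a frozen reference model. The plain-Python sketch below shows that formula for one example. It is not TRL's DPOTrainer implementation, and `beta=0.1` is an assumed value rather than the setting used for this model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each *_logp is the summed log-probability of a full completion under the
    trainable policy or the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    return softplus(-margin)  # -log(sigmoid(m)) == softplus(-m)


# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy raised the chosen completion relative to the reference: the loss drops below log(2).
print(dpo_loss(-10.0, -15.0, -12.0, -15.0))
```

The reference log-probabilities keep the policy anchored. The loss only rewards moving probability toward y_w relative to where the reference model already was.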

How to use

You will need transformers>=4.34. See the TinyLlama GitHub page for more information.
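Before running the example, you can check the installed version with this small sketch. It reads the version from package metadata with the standard library and compares only the leading MAJOR.MINOR. `transformers_is_recent_enough` is a hypothetical helper, not part of transformers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 34)


def major_minor(v: str) -> tuple:
    """Parse the leading MAJOR.MINOR of a version string such as '4.34.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", v)
    if match is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return int(match.group(1)), int(match.group(2))


def transformers_is_recent_enough() -> bool:
    """True if an installed transformers is at least MIN_TRANSFORMERS."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False
    return major_minor(installed) >= MIN_TRANSFORMERS


if __name__ == "__main__":
    print("transformers >= 4.34:", transformers_is_recent_enough())
```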

# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate
import torch
from transformers import pipeline
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")
# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...
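To show what `apply_chat_template` produces without downloading the tokenizer, here is a hand-rolled approximation of the prompt layout in the output comments above. Each message becomes `<|role|>`, a newline, the content, and `</s>`, followed by `<|assistant|>` when a generation prompt is requested. `format_zephyr_style` is a hypothetical helper. The authoritative template ships with the tokenizer, so prefer `apply_chat_template` in real code.

```python
def format_zephyr_style(messages: list, add_generation_prompt: bool = True) -> str:
    """Approximate the Zephyr-style prompt layout shown in the comments above (hypothetical helper)."""
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
print(format_zephyr_style(messages))
```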
Safetensors · Model size: 1.1B params · Tensor type: BF16
