Qwen2.5-1.5B-ultrachat200k

Model Details

Base model: Qwen/Qwen2.5-1.5B
Model size: 1.54B params
Tensor type: BF16

Training Details

Training Hyperparameters

attn_implementation: flash_attention_2
bf16: True
learning_rate: 5e-5
lr_scheduler_type: cosine
per_device_train_batch_size: 2
gradient_accumulation_steps: 16
torch_dtype: bfloat16
num_train_epochs: 1
max_seq_length: 2048
warmup_ratio: 0.1
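
The effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 32 per device. For reference, here is a minimal sketch of building the equivalent SFTConfig directly in Python instead of passing these values through a config file; the output_dir value is hypothetical, and attn_implementation / torch_dtype are model-loading kwargs handled separately (see the training script below):

from trl import SFTConfig

training_args = SFTConfig(
    output_dir="saves/Qwen2.5-1.5B-ultrachat200k",  # hypothetical path
    bf16=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,  # effective batch size: 2 * 16 = 32 per device
    num_train_epochs=1,
    max_seq_length=2048,
    warmup_ratio=0.1,
)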

Results

init_train_loss: 1.421
final_train_loss: 1.192
eval_loss: 1.2003

Training script

import multiprocessing

from datasets import load_dataset
from tqdm.rich import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import (
    ModelConfig,
    SFTTrainer,
    get_peft_config,
    get_quantization_config,
    get_kbit_device_map,
    SFTConfig,
    ScriptArguments
)
from trl.commands.cli_utils import TrlParser

tqdm.pandas()

if __name__ == "__main__":
    parser = TrlParser((ScriptArguments, SFTConfig, ModelConfig))
    args, training_args, model_config = parser.parse_args_and_config()

    quantization_config = get_quantization_config(model_config)
    model_kwargs = dict(
        revision=model_config.model_revision,
        trust_remote_code=model_config.trust_remote_code,
        attn_implementation=model_config.attn_implementation,
        torch_dtype=model_config.torch_dtype,
        use_cache=False if training_args.gradient_checkpointing else True,
        device_map=get_kbit_device_map() if quantization_config is not None else None,
        quantization_config=quantization_config,
    )

    model = AutoModelForCausalLM.from_pretrained(model_config.model_name_or_path,
                                                 **model_kwargs)
    tokenizer = AutoTokenizer.from_pretrained(
        model_config.model_name_or_path, trust_remote_code=model_config.trust_remote_code, use_fast=True
    )
    tokenizer.pad_token = tokenizer.eos_token

    train_dataset = load_dataset(args.dataset_name,
                                 split=args.dataset_train_split,
                                 num_proc=multiprocessing.cpu_count())

    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        processing_class=tokenizer,
        peft_config=get_peft_config(model_config),
    )

    trainer.train()

    trainer.save_model(training_args.output_dir)
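
trainer.save_model writes the final checkpoint to training_args.output_dir, and the tokenizer passed as processing_class is typically saved alongside it. A minimal sanity-check sketch for reloading the result with plain transformers, assuming a hypothetical local path:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

OUTPUT_DIR = "saves/Qwen2.5-1.5B-ultrachat200k"  # hypothetical; use the output_dir from training

tokenizer = AutoTokenizer.from_pretrained(OUTPUT_DIR)
model = AutoModelForCausalLM.from_pretrained(OUTPUT_DIR, torch_dtype=torch.bfloat16)

# The training script sets pad_token = eos_token, so the saved tokenizer should reflect that.
print(tokenizer.pad_token == tokenizer.eos_token)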

Test Script

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_PATH = "autodl-tmp/saves/Qwen2.5-1.5B-ultrachat200k"

model = LLM(MODEL_PATH,
            tensor_parallel_size=1,
            dtype='bfloat16')
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Which province is Shenyang in?"}],
    tokenize=False,
    add_generation_prompt=True,
)
sampling_params = SamplingParams(max_tokens=1024,
                                 temperature=0.7,
                                 logprobs=1,
                                 stop_token_ids=[tokenizer.eos_token_id])

vllm_generations = model.generate(prompt,
                                  sampling_params)

print(vllm_generations[0].outputs[0].text)
# print result: Shenyang is in Liaoning province, China.