---
language:
  - ko
  - en
library_name: transformers
---


Introduction

We introduce Llama-3-Motif, a new language model family from Moreh specialized in Korean and English.
Llama-3-Motif-102B is the base model of this family; Llama-3-Motif-102B-Instruct is a chat model tuned from it.

Training Platform

  • Llama-3-Motif-102B was trained on the MoAI platform; refer to the link for more information.

Quick Usage

The base model is not served directly. Instead, you can chat with Llama-3-Motif-102B-Instruct through our Model hub.

Details

More details will be provided in the upcoming technical report.

Release Date

2024.12.02

Benchmark Results

| Provider | Model | kmmlu_direct score |
| --- | --- | --- |
| Moreh | Llama-3-Motif-102B | 64.74 + |
| Meta | Llama3-70B-instruct | 54.5* |
| Meta | Llama3.1-70B-instruct | 52.1* |
| Meta | Llama3.1-405B-instruct | 65.8* |
| Alibaba | Qwen2-72B-instruct | 64.1* |
| OpenAI | GPT-4-0125-preview | 59.95* |
| OpenAI | GPT-4o-2024-05-13 | 64.11** |
| Google | Gemini Pro | 50.18* |
| LG | EXAONE 3.0 | 44.5* + |
| Naver | HyperCLOVA X | 53.4* + |
| Upstage | SOLAR-10.7B | 41.65* + |

* : Community report
** : Measured by Moreh
+ : Claimed to have better capability in Korean

How to use

We do not recommend using the base model directly!

Use with vLLM

  • Refer to this link to install vLLM.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Set tensor_parallel_size to the number of GPUs available to you
model = LLM("moreh/Llama-3-Motif-102B", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("moreh/Llama-3-Motif-102B")
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartner"
    {"role": "user", "content": "์œ ์น˜์›์ƒ์—๊ฒŒ ๋น…๋ฑ… ์ด๋ก ์˜ ๊ฐœ๋…์„ ์„ค๋ช…ํ•ด๋ณด์„ธ์š”"},
]

# Render the chat template into plain prompt strings; vLLM takes text prompts
messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not read the Hugging Face generation_config, so set sampling parameters explicitly
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])
responses = model.generate(messages_batch, sampling_params=sampling_params)

print(responses[0].outputs[0].text)
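
Because vLLM ignores the model's Hugging Face generation_config, you can also load that config with transformers and copy its fields into SamplingParams instead of hard-coding values. This is a minimal sketch, assuming the usual sampling fields; check the repository's generation_config.json for what is actually set:

from transformers import GenerationConfig
from vllm import SamplingParams

gen_cfg = GenerationConfig.from_pretrained("moreh/Llama-3-Motif-102B")

# Map the Hugging Face fields onto vLLM's SamplingParams, falling back to safe defaults
sampling_params = SamplingParams(
    max_tokens=512,
    temperature=gen_cfg.temperature if gen_cfg.do_sample else 0,
    top_p=gen_cfg.top_p if gen_cfg.do_sample else 1.0,
    repetition_penalty=gen_cfg.repetition_penalty or 1.0,
    stop_token_ids=[gen_cfg.eos_token_id] if isinstance(gen_cfg.eos_token_id, int) else gen_cfg.eos_token_id,
)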

Use with transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moreh/Llama-3-Motif-102B"

# Generation defaults are read from the model's generation_config.json
# Load in bfloat16 and shard the 102B model across the available GPUs
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # "Explain the concept of the Big Bang theory to a kindergartner"
    {"role": "user", "content": "์œ ์น˜์›์ƒ์—๊ฒŒ ๋น…๋ฑ… ์ด๋ก ์˜ ๊ฐœ๋…์„ ์„ค๋ช…ํ•ด๋ณด์„ธ์š”"},
]

prompt = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(prompt, return_tensors='pt')['input_ids'].to(model.device)

outputs = model.generate(input_ids)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
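
In bfloat16, the 102B parameters alone take roughly 200 GB of GPU memory, so multi-GPU loading as above is the usual path. If that much memory is not available, one common workaround is 4-bit quantized loading with bitsandbytes. The sketch below assumes bitsandbytes is installed; quantization can affect output quality:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "moreh/Llama-3-Motif-102B"

# NF4 4-bit weights with bfloat16 compute; roughly quarters the weight memory vs bfloat16
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quant_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Tokenization and generation then work exactly as in the example above.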