metadata
language:
- ko
- en
library_name: transformers
Introduction
We introduce Llama-3-Motif, a new language model family from Moreh, specialized in Korean and English.
Llama-3-Motif-102B-Instruct is a chat model tuned from the base model, Llama-3-Motif-102B.
Training Platform
- Llama-3-Motif-102B was trained on the MoAI platform. Refer to link for more information.
Quick Usage
The base model is not served directly. Instead, you can chat with Llama-3-Motif-102B-Instruct through our Model hub.
Details
More details will be provided in the upcoming technical report.
Release Date
2024.12.02
Benchmark Results
Provider | Model | kmmlu_direct score | Note |
---|---|---|---|
Moreh | Llama-3-Motif-102B | 64.74 | + |
Meta | Llama3-70B-instruct | 54.5* | |
Meta | Llama3.1-70B-instruct | 52.1* | |
Meta | Llama3.1-405B-instruct | 65.8* | |
Alibaba | Qwen2-72B-instruct | 64.1* | |
OpenAI | GPT-4-0125-preview | 59.95* | |
OpenAI | GPT-4o-2024-05-13 | 64.11** | |
Google | gemini pro | 50.18* | |
LG | exaone 3.0 | 44.5* | + |
Naver | HyperCLOVA X | 53.4* | + |
Upstage | SOLAR-10.7B | 41.65* | + |
* : Community report
** : Measured by Moreh
+ : Claimed to have better capability in Korean
How to use
We do not recommend using the base model directly!
Use with vLLM
- Refer to this link to install vLLM.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
# Set tensor_parallel_size to the number of GPUs you have available
model = LLM("moreh/Llama-3-Motif-102B", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("moreh/Llama-3-Motif-102B")
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # Korean prompt: "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]
messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]
# vLLM does not use the Hugging Face generation_config, so the sampling parameters are set explicitly here
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])
responses = model.generate(messages_batch, sampling_params=sampling_params)
print(responses[0].outputs[0].text)
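The single-prompt example above extends to several prompts at once, since `LLM.generate` accepts a list of prompt strings and returns one result per prompt in the same order. The following is a minimal sketch rather than part of the original model card: `build_conversations` and `generate_batch` are hypothetical helper names, and the sketch assumes the `model`, `tokenizer`, and `sampling_params` objects created above are passed in.

```python
def build_conversations(user_prompts, system_prompt="You are a helpful assistant"):
    """Wrap each user prompt in its own system + user conversation."""
    return [
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ]
        for prompt in user_prompts
    ]


def generate_batch(llm, tokenizer, sampling_params, user_prompts):
    """Render every conversation with the chat template and generate in one call.

    `llm` is expected to be a vllm.LLM instance and `tokenizer` a Hugging Face
    tokenizer with a chat template (hypothetical arguments of this helper).
    """
    prompts = [
        tokenizer.apply_chat_template(
            conversation=conversation, add_generation_prompt=True, tokenize=False
        )
        for conversation in build_conversations(user_prompts)
    ]
    responses = llm.generate(prompts, sampling_params=sampling_params)
    # One result per prompt; keep the first completion of each.
    return [response.outputs[0].text for response in responses]
```

Calling `generate_batch(model, tokenizer, sampling_params, [...])` with a list of user prompts would then return one generated string per prompt.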
Use with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "moreh/Llama-3-Motif-102B"
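# Generation settings are read from the repository's generation_config.json
# A model of this size is unlikely to fit on a single GPU, so load it in bfloat16
# and let device_map="auto" (requires the accelerate package) spread it across the available GPUs
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")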
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # Korean prompt: "Explain the concept of the Big Bang theory to a kindergartener"
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]
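# Apply the chat template and tokenize in one step; tokenizing the rendered string
# separately could add a second BOS token and requires a padding token for padding=True
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))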
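A rough weight-memory estimate can help when choosing `tensor_parallel_size` or deciding how many GPUs to load the model across. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes exactly 102 billion parameters (read off the model name) and 80 GB of memory per GPU, and it counts only the weights, ignoring the KV cache and activations. The function names are hypothetical.

```python
import math

# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}

# Assumption: parameter count taken from "102B" in the model name.
NUM_PARAMS = 102e9


def weight_memory_gb(num_params, dtype="bfloat16"):
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def min_gpus_for_weights(num_params, dtype="bfloat16", gpu_memory_gb=80.0):
    """Lower bound on the GPU count: weights only, no KV cache or activations."""
    return math.ceil(weight_memory_gb(num_params, dtype) / gpu_memory_gb)


for dtype in ("float32", "bfloat16"):
    print(dtype, weight_memory_gb(NUM_PARAMS, dtype), min_gpus_for_weights(NUM_PARAMS, dtype))
```

Under these assumptions the bfloat16 weights need about 204 GB (at least 3 GPUs) and the float32 weights about 408 GB (at least 6 GPUs). Real deployments need additional headroom for the KV cache and activations.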