---
base_model: llm-jp/llm-jp-3-13b
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
language:
- ja
license: cc-by-nc-sa-4.0
---

# Introduction

This model was built for submission to the competition of the LLM Course 2024, run by the Matsuo-Iwasawa Laboratory at the University of Tokyo.

It applies SFT with QLoRA to [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b); only the LoRA adapter is uploaded to this repository.

The chat template is identical to that of [weblab-GENIAC/Tanuki-8B-dpo-v1.0](https://huggingface.co/weblab-GENIAC/Tanuki-8B-dpo-v1.0).

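As a rough illustration of what "only the LoRA adapter" means, the sketch below merges a toy low-rank update into a toy base weight using the usual LoRA convention `W + (alpha / rank) * B @ A`. The matrix sizes, values, and the helper names `matmul` and `merge_lora` are hypothetical and chosen only for readability; they are not taken from this model or from any library.

```python
# Toy illustration of a LoRA update; all sizes and values are made up.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(base_weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A without modifying the inputs."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base_weight, delta)
    ]


# Frozen base weight W (4 x 4 identity): 16 values that stay with the base model.
base_weight = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# Rank-1 adapter: B is 4 x 1 and A is 1 x 4, so only 8 values are stored.
rank = 1
alpha = 2.0
lora_b = [[1.0], [0.0], [0.0], [0.0]]
lora_a = [[0.0, 0.5, 0.0, 0.0]]

merged = merge_lora(base_weight, lora_a, lora_b, alpha, rank)
adapter_values = sum(len(row) for row in lora_a) + sum(len(row) for row in lora_b)
base_values = sum(len(row) for row in base_weight)

print(merged[0])
print(adapter_values, base_values)
```

In this toy case the adapter holds 8 values against 16 in the base weight. At the scale of a 13B model the gap is far larger, which is why an adapter-only repository is small and is loaded together with the original base model at inference time.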
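# Inference

In the provided environment, run inference as follows, using vLLM on an instance with a single L4 GPU.

The code is written for Jupyter Notebook: each block separated by a blank line corresponds to one cell. Run the cells in order.
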
```python
!pip uninstall numpy -y
!pip install numpy==1.26.4

%%time
%pip install vllm==0.6.4.post1 --force-reinstall

!pip install ipywidgets

import time
import torch
import transformers
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig
)
import vllm
from vllm.lora.request import LoRARequest
from jinja2 import Template
print(vllm.__version__)

MAX_LENGTH = 1024
MODEL_NAME = "llm-jp/llm-jp-3-13b"
print(MODEL_NAME)

import os
os.environ["HF_TOKEN"] = "your Hugging Face token"

# Load the base model with bitsandbytes quantization and LoRA support enabled.
llm = vllm.LLM(
    MODEL_NAME,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
    trust_remote_code=True,
    enforce_eager=True,
    max_model_len=MAX_LENGTH,
    enable_lora=True,
    quantization="bitsandbytes",
    load_format="bitsandbytes"
)
tokenizer = llm.get_tokenizer()

# Reuse the chat template from Tanuki-8B-dpo-v1.0.
sft_tokenizer = AutoTokenizer.from_pretrained(
    "weblab-GENIAC/Tanuki-8B-dpo-v1.0"
)
tokenizer.chat_template = sft_tokenizer.chat_template

# Download the LoRA adapter from this repository.
from huggingface_hub import snapshot_download
lora_path = snapshot_download(repo_id="OsakanaTeishoku/1204lora")

# Load the evaluation tasks from a local JSONL file.
from datasets import load_dataset
data_files = {"test": "elyza-tasks-100-TV_0.jsonl"}
tasks = load_dataset("json", data_files=data_files, split="test")

# Build one single-turn conversation per task and tokenize it with the chat template.
messages_list = [
    [{"role": "user", "content": tasks["input"][i]}] for i in range(len(tasks))
]
prompts = [line[0]["content"] for line in messages_list]
prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]
sampling_params = vllm.SamplingParams(
    temperature=0.7,
    max_tokens=1024,
    repetition_penalty=1.05,
    top_p=0.9,
)
outputs = llm.generate(
    prompt_token_ids=prompt_token_ids,
    sampling_params=sampling_params,
    lora_request=LoRARequest("lora", 1, lora_path),  # LoRA adapter
)
for prompt, response in zip(prompts, outputs):
    print("prompt:", prompt)
    print("output:", response.outputs[0].text.strip())
    print("-"*80)
# Write one JSON object per line, keeping non-ASCII characters unescaped.
import json
data = [{
    "task_id": i,
    #"input": prompts[i],
    "output": outputs[i].outputs[0].text.strip()
} for i in range(len(tasks))]
file_path_with_unicode = 'output.jsonl'
with open(file_path_with_unicode, 'w', encoding='utf-8') as file:
    for entry in data:
        json.dump(entry, file, ensure_ascii=False)
        file.write('\n')
print(f"Saved json {file_path_with_unicode} !")
```

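Before submitting, it may be worth reading the generated file back and checking that every line parses and carries the `task_id` and `output` keys written by the notebook. The following is a minimal sketch using only the standard library; the helper name `load_predictions` and the sample records are hypothetical, and the demo writes to a temporary file rather than the real `output.jsonl`.

```python
import json
import os
import tempfile


def load_predictions(path):
    """Read a JSONL predictions file and check that each record has the expected keys."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = {"task_id", "output"} - record.keys()
            if missing:
                raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
            records.append(record)
    return records


# Demo on a throwaway file written the same way the notebook writes output.jsonl.
sample = [{"task_id": 0, "output": "こんにちは"}, {"task_id": 1, "output": "ok"}]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "output.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        for entry in sample:
            json.dump(entry, f, ensure_ascii=False)
            f.write("\n")
    records = load_predictions(demo_path)

print(len(records), records[0]["output"])
```

To check the real file, call `load_predictions("output.jsonl")` after the last notebook cell and compare `len(records)` with `len(tasks)`.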
# Change log

- 2024/12/26: Removed extraneous comments from the inference code and added links.

# Uploaded model

- **Developed by:** OsakanaTeishoku
- **License:** cc-by-nc-sa-4.0
- **Finetuned from model:** llm-jp/llm-jp-3-13b

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)