---
base_model: llm-jp/llm-jp-3-13b
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
language:
- ja
license: cc-by-nc-sa-4.0
---

# Introduction

This model was created for submission to the competition of the LLM Course 2024 run by the Matsuo-Iwasawa Lab at the University of Tokyo.
It was produced by applying SFT with QLoRA to [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b). Only the LoRA adapter is uploaded here.
The chat template is identical to the one used by [weblab-GENIAC/Tanuki-8B-dpo-v1.0](https://huggingface.co/weblab-GENIAC/Tanuki-8B-dpo-v1.0).

# Inference

Inference is run in the provided environment as follows, using vLLM on an instance with a single L4 GPU.
The code is written for a Jupyter Notebook: each block separated by a blank line is one cell. Run the cells in order.

```python
# Pin numpy to 1.26.4
!pip uninstall numpy -y
!pip install numpy==1.26.4

%%time
%pip install vllm==0.6.4.post1 --force-reinstall

!pip install ipywidgets

import os
import json
from transformers import AutoTokenizer
import vllm
from vllm.lora.request import LoRARequest
print(vllm.__version__)

MAX_LENGTH = 1024
MODEL_NAME = "llm-jp/llm-jp-3-13b"
print(MODEL_NAME)

os.environ["HF_TOKEN"] = "<your Hugging Face token>"

# Load the base model with bitsandbytes quantization and LoRA support enabled
llm = vllm.LLM(
    MODEL_NAME,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.95,
    trust_remote_code=True,
    enforce_eager=True,
    max_model_len=MAX_LENGTH,
    enable_lora=True,
    quantization="bitsandbytes",
    load_format="bitsandbytes"
)

# Replace the chat template with the one from weblab-GENIAC/Tanuki-8B-dpo-v1.0
tokenizer = llm.get_tokenizer()
sft_tokenizer = AutoTokenizer.from_pretrained(
    "weblab-GENIAC/Tanuki-8B-dpo-v1.0"
)
tokenizer.chat_template = sft_tokenizer.chat_template

# Download the LoRA adapter from the Hugging Face Hub
from huggingface_hub import snapshot_download
lora_path = snapshot_download(repo_id="OsakanaTeishoku/1204lora")

# Load the task file (expected in the current working directory)
from datasets import load_dataset
data_files = {"test": "elyza-tasks-100-TV_0.jsonl"}
tasks = load_dataset("json", data_files=data_files, split="test")

# Wrap each task input as a single user message, then apply the chat template and tokenize
messages_list = [
    [{"role": "user", "content": tasks["input"][i]}] for i in range(len(tasks))
]
prompts = [line[0]["content"] for line in messages_list]
prompt_token_ids = [
    tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    for messages in messages_list
]

sampling_params = vllm.SamplingParams(
    temperature=0.7,
    max_tokens=1024,
    repetition_penalty=1.05,
    top_p=0.9,
)
outputs = llm.generate(
    prompt_token_ids=prompt_token_ids,
    sampling_params=sampling_params,
    lora_request=LoRARequest("lora", 1, lora_path),  # LoRA adapter
)

for prompt, response in zip(prompts, outputs):
    print("prompt:", prompt)
    print("output:", response.outputs[0].text.strip())
    print("-" * 80)

# Save one {"task_id", "output"} record per line as JSONL
data = [{
    "task_id": i,
    # "input": prompts[i],
    "output": outputs[i].outputs[0].text.strip()
} for i in range(len(tasks))]
file_path_with_unicode = 'output.jsonl'
with open(file_path_with_unicode, 'w', encoding='utf-8') as file:
    for entry in data:
        json.dump(entry, file, ensure_ascii=False)
        file.write('\n')
print(f"Saved json {file_path_with_unicode} !")
```

# Change log

- 2024/12/26: Removed superfluous comments from the inference code and added links

# Uploaded model

- **Developed by:** OsakanaTeishoku
- **License:** cc-by-nc-sa-4.0
- **Finetuned from model:** llm-jp/llm-jp-3-13b

This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
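Before submitting, a quick check of `output.jsonl` can catch missing tasks, duplicate IDs, or empty generations. The sketch below is not part of the original notebook. It assumes the file holds one JSON object per line with integer `task_id` and string `output` keys, as written by the save cell of the inference code, and that task IDs run from 0 to the number of tasks minus one. The helper name `check_output_jsonl` is hypothetical.

```python
import json
from pathlib import Path


def check_output_jsonl(path, expected_count):
    """Hypothetical helper: validate a JSONL file of {"task_id", "output"} records.

    Returns a list of human-readable problems; an empty list means no problems were found.
    Assumes task IDs are the integers 0 .. expected_count - 1.
    """
    problems = []
    seen_ids = set()
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {line_no}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict) or "task_id" not in record or "output" not in record:
            problems.append(f"line {line_no}: missing 'task_id' or 'output'")
            continue
        task_id = record["task_id"]
        if not isinstance(task_id, int):
            problems.append(f"line {line_no}: task_id is not an integer")
            continue
        # Flag repeated IDs, then remember this one
        if task_id in seen_ids:
            problems.append(f"line {line_no}: duplicate task_id {task_id}")
        seen_ids.add(task_id)
        # An output that is empty after stripping whitespace is likely a failed generation
        if not str(record["output"]).strip():
            problems.append(f"line {line_no}: empty output for task_id {task_id}")
    # Report any expected IDs that never appeared
    missing = set(range(expected_count)) - seen_ids
    if missing:
        problems.append(f"missing task_ids: {sorted(missing)}")
    return problems
```

In the notebook, running `print(check_output_jsonl("output.jsonl", expected_count=len(tasks)))` after the save cell should print an empty list if every task has exactly one non-empty output.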