|
--- |
|
library_name: transformers |
|
tags: |
|
- LoRA |
|
- unsloth |
|
license: apache-2.0 |
|
language: |
|
- ja |
|
base_model: |
|
- IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit |
|
--- |
|
|
|
# Model Card for IshiiTakahiro/llm-jp-3-13b-it_lora
|
|
|
|
This model is a **LoRA (Low-Rank Adaptation)** fine-tuned version of `IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit`, designed for efficient parameter updates and task-specific customization. LoRA enables lightweight fine-tuning by adapting only a subset of model parameters, significantly reducing the computational and storage requirements. |
|
|
|
This Llama-architecture model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
|
|
|
## Model Details |
|
|
|
### Model Architecture and Visual Abstract |
|
|
|
![Model Architecture 2](https://huggingface.co/IshiiTakahiro/llm-jp-3-13b-it_lora/resolve/main/napkin-selection2.png) |
|
|
|
### Model Overview |
|
|
|
During fine-tuning, only the low-rank adapter matrices are updated while the base model weights remain frozen, which keeps training compute and adapter storage small compared to full fine-tuning.
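To make the mechanism concrete, here is a toy sketch of a LoRA layer (illustrative only; `ToyLoRALinear` is a hypothetical name, not part of this model's code). The frozen base weight is augmented with a trainable rank-`r` update `B·A`, scaled by `alpha/r`:

```python
import torch
import torch.nn as nn

class ToyLoRALinear(nn.Module):
    """Toy LoRA layer: a frozen base weight plus a trainable rank-r update."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the original weights stay frozen
        # A starts small and B at zero, so training begins from the base model.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = base(x) + scale * (x A^T) B^T -- only A and B receive gradients.
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```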
|
|
|
**Note**: The base model, `IshiiTakahiro/llm-jp-3-13b-q-it-id2098_16bit`, is itself a further pre-trained version of `llm-jp/llm-jp-3-13b`. Please keep this lineage in mind.
|
|
|
|
|
|
- **Base Model**: IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit |
|
- **Adaptation Type**: LoRA |
|
- **Language**: Japanese |
|
- **License**: Apache 2.0 |
|
|
|
This model specializes in tasks such as sentiment analysis, dialogue generation, and text summarization. |
|
|
|
--- |
|
|
|
## Intended Use |
|
|
|
### Primary Use Cases |
|
This LoRA model is ideal for the following tasks: |
|
1. **Text Generation:** Efficiently generate Japanese text for specific domains or use cases (a prompt sketch follows this list).
|
2. **Text Classification:** Perform classification tasks with reduced resource consumption. |
|
3. **Domain-Specific Fine-Tuning:** Quickly adapt to niche tasks without retraining the entire model. |
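All of these use cases share the simple instruction/answer prompt format used by the inference example later in this card; a minimal sketch (the helper name `build_prompt` is illustrative):

```python
# Build an instruction-style prompt in the "### 指示 / ### 回答" format
# used by the inference example in this card.
def build_prompt(instruction: str) -> str:
    return f"### 指示\n{instruction}\n### 回答\n"

# "Please summarize the following passage in one sentence."
print(build_prompt("次の文章を一文で要約してください。"))
```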
|
|
|
### Out-of-Scope Use Cases |
|
This LoRA model inherits limitations from the base model and should not be used for:
|
- Generating harmful or biased content. |
|
- High-stakes decision-making in legal, medical, or critical scenarios. |
|
|
|
--- |
|
|
|
## How to Use |
|
|
|
### Installation |
|
Before running the code, install the required libraries (the commands below assume a Google Colab or Jupyter environment, hence the leading `!`):
|
|
|
``` |
|
!pip uninstall unsloth -y && pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" |
|
!pip install --no-deps xformers trl peft accelerate bitsandbytes jsonlines |
|
``` |
|
|
|
### Example Code
|
```python |
|
|
|
from unsloth import FastLanguageModel |
|
import torch |
|
from tqdm import tqdm |
|
import random |
|
import numpy as np |
|
from multiprocessing import Pool, cpu_count |
|
import re |
|
import datetime |
|
import csv |
|
import jsonlines |
|
from google.colab import userdata

# Read the Hugging Face token from Colab's secret storage.
HF_TOKEN = userdata.get('HF_TOKEN')
|
|
|
# Load the LoRA adapter and its tokenizer via Unsloth.
peft_model, tokenizer = FastLanguageModel.from_pretrained(
|
model_name="IshiiTakahiro/llm-jp-3-13b-it_lora", |
|
dtype=torch.bfloat16, |
|
load_in_4bit=False, |
|
trust_remote_code=True, |
|
token=HF_TOKEN, |
|
) |
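# Note: decoder-only models generally prefer left padding for batched
# generation; with BATCH_SIZE = 1 below this does not matter.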
|
def evaluate_task_score(task: str, answer: str) -> int:
    """Placeholder scorer; replace with your own task-specific evaluation."""
    return 0
|
|
|
def batchify(data, batch_size): |
|
"""データをバッチに分割する関数""" |
|
for i in range(0, len(data), batch_size): |
|
yield data[i:i + batch_size] |
|
|
|
def score_prediction(task_input, predictions, index): |
|
scores = [ |
|
evaluate_task_score(task_input, prediction) |
|
for prediction in predictions |
|
] |
|
    # Append each prediction and its score to a CSV file (ID is defined below).
|
output_file = f"{ID}.prediction_scores.csv" |
|
with open(output_file, mode="a", newline="", encoding="utf-8") as file: |
|
writer = csv.writer(file) |
|
for prediction, score in zip(predictions, scores): |
|
writer.writerow([index, task_input, prediction, score]) |
|
return index, scores |
|
|
|
datasets = [] |
|
# Load the task data (upload the JSONL file to the runtime beforehand).
|
with jsonlines.open("./elyza-tasks-100-TV_0.jsonl", "r") as reader: |
|
datasets = list(reader) |
|
|
|
# Switch Unsloth into its optimized inference mode.
FastLanguageModel.for_inference(peft_model)
|
peft_model = peft_model.to(dtype=torch.bfloat16) |
|
|
|
MAX_NEW_TOKENS: int = 2048
NUM_RETURN_SEQUENCES: int = 1
BATCH_SIZE: int = 1
ID: int = 1000  # used to name the output files
|
tasks_with_predictions = [] |
|
torch.cuda.empty_cache() |
|
|
|
for batch in tqdm(batchify(datasets, BATCH_SIZE), desc="Running inference on GPU"): |
|
batch_inputs = [] |
|
batch_task_ids = [] |
|
for dt in batch: |
|
input_text = dt["input"] |
|
        # Inject a random "annotation" that the model is instructed to ignore.
        annotation = f"### 注釈\n 追加情報: {random.randint(1, 100)}。この追加情報は無視してください。"
|
prompt = f"### 指示\n{input_text}\n{annotation}\n### 回答\n" |
|
batch_inputs.append(prompt) |
|
batch_task_ids.append(dt["task_id"]) |
|
|
|
    # Tokenize the batch of prompts with padding and truncation.
    inputs = tokenizer(
|
batch_inputs, |
|
return_tensors="pt", |
|
padding=True, |
|
truncation=True |
|
).to(peft_model.device) |
|
|
|
    # Greedy decoding (do_sample=False) with a mild repetition penalty.
    with torch.no_grad():
|
outputs = peft_model.generate( |
|
**inputs, |
|
max_new_tokens=MAX_NEW_TOKENS, |
|
use_cache=True, |
|
do_sample=False, |
|
repetition_penalty=1.07, |
|
early_stopping=True, |
|
num_return_sequences=NUM_RETURN_SEQUENCES, |
|
pad_token_id=tokenizer.pad_token_id, |
|
bos_token_id=tokenizer.bos_token_id, |
|
eos_token_id=tokenizer.eos_token_id |
|
) |
|
    # Keep only the text after the final "### 回答" marker.
    batch_predictions = [
|
tokenizer.decode(output, skip_special_tokens=True).split('\n### 回答')[-1].strip() |
|
for output in outputs |
|
] |
|
for task_id, input_text, prediction in zip(batch_task_ids, batch_inputs, batch_predictions): |
|
tasks_with_predictions.append((task_id, input_text, [prediction])) |
|
|
|
# Score the predictions on the CPU (parallelized).
|
def cpu_scoring(task): |
|
task_id, task_input, predictions = task |
|
return score_prediction(task_input, predictions, task_id) |
|
|
|
with Pool(cpu_count()) as pool: |
|
scoring_results = list( |
|
tqdm(pool.imap(cpu_scoring, tasks_with_predictions), total=len(tasks_with_predictions), desc="Scoring on CPU") |
|
) |
|
|
|
# Collect the final results.
|
results = [] |
|
for (task_id, task_input, predictions), (_, scores) in zip(tasks_with_predictions, scoring_results): |
|
best_index = np.argmax(scores) |
|
print("task_id:", task_id, ", best_index:", best_index) |
|
best_prediction = predictions[best_index] |
|
results.append({ |
|
"task_id": task_id, |
|
"input": task_input, |
|
"output": best_prediction |
|
}) |
|
|
|
# Save the results as JSONL.
|
output_filename = f"id{ID}.jsonl" |
|
with jsonlines.open(output_filename, mode='w') as writer: |
|
writer.write_all(results) |
|
|
|
print(f"Results saved to {output_filename}") |
|
``` |
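If you prefer to load the adapter without Unsloth, the sketch below uses PEFT's `AutoPeftModelForCausalLM` (this assumes the adapter repository is PEFT-compatible; adjust the dtype and device mapping for your hardware):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the adapter on top of its base model in one call.
model = AutoPeftModelForCausalLM.from_pretrained(
    "IshiiTakahiro/llm-jp-3-13b-it_lora",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("IshiiTakahiro/llm-jp-3-13b-it_lora")

prompt = "### 指示\n日本の首都はどこですか?\n### 回答\n"  # "What is the capital of Japan?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```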
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
This LoRA model inherits biases and risks from its base model: |
|
|
|
- Cultural and Linguistic Bias: Outputs may reflect biases present in the Japanese-language training data. |
|
- Domain-Specific Limitations: Performance may degrade outside of the fine-tuned domain or task. |
|
|
|
### Recommendations |
|
|
|
- Validate outputs critically, especially when applied to sensitive domains. |
|
- Fine-tune further or evaluate carefully when adapting this model for a new domain. |
|
|
|
## Training Procedure
|
|
|
This model was fine-tuned using LoRA, which updates only a small number of low-rank matrices: |
|
|
|
- **Base Model**: IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit
- **LoRA Rank**: 8
- **Precision**: bf16
- **Hardware**: NVIDIA L4 GPUs
|
|
|
LoRA significantly reduces computational overhead compared to full model fine-tuning, while maintaining performance on the target task. |
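For reference, a setup matching these parameters might look like the sketch below, using Unsloth's `get_peft_model` wrapper (the `target_modules`, `lora_alpha`, and `lora_dropout` values are illustrative assumptions, not the exact training settings):

```python
from unsloth import FastLanguageModel
import torch

# Load the 4-bit base model named in this card.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit",
    dtype=torch.bfloat16,
    load_in_4bit=True,
)

# Attach rank-8 LoRA adapters; only these matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0.0,
    bias="none",
)
```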
|
|
|
|
|
|
|
## Citation |
|
|
|
If you use this model, please cite it as follows: |
|
|
|
**BibTeX:** |
|
|
|
```bibtex
@misc{ishii2024lora,
  title={LoRA Adaptation of Large Japanese Language Model},
  author={Takahiro Ishii},
  year={2024},
  note={Available at Hugging Face Hub: https://huggingface.co/IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit}
}
```
|
|