---
library_name: transformers
tags:
- LoRA
- unsloth
license: apache-2.0
language:
- ja
base_model:
- IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit
---
# Model Card for llm-jp-3-13b-it_lora
This model is a **LoRA (Low-Rank Adaptation)** fine-tuned version of `IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit`, designed for efficient parameter updates and task-specific customization. LoRA enables lightweight fine-tuning by adapting only a small subset of model parameters, significantly reducing computational and storage requirements.

This model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
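For readers unfamiliar with LoRA, a brief sketch of the standard formulation: the frozen base weight $W_0$ is augmented with a trainable low-rank product, and only the two small matrices $A$ and $B$ are stored in this adapter repository (the scaling factor $\alpha$ is part of the general recipe; its exact value for this adapter is not stated here):

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

With the rank $r = 8$ used for this adapter (see *Training Procedure* below), the trainable parameters are a small fraction of the 13B base model.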
## Model Details
### Model Architecture and Visual Abstract
![Model Architecture 2](https://huggingface.co/IshiiTakahiro/llm-jp-3-13b-it_lora/resolve/main/napkin-selection2.png)
### Model Overview
As summarized above, this model applies LoRA adapters to `IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit`: the base weights stay frozen, and only a small set of low-rank matrices is trained, which keeps fine-tuning lightweight in both compute and storage.
**Note**: The base model (available in 16-bit precision as "IshiiTakahiro/llm-jp-3-13b-q-it-id2098_16bit") is a further pre-trained version of "llm-jp/llm-jp-3-13b," not the original llm-jp checkpoint. Please be aware of this distinction.
- **Base Model**: IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit
- **Adaptation Type**: LoRA
- **Language**: Japanese
- **License**: Apache 2.0
This model specializes in tasks such as sentiment analysis, dialogue generation, and text summarization.
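The *How to Use* section below loads the adapter with Unsloth. As an illustrative alternative, a minimal sketch with the standard 🤗 `transformers` + `peft` stack might look like the following; whether the 4-bit base checkpoint loads cleanly this way has not been verified here, and `bitsandbytes` must be installed for the quantized weights:

```python
# Minimal sketch (unverified): load the base model and attach this LoRA adapter with peft.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit"
adapter_id = "IshiiTakahiro/llm-jp-3-13b-it_lora"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the LoRA weights

# Prompt format follows the example code below.
prompt = "### 指示\n日本の首都はどこですか?\n### 回答\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```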
---
## Intended Use
### Primary Use Cases
This LoRA model is ideal for the following tasks:
1. **Text Generation:** Efficiently generate Japanese text for specific domains or use cases.
2. **Text Classification:** Perform classification tasks with reduced resource consumption.
3. **Domain-Specific Fine-Tuning:** Quickly adapt to niche tasks without retraining the entire model.
### Out-of-Scope Use Cases
This LoRA model inherits limitations from the base model and should not be used for:
- Generating harmful or biased content.
- High-stakes decision-making in legal, medical, or other critical scenarios.
---
## How to Use
### Installation
Before running the code, install the required libraries (the leading `!` assumes a Colab or Jupyter notebook):
```bash
!pip uninstall unsloth -y && pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers trl peft accelerate bitsandbytes jsonlines
```
### Example Code
```python
from unsloth import FastLanguageModel
import torch
from tqdm import tqdm
import random
import numpy as np
from multiprocessing import Pool, cpu_count
import re
import datetime
import csv
import jsonlines
from google.colab import userdata
# Read the Hugging Face token from the Colab secrets store.
HF_TOKEN = userdata.get('HF_TOKEN')

# Load the LoRA adapter together with its base model via Unsloth.
peft_model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="IshiiTakahiro/llm-jp-3-13b-it_lora",
    dtype=torch.bfloat16,
    load_in_4bit=False,
    trust_remote_code=True,
    token=HF_TOKEN,
)
def evaluate_task_score(task: str, answer: str) -> int:
    """Placeholder scoring function; replace with your own metric."""
    return 0

def batchify(data, batch_size):
    """Split the data into batches."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

def score_prediction(task_input, predictions, index):
    # `ID` is the module-level run identifier defined below.
    scores = [
        evaluate_task_score(task_input, prediction)
        for prediction in predictions
    ]
    # Append the scores to a CSV file.
    output_file = f"{ID}.prediction_scores.csv"
    with open(output_file, mode="a", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        for prediction, score in zip(predictions, scores):
            writer.writerow([index, task_input, prediction, score])
    return index, scores
datasets = []
# Load the task data (upload the file to the working directory beforehand).
with jsonlines.open("./elyza-tasks-100-TV_0.jsonl", "r") as reader:
    datasets = list(reader)

# Switch the model into inference mode.
FastLanguageModel.for_inference(peft_model)
peft_model = peft_model.to(dtype=torch.bfloat16)

MAX_NEW_TOKENS: int = 2048
NUM_RETURN_SEQUENCES: int = 1
BATCH_SIZE: int = 1
ID: int = 1000  # Run identifier used in the output file names.
tasks_with_predictions = []
torch.cuda.empty_cache()

for batch in tqdm(batchify(datasets, BATCH_SIZE), desc="Running inference on GPU"):
    batch_inputs = []
    batch_task_ids = []
    for dt in batch:
        input_text = dt["input"]
        annotation = f"### 注釈\n 追加情報: {random.randint(1, 100)}。この追加情報は無視してください。"
        prompt = f"### 指示\n{input_text}\n{annotation}\n### 回答\n"
        batch_inputs.append(prompt)
        batch_task_ids.append(dt["task_id"])

    inputs = tokenizer(
        batch_inputs,
        return_tensors="pt",
        padding=True,
        truncation=True
    ).to(peft_model.device)

    with torch.no_grad():
        outputs = peft_model.generate(
            **inputs,
            max_new_tokens=MAX_NEW_TOKENS,
            use_cache=True,
            do_sample=False,
            repetition_penalty=1.07,
            early_stopping=True,
            num_return_sequences=NUM_RETURN_SEQUENCES,
            pad_token_id=tokenizer.pad_token_id,
            bos_token_id=tokenizer.bos_token_id,
            eos_token_id=tokenizer.eos_token_id
        )

    # Keep only the text generated after the "### 回答" marker.
    batch_predictions = [
        tokenizer.decode(output, skip_special_tokens=True).split('\n### 回答')[-1].strip()
        for output in outputs
    ]
    for task_id, input_text, prediction in zip(batch_task_ids, batch_inputs, batch_predictions):
        tasks_with_predictions.append((task_id, input_text, [prediction]))

# Scoring on the CPU (parallelized across processes).
def cpu_scoring(task):
    task_id, task_input, predictions = task
    return score_prediction(task_input, predictions, task_id)

with Pool(cpu_count()) as pool:
    scoring_results = list(
        tqdm(pool.imap(cpu_scoring, tasks_with_predictions), total=len(tasks_with_predictions), desc="Scoring on CPU")
    )

# Collect the final results, keeping the best-scoring prediction per task.
results = []
for (task_id, task_input, predictions), (_, scores) in zip(tasks_with_predictions, scoring_results):
    best_index = np.argmax(scores)
    print("task_id:", task_id, ", best_index:", best_index)
    best_prediction = predictions[best_index]
    results.append({
        "task_id": task_id,
        "input": task_input,
        "output": best_prediction
    })

# Save the results as JSON Lines.
output_filename = f"id{ID}.jsonl"
with jsonlines.open(output_filename, mode='w') as writer:
    writer.write_all(results)

print(f"Results saved to {output_filename}")
```
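The script writes two files: `id{ID}.jsonl` with the selected prediction for each task, and `{ID}.prediction_scores.csv` with one row per scored prediction. A quick way to inspect the JSONL output (the name follows the `ID` constant above, so `id1000.jsonl` with the default value):

```python
import jsonlines

# Print a short preview of each saved prediction.
with jsonlines.open("id1000.jsonl") as reader:
    for row in reader:
        print(row["task_id"], row["output"][:80])
```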
## Bias, Risks, and Limitations
This LoRA model inherits biases and risks from its base model:
- **Cultural and Linguistic Bias**: Outputs may reflect biases present in the Japanese-language training data.
- **Domain-Specific Limitations**: Performance may degrade outside of the fine-tuned domain or task.
### Recommendations
- Validate outputs critically, especially when applied to sensitive domains.
- Fine-tune further or evaluate carefully when adapting this model for a new domain.
## Training Procedure
This model was fine-tuned using LoRA, which updates only a small number of low-rank matrices:
- **Base Model**: IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit
- **LoRA Rank**: 8
- **Precision**: bf16
- **Hardware**: NVIDIA L4 GPUs

LoRA significantly reduces computational overhead compared to full-model fine-tuning while maintaining performance on the target task.
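For reference, a comparable adapter configuration expressed with the `peft` library might look like the sketch below. The rank and precision follow the values listed above; `lora_alpha`, `lora_dropout`, and `target_modules` are illustrative assumptions, not the recorded training settings.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Rank 8 and bf16 match the settings listed above; the remaining values are assumptions.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,           # assumption: not documented for this adapter
    lora_dropout=0.05,       # assumption
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
)

base = AutoModelForCausalLM.from_pretrained(
    "IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # shows how few parameters LoRA actually trains
```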
## Citation
If you use this model, please cite it as follows:
**BibTeX:**
```bibtex
@misc{ishii2024lora,
  title={LoRA Adaptation of Large Japanese Language Model},
  author={Takahiro Ishii},
  year={2024},
  note={Available at Hugging Face Hub: https://huggingface.co/IshiiTakahiro/llm-jp-3-13b-q-it-id2098_4bit}
}
```