Qwen2.5 Korean Code Review LLM
Overview
This model is a fine-tuned version of unsloth/qwen2.5-coder-14b-instruct-bnb-4bit, optimized for Korean-language code review and programming education.
The model was trained on ewhk9887/korean_code_reviews_from_github, a dataset of Korean code reviews collected from GitHub. Fine-tuning was done with Unsloth and Hugging Face's transformers and trl libraries, enabling roughly 2x faster training.
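For supervised fine-tuning with trl, each dataset record is typically rendered into a prompt/response pair. The sketch below is illustrative only: the field names "code" and "review" are assumptions about the dataset schema, not documented columns of ewhk9887/korean_code_reviews_from_github.

```python
# Hypothetical formatting step for SFT (e.g. trl's SFTTrainer).
# The "code" and "review" keys are assumed field names, used here
# only to show the prompt/completion shape.
def format_example(record):
    """Turn one review record into a prompt/completion pair."""
    prompt = f"다음 코드를 리뷰해 주세요:\n{record['code']}"  # "Please review the following code"
    return {"prompt": prompt, "completion": record["review"]}

example = {
    "code": "def add(a, b): return a + b",
    "review": "타입 힌트를 추가하면 좋겠습니다.",  # "Adding type hints would be good."
}
formatted = format_example(example)
print(formatted["prompt"])
```

A mapping like this can be applied over the whole dataset with `datasets.Dataset.map` before passing it to the trainer.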
Features
Korean Code Review Support: Designed specifically for analyzing and reviewing code in Korean.
Efficient Fine-Tuning: Uses bnb-4bit quantization and Unsloth for fast, memory-efficient training.
Bilingual Support: Can process both Korean and English inputs.
Transformer-based Model: Leverages Qwen2.5's strong coding capabilities.
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "ewhk9887/qwen2.5-korean-code-review"

# Load the tokenizer and the 4-bit fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Ask for a Korean-language review of a small function.
inputs = tokenizer("코드를 리뷰해 주세요: def add(a, b): return a + b", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
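The example above sends a raw string, but Qwen2.5 instruct checkpoints generally respond better to prompts built with the tokenizer's chat template. The sketch below hand-builds the ChatML layout that `apply_chat_template` produces for Qwen2.5 models, just to make the structure visible; the system message is an illustrative choice, not part of this model card.

```python
# Hand-render a ChatML-style prompt to show the structure that
# tokenizer.apply_chat_template(messages, add_generation_prompt=True)
# produces for Qwen2.5 models.
def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts as ChatML text."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # generation prompt for the model's turn
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful Korean code reviewer."},
    {"role": "user", "content": "코드를 리뷰해 주세요: def add(a, b): return a + b"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

In practice, prefer `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` over hand-building the string, since the tokenizer's template is authoritative for the checkpoint.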
Developer
- Name: Eunsoo Max Eun
- License: Apache-2.0