Pixtral-12b-korean-preview

Fine-tuned on Korean and English data to improve Korean performance.


Merged model using mergekit

This model hasn't been fully tested, so your feedback will be invaluable in improving it.

Merge Format

models:
  - model: spow12/Pixtral-12b-korean-base  # private repository
    layer_range: [0, 40]
  - model: mistral-community/pixtral-12b
    layer_range: [0, 40]
merge_method: slerp
base_model: mistral-community/pixtral-12b
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
dtype: bfloat16
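
To make the `t` schedule in the config above more concrete, here is a minimal sketch in plain Python of spherical linear interpolation, and of how a five-value list could be spread over 40 layers. It is not mergekit's implementation. `slerp` and `layer_t` are hypothetical helper names. The sketch rests on two assumptions: that list-valued parameters are interpolated linearly across the layer range, and that t=0 yields the base model's tensor while t=1 yields the other model's.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1 + eps)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    if abs(math.sin(theta)) < 1e-6:
        # Nearly colinear vectors: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


def layer_t(anchors, layer, num_layers):
    """Assumed schedule: interpolate anchor values linearly over layer indices."""
    if num_layers == 1:
        return anchors[0]
    pos = layer / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]


# Anchor values copied from the self_attn filter in the config above.
self_attn_anchors = [0, 0.5, 0.3, 0.7, 1]
for layer in (0, 10, 20, 30, 39):
    print(layer, round(layer_t(self_attn_anchors, layer, 40), 3))
```

Under these assumptions, the self_attn and mlp schedules mirror each other, so early layers take attention weights from one model and MLP weights from the other, with the balance reversing toward the last layer.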

Model Details

Model Description

  • Developed by: spow12 (yw_nam)
  • Shared by: spow12 (yw_nam)
  • Model type: LLaVA
  • Language(s) (NLP): Korean, English
  • Finetuned from model: mistral-community/pixtral-12b

Usage

Single image inference


import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = 'spow12/Pixtral-12b-korean-preview'
# Load the merged model in bfloat16 and place it on the available devices.
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    device_map='auto',
    torch_dtype=torch.bfloat16,
).eval()
model.tie_weights()
processor = AutoProcessor.from_pretrained(model_id)

system = "You are helpful assistant create by Yw nam"


chat = [
    {
        'content': system,
        'role': 'system'
    },
    {
        "role": "user", "content": [
        {"type": "image"},  
        {"type": "text", "content": "이 이미지에 나와있는 풍경을 설명해줘"},  # "Describe the scenery shown in this image."
        ]
    }
]
url = "https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcSXVmCeFm5GRrciuGCM502uv9xXVSrS9zDJZ1umCfoMero2MLxT"
image = Image.open(requests.get(url, stream=True).raw)

# One inner list of images per prompt in the batch.
images = [[image]]
prompt = processor.apply_chat_template(chat, tokenize=False)

inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
# Sample with min_p filtering; lower the temperature for more conservative output.
generate_ids = model.generate(**inputs, max_new_tokens=500, do_sample=True, min_p=0.1, temperature=0.9)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output[0])

# Output
"""이 이미지는 바위 해안에 위치한 작은 섬에 위치한 고요한 해안 경치를 보여줍니다. 이 섬은 푸른 물로 둘러싸여 있으며, 그 위에는 붉은 지붕이 있는 하얀 등대가 서 있습니다. 등대는 섬의 중앙에 위치해 있으며, 바위 절벽과 연결된 돌다리가 이어져 있어 접근할 수 있습니다. 등대 주변의 바위 절벽은 파도가 부딪히며 장면에 역동적인 요소를 더합니다. 등대 너머로는 하늘이 맑고 푸르며, 전체적인 장면은 평화롭고 고요한 분위기를 자아냅니다."""
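
Because `generate` on a decoder-style model typically returns the prompt tokens followed by the newly generated ones, the decoded string may begin with the prompt text. Below is a minimal sketch of trimming it, assuming every returned sequence starts with the same number of prompt tokens. `strip_prompt` is a hypothetical helper, not part of transformers.

```python
def strip_prompt(generated_ids, prompt_length):
    """Drop the first `prompt_length` token ids from every generated sequence."""
    return [sequence[prompt_length:] for sequence in generated_ids]


# Toy example with plain lists standing in for token-id tensors.
prompt_ids = [1, 15, 27]
generated = [prompt_ids + [101, 102, 2]]
new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)
```

With the variables from the example above, the call would presumably be `strip_prompt(generate_ids, inputs["input_ids"].shape[1])`, applied before `processor.batch_decode`.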

Multi image inference

url_apple = "https://cloud.shopback.com/c_fit,h_750,w_750/store-service-tw/assets/20185/0476e480-b6c3-11ea-b541-2ba549204a69.png"
image_1 = Image.open(requests.get(url_apple, stream=True).raw)
url_microsoft = "https://pbs.twimg.com/profile_images/1268196215587397634/sgD5ZWuO_400x400.png"
image_2 = Image.open(requests.get(url_microsoft, stream=True).raw)
chat = [
    {
        'content': system,
        'role': 'system'
    },
    {
        "role": "user", "content": [
        {"type": "image"},  
        {"type": "image"},  
        {"type": "text", "content": "두 기업에 대해서 아는걸 설명해줘."},  # "Explain what you know about the two companies."
        ]
    }
]

images = [[image_1, image_2]]
prompt = processor.apply_chat_template(chat, tokenize=False)
inputs = processor(text=prompt, images=images, return_tensors="pt").to(model.device)
generate_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7, min_p=0.1)
output = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(output[0])


# Output
"""두 기업은 각각 Apple과 Microsoft입니다.

1. 애플:
애플은 1976년에 스티브 작스, 스티브 워즈니악, 로널드 웨인에게 설립된 미국의 다국적 기술 기업입니다. 애플의 주요 제품으로는 iPhone, iPad, Mac, Apple Watch가 있습니다. 이 회사는 혁신적인 디자인, 사용자 친화적인 인터페이스, 고품질의 하드웨어로 유명합니다. 애플은 또한 Apple Music, iCloud, App Store와 같은 다양한 소프트웨어 서비스와 플랫폼을 제공합니다. 애플은 혁신적인 제품과 강력한 브랜드로 잘 알려져 있으며, 2010년대 이후 세계에서 가장 가치 있는 기업 중 하나로 자리매김했습니다.

2. 마이크로소프트:
마이크로소프트는 1975년에 빌 게이츠와 폴 알렌에 의해 설립된 미국의 다국적 기술 기업입니다. 이 회사는 운영 체제, 소프트웨어, 개인용 컴퓨터, 전자제품 개발에 중점을 둡니다. 마이크로소프트의 주요 제품으로는 Windows 운영 체제, Microsoft Office 제품군, Xbox 게임 콘솔이 있습니다. 이 회사는 소프트웨어 개발, 클라우드 컴퓨팅, 인공지능 연구와 같은 분야에서도 중요한 역할을 하고 있습니다. 마이크로소프트는 혁신적인 기술과 강력한 비즈니스 솔루션으로 잘 알려져 있으며, 세계에서 가장 가치 있는 기업 중 하나로 자리매김했습니다"""
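
Both examples use the same message layout: a system turn, then a user turn holding one `{"type": "image"}` placeholder per image, followed by the text entry. Below is a small sketch of a helper that builds this structure for any number of images. `build_chat` is a hypothetical name and not part of transformers, and the English user text is only a placeholder.

```python
def build_chat(system_prompt, user_text, num_images):
    """Build a chat list with one image placeholder per image, then the text."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    user_content = [{"type": "image"} for _ in range(num_images)]
    # The text entry uses the "content" key, matching the examples above.
    user_content.append({"type": "text", "content": user_text})
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


chat = build_chat(
    "You are helpful assistant create by Yw nam",
    "Describe both images.",
    num_images=2,
)
```

The number of placeholders should match the number of images passed to the processor for that prompt, as in `images = [[image_1, image_2]]`.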

Limitation

Overall, the performance seems reasonable.

However, it declines when processing images with non-English content.

This is likely because the model was trained primarily on English text and landscapes.

Adding Korean data in the future is expected to enhance performance.

Citation

@misc {spow12/Pixtral-12b-korean-preview,
    author       = { YoungWoo Nam },
    title        = { spow12/Pixtral-12b-korean-preview },
    year         = 2024,
    url          = { https://huggingface.co/spow12/Pixtral-12b-korean-preview },
    publisher    = { Hugging Face }
}