Developed by:

  • K2S3

Model Number:

  • K2S3-v0.1

Base Model Weights:

  • mistralai/Mistral-7B-v0.1

Model Description:

  • The K2S3 v0.1 model uses Mistral weights, depth up-scaled to double the model's size, and extends the tokenizer with added Korean vocabulary and merge rules to strengthen Korean-language handling.

Training Data

  • The training data for this model includes alpaca-gpt4-data and samples from the OpenOrca dataset.

Training Method

  • The model was trained with full-parameter supervised fine-tuning (SFT) on the base model after it was depth up-scaled by K2S3.

Hardware

  • Hardware: Trained on two NVIDIA A100 80GB GPUs.
  • Training Factors: Fine-tuned with SFT using the Hugging Face SFTTrainer with FSDP (Fully Sharded Data Parallel).
Model Size

  • 14.4B params (FP16, safetensors)
Model Repository

  • Changgil/K2S3-v0.1