---
base_model:
- meta-llama/Meta-Llama-3-8B
language:
- kk
license: apache-2.0
tags:
- text-generation-inference
- transformers
- llama
- trl
---

# Uploaded model

- **Developed by:** Til-Qazyna
- **License:** apache-2.0
- **Finetuned from model:** Meta-Llama-3-8B

This model underwent continued pretraining (CPT) on an extensive Kazakh text corpus to adapt Llama 3 to the Kazakh language. It was subsequently fine-tuned on Kazakh-language instruction data.

The model performs well at processing Kazakh text, answering text-based questions, correcting punctuation and grammar, and summarizing text. Its handling of open-ended questions still has room for improvement.

## Requirements

To install the necessary dependencies, run the following commands. In a Jupyter or Colab notebook, prefix each command with `!`.

```bash
pip install --no-deps "xformers<0.0.27" "trl<0.9.0"
pip install peft accelerate bitsandbytes triton
```

## Loading in 8-bit with transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "TilQazyna/llama-kaz-instruct-8B-1"
hf_token = ""  # your Hugging Face access token

# Use BitsAndBytesConfig(load_in_4bit=True) for faster results but slightly lower accuracy.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=quantization_config, device_map="auto", token=hf_token
)
tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token)
```

## Running simple inference

```python
from transformers import TextStreamer

# Kazakh prompt: "Task: Correct the punctuation and grammatical errors
# in the following text. Text: <input with errors> Answer:"
prompt = "Тапсырма: Келесі мәтіндегі пунктуацияларды және грамматикалық қателерді дұрыста. \n\nМәтін: Жаналыктар леби осиндай \n\nЖауабы:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Stream generated tokens to stdout as they are produced.
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128)
```
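
The inference prompt above follows a task / text / answer layout. The following is a minimal sketch of a helper that assembles prompts in that layout. It assumes the model expects the same `Тапсырма:` / `Мәтін:` / `Жауабы:` labels and spacing for other tasks, which the model card does not confirm. The name `build_prompt` is hypothetical and not part of the model's API.

```python
def build_prompt(task: str, text: str) -> str:
    """Assemble a prompt in the task / text / answer layout of the example.

    Assumption: the labels and spacing are copied from the single example
    prompt in this model card; other layouts may work equally well.
    """
    # "Тапсырма" = task, "Мәтін" = text, "Жауабы" = answer (left open for the model).
    return f"Тапсырма: {task.strip()} \n\nМәтін: {text.strip()} \n\nЖауабы:"


# Rebuild the punctuation-and-grammar correction prompt from the example.
prompt = build_prompt(
    "Келесі мәтіндегі пунктуацияларды және грамматикалық қателерді дұрыста.",
    "Жаналыктар леби осиндай",
)
```

The resulting `prompt` string can be passed to `tokenizer(prompt, return_tensors="pt")` in place of the hard-coded literal.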