---
license: llama3.1
language:
- el
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- text-generation-inference
base_model:
- ilsp/Llama-Krikri-8B-Instruct
---

🚨 **PLEASE USE THE OFFICIAL QUANTIZED VERSIONS** 🚨

🚨 *There is no guarantee that 3rd party quantizations use the latest improved versions, as we have updated the model's weights.* 🚨

# Llama-Krikri-8B-Instruct: An Instruction-tuned Large Language Model for the Greek language

Following the release of [Meltemi-7B](https://huggingface.co/ilsp/Meltemi-7B-v1) on 26 March 2024, we are happy to welcome Krikri to the family of ILSP open Greek LLMs. Krikri is built on top of [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B), extending its capabilities for Greek through continual pretraining on a large corpus of high-quality and locally relevant Greek texts. We present **Llama-Krikri-8B-Instruct**, along with the base model, [Llama-Krikri-8B-Base](https://huggingface.co/ilsp/Llama-Krikri-8B-Base).

![image/png](https://cdn-uploads.huggingface.co/production/uploads/639215a81bae0dde85842ab8/VMTgYzygHsarC9QRv2rGV.png)

# Model Information

## Base Model

- Vocabulary extension of the Llama-3.1 tokenizer with Greek tokens (see the tokenizer comparison sketch after the feature lists below)
- 128k context length (approximately 80,000 Greek words)
- We extend the pretraining of Llama-3.1-8B with added proficiency for the Greek language, by utilizing a large training corpus.
  * This corpus includes 56.7 billion monolingual Greek tokens, constructed from publicly available resources.
  * Additionally, to mitigate catastrophic forgetting and ensure that the model has bilingual capabilities, we use additional sub-corpora with monolingual English texts (21 billion tokens) and Greek-English parallel data (5.5 billion tokens).
  * The training corpus also contains 7.8 billion math and code tokens.
  * This corpus has been processed, filtered, and deduplicated to ensure data quality and is outlined below:

| Sub-corpus | # Tokens | Percentage |
|------------|----------|------------|
| Greek      | 56.7 B   | 62.3 %     |
| English    | 21.0 B   | 23.1 %     |
| Parallel   | 5.5 B    | 6.0 %      |
| Math/Code  | 7.8 B    | 8.6 %      |
| **Total**  | **91 B** | **100%**   |

Chosen subsets of the 91 billion token corpus were upsampled, resulting in a size of **110 billion tokens**.

## Instruct Model

Llama-Krikri-8B-Instruct is the result of post-training Llama-Krikri-8B-Base and features:

- Enhanced chat capabilities and instruction-following in both Greek and English.
- Document translation from Greek to English, French, German, Italian, Portuguese, and Spanish, and vice versa.
- Great performance on generation, comprehension, and editing tasks, such as summarization, creative content creation, text modification, entity recognition, sentiment analysis, etc.
- Domain-specific expertise for specialized legal, financial, medical, and scientific applications.
- Retrieval-Augmented Generation (RAG) over multiple documents, utilizing the 128k context length.
- Improved coding and agentic capabilities with correct formatting and tool use.
- Conversion or structured extraction (e.g., XML, JSON) in data-to-text & text-to-data settings.
- Analytical thinking and Chain-of-Thought (CoT) reasoning for problem-solving.
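A quick way to see the effect of the vocabulary extension in practice is to compare how many tokens the Krikri tokenizer and the original Llama-3.1 tokenizer need for the same Greek sentence. The snippet below is a minimal sketch rather than an official example; it assumes you have been granted access to the gated `meta-llama/Llama-3.1-8B` repository and are logged in to the Hugging Face Hub (any Llama-3.1 tokenizer works for the comparison).

```python
from transformers import AutoTokenizer

# Minimal sketch (not an official example): compare token counts for the same
# Greek sentence with the extended Krikri tokenizer and the original Llama-3.1
# tokenizer. Loading meta-llama/Llama-3.1-8B assumes access to the gated repo.
krikri_tokenizer = AutoTokenizer.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")
llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

# "The kri-kri is a species of wild goat that lives in Crete."
text = "Το κρικρί είναι ένα είδος αγριοκάτσικου που ζει στην Κρήτη."

krikri_ids = krikri_tokenizer(text, add_special_tokens=False)["input_ids"]
llama_ids = llama_tokenizer(text, add_special_tokens=False)["input_ids"]

print(f"Krikri tokenizer:    {len(krikri_ids)} tokens")
print(f"Llama-3.1 tokenizer: {len(llama_ids)} tokens")
```

Fewer tokens per Greek word translates into more usable context: with the extended vocabulary, the 128k-token window corresponds to roughly 80,000 Greek words.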
🚨 **More information on the post-training corpus and methodology coming soon.** 🚨

# How to use

## With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

model = AutoModelForCausalLM.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("ilsp/Llama-Krikri-8B-Instruct")

model.to(device)

# System prompt: "You are Krikri, a highly developed AI model for Greek, and you were trained by ILSP of Athena Research Center."
system_prompt = "Είσαι το Κρικρί, ένα εξαιρετικά ανεπτυγμένο μοντέλο Τεχνητής Νοημοσύνης για τα ελληνικα και εκπαιδεύτηκες από το ΙΕΛ του Ε.Κ. \"Αθηνά\"."
# User prompt: "How does a kri-kri differ from a llama?"
user_prompt = "Σε τι διαφέρει ένα κρικρί από ένα λάμα;"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_prompt = tokenizer(prompt, return_tensors='pt').to(device)
outputs = model.generate(input_prompt['input_ids'], max_new_tokens=256, do_sample=True)

print(tokenizer.batch_decode(outputs)[0])
```

## With an OpenAI-compatible server via vLLM

```bash
vllm serve ilsp/Llama-Krikri-8B-Instruct \
  --enforce-eager \
  --dtype 'bfloat16' \
  --api-key token-abc123
```

Then, the model can be queried from Python with the OpenAI client:

```python
from openai import OpenAI

api_key = "token-abc123"
base_url = "http://localhost:8000/v1"

client = OpenAI(
    api_key=api_key,
    base_url=base_url,
)

# System prompt: "You are an advanced translation system that answers with Python lists. You write nothing in your answers besides the translated lists."
system_prompt = "Είσαι ένα ανεπτυγμένο μεταφραστικό σύστημα που απαντάει με λίστες Python. Δεν γράφεις τίποτα άλλο στις απαντήσεις σου πέρα από τις μεταφρασμένες λίστες."
# User prompt: "Give me the following list with each of its strings translated into Greek: [...]"
user_prompt = "Δώσε μου την παρακάτω λίστα με μεταφρασμένο κάθε string της στα ελληνικά: ['Ethics of duty', 'Postmodern ethics', 'Consequentialist ethics', 'Utilitarian ethics', 'Deontological ethics', 'Virtue ethics', 'Relativist ethics']"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

response = client.chat.completions.create(
    model="ilsp/Llama-Krikri-8B-Instruct",
    messages=messages,
    temperature=0.0,
    top_p=0.95,
    max_tokens=8192,
    stream=False,
)

print(response.choices[0].message.content)
# ['Ηθική καθήκοντος', 'Μεταμοντέρνα ηθική', 'Συνεπειοκρατική ηθική', 'Ωφελιμιστική ηθική', 'Δεοντολογική ηθική', 'Ηθική αρετών', 'Σχετικιστική ηθική']
```

# Evaluation

🚨 **Instruction following and chat capability evaluation benchmarks coming soon.** 🚨

# Acknowledgements

The ILSP team utilized Amazon's cloud computing services, which were made available via GRNET under the [OCRE Cloud framework](https://www.ocre-project.eu/), providing Amazon Web Services for the Greek Academic and Research Community.