---
library_name: transformers
language:
- wo
- en
license: apache-2.0
pipeline_tag: text2text-generation
---

# Oolel: A High-Performing Open LLM for Wolof

Despite numerous open-source innovations in large language models, African languages have remained underrepresented.

**Soynade Research** is transforming this landscape with Oolel, the first open-source language model for Wolof.

Built on the **Qwen 2.5** architecture, Oolel combines state-of-the-art AI technology with deep Wolof linguistic expertise. Using carefully curated, high-quality data, we trained and optimized Oolel for the following tasks:

- **RAG** supporting Wolof queries with English, French, or Wolof context
- **Bidirectional translation between English and Wolof**
- **Natural text generation in Wolof**
- **Math in Wolof**
- **And many other standard NLP tasks**:
  - Summarization
  - Text editing
  - etc.

## 3. Usage

**!!! It's important to add your system prompt !!!**

The following code snippet uses `apply_chat_template` to show how to load the tokenizer and model and how to generate content.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "soynade-research/Oolel-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("soynade-research/Oolel-v0.1")

def generate_response(messages, max_new_tokens=1024, temperature=0.1):
    # Render the chat messages into the prompt format the model expects.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    # Move the inputs to wherever device_map="auto" placed the model.
    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
    generated_ids = model.generate(
        **model_inputs,  # passes both input_ids and attention_mask
        max_new_tokens=max_new_tokens,
        do_sample=True,  # temperature only takes effect when sampling is enabled
        temperature=temperature,
    )
    # Strip the prompt tokens so that only the newly generated text is decoded.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response
```

**Some task examples:**

1. **Translation Tasks**

```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Translate to Wolof: Bassirou Diomaye Faye is the new Senegalese president. He is 44 years old"}
]
print(generate_response(messages))
```

2. **Code generation**

```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Bindal ab klaas Python buy wone ni ñuy jëfandikoo dataframe yi ci Pandas"}
]
print(generate_response(messages))
```

3. **Problem Solving**

```python
from pprint import pprint

system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Ndax nga mën ma won ni ñuy resolver problème bii: Fatou dafa jënd 3 kilo ceeb, 2 kilo diw ak 5 kilo sukër. Ceeb gi wenn kilo 500 CFA la, diw gi 1200 CFA kilo bi, sukër gi 750 CFA kilo bi. Ñaata la wara fay?"}
]
pprint(generate_response(messages))
```

4. **Text Generation** (e.g. story generation)

```python
system_prompt = "You are a skilled Wolof storyteller (Gewël) with deep knowledge of African folktales and traditions. Write engaging stories in Wolof that reflect African cultural values and wisdom."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Bindal ab léeb ci gaynde gi lekk muus mi"}
]
print(generate_response(messages, temperature=0.9))
```

5. **Multi-turn conversations**

Oolel is not optimized for multi-turn conversations, but you can try it!

```python
messages = [
    {"role": "user", "content": "Wax ma clan mooy CEDEAO ? Ci lan la liggeey?"},
    {"role": "assistant", "content": "CEDEAO mooy 'organisation' gu boole reew yi nekk ci pennc Afrika bi. Mu ngi sukkandiku ci wàll économie, politig, ak déggoo diggante reew yi"},
    {"role": "user", "content": "ñaata reew ñoo ci bokk?"}
]
print(generate_response(messages))
```

## Authors

- [**Yaya SY**](https://x.com/seygalare): NLP Researcher (Efficient Continued Pretraining)
- [**Dioula DOUCOURE**](https://x.com/DioulaD): Data & NLP Engineer
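
A note on the Usage section: because the system prompt matters so much, it can help to build the message list through a small helper so the prompt is never left out. The sketch below is one possible way to do that in plain Python. `build_messages` and `DEFAULT_SYSTEM_PROMPT` are hypothetical names introduced here for illustration and are not part of Oolel or `transformers`. The default prompt text is copied from the examples in the Usage section.

```python
# Hypothetical helper, not part of Oolel or transformers.
# The default prompt reuses the text from the Usage examples.
DEFAULT_SYSTEM_PROMPT = (
    "You're a Wolof AI assistant. Please always provide detailed and "
    "useful answers to the user queries."
)

def build_messages(user_query, history=None, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Return a chat message list that always starts with a system prompt.

    `history` is an optional list of (user_turn, assistant_turn) pairs
    from earlier in the conversation.
    """
    if not system_prompt or not system_prompt.strip():
        raise ValueError("A non-empty system prompt is required.")
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, assistant_turn in history or []:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": assistant_turn})
    # The newest user query always goes last.
    messages.append({"role": "user", "content": user_query})
    return messages
```

The resulting list has the same shape as the `messages` lists in the examples, so it can be passed to `generate_response` directly, for instance `generate_response(build_messages("Translate to Wolof: Good morning"))`.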