---
license: apache-2.0
language:
- en
- sl
- hr
- sr
- bs
library_name: transformers
---

# OPT_GaMS 1B

This is the 1B OPT model additionally pretrained on Slovene data. The model was created as part of the Povejmo project: https://www.cjvt.si/povejmo/.

This is the base version of the model; it is not instruction-tuned.

## Data

The model was additionally pretrained on the following Slovene, English, and Croatian-Bosnian-Serbian (CBS) corpora:

| Corpus     | Language | # Tokens | Percentage |
| :--------- | :------- | :------: | :--------: |
| Metafida   | Slovene  |  6.59 B  |  13.89 %   |
| KAS        | Slovene  |  3.61 B  |   7.62 %   |
| Trendi     | Slovene  |  1.4 B   |   2.96 %   |
| mC4        | Slovene  |  5.5 B   |  11.6 %    |
| MaCoCu     | Slovene  |  4.68 B  |   9.86 %   |
| CC100      | Slovene  |  0.54 B  |   1.14 %   |
| Rižnica    | Croatian |  0.21 B  |   0.44 %   |
| Hr News    | Croatian |  4.16 B  |   8.77 %   |
| MaCoCu HBS | CBS      | 15.65 B  |  32.98 %   |
| Wikipedia  | English  |  4.7 B   |   9.9 %    |
| CC-News    | English  |  0.4 B   |   0.83 %   |

The total size of the additional training data is **47.44 B** tokens.

## Model usage

Inference can be run with the following snippet:

```python
from transformers import AutoTokenizer, pipeline

model_id = "cjvt/OPT_GaMS-1B"

# Load the tokenizer shipped with the model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a text-generation pipeline; device_map="auto" places the model
# on the available GPU(s), falling back to CPU if none are present
pline = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    device_map="auto"
)

# Example prompts in English, Slovene, and Croatian
prompts = [
    "The examples of antonyms are:\nhigh => low\nwide => narrow\nbig =>",
    "Pristanek je bil prvi nadzorovani spust ameriškega vesoljskega plovila na površje Lune po Apollu 17 leta 1972, ko je na Luni pristala zadnja Nasina misija s posadko.\nDoslej so na Luni pristala vesoljska plovila le iz štirih drugih držav –",
    "U četvrtak je bila prva polufinalna večer Dore, a komentari na društvenim mrežama ne prestaju. U nedjeljno finale prošli su:"
]

# Greedy decoding (do_sample=False), one completion per prompt
sequences = pline(
    prompts,
    max_length=1000,
    do_sample=False,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id
)
for seq in sequences:
    print("--------------------------")
    print(f"Result: {seq[0]['generated_text']}")
    print("--------------------------\n")
```
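
If you prefer to call the model directly rather than through the `pipeline` helper, a minimal sketch along the following lines should also work. This variant is not part of the original card: it assumes the standard `transformers` causal-LM API, and the prompt string is purely illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cjvt/OPT_GaMS-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Illustrative prompt; any English, Slovene, or CBS text can be used
prompt = "The examples of antonyms are:\nhigh => low\nwide => narrow\nbig =>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding, mirroring the pipeline example above
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=False,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that `device_map="auto"` requires the `accelerate` package; drop the argument to load the model on CPU instead.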