Heem2 committed e706c25 (1 parent: 6f9eba9)

Update README.md

Files changed (1): README.md (+85 −6)

README.md CHANGED
@@ -10,14 +10,93 @@ tags:
  - trl
  - sft
base_model: unsloth/mistral-7b-bnb-4bit
---

Removed:

# Uploaded model

- **Developed by:** Heem2
- **License:** apache-2.0
- **Finetuned from model:** unsloth/mistral-7b-bnb-4bit

This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
Added:
# Nepali GPT

Nepali GPT is a Nepali-language large language model fine-tuned from Mistral-7B. Fine-tuning was done with [Unsloth](https://github.com/unslothai/unsloth), which substantially speeds up training; a sketch of a comparable fine-tuning setup appears after the inference example below.
## Model Description

* Model type: A 7B fine-tuned model
* Primary Language(s): Nepali
* License: Mistral
### Installation

```python
%%capture
# Install Unsloth (run in a Colab notebook cell; %%capture must be the first line)
import torch
major_version, minor_version = torch.cuda.get_device_capability()
# Must install separately since Colab has torch 2.2.1, which breaks packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass
```
### Model loading

```python
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048
dtype = None         # None for auto-detection; float16 for Tesla T4/V100, bfloat16 for Ampere+
load_in_4bit = True  # Use 4-bit quantization to reduce memory usage; can be False

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Heem2/NEPALIGPT-1.0",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

# Alpaca-style prompt template; the instruction, input, and response slots are filled at inference time
prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""
```
### Inference

```python
FastLanguageModel.for_inference(model)  # Enable Unsloth's fast inference mode

inputs = tokenizer(
    [
        prompt.format(
            "नेपालको बारेमा व्याख्या गर्नुहोस्।",  # instruction: "Explain about Nepal."
            "संस्कृति, भाषा, भूगोल, राजनीति, जलवायु",  # input: "culture, language, geography, politics, climate"
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 1000, use_cache = True)
print(tokenizer.batch_decode(outputs)[0])
```
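To see tokens as they are generated instead of waiting for `batch_decode`, the same call can take a streamer. This is a minimal sketch using `transformers.TextStreamer`; the `skip_prompt` flag and the token budget are illustrative choices, not settings from this card:

```python
from transformers import TextStreamer

# Prints generated text to stdout as tokens arrive; skip_prompt hides the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(**inputs, streamer = streamer, max_new_tokens = 1000, use_cache = True)
```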
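For readers who want to reproduce a comparable fine-tune, below is a minimal Unsloth SFT sketch. Only the base model name (`unsloth/mistral-7b-bnb-4bit`) comes from this card; the dataset path, LoRA settings, and training hyperparameters are illustrative assumptions, not the actual training recipe:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the 4-bit base model this card lists as the starting point
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Attach LoRA adapters (rank/targets are common defaults, not the card's actual settings)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
)

# "nepali_sft_data.json" is a placeholder; the actual training data is not specified in this card.
# Each example's "text" field is assumed to be pre-rendered with the Alpaca-style template above.
dataset = load_dataset("json", data_files = "nepali_sft_data.json", split = "train")

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        max_steps = 60,
        learning_rate = 2e-4,
        output_dir = "outputs",
    ),
)
trainer.train()
```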
### Citation Information

If you find this model useful, please consider giving 👏 and citing:

```bibtex
@misc{heem2_nepaligpt,
  author       = {Hem Bahadur Gurung},
  title        = {Nepali GPT (NEPALIGPT-1.0)},
  howpublished = {\url{https://huggingface.co/Heem2/NEPALIGPT-1.0}}
}
```
### Contributions

- This model was developed by Hem Bahadur Gurung. Feel free to DM if you have any questions.