IAmSkyDra committed (verified)
Commit 69948fe · Parent(s): 261d685

Update README.md
Files changed (1): README.md (+101, -3)

README.md CHANGED
- ---
- license: mit
- ---

---
license: apache-2.0
datasets:
- IAmSkyDra/HCMUT_FAQ
language:
- vi
tags:
- education
widget:
- text: Chào bạn
  output:
    text: >-
      Chào bạn! Tôi là GemSUra-edu, một trợ lý AI được phát triển bởi Long
      Nguyen.
---
## Introduction

GemSUra-edu is a large language model fine-tuned on a dataset of FAQs from HCMUT, based on the pre-trained model [GemSUra 2B](https://huggingface.co/ura-hcmut/GemSUra-2B) developed by the URA research group at Ho Chi Minh City University of Technology (HCMUT).

## Inference (with Unsloth for higher speed)

```python
from unsloth import FastLanguageModel

# Load the fine-tuned model and tokenizer (4-bit to reduce GPU memory usage)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="IAmSkyDra/GemSUra-edu",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True
)

# Switch Unsloth into its optimized inference mode
FastLanguageModel.for_inference(model)

# Gemma chat format: a user turn followed by the opening of the model turn
query_template = "<start_of_turn>user\n{query}<end_of_turn>\n<start_of_turn>model\n"

while True:
    query = input("Query: ")
    if query.lower() == "exit":
        break

    prompt = query_template.format(query=query)
    # Move the input tensors to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=4096, use_cache=True)
    generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # The decoded text contains the full prompt; keep only the model's turn
    answer = generated_text[0].split("model\n")[1].strip()
    print(answer)
```
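
For interactive use, you may prefer to stream tokens as they appear instead of waiting for the full completion. Below is a minimal sketch using the standard `TextStreamer` from transformers, which works with Unsloth models through the usual `generate` interface; the sample query is illustrative.

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated; skip_prompt hides the echoed input
streamer = TextStreamer(tokenizer, skip_prompt=True)

prompt = query_template.format(query="Chào bạn")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Output appears incrementally; the return value can be ignored here
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=512, use_cache=True)
```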

## Inference (with Transformers)

```python
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generation settings forwarded to the pipeline's generate() call
pipeline_kwargs = {
    "temperature": 0.1,
    "max_new_tokens": 4096,
    "do_sample": True
}

if __name__ == "__main__":
    # Load model
    model = AutoModelForCausalLM.from_pretrained(
        "IAmSkyDra/GemSUra-edu",
        device_map="auto"
    )
    model.eval()

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        "IAmSkyDra/GemSUra-edu",
        trust_remote_code=True
    )

    pipeline = transformers.pipeline(
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,
        task="text-generation",
        **pipeline_kwargs
    )

    # Gemma chat format: a user turn followed by the opening of the model turn
    query_template = "<start_of_turn>user\n{query}<end_of_turn>\n<start_of_turn>model\n"

    while True:
        query = input("Query: ")
        if query.lower() == "exit":
            break

        prompt = query_template.format(query=query)
        # return_full_text=False already strips the prompt from the output
        answer = pipeline(prompt)[0]["generated_text"].strip()
        print(answer)
```
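
If the fine-tuned tokenizer kept the base Gemma chat template (an assumption; check `tokenizer.chat_template`), you can build the same prompt with `apply_chat_template` instead of hard-coding the turn markers:

```python
# Build the prompt from the tokenizer's chat template rather than a manual string.
# Assumes the tokenizer ships a Gemma-style template that yields the same
# <start_of_turn> format used above.
messages = [{"role": "user", "content": "Chào bạn"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True  # appends the opening of the model turn
)
answer = pipeline(prompt)[0]["generated_text"].strip()
print(answer)
```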

## Note

If you want to quantize the model for deployment on local devices, quantize it to at least 8-bit precision.
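
As one way to follow this recommendation, here is a minimal sketch of 8-bit loading through the standard transformers/bitsandbytes quantization API (requires the `bitsandbytes` package; the API is generic, not specific to this model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit weights via bitsandbytes, staying within the "at least 8 bits" guideline
quant_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "IAmSkyDra/GemSUra-edu",
    quantization_config=quant_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("IAmSkyDra/GemSUra-edu")
```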