---
license: apache-2.0
datasets:
- IAmSkyDra/HCMUT_FAQ
language:
- vi
tags:
- education
- text-generation-inference
- gemma
- llama-factory
- unsloth
widget:
- text: Chào bạn
  output:
    text: >-
      Chào bạn! Tôi là GemSUra-edu, một trợ lý AI được phát triển bởi Long Nguyen.
  example_title: Query 1
- text: Hiệu trưởng hiện tại của trường Đại học Bách Khoa
  output:
    text: >-
      Hiệu trưởng hiện tại của trường Đại học Bách Khoa là PGS. TS. Mai Thanh Phong.
  example_title: Query 2
- text: OISP là viết tắt của
  output:
    text: >-
      Văn phòng Đào tạo Quốc tế (Office for International Study Programs)
  example_title: Query 3
---
## Introduction

GemSUra-edu is a Vietnamese large language model fine-tuned on a dataset of frequently asked questions (FAQs) from Ho Chi Minh City University of Technology (HCMUT). It is based on the pre-trained model [GemSUra 2B](https://huggingface.co/ura-hcmut/GemSUra-2B), developed by the URA research group at HCMUT.
## Inference (with Unsloth for higher speed)

```python
from unsloth import FastLanguageModel
import torch

# Load the fine-tuned model and tokenizer in 4-bit for fast inference
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="IAmSkyDra/GemSUra-edu",
    max_seq_length=4096,
    dtype=None,
    load_in_4bit=True
)

# Enable Unsloth's optimized inference mode
FastLanguageModel.for_inference(model)

# Gemma chat template: a user turn followed by the start of the model turn
query_template = "<start_of_turn>user\n{query}<end_of_turn>\n<start_of_turn>model\n"

while True:
    query = input("Query: ")
    if query.lower() == "exit":
        break

    query = query_template.format(query=query)
    inputs = tokenizer(query, return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=4096, use_cache=True)
    generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    answer = generated_text[0].split("model\n")[1].strip()
    print(answer)
```
## Inference (with Transformers)

```python
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# Generation settings passed to the text-generation pipeline
pipeline_kwargs = {
    "temperature": 0.1,
    "max_new_tokens": 4096,
    "do_sample": True
}

if __name__ == "__main__":
    # Load model
    model = AutoModelForCausalLM.from_pretrained(
        "IAmSkyDra/GemSUra-edu",
        device_map="auto"
    )
    model.eval()

    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        "IAmSkyDra/GemSUra-edu",
        trust_remote_code=True
    )

    pipeline = transformers.pipeline(
        model=model,
        tokenizer=tokenizer,
        return_full_text=False,
        task="text-generation",
        **pipeline_kwargs
    )

    # Gemma chat template: a user turn followed by the start of the model turn
    query_template = "<start_of_turn>user\n{query}<end_of_turn>\n<start_of_turn>model\n"

    while True:
        query = input("Query: ")
        if query.lower() == "exit":
            break

        query = query_template.format(query=query)
        # With return_full_text=False the pipeline returns only the generated
        # answer, so no prompt stripping is needed
        answer = pipeline(query)[0]["generated_text"].strip()
        print(answer)
```
## Note

If you quantize the model for deployment on local devices, use at least 8-bit precision.
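For reference, a minimal sketch of 8-bit loading through Transformers and `bitsandbytes` is shown below; the exact configuration is an assumption about your deployment environment (it requires the `bitsandbytes` package and a CUDA-capable GPU) and is not part of the original card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumption: bitsandbytes is installed and a CUDA GPU is available.
# Load the model with weights quantized to 8 bits at load time.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "IAmSkyDra/GemSUra-edu",
    quantization_config=bnb_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("IAmSkyDra/GemSUra-edu")
```

The quantized model can then be prompted with the same Gemma chat template used in the examples above.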