|
---
license: apache-2.0
base_model: hon9kon9ize/CantoneseLLMChat-v0.5
tags:
- llama-factory
- full
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: open-lilm
  results: []
---
|
|
|
|
|
|
# open-lilm |
|
|
|
Warning: Due to the nature of the training data, this model is highly likely to return violent, racist, and discriminatory content. DO NOT USE IN A PRODUCTION ENVIRONMENT.
|
|
|
|
|
Inspired by [another project](https://github.com/alphrc/lilm), this is a fine-tuned model based on [CantoneseLLMChat-v0.5](https://huggingface.co/hon9kon9ize/CantoneseLLMChat-v0.5) that anyone can use without needing a Mac with 128 GB of RAM.
|
|
|
Following the same approach, we filtered the [LIHKG Dataset](https://huggingface.co/datasets/AlienKevin/LIHKG) down to 377,595 post-and-reply pairs from the LIHKG forum, keeping only pairs that satisfy the criteria below (sketched in code after the list):
- The reply is a direct reply to the original post, written by a user other than the author
- The total number of reactions (positive or negative) is greater than 20
- The combined post and reply are shorter than 2,048 words
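
The exact preprocessing script is not published (and the filtered dataset is not released, as noted below). The snippet here is only a minimal sketch of the three criteria, assuming hypothetical field names such as `is_direct_reply`, `author_id`, `pos_reactions`, and `post_text`:

```python
# Hypothetical sketch of the filtering criteria; the field names are assumptions,
# not the actual schema of the LIHKG dataset.
def keep_pair(pair: dict) -> bool:
    direct_reply = pair["is_direct_reply"]                           # direct reply to the original post
    different_author = pair["author_id"] != pair["post_author_id"]   # replier is not the post author
    enough_reactions = pair["pos_reactions"] + pair["neg_reactions"] > 20
    # Length measured in characters here, as a rough stand-in for the 2,048-word limit.
    short_enough = len(pair["post_text"]) + len(pair["reply_text"]) < 2048
    return direct_reply and different_author and enough_reactions and short_enough

# Example with a single dummy record:
example = {
    "is_direct_reply": True, "author_id": 2, "post_author_id": 1,
    "pos_reactions": 15, "neg_reactions": 10,
    "post_text": "今日天氣點呀", "reply_text": "落雨呀",
}
print(keep_pair(example))  # True
```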
|
|
|
To avoid political complications, the dataset will not be made publicly available. |
|
|
|
|
|
## Intended uses & limitations |
|
|
|
Due to the nature of an anonymous online forum, both the training data and the model are full of rude, violent, racist, and discriminatory language.
This model is intended for research and entertainment purposes only.

Comments on LIHKG also tend to be very short, so the model rarely generates more than a single line.
|
|
|
|
|
## How to use it? |
|
You can run it on [Colab](https://colab.research.google.com/drive/1veRH2GP3ZR3buYCG2_bFUKu0kS-hv1S2) or anywhere else with the following code:
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name = "0xtaipoian/open-lilm"

# Load the model with 4-bit NF4 quantization so it fits on a single GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    quantization_config=bnb_config,
)


def chat(messages, temperature=0.9, max_new_tokens=200):
    # Build the prompt with the model's chat template and move it to the GPU.
    input_ids = tokenizer.apply_chat_template(
        conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to("cuda:0")
    output_ids = model.generate(
        input_ids, max_new_tokens=max_new_tokens, temperature=temperature, do_sample=True
    )

    # Print the rendered chat prompt for reference.
    chatml = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
    print(chatml)

    # Decode only the newly generated tokens, skipping the prompt.
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=False)
    return response


messages = [
    # {"role": "system", "content": ""},
    {
        "role": "user",
        "content": """
密陽44人輪姦案」受害女隔20年現身:時間停在2004,不記得
""",
    },
]

result = chat(messages, max_new_tokens=200, temperature=1)
print(result)
```
|
|
|
### Training Procedure
|
|
|
The model was trained for roughly 15 hours on a single NVIDIA H100 96GB HBM2e GPU with [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
We used only one GPU because this was our first run on our brand-new H100 server; we are still testing different configurations.
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 1e-05 |
|
- train_batch_size: 4 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 16 |
|
- total_train_batch_size: 64 |
|
- num_epochs: 1.0 |
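
LLaMA-Factory drives a Hugging Face `Trainer` under the hood, so for readers more familiar with plain `transformers`, the sketch below shows a roughly equivalent `TrainingArguments` object. The output directory and the bf16 flag are assumptions rather than values taken from the actual run:

```python
from transformers import TrainingArguments

# Rough transformers equivalent of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="open-lilm-full",     # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,  # 4 x 16 = effective batch size of 64 on one GPU
    num_train_epochs=1.0,
    seed=42,
    bf16=True,                       # assumption: bf16 mixed precision on the H100
)
```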
|
|
|
### QLoRA Training |
|
|
|
To test different configurations, we trained another model with QLoRA for roughly 30 hours on a single NVIDIA H100 96GB HBM2e GPU with [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 1e-04 |
|
- train_batch_size: 32 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 4 |
|
- total_train_batch_size: 128
|
- num_epochs: 3.0 |
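
The adapter settings for this QLoRA run are not recorded in this card. The snippet below is only an illustrative sketch of a QLoRA setup with `peft` and 4-bit NF4 quantization; the rank, alpha, dropout, and target modules are assumptions, not the values actually used:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
import torch

# Load the base model in 4-bit NF4, as in the inference example above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "hon9kon9ize/CantoneseLLMChat-v0.5",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
base = prepare_model_for_kbit_training(base)

# LoRA settings are illustrative assumptions; the actual rank, alpha, and
# target modules used for this run are not reported.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```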
|
|
|
|