amgadhasan committed
Commit 9371222 (verified)
Parent(s): d7c05a6

Update README.md

Files changed (1):
  1. README.md +52 -12

README.md CHANGED
@@ -1,23 +1,63 @@
 ---
-base_model: unsloth/meta-llama-3.1-8b-instruct-bnb-4bit
 language:
 - en
-license: apache-2.0
+- de
+- fr
+- it
+- pt
+- hi
+- es
+- th
+base_model: meta-llama/Meta-Llama-3.1-8B
+pipeline_tag: text-generation
 tags:
+- llama-3
 - text-generation-inference
-- transformers
-- unsloth
 - llama
-- trl
-- sft
 ---
-
-# Uploaded model
-
-- **Developed by:** amgadhasan
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/meta-llama-3.1-8b-instruct-bnb-4bit
-
-This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+# Sera Llama v0.1
+This is a finetune of Llama 3.1-8B on custom tool calls, to be used as an agent for a personal assistant and network admin.
+It generates structured outputs containing a tool call based on the user's input, without the need for a lengthy system message.
+## How to use
+
+### Use with transformers
+
+```python
+# pip install -U transformers accelerate bitsandbytes
+
+import transformers
+
+tokenizer = transformers.AutoTokenizer.from_pretrained("Sera-Network/sera-llama-3.1-8b-0.1")
+
+# 4-bit quantization to run on smaller GPUs
+model = transformers.AutoModelForCausalLM.from_pretrained(
+    "Sera-Network/sera-llama-3.1-8b-0.1",
+    device_map="auto",
+    quantization_config=transformers.BitsAndBytesConfig(load_in_4bit=True),
+)
+
+# Helper: apply the chat template, generate greedily, and decode only the new tokens
+def generate(user_input: str) -> str:
+    input_ids = tokenizer.apply_chat_template(
+        [{"role": "user", "content": user_input}],
+        add_generation_prompt=True,
+        return_tensors="pt",
+    ).to(model.device)
+    output = model.generate(input_ids, do_sample=False, max_new_tokens=128)
+    preds = output[:, input_ids.shape[1]:]  # strip the prompt tokens
+    return tokenizer.decode(preds[0], skip_special_tokens=True)
+
+# Warm up the model
+generate("What's the capital of France?")
+```
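+
+Note that `do_sample=False` selects greedy decoding, which keeps the tool-call output deterministic for a given prompt. The 4-bit `BitsAndBytesConfig` is optional; on a GPU with enough memory, the model should also load in full precision if you simply omit `quantization_config` (standard `transformers` behavior, not anything specific to this model).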
+
+## Example output
+```python
+generate("What's the capital of Switzerland and Germany?")
+# The capital of Switzerland is Bern. The capital of Germany is Berlin.
+
+generate("Set up a host for the domain symbiont.me")
+# [{"name": "add_host", "parameters": {"hostname": "symbiont.me"}}]
+
+generate("Send an email to my friend Andrej wishing him a happy birthday.")
+# [{"name": "send_email", "parameters": {"subject": "Happy Birthday", "body": "Dear Andrej, happy birthday! Best regards, [Your Name]"}}]
+
+generate("Schedule a call with my manager tomorrow 7 am to discuss my promotion.")
+# [{"name": "schedule_call", "parameters": {"date": "2024-07-27", "time": "07:00:00", "topic": "promotion"}}]
+```