---

language: en
license: apache-2.0
tags:
- causal-lm
- transformers
- llama
- reflex-ai

---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/AMD-Llama-350M-Upgraded-GGUF
This is a quantized version of [reflex-ai/AMD-Llama-350M-Upgraded](https://huggingface.co/reflex-ai/AMD-Llama-350M-Upgraded), created using llama.cpp.
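Because the files in this repository are in GGUF format, they can also be run without `transformers`, for example through the `llama-cpp-python` bindings. A minimal sketch; the quantization filename below is an assumption, so substitute an actual `.gguf` file from this repository's file list:

```python
# Minimal sketch: run the quantized weights with llama-cpp-python.
# NOTE: the model filename is hypothetical; pick a real .gguf file
# from this repository's file list.
from llama_cpp import Llama

llm = Llama(model_path="AMD-Llama-350M-Upgraded.Q4_K_M.gguf")

# Generate a short completion from a plain-text prompt
output = llm("Once upon a time in a land far away,", max_tokens=64)
print(output["choices"][0]["text"])
```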
# Original Model Card

# AMD Llama 350M Upgraded

## Model Description

The **AMD Llama 350M Upgraded** is a transformer-based causal language model built on the Llama architecture, designed to generate human-like text. It has been upgraded from the original AMD Llama 135M to an increased parameter count of 332 million for improved performance. It is suitable for a range of natural language processing tasks, including text generation, completion, and conversational applications.

## Model Details

- **Model Type**: Causal Language Model
- **Architecture**: Llama
- **Number of Parameters**: 332 million
- **Input Size**: Variable-length input sequences
- **Output Size**: Variable-length output sequences
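The 332 million figure is straightforward to check once the weights are downloaded; a quick sketch, assuming the same `transformers` loading shown in the Usage section below:

```python
from transformers import LlamaForCausalLM

# Load the checkpoint and count every parameter tensor
model = LlamaForCausalLM.from_pretrained("reflex-ai/AMD-Llama-350M-Upgraded")
num_params = sum(p.numel() for p in model.parameters())
print(f"{num_params / 1e6:.0f}M parameters")  # expected: roughly 332M
```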
## Usage

To use the AMD Llama 350M Upgraded model, you can use the `transformers` library. Here's a sample code snippet to get started:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the tokenizer and model
model_name = "reflex-ai/AMD-Llama-350M-Upgraded"
tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(model_name)

# Llama tokenizers often ship without a pad token; reuse EOS so padding works
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Move the model to GPU if one is available, and set it to evaluation mode
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
model.eval()

# Function to generate text
def generate_text(prompt, max_length=50):
    # Tokenize; the tokenizer returns both input_ids and attention_mask
    inputs = tokenizer(prompt, return_tensors='pt', padding=True, truncation=True).to(device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            num_return_sequences=1,
            pad_token_id=tokenizer.pad_token_id,
        )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Once upon a time in a land far away,"
generated_output = generate_text(prompt, max_length=100)
print(generated_output)
```
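By default `generate` decodes greedily, which can produce repetitive text from a model of this size. Continuing from the snippet above, sampling can be enabled instead; the temperature and top-p values here are illustrative, not tuned for this model:

```python
# Continuing from the snippet above: sample instead of decoding greedily
inputs = tokenizer(prompt, return_tensors='pt').to(device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=100,
        do_sample=True,    # draw tokens stochastically
        temperature=0.8,   # illustrative, not tuned
        top_p=0.95,        # illustrative, not tuned
        pad_token_id=tokenizer.pad_token_id,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```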