Commit 5858f0f (verified) by aashish1904 · 1 Parent(s): 6641596

Upload README.md with huggingface_hub

Files changed (1): README.md (+137, −0)
---
license: apache-2.0
datasets:
- nvidia/ChatQA-Training-Data
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
pipeline_tag: text-generation
library_name: transformers
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B-GGUF
This is a quantized version of [DoeyLLM/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B](https://huggingface.co/DoeyLLM/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B) created using llama.cpp.
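
The GGUF files in this repo can be run directly with llama.cpp. A minimal sketch, assuming a Q4_K_M quant is among the uploaded files (check the repo's file list for the exact filenames):

```bash
# Fetch one quantized file from this repo (the filename here is an assumed example)
huggingface-cli download QuantFactory/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B-GGUF \
  OneLLM-Doey-ChatQA-V1-Llama-3.2-1B.Q4_K_M.gguf --local-dir .

# Chat with the model using llama.cpp's CLI (-cnv enables conversation mode)
llama-cli -m OneLLM-Doey-ChatQA-V1-Llama-3.2-1B.Q4_K_M.gguf -cnv -p "You are a helpful assistant."
```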

# Original Model Card

## **Model Summary**
This model is a fine-tuned version of **LLaMA 3.2-1B**, optimized using **LoRA (Low-Rank Adaptation)** on the [NVIDIA ChatQA-Training-Data](https://huggingface.co/datasets/nvidia/ChatQA-Training-Data) dataset. It is tailored for conversational AI, question answering, and other instruction-following tasks, with support for sequences up to 1024 tokens.
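
For readers curious what such a LoRA setup looks like in code, here is a minimal sketch using the `peft` library; the rank, alpha, and target modules below are illustrative assumptions, not the authors' actual training hyperparameters:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model named in the metadata above
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # assumed low-rank dimension
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # a common choice for Llama attention layers
    task_type="CAUSAL_LM",
)

# Wrap the base model so only the small LoRA adapter matrices are trainable
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```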
26
+
27
+ ---
28
+
29
+ ## **Key Features**
30
+ - **Base Model**: LLaMA 3.2-1B
31
+ - **Fine-Tuning Framework**: LoRA
32
+ - **Dataset**: NVIDIA ChatQA-Training-Data
33
+ - **Max Sequence Length**: 1024 tokens
34
+ - **Use Case**: Instruction-based tasks, question answering, conversational AI.
35
+
36
+ ## **Model Usage**
37
+ This fine-tuned model is suitable for:
38
+ - **Conversational AI**: Chatbots and dialogue agents with improved contextual understanding.
39
+ - **Question Answering**: Generating concise and accurate answers to user queries.
40
+ - **Instruction Following**: Responding to structured prompts.
41
+ - **Long-Context Tasks**: Processing sequences up to 1024 tokens for long-text reasoning.
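
Because the fine-tune targets a 1024-token window, long inputs should be truncated before generation. A minimal sketch (the repo id is taken from the original-model link above):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DoeyLLM/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B")

long_document = "AI in education. " * 500  # stand-in for a long input
# Cap the tokenized sequence at the model's 1024-token limit
inputs = tokenizer(long_document, truncation=True, max_length=1024, return_tensors="pt")
print(inputs["input_ids"].shape)  # sequence dimension is capped at 1024
```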

# **How to Use DoeyLLM / OneLLM-Doey-V1-Llama-3.2-1B-Instruct**

This guide explains how to use the **DoeyLLM** model on both mobile (iOS/Android) and PC platforms.

---

## **App: Use with OneLLM**

OneLLM brings versatile large language models (LLMs) to your device: Llama, Gemma, Qwen, Mistral, and more. Enjoy private, offline GPT and AI tools tailored to your needs.

With OneLLM, experience the capabilities of leading-edge language models directly on your device, all without an internet connection. Get fast, reliable, and intelligent responses, while keeping your data secure with local processing.

### **Quick Start for Mobile**

![OneLLM](./OneLLM.png)

Follow these steps to integrate the **DoeyLLM** model using the OneLLM app:

1. **Download OneLLM**
   Get the app from the [App Store](https://apps.apple.com/us/app/onellm-private-ai-gpt-llm/id6737907910) and install it on your iOS device, or from the [Play Store](https://play.google.com/store/apps/details?id=com.esotech.onellm) for Android.
2. **Load the DoeyLLM Model**
   Use the OneLLM interface to load the DoeyLLM model directly into the app:
   - Navigate to the **Model Library**.
   - Search for `DoeyLLM`.
   - Select the model and tap **Download** to store it locally on your device.
3. **Start Conversing**
   Once the model is loaded, you can begin interacting with it through the app's chat interface. For example:
   - Tap the **Chat** tab.
   - Type your question or prompt, such as:
     > "Explain the significance of AI in education."
   - Receive real-time, intelligent responses generated locally.

### **Key Features of OneLLM**
- **Versatile Models**: Supports various LLMs, including Llama, Gemma, and Qwen.
- **Private & Secure**: All processing occurs locally on your device, ensuring data privacy.
- **Offline Capability**: Use the app without requiring an internet connection.
- **Fast Performance**: Optimized for mobile devices, delivering low-latency responses.

For more details or support, visit the [OneLLM App Store page](https://apps.apple.com/us/app/onellm-private-ai-gpt-llm/id6737907910) or the [Play Store listing](https://play.google.com/store/apps/details?id=com.esotech.onellm).

## **PC: Use with Transformers**

The DoeyLLM model can also be used on PC platforms through the `transformers` library, enabling robust and scalable inference for various NLP tasks.

### **Quick Start for PC**
Follow these steps to use the model with Transformers:

1. **Install Transformers**
   Ensure you have `transformers >= 4.43.0` installed. Update or install it via pip:

   ```bash
   pip install --upgrade transformers
   python -c "import transformers; print(transformers.__version__)"  # verify the version is >= 4.43.0
   ```
2. **Load the Model**
   Use the `transformers` library to load the model and tokenizer. Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

   ```python
   import torch
   from transformers import pipeline

   # Repo id of the original (non-GGUF) model on the Hugging Face Hub
   model_id = "DoeyLLM/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B"

   # bfloat16 weights and device_map="auto" place the model on a GPU when available
   pipe = pipeline(
       "text-generation",
       model=model_id,
       torch_dtype=torch.bfloat16,
       device_map="auto",
   )

   messages = [
       {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
       {"role": "user", "content": "Who are you?"},
   ]

   # The last entry of generated_text is the newly generated assistant turn
   outputs = pipe(messages, max_new_tokens=256)
   print(outputs[0]["generated_text"][-1])
   ```
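
The Auto-classes route mentioned above looks like the following. This is a minimal sketch using `apply_chat_template` and `generate()`; the repo id again comes from the original-model link:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DoeyLLM/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the chat with the model's template and tokenize it
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the tokens generated after the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```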

## Responsibility & Safety

As part of our responsible release strategy, we adopted a three-pronged approach to managing trust and safety risks:

- Enable developers to deploy helpful, safe, and flexible experiences for their target audience and the use cases supported by the model.
- Protect developers from adversarial users attempting to exploit the model’s capabilities to potentially cause harm.
- Provide safeguards for the community to help prevent the misuse of the model.