Commit 5858f0f (verified) by aashish1904 · 1 Parent(s): 6641596

Upload README.md with huggingface_hub

Files changed (1): README.md (+137, −0)
---
license: apache-2.0
datasets:
- nvidia/ChatQA-Training-Data
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
pipeline_tag: text-generation
library_name: transformers
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# QuantFactory/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B-GGUF
This is a quantized version of [DoeyLLM/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B](https://huggingface.co/DoeyLLM/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B) created using llama.cpp.
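
The GGUF files in this repo can be run directly with llama.cpp. A minimal sketch, assuming a Q4_K_M quant is among the uploaded files (check the repo's file list for the exact filenames):

```bash
# Fetch one quantized file from this repo (the filename here is an assumed example)
huggingface-cli download QuantFactory/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B-GGUF \
  OneLLM-Doey-ChatQA-V1-Llama-3.2-1B.Q4_K_M.gguf --local-dir .

# Chat with the model using llama.cpp's CLI (-cnv enables conversation mode)
llama-cli -m OneLLM-Doey-ChatQA-V1-Llama-3.2-1B.Q4_K_M.gguf -cnv -p "You are a helpful assistant."
```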

# Original Model Card

## **Model Summary**
This model is a fine-tuned version of **LLaMA 3.2-1B**, optimized using **LoRA (Low-Rank Adaptation)** on the [NVIDIA ChatQA-Training-Data](https://huggingface.co/datasets/nvidia/ChatQA-Training-Data) dataset. It is tailored for conversational AI, question answering, and other instruction-following tasks, with support for sequences up to 1024 tokens.
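
For readers curious what such a LoRA setup looks like in code, here is a minimal sketch using the `peft` library; the rank, alpha, and target modules below are illustrative assumptions, not the authors' actual training hyperparameters:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model named in the metadata above
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

lora_config = LoraConfig(
    r=16,                                 # assumed low-rank dimension
    lora_alpha=32,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # a common choice for Llama attention layers
    task_type="CAUSAL_LM",
)

# Wrap the base model so only the small LoRA adapter matrices are trainable
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```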
26
+
27
+ ---
28
+
29
+ ## **Key Features**
30
+ - **Base Model**: LLaMA 3.2-1B
31
+ - **Fine-Tuning Framework**: LoRA
32
+ - **Dataset**: NVIDIA ChatQA-Training-Data
33
+ - **Max Sequence Length**: 1024 tokens
34
+ - **Use Case**: Instruction-based tasks, question answering, conversational AI.
35
+
36
+ ## **Model Usage**
37
+ This fine-tuned model is suitable for:
38
+ - **Conversational AI**: Chatbots and dialogue agents with improved contextual understanding.
39
+ - **Question Answering**: Generating concise and accurate answers to user queries.
40
+ - **Instruction Following**: Responding to structured prompts.
41
+ - **Long-Context Tasks**: Processing sequences up to 1024 tokens for long-text reasoning.
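
Because the fine-tune targets a 1024-token window, long inputs should be truncated before generation. A minimal sketch (the repo id is taken from the original-model link above):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DoeyLLM/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B")

long_document = "AI in education. " * 500  # stand-in for a long input
# Cap the tokenized sequence at the model's 1024-token limit
inputs = tokenizer(long_document, truncation=True, max_length=1024, return_tensors="pt")
print(inputs["input_ids"].shape)  # sequence dimension is capped at 1024
```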

# **How to Use DoeyLLM / OneLLM-Doey-V1-Llama-3.2-1B-Instruct**

This guide explains how to use the **DoeyLLM** model on both mobile (iOS/Android) and PC platforms.

---

## **App: Use with OneLLM**

OneLLM brings versatile large language models (LLMs) to your device: Llama, Gemma, Qwen, Mistral, and more. Enjoy private, offline GPT and AI tools tailored to your needs.

With OneLLM, experience the capabilities of leading-edge language models directly on your device, all without an internet connection. Get fast, reliable, and intelligent responses, while keeping your data secure with local processing.

### **Quick Start for Mobile**

![OneLLM](./OneLLM.png)

Follow these steps to integrate the **DoeyLLM** model using the OneLLM app:

1. **Download OneLLM**
   Get the app from the [App Store](https://apps.apple.com/us/app/onellm-private-ai-gpt-llm/id6737907910) and install it on your iOS device, or from the [Play Store](https://play.google.com/store/apps/details?id=com.esotech.onellm) for Android.
2. **Load the DoeyLLM Model**
   Use the OneLLM interface to load the DoeyLLM model directly into the app:
   - Navigate to the **Model Library**.
   - Search for `DoeyLLM`.
   - Select the model and tap **Download** to store it locally on your device.
3. **Start Conversing**
   Once the model is loaded, you can begin interacting with it through the app's chat interface. For example:
   - Tap the **Chat** tab.
   - Type your question or prompt, such as:
     > "Explain the significance of AI in education."
   - Receive real-time, intelligent responses generated locally.

### **Key Features of OneLLM**
- **Versatile Models**: Supports various LLMs, including Llama, Gemma, and Qwen.
- **Private & Secure**: All processing occurs locally on your device, ensuring data privacy.
- **Offline Capability**: Use the app without requiring an internet connection.
- **Fast Performance**: Optimized for mobile devices, delivering low-latency responses.

For more details or support, visit the [OneLLM App Store page](https://apps.apple.com/us/app/onellm-private-ai-gpt-llm/id6737907910) or the [Play Store listing](https://play.google.com/store/apps/details?id=com.esotech.onellm).

## **PC: Use with Transformers**

The DoeyLLM model can also be used on PC platforms through the `transformers` library, enabling robust and scalable inference for various NLP tasks.

### **Quick Start for PC**
Follow these steps to use the model with Transformers:

1. **Install Transformers**
   Ensure you have `transformers >= 4.43.0` installed. Update or install it via pip:

   ```bash
   pip install --upgrade transformers
   python -c "import transformers; print(transformers.__version__)"  # verify the version is >= 4.43.0
   ```
2. **Load the Model**
   Use the `transformers` library to load the model and tokenizer. Starting with `transformers >= 4.43.0`, you can run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

   ```python
   import torch
   from transformers import pipeline

   # Repo id of the original (non-GGUF) model on the Hugging Face Hub
   model_id = "DoeyLLM/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B"

   # bfloat16 weights and device_map="auto" place the model on a GPU when available
   pipe = pipeline(
       "text-generation",
       model=model_id,
       torch_dtype=torch.bfloat16,
       device_map="auto",
   )

   messages = [
       {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
       {"role": "user", "content": "Who are you?"},
   ]

   # The last entry of generated_text is the newly generated assistant turn
   outputs = pipe(messages, max_new_tokens=256)
   print(outputs[0]["generated_text"][-1])
   ```
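
The Auto-classes route mentioned above looks like the following. This is a minimal sketch using `apply_chat_template` and `generate()`; the repo id again comes from the original-model link:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DoeyLLM/OneLLM-Doey-ChatQA-V1-Llama-3.2-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

# Render the chat with the model's template and tokenize it
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the tokens generated after the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```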

## Responsibility & Safety

As part of our responsible release strategy, we adopted a three-pronged approach to managing trust and safety risks:

- Enable developers to deploy helpful, safe, and flexible experiences for their target audience and the use cases supported by the model.
- Protect developers from adversarial users attempting to exploit the model’s capabilities to potentially cause harm.
- Provide safeguards for the community to help prevent the misuse of the model.