---
library_name: transformers
license: gemma
pipeline_tag: text-generation
tags:
- GGUF
- quantized
- Q4_K_M
- Q5_K_M
- 4bit
- 5bit
- Gemma
- Gemma-7B
- Gemma-1.1
- Gemma-1.1-7b
- Google
---

# Model Card for alokabhishek/gemma-1.1-7b-it-GGUF

<!-- Provide a quick summary of what the model is/does. -->
This repo contains GGUF quantized versions of Google's Gemma-1.1-7b-it model, quantized using llama.cpp.

## Model Details

- Model creator: [Google](https://huggingface.co/google)
- Original model: [gemma-1.1-7b-it](https://huggingface.co/google/gemma-1.1-7b-it)

### About GGUF quantization using llama.cpp

- llama.cpp: [llama.cpp GitHub repo](https://github.com/ggerganov/llama.cpp)
- llama-cpp-python: [llama-cpp-python GitHub repo](https://github.com/abetlen/llama-cpp-python)
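The Q4_K_M and Q5_K_M files in this repo store weights in small blocks, each block sharing a quantization scale. As a rough illustration of the idea only, here is a simplified sketch of plain block-wise 4-bit quantization; llama.cpp's actual K-quant formats are more elaborate (per-block minimums, super-block scales):

```python
# Illustrative block-wise 4-bit quantization (simplified; NOT llama.cpp's
# actual Q4_K_M format, just the core round-to-grid-with-a-scale idea).

def quantize_block(weights, bits=4):
    # One scale per block maps floats onto the signed integer grid [-8, 7]
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_block(q, scale):
    # Recover approximate floats; error per weight is at most scale / 2
    return [v * scale for v in q]

block = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.45, 0.02]
q, scale = quantize_block(block)
restored = dequantize_block(q, scale)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(q)        # eight signed 4-bit integers
print(max_err)  # small reconstruction error, bounded by scale / 2
```

Storing one float scale plus 4-bit integers per block is what shrinks a 7B model to a fraction of its fp16 size, at the cost of the small reconstruction error shown above.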

# How to Get Started with the Model

Use the code below to get started with the model.

## How to run from Python code

#### First install the package

```shell
# Install llama-cpp-python, which provides the llama_cpp module used below
! pip install llama-cpp-python
# Or, for CUDA GPU acceleration (requires the CUDA toolkit):
# ! CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
! pip install -U sentence-transformers
! pip install transformers huggingface_hub torch
```

# Import

```python
from llama_cpp import Llama
from transformers import pipeline, AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer
import os
```

# Using llama_cpp as a high-level helper

```python
repo_id = "alokabhishek/gemma-1.1-7b-it-GGUF"
filename = "Q4_K_M.gguf"

# Download the GGUF file from the Hugging Face Hub (cached locally) and load it
llm = Llama.from_pretrained(
    repo_id=repo_id,
    filename=filename,
    verbose=False,
)

prompt = "Tell me a funny joke about Large Language Models meeting a Blackhole in an intergalactic Bar."
llm_response = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    temperature=1.5,
    top_p=0.8,
    top_k=50,
    repeat_penalty=1.01,
)

llm_response_formatted = llm_response["choices"][0]["message"]["content"]
print(llm_response_formatted)
```
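`create_chat_completion` returns an OpenAI-style chat-completion dict, which is why the example indexes `["choices"][0]["message"]["content"]`. If that shape is unfamiliar, here is a small sketch against a mocked response; the `extract_reply` helper is ours for illustration and is not part of llama-cpp-python:

```python
# Hypothetical helper: pull the assistant reply out of an OpenAI-style
# chat-completion dict like the one create_chat_completion returns.
def extract_reply(response: dict) -> str:
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contained no choices")
    return choices[0]["message"]["content"]

# Mocked response showing the shape only; real responses also carry
# id, model, usage, finish_reason, etc.
mock = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
print(extract_reply(mock))  # Hello!
```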

# Original Gemma Model Card

# Gemma Model Card

**Model Page**: [Gemma](https://ai.google.dev/gemma/docs)