LimYeri/CodeMind-Gemma-7B-QLoRA-4bit

LimYeri committed on Apr 18

Commit c9c724d • 1 Parent(s): b48a439

Update README.md

Files changed (1)

README.md +27 -0

@@ -43,6 +43,33 @@ To use the CodeMind model, you can access it through the Hugging Face model hub
 Please refer to the documentation and examples for detailed instructions on how to integrate and use the CodeMind model effectively.
 Below are some code snippets to help you get started quickly with running the model. After installing the transformers library with `pip install -U transformers`, use the following snippets.
+#### Running the model on a CPU
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+
+# Load the model and tokenizer from the Hugging Face Hub.
+model = AutoModelForCausalLM.from_pretrained("LimYeri/CodeMind-Gemma-7B-QLoRA-4bit")
+tokenizer = AutoTokenizer.from_pretrained("LimYeri/CodeMind-Gemma-7B-QLoRA-4bit")
+
+def get_completion(query: str, model, tokenizer) -> str:
+  prompt_template = """
+  <start_of_turn>user
+  Below is an instruction that describes a task. Write a response that appropriately completes the request.
+  {query}
+  <end_of_turn>\n\n<start_of_turn>model
+  """
+  prompt = prompt_template.format(query=query)
+  encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
+  generated_ids = model.generate(**encodeds, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
+  # Decode the first (only) sequence; the output includes the prompt text.
+  decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
+  return decoded
+
+result = get_completion(query="Tell me how to solve the Leetcode Two Sum problem", model=model, tokenizer=tokenizer)
+print(result)
+```
 #### Running the model on a single / multi GPU
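
A note on the CPU snippet added in this commit: for a causal language model, `model.generate` returns the prompt tokens followed by the newly generated ones, so `get_completion` returns the prompt text together with the answer. A minimal sketch of one way to keep only the completion, assuming a hypothetical helper `new_tokens_only` (not part of the README or of transformers) that slices off the prompt prefix before decoding. The token ids below are toy values for illustration only.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def new_tokens_only(sequence_ids: Sequence[T], prompt_length: int) -> Sequence[T]:
    """Hypothetical helper: drop the first `prompt_length` token ids."""
    if prompt_length < 0 or prompt_length > len(sequence_ids):
        raise ValueError("prompt_length must be between 0 and len(sequence_ids)")
    return sequence_ids[prompt_length:]


# Toy ids standing in for a real generate() output: 3 prompt tokens + 2 new ones.
prompt_ids = [2, 106, 1645]
generated = prompt_ids + [4521, 1]

completion_ids = new_tokens_only(generated, len(prompt_ids))
print(completion_ids)  # [4521, 1]
```

Inside `get_completion`, the equivalent call would be `tokenizer.decode(new_tokens_only(generated_ids[0], encodeds["input_ids"].shape[1]), skip_special_tokens=True)`, since slicing works the same way on a one-dimensional tensor as on a list.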