	---
	language:
	- en
	library_name: transformers
	license: other
	license_name: gemma-terms-of-use
	license_link: https://ai.google.dev/gemma/terms
	tags:
	- text-generation-inference
	- gemma
	- gptq
	- google
	extra_gated_heading: Access Gemma on Hugging Face
	extra_gated_prompt: To access Gemma on Hugging Face, you’re required to review and
	agree to Google’s usage license. To do this, please ensure you’re logged-in to Hugging
	Face and click below. Requests are processed immediately.
	extra_gated_button_content: Acknowledge license
	---

	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kFznlPlWYOrcgd7Q1NI2tYMLH_vTRuys?usp=sharing)

	# elysiantech/gemma-2b-gptq-4bit

	gemma-2b-gptq-4bit is a 4-bit quantized version of the [Gemma 2B base model](https://huggingface.co/google/gemma-2b), produced with the GPTQ method developed by [Frantar et al. (2022)](https://arxiv.org/abs/2210.17323).

	Please refer to the [Original Gemma Model Card](https://ai.google.dev/gemma/docs) for details about the model preparation and training processes.
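For readers who want to see roughly what GPTQ quantization with AutoGPTQ looks like, here is a minimal sketch. It is not the exact recipe used for this checkpoint: only the 4-bit width comes from this card, while `group_size`, `desc_act`, the calibration texts, and the `OUTPUT_DIR` name are assumptions chosen for illustration. Downloading the base model also requires having accepted the Gemma license on Hugging Face.

```python
BASE_MODEL_ID = "google/gemma-2b"   # base checkpoint named in this card
OUTPUT_DIR = "gemma-2b-gptq-4bit"   # hypothetical local output directory


def quantize_settings():
    """Return assumed GPTQ settings; only bits=4 is stated by this card."""
    return {"bits": 4, "group_size": 128, "desc_act": False}


def quantize(calibration_texts):
    """Quantize the base model using a list of calibration strings."""
    # Imported lazily so the settings above can be inspected without a GPU stack.
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    # Each calibration example is a tokenized prompt (input_ids + attention_mask).
    examples = [tokenizer(text) for text in calibration_texts]

    config = BaseQuantizeConfig(**quantize_settings())
    model = AutoGPTQForCausalLM.from_pretrained(BASE_MODEL_ID, config)
    model.quantize(examples)          # runs the GPTQ calibration pass
    model.save_quantized(OUTPUT_DIR)  # writes the 4-bit weights
    tokenizer.save_pretrained(OUTPUT_DIR)
```

Calling `quantize([...])` with a handful of representative text samples would write a quantized checkpoint to `OUTPUT_DIR`; in practice a larger calibration set generally gives better results.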

	## Dependencies
	- [`auto-gptq==0.7.1`](https://pypi.org/project/auto-gptq/0.7.1/) – [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ.git) was used to quantize the Gemma model.
	- [`vllm==0.4.2`](https://pypi.org/project/vllm/0.4.2/) – [vLLM](https://github.com/vllm-project/vllm) was used to host models for benchmarking.
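
With the dependencies above installed, the model can probably be served through vLLM's offline API along the following lines. This is a sketch under assumptions rather than the setup used for the benchmarks: the `dtype="float16"` choice, the sampling settings, and the helper names `engine_kwargs` and `generate` are illustrative, and a CUDA GPU plus a Hugging Face login with the Gemma license accepted are needed to actually run it.

```python
MODEL_ID = "elysiantech/gemma-2b-gptq-4bit"


def engine_kwargs(model_id=MODEL_ID):
    """Build keyword arguments for vllm.LLM (values here are assumptions)."""
    return {
        "model": model_id,
        "quantization": "gptq",  # load the GPTQ-quantized weights
        "dtype": "float16",      # assumed: half precision for GPTQ kernels
    }


def generate(prompts, max_tokens=64):
    """Return one completion string per prompt."""
    # Imported lazily so this module can be loaded on machines without vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs())
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    return [request.outputs[0].text for request in outputs]
```

For example, `generate(["The capital of France is"])` returns a list containing a single completion string. Because this is the base model rather than an instruction-tuned one, plain text continuation prompts tend to work better than chat-style prompts.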