---
base_model: yentinglin/Llama-3-Taiwan-70B-Instruct
language:
- zh
- en
license: llama3
model_creator: yentinglin
model_name: Llama-3-Taiwan-70B-Instruct
model_type: llama
pipeline_tag: text-generation
quantized_by: minyichen
tags:
- llama-3
---

# Llama-3-Taiwan-70B-Instruct-fp8

- Model creator: [Yen-Ting Lin](https://huggingface.co/yentinglin)
- Original model: [Llama-3-Taiwan-70B-Instruct](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct)

## Description

This repo contains fp8 model files for [Llama-3-Taiwan-70B-Instruct](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct).

* [GPTQ models for GPU inference](https://huggingface.co/minyichen/Llama-3-Taiwan-70B-Instruct-GPTQ)
* [Yen-Ting Lin's original unquantized model](https://huggingface.co/yentinglin/Llama-3-Taiwan-70B-Instruct)

## Quantization parameters

- activation_scheme: static
- quant_method: fp8
- ignored_layers: lm_head

Quantization took about 8.5 hours on an H100.
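For reference, the parameters above typically appear in the checkpoint's `config.json` as a `quantization_config` block along the lines of the following sketch (field names and layout may differ slightly depending on the quantization tool and version used):

```json
{
  "quantization_config": {
    "quant_method": "fp8",
    "activation_scheme": "static",
    "ignored_layers": ["lm_head"]
  }
}
```

With `activation_scheme: static`, per-tensor activation scales are calibrated ahead of time and stored in the checkpoint, while `lm_head` is left in its original precision; inference engines that support fp8 checkpoints can read these scales directly at load time.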