Model Card for fbanespo/Llama2-7b-qlora-chat-support-bot-faq

I fine-tuned a Llama 2 7B model with Hugging Face tooling for customer-support FAQ chat applications. The model is designed to give accurate, helpful answers to frequently asked questions. Its task-specific training targets a wide range of customer queries, which makes it suitable for automating routine customer support tasks.

Model Details

I used the sharded checkpoint TinyPixel/Llama-2-7B-bf16-sharded as the base model. Sharding divides the weights of a large neural network into multiple smaller pieces, more than 14 in this case. This strategy has proven highly beneficial when combined with the Accelerate library.

When a model is sharded, each shard holds a portion of the overall model’s parameters. Accelerate manages these shards by distributing them across the available memory, including GPU memory and CPU memory. This allocation makes it possible to work with very large models without requiring an excessive amount of memory.
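
The loading step described above can be sketched as follows. This is a minimal illustration rather than the exact code used for this model: it assumes the `transformers`, `accelerate`, and `torch` packages are installed, and the helper names `load_kwargs` and `load_sharded_base` are hypothetical. Passing `device_map="auto"` asks Accelerate to decide where each part of the model lives.

```python
BASE_MODEL_ID = "TinyPixel/Llama-2-7B-bf16-sharded"  # base checkpoint named in this card


def load_kwargs():
    """Keyword arguments that let Accelerate place the weights (hypothetical helper)."""
    return {
        "device_map": "auto",       # fill GPU memory first, then spill to CPU memory
        "low_cpu_mem_usage": True,  # read shard by shard instead of building a full extra copy
    }


def load_sharded_base(model_id=BASE_MODEL_ID):
    """Download and load the sharded base model and its tokenizer."""
    # Heavy dependencies are imported lazily so the module can be inspected without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # the checkpoint name indicates bf16 weights
        **load_kwargs(),
    )
    return model, tokenizer
```

After `model, tokenizer = load_sharded_base()`, the attribute `model.hf_device_map` shows which device each module was assigned to.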

Model Description

  • Developed by: Tony Esposito
  • Model type: Llama 2 family
  • License: Apache 2.0
  • Finetuned from model: TinyPixel/Llama-2-7B-bf16-sharded

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Training procedure

The following bitsandbytes quantization config was used during training:

  • quant_method: bitsandbytes
  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: bfloat16
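
The settings listed above map onto the `BitsAndBytesConfig` class in `transformers`. The sketch below shows one way to rebuild that configuration; it assumes `transformers`, `torch`, and `bitsandbytes` are installed, and the names `QUANT_SETTINGS` and `build_bnb_config` are hypothetical. The `quant_method` entry is omitted because it is not a constructor argument.

```python
# Values copied from the quantization config listed in this card.
QUANT_SETTINGS = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def build_bnb_config(settings=QUANT_SETTINGS):
    """Turn the plain settings into a BitsAndBytesConfig (hypothetical helper)."""
    # Imported lazily so the settings can be inspected without the heavy dependencies.
    import torch
    from transformers import BitsAndBytesConfig

    kwargs = dict(settings)
    # Convert the dtype name into the torch dtype object, e.g. torch.bfloat16.
    kwargs["bnb_4bit_compute_dtype"] = getattr(torch, kwargs["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**kwargs)
```

The resulting object can be passed as `quantization_config=build_bnb_config()` to `AutoModelForCausalLM.from_pretrained` so that the base weights are loaded in 4-bit NF4 with bfloat16 compute.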

Framework versions

  • PEFT 0.7.0.dev0
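
Because the model is published as a PEFT adapter, using it means loading the base model first and attaching the adapter on top. The following is a minimal sketch under stated assumptions: `transformers`, `peft`, `accelerate`, and `torch` are installed, and the prompt layout in `build_prompt` is purely hypothetical, since this card does not document the prompt format used in training.

```python
BASE_MODEL_ID = "TinyPixel/Llama-2-7B-bf16-sharded"           # base checkpoint from this card
ADAPTER_ID = "fbanespo/Llama2-7b-qlora-chat-support-bot-faq"  # adapter repository for this model


def build_prompt(question):
    """Hypothetical prompt layout; the training prompt format is not documented here."""
    return f"Question: {question}\nAnswer:"


def answer(question, max_new_tokens=128):
    """Load the base model plus adapter and generate a reply (downloads weights)."""
    # Imported lazily so the helpers above can be used without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID, device_map="auto", torch_dtype=torch.bfloat16
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID)  # attach the fine-tuned adapter weights
    model.eval()

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(base.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `answer("How do I reset my password?")` reloads the model on every call; for repeated queries, it would make sense to load the model once and reuse it.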

Model repository: fbanespo/Llama2-7b-qlora-chat-support-bot-faq (PEFT adapter)