casperhansen/llama-3-70b-instruct-awq

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

Marlin kernel in vLLM - new checkpoint?

#10 opened 5 months ago by

Based on llama-2?

#9 opened 6 months ago by

[AUTOMATED] Model Memory Requirements

#8 opened 7 months ago by

How to setup the generation_config properly?

#7 opened 7 months ago by

The inference API is too slow.

#6 opened 8 months ago by

How did you create AWQ-quantized weights?

#5 opened 8 months ago by

encountered error when loading model

#4 opened 8 months ago by