Marlin kernel in vLLM - new checkpoint?
#10 opened 5 months ago by zoltan-fedor
Based on llama-2?
1
#9 opened 6 months ago by rdewolff
[AUTOMATED] Model Memory Requirements
#8 opened 7 months ago by muellerzr
How to set up the generation_config properly?
#7 opened 7 months ago by KIlian42
The inference API is too slow.
1
#6 opened 8 months ago by YernazarBis
How did you create AWQ-quantized weights?
4
#5 opened 8 months ago by nightdude
encountered error when loading model
7
#4 opened 8 months ago by zhouzr