The VRAM requirement of the new Mistral-Large (2411) GPTQ int4 quantization seems to have gone up dramatically

#1
by YanchengQian - opened

As the title says, the old version ran on four 2080 Ti cards and a 4000-token context was no problem, but the new version seems to accept at most about a 500-token context before VRAM blows up on the spot. The new AWQ version is not affected, but AWQ quality is noticeably worse. Has anyone else run into this?
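Out of curiosity about whether the context length alone explains the jump, here is a back-of-the-envelope KV-cache estimate. The architecture numbers below (88 layers, 8 KV heads with grouped-query attention, head dim 128) are assumptions for Mistral-Large and should be checked against the model's config.json:

```python
# Rough fp16 KV-cache size for a given context length.
# ASSUMED architecture values -- verify against config.json.
NUM_LAYERS = 88
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_ELEM = 2  # fp16 cache

def kv_cache_bytes(num_tokens: int) -> int:
    # 2x for the separate key and value tensors in every layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
    return per_token * num_tokens

for ctx in (500, 4000):
    print(f"{ctx:>5} tokens -> {kv_cache_bytes(ctx) / 2**30:.2f} GiB")
```

If these assumptions are in the right ballpark, a 4000-token fp16 KV cache is only about 1.3 GiB spread over four cards, which would suggest the blow-up comes from somewhere else (kernel workspace, activation buffers) rather than the cache itself.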

Is it related to the vLLM version?

Possibly. After this 2411 version update, the vllm library and the original mistral library need to be updated to their latest versions; an environment that only supports 2407 cannot run inference. That may be the cause. I'll go check.
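A quick way to see whether a local environment is behind is to compare the installed package versions against what the 2411 model card asks for. The minimum versions below are placeholders, not the real requirements:

```python
import importlib.metadata
import re

def version_tuple(v: str) -> tuple:
    # Keep only the leading numeric components ("0.6.3.post1" -> (0, 6, 3)).
    return tuple(int(p) for p in re.findall(r"\d+", v)[:3])

# HYPOTHETICAL minimums -- check the model card for the real ones.
REQUIRED = {"vllm": "0.6.4", "mistral_common": "1.5.0"}

for pkg, min_ver in REQUIRED.items():
    try:
        installed = importlib.metadata.version(pkg)
    except importlib.metadata.PackageNotFoundError:
        print(f"{pkg}: not installed (pip install -U {pkg})")
        continue
    if version_tuple(installed) < version_tuple(min_ver):
        print(f"{pkg}: {installed} < {min_ver}, upgrade with: pip install -U {pkg}")
    else:
        print(f"{pkg}: {installed} OK")
```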
