Can a quantized version of DeepSeek-V3-0324 be implemented on a machine with a cluster of 8×A100s?

#19
by xueshuai - opened

Like 4-bit.

https://huggingface.co/OPEA/DeepSeek-R1-int4-AutoRound-awq-asym#generate-the-model

Thanks, but I want a quantization of DeepSeek-V3-0324.

Oh, I thought you meant that you wanted to quantize this model on an 8×A100 machine.

xueshuai changed discussion title from Can a quantized version be implemented on a machine with a cluster of 8×A100s? to Can a quantized version of DeepSeek-V3-0324 be implemented on a machine with a cluster of 8×A100s?

You only need 2 A100s for the 671B model: https://huggingface.co/collections/VPTQ-community/vptq-deepseek-r1-without-finetune-67d0832c203afd208bb8449e
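
For reference, loading one of those VPTQ checkpoints generally follows the pattern from the vptq package's README; a minimal sketch, where the repo id is a placeholder to be replaced with an actual checkpoint from the linked collection:

```python
# Rough sketch of loading a VPTQ-quantized model with the `vptq` package
# (pip install vptq). The repo id is a placeholder -- substitute a real
# checkpoint from the VPTQ-community DeepSeek-R1 collection linked above.
import transformers
import vptq

repo = "VPTQ-community/<pick-a-checkpoint-from-the-collection>"  # placeholder
tokenizer = transformers.AutoTokenizer.from_pretrained(repo)
model = vptq.AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tokenizer("Hello, DeepSeek!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```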

A quantization of DeepSeek-V3-0324?

Stay tuned!


I'll wait for you!!!

You don't need such expensive equipment to run it, especially a non-original quantized version.
It can run in "turtle AI" mode on a 10-year-old Xeon server motherboard, e.g. a Gigabyte board with 12 RAM slots (~$150 for the board, ~$70 per 64 GB DDR4 module).
I'm still downloading this one, but the older V3 and R1 use 502 GB of RAM/VRAM at Q5 and 568 GB at Q6 (for large models, Q6 is the minimum for adequate quality).
I'd rather put GPUs like that toward the WAN video model in ComfyUI, which is uncensored, or the StepFun video model, which requires exactly 4 GPUs with roughly 80 GB of VRAM.
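
If you do go the CPU route, the usual path is llama.cpp with a GGUF quant. A minimal sketch with the llama-cpp-python bindings; the file name is hypothetical, and at Q5/Q6 you need the ~502-568 GB of RAM quoted above:

```python
# CPU-only sketch using llama-cpp-python (pip install llama-cpp-python).
# The model path is hypothetical -- point it at the first file of whatever
# Q5/Q6 GGUF split you downloaded. Expect ~502 GB (Q5) to ~568 GB (Q6) of RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./DeepSeek-V3-0324-Q6_K-00001-of-00012.gguf",  # hypothetical
    n_ctx=4096,      # context window
    n_threads=32,    # match your physical core count
    n_gpu_layers=0,  # pure CPU, i.e. "turtle AI" mode
)

out = llm("Q: What uses less memory, Q5 or Q6? A:", max_tokens=32)
print(out["choices"][0]["text"])
```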


Thanks for the advice; I already have the machines, I just need a quantized version of DeepSeek to run. An AWQ version has now been released: https://huggingface.co/cognitivecomputations/DeepSeek-V3-0324-AWQ
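
For anyone else with the same setup: serving that AWQ checkpoint on 8×A100s would typically go through vLLM with tensor parallelism. A minimal sketch; the settings beyond the model id are assumptions, so check the model card for recommended values:

```python
# Sketch of offline inference with vLLM (pip install vllm) on 8x A100.
# tensor_parallel_size=8 shards the model across all eight GPUs; the other
# values are assumptions -- see the model card for recommended settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="cognitivecomputations/DeepSeek-V3-0324-AWQ",
    tensor_parallel_size=8,
    quantization="awq",        # assumption: may also be auto-detected
    trust_remote_code=True,
    max_model_len=8192,        # assumption: modest limit to fit the KV cache
)

params = SamplingParams(temperature=0.6, max_tokens=128)
outputs = llm.generate(["Explain AWQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```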


Yes, Q6 is quite good. Gigabyte has DDR5 boards with 24 RAM slots for the latest CPUs, which will be quite fast: https://www.gigabyte.com/Enterprise/Server-Motherboard
On the 10-year-old MU70-SU0, DeepSeek V3.1 runs at 0.69 tokens/sec.
