Can a quantized version of deepseek-v3-0324 be run on a machine with a cluster of 8× A100s?
Like 4-bit?
https://huggingface.co/OPEA/DeepSeek-R1-int4-AutoRound-awq-asym#generate-the-model
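The "#generate-the-model" section of that card describes the actual recipe; very roughly, 4-bit asymmetric AutoRound quantization looks like the sketch below (the model id, group size, and output format here are assumptions, not the card's exact settings):

```python
# Rough sketch of int4 asymmetric AutoRound quantization; see the linked
# model card's "generate the model" section for the real recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "deepseek-ai/DeepSeek-V3-0324"  # assumption: swap in the model you want
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# bits=4, sym=False to match the int4-asym checkpoint linked above
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=False)
autoround.quantize()
autoround.save_quantized("./DeepSeek-V3-0324-int4-awq", format="auto_awq")
```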
Thanks, but I want a quantization of DeepSeek V3-0324.
You only need 2 A100s for the 671B model: https://huggingface.co/collections/VPTQ-community/vptq-deepseek-r1-without-finetune-67d0832c203afd208bb8449e
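For anyone trying those checkpoints: per the VPTQ project README, they load through the `vptq` package. A minimal sketch, with the repo id a placeholder to fill in from the collection and the API possibly differing across `vptq` versions:

```python
# Minimal VPTQ loading sketch; the API here follows the project's README,
# and the repo id is a placeholder (pick a real one from the collection).
import transformers
import vptq

repo = "VPTQ-community/<model-from-the-collection>"  # placeholder, not a real repo id
tokenizer = transformers.AutoTokenizer.from_pretrained(repo)
model = vptq.AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tokenizer("Hello, DeepSeek!", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```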
Oh, I thought you meant that you wanted to quantize this model yourself on the 8× A100 machine.
A quantization of DeepSeek V3-0324?
Stay tuned!
Waiting for you!!!
You don't need such expensive equipment to run it, especially a non-original quantized version.
It can run in "turtle AI" mode on a 10-year-old Xeon server motherboard, e.g. a Gigabyte board with 12 RAM slots (~$150 for the board, ~$70 per 64 GB DDR4 module).
I'm still downloading it, but the older V3 & R1 at Q5 use 502 GB of RAM/VRAM, and at Q6 568 GB (for large models, Q6 is the minimum for adequate quality).
I'd rather use such GPUs for the WAN video model in ComfyUI, which is uncensored, or the StepFun video model, which requires exactly 4 GPUs, roughly 80 GB of VRAM.
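In case it helps anyone going the big-RAM route: a Qx GGUF quant can be driven from Python via llama-cpp-python. A sketch, where the file path is hypothetical and the thread count depends on your machine:

```python
# Sketch of "turtle" CPU inference on a big-RAM Xeon box via llama-cpp-python.
# The GGUF file name is hypothetical; pass the first shard if the quant is split.
from llama_cpp import Llama

llm = Llama(
    model_path="./DeepSeek-V3-0324-Q6_K-00001-of-00012.gguf",  # hypothetical path
    n_ctx=4096,      # modest context keeps the KV cache small on top of the weights
    n_threads=32,    # match the physical core count of the Xeon
    n_gpu_layers=0,  # pure CPU; raise this if some VRAM is available
)
out = llm("Summarize mixture-of-experts in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])
```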
Thanks for your advice, but I already have the machines; I just need a quantized version of DeepSeek to run. An AWQ version is now available: https://huggingface.co/cognitivecomputations/DeepSeek-V3-0324-AWQ
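For running that AWQ checkpoint across the 8 A100s, a minimal vLLM sketch (assuming a vLLM build recent enough for DeepSeek-V3 plus AWQ; argument support varies by version):

```python
# Sketch: offline inference of the AWQ quant sharded across 8 A100s with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="cognitivecomputations/DeepSeek-V3-0324-AWQ",
    quantization="awq",      # 4-bit AWQ weights
    tensor_parallel_size=8,  # shard across the 8 A100s
    trust_remote_code=True,  # DeepSeek ships custom model code
    max_model_len=8192,      # keep the KV cache inside the VRAM budget
)
outputs = llm.generate(
    ["Explain what AWQ quantization does in one paragraph."],
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```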
Yes, Q6 is quite good. Gigabyte also has DDR5 boards with 24 RAM slots for the latest CPUs, which will be quite fast: https://www.gigabyte.com/Enterprise/Server-Motherboard
On the 10-year-old MU70-SU0, DeepSeek V3.1 runs at 0.69 tokens/sec.