Can a quantized version of DeepSeek-V3-0324 be implemented on a machine with a cluster of 8×A100s?

#19
by xueshuai - opened

Like 4-bit.

https://huggingface.co/OPEA/DeepSeek-R1-int4-AutoRound-awq-asym#generate-the-model

Thanks, but I want a quantization of DeepSeek-V3-0324.

Oh, I thought you meant that you wanted to quantize this model on an 8×A100 machine.

xueshuai changed discussion title from Can a quantized version be implemented on a machine with a cluster of 8×A100s? to Can a quantized version of DeepSeek-V3-0324 be implemented on a machine with a cluster of 8×A100s?

You only need 2 A100s for the 671B model: https://huggingface.co/collections/VPTQ-community/vptq-deepseek-r1-without-finetune-67d0832c203afd208bb8449e
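
For reference, loading one of those VPTQ checkpoints generally follows the pattern from the vptq package's README; a minimal sketch, where the repo id is a placeholder to be replaced with an actual checkpoint from the linked collection:

```python
# Rough sketch of loading a VPTQ-quantized model with the `vptq` package
# (pip install vptq). The repo id is a placeholder -- substitute a real
# checkpoint from the VPTQ-community DeepSeek-R1 collection linked above.
import transformers
import vptq

repo = "VPTQ-community/<pick-a-checkpoint-from-the-collection>"  # placeholder
tokenizer = transformers.AutoTokenizer.from_pretrained(repo)
model = vptq.AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tokenizer("Hello, DeepSeek!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```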

A quantization of DeepSeek-V3-0324?

Stay tuned!


I'll wait for you!!!

You don't need such expensive equipment to run it, especially a non-original quantized version.
It can run in "turtle AI" mode on a 10-year-old Xeon server motherboard, e.g. a Gigabyte board with 12 RAM slots (~$150 for the board, ~$70 per 64 GB DDR4 module).
I'm still downloading this one, but the older V3 and R1 use 502 GB of RAM/VRAM at Q5 and 568 GB at Q6 (for large models, Q6 is the minimum for adequate quality).
I'd rather put GPUs like that toward the WAN video model in ComfyUI, which is uncensored, or the StepFun video model, which requires exactly 4 GPUs with roughly 80 GB of VRAM.
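
If you do go the CPU route, the usual path is llama.cpp with a GGUF quant. A minimal sketch with the llama-cpp-python bindings; the file name is hypothetical, and at Q5/Q6 you need the ~502-568 GB of RAM quoted above:

```python
# CPU-only sketch using llama-cpp-python (pip install llama-cpp-python).
# The model path is hypothetical -- point it at the first file of whatever
# Q5/Q6 GGUF split you downloaded. Expect ~502 GB (Q5) to ~568 GB (Q6) of RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./DeepSeek-V3-0324-Q6_K-00001-of-00012.gguf",  # hypothetical
    n_ctx=4096,      # context window
    n_threads=32,    # match your physical core count
    n_gpu_layers=0,  # pure CPU, i.e. "turtle AI" mode
)

out = llm("Q: What uses less memory, Q5 or Q6? A:", max_tokens=32)
print(out["choices"][0]["text"])
```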


Thanks for the advice; I already have the machines, I just need a quantized version of DeepSeek to run. An AWQ version has now been released: https://huggingface.co/cognitivecomputations/DeepSeek-V3-0324-AWQ
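
For anyone else with the same setup: serving that AWQ checkpoint on 8×A100s would typically go through vLLM with tensor parallelism. A minimal sketch; the settings beyond the model id are assumptions, so check the model card for recommended values:

```python
# Sketch of offline inference with vLLM (pip install vllm) on 8x A100.
# tensor_parallel_size=8 shards the model across all eight GPUs; the other
# values are assumptions -- see the model card for recommended settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="cognitivecomputations/DeepSeek-V3-0324-AWQ",
    tensor_parallel_size=8,
    quantization="awq",        # assumption: may also be auto-detected
    trust_remote_code=True,
    max_model_len=8192,        # assumption: modest limit to fit the KV cache
)

params = SamplingParams(temperature=0.6, max_tokens=128)
outputs = llm.generate(["Explain AWQ quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```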


Yes, Q6 is quite good. Gigabyte has DDR5 boards with 24 RAM slots for the latest CPUs, which will be quite fast: https://www.gigabyte.com/Enterprise/Server-Motherboard
On the 10-year-old MU70-SU0, DeepSeek V3.1 runs at 0.69 tokens/sec.
