Steve Li

CHNtentes

AI & ML interests

None yet

Recent Activity

new activity 1 day ago

stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small:Update?

new activity 5 days ago

meta-llama/Llama-4-Scout-17B-16E-Instruct:Unethical comparisons with Deepseek replacing chinese languages by thai/vietnamese only

new activity 9 days ago

google/gemma-3-27b-it:SigLIP or SigLIP2 encoder?

Organizations

None yet

CHNtentes's activity

New activity in stduhpf/google-gemma-3-12b-it-qat-q4_0-gguf-small 1 day ago

Update?

#4 opened 1 day ago by

New activity in meta-llama/Llama-4-Scout-17B-16E-Instruct 5 days ago

Unethical comparisons with Deepseek replacing chinese languages by thai/vietnamese only

#32 opened 6 days ago by

New activity in google/gemma-3-27b-it 9 days ago

SigLIP or SigLIP2 encoder?

#48 opened 10 days ago by

New activity in deepseek-ai/DeepSeek-V3-0324 11 days ago

Downloading weights without duplicates

#52 opened 11 days ago by

New activity in deepseek-ai/DeepSeek-V3-0324 15 days ago

Listen to me say thank you, because you have warmed the four seasons

#37 opened 17 days ago by

New activity in Qwen/Qwen2.5-Omni-7B 16 days ago

Qwen2.5-Omni-7B-AWQ?

#7 opened 16 days ago by

Can this be quantized with bitsAndBytes?

#6 opened 17 days ago by

liked a model 18 days ago

deepseek-ai/DeepSeek-V3-0324

Text Generation • Updated 16 days ago • 201k • 2.54k

New activity in Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 19 days ago

Why is Int4 slower than unquantized float32 and float16?

#3 opened about 1 month ago by

New activity in THUDM/CogView4-6B 21 days ago

Maybe rename it 15B

#5 opened about 1 month ago by

New activity in mistralai/Mistral-Small-3.1-24B-Instruct-2503 21 days ago

Quantized models with vision included?

#27 opened 24 days ago by

New activity in bartowski/nvidia_Llama-3_3-Nemotron-Super-49B-v1-GGUF 24 days ago

Does it support thinking on/off?

#2 opened 24 days ago by

liked a model about 1 month ago

Qwen/QwQ-32B

Text Generation • Updated Mar 11 • 810k • 2.66k

New activity in unsloth/QwQ-32B about 1 month ago

I could be wrong, but I think the <think> tag needs to be removed from the last bit of the jinja template in the tokenizer_config.json

#1 opened about 1 month ago by

liked a Space about 1 month ago

QwQ 32B Demo

Send text and get detailed responses

New activity in Qwen/QwQ-32B about 1 month ago

When will you fix the model replies missing</think>\n start tags

#19 opened about 1 month ago by

liked a Space about 1 month ago

CogView4

Gradio demo of CogView4-6B

New activity in moonshotai/Moonlight-16B-A3B-Instruct about 1 month ago

When running example got ValueError: Attention mask should be of size (1, 1, 1, 30), but is torch.Size([1, 1, 1, 29])

#9 opened about 1 month ago by

New activity in nvidia/DeepSeek-R1-FP4 about 2 months ago

Benchmark results compared to orig fp8 / int4 quants etc?

#1 opened about 2 months ago by

New activity in Congliu/Chinese-DeepSeek-R1-Distill-data-110k about 2 months ago

Some of the questions are incomplete

#6 opened about 2 months ago by