# Qwen3-Coder-30B-A3B-Instruct-RTPurbo

## Model Overview

- Model Optimizations:
  - Sliding Window Attention: 85% of heads
  - Full Attention: 15% of heads
- Version: 1.0
RTPurbo uses hybrid HeadWise Attention to compress the Qwen3-Coder model. Specifically, it divides the attention heads into two groups by attention type (see the sketch after this list):

- **Retrieval Heads**: These heads perform Full Attention over the entire sequence (or a large chunk), allowing them to capture rich, long-range dependencies and act as a powerful information-retrieval component.
- **Non-Retrieval Heads**: These heads use Sink + Sliding Window Attention (SWA), processing tokens in a sliding-window, fixed-cache manner while keeping a few initial "sink" tokens visible. They are highly efficient and ideal for handling very long sequences while maintaining local context.
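To make the head split concrete, here is a minimal sketch (not the repository's actual kernels) of how a per-head boolean attention mask could combine the two patterns. The head indices, sink-token count, and window size below are illustrative assumptions:

```python
import torch

def headwise_attention_mask(seq_len: int, num_heads: int,
                            retrieval_heads: set,
                            sink_tokens: int = 4,
                            window: int = 1024) -> torch.Tensor:
    """Boolean mask (num_heads, seq_len, seq_len); True = query may attend to key."""
    q = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
    k = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector
    causal = k <= q                         # full causal attention
    # Sink + SWA: a key is visible if it is a sink token or inside the window.
    swa = causal & ((k < sink_tokens) | (q - k < window))
    return torch.stack([causal if h in retrieval_heads else swa
                        for h in range(num_heads)])

# Example: 32 heads, 5 of them (~15%) retrieval, the rest sink + SWA.
mask = headwise_attention_mask(seq_len=16, num_heads=32,
                               retrieval_heads={0, 7, 19, 25, 30},
                               sink_tokens=2, window=4)
print(mask.shape)  # torch.Size([32, 16, 16])
```

Retrieval heads keep the full causal mask, so quadratic attention cost is paid only for the small fraction of heads that need long-range lookups; every other head costs O(window) per query.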
The following code can be used for inference. HeadWise attention is triggered when the input sequence length exceeds 16,384 tokens.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

model_name = "RTP-LLM/Qwen3-Coder-30B-A3B-Instruct-RTPurbo"

tokenizer = AutoTokenizer.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
```
## Evaluation

The model was evaluated on the LongBench suite via the lm_eval harness, with the original Qwen3-Coder-30B-A3B-Instruct (Full Attention) serving as the baseline.
| LongBench | lcc | repo-p | samsum | trec | lsht | 2wikim | hotpot | multi-en | multi-zh | musique | qasper | vcsum | qmsum | PR-en | PR-zh | Avg. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Qwen3-Coder-30B-A3B (Full Attn) | 34.34 | 27.14 | 45.80 | 81.00 | 47.50 | 42.08 | 57.64 | 52.89 | 65.99 | 38.30 | 39.25 | 13.55 | 23.77 | 99.00 | 99.75 | 51.20 |
| RTPurbo | 35.96 | 35.21 | 46.49 | 81.00 | 49.00 | 47.39 | 55.44 | 52.93 | 65.23 | 35.58 | 39.78 | 13.80 | 23.68 | 99.00 | 99.75 | 52.02 |
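As a rough pointer for reproducing numbers like those above, the sketch below uses the lm-evaluation-harness Python API. The LongBench task names are assumptions; they vary between harness versions, so list the available tasks first.

```python
import lm_eval

# A minimal sketch, assuming a recent lm-evaluation-harness (v0.4+).
# The task names below are hypothetical; run `lm_eval --tasks list`
# to see the identifiers shipped with your installed version.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=RTP-LLM/Qwen3-Coder-30B-A3B-Instruct-RTPurbo,"
        "trust_remote_code=True,dtype=auto"
    ),
    tasks=["longbench_samsum", "longbench_trec"],  # assumed task names
    batch_size=1,
)
print(results["results"])
```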