As mentioned, we’ve open-sourced our benchmarking code here: https://github.com/keyboardAnt/hf-bench
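The linked repository contains the full harness. The snippet below is only a minimal sketch of the kind of measurement it automates (comparing wall-clock latency of standard vs. assisted decoding in transformers); it is not code from hf-bench, and the checkpoints are placeholders.

```python
# Minimal latency-comparison sketch; NOT the hf-bench harness itself.
# The checkpoints are illustrative placeholders that share a tokenizer.
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "gpt2-large"  # placeholder target model
draft_id = "gpt2"         # placeholder assistant (draft) model

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

inputs = tokenizer("Speculative decoding works by", return_tensors="pt")

def timed_generate(**kwargs):
    """Greedy-decode 64 new tokens and return the elapsed seconds."""
    start = time.perf_counter()
    target.generate(**inputs, max_new_tokens=64, do_sample=False, **kwargs)
    return time.perf_counter() - start

baseline_s = timed_generate()
assisted_s = timed_generate(assistant_model=draft)
print(f"baseline: {baseline_s:.2f}s, assisted: {assisted_s:.2f}s")
```

A real harness would also control for warm-up, device placement, and repeated runs; the point here is only the shape of the comparison.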
Nadav Timor (Nadav-Timor)
AI & ML interests: None yet
Recent Activity
commented on an article 14 days ago: "Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques"
commented on their article 14 days ago: "Universal Assisted Generation: Faster Decoding with Any Assistant Model"
published an article 15 days ago: "Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques"
Nadav-Timor's activity

commented on "Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques" (14 days ago)

commented on "Universal Assisted Generation: Faster Decoding with Any Assistant Model" (14 days ago)
Citation
@article{timor2025acceleratingllminferencelossless,
  title={Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies},
  author={Nadav Timor and Jonathan Mamou and Daniel Korat and Moshe Berchansky and Oren Pereg and Gaurav Jain and Roy Schwartz and Moshe Wasserblat and David Harel},
  year={2025},
  eprint={2502.05202},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.05202},
}

published an article 15 days ago
Article: "Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques"
By Nadav-Timor and 8 others • 17 upvotes

Vocab size in config.json mismatches the actual tokenizer size
5 comments · #4 opened 2 months ago by Fizzarolli
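The discussion title above points at a common pitfall: `vocab_size` in `config.json` describes the model's embedding matrix, which can legitimately differ from the tokenizer's actual vocabulary (for example, when embeddings are padded to a convenient multiple). A minimal check, with an arbitrary placeholder checkpoint:

```python
# Check whether config.json's vocab_size matches the tokenizer.
# The checkpoint name is an arbitrary placeholder.
from transformers import AutoConfig, AutoTokenizer

checkpoint = "gpt2"  # placeholder
config = AutoConfig.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

print("config.vocab_size:", config.vocab_size)  # embedding-matrix rows
print("len(tokenizer):", len(tokenizer))        # tokens, incl. added ones
if config.vocab_size != len(tokenizer):
    print("Mismatch: the embedding matrix and the tokenizer disagree.")
```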


upvoted a paper 4 months ago

published an article 5 months ago
Article: "Universal Assisted Generation: Faster Decoding with Any Assistant Model"
By Nadav-Timor and 7 others • 55 upvotes
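The article above introduces universal assisted generation in transformers, where the drafter may use a different tokenizer than the target model. A sketch of that usage follows; the `tokenizer` and `assistant_tokenizer` arguments to `generate` assume a sufficiently recent transformers release, and the checkpoints are illustrative:

```python
# Universal assisted generation: a draft model with a different vocabulary.
# Requires a recent transformers release; checkpoints are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "google/gemma-2-9b"      # target model
assistant_id = "double7/vicuna-68m"  # small drafter with its own tokenizer

tokenizer = AutoTokenizer.from_pretrained(target_id)
assistant_tokenizer = AutoTokenizer.from_pretrained(assistant_id)

model = AutoModelForCausalLM.from_pretrained(target_id)
assistant_model = AutoModelForCausalLM.from_pretrained(assistant_id)

inputs = tokenizer("Alice and Bob", return_tensors="pt")
outputs = model.generate(
    **inputs,
    assistant_model=assistant_model,
    tokenizer=tokenizer,                      # target's tokenizer
    assistant_tokenizer=assistant_tokenizer,  # drafter's tokenizer
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

Because the two vocabularies differ, drafts are re-tokenized between the models; per the article, decoding remains lossless with respect to the target model.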
upvoted an article 6 months ago
Article: "Faster Assisted Generation with Dynamic Speculation"
By Nadav-Timor and 6 others • 46 upvotes
published an article 6 months ago
Article: "Faster Assisted Generation with Dynamic Speculation"
By Nadav-Timor and 6 others • 46 upvotes
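Dynamic speculation adjusts the number of drafted tokens on the fly, and per the article it became the default assisted-generation behavior in transformers. To pin a fixed draft length instead, the assistant's generation config exposes a schedule switch; the attribute names below assume a recent transformers version, and the checkpoints are placeholders:

```python
# Switching dynamic speculation off in favor of a constant draft length.
# Attribute names follow recent transformers GenerationConfig fields.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-large")   # placeholder target
model = AutoModelForCausalLM.from_pretrained("gpt2-large")
assistant = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder drafter

# The default schedule is "heuristic" (dynamic); "constant" fixes the length.
assistant.generation_config.num_assistant_tokens_schedule = "constant"
assistant.generation_config.num_assistant_tokens = 5

inputs = tokenizer("Dynamic speculation", return_tensors="pt")
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```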
upvoted a paper 10 months ago

upvoted a collection 11 months ago

upvoted a paper 11 months ago

`llama3p-70b-rc3_vr_mid_3` & `llama3p-7b-rc3_vr_mid_2`?
#2 opened 11 months ago by Nadav-Timor

`max_position_embeddings=32768` and `precompute_freqs_cis` with `end=128_000`
1 comment · #6 opened over 1 year ago by Nadav-Timor

`max_position_embeddings=32768` with "attention span of 131K tokens"
1 comment · #57 opened over 1 year ago by Nadav-Timor
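Both `max_position_embeddings` threads above ask how a checkpoint's advertised attention span squares with the values in its config. A quick way to inspect the relevant fields; the checkpoint below is a placeholder, and `sliding_window` and `rope_theta` exist only for architectures that define them:

```python
# Inspect context-length-related config fields; checkpoint is a placeholder.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print("max_position_embeddings:", config.max_position_embeddings)
print("sliding_window:", getattr(config, "sliding_window", "n/a"))
print("rope_theta:", getattr(config, "rope_theta", "n/a"))
```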
