Gabriele Oliaro

goliaro

https://www.gabrieleoliaro.com

AI & ML interests

computer systems

Organizations

None yet

goliaro's activity

liked a model 3 months ago

01-ai/Yi-1.5-34B

Text Generation • Updated Jun 26, 2024 • 1.84k • 47

authored 3 papers about 1 year ago

Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems

Paper • 2312.15234 • Published Dec 23, 2023 • 3

FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

Paper • 2402.18789 • Published Feb 29, 2024

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

Paper • 2401.07159 • Published Jan 13, 2024

New activity in JackFram/llama-160m over 1 year ago

align_with_llama2

#12 opened over 1 year ago by goliaro

New activity in JackFram/llama-160m-base over 1 year ago

Fix bos_token_id to match BOS from tokenizer.model file

#1 opened over 1 year ago by goliaro

New activity in JackFram/llama-160m over 1 year ago

Fix bos_token_id to match BOS from tokenizer.model file

#11 opened over 1 year ago by goliaro

liked a model over 1 year ago

meta-llama/Llama-2-7b-hf

Text Generation • Updated Apr 17, 2024 • 918k • 1.97k

authored 3 papers over 1 year ago

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification

Paper • 2305.09781 • Published May 16, 2023 • 4

Zero-CPU Collection with Direct Telemetry Access

Paper • 2110.05438 • Published Oct 11, 2021

Direct Telemetry Access

Paper • 2202.02270 • Published Feb 4, 2022