Perplexity scores for a Herd of 3B Llamas
#2 · opened by flyingkiwiguy
- Perplexities calculated using build = 635 (5c64a09) of llama.cpp and the first 406 lines of wiki.test.raw (a minimal invocation is sketched below)
- Previous perplexity benchmarking for llamas indicated that 406 lines is enough to compare different sizes and quantization levels
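In case it helps anyone reproduce this, here is a minimal sketch of that setup in Python. It assumes a `perplexity` binary built from llama.cpp sits in the working directory along with wiki.test.raw; the model filename is a placeholder, not one of the actual models tested:

```python
import subprocess
from itertools import islice

# Keep only the first 406 lines of wiki.test.raw, matching the setup above.
with open("wiki.test.raw", encoding="utf-8") as src:
    head = list(islice(src, 406))
with open("wiki.test.406.raw", "w", encoding="utf-8") as dst:
    dst.writelines(head)

# Run llama.cpp's perplexity tool on the truncated file. -m (model path) and
# -f (input file) are standard llama.cpp flags; the model name is a placeholder.
subprocess.run(
    ["./perplexity", "-m", "open-llama-3b-q4_0.bin", "-f", "wiki.test.406.raw"],
    check=True,
)
```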
Using only 406 lines would save me a lot of time, but 3B is pretty fast to compute anyway.
I iterated through most of your models at the three context sizes to get a complete picture of how good Open LLama is at varying quantization levels (the sweep is sketched below). There's still plenty of room for Open LLama to reach FB Llama quality.
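A sweep like that is easy to script on top of the invocation above. This is a rough sketch only: the specific context sizes, quantization levels, and filename scheme here are assumptions for illustration, not the exact values from these runs (`-c` is llama.cpp's context-size flag):

```python
import itertools
import subprocess

# Assumed grid; adjust to whichever quantization levels and context
# sizes you actually have models for.
quants = ["q4_0", "q4_1", "q5_0", "q5_1", "q8_0"]
ctx_sizes = [512, 1024, 2048]

for quant, ctx in itertools.product(quants, ctx_sizes):
    model = f"open-llama-3b-{quant}.bin"  # placeholder filename scheme
    subprocess.run(
        ["./perplexity", "-m", model, "-f", "wiki.test.406.raw", "-c", str(ctx)],
        check=True,
    )
```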
EDIT: I posted the same plot and included the data for the 100 or so runs I did to https://github.com/openlm-research/open_llama/discussions/41
@flyingkiwiguy These graphs are great!
There are also people over in this discussion who may be interested in your graphs:
flyingkiwiguy changed discussion title from "Perplexity scores for a Herd of 7B Llamas" to "Perplexity scores for a Herd of 3B Llamas"