This is a Llama 2 architecture model series trained on the TinyStories dataset, intended for use in the llama2.c project by Andrej Karpathy.
Trained for 3 epochs on a single V100 32GB GPU, the model achieves an inference speed of roughly 72 tokens/sec on that same GPU.
Achieved tok/s: ~161.8 on a 12th Gen Intel(R) Core(TM) i9-12900HK CPU.
Learn more about running inference in pure C in the llama2.c repository; a minimal example is sketched below.
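
A minimal sketch of how to try this checkpoint locally with llama2.c. The checkpoint filename `model.bin` is an assumption; substitute the actual `.bin` file exported for this model.

```bash
# Clone Andrej Karpathy's llama2.c and build the pure-C inference program
git clone https://github.com/karpathy/llama2.c
cd llama2.c
make run          # or: gcc -O3 -o run run.c -lm

# Run inference with this checkpoint (filename assumed; adjust to the .bin from this repo)
./run model.bin -t 0.8 -n 256 -i "Once upon a time"
```

The `-t`, `-n`, and `-i` flags set the sampling temperature, the number of tokens to generate, and the prompt, respectively.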