Mitko Vasilev

mitkox

AI & ML interests

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

Recent Activity

liked a model 23 days ago

deepseek-ai/DeepSeek-R1-0528

liked a model about 2 months ago

fdtn-ai/Foundation-Sec-8B

updated a model about 2 months ago

mitkox/Foundation-Sec-8B-Q8_0-GGUF

Posts 11

Post

llama.cpp is 26.8% faster than ollama.
I upgraded both and ran the same DeepSeek R1 Distill 1.5B model with the same settings on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec
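
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the ratios from the numbers quoted above. The helper names (`percent_faster`, `speedup`) are made up for illustration, and "faster" is assumed to mean the slower figure divided by the faster one. The results land close to the rounded figures in the post; small differences come down to rounding.

```python
def percent_faster(slow_seconds: float, fast_seconds: float) -> float:
    """Percent by which the quicker run beats the slower one (time based)."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


def speedup(larger: float, smaller: float) -> float:
    """Plain ratio between two measurements, e.g. 2.0 means '2x'."""
    return larger / smaller


# Figures copied from the post; nothing here is measured by this script.
total_pct = percent_faster(8.69, 6.85)            # total duration, seconds
load_ratio = speedup(553, 241)                    # model loading, milliseconds
prompt_ratio = speedup(416.04, 42.17)             # prompt processing, tokens/s
generation_pct = (137.79 / 122.07 - 1.0) * 100.0  # token generation, tokens/s

print(f"total duration:    {total_pct:.2f}% faster")
print(f"model loading:     {load_ratio:.2f}x faster")
print(f"prompt processing: {prompt_ratio:.2f}x faster")
print(f"token generation:  {generation_pct:.2f}% faster")
```

The throughput comparisons use tokens/s directly, since higher is better there, while the duration comparisons divide the slower time by the faster one.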

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.

Post

Stargate to the west of me
DeepSeek to the east
Here I am
Stuck in the middle with the EU

It will likely take only a spark to trigger export controls on frontier research and models on both sides, leaving us in a vacuum.

Decentralized training infrastructure and on device inferencing are the future.
