Thomas Nguyen

ThomasTheMaker

AI & ML interests

Building the world's fastest CPU LLM inference layer

Recent Activity


Organizations

None yet

ThomasTheMaker's activity

replied to takarajordan's post 6 days ago

Nice! I'm working on a CPU-only inference layer and have been looking for a good, simple alternative to build on.
I'd love to see q8 and q4 quantization support!
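For readers unfamiliar with the q8/q4 terminology above: it refers to storing weights as 8-bit or 4-bit integers plus a float scale, instead of full-precision floats. A minimal pure-Python sketch of symmetric 8-bit (q8-style) quantization, with function names of my own invention (this is not code from any of the projects mentioned):

```python
def quantize_q8(weights):
    """Symmetric q8: map floats onto signed 8-bit ints in [-127, 127].

    One float scale is kept per tensor; each weight is stored as
    round(w / scale), so dequantization is just an int-to-float multiply.
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale == 0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_q8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.01, 1.0]
q, scale = quantize_q8(weights)
restored = dequantize_q8(q, scale)
```

q4 works the same way but clamps to a 4-bit range (and real implementations like ggml quantize per block of weights, with a scale per block, rather than per tensor). The payoff on CPU is smaller memory footprint and integer arithmetic in the hot matmul loops.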

replied to takarajordan's post 6 days ago

How does the performance compare to ggml and llama.cpp?
Also, does it support quantization?

Thanks

replied to Pendrokar's post 2 months ago

Is it possible to get the encoder and decoder as separate components for this model? I'd like to run the decoder on my RK3588's NPU. Thanks!

reacted to prithivMLmods's post with πŸš€ 3 months ago
Reasoning SmolLM2 πŸš€

🎯 Fine-tuning SmolLM2 on a lightweight synthetic reasoning dataset for reasoning-specific tasks. Future updates will focus on lightweight, blazing-fast reasoning models. Until then, check out the blog for fine-tuning details.

πŸ”₯Blog : https://huggingface.co/blog/prithivMLmods/smollm2-ft

πŸ”Ό Models :
+ SmolLM2-CoT-360M : prithivMLmods/SmolLM2-CoT-360M
+ Reasoning-SmolLM2-135M : prithivMLmods/Reasoning-SmolLM2-135M
+ SmolLM2-CoT-360M-GGUF : prithivMLmods/SmolLM2-CoT-360M-GGUF

🀠 Other Details :
+ Demo : prithivMLmods/SmolLM2-CoT-360M
+ Fine-tune nB : prithivMLmods/SmolLM2-CoT-360M