
Yatharth Sharma

YaTharThShaRma999

AI & ML interests

None yet

Organizations

None yet

YaTharThShaRma999's activity

reacted to AdinaY's post with 🔥 5 days ago
Shanghai AI Lab - OpenGV team just released InternVL3 🔥

OpenGVLab/internvl3-67f7f690be79c2fe9d74fe9d

✨ 1/2/8/9/14/38/78B with MIT license
✨ Stronger perception & reasoning vs InternVL 2.5
✨ Native Multimodal Pre-Training for even better language performance
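
For anyone trying the release out, a minimal loading sketch in Python; this assumes InternVL3 keeps the trust_remote_code loading pattern of earlier InternVL releases, so treat the model card as authoritative.

```python
# Minimal sketch (assumption: InternVL3 loads like earlier InternVL
# releases via trust_remote_code; check the model card for specifics).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "OpenGVLab/InternVL3-8B"  # one of the released sizes
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision halves memory vs fp32
    trust_remote_code=True,
).eval().cuda()
```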
reacted to onekq's post with 🚀 6 days ago
We desperately need GPUs for model inference. CPUs can't replace them.

I will start with the basics. A GPU is designed to serve predictable workloads with many parallel units (pixels, tensors, tokens), so it allocates as much of its transistor budget as possible to building thousands of compute units (CUDA cores on NVIDIA, execution units on Apple Silicon), each capable of running a thread.

A CPU, by contrast, is designed to handle all kinds of workloads. CPU cores are much larger (and hence far fewer), with branch prediction and other complex machinery, and more and more transistors go into ever-larger caches (~50% of the die now) to absorb unpredictable memory accesses, eating into the compute budget.

Generalists can't beat specialists.
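
A minimal sketch of that gap in practice, assuming PyTorch and a CUDA device are available: time the same large matrix multiply, a predictable and massively parallel workload, on both devices.

```python
# Minimal sketch: the same matmul on CPU vs GPU (assumes PyTorch + CUDA).
import time
import torch

def time_matmul(device: str, n: int = 4096, reps: int = 10) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # finish warm-up/allocation work first
    start = time.perf_counter()
    for _ in range(reps):
        c = a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU kernels to finish
    return (time.perf_counter() - start) / reps

print(f"CPU: {time_matmul('cpu'):.4f} s per matmul")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.4f} s per matmul")
```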
liked a Space 8 days ago
reacted to sr-rai's post with 🤯🤯🤗 9 days ago
ExLlamaV3 is out, and it introduces EXL3, a new SOTA quantization format!

"The conversion process is designed to be simple and efficient and requires only an input model (in HF format) and a target bitrate. By computing Hessians on the fly and thanks to a fused Viterbi kernel, the quantizer can convert a model in a single step, taking a couple of minutes for smaller models, up to a few hours for larger ones (70B+) (on a single RTX 4090 or equivalent GPU.)"

Repo: https://github.com/turboderp-org/exllamav3
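
To make the "Hessians on the fly" bit concrete, here is a toy sketch (my own illustration, not ExLlamaV3's actual algorithm) of why quantizers estimate per-weight curvature from calibration activations: rounding error on high-curvature weights hurts the output far more, so a Hessian-weighted error is the quantity worth minimizing.

```python
# Toy sketch, NOT ExLlamaV3's algorithm: with a diagonal Hessian
# approximation H (for a linear layer, H_ii ~ E[x_i^2] over calibration
# data), the loss increase from rounding weights by dw is roughly
# sum(H * dw**2), so weights with large H deserve finer treatment.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=512)              # one row of a weight matrix
x = rng.normal(size=(2048, 512))      # calibration activations
h = (x * x).mean(axis=0)              # diagonal Hessian estimate

scale = np.abs(w).max() / 7           # naive 4-bit symmetric grid
w_q = np.clip(np.round(w / scale), -8, 7) * scale
dw = w - w_q

print(f"plain rounding error:   {np.sum(dw**2):.4f}")
print(f"Hessian-weighted error: {np.sum(h * dw**2):.4f}")  # what matters
```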



reacted to jsulz's post with 🔥 10 days ago
Huge week for xet-team as Llama 4 is the first major model on Hugging Face uploaded with Xet providing the backing! Every byte downloaded comes through our infrastructure.

Using Xet on Hugging Face is the fastest way to download and iterate on open source models, and we've proved it with Llama 4: a boost of ~25% across all models.

We expect builders on the Hub to see even more improvements, helping power innovation across the community.

With the models on our infrastructure, we can peer in and see how well our dedupe performs across the Llama 4 family. On average we're seeing ~25% dedupe, providing huge savings to the community members who iterate on these state-of-the-art models. The attached image shows a few selected models and how they perform on Xet.

Thanks to the meta-llama team for launching on Xet!
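
For a sense of how a dedupe number like that ~25% is measured, a toy sketch follows (fixed-size chunking for simplicity; Xet itself uses content-defined chunking, which survives insertions that shift byte offsets): hash each chunk and count how many of a new revision's chunks the store already holds.

```python
# Toy sketch, not Xet's implementation: estimate dedupe between two
# revisions of a file by hashing fixed-size chunks and counting reuse.
import hashlib
import os

def chunk_hashes(data: bytes, chunk_size: int = 64 * 1024) -> set[str]:
    """Hash every fixed-size chunk of the payload."""
    return {
        hashlib.sha256(data[i : i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }

def dedupe_ratio(old: bytes, new: bytes) -> float:
    """Fraction of the new revision's chunks already stored for the old one."""
    old_chunks, new_chunks = chunk_hashes(old), chunk_hashes(new)
    return len(new_chunks & old_chunks) / len(new_chunks) if new_chunks else 0.0

v1 = os.urandom(1024 * 1024)                    # 1 MiB "model" revision 1
v2 = v1[: 768 * 1024] + os.urandom(256 * 1024)  # last quarter changed
print(f"dedupe: {dedupe_ratio(v1, v2):.0%}")    # ~75% of chunks reused
```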