RAHUL YASHWANTKUMAR GUPTA

ryg81

AI & ML interests

None yet


Organizations

None yet

ryg81's activity

reacted to Duskfallcrew's post with 👍🔥 about 17 hours ago
Just been starting to port over the articles that mattered most to me from Civitai.
Look, I'm not going to sit here and whine, complain, and moan entirely: they know why I've left, and they're going to thrive without me.
I'm a mere speck compared to their future, and that's amazing.
But the journey continues. I've posted my Design 101 for AI, the first one up; I believe it's the first one, as it delves back to how Arts and Crafts connect to AI.
I'm still looking for a future model hub for the insane 800+ models I'd published, considering that's half of what I've got sitting in my repos on HF.
reacted to retronic's post with 🔥 1 day ago
Colox, a reasoning AI model. I am currently working on a model smarter than GPT o1 that thinks before it speaks. It is coming tomorrow afternoon.
reacted to ahmed-masry's post with 👍 4 days ago
Happy to announce AlignVLM 📏 – a novel approach to bridging vision and language latent spaces for multimodal understanding in Vision-Language Models (VLMs) 🌍📄🖼

🔗 Read the paper: AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding (2502.01341)

🧐 What’s the challenge?
Aligning visual features with language embeddings remains a major bottleneck in VLMs. Existing connectors, such as multi-layer perceptrons (MLPs), often introduce noise that degrades performance. ❌

🎯 Our Solution: ALIGN Connector
We propose AlignVLM, a method that maps vision features into a weighted average of LLM text embeddings, ensuring they remain in a space that the LLM can effectively interpret. ✅
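
A minimal sketch of that idea as I read it from the post (hypothetical names and shapes, not the paper's actual implementation; the real connector may differ, e.g. in the number of projection layers):

```python
import torch
import torch.nn as nn

class AlignConnector(nn.Module):
    """Sketch: map vision features to a convex combination of the LLM's
    text embeddings (vocab logits -> softmax -> weighted average)."""

    def __init__(self, vision_dim: int, llm_embed_table: torch.Tensor):
        super().__init__()
        vocab_size, _ = llm_embed_table.shape
        # Project each vision feature to logits over the LLM vocabulary.
        self.to_vocab_logits = nn.Linear(vision_dim, vocab_size)
        # Frozen copy of the LLM's input embedding matrix (vocab x llm_dim).
        self.register_buffer("embed_table", llm_embed_table)

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, vision_dim)
        weights = torch.softmax(self.to_vocab_logits(vision_feats), dim=-1)
        # Each output is a probability-weighted average of real token
        # embeddings, so it stays inside the convex hull of the space
        # the LLM already knows how to interpret.
        return weights @ self.embed_table  # (batch, num_patches, llm_dim)
```

Because the output can never drift far outside the distribution of the LLM's own token embeddings, this construction is one plausible reading of why the connector tolerates noisy vision features so well (see the robustness results below).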

🔬 How does it perform?
We compared ALIGN against common connectors like MLPs, Perceiver Resampler, and Ovis, trained under similar configurations. The results? ALIGN outperforms them all 🏆 on diverse document understanding tasks 📄.

📊 Meet the AlignVLM Model Family!
We trained Llama 3.1 (1B, 3B, 8B) using our connector and benchmarked them against various models. The results:
✅ AlignVLM surpasses all Base VLMs trained under similar configurations.
✅ Our models also perform competitively against Instruct VLMs such as Qwen2-VL and InternVL-2.5 🚀.

🤔 What about robustness to noise?
We injected Gaussian noise (μ=0, σ=3) into the vision encoder's outputs before feeding them to the connector (a sketch of this perturbation follows the results below):
✅ ALIGN Connector: Minimal drop (↓1.67%) – proving its high robustness!
❌ MLP Connector: Severe degradation (↓25.54%) – struggling with noisy inputs.
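
For reference, the perturbation described above amounts to something like this (a sketch under assumed shapes; `vision_feats` stands for the vision encoder's output):

```python
import torch

def inject_gaussian_noise(vision_feats: torch.Tensor, sigma: float = 3.0) -> torch.Tensor:
    # Add zero-mean Gaussian noise (mu=0, sigma=3) to the encoder outputs
    # before they are passed to the connector.
    return vision_feats + sigma * torch.randn_like(vision_feats)
```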

Code & model weights coming soon! Stay tuned! 🔥
replied to Jaward's post 5 days ago

Looks good, but will they release it for us to use, or just show off their development and tease us? :)

New activity in lmstudio-community/MiniCPM-o-2_6-GGUF 7 days ago

getting error

#1 opened 10 days ago by ryg81
reacted to fuzzy-mittenz's post with 🔥 9 days ago
Not many seemed to notice, but what was probably meant to be a win for artists' rights at the US Copyright Office has solved some fundamental issues for the community.
In our recent article, I outline how companies like Suno, OpenAI, and Midjourney can no longer claim any right to copy the work you create with their platforms.
We also look at other ways this study and the new rules for AI will fundamentally affect creators who use it, and how companies' incentives to give them control over certain aspects might change because of this. It's broken down pretty well here: https://huggingface.co/blog/fuzzy-mittenz/copyright-in-ai

Can this be similar for image generation models? (I am not a programmer or an expert in AI.)