Raja Biswas

rbiswasfc

AI & ML interests

NLP, Generative AI

rbiswasfc's activity

updated a dataset about 5 hours ago

rbiswasfc/r1-7b

Viewer • Updated about 5 hours ago • 40 • 71

upvoted a collection about 13 hours ago

Model Merging

Collection

Model merging is a very popular technique for LLMs nowadays. Here is a chronological list of papers in the space that will help you get started with it! • 30 items • Updated Jun 12, 2024 • 235

upvoted an article 1 day ago

Article

The N Implementation Details of RLHF with PPO

Oct 24, 2023

• 44

liked a dataset 3 days ago

qihoo360/Light-R1-SFTData

Viewer • Updated 3 days ago • 79.4k • 658 • 21

upvoted a paper 4 days ago

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Paper • 2503.07920 • Published 6 days ago • 91

upvoted an article 4 days ago

Article

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

5 days ago

• 271

upvoted a collection 4 days ago

Gemma 3 Release

Collection

9 items • Updated 3 days ago • 252

liked a dataset 5 days ago

open-r1/codeforces

Viewer • Updated 5 days ago • 10k • 583 • 19

upvoted 3 articles 5 days ago

Article

Open R1: Update #3

5 days ago

• 225

Article

HuggingFace, IISc partner to supercharge model building on India's diverse languages

18 days ago

• 14

Article

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

13 days ago

• 66

upvoted 2 papers 6 days ago

Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published 10 days ago • 79

EuroBERT: Scaling Multilingual Encoders for European Languages

Paper • 2503.05500 • Published 9 days ago • 72

published a dataset 13 days ago

rbiswasfc/r1-7b

Viewer • Updated about 5 hours ago • 40 • 71

upvoted 2 articles 22 days ago

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Feb 7

• 74

Article

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Dec 9, 2022

• 199

liked a model 22 days ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

Text Generation • Updated 21 days ago • 1.25M • 548

liked a Space 24 days ago

2.26k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters

published a model 24 days ago

rbiswasfc/mistral-rp-v2

Updated 24 days ago • 10

updated a model 24 days ago

rbiswasfc/mistral-rp-v2

Updated 24 days ago • 10