Sabkuch Align Karo

university

sabkuch-align-karo's activity

RishabhBhardwaj

authored 3 papers 3 months ago

Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming

Paper • 2406.11654 • Published Jun 17 • 6

DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

Paper • 2406.11617 • Published Jun 17 • 8

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Paper • 2409.11242 • Published Sep 17 • 5

RishabhBhardwaj

authored a paper 4 months ago

Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique

Paper • 2408.10701 • Published Aug 20 • 11

RishabhBhardwaj

authored a paper 5 months ago

WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

Paper • 2408.03837 • Published Aug 7 • 17

RishabhBhardwaj

posted an update 6 months ago

Post

2128

Excited to announce the release of the community version of our guardrails: WalledGuard-C!

Feel free to use it. Compared with Meta’s guardrails, it offers superior performance while being 4x faster. Most importantly, it's free for nearly any use!

Link: walledai/walledguard-c

#AISafety

1 reply

RishabhBhardwaj

posted an update 6 months ago

Post

2437

🎉 We are thrilled to share our work on model merging. We propose a new approach, DELLA-Merging (Della for short), which combines expert models from various domains into a single, versatile model. Della uses magnitude-based sampling to eliminate redundant delta parameters, which reduces interference when merging homologous models (those fine-tuned from the same backbone).

Della outperforms existing homologous model merging techniques such as DARE and TIES. Across three expert models (LM, Math, Code) and their corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), Della achieves an improvement of 3.6 points over TIES and 1.2 points over DARE.

Paper: DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling (2406.11617)
GitHub: https://github.com/declare-lab/della
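
To make the idea of magnitude-based sampling of delta parameters more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). The function names `magnitude_sample` and `merge`, the parameters `mean_keep` and `spread`, the linear rank-to-probability mapping, and the plain averaging of the sampled deltas are all assumptions made for illustration, and any further steps the actual method may use are omitted.

```python
import random


def magnitude_sample(delta, mean_keep=0.5, spread=0.4, rng=None):
    """Randomly drop delta entries, keeping large-magnitude ones more often.

    Hypothetical scheme: keep probabilities are spaced linearly by magnitude
    rank, from mean_keep - spread/2 (smallest) to mean_keep + spread/2
    (largest). Kept entries are divided by their keep probability so that
    the expected value of every entry is unchanged.
    """
    low, high = mean_keep - spread / 2, mean_keep + spread / 2
    if not (0.0 < low <= high <= 1.0):
        raise ValueError("keep probabilities must lie in (0, 1]")
    rng = rng or random.Random(0)
    n = len(delta)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(n), key=lambda i: abs(delta[i]))
    out = [0.0] * n
    for rank, i in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        keep_p = low + (high - low) * frac
        if rng.random() < keep_p:
            out[i] = delta[i] / keep_p  # rescale the survivor
    return out


def merge(base, experts, mean_keep=0.5, spread=0.4, seed=0):
    """Merge experts fine-tuned from the same base (flat parameter lists).

    Each expert's delta (expert - base) is sparsified with magnitude_sample,
    then the sparsified deltas are averaged and added back to the base.
    """
    rng = random.Random(seed)
    sampled = [
        magnitude_sample(
            [e - b for e, b in zip(expert, base)], mean_keep, spread, rng
        )
        for expert in experts
    ]
    return [
        b + sum(column) / len(experts)
        for b, column in zip(base, zip(*sampled))
    ]
```

Calling `merge(base, [expert_a, expert_b])` with flat lists of floats returns a merged parameter list of the same length. Setting `mean_keep=1.0` and `spread=0.0` keeps every delta, which reduces the sketch to simple averaging of the experts' deltas.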

@soujanyaporia @Tej3

3 replies

RishabhBhardwaj

authored 4 papers 6 months ago

Recognizing Emotion Cause in Conversations

Paper • 2012.11820 • Published Dec 22, 2020

Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment

Paper • 2308.09662 • Published Aug 18, 2023 • 3

Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic

Paper • 2402.11746 • Published Feb 19 • 2

Language Model Unalignment: Parametric Red-Teaming to Expose Hidden Harms and Biases

Paper • 2310.14303 • Published Oct 22, 2023 • 1

Xa9aX

authored a paper 8 months ago

Just Say the Name: Online Continual Learning with Category Names Only via Data Generation

Paper • 2403.10853 • Published Mar 16

deepanway

authored 6 papers 8 months ago

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Paper • 2404.09956 • Published Apr 15 • 11

Xa9aX

authored a paper 9 months ago

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Paper • 2404.00399 • Published Mar 30 • 41

Xa9aX

authored a paper about 1 year ago

Rotate to Attend: Convolutional Triplet Attention Module

Paper • 2010.03045 • Published Oct 6, 2020

Team members 4