safety-by-imitation

community

AI & ML interests

None defined yet.

Recent Activity

vinid authored a paper 7 months ago

Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale

vinid authored a paper 7 months ago

Contrastive Language-Image Pre-training for the Italian Language

vinid authored a paper 7 months ago

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

safety-by-imitation's activity

vinid

authored 6 papers 7 months ago

Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale

Paper • 2211.03759 • Published Nov 7, 2022

Contrastive Language-Image Pre-training for the Italian Language

Paper • 2108.08688 • Published Aug 19, 2021 • 2

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

Paper • 2308.01263 • Published Aug 2, 2023

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

Paper • 2309.07875 • Published Sep 14, 2023

When and why vision-language models behave like bags-of-words, and what to do about it?

Paper • 2210.01936 • Published Oct 4, 2022

TextGrad: Automatic "Differentiation" via Text

Paper • 2406.07496 • Published Jun 11, 2024 • 28

Paul

authored a paper 9 months ago

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Paper • 2404.12241 • Published Apr 18, 2024 • 10

g8a9

authored 4 papers about 1 year ago

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

Paper • 2308.01263 • Published Aug 2, 2023

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

Paper • 2309.07875 • Published Sep 14, 2023

Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features

Paper • 2309.07733 • Published Sep 14, 2023

A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation

Paper • 2310.12127 • Published Oct 18, 2023 • 1

g8a9

authored 3 papers over 1 year ago

ITALIC: An Italian Intent Classification Dataset

Paper • 2306.08502 • Published Jun 14, 2023 • 3

Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists

Paper • 2203.09192 • Published Mar 17, 2022

Contrastive Language-Image Pre-training for the Italian Language

Paper • 2108.08688 • Published Aug 19, 2021 • 2

Team members 6
