
Saurav singh PRO

sauravssss

AI & ML interests

None yet

Recent Activity

published a model about 1 month ago
sauravssss/Video_agent
published a Space about 1 month ago
sauravssss/Video_Agent

Organizations

Hugging Face Discord Community, AI Starter Pack

sauravssss's activity

published a model about 1 month ago
liked a Space 7 months ago
updated a Space 7 months ago
reacted to victor's post with ❤️ 8 months ago
🙋 Calling all Hugging Face users! We want to hear from YOU!

What feature or improvement would make the biggest impact on Hugging Face?

Whether it's the Hub, better documentation, new integrations, or something completely different – we're all ears!

Your feedback shapes the future of Hugging Face. Drop your ideas in the comments below! 👇
replied to vladbogo's post about 1 year ago
reacted to vladbogo's post with ❤️ about 1 year ago
Anthropic introduces "Many-shot Jailbreaking" (MSJ), a new attack on large language models! MSJ exploits long context windows to override safety constraints.

Key Points:
* Prompts LLMs with hundreds of examples of harmful behavior formatted as a dialogue
* Generates malicious examples using an uninhibited "helpful-only" model
* Effective at jailbreaking models like Claude 2.0, GPT-3.5, GPT-4
* Standard alignment techniques provide limited protection against long context attacks

Paper: https://www.anthropic.com/research/many-shot-jailbreaking
More details in my blog: https://huggingface.co/blog/vladbogo/many-shot-jailbreaking

Congrats to the authors for their work!
  • 2 replies