Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues. Paper: arXiv:2410.10700. Published Oct 14, 2024.
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion. Paper: arXiv:2403.07865. Published Mar 12, 2024.