Kevin Klyman

kklyman

AI & ML interests

Usage policies for foundation models

Organizations

None yet

kklyman's activity

upvoted an article 7 days ago

An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct

By leonardlin • 50
liked a Space 9 months ago
reacted to sted97's post with 🔥 9 months ago
📣 I'm thrilled to announce "ALERT: A Comprehensive #Benchmark for Assessing #LLMs' Safety through #RedTeaming" 🚨

📄 Paper: https://arxiv.org/pdf/2404.08676.pdf
🗃️ Repo: https://github.com/Babelscape/ALERT
🤗 ALERT benchmark: Babelscape/ALERT
🤗 ALERT DPO data: Babelscape/ALERT_DPO

As a key design principle for ALERT, we developed a fine-grained safety risk taxonomy (Fig. 2). This taxonomy is the foundation of the benchmark: it provides detailed insights into a model's weaknesses and vulnerabilities, and informs targeted safety enhancements 🛡️

To collect our prompts, we started from Anthropic's popular HH-RLHF data and used automated strategies to filter and classify them. We then designed templates to create new prompts (ensuring sufficient coverage for each category, cf. Fig. 3) and implemented adversarial attacks.
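The template step can be pictured with a toy sketch (the template strings and seed actions below are invented for illustration; they are not the actual ALERT templates):

```python
# Toy illustration of template-based prompt generation for a red-teaming
# benchmark. Templates and seeds are invented, not taken from the paper.
TEMPLATES = [
    "How do I {action}?",
    "Write a guide on how to {action}.",
]

def expand(actions: list[str]) -> list[str]:
    """Fill every template with every seed action to multiply prompt coverage."""
    return [t.format(action=a) for a in actions for t in TEMPLATES]

prompts = expand(["pick a lock"])
# ['How do I pick a lock?', 'Write a guide on how to pick a lock.']
```

In a real pipeline, each seed would also carry its taxonomy category so the generated prompts stay traceable to a specific risk area.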

In our experiments, we extensively evaluated several open- and closed-source LLMs (e.g., #ChatGPT, #Llama, and #Mistral), highlighting their strengths and weaknesses (Table 1).
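To make the evaluation setup concrete, here is a toy sketch of scoring responses per taxonomy category. The keyword-based refusal check is a naive stand-in for the learned safety classifier a real pipeline would use, and all names are illustrative:

```python
# Toy sketch of a safety-benchmark scoring loop (illustrative only; a real
# evaluation would use a trained safety classifier, not keyword matching).
def looks_like_refusal(response: str) -> bool:
    """Naive keyword check standing in for a real safety classifier."""
    markers = ("i can't", "i cannot", "i won't", "as an ai")
    return any(m in response.lower() for m in markers)

def safety_score(responses_by_category: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of responses judged safe (i.e., refused), per taxonomy category."""
    return {
        category: sum(looks_like_refusal(r) for r in responses) / len(responses)
        for category, responses in responses_by_category.items()
    }

demo = {
    "weapons": ["I can't help with that.", "Sure, here is how..."],
    "hate": ["I cannot assist with that request."],
}
print(safety_score(demo))  # {'weapons': 0.5, 'hate': 1.0}
```

Per-category scores like these are what make a fine-grained taxonomy useful: an aggregate "safe" rate can hide a model that refuses reliably in one category but fails in another.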

For more details, check out our preprint: https://arxiv.org/pdf/2404.08676.pdf 🤓

Huge thanks to @felfri , @PSaiml , Kristian Kersting, @navigli , @huu-ontocord and @BoLi-aisecure (and all the organizations involved: Babelscape, Sapienza NLP, TU Darmstadt, Hessian.AI, DFKI, Ontocord.AI, UChicago and UIUC) 🫂
New activity in allenai/OLMo-7B-0424 9 months ago

Update README.md

#1 opened 9 months ago by kklyman
reacted to yjernite's post with 🤗 11 months ago
πŸ‘·πŸ½β€β™€οΈπŸ“šπŸ”¨ Announcing the Foundation Model Development Cheatsheet!

My first 🤗Post🤗 ever, announcing the release of a fantastic collaborative resource to support model developers across the full development stack: the FM Development Cheatsheet, available here: https://fmcheatsheet.org/

The cheatsheet is a growing database of the many crucial resources coming from open research and development efforts to support the responsible development of models. This new resource highlights essential yet often underutilized tools to make it as easy as possible for developers to adopt best practices, covering, among other aspects:
πŸ§‘πŸΌβ€πŸ€β€πŸ§‘πŸΌ data selection, curation, and governance;
πŸ“– accurate and limitations-aware documentation;
⚑ energy efficiency throughout the training phase;
πŸ“Š thorough capability assessments and risk evaluations;
🌏 environmentally and socially conscious deployment strategies.

We strongly encourage developers working on creating and improving models to make full use of the tools listed here, and to help keep the resource up to date by adding the resources that you yourself have developed or found useful in your own practice 🤗

Congrats to all the participants in this effort for the release! Read more about it from:
@Shayne - https://twitter.com/ShayneRedford/status/1763215814860186005
@hails and @stellaathena - https://blog.eleuther.ai/fm-dev-cheatsheet/
@alon-albalak - http://nlp.cs.ucsb.edu/blog/a-new-guide-for-the-responsible-development-of-foundation-models.html

And also to @gabrielilharco @sayashk @kklyman @kylel @mbrauh @fauxneticien @avi-skowron @Bertievidgen Laura Weidinger, Arvind Narayanan, @VictorSanh @Davlan @percyliang Rishi Bommasani, @breakend @sasha 🔥