Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension Paper • 2002.00293 • Published Feb 2, 2020
Interpretation of Natural Language Rules in Conversational Machine Reading Paper • 1809.01494 • Published Aug 28, 2018
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality Paper • 2204.03162 • Published Apr 7, 2022
Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation Paper • 2104.08678 • Published Apr 18, 2021
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks Paper • 2204.01906 • Published Apr 5, 2022
Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants Paper • 2112.09062 • Published Dec 16, 2021
DMLR: Data-centric Machine Learning Research -- Past, Present and Future Paper • 2311.13028 • Published Nov 21, 2023
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models Paper • 2404.16019 • Published Apr 24, 2024
Rigorously Assessing Natural Language Explanations of Neurons Paper • 2309.10312 • Published Sep 19, 2023
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions Paper • 2305.14795 • Published May 24, 2023
A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments Paper • 2401.12631 • Published Jan 23, 2024
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions Paper • 2403.07809 • Published Mar 12, 2024
CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior Paper • 2205.14140 • Published May 27, 2022
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning Paper • 2402.06619 • Published Feb 9, 2024
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca Paper • 2305.08809 • Published May 15, 2023