AI & ML interests

Language models, reasoning, robustness, question answering, evaluation, theorem proving, knowledge graphs, mechanistic interpretability, adversarial training, dynamic adversarial data collection, in-context learning, natural language explanations, safety and security, self-training, knowledge distillation, natural language processing

Recent Activity