arxiv:2605.26045
Federico Torrielli
EvilScript
AI & ML interests
AI Safety & Mechanistic interpretability
Recent Activity
updated a model 2 minutes ago
EvilScript/activation-oracle-Qwen3_6-27B
authored a paper 5 minutes ago
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals
updated a model 9 days ago
EvilScript/gemma-3-27b-it-taboo-smile