Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals Paper • 2405.05466 • Published May 8, 2024 • 1
SliderSpace: Decomposing the Visual Capabilities of Diffusion Models Paper • 2502.01639 • Published Feb 3 • 25
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18, 2024 • 35
view article Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation Apr 29, 2024 • 76