Debugging misaligned completions with sparse-autoencoder latent attribution

(alignment.openai.com)

1 point | by gmays an hour ago

No comments yet.