Models self-report difference between RLHF-trained responses and base cognition

(github.com)

2 points | by daniel-navarro 11 hours ago

1 comment