Deliberative alignment: reasoning enables safer language models

Introducing our new alignment strategy for the o1 models, which are directly taught the text of our safety specifications and trained to reason over them before answering.