An update on our safety & security practices
We banned accounts linked to a covert Iranian influence operation that used ChatGPT to generate website and social media content on multiple topics, including the U.S. presidential campaign. We have seen no indication that this content reached a meaningful audience.
This report outlines the safety work carried out prior to releasing GPT-4o, including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the mitigations we built in to address key risk areas.
CriticGPT, a model based on GPT-4, writes critiques of ChatGPT responses to help human trainers spot mistakes during RLHF.
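To make the workflow concrete, here is a minimal sketch of how a CriticGPT-style model could slot into an RLHF labeling step: the critic annotates each candidate response with suspected mistakes before the human trainer records a rating. This is an illustration under assumed details, not OpenAI's implementation, and every function name is a hypothetical placeholder.

```python
"""Sketch: critique-assisted RLHF labeling (hypothetical placeholders)."""

def critic_model(prompt: str, response: str) -> list[str]:
    # Stand-in for a GPT-4-based critique model. A real critic would be
    # queried with the (prompt, response) pair and return free-form
    # critiques pointing at specific suspected errors.
    return ["Possible edge case: the function fails on an empty list."]

def label_with_assistance(prompt: str, responses: list[str]) -> list[dict]:
    # Attach critic output to each candidate so the trainer reviews the
    # response and its suspected flaws side by side before rating it.
    return [
        {"response": r, "critiques": critic_model(prompt, r), "rating": None}
        for r in responses
    ]

if __name__ == "__main__":
    items = label_with_assistance(
        prompt="Write a function that sums a list.",
        responses=["def total(xs):\n    return sum(xs)"],
    )
    for item in items:
        print("Critic flagged:", item["critiques"])
```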
Exploring the technology behind our text-to-speech model.
Expanding on how Voice Engine works and our safety research
We are improving our AI systems’ ability to learn from human feedback and to assist humans at evaluating AI. Our goal is to build a sufficiently aligned AI system that can help us solve all other alignment problems.
We trained “critique-writing” models to describe flaws in summaries. Human evaluators find flaws in summaries much more often when shown our model’s critiques. Larger models are better at self-critiquing, with scale improving critique-writing more than summary-writing. This shows promise for using AI systems to assist human supervision of AI systems on difficult tasks.
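The evaluation setup described above can be sketched as follows: a critique model flags candidate flaws in a summary, and the human evaluator reviews the summary alongside those flags rather than unaided. This is a minimal illustration, not the paper's actual pipeline, and the function and field names are hypothetical placeholders.

```python
"""Sketch: critique-assisted summary evaluation (hypothetical placeholders)."""

from dataclasses import dataclass

@dataclass
class Critique:
    claim: str  # the flaw the critique model alleges
    span: str   # the part of the summary it refers to

def critique_model(passage: str, summary: str) -> list[Critique]:
    # Placeholder for a trained critique-writing model. A real system
    # would sample critiques from a model fine-tuned on (passage,
    # summary, critique) data.
    return [Critique(claim="Summary omits the second finding.",
                     span=summary[:40])]

def assisted_review(passage: str, summary: str) -> dict:
    # Show the summary together with the model's critiques; evaluators
    # given such critiques find flaws more often than unaided reviewers.
    critiques = critique_model(passage, summary)
    return {
        "summary": summary,
        "model_critiques": [c.claim for c in critiques],
        "needs_human_verdict": True,
    }

if __name__ == "__main__":
    review = assisted_review(
        passage="Full source text goes here...",
        summary="A short summary of the source text.",
    )
    for claim in review["model_critiques"]:
        print("Model-flagged flaw:", claim)
```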
Cohere, OpenAI, and AI21 Labs have developed a preliminary set of best practices applicable to any organization developing or deploying large language models.
We describe our latest thinking in the hope of helping other AI developers address safety and misuse of deployed models.
Lessons learned on language model safety and misuse