Safety & Alignment

Practices for Governing Agentic AI Systems

Artificial Intelligence, Safety & Alignment

Weak-to-strong generalization

We present a new research direction for superalignment, together with promising initial results: can we leverage the generalization properties of deep learning to control strong models with weak supervisors?

Superalignment Fast Grants

We’re launching $10M in grants to support technical research towards the alignment and safety of superhuman AI systems, including weak-to-strong generalization, interpretability, scalable oversight, and more.

Frontier risk and preparedness

To support the safety of highly capable AI systems, we are developing our approach to catastrophic risk preparedness, including building a Preparedness team and launching a challenge.

OpenAI Red Teaming Network

We’re announcing an open call for the OpenAI Red Teaming Network and invite domain experts interested in improving the safety of OpenAI’s models to join our efforts.

GPT-4V(ision) system card

DALL·E 3 system card

Frontier Model Forum

We’re forming a new industry body to promote the safe and responsible development of frontier AI systems: advancing AI safety research, identifying best practices and standards, and facilitating information sharing among policymakers and industry.

Confidence-Building Measures for Artificial Intelligence: Workshop proceedings

Using GPT-4 for content moderation

We use GPT-4 for content policy development and content moderation decisions, enabling more consistent labeling, a faster feedback loop for policy refinement, and less involvement from human moderators.
