Adversarial attacks on neural network policies

AI safety needs social scientists
We’ve written a paper arguing that long-term AI safety research needs social scientists to ensure AI alignment algorithms succeed when actual humans are involved. Properly aligning advanced AI systems with human values requires resolving many uncertainties related to the psychology of human rationality, emotion, and biases. The aim of this paper is to spark further …

Learning complex goals with iterated amplification
We’re proposing an AI safety technique called iterated amplification that lets us specify complicated behaviors and goals that are beyond human scale, by demonstrating how to decompose a task into simpler sub-tasks, rather than by providing labeled data or a reward function. Although this idea is in its very early stages and we have only …
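
To make the decomposition idea concrete, here is a minimal sketch of one amplification step, assuming a recursive question-answering setting; `amplify`, `decompose`, `base_answer`, and `combine` are hypothetical placeholders for illustration, not code from the paper.

```python
# Illustrative sketch only: a hard question is decomposed into easier
# sub-questions, each answered at a lower recursion depth, and the
# sub-answers are combined into a final answer.

def amplify(question: str, depth: int) -> str:
    """Answer `question` by recursively solving simpler sub-questions."""
    if depth == 0:
        return base_answer(question)  # fall back to the unamplified model
    sub_questions = decompose(question)
    sub_answers = [amplify(q, depth - 1) for q in sub_questions]
    return combine(question, sub_answers)

def base_answer(question: str) -> str:
    return f"<answer to: {question}>"  # stands in for a learned model

def decompose(question: str) -> list:
    # Stands in for a human-demonstrated decomposition of the task.
    return [f"part 1 of {question}", f"part 2 of {question}"]

def combine(question: str, sub_answers: list) -> str:
    # Stands in for aggregating sub-answers into a final answer.
    return " and ".join(sub_answers)

print(amplify("plan a citywide transit schedule", depth=2))
```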

Improving language understanding with unsupervised learning
We’ve obtained state-of-the-art results on a suite of diverse language tasks with a scalable, task-agnostic system, which we’re also releasing. Our approach is a combination of two existing ideas: transformers and unsupervised pre-training. These results provide a convincing example that pairing supervised learning methods with unsupervised pre-training works very well; this is an idea that many have explored …
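
A schematic sketch of the two-stage recipe the excerpt describes: unsupervised pre-training on unlabeled text, then supervised fine-tuning of the same weights on a labeled task. `TransformerLM`, `lm_loss`, and `task_loss` are hypothetical stand-ins, not the released system.

```python
# Illustrative sketch only: the stubs below stand in for a real model,
# a next-token prediction loss, and a downstream task loss.

class TransformerLM:
    """Placeholder for a transformer language model."""
    def update(self, loss):
        pass  # one gradient step in a real implementation

def lm_loss(model, text_batch):
    return 0.0  # next-token prediction loss (no labels needed)

def task_loss(model, inputs, label):
    return 0.0  # supervised loss for the downstream task

def pretrain(model, unlabeled_corpus):
    # Stage 1: unsupervised pre-training on raw text.
    for text_batch in unlabeled_corpus:
        model.update(lm_loss(model, text_batch))
    return model

def finetune(model, labeled_examples):
    # Stage 2: supervised fine-tuning reuses the pre-trained weights,
    # so far less labeled data is needed than training from scratch.
    for inputs, label in labeled_examples:
        model.update(task_loss(model, inputs, label))
    return model

model = finetune(pretrain(TransformerLM(), ["raw text ..."]),
                 [("premise/hypothesis", "entailment")])
```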

AI safety via debate
We’re proposing an AI safety technique which trains agents to debate topics with one another, using a human to judge who wins.
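
The sketch below illustrates one round-based version of that setup: two agents alternate short arguments and a human judge picks the winner, whose decision supplies the training signal. `Agent` and `human_judge` are hypothetical placeholders, not code from the paper.

```python
# Illustrative sketch only: a zero-sum debate game judged by a human.

class Agent:
    def argue(self, transcript):
        return "<argument>"  # produced by a learned policy in a real system
    def reward(self, r):
        pass  # reinforcement-learning update in a real system

def human_judge(transcript):
    return "A"  # a real human judgment in practice

def run_debate(question, agent_a, agent_b, rounds=3):
    transcript = [("question", question)]
    for _ in range(rounds):
        transcript.append(("A", agent_a.argue(transcript)))
        transcript.append(("B", agent_b.argue(transcript)))
    winner = human_judge(transcript)  # "A" or "B"
    # Zero-sum reward: the hope is that the strongest strategy in this
    # game is honest, informative argument.
    agent_a.reward(+1 if winner == "A" else -1)
    agent_b.reward(+1 if winner == "B" else -1)
    return winner

print(run_debate("Which city should the family visit?", Agent(), Agent()))
```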

Preparing for malicious uses of AI
We’ve co-authored a paper that forecasts how malicious actors could misuse AI technology and outlines ways we can prevent and mitigate these threats. This paper is the outcome of almost a year of sustained work with our colleagues at the Future of Humanity Institute, the Centre for the Study of Existential Risk, the Center for …

OpenAI safety practices
Artificial general intelligence has the potential to benefit nearly every aspect of our lives, so it must be developed and deployed responsibly.

How OpenAI is approaching 2024 worldwide elections
We’re working to prevent abuse, provide transparency on AI-generated content, and improve access to accurate voting information.

Democratic inputs to AI grant program: lessons learned and implementation plans
We funded 10 teams from around the world to design ideas and tools to collectively govern AI. We summarize the innovations, outline our learnings, and call for researchers and engineers to join us as we continue this work.