Research

Weight normalization: A simple reparameterization to accelerate training of deep neural networks

OpenAI Gym Beta

We’re releasing the public beta of OpenAI Gym, a toolkit for developing and comparing reinforcement learning (RL) algorithms. It consists of a growing suite of environments (from simulated robots to Atari games), and a site for comparing and reproducing results.

Nonlinear computation in deep linear networks

Artificial Intelligence, Research

Learning to model other minds

Artificial Intelligence, Research

We’re releasing an algorithm which accounts for the fact that other agents are learning too, and discovers self-interested yet collaborative strategies like tit-for-tat in the iterated prisoner’s dilemma. This algorithm, Learning with Opponent-Learning Awareness (LOLA), is a small step towards agents that model other minds.
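For readers unfamiliar with the strategy named above, here is a minimal sketch of tit-for-tat in the iterated prisoner's dilemma. It is not LOLA itself, only a hand-written version of the kind of strategy the excerpt says LOLA discovers. The payoff values and the names `PAYOFFS`, `tit_for_tat`, `always_defect`, and `play` are illustrative assumptions and do not come from the post.

```python
# Illustrative payoff table (assumed values): (payoff to A, payoff to B).
# "C" = cooperate, "D" = defect.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate on the first round, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Baseline opponent that never cooperates."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play both strategies against each other and return the total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy only sees the moves its opponent has made so far.
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        payoff_a, payoff_b = PAYOFFS[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))    # mutual cooperation every round
    print(play(tit_for_tat, always_defect))  # exploited once, then retaliates
```

Under these assumed payoffs, two tit-for-tat players cooperate throughout, while against a pure defector tit-for-tat loses only the first round before retaliating. That is the sense in which the strategy is both self-interested and collaborative.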

Dota 2

Artificial Intelligence, Research

We’ve created a bot which beats the world’s top professionals at 1v1 matches of Dota 2 under standard tournament rules. The bot learned the game from scratch by self-play, and does not use imitation learning or tree search. This is a step towards building AI systems which accomplish well-defined goals in messy, complicated situations involving

More on Dota 2

Artificial Intelligence, Research

Our Dota 2 result shows that self-play can catapult the performance of machine learning systems from far below human level to superhuman, given sufficient compute. In the span of a month, our system went from barely matching a high-ranked player to beating the top pros and has continued to improve since then. Supervised deep learning

OpenAI Baselines: ACKTR & A2C

Artificial Intelligence, Research

We’re releasing two new OpenAI Baselines implementations: ACKTR and A2C. A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor-Critic (A3C) which we’ve found performs as well as A3C. ACKTR is a more sample-efficient reinforcement learning algorithm than TRPO and A2C, and requires only slightly more computation than A2C per update.
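As a rough illustration of the "advantage" in advantage actor-critic, the sketch below computes discounted returns over a short rollout and subtracts the critic's value estimates. It assumes a plain bootstrapped n-step return and makes no claim about how the Baselines code is structured. The function names, the `gamma` default, and the sample numbers are hypothetical.

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted return at each step, bootstrapped from the value after the rollout."""
    returns = []
    running = bootstrap_value
    # Walk backwards so each step reuses the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage estimate: observed return minus the critic's predicted value."""
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    return [ret - value for ret, value in zip(returns, values)]


if __name__ == "__main__":
    # Hypothetical three-step rollout with a critic that predicts 1.0 everywhere.
    print(advantages([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], bootstrap_value=0.0, gamma=0.5))
```

A positive advantage means the action led to a better outcome than the critic expected, so the actor's update makes that action more likely. A negative advantage has the opposite effect.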

Learning with opponent-learning awareness

Artificial Intelligence, Research

Hindsight Experience Replay

Artificial Intelligence, Research

Robust adversarial inputs

Artificial Intelligence, Research

We’ve created images that reliably fool neural network classifiers when viewed from varied scales and perspectives. This challenges a claim from last week that self-driving cars would be hard to trick maliciously since they capture images from multiple scales, angles, perspectives, and the like.
