Equivalence between policy gradients and soft Q-learning
We’ve discovered that evolution strategies (ES), an optimization technique that’s been known for decades, rivals the performance of standard reinforcement learning (RL) techniques on modern RL benchmarks (e.g. Atari/MuJoCo), while overcoming many of RL’s inconveniences.
Evolution strategies as a scalable alternative to reinforcement learning
We’ve created the world’s first Spam-detecting AI trained entirely in simulation and deployed on a physical robot.
Spam detection in the physical world
We’ve developed an unsupervised system which learns an excellent representation of sentiment, despite being trained only to predict the next character in the text of Amazon reviews.
Unsupervised sentiment neuron
In this post we’ll outline new OpenAI research in which agents develop their own language.
Learning to communicate