This is the first of our bi-weekly updates. The goal is to keep you up to date, and to help us get more value from your visits.
Compute:
Compute is used in two ways: to run a single big experiment quickly, and to run many experiments in parallel.
95% of progress comes from the ability to run big experiments quickly. Running many experiments in parallel is much less useful.
In the old days, a large cluster could help you run more experiments, but it could not help with running a single large experiment quickly.
For this reason, an academic lab could compete with Google, because Google’s only advantage was the ability to run many experiments. This is not a great advantage.
Recently, it has become possible to combine 100s of GPUs and 100s of CPUs to run an experiment that’s 100x bigger than what is possible on a single machine while requiring comparable time. This has become possible due to the work of many different groups. As a result, the minimum necessary cluster for being competitive is now 10–100x larger than it was before.
Currently, every Dota experiment uses 1000+ cores, and that is only for the small 1v1 variant, with extremely small neural network policies. We will need more compute just to win the 1v1 variant. To win the full 5v5 game, we will need to run fewer experiments, each at least one order of magnitude larger (possibly more!).
TLDR: What matters is the size and speed of our experiments. In the old days, a big cluster could not let anyone run a larger experiment quickly. Today, a big cluster lets us run a large experiment 100x faster.
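To make the "one big experiment, run fast" point concrete, here is a toy sketch of synchronous data parallelism, the basic pattern behind combining many machines into a single experiment. The linear model, shard layout, and function names are illustrative assumptions, not our actual training stack:

```python
import numpy as np

def local_gradient(params, batch):
    """Hypothetical per-worker gradient: gradient of the mean squared
    error of a linear model on this worker's shard of the data."""
    x, y = batch
    pred = x @ params
    return 2.0 * x.T @ (pred - y) / len(y)

def synchronous_step(params, shards, lr=0.01):
    """One step of synchronous data parallelism: each worker computes a
    gradient on its own shard, the gradients are averaged (an all-reduce
    in a real cluster), and a single update is applied. N workers thus
    process an N-times-larger batch in roughly the time one worker
    spends on a single shard."""
    grads = [local_gradient(params, shard) for shard in shards]
    avg_grad = np.mean(grads, axis=0)
    return params - lr * avg_grad

# Toy demo: 4 "workers", each holding a shard of a linear regression task.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
shards = []
for _ in range(4):
    x = rng.normal(size=(64, 2))
    shards.append((x, x @ true_w))

w = np.zeros(2)
for _ in range(200):
    w = synchronous_step(w, shards, lr=0.05)
# w is now close to true_w
```

The same structure scales the effective batch (and thus the experiment size) with the number of workers, which is why adding machines now speeds up one large experiment rather than just enabling more small ones.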
In order to be capable of accomplishing our projects even in theory, we need to increase the number of our GPUs by a factor of 10x in the next 1–2 months (we have enough CPUs). We will discuss the specifics in our in-person meeting.
Dota 2:
We will solve the 1v1 version of the game in 1 month. Fans of the game care about 1v1 a fair bit.
We are now at a point where a single experiment consumes 1000s of cores, and where adding more distributed compute increases performance.
Here is a cool video of our bot doing something rather clever: https://www.youtube.com/watch?v=Y-vxbREX5ck&feature=youtu.be&t=99.
Rapid learning of new games:
Infra work is underway.
We implemented several baselines.
Fundamentally, we’re not where we want to be, and are taking action to correct this.
Robotics:
Current status: The HER algorithm (https://www.youtube.com/watch?v=Dz_HuzgMzxo) can rapidly learn to solve many low-dimensional robotics tasks that were previously unsolvable. It is non-obvious, simple, and effective.
In 6 months, we will accomplish at least one of: single-handed Rubik’s cube solving, pen spinning (https://www.youtube.com/watch?v=dDavyRnEPrI), or Chinese ball spinning (https://www.youtube.com/watch?v=M9N1duIl4Fc), using the HER algorithm and a sim2real method [such as https://blog.openai.com/spam-detection-in-the-physical-world/].
The above will be deployed on the robotic hand: [Link to Google Drive] [this video is human-controlled, not algorithm-controlled. You need to be logged into the OpenAI account to see the video].
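For reference, the core trick in HER, relabeling failed episodes with goals the agent actually achieved, can be sketched in a few lines. The toy 1-D environment and the "final" relabeling strategy below are illustrative assumptions, not our implementation:

```python
def her_relabel(episode, reward_fn):
    """Hindsight Experience Replay core idea: take a (likely failed)
    episode aimed at some goal, and also store copies of its transitions
    relabeled with a goal the agent actually achieved. Under sparse
    rewards, failures thereby become informative successes."""
    relabeled = []
    for state, action, next_state, goal in episode:
        # Original transition, with its original (usually zero) reward.
        relabeled.append((state, action, next_state, goal,
                          reward_fn(next_state, goal)))
        # "Final" strategy: pretend the episode's last achieved state
        # was the goal all along.
        achieved = episode[-1][2]
        relabeled.append((state, action, next_state, achieved,
                          reward_fn(next_state, achieved)))
    return relabeled

# Toy demo: states are integers, goal is an integer, reward is sparse.
reward = lambda s, g: 1.0 if s == g else 0.0
# Episode aimed at goal 5, but the agent only reached state 3.
episode = [(0, +1, 1, 5), (1, +1, 2, 5), (2, +1, 3, 5)]
buffer = her_relabel(episode, reward)
# buffer now contains a successful transition for hindsight goal 3,
# even though every goal-5 transition still has reward 0.
```

The relabeled buffer is then fed to an ordinary off-policy RL algorithm; the only change HER makes is to what goes into the replay buffer.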
Self-play as a key path to AGI:
Self-play in multiagent environments is magical: if you place agents into an environment, then no matter how smart (or not smart) they are, the environment will provide them with exactly the right level of challenge, which can be met only by outsmarting the competition. For example, a group of children will find each other’s company challenging; likewise for a collection of superintelligences of comparable intelligence. So the “solution” to self-play is to become more and more intelligent, without bound.
Self-play lets us get “something out of nothing.” The rules of a competitive game can be simple, but the best strategy for playing this game can be immensely complex. [motivating example: https://www.youtube.com/watch?v=u2T77mQmJYI].
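As a miniature of this "curriculum from the opponent" effect, here is a toy self-play loop: fictitious play on rock-paper-scissors, a standard game-theory construction rather than our training setup. Each side best-responds to the other's empirical history, so any exploitable habit gets punished, and the time-averaged strategies are driven toward the unexploitable uniform equilibrium:

```python
import numpy as np

# Rock-paper-scissors payoff for the row player: PAYOFF[i, j].
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

def best_response(opponent_counts):
    """Pure best response to the opponent's empirical action distribution."""
    probs = opponent_counts / opponent_counts.sum()
    return int(np.argmax(PAYOFF @ probs))

# Self-play via fictitious play: each player exploits the other's
# history, each exploit gets countered, and the averages converge.
counts = [np.ones(3), np.ones(3)]
for _ in range(20000):
    a0 = best_response(counts[1])
    a1 = best_response(counts[0])
    counts[0][a0] += 1
    counts[1][a1] += 1

avg0 = counts[0] / counts[0].sum()
# avg0 is close to the uniform strategy (1/3, 1/3, 1/3)
```

The individual plays never settle down; only the averages do. The pressure to stay unexploitable against an ever-adapting opponent is the simplest version of the open-ended improvement described above.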
We are training agents in simulation to develop very good dexterity via competitive fighting, such as wrestling. Here is a video of ant-shaped robots that we trained to wrestle: [redacted]
Current work on self-play: getting agents to learn to develop a language [gifs in https://blog.openai.com/learning-to-cooperate-compete-and-communicate/]. Agents are doing “stuff,” but it’s still a work in progress.
We have a few more cool smaller projects. Updates to be presented as they produce significant results.