Reinforcement Learning with Prediction-Based Rewards
We’ve developed Random Network Distillation (RND), a prediction-based method for encouraging reinforcement learning agents to explore their environments through curiosity, which for the first time1 exceeds average human performance on Montezuma’s Revenge. RND achieves state-of-the-art performance, periodically finds all 24 rooms and solves the first level without using demonstrations or having access to the underlying state of the game.
Reinforcement Learning with Prediction-Based Rewards
Tags: Machine Learning, RND