Curriculum goal masking for continuous deep reinforcement learning

2019 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), pages 183--188 - Aug 2019.
Associated documents: Eppe2019_CGM.pdf [1 MB]
Deep reinforcement learning has recently focused on problems where policy or value functions are based on universal value function approximators (UVFAs), which makes them independent of a specific goal. Evidence exists that the sampling of goals has a strong effect on learning performance, and the problem of optimizing goal sampling is frequently tackled with intrinsic motivation methods. However, there is a lack of general mechanisms that focus on goal sampling in the context of deep reinforcement learning based on UVFAs. In this work, we introduce goal masking as a method to estimate a goal's difficulty level and to exploit this estimation to realize curriculum learning. Our results indicate that focusing on goals of medium difficulty is appropriate for deep deterministic policy gradient (DDPG) methods, while an "aim for the stars and reach the moon" strategy, where difficult goals are sampled much more often than simple goals, leads to the best learning performance when DDPG is combined with hindsight experience replay (HER).
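The abstract describes sampling training goals according to an estimated difficulty level. The paper's actual masking mechanism is not reproduced here; the following is a minimal, hypothetical sketch of the general idea, assuming difficulty is estimated as one minus the agent's recent success rate per goal, and that goals whose difficulty lies near a chosen target level are sampled preferentially (a target near 0.5 for plain DDPG, near 1.0 for the "aim for the stars" regime with HER). All function names and parameters are illustrative, not from the paper.

```python
import numpy as np

def curriculum_goal_weights(success_rates, target_difficulty=0.5, temperature=0.1):
    """Illustrative sketch (not the paper's method): weight each goal by how
    close its estimated difficulty (1 - success rate) is to a target level."""
    difficulty = 1.0 - np.asarray(success_rates, dtype=float)
    # Goals whose difficulty is near the target get the highest weight.
    weights = np.exp(-((difficulty - target_difficulty) ** 2) / temperature)
    return weights / weights.sum()

def sample_goal(goals, success_rates, target_difficulty=0.5, rng=None):
    """Draw one goal index according to the curriculum weights."""
    rng = np.random.default_rng() if rng is None else rng
    p = curriculum_goal_weights(success_rates, target_difficulty)
    return goals[rng.choice(len(goals), p=p)]
```

With `target_difficulty=1.0`, the hardest goals (lowest success rates) dominate the sampling distribution, mimicking the "aim for the stars" regime; with `target_difficulty=0.5`, medium-difficulty goals are favored.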


@InProceedings{EMW19,
  author       = "Eppe, Manfred and Magg, Sven and Wermter, Stefan",
  title        = "Curriculum goal masking for continuous deep reinforcement learning",
  booktitle    = "2019 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob)",
  pages        = "183--188",
  month        = "Aug",
  year         = "2019",
  publisher    = "IEEE",
  url          = "https://www2.informatik.uni-hamburg.de/wtm/publications/2019/EMW19/Eppe2019_CGM.pdf"
}
