TY - JOUR
T1 - Weakly-supervised reinforcement learning for controllable behavior
AU - Lee, Lisa
AU - Eysenbach, Benjamin
AU - Salakhutdinov, Ruslan
AU - Gu, Shane
AU - Finn, Chelsea
N1 - Funding Information: We thank Yining Chen, Vitchyr Pong, Ben Poole, and Archit Sharma for helpful discussions and comments. We thank Michael Ahn for help with running the manipulation experiments, and also thank Benjamin Narin, Rafael Rafailov, and Riley DeHaan for help with initial experiments. LL is supported by the National Science Foundation (DGE-1745016). BE is supported by the Fannie and John Hertz Foundation and the National Science Foundation (DGE-1745016). RS is supported by NSF IIS1763562, ONR Grant N000141812861, and US Army. CF is a CIFAR Fellow. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Publisher Copyright: © 2020 Neural information processing systems foundation. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks. However, in many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is currently being asked to solve. Can we instead constrain the space of tasks to those that are semantically meaningful? In this work, we introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical “chaff” tasks. We show that this learned subspace enables efficient exploration and provides a representation that captures distance between states. On a variety of challenging, vision-based continuous control problems, our approach leads to substantial performance gains, particularly as the complexity of the environment grows.
UR - http://www.scopus.com/inward/record.url?scp=85104096918&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104096918&partnerID=8YFLogxK
M3 - Conference article
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
Y2 - 6 December 2020 through 12 December 2020
ER -