Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning
Published in the AAAI Workshop on Representation Learning for Responsible Human-Centric AI (R2HCAI) and the ICML Many Facets of Preference Learning Workshop, 2023
Recommended citation: Verma, Mudit, Siddhant Bhambri, and Subbarao Kambhampati. "Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning." arXiv preprint arXiv:2302.08738 (2023). https://arxiv.org/abs/2302.08738