An online hyper‐volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers

Aflakian, Ali, Rastegarpanah, Alireza, Hathaway, Jamie and Stolkin, Rustam (2024). An online hyper‐volume action bounding approach for accelerating the process of deep reinforcement learning from multiple controllers. Journal of Field Robotics, 41 (6), pp. 1814-1828.

Abstract

This paper fuses ideas from reinforcement learning (RL), Learning from Demonstration (LfD), and Ensemble Learning into a single paradigm. Knowledge from a mixture of control algorithms (experts) is used to constrain the action space of the agent, enabling faster RL refinement of a control policy by avoiding unnecessary exploratory actions. Domain‐specific knowledge of each expert is exploited. However, the resulting policy is robust against errors of individual experts, since it is refined by an RL reward function without copying any particular demonstration. Our method has the potential to supplement existing RLfD methods when multiple algorithmic approaches are available to function as experts, specifically in tasks involving continuous action spaces. We illustrate our method in the context of a visual servoing (VS) task, in which a 7‐DoF robot arm is controlled to maintain a desired pose relative to a target object. We explore four methods for bounding the actions of the RL agent during training: using a hypercube with a modified loss function, using a convex hull with a modified loss function, ignoring actions outside the convex hull, and projecting actions onto the convex hull. We compare the training progress of each method against training with a single expert demonstrator using the DAgger algorithm and against training without any demonstrators. Our experiments show that using the convex hull with a modified loss function not only accelerates learning but also yields the best solution among the approaches compared. Furthermore, we demonstrate faster VS error convergence while maintaining higher manipulability of the arm, compared with classical image‐based VS, position‐based VS, and hybrid‐decoupled VS.
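
As a rough illustration of the simplest of the four bounding ideas (the hypercube), the sketch below builds per-dimension bounds from a set of expert actions, clips an agent action into them, and computes a penalty for leaving the box. It is not the authors' implementation: the function names, the assumption that each expert proposes one action vector per time step, and the squared-distance penalty are all hypothetical choices made for this example, and the abstract does not specify the form of the modified loss.

```python
import numpy as np


def hypercube_bounds(expert_actions):
    """Per-dimension min and max over the experts' proposed actions."""
    expert_actions = np.asarray(expert_actions, dtype=float)
    return expert_actions.min(axis=0), expert_actions.max(axis=0)


def clip_to_hypercube(action, lower, upper):
    """Move an action to the nearest point inside the axis-aligned box."""
    return np.clip(np.asarray(action, dtype=float), lower, upper)


def out_of_bounds_penalty(action, lower, upper):
    """Squared distance from the action to the box (0 if inside).

    Hypothetical penalty term; one could add it to an RL loss to
    discourage actions outside the expert-defined region.
    """
    action = np.asarray(action, dtype=float)
    excess = np.maximum(lower - action, 0.0) + np.maximum(action - upper, 0.0)
    return float(np.sum(excess ** 2))


# Three hypothetical experts, each proposing a 3-dimensional action.
experts = [[0.1, -0.2, 0.0], [0.3, 0.1, -0.1], [0.2, 0.0, 0.4]]
lower, upper = hypercube_bounds(experts)

agent_action = np.array([0.5, 0.0, -0.3])
bounded = clip_to_hypercube(agent_action, lower, upper)
penalty = out_of_bounds_penalty(agent_action, lower, upper)
```

Recomputing the bounds at every time step from the experts' current proposals would make the constraint "online" in the sense of the title. A convex-hull variant would replace the box test with a hull-membership test, which is tighter but more expensive to evaluate.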

Publication DOI:	https://doi.org/10.1002/rob.22355
Divisions:	College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies; College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies > Applied AI & Robotics; Aston University (General)
Funding Information:	This work was supported in part by the project called “Research and Development of a Highly Automated and Safe Streamlined Process for Increase Lithium-ion Battery Repurposing and Recycling” (REBELION) under Grant 101104241 and in part by the UK Research a
Additional Information:	Copyright © 2024 The Authors. Journal of Field Robotics published by Wiley Periodicals LLC. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Uncontrolled Keywords:	online learning, multi‐expert demonstrations, imitation learning, reinforcement learning, optimization technique
Publication ISSN:	1556-4959
Last Modified:	17 Nov 2025 08:31
Date Deposited:	29 Aug 2025 15:36
Related URLs:	https://onlinel ... .1002/rob.22355 (Publisher URL)
PURE Output Type:	Article
Published Date:	2024-09
Published Online Date:	2024-04-28
Accepted Date:	2024-04-15
Authors:	Aflakian, Ali; Rastegarpanah, Alireza (ORCID: 0000-0003-4264-6857); Hathaway, Jamie; Stolkin, Rustam

Version: Published Version

License: Creative Commons Attribution
