Learning Free Terminal Time Optimal Closed-loop Control of Manipulators

Wei Hu*, Yue Zhao*, Weinan E, Jiequn Han, Jihao Long

*Indicates Equal Contribution

longjh1998@gmail.com
arXiv (Full Version with Appendix)
[Video panels comparing controllers: IVP-ART with MLP, IVP-ART with QRnet, IVP-ART with the modified QRnet, DAgger with QRnet, DAgger with the modified QRnet]

Simulation results of controllers trained with different enhanced sampling strategies and network architectures. When the left controller (IVP-ART with the modified QRnet) reaches the terminal state, the timestamps in that row turn red.
Left: IVP-ART with the modified QRnet reaches the terminal state and maintains it;
Middle: IVP-ART with MLP fails to reach the terminal state within an acceptable margin and exhibits instability around it;
Right: DAgger with the modified QRnet sometimes succeeds and sometimes fails to approach the terminal state.

Abstract

This paper presents a novel approach to learning free terminal time closed-loop control for robotic manipulation tasks, enabling dynamic adjustment of task duration and control inputs to enhance performance. We extend the supervised learning approach, in which selected open-loop optimal control problems are solved and their solutions used as training data for a policy network, to the free terminal time scenario. Three main challenges are addressed in this extension. First, we introduce a marching scheme that enhances the solution quality and increases the success rate of the open-loop solver by gradually refining the time discretization. Second, we extend the QRnet of Nakamura-Zimmerer et al. (2021b) to the free terminal time setting to address discontinuity and improve stability at the terminal state. Third, we present a more automated version of the initial value problem (IVP) enhanced sampling method from previous work (Zhang et al., 2022) to adaptively update the training dataset, significantly improving its quality. By integrating these techniques, we develop a closed-loop policy that operates effectively over a broad domain with varying optimal time durations, achieving near globally optimal total costs.
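The coarse-to-fine marching idea mentioned above can be illustrated with a minimal sketch. Everything here is hypothetical scaffolding, not the paper's implementation: `solve_ocp` is a stand-in for a real open-loop solver (e.g. a direct transcription method warm-started at the supplied guess), and the toy "solution" is just a fixed function evaluated on each grid so the example runs on its own.

```python
import numpy as np

def solve_ocp(time_grid, init_guess):
    """Hypothetical solver stub: in a real pipeline this would run an NLP
    solver on the transcribed optimal control problem, warm-started at
    init_guess. Here it simply returns sin(t) on the given grid."""
    return np.sin(time_grid)

def marching_solve(T, n_coarse=8, n_levels=3):
    """Solve on a coarse time grid first, then repeatedly double the grid
    resolution, interpolating the previous solution as a warm start for
    the next, finer solve."""
    grid = np.linspace(0.0, T, n_coarse + 1)
    sol = solve_ocp(grid, init_guess=np.zeros_like(grid))
    for _ in range(n_levels - 1):
        fine = np.linspace(0.0, T, 2 * (len(grid) - 1) + 1)
        guess = np.interp(fine, grid, sol)  # warm start from coarser level
        sol = solve_ocp(fine, init_guess=guess)
        grid = fine
    return grid, sol

grid, sol = marching_solve(T=2.0)
```

The payoff of such a scheme is robustness: each fine-grid solve starts from a feasible, near-optimal guess rather than from scratch, which is what the abstract credits for the improved success rate of the open-loop solver.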

BibTeX

@article{hu2023learning,
  title={Learning Free Terminal Time Optimal Closed-loop Control of Manipulators},
  author={Hu, Wei and Zhao, Yue and E, Weinan and Han, Jiequn and Long, Jihao},
  journal={arXiv preprint arXiv:2311.17749},
  year={2023}
}