End-to-end example-based sim-to-real RL policy transfer based on neural stylisation with application to robotic cutting

Abstract

Although reinforcement learning has been applied successfully to a range of robotic control problems in complex, uncertain environments, its reliance on extensive data, typically sourced from simulation, limits real-world deployment because of the domain gap between simulated and physical systems and the limited availability of real-world samples. We propose a novel method for sim-to-real transfer of reinforcement learning policies, based on a reinterpretation of neural style transfer from image processing, to synthesise novel training data from unpaired, unlabelled real-world datasets. We employ a variational autoencoder to jointly learn self-supervised feature representations for style transfer and to generate weakly paired source-target trajectories, which improves the physical realism of the synthesised trajectories. We demonstrate our approach on a case study of robotic cutting of unknown materials. Compared to baseline methods, including our previous work, CycleGAN, and conditional variational autoencoder-based time series translation, our approach achieves improved task completion time and behavioural stability with minimal real-world data. Our framework demonstrates robustness to geometric and material variation, and highlights the feasibility of policy adaptation in challenging contact-rich tasks where real-world reward information is unavailable.

Publication DOI: https://doi.org/10.1038/s41598-026-41735-5
Divisions: College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies > Applied AI & Robotics
College of Engineering & Physical Sciences
College of Engineering & Physical Sciences > Aston Centre for Artificial Intelligence Research and Application
College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies
Aston University (General)
Funding Information: This work was supported by the UK Research and Innovation (UKRI) project “Research and Development of a Highly Automated and Safe Streamlined Process for Increase Lithium-ion Battery Repurposing and Recycling” (REBELION) under Grant 10079049.
Publication ISSN: 2045-2322
Data Access Statement: The datasets generated during and/or analysed during the current study are available in the Figshare repository, DOI 10.6084/m9.figshare.28983659.
Last Modified: 01 Apr 2026 07:17
Date Deposited: 31 Mar 2026 14:10
Related URLs: https://www.nat ... 598-026-41735-5 (Publisher URL)
PURE Output Type: Article
Published Date: 2026-03-12
Published Online Date: 2026-03-12
Accepted Date: 2025-02-23
Authors: Hathaway, Jamie
Rastegarpanah, Alireza (ORCID Profile 0000-0003-4264-6857)
Stolkin, Rustam
