Synthetic Biological Signals Machine-Generated by GPT-2 Improve the Classification of EEG and EMG Through Data Augmentation

Abstract

Synthetic data augmentation is of paramount importance for machine learning classification, particularly for biological data, which tend to be high-dimensional and scarce in training samples. Applications of robotic control and augmentation in disabled and able-bodied subjects still rely mainly on subject-specific analyses. These can rarely be generalised to the whole population and appear to overcomplicate simple action recognition such as grasp and release (standard actions in robotic prosthetics and manipulators). We show for the first time that multiple GPT-2 models can machine-generate synthetic biological signals (EMG and EEG) and improve real data classification. Models trained solely on GPT-2 generated EEG data can classify a real EEG dataset at 74.71% accuracy, and models trained on GPT-2 EMG data can classify real EMG data at 78.24% accuracy. Synthetic and calibration data are then introduced within each cross-validation fold when benchmarking EEG and EMG models. Results show that algorithms improve when either or both types of additional data are used. A Random Forest achieves a mean 95.81% (1.46) classification accuracy on EEG data, which increases to 96.69% (1.12) when synthetic GPT-2 EEG signals are introduced during training. Similarly, the accuracy of the Random Forest classifying EMG data increases from 93.62% (0.8) to 93.9% (0.59) when the training data are augmented with synthetic EMG signals. Additionally, as predicted, augmentation with synthetic biological signals also increases the classification accuracy of data from new subjects that were not observed during training. Finally, a Robotiq 2F-85 Gripper was used for real-time gesture-based control, with synthetic EMG data augmentation remarkably improving gesture recognition accuracy, from 68.29% to 89.5%.
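
The abstract states that synthetic data are introduced within each cross-validation fold. A minimal sketch of that idea follows, under several assumptions that are not taken from the paper: the fold count, the toy two-dimensional features, and all function names are hypothetical, and a simple nearest-centroid classifier stands in for the Random Forest so that the example needs only the Python standard library. The point it illustrates is that synthetic samples are appended to the training portion of each fold only, while accuracy is always measured on held-out real samples.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_centroids(features, labels):
    """Compute one mean feature vector per class (stand-in for a real model)."""
    grouped = {}
    for x, y in zip(features, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*rows)] for y, rows in grouped.items()}


def predict(centroids, x):
    """Return the class whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def cross_validate_augmented(real_x, real_y, synth_x, synth_y, k=5, seed=0):
    """k-fold CV on real data; synthetic data joins the training folds only."""
    scores = []
    for test_idx in k_fold_indices(len(real_x), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(real_x)) if i not in held_out]
        # Augment the training portion; the test fold stays purely real.
        train_x = [real_x[i] for i in train_idx] + list(synth_x)
        train_y = [real_y[i] for i in train_idx] + list(synth_y)
        centroids = fit_centroids(train_x, train_y)
        correct = sum(predict(centroids, real_x[i]) == real_y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Toy data (hypothetical): two well-separated classes in two dimensions.
rng = random.Random(42)


def toy_samples(n_per_class):
    xs, ys = [], []
    for label, centre in ((0, 0.0), (1, 10.0)):
        for _ in range(n_per_class):
            xs.append([centre + rng.uniform(-1, 1), centre + rng.uniform(-1, 1)])
            ys.append(label)
    return xs, ys


real_x, real_y = toy_samples(20)    # plays the role of recorded signals
synth_x, synth_y = toy_samples(10)  # plays the role of generated signals
scores = cross_validate_augmented(real_x, real_y, synth_x, synth_y, k=5)
print(f"mean accuracy over folds: {mean(scores):.2f}")
```

Keeping the synthetic samples out of the test folds means the reported accuracy reflects performance on real signals only, which is the comparison the abstract reports.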

Publication DOI: https://doi.org/10.1109/LRA.2021.3056355
Divisions: College of Engineering & Physical Sciences > School of Informatics and Digital Engineering > Computer Science
College of Engineering & Physical Sciences
College of Engineering & Physical Sciences > School of Engineering and Technology > Mechanical, Biomedical & Design
College of Engineering & Physical Sciences > Aston Institute of Urban Technology and the Environment (ASTUTE)
College of Engineering & Physical Sciences > Systems analytics research institute (SARI)
Additional Information: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords: Biological signal processing, data augmentation, electroencephalography, electromyography, synthetic data, Control and Systems Engineering, Biomedical Engineering, Human-Computer Interaction, Mechanical Engineering, Computer Vision and Pattern Recognition, Computer Science Applications, Control and Optimization, Artificial Intelligence
Related URLs: https://ieeexpl ... rce=SEARCHALERT (Publisher URL)
http://www.scop ... tnerID=8YFLogxK (Scopus URL)
PURE Output Type: Article
Published Date: 2021-04-01
Published Online Date: 2021-02-02
Accepted Date: 2021-01-21
Authors: Bird, Jordan J.
Pritchard, Michael (ORCID Profile 0000-0002-3783-0230)
Fratini, Antonio (ORCID Profile 0000-0001-8894-461X)
Ekart, Aniko (ORCID Profile 0000-0001-6967-5397)
Faria, Diego (ORCID Profile 0000-0002-2771-1713)

Version: Accepted Version
