MCV-UNet: a modified convolution & transformer hybrid encoder-decoder network with multi-scale information fusion for ultrasound image semantic segmentation

Abstract

In recent years, the growing importance of accurate semantic segmentation in ultrasound images has led to numerous advances in deep learning-based techniques. In this article, we introduce a novel hybrid network that combines convolutional neural networks (CNNs) and Vision Transformers (ViTs) for ultrasound image semantic segmentation. Our primary contribution is the incorporation of a multi-scale CNN in both the encoder and decoder stages, which enhances feature learning across multiple scales. In addition, the bottleneck of the network uses a ViT to capture long-range, high-dimensional spatial dependencies, a critical factor often overlooked in conventional CNN-based approaches. We conducted extensive experiments on a public benchmark ultrasound nerve segmentation dataset. Our proposed method was benchmarked against 17 existing baseline methods and outperformed all of them, including a 4.6% improvement in Dice over TransUNet, a 13.0% improvement in Dice over Attention UNet, and a 10.5% improvement in precision over UNet. This research offers significant potential for real-world applications in medical imaging and demonstrates the value of blending CNNs and ViTs in a unified framework.
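The abstract reports its results in terms of the Dice coefficient and precision. As a rough illustration of what these two metrics measure on binary segmentation masks, a minimal sketch follows. The function names, the flat-list mask representation, and the handling of empty masks are assumptions made for this example; the record does not describe the authors' actual evaluation code or protocol.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives
    for two flat binary masks of equal length (illustrative helper)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    return tp, fp, fn


def dice(pred, target):
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


def precision(pred, target):
    """Precision: TP / (TP + FP)."""
    tp, fp, _ = confusion_counts(pred, target)
    # Assumed convention: no predicted foreground gives precision 0.
    return 0.0 if tp + fp == 0 else tp / (tp + fp)


# Toy 6-pixel masks (made-up values, not data from the paper).
pred = [1, 1, 1, 0, 0, 0]
target = [1, 1, 0, 1, 0, 0]
print(round(dice(pred, target), 3), round(precision(pred, target), 3))
```

Dice penalizes both false positives and false negatives, whereas precision ignores missed foreground pixels, which is why the two can rank methods differently.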

Publication DOI: https://doi.org/10.7717/peerj-cs.2146
Divisions: College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies
Aston University (General)
Additional Information: Copyright © 2024 Xu and Wang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
Uncontrolled Keywords: Convolutional neural network,Image semantic segmentation,Ultrasound imaging,Vision transformer,General Computer Science
Publication ISSN: 2376-5992
Last Modified: 09 Oct 2025 07:14
Date Deposited: 08 Oct 2025 16:04
Related URLs: https://peerj.c ... ticles/cs-2146/ (Publisher URL)
http://www.scop ... tnerID=8YFLogxK (Scopus URL)
PURE Output Type: Article
Published Date: 2024-06-24
Accepted Date: 2024-05-30
Authors: Xu, Zihong
Wang, Ziyang (ORCID Profile 0000-0003-1605-0873)


Version: Published Version

License: Creative Commons Attribution

