Wang, Ziru and Wang, Ziyang (2026). When RNN Meets CNN and ViT: The Development of a Hybrid U-Net for Medical Image Segmentation. Fractal and Fractional, 10(1), 18.
Abstract
Deep learning for semantic segmentation has made significant advances in recent years, achieving state-of-the-art performance. Medical image segmentation, as a key component of healthcare systems, plays a vital role in the diagnosis and treatment planning of diseases. Due to the fractal and scale-invariant nature of biological structures, effective medical image segmentation requires models capable of capturing hierarchical and self-similar representations across multiple spatial scales. In this paper, a Recurrent Neural Network (RNN) is explored within a Convolutional Neural Network (CNN) and Vision Transformer (ViT)-based hybrid U-shaped network, named RCV-UNet. First, a ViT-based layer was developed in the bottleneck to effectively capture the global context of an image and establish long-range dependencies through the self-attention mechanism. Second, recurrent residual convolutional blocks (RRCBs) were introduced in both the encoder and decoder to enhance the ability to capture local features and preserve fine details. Third, by integrating the global feature extraction capability of ViT with the local feature enhancement strength of RRCBs, RCV-UNet achieved promising global consistency and boundary refinement, addressing key challenges in medical image segmentation. From a fractal–fractional perspective, the multi-scale encoder–decoder hierarchy and attention-driven aggregation in RCV-UNet naturally accommodate fractal-like, scale-invariant regularity, while the recurrent and residual connections approximate fractional-order dynamics in feature propagation, enabling continuous and memory-aware representation learning. The proposed RCV-UNet was evaluated on four imaging modalities (CT, MRI, dermoscopy, and ultrasound), using the Synapse, ACDC, ISIC 2018, and BUSI datasets. Experimental results demonstrate that RCV-UNet outperforms other popular baseline methods, achieving strong performance across different segmentation tasks.
The code of the proposed method will be made publicly available.
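As an illustration of the recurrent residual pattern the abstract describes, the sketch below implements a hypothetical RRCB-style update in plain NumPy: a convolution is unrolled for a few recurrent steps, re-injecting the block input at each step, with a residual skip around the whole unit. This is a minimal single-channel sketch of the general idea, not the authors' implementation; the function names, the recurrence `x_t = ReLU(conv(x_{t-1}) + x_0)`, and the number of unroll steps are all assumptions for illustration.

```python
import numpy as np

def conv2d_same(x, k):
    """3x3 'same' convolution of a single-channel map with zero padding."""
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def recurrent_residual_block(x0, kernel, steps=2):
    """Hypothetical RRCB-style unit: unroll the same convolution for
    `steps` recurrent iterations, re-injecting the block input each step,
    then add a residual skip from the input to the output."""
    x = x0
    for _ in range(steps):
        x = relu(conv2d_same(x, kernel) + x0)  # recurrent reuse of one kernel
    return x + x0  # residual connection around the recurrent unit

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3)) * 0.1
out = recurrent_residual_block(feat, k, steps=2)
print(out.shape)  # spatial size is preserved: (8, 8)
```

Unrolling the recurrence deepens the effective receptive field without adding parameters, while the residual skip keeps gradients flowing through the block; this is the usual motivation for recurrent residual units in U-Net encoders and decoders.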
| Publication DOI: | https://doi.org/10.3390/fractalfract10010018 |
|---|---|
| Divisions: | College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies Aston University (General) |
| Additional Information: | Copyright © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. |
| Publication ISSN: | 2504-3110 |
| Last Modified: | 16 Jan 2026 08:13 |
| Date Deposited: | 14 Jan 2026 15:08 |
| Full Text Link: | |
| Related URLs: | https://www.mdp ... 04-3110/10/1/18 (Publisher URL) |
| PURE Output Type: | Article |
| Published Date: | 2026-01-01 |
| Published Online Date: | 2025-12-27 |
| Accepted Date: | 2025-12-26 |
| Authors: | Wang, Ziru; Wang, Ziyang (ORCID: 0000-0003-1605-0873) |