Rethinking Hybrid U-Shape Network with Pixel-Level Feature Learning for Retinal Vessel Segmentation

Abstract

Retinal vessel segmentation is a critical, non-destructive medical imaging task in computer vision and is essential for diagnosing fundus diseases. Although deep learning methods dominate this field, existing U-shaped encoder-decoder networks with skip connections are limited in how they handle discrepancies between multi-scale features. Shallow encoder and decoder stages produce high-resolution but low-dimensional feature maps that capture fine vessel details, whereas deeper stages (such as the bottleneck) generate lower-resolution, high-dimensional feature maps rich in semantic information. Traditional U-shaped architectures often struggle to integrate these two types of features. To address this, the paper introduces a redesigned U-shaped network that incorporates modified convolution and transformer layers tailored to segmenting slender and tortuous retinal vessel structures. A Multi-Core Channel-Spatial Attention (MCCSA) block replaces conventional skip connections, enhancing the extraction of high-frequency texture features in shallow stages. For deeper stages, a Pixel-level Vision Transformer (P-ViT) models semantic interconnections among pixels, thereby improving semantic feature recognition. In addition, a Pixel-level residual dynamic adaptive Convolutional Neural Network (P-CNN) is proposed to better capture the intricate curved topology of blood vessels. The proposed method is evaluated on two publicly available benchmark datasets, where it shows significant improvements in segmentation performance over existing U-shaped methods. The contributions are enhanced multi-scale feature integration, improved semantic feature learning, and refined extraction of vessel topology.
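
The trade-off between resolution and feature dimension described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the 512 x 512 input, the base width of 32 channels, the four stages, and the rule that each stage halves resolution and doubles channels are assumptions typical of U-shaped networks, and the function names are hypothetical. The token count further assumes that each spatial position becomes one transformer token, which is only one possible reading of "pixel-level" and may differ from the actual P-ViT design.

```python
def encoder_feature_shapes(height, width, base_channels=32, num_stages=4):
    """Return (stage, channels, height, width) for each encoder stage.

    Assumption (not from the paper): every stage halves the spatial
    resolution and doubles the channel count, as in many U-shaped networks.
    """
    shapes = []
    channels = base_channels
    for stage in range(num_stages):
        shapes.append((stage, channels, height, width))
        channels *= 2
        height //= 2
        width //= 2
    return shapes


def pixel_token_count(height, width):
    """Number of tokens if every spatial position is treated as one token."""
    return height * width


# Hypothetical 512 x 512 input with four encoder stages.
shapes = encoder_feature_shapes(512, 512)
for stage, channels, h, w in shapes:
    tokens = pixel_token_count(h, w)
    # Self-attention compares every token with every other token,
    # so its cost grows with the square of the token count.
    print(f"stage {stage}: {channels} channels at {h}x{w}, "
          f"{tokens} tokens, {tokens * tokens} attention pairs")
```

Under these assumptions the shallowest stage would hold 262,144 pixel tokens and the deepest only 4,096, which suggests why attention over individual pixels is more tractable at the deeper, lower-resolution stages.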

Publication DOI: https://doi.org/10.1109/ACCESS.2026.3663080
Divisions: College of Engineering & Physical Sciences
College of Engineering & Physical Sciences > School of Computer Science and Digital Technologies
Aston University (General)
Additional Information: This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Uncontrolled Keywords: Medical image segmentation, vision transformer, convolution, retinal vessels, Image segmentation, Feature extraction, Convolutional neural networks, Transformers, Semantics, Decoding, Attention mechanisms, Representation learning, General Computer Science, General Materials Science, General Engineering
Publication ISSN: 2169-3536
Data Access Statement: The code, evaluation metrics, trained network weights, and datasets will be made publicly available at https://github.com/ziyangwang007/CVPixUNet.
Last Modified: 09 Mar 2026 18:16
Date Deposited: 11 Feb 2026 10:42
Related URLs: https://ieeexpl ... cument/11386860 (Publisher URL)
https://www.sco ... ns/105030053760 (Scopus URL)
PURE Output Type: Article
Published Date: 2026-02-13
Published Online Date: 2026-02-09
Accepted Date: 2026-02-06
Authors: Wang, Ziyang (ORCID Profile 0000-0003-1605-0873)
Wu, Mian

Download

Version: Accepted Version
License: Creative Commons Attribution

Version: Published Version
License: Creative Commons Attribution
