Hybrid Vision Transformer Architectures with CNN Blocks for Multi-Label Chest Disease Classification


Rajendra D. Bhosale, D. M. Yadav

Abstract

This study presents a novel investigation into Vision Transformer (ViT)-based hybrid architectures for multi-label chest disease classification using the CXR-14 dataset. Traditional Convolutional Neural Networks (CNNs), though effective at local feature extraction, often struggle to capture global contextual dependencies. To address this limitation, three ViT-integrated models are proposed by embedding ViT blocks within standard CNN structures: Residual ViT, Bottleneck ViT, and MBConv-SE ViT. Each model replaces the conventional 3×3 convolution units within its respective block with self-attention to obtain enhanced feature representations. These hybrid architectures combine the inductive bias of CNNs with the global reasoning capabilities of Transformers, improving classification accuracy and interpretability. The models are evaluated against a comprehensive set of baseline methods, including attention-guided, region-guided, and semantic-guided models. Experimental results demonstrate that the proposed MBConv-SE ViT model outperforms existing approaches across multiple disease categories, highlighting the advantages of combining efficient convolutions, attention recalibration, and global context modeling. This work establishes a robust framework for designing transformer-augmented CNNs and shows their effectiveness in high-resolution, multi-label medical image analysis tasks such as automated chest X-ray diagnosis.
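To make the substitution concrete, the sketch below shows one way the MBConv-SE ViT idea could be realized in PyTorch: an inverted-residual (MBConv-style) block with squeeze-and-excitation in which the usual 3×3 depthwise convolution is replaced by multi-head self-attention over the flattened spatial positions. This is a minimal illustrative sketch, not the authors' implementation; the class names (MBConvSEViT, SqueezeExcite) and hyperparameters (expansion ratio, head count) are assumptions for demonstration.

```python
import torch
import torch.nn as nn


class SqueezeExcite(nn.Module):
    """Channel recalibration: global pooling followed by a two-layer gate."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.SiLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)


class MBConvSEViT(nn.Module):
    """MBConv-SE block whose 3x3 depthwise conv is swapped for self-attention
    (an illustrative reading of the block described in the abstract)."""

    def __init__(self, channels, expansion=4, heads=4):
        super().__init__()
        hidden = channels * expansion
        # 1x1 expansion, as in a standard inverted-residual block.
        self.expand = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.SiLU(),
        )
        # Self-attention over the H*W positions replaces the 3x3 conv,
        # giving every position a global receptive field.
        self.norm = nn.LayerNorm(hidden)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.se = SqueezeExcite(hidden)
        # 1x1 projection back to the block's input width.
        self.project = nn.Sequential(
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        y = self.expand(x)
        # Flatten the feature map to a (B, H*W, C) token sequence.
        tokens = y.flatten(2).transpose(1, 2)
        tokens = self.norm(tokens)
        tokens, _ = self.attn(tokens, tokens, tokens)
        y = tokens.transpose(1, 2).reshape(b, -1, h, w)
        y = self.se(y)
        return x + self.project(y)  # residual connection


if __name__ == "__main__":
    block = MBConvSEViT(channels=64)
    out = block(torch.randn(2, 64, 14, 14))
    print(out.shape)  # torch.Size([2, 64, 14, 14])
```

In this reading, the Residual ViT and Bottleneck ViT variants would follow the same pattern, substituting attention for the 3×3 convolution inside a plain residual block and a 1×1/3×3/1×1 bottleneck block, respectively.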


DOI: https://doi.org/10.52783/pst.1729
