submitted on 2025-06-18, 11:43 and posted on 2025-06-18, 11:45authored byAbdulrahman Gamal Farhan Al-Raimi
Semantic image segmentation is a computer vision task that revolves around labelling each pixel in an image into a class, such as road, sky, or car. This task is crucial for many real-life applications, such as medical imaging and self-driving cars. Hence, many deep neural network architectures, such as the High-Resolution Network (HRNet), were invented for this specific task. However, such networks are enormous, making it very challenging for commercial hardware to train from scratch. Besides, such networks have many trainable parameters and convolution operations, making them inefficient and more prone to overfitting and gradient vanishing. Hence, this thesis scales down the HRNet architecture to make it lightweight while preserving acceptable performance. Three different approaches were investigated in this research. The third approach model showed the best performance on all metrics on the Cityscape dataset by scoring 66% mean intersection over union (mIoU) on the validation set and 64% on the test set while having only 13% of the trainable parameters of the original architecture.