Submitted on 2024-10-29, 10:31 and posted on 2024-10-30, 04:43. Authored by Ashhadul Islam.
The energy appetite of AI is a growing environmental concern. Training large machine learning models demands substantial computational resources, which in turn consume significant energy, resulting in greenhouse gas emissions that contribute to climate change. Beyond carbon emissions, the water used to cool GPU clusters is also under scrutiny. Although the amount of data in the world is finite, AI models can grow as large as the memory capacity and computational power available to train them permit. As these models grow, so do computation costs, carbon emissions, and water consumption. Consequently, more energy-efficient machine learning algorithms and computational infrastructure are needed to minimize the impact on the environment.

This thesis focuses on pruning neural networks to produce faster, more compact models that are many times smaller in size yet retain the same level of discrimination capability. This approach also improves a model's resistance to adversarial attacks and reduces data requirements during transfer learning. The methods presented were tested against existing compression techniques on standard models and datasets, achieving high compression rates while maintaining accuracy.

The thesis also addresses the problem of imbalanced datasets by introducing a novel data oversampling algorithm that creates new data points to enhance the decision capability of the discriminators. The approach is extended to regression data as well as image data. We compare our algorithm with state-of-the-art data oversamplers on benchmark datasets and show that it is superior, in some cases also outperforming Generative Adversarial Networks. The pruning algorithm is open-sourced at https://github.com/ashhadulislam/SmartPruningDeepNN, while the oversampling technique is published as a Python library at https://pypi.org/project/knnor/.
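As a rough illustration of the mechanics behind weight pruning, the sketch below applies generic magnitude-based (L1) pruning using PyTorch's torch.nn.utils.prune utilities. This is not the thesis's pruning algorithm (the actual implementation is at the GitHub link above); the toy model and the 90% pruning ratio are illustrative assumptions.

```python
# A minimal sketch of magnitude-based pruning with PyTorch's built-in
# pruning utilities. Illustrative only; not the thesis's algorithm.
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy classifier (illustrative assumption, not a model from the thesis).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Zero out the 90% of weights with the smallest L1 magnitude in each layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # make the pruning permanent

# Sparsity check: fraction of linear-layer weights that are now exactly zero.
weights = [m.weight for m in model.modules() if isinstance(m, nn.Linear)]
zeros = sum((w == 0).sum().item() for w in weights)
total = sum(w.numel() for w in weights)
print(f"Weight sparsity: {zeros / total:.2%}")  # ~90.00%
```

Zeroed weights can then be stored in sparse formats or removed structurally, which is where the size and speed gains described above come from.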
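In the same spirit, the sketch below shows generic KNN-based minority oversampling in the SMOTE style: each synthetic point is an interpolation between a minority sample and one of its k nearest minority neighbors. It is meant only to illustrate the family of techniques involved; it is not the KNNOR algorithm, whose published implementation is the knnor package linked above. The function knn_oversample and its parameters are hypothetical names for this sketch.

```python
# A minimal SMOTE-style sketch of KNN-based oversampling with NumPy and
# scikit-learn. Illustrative only; not the KNNOR algorithm itself.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolating each
    sampled point toward one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    nn_model = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Drop column 0: each point is its own nearest neighbor.
    nbrs = nn_model.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)       # base point indices
    pick = nbrs[base, rng.integers(0, k, size=n_new)]    # one random neighbor each
    lam = rng.random((n_new, 1))                         # interpolation weights in [0, 1)
    return X_min[base] + lam * (X_min[pick] - X_min[base])

# Usage: generate 80 synthetic points from a toy minority class of 20.
X_min = np.random.rand(20, 4)              # 20 minority samples, 4 features
X_syn = knn_oversample(X_min, n_new=80)
print(X_syn.shape)                         # (80, 4)
```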