Manara - Qatar Research Repository

The effects of data balancing approaches: A case study

journal contribution
Submitted on 2023-11-06, 09:37 and posted on 2023-11-06, 10:06. Authored by Paul Mooijman, Cagatay Catal, Bedir Tekinerdogan, Arjen Lommen, Marco Blokland

Imbalanced datasets adversely affect the performance of machine learning algorithms. To cope with this problem, several resampling methods have been developed recently. In this article, we present a case study investigating the effects of data balancing approaches. The case study concerns the discrimination between growth hormone treated and non-treated animals using Liquid Chromatography-High Resolution Mass Spectrometry (LC-HRMS) data. Our LC-HRMS dataset contains 1241 bovine urine samples, of which only 65 specimens were from animal studies and guaranteed to contain growth-stimulating hormones, while the rest were reported to be untreated, so the treated class makes up only ∼5% of the dataset. In this research, classification algorithms, combined with resampling strategies and dimensionality reduction methods, were investigated to find a prediction model that correctly identifies the samples of treated animals. Furthermore, to cope with the large number of missing data points in the dataset, missing values were replaced with random low values. Our results showed that this replacement method was effective, and that the best-performing models were LogisticRegression combined with the oversampling algorithms SMOTE or ADASYN, GaussianProcessClassifier combined with the oversampling algorithm SMOTE, and LinearDiscriminantAnalysis, in each case when log transformation of the dataset was followed by Recursive Feature Elimination.
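To make two of the steps named in the abstract more concrete, the following is a minimal sketch of random low-value replacement for missing data and of SMOTE-style oversampling of the minority class. It is not the authors' code. The class names in the abstract match those of scikit-learn and imbalanced-learn, but this sketch deliberately uses only the Python standard library so that it stays self-contained. The function names, the `fraction` parameter, the rule "draw below the smallest observed value in the column", and the toy numbers are all assumptions made for illustration.

```python
import math
import random


def imbalance_ratio(n_minority, n_total):
    """Share of the dataset that belongs to the minority class."""
    return n_minority / n_total


def fill_missing_with_low_values(rows, rng, fraction=0.5):
    """Replace None entries with a random value at or below the column minimum.

    Assumption (not from the article): values are positive intensities, and a
    missing value is drawn uniformly from [fraction * col_min, col_min].
    Every column is assumed to contain at least one observed value.
    """
    n_cols = len(rows[0])
    col_min = [
        min(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [
            rng.uniform(fraction * col_min[j], col_min[j]) if v is None else v
            for j, v in enumerate(r)
        ]
        for r in rows
    ]


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic minority samples by interpolation.

    For each new sample, pick a minority point, pick one of its k nearest
    minority neighbours, and take a random point on the segment between them.
    This mirrors the core idea of SMOTE; it is a simplified illustration and
    needs at least two minority samples.
    """
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    rng = random.Random(0)

    # Counts taken from the abstract: 65 treated samples out of 1241.
    print(f"minority share: {imbalance_ratio(65, 1241):.1%}")

    # Hypothetical toy intensities; None marks a missing data point.
    toy = [[10.0, None], [20.0, 4.0], [None, 8.0]]
    filled = fill_missing_with_low_values(toy, rng)

    # Log transformation, as mentioned in the abstract (values must be > 0).
    logged = [[math.log(v) for v in row] for row in filled]

    # Oversample a hypothetical three-sample minority class.
    print(smote_like(logged, n_new=2, k=2, rng=rng))
```

In practice, resampling should be applied only to the training portion of the data, so that synthetic samples do not leak into the evaluation set. Feature selection and classification would follow the steps shown here.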

Other Information

Published in: Applied Soft Computing
License: http://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1016/j.asoc.2022.109853

Funding

Open Access funding provided by the Qatar National Library

Language

  • English

Publisher

Elsevier

Publication Year

  • 2023

License statement

This Item is licensed under the Creative Commons Attribution 4.0 International License

Institution affiliated with

  • Qatar University
  • College of Engineering - QU