Submitted on 2025-06-18, 11:12 and posted on 2025-06-18, 11:14. Authored by Ahmed Elmasry.
High-quality datasets are crucial for training machine learning (ML) algorithms effectively and for achieving precise classifications. Obtaining them is particularly challenging for IEC 61850 Generic Object Oriented Substation Event (GOOSE) communications within smart grids, where researchers face significant obstacles due to strict privacy regulations and the risk of industrial data misuse. These challenges necessitate alternative methods of dataset generation. This thesis describes the development and implementation of a versatile simulation testbed, built with MATLAB/Simulink, OpenPLC, and the lib61850 library, that emulates smart grid systems and generates authentic IEC 61850 GOOSE message datasets. The testbed’s efficacy is evaluated using a simplified smart grid model that incorporates multiple protection relays, also known as Intelligent Electronic Devices (IEDs), which allows the protection system’s response times to be analyzed across a range of fault conditions. The testbed is customizable and scalable, so researchers can produce datasets that reflect the diverse architectures and operational characteristics of smart grid networks, thereby broadening the applicability of ML in this field. Further, this study illustrates how the simulation testbed can be adapted to accommodate different protocols and designs, enabling the generation of a wide range of datasets. These datasets are then used to enhance our GOOSE Intrusion Detection System (GIDS) through the application of ML models, demonstrating the potential of our approach to advance cybersecurity measures for smart grid systems.