Manara - Qatar Research Repository
Browse

An efficient failure-resilient mutual exclusion algorithm for distributed systems leveraging a novel zero-message overlay structure

Download (2.52 MB)
journal contribution
submitted on 2024-03-20, 07:34 and posted on 2024-03-20, 07:35 authored by Mouna Rabhi, Roberto Di Pietro

In this paper, we provide a novel failure-resilient token-based mutual exclusion (ME) algorithm for distributed systems. Like a few other solutions in the literature, the proposed solution leverages a logical tree as its underlying topology. However, unlike any other solution, the tree is built using a probabilistic, fully distributed approach, without exchanging messages. The current tree-based ME algorithms often overlook considerations for node/link failures or offer costly methods for failure recovery. The proposed algorithm overcomes these limitations by providing an effective solution to maintain a logarithmic cost in case of node failures. The overlay structure – a unique logical tree – used to control access to the critical section maintains its consistency even when nodes fail. An extensive simulation study demonstrates the viability and efficiency of the proposed algorithm under various node failure models, and relevant metrics (e.g., node queue dimension, number of exchanged messages, and number of disconnected nodes) indicate a graceful degradation in performance with decreasing number of functioning nodes. For instance, for 4,096 nodes organized in a logical tree of arity 4 and with 4 physical nodes for logical node, a negligible number of nodes is disconnected from the tree after 250 epochs when the per-node per-epoch failure probability is p f ≤ 0 . 0008 . With a p f = 0 . 0016 , less than 10% of the nodes are disconnected. The proposed algorithm avoids node disconnection while minimally impacting the load of the logical tree nodes. In addition, for the same architecture, when both tree arity and cardinality are 4, after 250 epochs, the node load has demonstrated minimal variation and remains in close proximity to the original load. Moreover, experimental results also reveal a graceful degradation of algorithm performance. The fully distributed solution, rich parametrization for different trade-offs, and viability and resilience of the proposed algorithm also pave the way for future research.

Other Information

Published in: Computer Communications
License: http://creativecommons.org/licenses/by/4.0/
See article on publisher's website: https://dx.doi.org/10.1016/j.comcom.2024.02.007

Funding

Open Access funding provided by the Qatar National Library.

History

Language

  • English

Publisher

Elsevier

Publication Year

  • 2024

License statement

This Item is licensed under the Creative Commons Attribution 4.0 International License.

Institution affiliated with

  • Hamad Bin Khalifa University
  • College of Science and Engineering - HBKU

Usage metrics

    College of Science and Engineering - HBKU

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC