Submitted on 2025-06-18, 05:56 and posted on 2025-06-18, 05:57. Authored by Shehzad Hassan Khan
<p dir="ltr">This study presents a comprehensive analysis framework for understanding public sentiment and topics surrounding the 2023 Turkey-Syria earthquake using Natural Language Processing (NLP) techniques. The research combines a RoBERTa sentiment analysis model with the AutoTokenizer from the Transformers library, the BERTopic model, and text vectorization using TF-IDF. Pre-trained embeddings add depth to the semantic analysis.</p><p dir="ltr">AutoTokenizer is used to preprocess the Twitter data for subsequent analysis. The BERTopic model, grounded in the BERT architecture, supports nuanced topic modeling, using context-aware word representations to extract detailed thematic insights from the tweets.</p><p dir="ltr">During text vectorization, TF-IDF quantifies the importance of terms in the earthquake discourse. Integrating pre-trained embeddings gives a richer understanding of semantic relationships within the text, contributing to more accurate sentiment analysis and topic extraction.</p><p dir="ltr">The research adopts a topic-wise analysis approach that identifies and explores specific themes related to the Turkey-Syria earthquake. The methodology is applied to Twitter data, providing real-time insights into public sentiment and concerns and a detailed breakdown of the topics discussed during and after the seismic event.</p><p dir="ltr">Results from diverse datasets illustrate that the proposed strategy captures sentiment nuances and reveals fine-grained topics within the Twitter discourse. This study contributes to the emerging field of disaster-related social media analysis, offering a robust framework for extracting meaningful insights from real-time, user-generated content during seismic events.</p>
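<p dir="ltr">To illustrate how TF-IDF quantifies term importance, the following is a minimal sketch in plain Python, not the study's actual implementation. It assumes the basic weighting tf × log(N / df) with whitespace tokenization; the function name <code>tf_idf</code> and the example tweets are hypothetical and were invented for illustration.</p>

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df).

    Assumption: unsmoothed IDF and lowercase whitespace tokenization,
    chosen for clarity rather than to mirror any particular library.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical example tweets, invented for illustration only.
tweets = [
    "earthquake rescue teams needed",
    "earthquake donations needed urgently",
    "earthquake aftershock felt tonight",
]
scores = tf_idf(tweets)
```

<p dir="ltr">In this sketch, a term that appears in every tweet (here "earthquake") receives a weight of zero, while terms confined to a single tweet receive the highest weights. Library implementations typically add smoothing and normalization, so their exact values may differ.</p>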