
Journal of Computer and Communications (JCC), ISSN 2327-5219, Scientific Research Publishing

DOI: 10.4236/jcc.2023.1111010 (JCC-129640)

Articles, Computer Science & Communications

Emotion Deduction from Social Media Text Data Using Machine Learning Algorithm

Thambusamy Velmurugan, Baskaran Jayapradha

PG and Research Department of Computer Science, Dwaraka Doss Govardhan Doss Vaishnav College, Chennai, India

PG and Research Department of Computer Science, Dr. Ambedkar Government Arts College, Chennai, India

07 11 2023

Vol. 11, No. 11, pp. 183-196. Received 8 October 2023; accepted 27 November 2023; published 30 November 2023.

2014

This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/

Emotion represents the feeling of an individual in a given situation. Individuals express emotions in various ways, which can be categorized into verbal expressions, written expressions, facial expressions and gestures. Among these, written expression is challenging to extract emotions from, because the data is purely textual. Identifying the different kinds of emotions is also tedious, as it requires considerable preparation of the text data taken for the research. This research work analyses and extracts the emotions hidden in text data drawn from a social media dataset. Raw text taken directly from social media is not suitable for this purpose, so the text data has to be pre-processed before further processing. Pre-processing makes the text data easier to work with and helps to reveal the emotions conveyed in it. This work proposes to deduce emotions from social media text data by applying a machine learning algorithm. Finally, the usefulness of the extracted emotions is suggested for various stakeholders who wish to understand the attitude of individuals at the moment the data was produced.

Keywords: Data Pre-Processing, Machine Learning Algorithms, Emotion Deduction, Sentiment Analysis

1. Introduction

A significant amount of text data has accumulated as a result of the post-COVID spike in social media use. This reservoir of textual information can be used to forecast a number of significant variables and to provide insightful information. To gain insight into people’s attitudes and sentiments, researchers are actively analysing social media text data to extract emotions. Extracting the emotion in text data is a challenging task, because the same text can be understood differently by different people and its meaning depends on how it is read. The dataset taken for this research work consists of social media text data in the form of comments written by people. The primary aim of this research is to extract the emotion hidden in the text data.

Social media has grown enormously since the COVID-19 pandemic and has reshaped the way we interact, work, and communicate [1]. With social distancing measures in place, people turned to digital platforms more than ever to stay connected, informed, and entertained. Social media, in particular, witnessed unprecedented growth during this period. With the growth of text-based data in this period [2], the analysis of such data became an essential tool for understanding human emotions, sentiments, and behaviours. This article explores the surge in social media usage and the pivotal role that text-based data analysis plays in estimating human emotions in today’s digital age.

Emotion is a mental state that is in line with feelings and thoughts, usually with regard to a particular thing. Emotion is a behaviour that expresses personal significance or opinion about how we connect with other people or about a particular occurrence. Because different readers interpret the same text differently, humans often fail to grasp the emotion it carries. This research uses a machine learning technique to extract the information contained in the text data [3]. Emotion extraction from text is a natural language processing (NLP) task that involves identifying and categorizing the emotional content expressed in written or textual data. The goal is to determine the emotions, sentiments, or affective states conveyed by the author of the text. This technology has gained significant importance in various fields, including marketing, customer service, mental health, and social media analysis, as it provides valuable insights into how people feel and react in different contexts.

This paper is organized as follows. Section 2 surveys the literature related to this research. Section 3 describes the materials and methods used: the dataset taken for the research, the techniques used for pre-processing, and the result of applying those techniques to the text data. It also elaborates on the RoBERTa method for extracting emotions from text data. Section 4 presents the experimental results and their discussion, and Section 5 concludes the paper with the findings of this research work.

2. Literature Survey

The significance of extracting emotions from text using Natural Language Processing (NLP) has kindled the interest of many researchers in this domain. While it is impractical to examine every research study in depth, some of the works related to this area are discussed in this section. The main innovation in a study by Kantrowitz et al. [4] is the recommendation to use a dictionary-based stemmer, which is effectively a perfect stemmer, to analyse the impact of stemming on data retrieval. Its performance can be selectively changed in terms of coverage and accuracy. By using this stemmer, system designers can evaluate the relative trade-offs between desired levels of coverage and accuracy more precisely and increase stemming accuracy.
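The idea of a dictionary-based stemmer can be pictured as a plain lookup table from word forms to stems. The following is a minimal sketch under that assumption; the table `STEM_TABLE` and the helpers `dictionary_stem` and `coverage` are hypothetical names chosen for illustration and are not taken from [4].

```python
# Hypothetical word-to-stem table; a real dictionary would cover a full vocabulary.
STEM_TABLE = {
    "running": "run",
    "ran": "run",
    "runs": "run",
    "happier": "happy",
    "happiest": "happy",
}


def dictionary_stem(word, table=STEM_TABLE):
    """Return the stem listed in the table, or the lowercased word if it is not covered."""
    key = word.lower()
    return table.get(key, key)


def coverage(words, table=STEM_TABLE):
    """Fraction of the given words that the table is able to stem."""
    if not words:
        return 0.0
    return sum(w.lower() in table for w in words) / len(words)
```

In this sketch, shrinking or enlarging the table changes the coverage, while the correctness of each entry determines the accuracy, which is one way to picture the trade-off mentioned above.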

Another research work by Sridevi et al., titled “Impact of Preprocessing on Twitter Based Covid-19 Vaccination Text Data by Classification Techniques” [5], takes a Twitter dataset and performs pre-processing on the data. It uses the classification algorithms LIBLINEAR and Bayes Net to determine the most effective pre-processing techniques for the data. The study finds that pre-processed data yields better performance and precision in data analysis than raw data.

The Sentiment Analysis and Emotion Detection subfield, with a focus on text-based emotion detection, is covered in detail in the article by Acheampong et al. [6]. It begins by outlining the fundamental ideas of text-based emotion detection and emotion models, and emphasises the accessibility of the large datasets necessary for the field of study. The article then describes the three main strategies frequently used in the creation of text-based emotion detection systems, outlining their advantages and disadvantages. The paper concludes by outlining current difficulties and prospective future study avenues for academics and researchers in the field of text-based data.

A research work titled “Hierarchical Bi-LSTM Based Emotion Analysis of Textual Data” by Mahto et al. [7] suggests an improved deep neural network (EDNN) based on a hierarchical Bidirectional Long Short-Term Memory (Bi-LSTM) model for emotion analysis. The findings show that, in comparison to the current CNN-LSTM model, the suggested hierarchical Bi-LSTM technique achieves an average accuracy of 89% for emotion analysis. Another work, carried out by Kumar et al. [8], puts forward the Emotion-Cause Pair Extraction (ECPE) technique to pre-process the text data at the clause level. To create sets of emotion and cause pairs for a document, it isolates cause clauses from emotion clauses, pairs them, and filters them. The BERT model receives its input from this pre-processed data. The classifier model achieves state-of-the-art performance on a benchmark corpus for emotion analysis. The ECPE-BERT emotion classifier beats previous models on English sentences, obtaining an accuracy of 98%.

In an article by Rashid et al. [9], the researchers describe the Aimens system, which analyses textual dialogue to identify emotions. The system employs the Long Short-Term Memory (LSTM) model, which is based on deep learning, to identify emotions such as happiness, sadness, and anger in context-sensitive speech. The system’s primary input is a mixture of word2vec and doc2vec embeddings. The output findings exhibit significant f-score changes from the baseline model, with the Aimens system scoring 0.7185. In the research article titled “An Effective Approach for Emotion Detection in Multimedia Text Data Using Sequence Based Convolutional Neural Network” by Shrivastava et al. [10], the authors offer a framework built upon Deep Neural Networks (DNN) for handling the problem of emotion identification inside multimodal text data. A TV show’s transcript was used to create a new dataset that was carefully curated for the emotion recognition test. To extract pertinent characteristics from the text dataset, a CNN model with an attention mechanism was trained using the obtained information. The effectiveness of the suggested model was assessed and contrasted with benchmark models such as LSTM and Random Forest classifiers.
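For readers unfamiliar with the f-score reported for the Aimens system, the sketch below shows how an F1 value is usually computed from prediction counts. It assumes the standard definition as the harmonic mean of precision and recall; the source does not state which averaging scheme was used in [9], and the function name `f1_score` is chosen here only for illustration.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    predicted = true_positives + false_positives   # items the model labelled positive
    actual = true_positives + false_negatives      # items that are truly positive
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```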

3. Methods and Materials

Various methods are used for text pre-processing and for emotion prediction from text data. The work in this article is divided into two stages: data pre-processing and emotion extraction. The pre-processing methods are detailed along with the dataset taken for this research, and emotion extraction is then applied to the pre-processed text.

3.1. Description of the Dataset

The dataset taken for this research work is a text-based social media dataset in which each record is a sentence. Each sentence expresses the current emotion of an individual, such as joy, sadness, fear, anger, surprise and so on. The emotion of the individual cannot be predicted easily from the text alone. The purpose of this research work is to predict the emotion of an individual from the text data taken for the study, after pre-processing, by applying a machine learning model.
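To make the prediction task concrete, the following sketch trains a very small multinomial Naive Bayes classifier on labelled sentences. This is a deliberately simple stand-in and is not the model used in this work; the sentences in `TOY_DATA` are invented for illustration and do not come from the dataset, and the names `train_naive_bayes` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Invented examples (not from the paper's dataset): (tokens, emotion label).
TOY_DATA = [
    ("i am so happy today".split(), "joy"),
    ("this makes me happy and glad".split(), "joy"),
    ("i am scared of the dark".split(), "fear"),
    ("the noise scared me badly".split(), "fear"),
    ("i am angry at the delay".split(), "anger"),
    ("this delay makes me furious".split(), "anger"),
]


def train_naive_bayes(examples):
    """Collect the label and word counts needed for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(tokens, model):
    """Return the label with the highest log-probability for the tokens."""
    label_counts, word_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # log prior plus log likelihoods with add-one (Laplace) smoothing
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TOY_DATA)
print(predict("so happy and glad".split(), model))
```

A word-count model of this kind ignores word order and context, which is one reason context-aware models are preferred for emotion extraction.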

3.2. Dataset before Pre-Processing

The chosen dataset for this work consists simply of individual expressions in the form of sentences. It has only one attribute, which is a sentence expressed by a person in direct speech. The text data taken for this research work is uncleaned and needs to be pre-processed for the effective application of the machine learning algorithm.

Table 1 shows the dataset used for the research before pre-processing. The text data is a combination of words, punctuation marks and many other textual representations. The objective is to eliminate the unnecessary words and symbols that accompany the root words, and to predict the emotion from the text taken for the research by using the machine learning algorithm.

3.3. Dataset after Pre-Processing

Raw data must be transformed into readable, well-defined sets before researchers can mine, analyse, and process it. Careful pre-processing is essential because the variety of sources from which raw data is gathered affects its quality, and raw data may be formatted inconsistently or be incomplete. Effective pre-processing can improve the accuracy of the data, which in turn raises the quality and reliability of the project. The pre-processing stages used in this research are lowercasing, punctuation removal, stop word removal, tokenization, stemming and lemmatization [11]. These steps help researchers to interpret the underlying emotion in the text of the chosen dataset. With pre-processed data, researchers can uncover valuable insights, detect patterns, predict user behaviour and understand the emotional content expressed by an individual more easily.
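The pre-processing stages listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used in this work: the stop-word list `STOP_WORDS`, the lemma table `LEMMAS` and the suffix rules in `simple_stem` are small hypothetical stand-ins for the much larger resources a real pipeline would use. Tokenization is performed before stop-word removal here, because stop words can only be filtered once the sentence has been split into tokens.

```python
import string

# Illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "i", "am", "is", "are", "was",
              "so", "to", "of", "and", "about", "my"}

# Toy lemma dictionary (an assumption) mapping inflected forms to base forms.
LEMMAS = {"feeling": "feel", "felt": "feel", "worried": "worry", "exams": "exam"}


def simple_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Apply the pre-processing stages to one sentence and return its tokens."""
    text = sentence.lower()                                            # lowercasing
    text = text.translate(str.maketrans("", "", string.punctuation))   # punctuation removal
    tokens = text.split()                                              # tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]                # stop-word removal
    # lemmatization by dictionary lookup, falling back to suffix stemming
    return [LEMMAS.get(t, simple_stem(t)) for t in tokens]


print(preprocess("I am feeling SO worried about my exams!!!"))
```

The output of such a function is a list of normalized tokens per sentence, which is the form in which the text is then handed to the emotion extraction stage.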

Table 1. Dataset before pre-processing.

References

[1] Mason, A.N., Narcum, J. and Mason, K. (2021) Social Media Marketing Gains Importance after Covid-19. Cogent Business & Management, 8, Article ID: 1870797. https://doi.org/10.1080/23311975.2020.1870797

[2] Feldkamp, J. (2021) The Rise of TikTok: The Evolution of a Social Media Platform during COVID-19. In: Hovestadt, C., Recker, J., Richter, J. and Werder, K., eds., Digital Responses to Covid-19: Digital Innovation, Transformation, and Entrepreneurship during Pandemic Outbreaks, Springer, Cham, 73-85. https://doi.org/10.1007/978-3-030-66611-8_6

[3] Chong, W.Y., Selvaretnam, B. and Soon, L.-K. (2014) Natural Language Processing for Sentiment Analysis: An Exploratory Analysis on Tweets. 2014 4th International Conference on Artificial Intelligence with Applications in Engineering and Technology, Kota Kinabalu, Malaysia, 3-5 December 2014. https://doi.org/10.1109/ICAIET.2014.43

[4] Kantrowitz, M., Mohit, B. and Mittal, V. (2000) Stemming and Its Effects on TFIDF Ranking. Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, 24-28 July 2000, 357-359. https://doi.org/10.1145/345508.345650

[5] Sridevi, P.C. and Velmurugan, T. (2022) Impact of Preprocessing on Twitter Based Covid-19 Vaccination Text Data by Classification Techniques. 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, India, 9-11 May 2022. https://doi.org/10.1109/ICAAIC53929.2022.9792768

[6] Acheampong, F.A., Chen, W.Y. and Nunoo-Mensah, H. (2020) Text-Based Emotion Detection: Advances, Challenges and Opportunities. Engineering Reports, 2, Article ID: e12189. https://doi.org/10.1002/eng2.12189

[7] Mahto, D. and Yadav, S.C. (2022) Hierarchical Bi-LSTM Based Emotion Analysis of Textual Data. Bulletin of the Polish Academy of Sciences, Technical Sciences, 70, Article No. e141001.

[8] Kumar, A. and Jain, A.K. (2022) Emotion Detection in Psychological Texts by Fine-Tuning BERT Using Emotion-Cause Pair Extraction. International Journal of Speech Technology, 25, 727-743. https://doi.org/10.1007/s10772-022-09982-9

[9] Rashid, U., Iqbal, M.W., Skiandar, M.A., Raiz, M.Q., Naqvi, M.R. and Shahzad, S.K. (2020) Emotion Detection of Contextual Text Using Deep Learning. 2020 4th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), Istanbul, Turkey, 22-24 October 2020, 1-5. https://doi.org/10.1109/ISMSIT50672.2020.9255279

[10] Shrivastava, K., Kumar, S. and Jain, D.K. (2019) An Effective Approach for Emotion Detection in Multimedia Text Data Using Sequence Based Convolutional Neural Network. Multimedia Tools and Applications, 78, 29607-29639. https://doi.org/10.1007/s11042-019-07813-9

[11] Jayapradha, B. and Velmurugan, T. (2023) Pre-Processing Emotional Text Data for Sentiment Analysis. International Conference on Information, System and Convergence Applications, Tashkent, Uzbekistan, 3-6 July 2023, 350-360.

[12] Plisson, J., Lavrac, N. and Mladenic, D. (2004) A Rule Based Approach to Word Lemmatization. Proceedings of IS, 3, 83-86.

[13] Purachary, M. and Adilakshmi, T. (2023) Finetuned RoBERTa Architecture for MOOCS Evaluation Using Adversarial Training. Journal of Theoretical and Applied Information Technology, 101.

[14] Lin, T.-M., Chang, J.-Y. and Lee, L.-H. (2023) NCUEE-NLP at WASSA 2023 Shared Task 1: Empathy and Emotion Prediction Using Sentiment-Enhanced RoBERTa Transformers. Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, 548-552. https://doi.org/10.18653/v1/2023.wassa-1.49

[15] Adoma, A.F., Henry, N.-M. and Chen, W.Y. (2020) Comparative Analyses of Bert, Roberta, Distilbert, and Xlnet for Text-Based Emotion Recognition. IEEE International Computer Conference on Wavelet Active Media Technology and Information Processing. https://doi.org/10.1109/ICCWAMTIP51612.2020.9317379

[16] Acheampong, F.A., Nunoo-Mensah, H. and Chen, W.Y. (2021) Transformer Models for Text-Based Emotion Detection: A Review of BERT-Based Approaches. Artificial Intelligence Review, 1-41. https://doi.org/10.1007/s10462-021-09958-2

[17] Ameer, I., Bölücü, N., Siddiqui, M.H.F., Can, B., Sidorov, G. and Gelbukh, A. (2023) Multi-Label Emotion Classification in Texts Using Transfer Learning. Expert Systems with Applications, 213, 118534. https://doi.org/10.1016/j.eswa.2022.118534

[18] Zhou, Y., Xing, Y.Y., Huang, G.M., Guo, Q.K. and Deng, N.X. (2023) Multimodal Emotion Recognition Based on Multilevel Acoustic and Textual Information. Proceedings of Fifth International Conference on Artificial Intelligence and Computer Science, 12803, 594-599. https://doi.org/10.1117/12.3009468

[19] Qin, X.Y., Wu, Z.Y., Zhang, T.T., Li, Y.R., Luan, J., Wang, B., Wang, L. and Cui, J.S. (2023) BERT-ERC: Fine-Tuning BERT Is Enough for Emotion Recognition in Conversation. Proceedings of the AAAI Conference on Artificial Intelligence, 37, 13492-13500. https://doi.org/10.1609/aaai.v37i11.26582

[20] Rajapaksha, P., Farahbakhsh, R. and Crespi, N. (2021) Bert, XLNet or Roberta: The Best Transfer Learning Model to Detect Clickbaits. IEEE Access, 9, 154704-154716. https://doi.org/10.1109/ACCESS.2021.3128742

[21] Xu, J.X. and Vinluan, A.A. (2023) Emotional Analysis and Prediction Based on Online Book User Comments. Proceedings of Fifth International Conference on Artificial Intelligence and Computer Science, 12803, 157-164. https://doi.org/10.1117/12.3009554