Multilingual Code-Mixed Sentiment Analysis in Hate Speech
Abstract
Sentiment analysis identifies the emotion expressed in a text. It helps in analyzing product reviews, customer feedback, and survey responses. Researchers have developed various algorithms for this purpose, but they have focused mainly on sentiment analysis in English. Although a few works address Hindi and multilingual sentiment analysis, they do not perform well on code-mixed languages. To overcome this limitation, this paper presents a multilingual code-mixed language model that identifies the sentiment of a hate speech dataset extracted from Twitter. As no hate speech dataset with sentiment labels is available, we first collect the data from Twitter. We then label the data using a transformer-based pretrained sentiment analysis model trained on a large corpus of tweets in multiple languages: we pass our collected data to this model as test data and use its predicted sentiment labels. Next, we train six different machine learning models on our own task, i.e., sentiment analysis for a multilingual code-mixed hate speech dataset. The machine learning models perform well across multiple languages as well as code-mixed languages. In the future, the approach can be easily adapted to other classification tasks based on code-mixed languages. The results show that hate speech evokes negative sentiment, whereas non-hate speech reflects either positive or neutral sentiment.
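The two-stage workflow described in the abstract (label collected tweets with a pretrained model, then train separate classifiers on those labels) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `pretrained_labeler` is a hypothetical keyword-based stand-in for the transformer-based model, the example tweets are invented, and the character n-gram Naive Bayes classifier is a simple substitute that is not necessarily one of the six models used in the paper.

```python
import math
from collections import Counter, defaultdict


def pretrained_labeler(text):
    """Hypothetical stand-in for the pretrained multilingual sentiment model.

    A real pipeline would call the transformer here; this stub only checks
    a few invented cue words so the sketch runs without external packages.
    """
    negative_cues = {"hate", "nafrat", "ganda"}
    positive_cues = {"love", "pyaar", "accha"}
    tokens = set(text.lower().split())
    if tokens & negative_cues:
        return "negative"
    if tokens & positive_cues:
        return "positive"
    return "neutral"


def char_ngrams(text, n=3):
    """Character n-grams, which tolerate the spelling variation of code-mixed text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over character n-grams."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(count / total_docs)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for gram in char_ngrams(text):
                score += math.log((self.feature_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented Hindi-English code-mixed examples (not from the paper's dataset).
tweets = [
    "I hate these log, bahut ganda behaviour",
    "yeh movie bahut accha hai, love it",
    "kal office jaana hai",
    "nafrat hai mujhe in logon se",
    "pyaar aur respect sabke liye",
    "aaj mausam thik hai",
]

# Stage 1: pseudo-label the collected tweets with the (stubbed) pretrained model.
pseudo_labels = [pretrained_labeler(t) for t in tweets]

# Stage 2: train a separate classifier on the pseudo-labelled data.
model = NaiveBayes().fit(tweets, pseudo_labels)
prediction = model.predict("mujhe nafrat hai")
```

Because the downstream classifier learns only from the pretrained model's predictions, any labelling errors made in the first stage carry over into the second.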