Multilingual Code-Mixed Sentiment Analysis in Hate Speech

Tulika Ranjan; Anish Singh; Rina Kumari; Sujata Swain; Anjan Bandyopadhyay; Ajaya Kumar Parida

doi:10.12694/scpe.v24i4.2375

Authors

Tulika Ranjan, School of Computer Engineering, Kalinga Institute of Industrial Technology, India
Anish Singh, School of Computer Engineering, Kalinga Institute of Industrial Technology, India
Rina Kumari, School of Computer Engineering, Kalinga Institute of Industrial Technology, India
Sujata Swain, School of Computer Engineering, Kalinga Institute of Industrial Technology, India
Anjan Bandyopadhyay, School of Computer Engineering, Kalinga Institute of Industrial Technology, India
Ajaya Kumar Parida, School of Computer Engineering, Kalinga Institute of Industrial Technology, India

DOI:

https://doi.org/10.12694/scpe.v24i4.2375

Keywords:

Code-Mixed, Multilingual text data, Sentiment analysis, Hate speech, Natural Language Processing, Machine learning

Abstract

Sentiment analysis identifies the emotion expressed in a text. It helps in analyzing product reviews, customer feedback, and survey responses. Researchers have developed various algorithms for this purpose; however, they have mostly focused on sentiment analysis in English. Although a few works address Hindi and multilingual sentiment analysis, they are not effective enough for code-mixed languages. To overcome this limitation, this paper presents a multilingual code-mixed language model that identifies the sentiment of a hate speech dataset extracted from Twitter. As no hate speech dataset with sentiment labels is available, we first collect the data from Twitter. We then label the data using a transformer-based pretrained sentiment analysis model trained on a large corpus of tweets in multiple languages: we pass our collected data to this model as test data and use its predicted sentiment labels. Next, we train six different machine learning models on our own task, i.e., sentiment analysis of the multilingual code-mixed hate speech dataset. The machine learning models perform well across multiple languages as well as code-mixed languages. In future, the approach can be easily adapted to other classification tasks on code-mixed languages. The results show that hate speech evokes negative sentiment, whereas non-hate speech reflects either positive or neutral sentiment.
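
The label-then-train pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `pseudo_label` is a hypothetical keyword-based stand-in for the pretrained transformer sentiment model, the example tweets are invented for illustration only, and a small multinomial Naive Bayes classifier over character trigrams stands in for the six machine learning models, which the abstract does not name.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for the pretrained multilingual sentiment model:
# a tiny keyword lexicon, used only so the sketch runs without downloads.
POSITIVE = {"achha", "achhi", "love", "loved", "great", "badhiya"}
NEGATIVE = {"bakwas", "worst", "hate", "ganda", "bekar"}


def pseudo_label(text):
    """Return a sentiment label; placeholder for the pretrained model's prediction."""
    tokens = set(text.lower().replace(",", " ").split())
    if tokens & NEGATIVE:
        return "negative"
    if tokens & POSITIVE:
        return "positive"
    return "neutral"


def char_ngrams(text, n=3):
    """Character n-grams tolerate spelling variation in romanized code-mixed text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over character n-grams."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for gram in char_ngrams(text):
                score += math.log((self.feature_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented Hindi-English code-mixed examples (illustrative, not from the dataset).
tweets = [
    "yeh movie bahut achhi hai, loved it",
    "kya bakwas service hai, worst experience",
    "kal office jaana hai",
    "great match tha, badhiya performance",
    "bekar phone hai, hate it",
    "aaj mausam theek hai",
]

# Step 1: label the collected data with the (stand-in) pretrained model.
labels = [pseudo_label(t) for t in tweets]

# Step 2: train a downstream classifier on the pseudo-labeled data.
model = NaiveBayes().fit(tweets, labels)
prediction = model.predict("worst movie, bilkul bakwas")
```

In a real setting, the stand-in labeler would be replaced by the pretrained model's predictions and the classifier by whichever models are being compared. The two-step structure (pseudo-label, then train) is the part this sketch is meant to show.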

