Hate Speech Detection in Low-Resource Bodo and Assamese Texts with ML-DL and BERT Models

Koyel Ghosh; Apurbalal Senapati; Mwnthai Narzary; Maharaj Brahma

doi:10.12694/scpe.v24i4.2469

Published: Nov 17, 2023

Keywords: Hate Speech, Natural Language Processing, Deep Learning, SVM, LSTM, BiLSTM, CNN, BERT

Koyel Ghosh

Department of Computer Science and Engineering, Central Institute of Technology, Kokrajhar, Assam, India

Apurbalal Senapati

Department of Computer Science and Engineering, Central Institute of Technology, Kokrajhar, Assam, India

Mwnthai Narzary

Department of Computer Science and Engineering, Central Institute of Technology, Kokrajhar, Assam, India

Maharaj Brahma

Department of Computer Science and Engineering, IIT Hyderabad, India

Abstract

Hate speech detection has recently become a prominent research topic in natural language processing (NLP). Unrestrained use of social media platforms makes people over-opinionated, and their comments and posts can cross the line into toxicity. Such toxicity fuels violence towards neighbours, states, countries, and continents. Several countries have introduced laws to address this pressing problem, and social media platforms have started restricting hateful posts and comments. When framed as a supervised learning task, hate speech detection is generally a text classification problem. Processing text computationally is challenging because of its semantic subtlety and complex grammar. Resource-rich languages can leverage abundant data, whereas resource-scarce languages suffer significantly from a lack of datasets. This paper makes a multifaceted contribution encompassing resource generation; experimentation with Machine Learning (ML), Deep Learning (DL), and state-of-the-art transformer-based models; and a comprehensive evaluation of model performance, including a thorough error analysis. In terms of resource generation, it contributes the North-East Indian Hate Speech tagged dataset (NEIHS version 1), which covers two languages: Assamese and Bodo.
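The abstract frames hate speech detection as supervised text classification and lists SVM among the ML baselines. As a minimal, hedged sketch of that framing (not the authors' pipeline), the block below trains a linear SVM with Pegasos-style subgradient descent on bag-of-words counts, using only the Python standard library. The function names (`featurize`, `train_linear_svm`, `predict`), the whitespace tokenizer, the `lam` and `epochs` values, and the four English toy sentences are all illustrative assumptions; NEIHS contains Assamese and Bodo text, which would need proper tokenization.

```python
from collections import Counter

BIAS = "__bias__"  # hypothetical constant feature acting as a (regularized) intercept


def featurize(text):
    """Bag-of-words term counts plus a constant bias feature (hypothetical helper)."""
    # Lowercased whitespace split; real Assamese/Bodo text would need a proper tokenizer.
    x = Counter(text.lower().split())
    x[BIAS] = 1
    return x


def train_linear_svm(docs, labels, lam=0.01, epochs=20):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    labels must be +1 (hateful) or -1 (not hateful). Documents are visited in a
    fixed order (original Pegasos samples randomly) so the result is deterministic.
    """
    w = {}
    t = 0
    feats = [featurize(d) for d in docs]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            # Margin is measured with the weights from before this step.
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            # Regularization step: shrink every weight by (1 - eta * lam).
            for f in w:
                w[f] *= 1.0 - eta * lam
            # Hinge-loss subgradient step, applied only when the margin is violated.
            if margin < 1:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 (hateful) or -1 (not hateful) from the sign of the linear score."""
    x = featurize(text)
    score = sum(w.get(f, 0.0) * v for f, v in x.items())
    return 1 if score > 0 else -1


# Invented English toy examples, NOT drawn from NEIHS.
TRAIN = [
    ("you are trash", 1),
    ("get out trash", 1),
    ("thanks dear friend", -1),
    ("nice day friend", -1),
]

if __name__ == "__main__":
    weights = train_linear_svm([d for d, _ in TRAIN], [y for _, y in TRAIN])
    for text in ["you are trash", "nice day friend"]:
        print(text, "->", predict(weights, text))
```

A library implementation (for example a TF-IDF vectorizer feeding a linear SVM) would be the practical choice; the hand-written version only makes the hinge-loss training loop explicit.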

Issue

Vol. 24 No. 4 (2023)

Section

Special Issue - Sentiment Analysis and Affective computing in Multimedia Data on Social Network
