Character-Level Embedding using FastText and LSTM for Biomedical Named Entity Recognition

Main Article Content

Ahmed Sabah Ahmed AL-Jumaili
Huda Kadhim Tayyeh

Abstract

Extracting biomedical entities has attracted considerable research attention, and word embedding has recently been employed for this task. Yet traditional word embedding architectures such as Word2vec and GloVe still suffer from the ‘out-of-vocabulary’ (OOV) problem, which occurs when a term unseen during training is encountered during testing and therefore has no embedding vector. Hence, this study proposes a character-level embedding based on the FastText architecture, since operating at the character level is a promising solution to the OOV problem. To this end, FastText is used to generate embedding vectors for the character N-grams of each word. These vectors are then fed to a Long Short-Term Memory (LSTM) architecture that classifies each word into its biomedical class. On two benchmark datasets, BioCreative-II and NCBI, the proposed method achieved F-measures of 0.912 and 0.918, respectively. Compared with the baseline studies, these results demonstrate the superiority of the proposed FastText character-level embedding for the Biomedical Named Entity Recognition (BNER) task.
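
To illustrate why character N-grams sidestep the OOV problem, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes N-gram lengths of 3 to 6, a small hashed lookup table of random vectors, and averaging of the N-gram vectors; the function names, the MD5-based hashing, the bucket count, and the vector dimension are all hypothetical choices made for illustration. The point is only that any word, seen or unseen, decomposes into N-grams that can be looked up, so a vector is always available.

```python
import hashlib
import random


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word`, wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def bucket_of(ngram, num_buckets):
    """Map an n-gram to a table row (MD5 is an illustrative stand-in hash)."""
    digest = hashlib.md5(ngram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % num_buckets


def make_table(num_buckets=1000, dim=8, seed=0):
    """Build a toy table of random vectors; a trained model would learn these."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(num_buckets)]


def embed(word, table):
    """Average the vectors of the word's n-grams; works for unseen words too."""
    grams = char_ngrams(word)
    dim = len(table[0])
    if not grams:
        return [0.0] * dim
    total = [0.0] * dim
    for gram in grams:
        row = table[bucket_of(gram, len(table))]
        for k in range(dim):
            total[k] += row[k]
    return [value / len(grams) for value in total]


table = make_table()
vector = embed("interleukin-2", table)  # a vector exists even if the word was never seen
```

In a full pipeline along the lines the abstract describes, one such vector per token would form the input sequence to the LSTM tagger.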

Article Details

Section
Special Issue - Synergies of Neural Networks, Neurorobotics, and Brain-Computer Interface Technology: Advancements and Applications
Author Biographies

Ahmed Sabah Ahmed AL-Jumaili, Department of Business Information Technology (BIT), College of Business Informatics, University of Information Technology and Communications, Baghdad, Iraq

Ahmed Sabah Ahmed AL-Jumaili holds a PhD in Computer Science from the University of Technology, Baghdad, Iraq. He currently works as an Assistant Professor in the Department of Business Information Technology (BIT), College of Business Informatics (BIC), University of Information Technology and Communications (UOITC). His research interests are Artificial Intelligence, Image Processing, Database, Multimedia, Computer Networks, Information Security, and Information Hiding.

Huda Kadhim Tayyeh, Department of Informatics Systems Management (ISM), College of Business Informatics, University of Information Technology and Communications, Baghdad, Iraq

Huda Kadhim Tayyeh holds a PhD in Computer Science from the University of Technology, Baghdad, Iraq. She currently works as an Assistant Professor in the Department of Informatics Systems Management (ISM), College of Business Informatics (BIC), University of Information Technology and Communications (UOITC). Her research interests are Information Security, Information Hiding, Artificial Intelligence, Image Processing, Database, and Multimedia.