Character-Level Embedding using FastText and LSTM for Biomedical Named Entity Recognition
DOI: https://doi.org/10.12694/scpe.v25i6.3365
Keywords: analysis of energy consumption of motes
Abstract
Extracting biomedical entities has caught many researchers' attention, and the recent technique of word embedding is commonly employed for this task. Yet, traditional word embedding architectures such as Word2vec and GloVe still suffer from the 'out-of-vocabulary' (OOV) problem: when an unseen term is encountered during testing, no embedding vector exists for it. Hence, this study proposes a character-level embedding through the FastText architecture, since operating at the character level is a promising solution to the OOV problem. To this end, the proposed FastText architecture generates embedding vectors for the possible N-gram combinations of each word. These vectors are then fed to a Long Short-Term Memory (LSTM) architecture that classifies each word into its biomedical class. On two benchmark datasets, BioCreative-II and NCBI, the proposed method produced f-measures of 0.912 and 0.918 respectively. Comparing these results with the baseline studies demonstrates the superiority of the proposed character-level FastText embedding for the Biomedical Named Entity Recognition (BNER) task.
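The core idea of the abstract, that FastText sidesteps OOV terms by composing a word vector from its character n-grams, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the n-gram range (3–6), hashing-into-buckets scheme, and random embedding table below are assumptions modeled on the standard FastText design, and the LSTM classifier stage is omitted.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    # FastText wraps each word in boundary markers '<' and '>'
    # before extracting all character n-grams in the given range.
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word, table):
    # Hash each n-gram into a fixed-size bucket table and average
    # the corresponding rows. An unseen (OOV) word still receives
    # a vector, because its n-grams always map to some bucket.
    buckets = table.shape[0]
    idx = [hash(g) % buckets for g in char_ngrams(word)]
    return table[idx].mean(axis=0)

# Toy embedding table: 100k buckets, 50-dimensional vectors.
rng = np.random.default_rng(0)
table = rng.normal(size=(100_000, 50)).astype(np.float32)

# Even a rare biomedical term never seen in training gets a vector.
vec = word_vector("hemoglobin", table)
```

In a full pipeline, vectors like `vec` for each token of a sentence would form the input sequence to the LSTM, which then predicts the biomedical entity tag per word.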
Copyright (c) 2024 Scalable Computing: Practice and Experience

This work is licensed under a Creative Commons Attribution 4.0 International License.